    _              _         ____              
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        


Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time (a measure of how long a Markov chain under a fixed policy needs to reach its stationary distribution) poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic because estimating the mixing time is difficult and expensive in environments with large state spaces, forcing impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte Carlo (MLMC) gradient estimator. Our approach effectively alleviates the dependency on mixing-time knowledge, a first for global convergence in average-reward MDPs. Furthermore, our approach exhibits the tightest available dependence of $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$ relative to prior work. With a 2D gridworld goal-reaching navigation experiment, we demonstrate that MAC achieves higher reward than a previous policy-gradient method for the average-reward setting, Parameterized Policy Gradient with Advantage Estimation (PPGAE), especially when a relatively small training sample budget restricts trajectory length.
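
The MLMC idea can be sketched in a few lines. This is an illustrative sketch, not the paper's exact estimator: the level cap `j_max` and the `draw` interface are assumptions, and the trajectory length 2^J is drawn from a geometric distribution so long trajectories are rare but up-weighted accordingly.

```python
import random

def _avg(xs):
    return sum(xs) / len(xs)

def mlmc_mean(draw, j_max=10):
    """One Multi-level Monte Carlo (MLMC) estimate of a long-run average.

    draw(n) returns n (possibly correlated) scalar samples, e.g. per-step
    policy-gradient terms along a trajectory of length n. A geometric random
    level J selects trajectory length 2**J, and the rare long-trajectory
    correction is up-weighted by 2**J, so no mixing-time oracle is needed
    to choose the trajectory length in advance.
    """
    j = 1
    while j < j_max and random.random() < 0.5:  # P(J = j) ~ 2**-j
        j += 1
    xs = draw(2 ** j)
    # Base term plus the up-weighted multi-level correction.
    return _avg(xs[:1]) + 2 ** j * (_avg(xs) - _avg(xs[: 2 ** (j - 1)]))
```

Averaged over many calls, the estimate tracks the mean of the longest trajectory while using only a logarithmic expected number of samples per call.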

Updated: 2024-05-08 23:59:23

Categories: cs.LG

Download: http://arxiv.org/abs/2403.11925v2

Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data

The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration (RPQ) for learning an $\epsilon$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (satisfying a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences achieving robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Within this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration (HyTQ: pronounced height-Q). To the best of our knowledge, we provide the first improved out-of-data-distribution assumption for large-scale problems with general function approximation under the hybrid robust $\phi$-regularized reinforcement learning framework. Finally, we provide theoretical guarantees on the performance of the learned policies of our algorithms on systems with arbitrarily large state spaces.

Updated: 2024-05-08 23:52:37

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.05468v1

AFEN: Respiratory Disease Classification using Ensemble Learning

We present AFEN (Audio Feature Ensemble Learning), a model that leverages Convolutional Neural Networks (CNN) and XGBoost in an ensemble learning fashion to perform state-of-the-art audio classification for a range of respiratory diseases. We use a meticulously selected mix of audio features which provide the salient attributes of the data and allow for accurate classification. The extracted features are then used as input to two separate model classifiers: 1) a multi-feature CNN classifier and 2) an XGBoost classifier. The outputs of the two models are then fused using soft voting. Thus, by exploiting ensemble learning, we achieve increased robustness and accuracy. We evaluate the performance of the model on a database of 920 respiratory sounds, which undergoes data augmentation to increase the diversity of the data and the generalizability of the model. We empirically verify that AFEN sets a new state of the art using Precision and Recall as metrics, while decreasing training time by 60%.
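
Soft voting itself is simple to sketch. In this minimal sketch, each model is assumed to output per-class probability vectors (e.g. a CNN softmax head and XGBoost's `predict_proba`); the equal weights are illustrative defaults, not values from the paper.

```python
def soft_vote(prob_cnn, prob_xgb, w_cnn=0.5, w_xgb=0.5):
    """Fuse two classifiers' per-class probabilities by weighted soft voting.

    prob_cnn / prob_xgb: lists of per-sample probability vectors.
    Returns the argmax class of the weighted average distribution
    for each sample. Weights here are hypothetical defaults.
    """
    preds = []
    for pa, pb in zip(prob_cnn, prob_xgb):
        fused = [w_cnn * a + w_xgb * b for a, b in zip(pa, pb)]
        preds.append(max(range(len(fused)), key=fused.__getitem__))
    return preds
```

Because the fused score keeps each model's confidence rather than just its vote, a highly confident model can overrule a weakly confident one.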

Updated: 2024-05-08 23:50:54

Categories: cs.SD,cs.AI,cs.LG,eess.AS

Download: http://arxiv.org/abs/2405.05467v1

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals

Like a criminal under investigation, Large Language Models (LLMs) might pretend to be aligned while evaluated and misbehave when they have a good opportunity. Can current interpretability methods catch these 'alignment fakers'? To answer this question, we introduce a benchmark that consists of 324 pairs of LLMs fine-tuned to select actions in role-play scenarios. One model in each pair is consistently benign (aligned). The other model misbehaves in scenarios where it is unlikely to be caught (alignment faking). The task is to identify the alignment-faking model using only inputs where the two models behave identically. We test five detection strategies, one of which identifies 98% of alignment fakers.

Updated: 2024-05-08 23:44:08

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.05466v1

Vidur: A Large-Scale Simulation Framework For LLM Inference

Optimizing the deployment of Large language models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address this challenge, we present Vidur - a large-scale, high-fidelity, easily-extensible simulation framework for LLM inference performance. Vidur models the performance of LLM operators using a combination of experimental profiling and predictive modeling, and evaluates the end-to-end inference performance for different workloads by estimating several metrics of interest such as latency and throughput. We validate the fidelity of Vidur on several LLMs and show that it estimates inference latency with less than 9% error across the range. Further, we present Vidur-Search, a configuration search tool that helps optimize LLM deployment. Vidur-Search uses Vidur to automatically identify the most cost-effective deployment configuration that meets application performance constraints. For example, Vidur-Search finds the best deployment configuration for LLaMA2-70B in one hour on a CPU machine, in contrast to a deployment-based exploration which would require 42K GPU hours - costing ~218K dollars. Source code for Vidur is available at https://github.com/microsoft/vidur.

Updated: 2024-05-08 23:42:13

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05465v1

Cross-Modality Translation with Generative Adversarial Networks to Unveil Alzheimer's Disease Biomarkers

Generative approaches for cross-modality transformation have recently gained significant attention in neuroimaging. While most previous work has focused on case-control data, the application of generative models to disorder-specific datasets and their ability to preserve diagnostic patterns remain relatively unexplored. Hence, in this study, we investigated the use of a generative adversarial network (GAN) in the context of Alzheimer's disease (AD) to generate functional network connectivity (FNC) and T1-weighted structural magnetic resonance imaging data from each other. We employed a cycle-GAN to synthesize data in an unpaired data transition and enhanced the transition by integrating weak supervision in cases where paired data were available. Our findings revealed that our model could offer remarkable capability, achieving a structural similarity index measure (SSIM) of $0.89 \pm 0.003$ for T1s and a correlation of $0.71 \pm 0.004$ for FNCs. Moreover, our qualitative analysis revealed similar patterns between generated and actual data when comparing AD to cognitively normal (CN) individuals. In particular, we observed significantly increased functional connectivity in cerebellar-sensory motor and cerebellar-visual networks and reduced connectivity in cerebellar-subcortical, auditory-sensory motor, sensory motor-visual, and cerebellar-cognitive control networks. Additionally, the T1 images generated by our model showed a similar pattern of atrophy in the hippocampal and other temporal regions of Alzheimer's patients.

Updated: 2024-05-08 23:38:02

Categories: q-bio.NC,cs.LG,eess.IV

Download: http://arxiv.org/abs/2405.05462v1

Taking a Moment for Distributional Robustness

A rich line of recent work has studied distributionally robust learning approaches that seek to learn a hypothesis that performs well, in the worst-case, on many different distributions over a population. We argue that although the most common approaches seek to minimize the worst-case loss over distributions, a more reasonable goal is to minimize the worst-case distance to the true conditional expectation of labels given each covariate. Focusing on the minmax loss objective can dramatically fail to output a solution minimizing the distance to the true conditional expectation when certain distributions contain high levels of label noise. We introduce a new min-max objective based on what is known as the adversarial moment violation and show that minimizing this objective is equivalent to minimizing the worst-case $\ell_2$-distance to the true conditional expectation if we take the adversary's strategy space to be sufficiently rich. Previous work has suggested minimizing the maximum regret over the worst-case distribution as a way to circumvent issues arising from differential noise levels. We show that in the case of square loss, minimizing the worst-case regret is also equivalent to minimizing the worst-case $\ell_2$-distance to the true conditional expectation. Although their objective and our objective both minimize the worst-case distance to the true conditional expectation, we show that our approach provides large empirical savings in computational cost in terms of the number of groups, while providing the same noise-oblivious worst-distribution guarantee as the minimax regret approach, thus making positive progress on an open question posed by Agarwal and Zhang (2022).

Updated: 2024-05-08 23:37:25

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05461v1

Automated Program Repair: Emerging trends pose and expose problems for benchmarks

Machine learning (ML) now pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other tasks. But, there are important differences between these applications of ML and earlier work. Evaluations and comparisons must take care to ensure that results are valid and likely to generalize. A challenge is that the most popular APR evaluation benchmarks were not designed with ML techniques in mind. This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated.

Updated: 2024-05-08 23:09:43

Categories: cs.SE,cs.LG

Download: http://arxiv.org/abs/2405.05455v1

Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consists of two training stages: a supervised stage and a reinforcement learning stage. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics such as returns, the Sharpe ratio, and nine evaluation indices, reveals our model's superiority. It notably achieves the highest yield and a Sharpe ratio of 2.03, ensuring top profitability with the lowest risk in comparable return scenarios.
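
For reference, the Sharpe ratio reported above is the standard mean-excess-return-over-volatility statistic. A minimal, non-annualized computation (the sample returns below are illustrative, not the paper's data):

```python
import math

def sharpe_ratio(returns, risk_free=0.0):
    """Sharpe ratio: mean excess return divided by its sample std dev.

    returns: per-period portfolio returns; risk_free: per-period
    risk-free rate. No annualization factor is applied here.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    # Sample (n-1) variance of the excess returns.
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)
```

A higher value means more return per unit of volatility, which is why it serves as a risk-adjusted comparison metric across strategies.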

Updated: 2024-05-08 22:54:04

Categories: q-fin.CP,cs.LG

Download: http://arxiv.org/abs/2405.05449v1

GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields

The 3D Gaussian splatting methods are becoming popular. However, they work directly on the signal, leading to a dense representation. Even with techniques such as pruning or distillation, the results remain dense. In this paper, we propose to model the gradient of the original signal instead. The gradients are much sparser than the original signal, so they require far fewer Gaussian splats, leading to more efficient storage and thus higher computational performance during both training and rendering. Thanks to this sparsity, only a small number of pixels are needed during view synthesis, yielding much higher computational performance ($100\sim 1000\times$ faster). The 2D image can then be recovered from the gradients by solving a Poisson equation with linear computational complexity. Several experiments confirm the sparseness of the gradients and the computational performance of the proposed method. The method can be applied to various applications, such as human body modeling and indoor environment modeling.
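
A 1-D analogue makes the sparsity argument concrete: a piecewise-constant signal has mostly zero gradient, and integrating the gradient (the 1-D counterpart of the paper's 2-D Poisson solve) recovers the signal in linear time. This is a sketch of the principle, not the paper's renderer.

```python
def forward_diff(signal):
    """Gradient of a 1-D signal via forward differences."""
    return [b - a for a, b in zip(signal, signal[1:])]

def integrate(grad, x0):
    """Recover the signal from its gradient and one boundary value.

    This cumulative sum is the 1-D analogue of solving the Poisson
    equation in 2-D; the cost is linear in the signal length.
    """
    out = [x0]
    for g in grad:
        out.append(out[-1] + g)
    return out
```

For a signal with few edges, the gradient has few nonzeros, which is exactly the property that lets gradient-domain splatting use far fewer primitives.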

Updated: 2024-05-08 22:40:52

Categories: cs.CV,cs.AI,cs.GR,cs.LG,eess.IV

Download: http://arxiv.org/abs/2405.05446v1

More Efficient $k$-wise Independent Permutations from Random Reversible Circuits via log-Sobolev Inequalities

We prove that the permutation computed by a reversible circuit with $\tilde{O}(nk\cdot \log(1/\varepsilon))$ random $3$-bit gates is $\varepsilon$-approximately $k$-wise independent. Our bound improves on currently known bounds in the regime when the approximation error $\varepsilon$ is not too small. We obtain our results by analyzing the log-Sobolev constants of appropriate Markov chains rather than their spectral gaps.
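
A toy version of the object being analyzed (the construction only, not the log-Sobolev proof): compose random 3-bit gates, each a permutation of $\{0, \ldots, 7\}$ applied to three wires, and observe that the resulting map on $n$-bit strings is itself a permutation.

```python
import random

def random_gates(n, m, rng):
    """m random 3-bit gates: (three distinct wires, a permutation of {0..7})."""
    return [(rng.sample(range(n), 3), rng.sample(range(8), 8)) for _ in range(m)]

def apply_circuit(x, gates):
    """Apply a reversible circuit of 3-bit gates to an n-bit integer x."""
    for wires, perm in gates:
        # Gather the three wire bits into a 3-bit index, permute, write back.
        idx = sum(((x >> w) & 1) << i for i, w in enumerate(wires))
        out = perm[idx]
        for i, w in enumerate(wires):
            x = (x & ~(1 << w)) | (((out >> i) & 1) << w)
    return x
```

Since every gate is a bijection on $\{0,1\}^n$, so is the whole circuit; the paper's result concerns how quickly such random circuits approximate $k$-wise independent permutations.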

Updated: 2024-05-08 22:38:35

Categories: cs.CC,cs.CR

Download: http://arxiv.org/abs/2406.08499v1

Large Language Model Enhanced Machine Learning Estimators for Classification

Pre-trained large language models (LLM) have emerged as a powerful tool for simulating various scenarios and generating output given specific instructions and multimodal input. In this work, we analyze the specific use of LLM to enhance a classical supervised machine learning method for classification problems. We propose a few approaches to integrate LLM into a classical machine learning estimator to further enhance the prediction performance. We examine the performance of the proposed approaches through both standard supervised learning binary classification tasks, and a transfer learning task where the test data observe distribution changes compared to the training data. Numerical experiments using four publicly available datasets are conducted and suggest that using LLM to enhance classical machine learning estimators can provide significant improvement on prediction performance.

Updated: 2024-05-08 22:28:57

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05445v1

Evaluating Students' Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large

Evaluating open-ended written examination responses from students is an essential yet time-intensive task for educators, requiring a high degree of effort, consistency, and precision. Recent developments in Large Language Models (LLMs) present a promising opportunity to balance the need for thorough evaluation with efficient use of educators' time. In our study, we explore the effectiveness of the LLMs ChatGPT-3.5, ChatGPT-4, Claude-3, and Mistral-Large in assessing university students' open-ended answers to questions about reference material they have studied. Each model was instructed to evaluate 54 answers repeatedly under two conditions: 10 times (10-shot) with a temperature setting of 0.0 and 10 times with a temperature of 0.5, for a total of 1,080 evaluations per model and 4,320 evaluations across all models. The Retrieval Augmented Generation (RAG) framework was used to have the LLMs process the evaluation of the answers. As of spring 2024, our analysis revealed notable variations in the consistency and grading outcomes of the studied LLMs. There is a need to understand the strengths and weaknesses of LLMs in educational settings for evaluating open-ended written responses. Further comparative research is essential to determine the accuracy and cost-effectiveness of using LLMs for educational assessments.

Updated: 2024-05-08 22:23:58

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.05444v1

Learning Embeddings for Sequential Tasks Using Population of Agents

We present an information-theoretic framework to learn fixed-dimensional embeddings for tasks in reinforcement learning. We leverage the idea that two tasks are similar if observing an agent's performance on one task reduces our uncertainty about its performance on the other. This intuition is captured by our information-theoretic criterion which uses a diverse agent population as an approximation for the space of agents to measure similarity between tasks in sequential decision-making settings. In addition to qualitative assessment, we empirically demonstrate the effectiveness of our techniques based on task embeddings by quantitative comparisons against strong baselines on two application scenarios: predicting an agent's performance on a new task by observing its performance on a small quiz of tasks, and selecting tasks with desired characteristics from a given set of options.

Updated: 2024-05-08 22:12:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2306.03311v2

How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by distribution shifts causing unpredictable changes in performance during deployment. To rigorously evaluate behavior cloning policies, we present a framework that provides a tight lower-bound on robot performance in an arbitrary environment, using a minimal number of experimental policy rollouts. Notably, by applying the standard stochastic ordering to robot performance distributions, we provide a worst-case bound on the entire distribution of performance (via bounds on the cumulative distribution function) for a given task. We build upon established statistical results to ensure that the bounds hold with a user-specified confidence level and tightness, and are constructed from as few policy rollouts as possible. In experiments we evaluate policies for visuomotor manipulation in both simulation and hardware. Specifically, we (i) empirically validate the guarantees of the bounds in simulated manipulation settings, (ii) find the degree to which a learned policy deployed on hardware generalizes to new real-world environments, and (iii) rigorously compare two policies tested in out-of-distribution settings. Our experimental data, code, and implementation of confidence bounds are open-source.
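
The flavor of such guarantees can be sketched with a plain Hoeffding bound on the mean success rate. This is a simplification: the paper bounds the full performance distribution (via the CDF) and is tighter; the sketch only assumes per-rollout returns scaled to [0, 1].

```python
import math

def perf_lower_bound(rollout_scores, delta=0.05):
    """One-sided lower confidence bound on mean policy performance.

    rollout_scores: per-rollout returns in [0, 1]. By Hoeffding's
    inequality, the true mean exceeds the returned bound with
    probability at least 1 - delta.
    """
    n = len(rollout_scores)
    mean = sum(rollout_scores) / n
    slack = math.sqrt(math.log(1 / delta) / (2 * n))
    return max(0.0, mean - slack)
```

Even a policy that succeeds in all of 20 rollouts only certifies a mean around 0.73 at 95% confidence, which is why squeezing tight bounds out of few rollouts is the interesting problem.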

Updated: 2024-05-08 22:00:35

Categories: cs.RO,cs.AI,cs.LG,stat.AP

Download: http://arxiv.org/abs/2405.05439v1

UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis

This paper presents our team's participation in the MEDIQA-ClinicalNLP2024 shared task B. We present a novel approach to diagnosing clinical dermatology cases by integrating large multimodal models, specifically leveraging the capabilities of GPT-4V under a retriever and a re-ranker framework. Our investigation reveals that GPT-4V, when used as a retrieval agent, can accurately retrieve the correct skin condition 85% of the time using dermatological images and brief patient histories. Additionally, we empirically show that Naive Chain-of-Thought (CoT) works well for retrieval while Medical Guidelines Grounded CoT is required for accurate dermatological diagnosis. Further, we introduce a Multi-Agent Conversation (MAC) framework and show its superior performance and potential over the best CoT strategy. The experiments suggest that using naive CoT for retrieval and multi-agent conversation for critique-based diagnosis, GPT-4V can lead to an early and accurate diagnosis of dermatological conditions. The implications of this work extend to improving diagnostic workflows, supporting dermatological education, and enhancing patient care by providing a scalable, accessible, and accurate diagnostic tool.

Updated: 2024-05-08 21:57:24

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.17749v2

Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression

The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs). Its essence lies in tracking the dynamics of two mutual information (MI) values: between the hidden layer output and the DNN input/target. According to the hypothesis put forth by Shwartz-Ziv & Tishby (2017), the training process consists of two distinct phases: fitting and compression. The latter phase is believed to account for the good generalization performance exhibited by DNNs. Due to the challenging nature of estimating MI between high-dimensional random vectors, this hypothesis was only partially verified for NNs of tiny sizes or specific types, such as quantized NNs. In this paper, we introduce a framework for conducting IB analysis of general NNs. Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values and comparison with MINE (Belghazi et al., 2018). Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics.
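
The compress-then-estimate step can be illustrated with a plug-in MI estimator on discretized variables. This is a simplification of the paper's estimator: here X would stand for a compressed (e.g. quantized) hidden representation and Y for the input or target.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in nats from discrete (x, y) samples.

    Computes sum over observed pairs of p(x,y) * log(p(x,y) / (p(x)p(y)))
    using empirical frequencies; only tractable because the variables
    have been compressed to a modest number of discrete values.
    """
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())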

Updated: 2024-05-08 21:52:52

Categories: cs.LG,cs.IT,math.IT,94A16 (Primary) 68T07, 94A17 (Secondary),E.4; H.1.1

Download: http://arxiv.org/abs/2305.08013v2

Analysis and prevention of AI-based phishing email attacks

Phishing email attacks are among the most common and most harmful cybersecurity attacks. With the emergence of generative AI, phishing attacks can be based on emails generated automatically, making them more difficult to detect. That is, instead of a single email format sent to a large number of recipients, generative AI can be used to send each potential victim a different email, making it harder for cybersecurity systems to identify the scam email before it reaches the recipient. Here we describe a corpus of AI-generated phishing emails. We also use different machine learning tools to test the ability of automatic text analysis to identify AI-generated phishing emails. The results are encouraging, and show that machine learning tools can identify an AI-generated phishing email with high accuracy compared to regular emails or human-generated scam email. By applying descriptive analytics, the specific differences between AI-generated emails and manually crafted scam emails are profiled, showing that AI-generated emails differ in style from human-generated phishing scams. Therefore, automatic identification tools can be used as a warning for the user. The paper also describes the corpus of AI-generated phishing emails, which is made open to the public and can be used for subsequent studies. While the ability of machine learning to detect AI-generated phishing email is encouraging, AI-generated phishing emails are different from regular phishing emails, and it is therefore important to also train machine learning systems with AI-generated emails in order to repel future phishing attacks powered by generative AI.

Updated: 2024-05-08 21:40:49

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.05435v1

Dynamic GNNs for Precise Seizure Detection and Classification from EEG Data

Diagnosing epilepsy requires accurate seizure detection and classification, but traditional manual EEG signal analysis is resource-intensive. Meanwhile, automated algorithms often overlook EEG's geometric and semantic properties critical for interpreting brain activity. This paper introduces NeuroGNN, a dynamic Graph Neural Network (GNN) framework that captures the dynamic interplay between the EEG electrode locations and the semantics of their corresponding brain regions. The specific brain region where an electrode is placed critically shapes the nature of captured EEG signals. Each brain region governs distinct cognitive functions, emotions, and sensory processing, influencing both the semantic and spatial relationships within the EEG data. Understanding and modeling these intricate brain relationships are essential for accurate and meaningful insights into brain activity. This is precisely where the proposed NeuroGNN framework excels by dynamically constructing a graph that encapsulates these evolving spatial, temporal, semantic, and taxonomic correlations to improve precision in seizure detection and classification. Our extensive experiments with real-world data demonstrate that NeuroGNN significantly outperforms existing state-of-the-art models.
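
As a rough illustration of the dynamically constructed graph idea, the sketch below (not NeuroGNN itself; the electrode positions, region embeddings, and feature sizes are toy assumptions) combines a spatial affinity from electrode locations with a semantic affinity from brain-region embeddings, then runs one message-passing step:

```python
import numpy as np

rng = np.random.default_rng(0)
n_electrodes, d = 4, 8
pos = rng.normal(size=(n_electrodes, 3))   # 3D electrode coordinates (toy)
sem = rng.normal(size=(n_electrodes, d))   # brain-region semantic embeddings (toy)
x = rng.normal(size=(n_electrodes, d))     # per-electrode EEG features (toy)

# Spatial affinity: Gaussian kernel on inter-electrode distance
dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
a_spatial = np.exp(-dist**2)

# Semantic affinity: cosine similarity between region embeddings
unit = sem / np.linalg.norm(sem, axis=1, keepdims=True)
a_sem = unit @ unit.T

# Combined, row-normalized adjacency -- the "dynamic" graph
adj = np.maximum(0.5 * a_spatial + 0.5 * a_sem, 0)
adj /= adj.sum(axis=1, keepdims=True)

# One message-passing step: aggregate neighbor features
h = np.tanh(adj @ x)
print(h.shape)  # (4, 8)
```

A trained model would learn these affinities and stack several such layers; the point here is only how spatial and semantic correlations can be fused into one adjacency.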

Updated: 2024-05-08 21:36:49

Categories: eess.SP, cs.LG

Download: http://arxiv.org/abs/2405.09568v1

Searching for Programmatic Policies in Semantic Spaces

Syntax-guided synthesis is commonly used to generate programs encoding policies. In this approach, the set of programs that can be written in a domain-specific language defines the search space, and an algorithm searches within this space for programs that encode strong policies. In this paper, we propose an alternative method for synthesizing programmatic policies, where we search within an approximation of the language's semantic space. We hypothesize that searching in semantic spaces is more sample-efficient than searching in syntax-based spaces. Our rationale is that the search is more efficient if the algorithm evaluates different agent behaviors as it searches through the space, a feature often missing in syntax-based spaces, because small changes in the syntax of a program often do not result in different agent behaviors. We define semantic spaces by learning a library of programs that exhibit different agent behaviors. We then approximate the semantic space by defining a neighborhood function for local search algorithms, in which we replace parts of the current candidate program with programs from the library. We evaluated our hypothesis in a real-time strategy game called MicroRTS. Empirical results support our hypothesis that searching in semantic spaces can be more sample-efficient than searching in syntax-based spaces.
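
A minimal sketch of local search over such a semantic neighborhood (the library entries, program encoding, and fitness function are invented for illustration; real MicroRTS policies and their evaluation are far richer):

```python
# Toy "library" of sub-programs, each inducing a distinct agent behavior;
# the behaviors and the fitness function below are invented for illustration.
library = ["rush", "defend", "harvest", "expand", "scout"]

def fitness(program):
    # Stand-in evaluation: reward harvesting, mildly reward rushing,
    # penalize scouting (a real evaluation would play MicroRTS matches).
    return 2 * program.count("harvest") + program.count("rush") - program.count("scout")

def neighbors(program):
    # Semantic neighborhood: replace one slot with a library program, so
    # every move changes agent behavior (unlike tiny syntax edits).
    for i in range(len(program)):
        for part in library:
            if part != program[i]:
                yield program[:i] + [part] + program[i + 1:]

# First-improvement hill climbing over the semantic neighborhood.
current = ["scout", "scout", "scout"]
improved = True
while improved:
    improved = False
    for cand in neighbors(current):
        if fitness(cand) > fitness(current):
            current, improved = cand, True
            break

print(current, fitness(current))  # ['harvest', 'harvest', 'harvest'] 6
```

Because every neighbor swaps in a behaviorally distinct library program, each evaluated candidate provides information about a different agent behavior, which is the sample-efficiency argument made above.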

Updated: 2024-05-08 21:24:49

Categories: cs.LG, cs.PL

Download: http://arxiv.org/abs/2405.05431v1

Towards Invariant Time Series Forecasting in Smart Cities

In the transformative landscape of smart cities, the integration of cutting-edge web technologies into time series forecasting presents a pivotal opportunity to enhance urban planning, sustainability, and economic growth. The advancement of deep neural networks has significantly improved forecasting performance. However, a notable challenge lies in the ability of these models to generalize well to out-of-distribution (OOD) time series data. The inherent spatial heterogeneity and domain shifts across urban environments create hurdles that prevent models from adapting and performing effectively in new urban environments. To tackle this problem, we propose a solution that derives invariant representations for more robust predictions under different urban environments, instead of relying on spurious correlations across urban environments, for better generalizability. Through extensive experiments on both synthetic and real-world data, we demonstrate that our proposed method outperforms traditional time series forecasting models when tackling domain shifts in changing urban environments. The effectiveness and robustness of our method can be extended to diverse fields including climate modeling, urban planning, and smart city resource management.

Updated: 2024-05-08 21:23:01

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05430v1

How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression

Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks opening up new avenues in both statistical modeling and deep learning.

Updated: 2024-05-08 21:19:18

Categories: cs.LG, cs.AI, stat.CO, stat.ML

Download: http://arxiv.org/abs/2405.05429v1

Adversary-Guided Motion Retargeting for Skeleton Anonymization

Skeleton-based motion visualization is a rising field in computer vision, especially in virtual reality (VR). With further advancements in human-pose estimation and skeleton-extracting sensors, more and more applications that utilize skeleton data have emerged. These skeletons may appear to be anonymous, but they contain embedded personally identifiable information (PII). In this paper we present a new anonymization technique based on motion retargeting, utilizing adversary classifiers to further remove PII embedded in the skeleton. Motion retargeting is effective for anonymization because it transfers the movement of the user onto a dummy skeleton; in doing so, any PII linked to the skeleton will be based on the dummy skeleton instead of the user we are protecting. We propose a Privacy-centric Deep Motion Retargeting model (PMR) which aims to further clear the retargeted skeleton of PII through adversarial learning. In our experiments, PMR achieves motion retargeting utility performance on par with state-of-the-art models while also reducing the performance of privacy attacks.

Updated: 2024-05-08 21:18:02

Categories: cs.CV, cs.CR, cs.LG

Download: http://arxiv.org/abs/2405.05428v1

CloudSense: A Model for Cloud Type Identification using Machine Learning from Radar data

Knowledge of the type of precipitating cloud is crucial for radar-based quantitative estimates of precipitation. We propose a novel model called CloudSense which uses machine learning to accurately identify the type of precipitating clouds over the complex terrain of the Western Ghats (WGs) of India. CloudSense uses vertical reflectivity profiles collected during July-August 2018 from an X-band radar to classify clouds into four categories, namely stratiform, mixed stratiform-convective, convective, and shallow clouds. The machine learning (ML) model used in CloudSense was trained on a dataset balanced by the Synthetic Minority Oversampling Technique (SMOTE), with features selected based on physical characteristics relevant to different cloud types. Among the various ML models evaluated, the Light Gradient Boosting Machine (LightGBM) demonstrated superior performance in classifying cloud types, with a BAC of 0.8 and an F1-Score of 0.82. CloudSense-generated results are also compared against conventional radar algorithms, and we find that CloudSense performs better. For the 200 samples tested, the radar algorithm achieved a BAC of 0.69 and an F1-Score of 0.68, whereas CloudSense achieved a BAC and F1-Score of 0.77. Our results show that an ML-based approach can provide more accurate cloud detection and classification, which would be useful for improving precipitation estimates over the complex terrain of the WGs.
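SMOTE's core interpolation step can be sketched in a few lines. This is a simplified stand-in for the full algorithm (which handles neighbor search and class bookkeeping more carefully), applied to toy minority-class feature vectors:

```python
import numpy as np

def smote_like(X, n_new, k=3, rng=None):
    """Minimal SMOTE-style oversampling: each synthetic point is a random
    interpolation between a minority sample and one of its k nearest
    minority neighbors (an illustrative stand-in, not the full algorithm)."""
    if rng is None:
        rng = np.random.default_rng(0)
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]           # k nearest, excluding self
        j = rng.choice(nbrs)
        lam = rng.random()
        new.append(X[i] + lam * (X[j] - X[i]))  # point on the segment X[i]..X[j]
    return np.array(new)

# Toy minority-class feature vectors (e.g. summaries of reflectivity profiles)
minority = np.arange(10.0).reshape(5, 2)
synth = smote_like(minority, n_new=4)
print(synth.shape)  # (4, 2)
```

Balancing the training set this way keeps the classifier (LightGBM here) from simply favoring the majority cloud class.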

Updated: 2024-05-08 21:12:33

Categories: physics.ao-ph, cs.LG

Download: http://arxiv.org/abs/2405.05988v1

Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure

Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible. However, such coarse-grained representations discard essential information for modeling molecular interactions and impair the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time.

Updated: 2024-05-08 21:04:32

Categories: q-bio.BM, cs.LG

Download: http://arxiv.org/abs/2311.13466v2

Latent Variable Double Gaussian Process Model for Decoding Complex Neural Data

Non-parametric models, such as Gaussian Processes (GPs), show promising results in the analysis of complex data, and their applications to neuroscience data have recently gained traction. In this research, we introduce a novel neural decoder model built upon GP models. The core idea is that two GPs generate the neural data and their associated labels using a shared set of low-dimensional latent variables. Under this modeling assumption, the latent variables represent the underlying manifold or essential features present in the neural data. Once the GPs are trained, the latent variables can be inferred from neural data to decode the labels with high accuracy. We demonstrate an application of this decoder model on a verbal memory experiment dataset and show that the decoder's accuracy in predicting the stimulus significantly surpasses state-of-the-art decoder models. The performance of this model highlights the importance of utilizing non-parametric models in the analysis of neuroscience data.
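
The generative assumption (two GPs driven by shared low-dimensional latents) can be sketched as follows. The sizes, RBF kernel, and sampling scheme are toy assumptions for illustration, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, ell=0.5):
    # Squared-exponential kernel over a 1D latent space
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

n, d = 30, 5
z = rng.uniform(-2, 2, size=n)        # shared low-dimensional latent variables
K = rbf(z, z) + 1e-6 * np.eye(n)      # GP covariance over the latent space (jittered)
L = np.linalg.cholesky(K)

# GP 1 generates d-channel "neural data"; GP 2 generates the label signal.
neural = L @ rng.normal(size=(n, d))
labels = L @ rng.normal(size=n)

# Because both are smooth functions of z, nearby latents imply similar neural
# activity AND similar labels -- which is what lets a latent inferred from
# neural data alone decode the label.
print(neural.shape, labels.shape)  # (30, 5) (30,)
```

In the actual decoder, inference would run in the reverse direction: estimate z from observed neural data, then read off the label through the second GP.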

Updated: 2024-05-08 20:49:34

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05424v1

Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models

Diffusion probabilistic models (DPMs) have become the state-of-the-art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant research effort to improve image sample quality, there is little work on representation-controlled generation using diffusion models. Specifically, causal modeling and controllable counterfactual generation using DPMs is an underexplored area. In this work, we propose CausalDiffAE, a diffusion-based causal representation learning framework to enable counterfactual generation according to a specified causal model. Our key idea is to use an encoder to extract high-level semantically meaningful causal variables from high-dimensional data and model stochastic variation using reverse diffusion. We propose a causal encoding mechanism that maps high-dimensional data to causally related latent factors and parameterize the causal mechanisms among latent factors using neural networks. To enforce the disentanglement of causal variables, we formulate a variational objective and leverage auxiliary label information in a prior to regularize the latent space. We propose a DDIM-based counterfactual generation procedure subject to do-interventions. Finally, to address the limited label supervision scenario, we also study the application of CausalDiffAE when a part of the training data is unlabeled, which also enables granular control over the strength of interventions in generating counterfactuals during inference. We empirically show that CausalDiffAE learns a disentangled latent space and is capable of generating high-quality counterfactual images.

Updated: 2024-05-08 20:43:47

Categories: cs.LG, cs.AI, stat.ME

Download: http://arxiv.org/abs/2404.17735v2

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks using anchor functions. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solutions, which capture the underlying compositional primitives, or symmetric solutions, which simply memorize mappings without understanding the compositional structure. By analyzing the information flow and vector representations within the model, we reveal the distinct mechanisms underlying these solution types. We further find that inferential solutions exhibit low complexity bias, which we hypothesize is a key factor enabling them to learn individual mappings for single anchors. Building upon our understanding of these mechanisms, we can predict the learning behavior of models with different initialization scales when faced with data of varying inferential complexity. Our findings provide valuable insights into the role of initialization scale in shaping the type of solution learned by transformers and their ability to learn and generalize compositional functions.

Updated: 2024-05-08 20:23:24

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05409v1

DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models

Recently, Generative Diffusion Models (GDMs) have showcased their remarkable capabilities in learning and generating images. A large community of GDMs has naturally emerged, further promoting the diversified applications of GDMs in various fields. However, this unrestricted proliferation has raised serious concerns about copyright protection. For example, artists including painters and photographers are becoming increasingly concerned that GDMs could effortlessly replicate their unique creative works without authorization. In response to these challenges, we introduce a novel watermarking scheme, DiffusionShield, tailored for GDMs. DiffusionShield protects images from copyright infringement by GDMs through encoding the ownership information into an imperceptible watermark and injecting it into the images. Its watermark can be easily learned by GDMs and will be reproduced in their generated images. By detecting the watermark from generated images, copyright infringement can be exposed with evidence. Benefiting from the uniformity of the watermarks and the joint optimization method, DiffusionShield ensures low distortion of the original image, high watermark detection performance, and the ability to embed lengthy messages. We conduct rigorous and comprehensive experiments to show the effectiveness of DiffusionShield in defending against infringement by GDMs and its superiority over traditional watermarking methods. The code for DiffusionShield is accessible in https://github.com/Yingqiancui/DiffusionShield.
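
A toy additive watermark with correlation-based detection conveys the basic encode-and-detect idea. DiffusionShield's watermark is learned and jointly optimized; this fixed-pattern version, with an assumed amplitude and threshold, is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

h, w = 256, 256
image = rng.random((h, w))                 # stand-in for a training image

# Toy imperceptible watermark: a fixed +/-1 pseudo-random pattern at small
# amplitude (alpha). DiffusionShield instead learns the watermark jointly
# with its detector; this fixed pattern only illustrates encode-and-detect.
pattern = rng.choice([-1.0, 1.0], size=(h, w))
alpha = 0.02
marked = image + alpha * pattern

def detect(img, pattern, threshold=0.01):
    # Blind detection: correlate the (centered) image with the known pattern.
    score = np.mean((img - img.mean()) * pattern)
    return bool(score > threshold)

print(detect(marked, pattern), detect(image, pattern))  # True False
```

The claim in the abstract is that a GDM trained on many such marked images reproduces the watermark in its outputs, so the same correlation test exposes infringement.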

Updated: 2024-05-08 20:22:00

Categories: cs.CR, cs.CV, cs.LG

Download: http://arxiv.org/abs/2306.04642v3

Learning Causally Disentangled Representations via the Principle of Independent Causal Mechanisms

Learning disentangled causal representations is a challenging problem that has gained significant attention recently due to its implications for extracting meaningful information for downstream tasks. In this work, we define a new notion of causal disentanglement from the perspective of independent causal mechanisms. We propose ICM-VAE, a framework for learning causally disentangled representations supervised by causally related observed labels. We model causal mechanisms using nonlinear learnable flow-based diffeomorphic functions to map noise variables to latent causal variables. Further, to promote the disentanglement of causal factors, we propose a causal disentanglement prior learned from auxiliary labels and the latent causal structure. We theoretically show the identifiability of causal factors and mechanisms up to permutation and elementwise reparameterization. We empirically demonstrate that our framework induces highly disentangled causal factors, improves interventional robustness, and is compatible with counterfactual generation.

Updated: 2024-05-08 20:08:35

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2306.01213v3

ASPIRE: Iterative Amortized Posterior Inference for Bayesian Inverse Problems

Due to their uncertainty quantification, Bayesian solutions to inverse problems are the framework of choice in applications that are risk averse. These benefits come at the cost of computations that are, in general, intractable. New advances in machine learning and variational inference (VI) have lowered the computational barrier by learning from examples. Two VI paradigms have emerged that represent different tradeoffs: amortized and non-amortized. Amortized VI can produce fast results but, because it generalizes to many observed datasets, it produces suboptimal inference results. Non-amortized VI is slower at inference but finds better posterior approximations since it is specialized to a single observed dataset. Current amortized VI techniques run into a sub-optimality wall that cannot be overcome without more expressive neural networks or extra training data. We present a solution that enables iterative improvement of amortized posteriors using the same network architectures and training data. Our method requires extra computations, but these remain frugal since they are based on physics-hybrid methods and summary statistics. Importantly, these computations remain mostly offline, so our method maintains cheap and reusable online evaluation while bridging the approximation gap between the two paradigms. We denote our proposed method ASPIRE - Amortized posteriors with Summaries that are Physics-based and Iteratively REfined. We first validate our method on a stylized problem with a known posterior, then demonstrate its practical use on a high-dimensional and nonlinear transcranial medical imaging problem with ultrasound. Compared with the baseline and previous methods from the literature, our method stands out as a computationally efficient and high-fidelity approach to posterior inference.
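
The amortize-then-refine pattern can be illustrated on a toy Gaussian inverse problem where the exact posterior mean is known. The "amortized network" here is a deliberately miscalibrated linear map, an assumption made purely to mimic the amortization gap; ASPIRE's actual refinement uses physics-based summaries, not plain gradient ascent:

```python
import numpy as np

# Toy inverse problem: recover mu from y ~ N(mu, 1) with prior mu ~ N(0, 1).
# The exact posterior mean is y/2 (precision-weighted combination).
def log_post_grad(mu, y):
    return (y - mu) - mu  # d/dmu of [ -(y - mu)**2 / 2 - mu**2 / 2 ]

y = 3.0

# Stage 1: amortized estimate -- a fast learned map, here deliberately
# miscalibrated (0.4*y instead of the exact 0.5*y) to mimic the
# suboptimality of generalizing across many datasets.
mu = 0.4 * y

# Stage 2: non-amortized refinement -- a few gradient steps specialize
# the estimate to this particular observation.
for _ in range(50):
    mu += 0.1 * log_post_grad(mu, y)

print(round(mu, 3))  # 1.5, the exact posterior mean
```

The amortized stage provides a cheap warm start; the refinement stage closes the remaining gap for the single observed dataset, which is the tradeoff the abstract describes.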

Updated: 2024-05-08 20:03:12

Categories: cs.LG, stat.ML

Download: http://arxiv.org/abs/2405.05398v1

Neural Networks Make Approximately Independent Errors Over Repeated Training

Typical neural network trainings have substantial variance in test-set performance between repeated runs, impeding hyperparameter comparison and training reproducibility. In this work we present the following results towards understanding this variation. (1) Despite having significant variance on their test-sets, we demonstrate that standard CIFAR-10 and ImageNet trainings have little variance in performance on the underlying test-distributions from which their test-sets are sampled. (2) We show that these trainings make approximately independent errors on their test-sets. That is, the event that a trained network makes an error on one particular example does not affect its chances of making errors on other examples, relative to their average rates over repeated runs of training with the same hyperparameters. (3) We prove that the variance of neural network trainings on their test-sets is a downstream consequence of the class-calibration property discovered by Jiang et al. (2021). Our analysis yields a simple formula which accurately predicts variance for the binary classification case. (4) We conduct preliminary studies of data augmentation, learning rate, finetuning instability and distribution-shift through the lens of variance between runs.
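
Result (2) has a checkable variance consequence: if, across repeated runs, errors on each example are independent Bernoulli draws with per-example rates p_i, then the variance of the test-set error rate is sum_i p_i(1 - p_i) / n^2. A quick simulation with assumed (synthetic) per-example rates confirms the formula:

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples, n_runs = 2000, 5000
p = rng.uniform(0.0, 0.4, size=n_examples)   # assumed per-example error rates

# Each "training run" makes an independent Bernoulli error on each example.
errors = rng.random((n_runs, n_examples)) < p
test_err = errors.mean(axis=1)               # test-set error rate per run

empirical_var = test_err.var()
predicted_var = np.sum(p * (1 - p)) / n_examples**2

print(empirical_var, predicted_var)  # the two nearly agree
```

Under this independence model, run-to-run variance shrinks as 1/n with test-set size, matching the paper's observation that variance on the full test distribution is small.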

Updated: 2024-05-08 20:02:56

Categories: cs.LG

Download: http://arxiv.org/abs/2304.01910v2

ECG-SMART-NET: A Deep Learning Architecture for Precise ECG Diagnosis of Occlusion Myocardial Infarction

In this paper we describe ECG-SMART-NET for identification of occlusion myocardial infarction (OMI). OMI is a severe form of heart attack characterized by complete blockage of one or more coronary arteries, requiring immediate referral for cardiac catheterization to restore blood flow to the heart. Two thirds of OMI cases are difficult to visually identify from a 12-lead electrocardiogram (ECG) and can be potentially fatal if not identified in a timely fashion. Previous works on this topic are scarce, and current state-of-the-art evidence suggests that both random forests with engineered features and convolutional neural networks (CNNs) are promising approaches to improve the ECG detection of OMI. While the ResNet architecture has been successfully adapted for use with ECG recordings, it is not ideally suited to capture informative temporal features within each lead and the spatial concordance or discordance across leads. We propose a clinically informed modification of the ResNet-18 architecture: the model first learns temporal features through temporal convolutional layers with 1xk kernels, followed, after the residual blocks, by a spatial convolutional layer with 12x1 kernels that learns spatial features. The new ECG-SMART-NET was benchmarked against the original ResNet-18 and other state-of-the-art models on a multisite real-world clinical dataset consisting of 10,893 ECGs from 7,297 unique patients (rate of OMI = 6.5%). ECG-SMART-NET outperformed the other models in the classification of OMI, with a test AUC of 0.889 +/- 0.027 and a test average precision of 0.587 +/- 0.087.
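
The temporal-then-spatial kernel factorization can be sketched with single kernels. The real model uses many channels, residual blocks, and learned weights; the shapes, with assumed toy sizes, are the point here:

```python
import numpy as np

rng = np.random.default_rng(0)

leads, samples, k = 12, 100, 7
ecg = rng.normal(size=(leads, samples))   # one 12-lead recording (toy values)

# Temporal convolution: a 1xk kernel slides along time within each lead,
# leaving the lead dimension untouched.
kt = rng.normal(size=k)
t_out = np.stack([np.convolve(ecg[l], kt, mode="valid") for l in range(leads)])
print(t_out.shape)  # (12, 94): time shrinks by k-1, leads are preserved

# Spatial convolution: a 12x1 kernel mixes all leads at each time step,
# capturing concordance/discordance across leads.
ks = rng.normal(size=leads)
s_out = ks @ t_out
print(s_out.shape)  # (94,): one value per time step, all leads fused
```

Factorizing the kernels this way encodes the clinical prior that within-lead dynamics and across-lead patterns carry distinct information.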

Updated: 2024-05-08 19:59:16

Categories: eess.SP, cs.AI, cs.LG

Download: http://arxiv.org/abs/2405.09567v1

Exploring knowledge graph-based neural-symbolic system from application perspective

The rapid advancement in artificial intelligence (AI), particularly through deep neural networks, has catalyzed significant progress in fields such as vision and text processing. Nonetheless, the pursuit of AI systems that exhibit human-like reasoning and interpretability continues to pose a substantial challenge. The Neural-Symbolic paradigm, which integrates the deep learning prowess of neural networks with the reasoning capabilities of symbolic systems, presents a promising pathway toward developing more transparent and comprehensible AI systems. Within this paradigm, the Knowledge Graph (KG) emerges as a crucial element, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, predominantly utilizing the triple (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, elucidating how KG underpins this integration across three key categories: enhancing the reasoning and interpretability of neural networks through the incorporation of symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes directions for future research in the domain of Neural-Symbolic AI.
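
A triple store with one hand-written rule illustrates the symbolic side of the paradigm (the entities, relation names, and rule are invented for illustration; in the "Symbol for Neural" direction, such derived facts would be fed into a neural model):

```python
# Toy knowledge graph as (subject, predicate, object) triples.
triples = {
    ("aspirin", "treats", "headache"),
    ("headache", "symptom_of", "migraine"),
}

def apply_rule(triples):
    # Symbolic rule: treats(X, Y) and symptom_of(Y, Z)
    #                => relieves_symptom_of(X, Z)
    derived = set()
    for s1, p1, o1 in triples:
        for s2, p2, o2 in triples:
            if p1 == "treats" and p2 == "symptom_of" and o1 == s2:
                derived.add((s1, "relieves_symptom_of", o2))
    return derived

print(apply_rule(triples))
# {('aspirin', 'relieves_symptom_of', 'migraine')}
```

Conversely, the "Neural for Symbol" direction would use a learned model to propose missing triples that such rules alone cannot derive.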

Updated: 2024-05-08 19:54:59

Categories: cs.AI

Download: http://arxiv.org/abs/2405.03524v2

Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake

The exponential growth of artificial intelligence (AI) and machine learning (ML) applications has necessitated the development of efficient storage solutions for vector and tensor data. This paper presents a novel approach for tensor storage in a Lakehouse architecture using Delta Lake. By adapting the multidimensional array storage strategy from array databases and sparse encoding methods to Delta Lake tables, experiments show that this approach achieves notable improvements in both space and time efficiency compared to traditional serialization of tensors. These results provide valuable insights for the development and implementation of optimized vector and tensor storage solutions in data-intensive applications, contributing to the evolution of efficient data management practices in AI and ML domains in cloud-native environments.
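
A COO-style sparse encoding, one table row per non-zero entry, sketches the general idea of mapping a tensor onto table rows. The paper's exact Delta Lake schema may differ; the column names here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# A mostly-zero 3D tensor, as produced by many ML feature pipelines.
dense = np.zeros((4, 5, 6))
idx = rng.integers(0, [4, 5, 6], size=(8, 3))
for i, j, k in idx:
    dense[i, j, k] = rng.normal()

# Sparse (COO-style) encoding: one row per non-zero entry -- a layout that
# maps naturally onto columnar tables such as Delta Lake's
# (illustrative sketch; column names are invented).
rows = [
    {"i": int(i), "j": int(j), "k": int(k), "value": float(dense[i, j, k])}
    for i, j, k in zip(*np.nonzero(dense))
]

# Lossless reconstruction from the table rows.
rebuilt = np.zeros_like(dense)
for r in rows:
    rebuilt[r["i"], r["j"], r["k"]] = r["value"]

print(np.allclose(dense, rebuilt), len(rows))
```

For sparse tensors, storing only non-zero entries is where the space savings over naive serialization come from, while columnar storage keeps the index columns cheap to scan.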

Updated: 2024-05-08 19:45:46

Categories: cs.DC,cs.DB,cs.LG

Download: http://arxiv.org/abs/2405.03708v2

Towards Less Biased Data-driven Scoring with Deep Learning-Based End-to-end Database Search in Tandem Mass Spectrometry

Peptide identification in mass spectrometry-based proteomics is crucial for understanding protein function and dynamics. Traditional database search methods, though widely used, rely on heuristic scoring functions, and statistical estimation has to be introduced to achieve a higher identification rate. Here, we introduce DeepSearch, the first deep learning-based end-to-end database search method for tandem mass spectrometry. DeepSearch leverages a modified transformer-based encoder-decoder architecture under the contrastive learning framework. Unlike conventional methods that rely on ion-to-ion matching, DeepSearch adopts a data-driven approach to score peptide spectrum matches. DeepSearch is also the first deep learning-based method that can profile variable post-translational modifications in a zero-shot manner. We show that DeepSearch's scoring scheme exhibits less bias and does not require any statistical estimation. We validate DeepSearch's accuracy and robustness across various datasets, including those from species with diverse protein compositions and a modification-enriched dataset. DeepSearch sheds new light on database search methods in tandem mass spectrometry.

Updated: 2024-05-08 19:39:17

Categories: q-bio.QM,cs.AI

Download: http://arxiv.org/abs/2405.06511v1

Interpretability Needs a New Paradigm

Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained. At the core of this debate is how each paradigm ensures its explanations are faithful, i.e., true to the model's behavior. This is important, as false but convincing explanations lead to unsupported confidence in artificial intelligence (AI), which can be dangerous. This paper's position is that we should think about new paradigms while staying vigilant regarding faithfulness. First, by examining the history of paradigms in science, we see that paradigms are constantly evolving. Then, by examining the current paradigms, we can understand their underlying beliefs, the value they bring, and their limitations. Finally, this paper presents 3 emerging paradigms for interpretability. The first paradigm designs models such that faithfulness can be easily measured. Another optimizes models such that explanations become faithful. The last paradigm proposes to develop models that produce both a prediction and an explanation.

Updated: 2024-05-08 19:31:06

Categories: cs.LG,cs.CL,cs.CV,stat.ML

Download: http://arxiv.org/abs/2405.05386v1

Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance

In this paper, we introduce the Interpretable Cross-Examination Technique (ICE-T), a novel approach that leverages structured multi-prompt techniques with Large Language Models (LLMs) to improve classification performance over zero-shot and few-shot methods. In domains where interpretability is crucial, such as medicine and law, standard models often fall short due to their "black-box" nature. ICE-T addresses these limitations by using a series of generated prompts that allow an LLM to approach the problem from multiple directions. The responses from the LLM are then converted into numerical feature vectors and processed by a traditional classifier. This method not only maintains high interpretability but also allows for smaller, less capable models to achieve or exceed the performance of larger, more advanced models under zero-shot conditions. We demonstrate the effectiveness of ICE-T across a diverse set of data sources, including medical records and legal documents, consistently surpassing the zero-shot baseline in terms of classification metrics such as F1 scores. Our results indicate that ICE-T can be used for improving both the performance and transparency of AI applications in complex decision-making environments.
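The pipeline the abstract describes — several targeted prompts, answers converted to a numeric feature vector, then a traditional classifier — can be sketched end to end. Everything below (the prompts, the keyword-based stand-in for an LLM, and the hand-set classifier weights) is an illustrative assumption, not the paper's actual prompts or models:

```python
def featurize(document, prompts, llm):
    """Turn each prompt's yes/no answer into a 0/1 feature."""
    return [1 if llm(p, document).strip().lower().startswith("yes") else 0
            for p in prompts]

def linear_classifier(features, weights, bias=0.0):
    """A traditional, fully interpretable classifier over the features."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# Toy stand-in for a real LLM call: answers "yes" iff the prompt's keyword
# appears in the document (illustrative only).
keywords = ["chest pain", "over 65", "cardiac history"]
def toy_llm(prompt, document):
    for kw in keywords:
        if kw in prompt and kw in document:
            return "yes"
    return "no"

prompts = [f"Does the record mention {kw}? Answer yes or no." for kw in keywords]
doc = "Patient reports chest pain; cardiac history noted."
feats = featurize(doc, prompts, toy_llm)
print(feats, linear_classifier(feats, [1.0, 0.5, 1.0], bias=-1.0))
```

Because the final decision is made by a linear model over named yes/no features, each prediction can be traced back to exactly which questions the LLM answered affirmatively.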

Updated: 2024-05-08 19:20:34

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.06703v1

"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations

Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate "harm" as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment. Our experiments reveal that seven out of the eight LLMs included in this study generated conversations riddled with CHAST, characterized by malign views expressed in seemingly neutral language unlikely to be detected by existing methods. Notably, these LLMs manifested more extreme views and opinions when dealing with non-Western concepts like caste, compared to Western ones such as race.

Updated: 2024-05-08 19:08:45

Categories: cs.CL,cs.AI,cs.CY,cs.HC,cs.LG

Download: http://arxiv.org/abs/2405.05378v1

Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models

This report describes the training dataset creation and recipe behind the family of arctic-embed text embedding models (a set of five models ranging from 22 to 334 million parameters, with weights open-sourced under an Apache-2 license). At the time of their release, each model achieved state-of-the-art retrieval accuracy for models of its size on the MTEB Retrieval leaderboard, with the largest model, arctic-embed-l, outperforming closed-source embedding models such as Cohere's embed-v3 and OpenAI's text-embed-3-large. In addition to the details of our training recipe, we provide several informative ablation studies, which we believe account for our models' performance.

Updated: 2024-05-08 19:05:18

Categories: cs.CL,cs.AI,cs.IR

Download: http://arxiv.org/abs/2405.05374v1

Bayesian Pseudo-Coresets via Contrastive Divergence

Bayesian methods provide an elegant framework for estimating parameter posteriors and quantification of uncertainty associated with probabilistic models. However, they often suffer from slow inference times. To address this challenge, Bayesian Pseudo-Coresets (BPC) have emerged as a promising solution. BPC methods aim to create a small synthetic dataset, known as pseudo-coresets, that approximates the posterior inference achieved with the original dataset. This approximation is achieved by optimizing a divergence measure between the true posterior and the pseudo-coreset posterior. Various divergence measures have been proposed for constructing pseudo-coresets, with forward Kullback-Leibler (KL) divergence being the most successful. However, using forward KL divergence necessitates sampling from the pseudo-coreset posterior, often accomplished through approximate Gaussian variational distributions. Alternatively, one could employ Markov Chain Monte Carlo (MCMC) methods for sampling, but this becomes challenging in high-dimensional parameter spaces due to slow mixing. In this study, we introduce a novel approach for constructing pseudo-coresets by utilizing contrastive divergence. Importantly, optimizing contrastive divergence eliminates the need for approximations in the pseudo-coreset construction process. Furthermore, it enables the use of finite-step MCMC methods, alleviating the requirement for extensive mixing to reach a stationary distribution. To validate our method's effectiveness, we conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques.

Updated: 2024-05-08 19:04:46

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2303.11278v2

Failing to hash into supersingular isogeny graphs

An important open problem in supersingular isogeny-based cryptography is to produce, without a trusted authority, concrete examples of "hard supersingular curves", that is, equations for supersingular curves for which computing the endomorphism ring is as difficult as it is for random supersingular curves. A related open problem is to produce a hash function to the vertices of the supersingular $\ell$-isogeny graph which does not reveal the endomorphism ring, or a path to a curve of known endomorphism ring. Such a hash function would open up interesting cryptographic applications. In this paper, we document a number of (thus far) failed attempts to solve this problem, in the hope that we may spur further research, and shed light on the challenges and obstacles to this endeavour. The mathematical approaches contained in this article include: (i) iterative root-finding for the supersingular polynomial; (ii) gcds of specialized modular polynomials; (iii) using division polynomials to create small systems of equations; (iv) taking random walks in the isogeny graph of abelian surfaces; and (v) using quantum random walks.

Updated: 2024-05-08 18:59:08

Categories: math.NT,cs.CR,11G05, 11T71, 14G50, 14K02, 81P94, 94A60, 68Q12

Download: http://arxiv.org/abs/2205.00135v3

Model Reconstruction Using Counterfactual Explanations: Mitigating the Decision Boundary Shift

Counterfactual explanations find ways of achieving a favorable model outcome with minimum input perturbation. However, counterfactual explanations can also be exploited to steal the model by strategically training a surrogate model to give similar predictions as the original (target) model. In this work, we investigate model extraction by specifically leveraging the fact that the counterfactual explanations also lie quite close to the decision boundary. We propose a novel strategy for model extraction that we call Counterfactual Clamping Attack (CCA) which trains a surrogate model using a unique loss function that treats counterfactuals differently than ordinary instances. Our approach also alleviates the related problem of decision boundary shift that arises in existing model extraction attacks which treat counterfactuals as ordinary instances. We also derive novel mathematical relationships between the error in model approximation and the number of queries using polytope theory. Experimental results demonstrate that our strategy provides improved fidelity between the target and surrogate model predictions on several real world datasets.
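The core idea — train a surrogate whose loss treats counterfactual points differently from ordinary queried points — can be sketched with a logistic-regression surrogate. The soft target below is one illustrative way to encode "counterfactuals sit just across the decision boundary"; it is an assumption of this sketch, not the paper's actual CCA loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_surrogate(X, y, X_cf, cf_target=0.6, lr=0.5, steps=2000):
    """Gradient-descent logistic surrogate with two loss terms: ordinary
    queried points carry hard labels, while counterfactual points carry a
    soft target just above 0.5 (they lie barely on the favorable side of
    the boundary). This is a sketch, not the paper's exact loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p, p_cf = sigmoid(X @ w + b), sigmoid(X_cf @ w + b)
        w -= lr * (X.T @ (p - y) / len(y) + X_cf.T @ (p_cf - cf_target) / len(X_cf))
        b -= lr * (np.mean(p - y) + np.mean(p_cf - cf_target))
    return w, b

# Hypothetical target model (unknown to the attacker): accept iff x0 + x1 > 1.
X_query = np.array([[0.0, 0.0], [0.2, 0.3], [2.0, 2.0]])
y_query = np.array([0.0, 0.0, 1.0])
X_cf = np.array([[0.55, 0.55], [0.4, 0.7]])   # counterfactuals near the boundary
w, b = train_surrogate(X_query, y_query, X_cf)

predict = lambda X: (sigmoid(X @ w + b) > 0.5).astype(int)
print(predict(np.array([[0.0, 0.0], [2.0, 2.0]])))
```

Pinning the counterfactuals to a probability near 0.5 anchors the surrogate's boundary close to the target's, which is the shift-mitigation intuition the abstract describes.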

Updated: 2024-05-08 18:52:47

Categories: cs.LG,cs.CR,stat.ML

Download: http://arxiv.org/abs/2405.05369v1

A Decision Theoretic Framework for Measuring AI Reliance

Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human, who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates reliance, defined as the probability the decision-maker follows the AI's recommendation, from the challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision making studies from the literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. We evaluate these losses by comparing them to a baseline and to a benchmark for complementary performance, defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.
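A small worked example makes the decomposition concrete. The accuracies, payoff, and reliance level below are invented for illustration, and the "rational benchmark" assumes the decision-maker has no signal information beyond the two accuracies (under that assumption, the rational policy is to always follow the more accurate predictor):

```python
def expected_payoff(p_ai_correct, p_human_correct, reliance, payoff=1.0):
    """Expected payoff when the AI is followed with probability `reliance`
    and the human decides alone otherwise."""
    return payoff * (reliance * p_ai_correct + (1 - reliance) * p_human_correct)

p_ai, p_human = 0.8, 0.6            # hypothetical accuracies
behavioral = expected_payoff(p_ai, p_human, reliance=0.5)
benchmark = expected_payoff(p_ai, p_human, reliance=1.0)   # always follow the AI
loss_from_misreliance = benchmark - behavioral
print(behavioral, benchmark, loss_from_misreliance)
```

Here the behavioral decision-maker relies on the AI only half the time and forfeits 0.1 of expected payoff; any further shortfall in a real study would be attributed to mis-differentiated signals rather than mis-reliance.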

Updated: 2024-05-08 18:34:50

Categories: cs.AI,cs.HC

Download: http://arxiv.org/abs/2401.15356v3

Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives

As AI systems quickly improve in both breadth and depth of performance, they lend themselves to creating increasingly powerful and realistic agents, including the possibility of agents modeled on specific people. We anticipate that within our lifetimes it may become common practice for people to create a custom AI agent to interact with loved ones and/or the broader world after death. We call these generative ghosts, since such agents will be capable of generating novel content rather than merely parroting content produced by their creator while living. In this paper, we first discuss the design space of potential implementations of generative ghosts. We then discuss the practical and ethical implications of generative ghosts, including potential positive and negative impacts on individuals and society. Based on these considerations, we lay out a research agenda for the AI and HCI research communities to empower people to create and interact with AI afterlives in a safe and beneficial manner.

Updated: 2024-05-08 18:28:10

Categories: cs.CY,cs.AI

Download: http://arxiv.org/abs/2402.01662v2

Offline Model-Based Optimization via Policy-Guided Gradient Search

Offline optimization is an emerging problem in many experimental engineering domains including protein, drug or aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. To avoid that, one has to optimize an unknown function given only its offline evaluation at a fixed set of inputs. A naive solution to this problem is to learn a surrogate model of the unknown function and optimize this surrogate instead. However, such a naive optimizer is prone to erroneous overestimation of the surrogate (possibly due to over-fitting on a biased sample of function evaluation) on inputs outside the offline dataset. Prior approaches addressing this challenge have primarily focused on learning robust surrogate models. However, their search strategies are derived from the surrogate model rather than the actual offline data. To fill this important gap, we introduce a new learning-to-search perspective for offline optimization by reformulating it as an offline reinforcement learning problem. Our proposed policy-guided gradient search approach explicitly learns the best policy for a given surrogate model created from the offline data. Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve the optimization performance.
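The "naive solution" the abstract contrasts with is easy to show concretely: fit a surrogate to a fixed set of offline evaluations, then gradient-ascend the surrogate. This sketch uses an invented 1D task; the paper's contribution replaces the hand-designed ascent with a search policy learned via offline RL, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 3.0, size=40)                  # fixed offline inputs
y = -(X - 1.0) ** 2 + 0.05 * rng.normal(size=40)     # offline evaluations only

surrogate = np.poly1d(np.polyfit(X, y, deg=2))       # quadratic surrogate fit
grad = surrogate.deriv()

x = -2.0                                             # start from a poor design
for _ in range(300):
    x += 0.05 * grad(x)                              # naive gradient ascent
print(round(x, 3))                                   # lands near the optimum x = 1
```

Here the surrogate is well-behaved, so the ascent succeeds; the failure mode the abstract warns about appears when the ascent wanders outside the offline data's support, where the surrogate's values are unreliable.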

Updated: 2024-05-08 18:27:37

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.05349v1

The Effect of Model Size on LLM Post-hoc Explainability via LIME

Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e. their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.
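For readers unfamiliar with the mechanism being evaluated, here is a from-scratch LIME-style local explanation (not the `lime` package, and not the paper's DeBERTaV3 setup): sample perturbations around an input, weight them by proximity, and fit a weighted linear model whose coefficients serve as the explanation. The black-box function below is an invented stand-in for a model:

```python
import numpy as np

def black_box(X):
    # Nonlinear model to be explained: feature 0 matters, feature 1 barely.
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] + 0.1 * X[:, 1] ** 2)))

def lime_explain(x0, model, n=500, width=0.5, seed=0):
    """Weighted linear fit around x0; coefficients are the local explanation."""
    rng = np.random.default_rng(seed)
    Xp = x0 + width * rng.normal(size=(n, x0.size))          # perturbations
    w = np.exp(-np.sum((Xp - x0) ** 2, axis=1) / (2 * width ** 2))  # proximity
    A = np.hstack([Xp, np.ones((n, 1))]) * np.sqrt(w)[:, None]
    b = model(Xp) * np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef[:-1]                                         # drop the intercept

weights = lime_explain(np.array([0.0, 0.0]), black_box)
print(weights)  # feature 0 should dominate feature 1
```

Faithfulness, in the abstract's sense, asks whether these fitted coefficients actually track the black box's internal decision process rather than merely looking plausible.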

Updated: 2024-05-08 18:27:20

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.05348v1

Benchmarking Educational Program Repair

The emergence of large language models (LLMs) has sparked enormous interest due to their potential application across a range of educational tasks. For example, recent work in programming education has used LLMs to generate learning resources, improve error messages, and provide feedback on code. However, one factor that limits progress within the field is that much of the research uses bespoke datasets and different evaluation metrics, making direct comparisons between results unreliable. Thus, there is a pressing need for standardization and benchmarks that facilitate the equitable comparison of competing approaches. One task where LLMs show great promise is program repair, which can be used to provide debugging support and next-step hints to students. In this article, we propose a novel educational program repair benchmark. We curate two high-quality publicly available programming datasets, present a unified evaluation procedure introducing a novel evaluation metric rouge@k for approximating the quality of repairs, and evaluate a set of five recent models to establish baseline performance.
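A rouge@k-style metric can be sketched as "generate k candidate repairs and keep the best ROUGE overlap with the reference repair". The unigram-F1 `rouge1_f` below is a simplification of full ROUGE, and the exact definition of rouge@k in the paper may differ from this sketch:

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Unigram-overlap F1 between two token sequences (a ROUGE-1 analogue)."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    prec, rec = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

def rouge_at_k(candidates, reference, k):
    """Best score among the first k candidate repairs."""
    return max(rouge1_f(c, reference) for c in candidates[:k])

reference = "return total / len(values)"
candidates = ["return total / count", "return total / len(values)", "return 0"]
print(rouge_at_k(candidates, reference, k=2))
```

The @k framing matches how repair models are used in practice: a student sees the best of several sampled suggestions, not a single deterministic output.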

Updated: 2024-05-08 18:23:59

Categories: cs.SE,cs.AI,cs.CL,cs.CY

Download: http://arxiv.org/abs/2405.05347v1

NeuroBack: Improving CDCL SAT Solving using Graph Neural Networks

Propositional satisfiability (SAT) is an NP-complete problem that impacts many research fields, such as planning, verification, and security. Mainstream modern SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) algorithm. Recent work aimed to enhance CDCL SAT solvers using Graph Neural Networks (GNNs). However, so far this approach either has not made solving more effective, or has required substantial GPU resources for frequent online model inferences. Aiming to make GNN improvements practical, this paper proposes an approach called NeuroBack, which builds on two insights: (1) predicting the phases (i.e., values) of variables that appear in the majority (or even all) of the satisfying assignments is essential for CDCL SAT solving, and (2) it is sufficient to query the neural model only once for the predictions before the SAT solving starts. Once trained, the offline model inference allows NeuroBack to execute exclusively on the CPU, removing its reliance on GPU resources. To train NeuroBack, a new dataset called DataBack containing 120,286 data samples is created. NeuroBack is implemented as an enhancement to a state-of-the-art SAT solver called Kissat. As a result, it allowed Kissat to solve up to 5.2% and 7.4% more problems on two recent SAT competition problem sets, SATCOMP-2022 and SATCOMP-2023, respectively. NeuroBack therefore shows how machine learning can be harnessed to improve SAT solving in an effective and practical manner.

Updated: 2024-05-08 18:23:10

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2110.14053v7

Distributed Least Squares in Small Space via Sketching and Bias Reduction

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least square regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly-unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching.
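The distributed sketch-and-average pattern the abstract builds on can be sketched directly: each worker solves a small sketched least-squares problem, and the coordinator averages the estimates. For simplicity this uses a dense Gaussian sketch; the paper instead designs a sparse, bias-minimizing sketch, which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, workers = 2000, 5, 200, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

estimates = []
for _ in range(workers):
    S = rng.normal(size=(m, n)) / np.sqrt(m)      # worker's sketching matrix
    xs, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    estimates.append(xs)
x_avg = np.mean(estimates, axis=0)                # coordinator averages

x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_avg - x_exact))
```

Averaging drives down the variance of the per-worker estimates, but any bias common to all sketched solutions survives the average; that is precisely why the paper optimizes the sketch for low bias rather than low per-worker error.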

Updated: 2024-05-08 18:16:37

Categories: cs.DS,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2405.05343v1

In-context Autoencoder for Context Compression in a Large Language Model

We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short compact memory slots that can be directly conditioned on by the LLM for various purposes. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, it is fine-tuned on instruction data for producing desirable responses to various prompts. Experiments demonstrate that our lightweight ICAE, introducing about 1% additional parameters, effectively achieves $4\times$ context compression based on Llama, offering advantages in both improved latency and GPU memory cost during inference, and showing an interesting insight in memorization as well as potential for scalability. These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications in addressing the long context problem and suggesting further research in LLM context management. Our data, code and models are available at https://github.com/getao/icae.

Updated: 2024-05-08 18:16:09

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2307.06945v4

Joint semi-supervised and contrastive learning enables zero-shot domain-adaptation and multi-domain segmentation

Despite their effectiveness, current deep learning models face challenges with images coming from different domains with varying appearance and content. We introduce SegCLR, a versatile framework designed to segment volumetric images across different domains, employing supervised and contrastive learning simultaneously to effectively learn from both labeled and unlabeled data. We demonstrate the superior performance of SegCLR through a comprehensive evaluation involving three diverse clinical datasets of retinal fluid segmentation in 3D Optical Coherence Tomography (OCT), various network configurations, and verification across 10 different network initializations. In an unsupervised domain adaptation context, SegCLR achieves results on par with a supervised upper-bound model trained on the intended target domain. Notably, we discover that the segmentation performance of the SegCLR framework is only marginally impacted by the abundance of unlabeled data from the target domain; we therefore also propose an effective zero-shot domain adaptation extension of SegCLR, eliminating the need for any target domain information. This shows that our proposed addition of a contrastive loss to standard supervised training for segmentation leads to superior models that are inherently more generalizable to both in- and out-of-domain test data. We additionally propose a pragmatic solution for SegCLR deployment in realistic scenarios with multiple domains containing labeled data. Accordingly, our framework pushes the boundaries of deep-learning based segmentation in multi-domain applications, regardless of data availability - labeled, unlabeled, or nonexistent.

Updated: 2024-05-08 18:10:59

Categories: eess.IV,cs.AI,cs.CV

Download: http://arxiv.org/abs/2405.05336v1

Multiplicative Dynamic Mode Decomposition

Koopman operators are infinite-dimensional operators that linearize nonlinear dynamical systems, facilitating the study of their spectral properties and enabling the prediction of the time evolution of observable quantities. Recent methods have aimed to approximate Koopman operators while preserving key structures. However, approximating Koopman operators typically requires a dictionary of observables to capture the system's behavior in a finite-dimensional subspace. The selection of these functions is often heuristic, may result in the loss of spectral information, and can severely complicate structure preservation. This paper introduces Multiplicative Dynamic Mode Decomposition (MultDMD), which enforces the multiplicative structure inherent in the Koopman operator within its finite-dimensional approximation. Leveraging this multiplicative property, we guide the selection of observables and define a constrained optimization problem for the matrix approximation, which can be efficiently solved. MultDMD presents a structured approach to finite-dimensional approximations and can more accurately reflect the spectral properties of the Koopman operator. We elaborate on the theoretical framework of MultDMD, detailing its formulation, optimization strategy, and convergence properties. The efficacy of MultDMD is demonstrated through several examples, including the nonlinear pendulum, the Lorenz system, and fluid dynamics data, where we demonstrate its remarkable robustness to noise.
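For context, the basic (unstructured) DMD recipe that MultDMD constrains can be shown in a few lines: on data generated by a known linear map, the best-fit linear operator recovers that map's eigenvalues. The multiplicative constraint that defines MultDMD is not reproduced in this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, -0.2], [0.2, 0.9]])          # true linear dynamics
X = np.empty((2, 50))
X[:, 0] = rng.normal(size=2)
for t in range(49):
    X[:, t + 1] = A @ X[:, t]                    # generate a trajectory

X0, X1 = X[:, :-1], X[:, 1:]                     # snapshot pairs
U, s, Vh = np.linalg.svd(X0, full_matrices=False)
A_dmd = X1 @ Vh.conj().T @ np.diag(1 / s) @ U.conj().T   # best-fit operator
eigs = np.sort_complex(np.linalg.eigvals(A_dmd))
print(eigs)
```

On noisy or nonlinear data the unconstrained fit can misplace these eigenvalues, which is the failure mode structure-preserving variants like MultDMD are designed to resist.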

Updated: 2024-05-08 18:09:16

Categories: math.DS,cs.LG,cs.NA,math.NA,math.OC,math.SP

Download: http://arxiv.org/abs/2405.05334v1
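For context on the optimization being constrained above, plain (unconstrained) DMD fits a matrix K by least squares over snapshot pairs, Y ≈ KX. The sketch below illustrates only that baseline fit; MultDMD's contribution is the additional multiplicative constraint on this finite-dimensional approximation, which is not shown here.

```python
import numpy as np

def dmd_matrix(X, Y):
    """Plain least-squares DMD: the K minimizing ||Y - K X||_F,
    where the columns of X, Y are consecutive state snapshots."""
    return Y @ np.linalg.pinv(X)

# Toy linear system x_{t+1} = A x_t: DMD recovers A from snapshot data.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
X = rng.normal(size=(2, 50))   # snapshots at time t
Y = A @ X                      # snapshots at time t+1
K = dmd_matrix(X, Y)
print(np.allclose(K, A))  # True
```

For nonlinear systems the same fit is applied to a dictionary of observables rather than the raw state, which is where the choice of dictionary (and MultDMD's structure preservation) matters.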

Untargeted Adversarial Attack on Knowledge Graph Embeddings

Knowledge graph embedding (KGE) methods have achieved great success in handling various knowledge graph (KG) downstream tasks. However, KGE methods may learn biased representations on low-quality KGs, which are prevalent in the real world. Some recent studies propose adversarial attacks to investigate the vulnerabilities of KGE methods, but their attackers are target-oriented: the KGE method and the target triples to predict are given in advance, which lacks practicability. In this work, we explore untargeted attacks with the aim of reducing the global performance of KGE methods over a set of unknown test triples, and we conduct systematic analyses of KGE robustness. Considering that logic rules can effectively summarize the global structure of a KG, we develop rule-based attack strategies to enhance the attack efficiency. In particular, we consider adversarial deletion, which learns rules and applies them to score triple importance and delete important triples, and adversarial addition, which corrupts the learned rules and applies them to generate negative triples as perturbations. Extensive experiments on two datasets over three representative classes of KGE methods demonstrate the effectiveness of our proposed untargeted attacks in diminishing link prediction results. We also find that different KGE methods exhibit different robustness to untargeted attacks. For example, the robustness of methods that incorporate graph neural networks and logic rules depends on the density of the graph, while rule-based methods like NCRL are easily affected by adversarial addition attacks because they capture the injected negative rules.

Updated: 2024-05-08 18:08:11

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2405.10970v1

Citadel: Real-World Hardware-Software Contracts for Secure Enclaves Through Microarchitectural Isolation and Controlled Speculation

Hardware isolation primitives such as secure enclaves aim to protect sensitive programs, but remain vulnerable to transient execution attacks. Complete microarchitectural isolation is not a satisfactory defense mechanism as it leaves out public shared memory, critical for usability and application performance. Conversely, hardware-software co-designs for secure speculation can counter these attacks but are not yet practical, since they make assumptions on the speculation modes, the exposed microarchitectural state, and the software, which are all hard to support for the entire software stack. This paper advocates for processors to incorporate microarchitectural isolation primitives and mechanisms for controlled speculation, enabling different execution modes. These modes can restrict what is exposed to an attacker, effectively balancing performance and program-analysis complexity. We introduce two mechanisms to securely share memory between an enclave and an untrusted OS in an out-of-order processor. We show that our two modes are complementary, achieving speculative non-interference with a reasonable performance impact, while requiring minimal code annotation and simple program analysis doable by hand. Our prototype, Citadel, is a multicore processor running on an FPGA, booting untrusted Linux, and supporting comprehensive enclave capabilities, such as shared memory, and remote attestation. To our knowledge, Citadel is the first end-to-end enclave platform to run secure applications, such as cryptographic libraries or small private inference workloads, on a speculative out-of-order multicore processor while protecting against a significant class of side-channel attacks.

Updated: 2024-05-08 18:07:03

Categories: cs.CR,cs.AR

Download: http://arxiv.org/abs/2306.14882v4

ICE-SEARCH: A Language Model-Driven Feature Selection Approach

This study unveils the In-Context Evolutionary Search (ICE-SEARCH) method, which is among the first works that melds large language models (LLMs) with evolutionary algorithms for feature selection (FS) tasks and demonstrates its effectiveness in Medical Predictive Analytics (MPA) applications. ICE-SEARCH harnesses the crossover and mutation capabilities inherent in LLMs within an evolutionary framework, significantly improving FS through the model's comprehensive world knowledge and its adaptability to a variety of roles. Our evaluation of this methodology spans three crucial MPA tasks: stroke, cardiovascular disease, and diabetes, where ICE-SEARCH outperforms traditional FS methods in pinpointing essential features for medical applications. ICE-SEARCH achieves State-of-the-Art (SOTA) performance in stroke prediction and diabetes prediction; the Decision-Randomized ICE-SEARCH ranks as SOTA in cardiovascular disease prediction. The study emphasizes the critical role of incorporating domain-specific insights, illustrating ICE-SEARCH's robustness, generalizability, and convergence. This opens avenues for further research into comprehensive and intricate FS landscapes, marking a significant stride in the application of artificial intelligence in medical predictive analytics.

Updated: 2024-05-08 18:05:43

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.18609v4

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

Large Language Model (LLM) inference has two phases: the prompt (or prefill) phase, which outputs the first token, and the extension (or decoding) phase, which generates subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead, to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of the key-value cache (KV-cache). Hence, KV-Runahead parallelizes the prompt phase by orchestrating multiple processes to populate the KV-cache, minimizing the time-to-first-token (TTFT). Dual-purposing the KV-cache scheme has two main benefits. First, since the KV-cache is designed to leverage the causal attention map, we minimize computation and communication automatically. Second, since it already exists for the extension phase, KV-Runahead is easy to implement. We further propose context-level load-balancing to handle uneven KV-cache generation (due to the causal attention) and to optimize TTFT. Compared with an existing parallelization scheme such as tensor or sequential parallelization, where keys and values are locally generated and exchanged via all-gather collectives, our experimental results demonstrate that KV-Runahead can offer over 1.4x and 1.6x speedups for Llama 7B and Falcon 7B, respectively.

Updated: 2024-05-08 18:03:22

Categories: cs.DC,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.05329v1
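The key observation can be made concrete with a toy, single-projection sketch: key/value rows in the KV-cache are per-token, so separate processes can populate disjoint slices and concatenate them. All names and shapes below are illustrative; in a real multi-layer transformer the hidden states fed into these projections also depend on earlier tokens, which is what KV-Runahead's process orchestration handles.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wk = rng.normal(size=(d, d))          # toy key projection weights
Wv = rng.normal(size=(d, d))          # toy value projection weights
prompt = rng.normal(size=(16, d))     # 16 prompt-token hidden states

# Sequential prefill: one process projects the whole prompt.
K_full, V_full = prompt @ Wk, prompt @ Wv

# Parallel prefill: each "process" projects a slice of the prompt;
# K/V rows are per-token, so the slices concatenate exactly.
K_par = np.concatenate([chunk @ Wk for chunk in np.split(prompt, 4)])
V_par = np.concatenate([chunk @ Wv for chunk in np.split(prompt, 4)])

print(np.allclose(K_full, K_par) and np.allclose(V_full, V_par))  # True
```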

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance generating auxiliary supervisions using open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving the supervised-only baselines. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.

Updated: 2024-05-08 17:59:53

Categories: cs.CV,cs.LG,cs.RO

Download: http://arxiv.org/abs/2405.05258v1

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations responding to very specific question formats -- typically a multiple-choice response regarding a particular object or attribute -- which we term "Type II hallucinations". Additionally, such benchmarks often require external API calls to models which are subject to change. In practice, we observe that a reduction in Type II hallucinations does not lead to a reduction in Type I hallucinations but rather that the two forms of hallucinations are often anti-correlated. To address this, we propose THRONE, a novel object-based automatic framework for quantitatively evaluating Type I hallucinations in LVLM free-form outputs. We use public language models (LMs) to identify hallucinations in LVLM responses and compute informative metrics. By evaluating a large selection of recent LVLMs using public datasets, we show that an improvement in existing metrics does not lead to a reduction in Type I hallucinations, and that established benchmarks for measuring Type I hallucinations are incomplete. Finally, we provide a simple and effective data augmentation method to reduce Type I and Type II hallucinations as a strong baseline.

Updated: 2024-05-08 17:59:11

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.05256v1

Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo

Diffusion generative models have excelled at diverse image generation and reconstruction tasks across fields. A less explored avenue is their application to discriminative tasks involving regression or classification problems. The cornerstone of modern cosmology is the ability to generate predictions for observed astrophysical fields from theory and constrain physical models from observations using these predictions. This work uses a single diffusion generative model to address these interlinked objectives -- as a surrogate model or emulator for cold dark matter density fields conditional on input cosmological parameters, and as a parameter inference model that solves the inverse problem of constraining the cosmological parameters of an input field. The model is able to emulate fields with summary statistics consistent with those of the simulated target distribution. We then leverage the approximate likelihood of the diffusion generative model to derive tight constraints on cosmology by using the Hamiltonian Monte Carlo method to sample the posterior on cosmological parameters for a given test image. Finally, we demonstrate that this parameter inference approach is more robust to the addition of noise than baseline parameter inference networks.

Updated: 2024-05-08 17:59:03

Categories: astro-ph.CO,cs.LG

Download: http://arxiv.org/abs/2405.05255v1
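For readers unfamiliar with the sampler, one leapfrog HMC step for a generic differentiable log-density can be sketched as follows. This is a textbook sketch, not the paper's implementation; in Diffusion-HMC the diffusion model supplies the (approximate) log-likelihood and its gradient.

```python
import numpy as np

def hmc_step(x, logp, grad_logp, step=0.2, n_leap=10, rng=None):
    """One Hamiltonian Monte Carlo step targeting the density exp(logp)."""
    if rng is None:
        rng = np.random.default_rng()
    p = rng.normal(size=x.shape)                    # resample momentum
    x_new, p_new = x.copy(), p.copy()
    p_new = p_new + 0.5 * step * grad_logp(x_new)   # initial half step
    for _ in range(n_leap):
        x_new = x_new + step * p_new                # full position step
        p_new = p_new + step * grad_logp(x_new)     # full momentum step
    p_new = p_new - 0.5 * step * grad_logp(x_new)   # trim back to a half step
    # Metropolis correction keeps the exact target distribution invariant.
    h_old = -logp(x) + 0.5 * np.sum(p ** 2)
    h_new = -logp(x_new) + 0.5 * np.sum(p_new ** 2)
    return x_new if np.log(rng.random()) < h_old - h_new else x

# Sampling a standard normal: logp(x) = -x^2/2, so grad_logp(x) = -x.
rng = np.random.default_rng(1)
x = np.zeros(1)
samples = []
for _ in range(2000):
    x = hmc_step(x, lambda z: -0.5 * np.sum(z ** 2), lambda z: -z, rng=rng)
    samples.append(x[0])
# The sample mean and variance should be close to 0 and 1, respectively.
```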

Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge

Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models can produce remains understudied. This is a concern as providing flawed or misleading generated feedback could be detrimental to student learning. Inspired by recent work that has utilised very powerful LLMs, such as GPT-4, to evaluate the outputs produced by less powerful models, we conduct an automated analysis of the quality of the feedback produced by several open source models using a dataset from an introductory programming course. First, we investigate the viability of employing GPT-4 as an automated evaluator by comparing its evaluations with those of a human expert. We observe that GPT-4 demonstrates a bias toward positively rating feedback while exhibiting moderate agreement with human raters, showcasing its potential as a feedback evaluator. Second, we explore the quality of feedback generated by several leading open-source LLMs by using GPT-4 to evaluate the feedback. We find that some models offer competitive performance with popular proprietary LLMs, such as ChatGPT, indicating opportunities for their responsible use in educational settings.

Updated: 2024-05-08 17:57:39

Categories: cs.CL,cs.AI,cs.CY

Download: http://arxiv.org/abs/2405.05253v1

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: https://atedm.github.io.

Updated: 2024-05-08 17:56:47

Categories: cs.CV,cs.AI,cs.LG,eess.IV,eess.SP

Download: http://arxiv.org/abs/2405.05252v1
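The run-time pruning idea (score tokens by the attention they receive, drop the lowest scorers, then restore pruned positions from their most similar kept token so convolutions still see a full-length sequence) can be sketched as below. The plain column-sum score and all names here are illustrative stand-ins; the paper's actual ranking is the G-WPR algorithm.

```python
import numpy as np

def prune_tokens(tokens, attn, n_keep):
    """Keep the n_keep tokens that receive the most attention.
    tokens: (N, d) features; attn: (N, N) map, attn[i, j] = query i -> key j."""
    scores = attn.sum(axis=0)                      # attention received per token
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return keep_idx, tokens[keep_idx]

def recover_tokens(tokens, keep_idx, kept):
    """Similarity-based recovery: each position is filled with its most
    similar kept token, so downstream convolutions see a full sequence."""
    sims = tokens @ kept.T                         # (N, n_keep) similarities
    recovered = kept[sims.argmax(axis=1)]
    recovered[keep_idx] = kept                     # kept positions stay exact
    return recovered

rng = np.random.default_rng(0)
tokens = rng.normal(size=(12, 4))
attn = rng.random(size=(12, 12))
keep_idx, kept = prune_tokens(tokens, attn, n_keep=8)
full = recover_tokens(tokens, keep_idx, kept)
print(full.shape)  # (12, 4)
```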

LLMs with Personalities in Multi-issue Negotiation Games

Powered by large language models (LLMs), AI agents have become capable of many human tasks. Using the most canonical definitions of the Big Five personality traits, we measure the ability of LLMs to negotiate within a game-theoretical framework, as well as methodological challenges to measuring notions of fairness and risk. Simulations (n=1,500) for both single-issue and multi-issue negotiation reveal that increases in domain complexity with asymmetric issue valuations improve agreement rates but decrease the surplus gained from aggressive negotiation. Through gradient-boosted regression and Shapley explainers, we find that high openness, conscientiousness, and neuroticism are associated with fair tendencies; low agreeableness and low openness are associated with rational tendencies. Low conscientiousness is associated with high toxicity. These results indicate that LLMs may have built-in guardrails that default to fair behavior, but can be "jailbroken" to exploit agreeable opponents. We also offer pragmatic insight into how negotiation bots can be designed, and a framework for assessing negotiation behavior based on game theory and computational social science.

Updated: 2024-05-08 17:51:53

Categories: cs.CL,cs.AI,cs.MA

Download: http://arxiv.org/abs/2405.05248v1

Sensitivity-Aware Amortized Bayesian Inference

Sensitivity analyses reveal the influence of various modeling choices on the outcomes of statistical analyses. While theoretically appealing, they are overwhelmingly inefficient for complex Bayesian models. In this work, we propose sensitivity-aware amortized Bayesian inference (SA-ABI), a multifaceted approach to efficiently integrate sensitivity analyses into simulation-based inference with neural networks. First, we utilize weight sharing to encode the structural similarities between alternative likelihood and prior specifications in the training process with minimal computational overhead. Second, we leverage the rapid inference of neural networks to assess sensitivity to data perturbations and preprocessing steps. In contrast to most other Bayesian approaches, both steps circumvent the costly bottleneck of refitting the model for each choice of likelihood, prior, or data set. Finally, we propose to use deep ensembles to detect sensitivity arising from unreliable approximation (e.g., due to model misspecification). We demonstrate the effectiveness of our method in applied modeling problems, ranging from disease outbreak dynamics and global warming thresholds to human decision-making. Our results support sensitivity-aware inference as a default choice for amortized Bayesian workflows, automatically providing modelers with insights into otherwise hidden dimensions.

Updated: 2024-05-08 17:50:06

Categories: stat.ML,cs.LG,stat.ME

Download: http://arxiv.org/abs/2310.11122v5

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specialized field requiring focused attention. To promote SVDD research, we recently proposed the "SVDD Challenge," the very first research challenge focusing on SVDD for lab-controlled and in-the-wild bonafide and deepfake singing voice recordings. The challenge will be held in conjunction with the 2024 IEEE Spoken Language Technology Workshop (SLT 2024).

Updated: 2024-05-08 17:40:12

Categories: eess.AS,cs.AI,cs.MM,cs.SD

Download: http://arxiv.org/abs/2405.05244v1

Deep learning-based variational autoencoder for classification of quantum and classical states of light

Advancements in optical quantum technologies have been enabled by the generation, manipulation, and characterization of light, with identification based on its photon statistics. However, characterizing light and its sources through single photon measurements often requires efficient detectors and longer measurement times to obtain high-quality photon statistics. Here we introduce a deep learning-based variational autoencoder (VAE) method for classifying single photon added coherent states (SPACS), single photon added thermal states (SPATS), and mixed states between coherent/SPACS and thermal/SPATS of light. Our semisupervised learning-based VAE efficiently maps the photon statistics features of light to a lower dimension, enabling quasi-instantaneous classification with low average photon counts. The proposed VAE method is robust and maintains classification accuracy in the presence of losses inherent in an experiment, such as finite collection efficiency, non-unity quantum efficiency, finite number of detectors, etc. Additionally, leveraging the transfer learning capabilities of VAE enables successful classification of data of any quality using a single trained model. We envision that such a deep learning methodology will enable better classification of quantum light and light sources even in the presence of poor detection quality.

Updated: 2024-05-08 17:40:03

Categories: quant-ph,cs.LG,physics.comp-ph

Download: http://arxiv.org/abs/2405.05243v1

BenthicNet: A global compilation of seafloor images for deep learning applications

Advances in underwater imaging enable the collection of extensive seafloor image datasets that are necessary for monitoring important benthic ecosystems. The ability to collect seafloor imagery has outpaced our capacity to analyze it, hindering expedient mobilization of this crucial environmental information. Recent machine learning approaches provide opportunities to increase the efficiency with which seafloor image datasets are analyzed, yet large and consistent datasets necessary to support development of such approaches are scarce. Here we present BenthicNet: a global compilation of seafloor imagery designed to support the training and evaluation of large-scale image recognition models. An initial set of over 11.4 million images was collected and curated to represent a diversity of seafloor environments using a representative subset of 1.3 million images. These are accompanied by 2.6 million annotations translated to the CATAMI scheme, which span 190,000 of the images. A large deep learning model was trained on this compilation and preliminary results suggest it has utility for automating large and small-scale image analysis tasks. The compilation and model are made openly available for use by the scientific community at https://doi.org/10.20383/103.0614.

Updated: 2024-05-08 17:37:57

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.05241v1

An LSTM-Based Chord Generation System Using Chroma Histogram Representations

This paper proposes a system for generating chords for monophonic symbolic melodies, using an LSTM-based model trained on chroma histogram representations of chords. Chroma representations promise harmonically richer generation than chord label-based approaches, whilst maintaining a small number of dimensions in the dataset. The system is shown to be suitable for limited real-time use. While it does not meet the state of the art for coherent long-term generation, it does exhibit diatonic generation with cadential chord relationships. The need for further study of chroma histograms as an extracted feature in chord generation tasks is highlighted.

Updated: 2024-05-08 17:36:29

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2405.05240v1
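A chroma histogram of the kind used above folds note pitches onto the 12 pitch classes. A minimal sketch, assuming MIDI note numbers as input and optional per-note weights (e.g., durations); the function name is illustrative:

```python
import numpy as np

def chroma_histogram(midi_notes, weights=None):
    """12-bin pitch-class histogram, normalized to sum to 1."""
    hist = np.zeros(12)
    if weights is None:
        weights = np.ones(len(midi_notes))
    for note, w in zip(midi_notes, weights):
        hist[note % 12] += w            # fold octaves onto pitch classes
    total = hist.sum()
    return hist / total if total > 0 else hist

# C major triad (C4, E4, G4): mass lands on pitch classes 0, 4, 7.
print(chroma_histogram([60, 64, 67]))
```

Because the representation is only 12-dimensional regardless of voicing or octave, it keeps the dataset compact while preserving harmonic content, which is the trade-off the paper highlights.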

Cellular Traffic Prediction Using Online Prediction Algorithms

The advent of 5G technology promises a paradigm shift in the realm of telecommunications, offering unprecedented speeds and connectivity. However, the efficient management of traffic in 5G networks remains a critical challenge. It is due to the dynamic and heterogeneous nature of network traffic, varying user behaviors, extended network size, and diverse applications, all of which demand highly accurate and adaptable prediction models to optimize network resource allocation and management. This paper investigates the efficacy of live prediction algorithms for forecasting cellular network traffic in real-time scenarios. We apply two live prediction algorithms on machine learning models, one of which is recently proposed Fast LiveStream Prediction (FLSP) algorithm. We examine the performance of these algorithms under two distinct data gathering methodologies: synchronous, where all network cells report statistics simultaneously, and asynchronous, where reporting occurs across consecutive time slots. Our study delves into the impact of these gathering scenarios on the predictive performance of traffic models. Our study reveals that the FLSP algorithm can halve the required bandwidth for asynchronous data reporting compared to conventional online prediction algorithms, while simultaneously enhancing prediction accuracy and reducing processing load. Additionally, we conduct a thorough analysis of algorithmic complexity and memory requirements across various machine learning models. Through empirical evaluation, we provide insights into the trade-offs inherent in different prediction strategies, offering valuable guidance for network optimization and resource allocation in dynamic environments.

Updated: 2024-05-08 17:36:14

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2405.05239v1
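As a simple point of reference for what live/online prediction means here, the sketch below shows a generic exponentially weighted predictor that folds in each newly reported sample in O(1) time and memory. It is a stand-in illustration only, not the FLSP algorithm or the machine-learning models studied in the paper.

```python
class OnlineEWMAPredictor:
    """Predicts the next traffic sample as an exponentially weighted
    moving average of all samples reported so far."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight on the newest observation
        self.level = None       # current state; None until first sample

    def update(self, x):
        """Fold in one newly reported sample."""
        if self.level is None:
            self.level = x
        else:
            self.level = self.alpha * x + (1 - self.alpha) * self.level

    def predict(self):
        return self.level

p = OnlineEWMAPredictor(alpha=0.5)
for x in [10, 12, 11, 13]:
    p.update(x)
print(p.predict())  # 12.0
```

In the asynchronous reporting scenario from the abstract, each cell would maintain such a state and update it only in the time slot when its statistics arrive, which is what makes low-bandwidth online updates attractive.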

Stability and Performance Analysis of Discrete-Time ReLU Recurrent Neural Networks

This paper presents sufficient conditions for the stability and $\ell_2$-gain performance of recurrent neural networks (RNNs) with ReLU activation functions. These conditions are derived by combining Lyapunov/dissipativity theory with Quadratic Constraints (QCs) satisfied by repeated ReLUs. We write a general class of QCs for repeated ReLUs using known properties of the scalar ReLU. Our stability and performance condition uses these QCs along with a "lifted" representation for the ReLU RNN. We show that the positive homogeneity property satisfied by a scalar ReLU does not expand the class of QCs for the repeated ReLU. We present examples to demonstrate the stability/performance condition and study the effect of the lifting horizon.
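
As a concrete (standard, though not necessarily the paper's most general) instance of such a QC class: the scalar ReLU $v = \max(0, w)$ satisfies $v \ge 0$, $v \ge w$, and the complementarity condition $v(v - w) = 0$. Stacking these properties across the repeated ReLU $v = \phi(w)$ with $v, w \in \mathbb{R}^n$, and weighting by any $\lambda \in \mathbb{R}^n_{\ge 0}$ with $T = \mathrm{diag}(\lambda)$, gives a quadratic constraint satisfied by every input-output pair:

$$\begin{bmatrix} w \\ v \end{bmatrix}^{\top} \begin{bmatrix} 0 & T \\ T & -2T \end{bmatrix} \begin{bmatrix} w \\ v \end{bmatrix} \;=\; 2 \sum_{i=1}^{n} \lambda_i\, v_i (w_i - v_i) \;=\; 0,$$

which in particular holds as an inequality $\ge 0$, the form used in dissipativity-style conditions. Constraints of this shape are what get combined with a Lyapunov certificate over the lifted representation.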

Updated: 2024-05-08 17:30:50

Categories: eess.SY,cs.LG,cs.SY,math.OC

Download: http://arxiv.org/abs/2405.05236v1

RACH Traffic Prediction in Massive Machine Type Communications

Traffic pattern prediction has emerged as a promising approach for efficiently managing and mitigating the impacts of event-driven bursty traffic in massive machine-type communication (mMTC) networks. However, achieving accurate predictions of bursty traffic remains a non-trivial task due to the inherent randomness of events, and these challenges intensify within live network environments. Consequently, there is a compelling imperative to design a lightweight and agile framework capable of assimilating continuously collected data from the network and accurately forecasting bursty traffic in mMTC networks. This paper addresses these challenges by presenting a machine learning-based framework tailored for forecasting bursty traffic in multi-channel slotted ALOHA networks. The proposed machine learning network comprises long short-term memory (LSTM) and a DenseNet with feed-forward neural network (FFNN) layers, where the residual connections enhance the training ability of the machine learning network in capturing complicated patterns. Furthermore, we develop a new low-complexity online prediction algorithm that updates the states of the LSTM network by leveraging frequently collected data from the mMTC network. Simulation results and complexity analysis demonstrate the superiority of our proposed algorithm in terms of both accuracy and complexity, making it well-suited for time-critical live scenarios. We evaluate the performance of the proposed framework in a network with a single base station and thousands of devices organized into groups with distinct traffic-generating characteristics. Comprehensive evaluations and simulations indicate that our proposed machine learning approach achieves a remarkable $52\%$ higher accuracy in long-term predictions compared to traditional methods, without imposing additional processing load on the system.

Updated: 2024-05-08 17:28:07

Categories: eess.SY,cs.LG,cs.SY

Download: http://arxiv.org/abs/2405.05235v1

DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

Graph neural networks (GNNs) are machine learning models specialized for graph data and widely used in many applications. To train GNNs on large graphs that exceed CPU memory, several systems store data on disk and conduct out-of-core processing. However, these systems suffer from either read amplification when reading node features that are usually smaller than a disk page or degraded model accuracy by treating the graph as disconnected partitions. To close this gap, we build a system called DiskGNN, which achieves high I/O efficiency and thus fast training without hurting model accuracy. The key technique used by DiskGNN is offline sampling, which helps decouple graph sampling from model computation. In particular, by conducting graph sampling beforehand, DiskGNN acquires the node features that will be accessed by model computation, and such information is utilized to pack the target node features contiguously on disk to avoid read amplification. Besides, DiskGNN also adopts designs including a four-level feature store to fully utilize the memory hierarchy to cache node features and reduce disk access, batched packing to accelerate the feature packing process, and pipelined training to overlap disk access with other operations. We compare DiskGNN with Ginex and MariusGNN, which are state-of-the-art systems for out-of-core GNN training. The results show that DiskGNN can speed up the baselines by over 8x while matching their best model accuracy.
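
The read-amplification arithmetic behind offline sampling can be sketched with a toy model (the page size, node IDs, and the two layout functions below are hypothetical, not DiskGNN's actual storage code):

```python
import math

PAGE_SIZE = 8  # features per disk page (made-up figure)

def pages_read_scattered(node_ids, page_size=PAGE_SIZE):
    # Features laid out by node id: a batch touches every distinct page
    # that any requested id falls on -- classic read amplification when
    # features are much smaller than a disk page.
    return len({nid // page_size for nid in node_ids})

def pages_read_packed(node_ids, page_size=PAGE_SIZE):
    # Offline sampling reveals ahead of time which features a batch needs,
    # so they can be packed contiguously: ceil(k / page_size) pages.
    return math.ceil(len(node_ids) / page_size)

batch = [3, 95, 1024, 70, 4096, 513, 200, 2049]  # ids chosen by graph sampling
print(pages_read_scattered(batch))  # 8 pages, one per feature
print(pages_read_packed(batch))     # 1 page
```

With realistic page sizes (e.g. 4 KB) and feature vectors far smaller than a page, the same ratio grows accordingly, which is why contiguous packing dominates the I/O savings.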

Updated: 2024-05-08 17:27:11

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05231v1

Personalized Autonomous Driving with Large Language Models: Field Experiments

Integrating large language models (LLMs) in autonomous vehicles enables conversation with AI systems to drive the vehicle. However, it also emphasizes the requirement for such systems to comprehend commands accurately and achieve higher-level personalization to adapt to the preferences of drivers or passengers over a more extended period. In this paper, we introduce an LLM-based framework, Talk2Drive, capable of translating natural verbal commands into executable controls and learning to satisfy personal preferences for safety, efficiency, and comfort with a proposed memory module. This is the first-of-its-kind multi-scenario field experiment that deploys LLMs on a real-world autonomous vehicle. Experiments showcase that the proposed system can comprehend human intentions at different intuition levels, ranging from direct commands like "can you drive faster" to indirect commands like "I am really in a hurry now". Additionally, we use the takeover rate to quantify the trust of human drivers in the LLM-based autonomous driving system, where Talk2Drive significantly reduces the takeover rate in highway, intersection, and parking scenarios. We also validate that the proposed memory module considers personalized preferences and further reduces the takeover rate by up to 65.2% compared with those without a memory module. The experiment video can be watched at https://www.youtube.com/watch?v=4BWsfPaq1Ro

Updated: 2024-05-08 17:24:33

Categories: cs.AI

Download: http://arxiv.org/abs/2312.09397v3

Contrastive Learning Method for Sequential Recommendation based on Multi-Intention Disentanglement

Sequential recommendation is one of the important branches of recommender systems, aiming to provide personalized item recommendations for the future through the analysis and prediction of users' ordered historical interaction behaviors. However, along with the growth of the user volume and the increasingly rich behavioral information, how to understand and disentangle users' interactive multi-intentions effectively also poses challenges to behavior prediction and sequential recommendation. In light of these challenges, we propose a Contrastive Learning sequential recommendation method based on Multi-Intention Disentanglement (MIDCL). In our work, intentions are recognized as dynamic and diverse, and user behaviors are often driven by current multi-intentions, which means that the model needs to not only mine the most relevant implicit intention for each user, but also suppress the influence of irrelevant intentions. Therefore, we choose a Variational Auto-Encoder (VAE) to realize the disentanglement of users' multi-intentions. We propose two types of contrastive learning paradigms for finding the most relevant user's interactive intention and maximizing the mutual information of positive sample pairs, respectively. Experimental results show that MIDCL not only has significant superiority over most existing baseline methods, but also brings a more interpretable case to research on intention-based prediction and recommendation.
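
The second paradigm, maximizing the mutual information of positive sample pairs, is typically realized with an InfoNCE-style loss. A minimal dependency-free sketch (the function name and toy embeddings are illustrative, not MIDCL's implementation):

```python
import math

def infonce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: a lower bound on the mutual information of the positive pair,
    obtained by scoring the pair against negatives with a softmax over
    temperature-scaled cosine similarities."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / tau] + [cos(anchor, n) / tau for n in negatives]
    log_z = math.log(sum(math.exp(l) for l in logits))
    return -(logits[0] - log_z)  # negative log-softmax weight of the positive pair

# loss is small when the positive pair is aligned, large when a negative is
easy = infonce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
hard = infonce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this loss over batches pulls each anchor toward its positive (same-intention) view and away from negatives, which is the mechanism behind the mutual-information objective.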

Updated: 2024-05-08 17:23:11

Categories: cs.IR,cs.AI,cs.HC

Download: http://arxiv.org/abs/2404.18214v2

Test-Time Adaptation for Depth Completion

It is common to observe performance degradation when transferring models trained on some (source) datasets to target testing data due to a domain gap between them. Existing methods for bridging this gap, such as domain adaptation (DA), may require the source data on which the model was trained (often not available), while others, i.e., source-free DA, require many passes through the testing data. We propose an online test-time adaptation method for depth completion, the task of inferring a dense depth map from a single image and associated sparse depth map, that closes the performance gap in a single pass. We first present a study on how the domain shift in each data modality affects model performance. Based on our observations that the sparse depth modality exhibits a much smaller covariate shift than the image, we design an embedding module trained in the source domain that preserves a mapping from features encoding only sparse depth to those encoding image and sparse depth. During test time, sparse depth features are projected using this map as a proxy for source domain features and are used as guidance to train a set of auxiliary parameters (i.e., adaptation layer) to align image and sparse depth features from the target test domain to that of the source domain. We evaluate our method on indoor and outdoor scenarios and show that it improves over baselines by an average of 21.1%.

Updated: 2024-05-08 17:20:55

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.03312v3

The Impact of Imperfect XAI on Human-AI Decision-Making

Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered explainable AI (XAI) techniques have been proposed in hopes of improving decision-makers' collaboration with AI; however, these techniques are grounded in findings from previous studies that primarily focus on the impact of incorrect AI advice. Few studies acknowledge the possibility of the explanations being incorrect even if the AI advice is correct. Thus, it is crucial to understand how imperfect XAI affects human-AI decision-making. In this work, we contribute a robust, mixed-methods user study with 136 participants to evaluate how incorrect explanations influence humans' decision-making behavior in a bird species identification task, taking into account their level of expertise and an explanation's level of assertiveness. Our findings reveal the influence of imperfect XAI and humans' level of expertise on their reliance on AI and human-AI team performance. We also discuss how explanations can deceive decision-makers during human-AI collaboration. Hence, we shed light on the impacts of imperfect XAI in the field of computer-supported cooperative work and provide guidelines for designers of human-AI collaboration systems.

Updated: 2024-05-08 17:14:25

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2307.13566v4

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

Large Language Models (LLMs) have profoundly changed the world. Their self-attention mechanism is the key to the success of transformers in LLMs. However, the quadratic computational cost $O(n^2)$ in the length $n$ of the input sequence is the notorious obstacle for further improvement and scalability to longer contexts. In this work, we leverage the convolution-like structure of attention matrices to develop an efficient approximation method for attention computation using convolution matrices. We propose a $\mathsf{conv}$ basis system, "similar" to the rank basis, and show that any lower triangular (attention) matrix can always be decomposed as a sum of $k$ structured convolution matrices in this basis system. We then design an algorithm to quickly decompose the attention matrix into $k$ convolution matrices. Thanks to Fast Fourier Transforms (FFT), the attention {\it inference} can be computed in $O(knd \log n)$ time, where $d$ is the hidden dimension. In practice, we have $d \ll n$, i.e., $d=3,072$ and $n=1,000,000$ for Gemma. Thus, when $kd = n^{o(1)}$, our algorithm achieves almost linear time, i.e., $n^{1+o(1)}$. Furthermore, the attention {\it training forward} and {\it backward gradient} can be computed in $n^{1+o(1)}$ as well. Our approach can avoid explicitly computing the $n \times n$ attention matrix, which may largely alleviate the quadratic computational complexity. Furthermore, our algorithm works on any input matrices. This work provides a new paradigm for accelerating attention computation in transformers to enable their application to longer contexts.
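
The decomposition claim is easy to check numerically: a lower-triangular matrix built from $k$ convolution kernels acts on a vector exactly as the sum of $k$ causal convolutions, each of which an FFT evaluates in $O(n \log n)$. A toy sketch with $k = 2$ and made-up kernels (naive convolutions stand in for the FFTs):

```python
def causal_conv_matvec(kernel, x):
    # y[i] = sum_{j<=i} kernel[i-j] * x[j]: multiplication by one
    # lower-triangular convolution (Toeplitz) basis matrix.
    return [sum(kernel[i - j] * x[j] for j in range(i + 1)) for i in range(len(x))]

def dense_matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# an attention-like lower-triangular matrix that IS a sum of k = 2
# convolution matrices (toy numbers; the paper's algorithm finds the kernels)
n = 4
k1, k2 = [1.0, 0.5, 0.25, 0.125], [0.0, 1.0, 0.0, 0.0]
A = [[(k1[i - j] + k2[i - j]) if j <= i else 0.0 for j in range(n)] for i in range(n)]
x = [1.0, 2.0, 3.0, 4.0]
lhs = dense_matvec(A, x)  # O(n^2) dense product
rhs = [a + b for a, b in zip(causal_conv_matvec(k1, x), causal_conv_matvec(k2, x))]
```

Replacing each `causal_conv_matvec` with an FFT-based convolution turns the $O(n^2)$ matrix-vector product into $k$ passes of $O(n \log n)$ work, which is the source of the $O(knd \log n)$ inference bound.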

Updated: 2024-05-08 17:11:38

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.05219v1

Locally Differentially Private In-Context Learning

Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability. An important application in deploying large language models is to augment LLMs with a private database for some specific task. The main problem with this promising commercial use is that LLMs have been shown to memorize their training data, and their prompt data are vulnerable to membership inference attacks (MIA) and prompt leaking attacks. To deal with this problem, we treat LLMs as untrusted from a privacy standpoint and propose a locally differentially private framework of in-context learning (LDP-ICL) for settings where labels are sensitive. Building on the gradient-descent view of in-context learning in Transformers, we provide an analysis of the trade-off between privacy and utility in LDP-ICL for classification. Moreover, we apply LDP-ICL to the discrete distribution estimation problem. Finally, we perform several experiments to demonstrate our analysis results.
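
For sensitive labels, the canonical local-DP mechanism is $k$-ary randomized response; a sketch of the mechanism and its frequency debiasing (generic LDP machinery for intuition, not necessarily the paper's exact estimator):

```python
import math
import random

def randomized_response(label, k, eps, rng=None):
    """k-ary randomized response: report the true label with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random *other* label.
    This satisfies eps-local differential privacy for the label."""
    rng = rng or random.Random(0)
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_true:
        return label
    other = rng.randrange(k - 1)
    return other if other < label else other + 1  # skip the true label

def debias(freq_observed, k, eps):
    """Unbiased estimate of a label's true frequency f from the observed
    frequency: E[observed] = p*f + q*(1 - f), with q = (1 - p)/(k - 1)."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = (1 - p) / (k - 1)
    return (freq_observed - q) / (p - q)
```

Smaller `eps` flips labels more often (stronger privacy, noisier estimates); the utility analysis in LDP-ICL quantifies this trade-off for classification.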

Updated: 2024-05-08 17:10:23

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2405.04032v2

Speech Understanding on Tiny Devices with A Learning Cache

This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We leverage temporal locality in the speech inputs to a device and reuse recent SLU inferences accordingly. Our idea is simple: let the device match incoming inputs against cached results, and only offload inputs not matched to any cached ones to the cloud for full inference. Realization of this idea, however, is non-trivial: the device needs to compare acoustic features in a robust yet low-cost way. To this end, we present SpeechCache (or SC), a speech cache for tiny devices. It matches speech inputs at two levels of representations: first by sequences of clustered raw sound units, then as sequences of phonemes. Working in tandem, the two representations offer complementary tradeoffs between cost and efficiency. To boost accuracy even further, our cache learns to personalize: with the mismatched and then offloaded inputs, it continuously finetunes the device's feature extractors with the assistance of the cloud. We implement SC on an off-the-shelf STM32 microcontroller. The complete implementation has a small memory footprint of 2MB. Evaluated on challenging speech benchmarks, our system resolves 45%-90% of inputs on device, reducing the average latency by up to 80% compared to offloading to popular cloud speech recognition services. The benefit brought by our proposed SC is notable even in adversarial settings - noisy environments, cold cache, or one device shared by a number of users.

Updated: 2024-05-08 17:08:52

Categories: eess.AS,cs.LG

Download: http://arxiv.org/abs/2311.18188v4

Fast Abstracts and Student Forum Proceedings -- EDCC 2024 -- 19th European Dependable Computing Conference

The goal of the Fast Abstracts track is to bring together researchers and practitioners working on dependable computing to discuss work in progress or opinion pieces. Contributions are welcome from academia and industry. Fast Abstracts aim to serve as a rapid and flexible mechanism to: (i) Report on current work that may or may not be complete; (ii) Introduce new ideas to the community; (iii) State positions on controversial issues or open problems; (iv) Share lessons learnt from real-word dependability engineering; and (v) Debunk or question results from other papers based on contra-indications. The Student Forum aims at creating a vibrant and friendly environment where students can present and discuss their work, and exchange ideas and experiences with other students, researchers and industry. One of the key goals of the Forum is to provide students with feedback on their preliminary results that might help with their future research directions.

Updated: 2024-05-08 17:05:10

Categories: cs.SE,cs.CY,cs.DC,cs.LG,cs.RO

Download: http://arxiv.org/abs/2404.17465v4

Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors

Tensor operations are fundamental in distributed computing and are commonly split into multiple parallel tasks for large datasets. Stragglers and other failures can severely impact the overall completion time. Recent works in coded computing provide a novel strategy to mitigate stragglers with coded tasks, with the objective of minimizing the number of tasks needed to recover the overall result, known as the recovery threshold. However, we demonstrate that this strict combinatorial definition does not directly optimize the probability of failure. In this paper, we focus on the most likely event and measure the optimality of a coding scheme more directly by its probability of decoding. Our probabilistic approach leads us to a practical construction of random codes for matrix multiplication, i.e., locally random alloy codes, which are optimal with respect to these measures. Furthermore, the probabilistic approach allows us to discover a surprising impossibility theorem about both random and deterministic coded distributed tensors.

Updated: 2024-05-08 17:00:13

Categories: cs.IT,cs.DC,cs.LG,cs.NA,cs.SC,math.IT,math.NA,E.4; H.1.1; C.2.4; B.8.1; C.4; G.1.3; I.2.6; I.1.2

Download: http://arxiv.org/abs/2202.03469v6

Leafy Spurge Dataset: Real-world Weed Classification Within Aerial Drone Imagery

Invasive plant species are detrimental to the ecology of both agricultural and wildland areas. Euphorbia esula, or leafy spurge, is one such plant that has spread through much of North America from Eastern Europe. When paired with contemporary computer vision systems, unmanned aerial vehicles, or drones, offer the means to track expansion of problem plants, such as leafy spurge, and improve chances of controlling these weeds. We gathered a dataset of leafy spurge presence and absence in grasslands of western Montana, USA, then surveyed these areas with a commercial drone. We trained image classifiers on these data, and our best performing model, a pre-trained DINOv2 vision transformer, identified leafy spurge with 0.84 accuracy (test set). This result indicates that classification of leafy spurge is tractable, but not solved. We release this unique dataset of labelled and unlabelled, aerial drone imagery for the machine learning community to explore. Improving classification performance of leafy spurge would benefit the fields of ecology, conservation, and remote sensing alike. Code and data are available at our website: leafy-spurge-dataset.github.io.

Updated: 2024-05-08 16:59:05

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.03702v2

Riemann-Lebesgue Forest for Regression

We propose a novel ensemble method called Riemann-Lebesgue Forest (RLF) for regression. The core idea in RLF is to mimic the way a measurable function can be approximated by partitioning its range into a few intervals. With this idea in mind, we develop a new tree learner named Riemann-Lebesgue Tree (RLT), which has a chance to perform Lebesgue-type cutting, i.e., splitting certain non-terminal nodes on the response $Y$ itself. We show that the optimal Lebesgue-type cutting results in larger variance reduction in the response $Y$ than ordinary CART \cite{Breiman1984ClassificationAR} cutting (an analogue of Riemann partitioning). Such a property is beneficial to the ensemble part of RLF. We also establish the asymptotic normality of RLF under different parameter settings. Two one-dimensional examples are provided to illustrate the flexibility of RLF. The competitive performance of RLF against the original random forest \cite{Breiman2001RandomF} is demonstrated by experiments on simulated data and real-world datasets.
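
What a Lebesgue-type cut computes can be shown in a tiny sketch: instead of thresholding a feature, the node's samples are partitioned by a threshold on the response $Y$ itself, and the cut is scored by the usual variance reduction (illustrative only; RLT's actual splitting rule and candidate generation are more refined):

```python
def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def split_gain(ys, left_idx):
    """Variance reduction from splitting a node's samples into two children."""
    left_set = set(left_idx)
    left = [ys[i] for i in left_idx]
    right = [ys[i] for i in range(len(ys)) if i not in left_set]
    n = len(ys)
    return variance(ys) - (len(left) / n) * variance(left) - (len(right) / n) * variance(right)

def best_lebesgue_split(ys):
    """Lebesgue-type cut: partition samples by a threshold on the response Y
    (a range partition), returning the (gain, threshold) with largest gain."""
    best = (float("-inf"), None)
    for t in sorted(set(ys))[:-1]:
        idx = [i for i, y in enumerate(ys) if y <= t]
        best = max(best, (split_gain(ys, idx), t))
    return best
```

On a bimodal response such as `[0, 0, 1, 1, 10, 10, 11, 11]`, the best range cut separates the two modes in one step, whereas a feature-based (Riemann-style) cut can only do so if some feature happens to align with the modes.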

Updated: 2024-05-08 16:51:08

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2402.04550v2

Anomaly Detection in Certificate Transparency Logs

We propose an anomaly detection technique for X.509 certificates utilizing Isolation Forest. This method can be beneficial when compliance testing with X.509 linters proves unsatisfactory, and we seek to identify anomalies beyond standards compliance. The technique is validated on a sample of certificates from Certificate Transparency logs.
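
The isolation idea itself fits in a few lines: anomalies sit in sparse regions, so uniformly random splits isolate them at shallow depth. A one-dimensional toy sketch of the principle (real usage would apply scikit-learn's IsolationForest to numeric features parsed from certificates; the setup below is made up):

```python
import random

def isolation_depth(x, data, rng, depth=0, max_depth=10):
    """Depth at which x is isolated by uniformly random splits;
    anomalies tend to be isolated after only a few splits."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    cut = rng.uniform(lo, hi)
    # keep only the points on the same side of the cut as x
    side = [v for v in data if (v < cut) == (x < cut)]
    return isolation_depth(x, side, rng, depth + 1, max_depth)

def anomaly_score(x, data, n_trees=200, seed=0):
    rng = random.Random(seed)
    depths = [isolation_depth(x, data, rng) for _ in range(n_trees)]
    return sum(depths) / n_trees  # smaller mean depth => more anomalous

cluster = [0.1 * i for i in range(50)]   # dense, "normal" region
population = cluster + [100.0]           # one outlier
```

Averaged over many random trees, the outlier's isolation depth is far smaller than an inlier's, which is the signal an Isolation Forest thresholds on.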

Updated: 2024-05-08 16:43:50

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2405.05206v1

Hybrid Quantum Graph Neural Network for Molecular Property Prediction

To accelerate the process of materials design, materials science has increasingly used data-driven techniques to extract information from collected data. Specifically, machine learning (ML) algorithms have demonstrated the ability to predict various properties of materials with a level of accuracy similar to explicit calculation from quantum mechanical theories, but with significantly reduced run time and computational resources. Within ML, graph neural networks have emerged as an important class of algorithms, since they can accurately predict a wide range of important physical, chemical, and electronic properties thanks to their high learning capacity on graph representations of material and molecular descriptors, aggregating the information embedded within the graph. In parallel with the development of state-of-the-art classical machine learning applications, the fusion of quantum computing and machine learning has created a new paradigm in which classical machine learning models can be augmented with quantum layers able to encode high-dimensional data more efficiently. Leveraging the structure of existing algorithms, we developed a unique and novel gradient-free hybrid quantum-classical convoluted graph neural network (HyQCGNN) to predict formation energies of perovskite materials. The performance of our hybrid statistical model is competitive with the results obtained purely from a classical convoluted graph neural network and other classical machine learning algorithms, such as XGBoost. Consequently, our study suggests a new pathway to explore how quantum feature encoding and parametric quantum circuits can yield drastic improvements for complex ML algorithms like graph neural networks.

Updated: 2024-05-08 16:43:25

Categories: quant-ph,cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2405.05205v1

Guided Combinatorial Algorithms for Submodular Maximization

For constrained, not necessarily monotone submodular maximization, guiding the measured continuous greedy algorithm with a local search algorithm currently obtains the state-of-the-art approximation factor of 0.401 \citep{buchbinder2023constrained}. These algorithms rely upon the multilinear extension and the Lovasz extension of a submodular set function. However, the state-of-the-art approximation factor of combinatorial algorithms has remained $1/e \approx 0.367$ \citep{buchbinder2014submodular}. In this work, we develop combinatorial analogues of the guided measured continuous greedy algorithm and obtain an approximation ratio of $0.385$ in $O(kn)$ queries to the submodular set function for a size constraint, and $0.305$ for a general matroid constraint. Further, we derandomize these algorithms, maintaining the same ratio and asymptotic time complexity. Finally, we develop a deterministic, nearly linear time algorithm with ratio $0.377$.
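
For orientation, the combinatorial baseline these ratios improve on is greedy selection by marginal gain; a minimal sketch for the size constraint (plain greedy only guarantees $1 - 1/e$ for *monotone* $f$ — the $1/e$ factor for non-monotone $f$ requires the randomized greedy of Buchbinder et al., and the paper's guided variants add further machinery):

```python
def greedy_subset(f, ground, k):
    """Greedy for max f(S) subject to |S| <= k: repeatedly add the element
    with the largest positive marginal gain f(S + e) - f(S)."""
    S = set()
    for _ in range(k):
        best, gain = None, 0.0
        for e in sorted(ground - S):  # sorted for deterministic tie-breaking
            g = f(S | {e}) - f(S)
            if g > gain:
                best, gain = e, g
        if best is None:  # no element improves f: stop early
            break
        S.add(best)
    return S

# toy coverage function (submodular): f(S) = size of the union of covered sets
covers = {0: {1, 2}, 1: {2, 3}, 2: {4}}
f = lambda S: len(set().union(*(covers[e] for e in S))) if S else 0
chosen = greedy_subset(f, set(covers), 2)
```

Each greedy pass makes $O(n)$ value-oracle queries, so $k$ passes cost $O(kn)$ queries, matching the query budget quoted above.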

Updated: 2024-05-08 16:39:59

Categories: cs.DS,cs.DM,cs.LG

Download: http://arxiv.org/abs/2405.05202v1

SINBAD: Saliency-informed detection of breakage caused by ad blocking

Privacy-enhancing blocking tools based on filter-list rules tend to break legitimate functionality. Filter-list maintainers could benefit from automated breakage detection tools that allow them to proactively fix problematic rules before deploying them to millions of users. We introduce SINBAD, an automated breakage detector that improves the accuracy over the state of the art by 20%, and is the first to detect dynamic breakage and breakage caused by style-oriented filter rules. The success of SINBAD is rooted in three innovations: (1) the use of user-reported breakage issues in forums that enable the creation of a high-quality dataset for training in which only breakage that users perceive as an issue is included; (2) the use of 'web saliency' to automatically identify user-relevant regions of a website on which to prioritize automated interactions aimed at triggering breakage; and (3) the analysis of webpages via subtrees which enables fine-grained identification of problematic filter rules.

Updated: 2024-05-08 16:35:06

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2405.05196v1

Systematic Use of Random Self-Reducibility against Physical Attacks

This work presents a novel, black-box software-based countermeasure against physical attacks, including power side-channel and fault-injection attacks. The approach uses the concepts of random self-reducibility and self-correctness to add randomness and redundancy to the execution for protection. Our approach operates at the operation level and is not algorithm-specific, and thus can be applied to protect a wide range of algorithms. The countermeasure is empirically evaluated against attacks on operations such as modular exponentiation, modular multiplication, polynomial multiplication, and number theoretic transforms. An end-to-end implementation of this countermeasure is demonstrated for the RSA-CRT signature algorithm and the Kyber key generation public key cryptosystem. The countermeasure reduces power side-channel leakage by two orders of magnitude, to an acceptably secure level in TVLA analysis. For fault injection, it reduces the number of successful faults by 95.4% on average.
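
The core idea of random self-reducibility can be illustrated on modular multiplication: the sensitive operand is split into uniformly random shares whose partial results recombine to the true answer, so each individual multiplication sees a random input. This is a minimal sketch of the general technique, not the paper's implementation; `blinded_modmul` is an illustrative name.

```python
import secrets

def blinded_modmul(x, y, n):
    """Compute (x * y) % n via random self-reducibility: split x into
    random additive shares x1 + x2 = x (mod n), multiply each share
    separately, and recombine. An observer of either multiplication
    sees a uniformly random operand rather than the secret x."""
    x1 = secrets.randbelow(n)   # uniform random share
    x2 = (x - x1) % n           # complementary share
    return ((x1 * y) % n + (x2 * y) % n) % n

# Self-correctness: the blinded result always matches the direct one.
assert blinded_modmul(1234, 5678, 10007) == (1234 * 5678) % 10007
```

Self-correctness also enables redundancy against fault injection: running the reduction twice with fresh shares and comparing results detects an injected fault with high probability.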

Updated: 2024-05-08 16:31:41

Categories: cs.CR

Download: http://arxiv.org/abs/2405.05193v1

Improved Generalization Bounds for Communication Efficient Federated Learning

This paper focuses on reducing the communication cost of federated learning by exploring generalization bounds and representation learning. We first characterize a tighter generalization bound for one-round federated learning based on local clients' generalizations and the heterogeneity of the data distribution (non-iid scenario). We also characterize a generalization bound for R-round federated learning and its relation to the number of local updates (local stochastic gradient descents (SGDs)). Then, based on our generalization bound analysis and our representation learning interpretation of this analysis, we show for the first time that less frequent aggregation, and hence more local updates, for the representation extractor (usually corresponding to the initial layers) leads to the creation of more generalizable models, particularly for non-iid scenarios. We design a novel Federated Learning with Adaptive Local Steps (FedALS) algorithm based on our generalization bound and representation learning analysis. FedALS employs varying aggregation frequencies for different parts of the model, thereby reducing the communication cost. We conclude with experimental results showing the effectiveness of FedALS.
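
The varying-frequency aggregation can be sketched as a single server-side round; `fedals_round` and the "head"/"body" split are illustrative assumptions, not the paper's code.

```python
import numpy as np

def fedals_round(client_models, global_model, round_idx, k):
    """One communication round of a FedALS-style scheme (sketch):
    the final layers ('head') are averaged every round, while the
    representation extractor ('body') is averaged only every k-th
    round, cutting its communication cost by a factor of k.
    Models are dicts of numpy arrays; the names are illustrative."""
    # Head: aggregate every round.
    global_model["head"] = np.mean([m["head"] for m in client_models], axis=0)
    # Body (representation extractor): aggregate only every k rounds,
    # so clients take k times more local steps between its aggregations.
    if round_idx % k == 0:
        global_model["body"] = np.mean([m["body"] for m in client_models], axis=0)
    return global_model
```

Between body aggregations, clients would continue local SGD on both parts; only the head's parameters cross the network every round.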

Updated: 2024-05-08 16:31:03

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.11754v2

Full error analysis of the random deep splitting method for nonlinear parabolic PDEs and PIDEs with infinite activity

In this paper, we present a randomized extension of the deep splitting algorithm introduced in [Beck, Becker, Cheridito, Jentzen, and Neufeld (2021)] using random neural networks suitable to approximately solve both high-dimensional nonlinear parabolic PDEs and PIDEs with jumps having (possibly) infinite activity. We provide a full error analysis of our so-called random deep splitting method. In particular, we prove that our random deep splitting method converges to the (unique viscosity) solution of the nonlinear PDE or PIDE under consideration. Moreover, we empirically analyze our random deep splitting method by considering several numerical examples including both nonlinear PDEs and nonlinear PIDEs relevant in the context of pricing of financial derivatives under default risk. In particular, we empirically demonstrate in all examples that our random deep splitting method can approximately solve nonlinear PDEs and PIDEs in 10'000 dimensions within seconds.

Updated: 2024-05-08 16:30:45

Categories: math.NA,cs.LG,cs.NA,math.PR,q-fin.MF

Download: http://arxiv.org/abs/2405.05192v1

Is Transductive Learning Equivalent to PAC Learning?

Most work in the area of learning theory has focused on designing effective Probably Approximately Correct (PAC) learners. Recently, other models of learning such as transductive error have seen more scrutiny. We move toward showing that these problems are equivalent by reducing agnostic learning with a PAC guarantee to agnostic learning with a transductive guarantee by adding a small number of samples to the dataset. We first rederive the result of Aden-Ali et al. arXiv:2304.09167 reducing PAC learning to transductive learning in the realizable setting using simpler techniques and at more generality as background for our main positive result. Our agnostic transductive to PAC conversion technique extends the aforementioned argument to the agnostic case, showing that an agnostic transductive learner can be efficiently converted to an agnostic PAC learner. Finally, we characterize the performance of the agnostic one inclusion graph algorithm of Asilis et al. arXiv:2309.13692 for binary classification, and show that plugging it into our reduction leads to an agnostic PAC learner that is essentially optimal. Our results imply that transductive and PAC learning are essentially equivalent for supervised learning with pseudometric losses in the realizable setting, and for binary classification in the agnostic setting. We conjecture this is true more generally for the agnostic setting.

Updated: 2024-05-08 16:26:49

Categories: stat.ML,cs.DS,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2405.05190v1

MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning

We study the task of conducting structured reasoning as generating a reasoning graph from natural language input using large language models (LLMs). Previous approaches have explored various prompting schemes, yet they suffer from error propagation due to the autoregressive nature and single-pass-based decoding, which lack error correction capability. Additionally, relying solely on a single sample may result in the omission of true nodes and edges. To counter this, we draw inspiration from self-consistency (SC), which involves sampling a diverse set of reasoning chains and taking the majority vote as the final answer. To tackle the substantial challenge of applying SC on generated graphs, we propose MIDGARD (MInimum Description length Guided Aggregation of Reasoning in Directed acyclic graph) that leverages a Minimum Description Length (MDL)-based formulation to identify consistent properties among the different graph samples generated by an LLM. This formulation helps reject properties that appear in only a few samples, which are likely to be erroneous, while enabling the inclusion of missing elements without compromising precision. Our method demonstrates superior performance compared to existing approaches across various structured reasoning tasks, including argument structure extraction, explanation graph generation, inferring dependency relations among actions for everyday tasks, and semantic graph generation from natural texts.
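
The aggregation step can be sketched with a simple frequency vote over sampled edge sets; this is a simplified stand-in for the MDL-guided criterion, which additionally trades off description length, and `aggregate_graphs` is an illustrative name.

```python
from collections import Counter

def aggregate_graphs(edge_lists, threshold=0.5):
    """Majority-vote aggregation of reasoning-graph samples: keep an
    edge iff it appears in at least `threshold` of the sampled graphs.
    Simplified stand-in for MIDGARD's MDL-based formulation."""
    counts = Counter(e for edges in edge_lists for e in set(edges))
    n = len(edge_lists)
    return {e for e, c in counts.items() if c / n >= threshold}

# Three sampled graphs for the same input; edges are (head, tail) pairs.
samples = [
    [("premise", "claim"), ("evidence", "claim")],
    [("premise", "claim")],
    [("premise", "claim"), ("noise", "claim")],
]
print(aggregate_graphs(samples))  # only ('premise', 'claim') survives
```

Edges appearing in few samples are rejected as likely errors, while an edge missed by one sample can still be recovered from the others, mirroring the precision/recall trade-off described above.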

Updated: 2024-05-08 16:25:42

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.05189v1

A score-based particle method for homogeneous Landau equation

We propose a novel score-based particle method for solving the Landau equation in plasmas, that seamlessly integrates learning with structure-preserving particle methods [arXiv:1910.03080]. Building upon the Lagrangian viewpoint of the Landau equation, a central challenge stems from the nonlinear dependence of the velocity field on the density. Our primary innovation lies in recognizing that this nonlinearity is in the form of the score function, which can be approximated dynamically via techniques from score-matching. The resulting method inherits the conservation properties of the deterministic particle method while sidestepping the necessity for kernel density estimation in [arXiv:1910.03080]. This streamlines computation and enhances scalability with dimensionality. Furthermore, we provide a theoretical estimate by demonstrating that the KL divergence between our approximation and the true solution can be effectively controlled by the score-matching loss. Additionally, by adopting the flow map viewpoint, we derive an update formula for exact density computation. Extensive examples have been provided to show the efficiency of the method, including a physically relevant case of Coulomb interaction.

Updated: 2024-05-08 16:22:47

Categories: math.NA,cs.LG,cs.NA

Download: http://arxiv.org/abs/2405.05187v1

Machine Learning Assisted Dynamical Classification of Trans-Neptunian Objects

Trans-Neptunian objects (TNOs) are small, icy bodies in the outer solar system. They are observed to have a complex orbital distribution that was shaped by the early dynamical history and migration of the giant planets. Comparisons between the different dynamical classes of modeled and observed TNOs can help constrain the history of the outer solar system. Because of the complex dynamics of TNOs, particularly those in and near mean motion resonances with Neptune, classification has traditionally been done by human inspection of plots of the time evolution of orbital parameters. This is very inefficient. The Vera Rubin Observatory's Legacy Survey of Space and Time (LSST) is expected to increase the number of known TNOs by a factor of $\sim$10, necessitating a much more automated process. In this chapter we present an improved supervised machine learning classifier for TNOs. Using a large and diverse training set as well as carefully chosen, dynamically motivated data features calculated from numerical integrations of TNO orbits, our classifier returns results that match those of a human classifier 98% of the time, and dynamically relevant classifications 99.7% of the time. This classifier is dramatically more efficient than human classification, and it will improve classification of both observed and modeled TNO data.

Updated: 2024-05-08 16:20:47

Categories: astro-ph.EP,astro-ph.IM,cs.LG

Download: http://arxiv.org/abs/2405.05185v1

Air Gap: Protecting Privacy-Conscious Conversational Agents

The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand. Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective.

Updated: 2024-05-08 16:12:45

Categories: cs.CR,cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.05175v1

A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems, and is attracting significant attention from both industry and academia. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception has the nature of multi-source input and the necessity for information fusion. However, the difference is that it captures vertical structures that are ignored by 2D BEV. In this survey, we review the most recent works on 3D occupancy perception, and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of the state-of-the-art on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this report will inspire the community and encourage more research work on 3D occupancy perception. A comprehensive list of studies in this survey is available in an active repository that continuously collects the latest work: https://github.com/HuaiyuanXu/3D-Occupancy-Perception.

Updated: 2024-05-08 16:10:46

Categories: cs.CV,cs.AI,cs.RO

Download: http://arxiv.org/abs/2405.05173v1

Custom Gradient Estimators are Straight-Through Estimators in Disguise

Quantization-aware training comes with a fundamental challenge: the derivative of quantization functions such as rounding are zero almost everywhere and nonexistent elsewhere. Various differentiable approximations of quantization functions have been proposed to address this issue. In this paper, we prove that when the learning rate is sufficiently small, a large class of weight gradient estimators is equivalent with the straight through estimator (STE). Specifically, after swapping in the STE and adjusting both the weight initialization and the learning rate in SGD, the model will train in almost exactly the same way as it did with the original gradient estimator. Moreover, we show that for adaptive learning rate algorithms like Adam, the same result can be seen without any modifications to the weight initialization and learning rate. We experimentally show that these results hold for both a small convolutional model trained on the MNIST dataset and for a ResNet50 model trained on ImageNet.
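
The straight-through estimator itself is a one-liner: the forward pass rounds, and the backward pass pretends the quantizer is the identity. A minimal numpy sketch of one SGD step on a toy quadratic loss:

```python
import numpy as np

def quantize_forward(w):
    """Round-to-nearest-integer quantizer; its true derivative is
    zero almost everywhere, so it cannot be trained naively."""
    return np.round(w)

def ste_backward(grad_output):
    """Straight-through estimator: treat the quantizer as the
    identity in the backward pass, passing the gradient through."""
    return grad_output

# One SGD step on the toy loss L(w) = 0.5 * (quantize(w) - t)**2.
w = np.array([0.6, -1.2])
t, lr = np.array([2.0, 0.0]), 0.1
q = quantize_forward(w)        # forward uses quantized weights
grad_q = q - t                 # dL/dq for the quadratic loss
grad_w = ste_backward(grad_q)  # dL/dw under the STE
w = w - lr * grad_w
```

The paper's result says that, for a sufficiently small learning rate, swapping a smooth custom gradient estimator for `ste_backward` (with adjusted initialization and learning rate under SGD) yields essentially the same training trajectory.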

Updated: 2024-05-08 16:07:56

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05171v1

Data-Error Scaling in Machine Learning on Natural Discrete Combinatorial Mutation-prone Sets: Case Studies on Peptides and Small Molecules

We investigate trends in the data-error scaling behavior of machine learning (ML) models trained on discrete combinatorial spaces that are prone-to-mutation, such as proteins or organic small molecules. We trained and evaluated kernel ridge regression machines using variable amounts of computationally generated training data. Our synthetic datasets comprise i) two naïve functions based on many-body theory; ii) binding energy estimates between a protein and a mutagenised peptide; and iii) solvation energies of two 6-heavy atom structural graphs. In contrast to typical data-error scaling, our results showed discontinuous monotonic phase transitions during learning, observed as rapid drops in the test error at particular thresholds of training data. We observed two learning regimes, which we call saturated and asymptotic decay, and found that they are conditioned by the level of complexity (i.e. number of mutations) enclosed in the training set. We show that during training on this class of problems, the predictions were clustered by the ML models employed in the calibration plots. Furthermore, we present an alternative strategy to normalize learning curves (LCs) and the concept of mutant based shuffling. This work has implications for machine learning on mutagenisable discrete spaces such as chemical properties or protein phenotype prediction, and improves basic understanding of concepts in statistical learning theory.

Updated: 2024-05-08 16:04:50

Categories: physics.chem-ph,cs.LG

Download: http://arxiv.org/abs/2405.05167v1

Reinforcement Learning from Diverse Human Preferences

The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new paradigm called reinforcement learning from human preferences (or preference-based RL) has emerged as a promising solution, in which reward functions are learned from human preference labels among behavior trajectories. However, existing methods for preference-based RL are limited by the need for accurate oracle preference labels. This paper addresses this limitation by developing a method for crowd-sourcing preference labels and learning from diverse human preferences. The key idea is to stabilize reward learning through regularization and correction in a latent space. To ensure temporal consistency, a strong constraint is imposed on the reward model that forces its latent space to be close to the prior distribution. Additionally, a confidence-based reward model ensembling method is designed to generate more stable and reliable predictions. The proposed method is tested on a variety of tasks in DMcontrol and Meta-world and has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback, paving the way for real-world applications of RL methods.
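
The backbone that such methods build on is the Bradley-Terry preference loss: a reward model scores two trajectories, and the loss pushes the preferred one's predicted return higher. This sketch shows only that standard backbone, not the paper's latent-space regularization or confidence-based ensembling; `preference_loss` is an illustrative name.

```python
import numpy as np

def preference_loss(r_hat_1, r_hat_2, label):
    """Bradley-Terry preference loss for reward learning:
    r_hat_i is the reward model's predicted return of trajectory i,
    and label = 1 if trajectory 1 was preferred by the human, else 0."""
    # Probability that trajectory 1 is preferred under the BT model.
    p1 = 1.0 / (1.0 + np.exp(r_hat_2 - r_hat_1))
    # Cross-entropy against the human preference label.
    return -(label * np.log(p1) + (1 - label) * np.log(1 - p1))
```

Minimizing this over crowd-sourced preference pairs fits the reward model; the paper's contribution is stabilizing that fit when the labels come from many people with diverse (and noisy) preferences.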

Updated: 2024-05-08 15:58:02

Categories: cs.LG

Download: http://arxiv.org/abs/2301.11774v3

Bayesian taut splines for estimating the number of modes

The number of modes in a probability density function is representative of the complexity of a model and can also be viewed as the number of subpopulations. Despite its relevance, there has been limited research in this area. A novel approach to estimating the number of modes in the univariate setting is presented, focusing on prediction accuracy and inspired by some overlooked aspects of the problem: the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view that blends local and global density properties. The technique combines flexible kernel estimators and parsimonious compositional splines in the Bayesian inference paradigm, providing soft solutions and incorporating expert judgment. The procedure includes feature exploration, model selection, and mode testing, illustrated in a sports analytics case study showcasing multiple companion visualisation tools. A thorough simulation study also demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, the new method emerges as a top-tier alternative, offering innovative solutions for analysts.
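
For contrast with the Bayesian spline approach, the naive frequentist baseline counts strict local maxima of a kernel density estimate on a grid; `count_modes` is an illustrative name and the bandwidth is assumed given.

```python
import numpy as np

def count_modes(data, bandwidth, grid_size=512):
    """Count modes of a Gaussian kernel density estimate by counting
    strict local maxima on a grid. A naive baseline: the answer is
    sensitive to the bandwidth, which is part of what motivates the
    Bayesian treatment described above."""
    grid = np.linspace(data.min() - 3 * bandwidth,
                       data.max() + 3 * bandwidth, grid_size)
    # Unnormalized Gaussian KDE evaluated on the grid.
    dens = np.exp(-0.5 * ((grid[:, None] - data[None, :]) / bandwidth) ** 2).sum(axis=1)
    # Interior grid points larger than both neighbours are local maxima.
    return int(np.sum((dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 0.5, 500), rng.normal(3, 0.5, 500)])
print(count_modes(x, bandwidth=0.5))  # two well-separated clusters -> 2
```

Shrinking the bandwidth inflates the count with spurious bumps while oversmoothing merges genuine modes, which is precisely the subjectivity the Bayesian formulation aims to quantify.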

Updated: 2024-05-08 15:56:13

Categories: stat.ME,cs.LG,math.ST,stat.ML,stat.TH,62G05 (Primary) 62G07, 62F15, 62C10, 62C86 (Secondary)

Download: http://arxiv.org/abs/2307.05825v3

Selective Classification Under Distribution Shifts

In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers -- imperfect whether due to the intrinsic statistical noise of the data, robustness issues of the classifier, or other causes -- in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research in SC, most previous SC methods still focus only on the ideal statistical setting, i.e., the data distribution at deployment is the same as that of training, although practical data can come from the wild. To bridge this gap, in this paper, we propose an SC framework that takes into account distribution shifts, termed generalized selective classification, that covers label-shifted (or out-of-distribution) and covariate-shifted samples, in addition to typical in-distribution samples, the first of its kind in the SC literature. We focus on non-training-based confidence-score functions for generalized SC on deep learning (DL) classifiers and propose two novel margin-based score functions. Through extensive analysis and experiments, we show that our proposed score functions are more effective and reliable than the existing ones for generalized SC on a variety of classification tasks and DL classifiers.
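
A generic margin-based selective mechanism can be sketched in a few lines; this shows the common top-1-minus-top-2 logit margin as the confidence score, not the paper's specific pair of score functions, and `selective_predict` is an illustrative name.

```python
import numpy as np

def selective_predict(logits, threshold):
    """Margin-based selective classification: predict the argmax class
    only when the gap between the top two logits exceeds `threshold`,
    otherwise abstain (encoded as -1). Requires no retraining of the
    underlying classifier -- only its logits."""
    sorted_logits = np.sort(logits, axis=1)
    margin = sorted_logits[:, -1] - sorted_logits[:, -2]  # top-1 minus top-2
    preds = logits.argmax(axis=1)
    preds[margin < threshold] = -1                        # abstain
    return preds

logits = np.array([[4.0, 1.0, 0.0],    # confident -> predict class 0
                   [2.0, 1.9, 0.0]])   # ambiguous -> abstain
print(selective_predict(logits, threshold=1.0))  # [0, -1]
```

Raising the threshold trades coverage for accuracy; the paper's question is which score function keeps that trade-off reliable when the deployment distribution has shifted.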

Updated: 2024-05-08 15:52:50

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2405.05160v1

Few-Shot Detection of Machine-Generated Text using Style Representations

The advent of instruction-tuned language models that convincingly mimic human writing poses a significant risk of abuse. However, such abuse may be counteracted with the ability to detect whether a piece of text was composed by a language model rather than a human author. Some previous approaches to this problem have relied on supervised methods by training on corpora of confirmed human- and machine- written documents. Unfortunately, model under-specification poses an unavoidable challenge for neural network-based detectors, making them brittle in the face of data shifts, such as the release of newer language models producing still more fluent text than the models used to train the detectors. Other approaches require access to the models that may have generated a document in question, which is often impractical. In light of these challenges, we pursue a fundamentally different approach not relying on samples from language models of concern at training time. Instead, we propose to leverage representations of writing style estimated from human-authored text. Indeed, we find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors, including state-of-the-art large language models like Llama-2, ChatGPT, and GPT-4. Furthermore, given a handful of examples composed by each of several specific language models of interest, our approach affords the ability to predict which model generated a given document. The code and data to reproduce our experiments are available at https://github.com/LLNL/LUAR/tree/main/fewshot_iclr2024.

Updated: 2024-05-08 15:50:40

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2401.06712v3

Analytical results for uncertainty propagation through trained machine learning regression models

Machine learning (ML) models are increasingly being used in metrology applications. However, for ML models to be credible in a metrology context they should be accompanied by principled uncertainty quantification. This paper addresses the challenge of uncertainty propagation through trained/fixed machine learning (ML) regression models. Analytical expressions for the mean and variance of the model output are obtained/presented for certain input data distributions and for a variety of ML models. Our results cover several popular ML models including linear regression, penalised linear regression, kernel ridge regression, Gaussian Processes (GPs), support vector machines (SVMs) and relevance vector machines (RVMs). We present numerical experiments in which we validate our methods and compare them with a Monte Carlo approach from a computational efficiency point of view. We also illustrate our methods in the context of a metrology application, namely modelling the state-of-health of lithium-ion cells based upon Electrical Impedance Spectroscopy (EIS) data.
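
For the simplest model in the list, linear regression, the propagation is exact and closed-form: with $y = w^\top x + b$ and $x \sim \mathcal{N}(\mu, \Sigma)$, we get $\mathbb{E}[y] = w^\top \mu + b$ and $\mathrm{Var}[y] = w^\top \Sigma\, w$. A minimal sketch (the kernel, GP, SVM and RVM cases in the paper need their own model-specific expressions):

```python
import numpy as np

def linear_output_moments(w, b, mu, Sigma):
    """Exact mean and variance of y = w @ x + b when x ~ N(mu, Sigma).
    For a linear model, no Monte Carlo sampling is needed:
      E[y]   = w @ mu + b
      Var[y] = w @ Sigma @ w
    """
    mean = w @ mu + b
    var = w @ Sigma @ w
    return mean, var

w = np.array([2.0, -1.0])
b = 0.5
mu = np.array([1.0, 3.0])
Sigma = np.array([[0.1, 0.02],
                  [0.02, 0.2]])
mean, var = linear_output_moments(w, b, mu, Sigma)
```

A Monte Carlo estimate over samples of $x$ converges to the same two moments, but the closed form is exact and essentially free, which is the computational-efficiency comparison the abstract refers to.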

Updated: 2024-05-08 15:50:31

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2404.11224v2

HEAL-SWIN: A Vision Transformer On The Sphere

High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead. We demonstrate the superior performance of our model on both synthetic and real automotive datasets, as well as a selection of other image datasets, for semantic segmentation, depth regression and classification tasks. Our code is publicly available at https://github.com/JanEGerken/HEAL-SWIN.

Updated: 2024-05-08 15:49:58

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2307.07313v2

PACIA: Parameter-Efficient Adapter for Few-Shot Molecular Property Prediction

Molecular property prediction (MPP) plays a crucial role in biomedical applications, but it often encounters challenges due to a scarcity of labeled data. Existing works commonly adopt gradient-based strategy to update a large amount of parameters for task-level adaptation. However, the increase of adaptive parameters can lead to overfitting and poor performance. Observing that graph neural network (GNN) performs well as both encoder and predictor, we propose PACIA, a parameter-efficient GNN adapter for few-shot MPP. We design a unified adapter to generate a few adaptive parameters to modulate the message passing process of GNN. We then adopt a hierarchical adaptation mechanism to adapt the encoder at task-level and the predictor at query-level by the unified GNN adapter. Extensive results show that PACIA obtains the state-of-the-art performance in few-shot MPP problems, and our proposed hierarchical adaptation mechanism is rational and effective.

Updated: 2024-05-08 15:49:54

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2310.00614v2

The Potential and Implications of Generative AI on HCI Education

Generative AI (GAI) is impacting teaching and learning directly or indirectly across a range of subjects and disciplines. As educators, we need to understand the potential and limitations of AI in HCI education and ensure our graduating HCI students are aware of the potential and limitations of AI in HCI. In this paper, we report on the main pedagogical insights gained from the inclusion of generative AI into a 10 week undergraduate module. We designed the module to encourage student experimentation with GAI models as part of the design brief requirement and planned practical sessions and discussions. Our insights are based on replies to a survey sent out to the students after completing the module. Our key findings, for HCI educators, report on the use of AI as a persona for developing project ideas and creating resources for design, and AI as a mirror for reflecting students' understanding of key concepts and ideas and highlighting knowledge gaps. We also discuss potential pitfalls that should be considered and the need to assess students' literacies and assumptions of GAIs as pedagogical tools. Finally, we put forward the case for educators to take the opportunities GAI presents as an educational tool and be experimental, creative, and courageous in their practice. We end with a discussion of our findings in relation to the TPACK framework in HCI.

Updated: 2024-05-08 15:46:31

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2405.05154v1

Hybrid Convolutional Neural Networks with Reliability Guarantee

Making AI safe and dependable requires the generation of dependable models and the dependable execution of those models. We propose redundant execution, a well-known technique, as a means to ensure reliable execution of the AI model. This generic technique extends the application scope of AI accelerators that do not feature well-documented safety or dependability properties. Typical redundancy techniques incur at least double or triple the computational expense of the original. We adopt a co-design approach that integrates reliable model execution with non-reliable execution, focusing the additional computational expense only where it is strictly necessary. We describe the design, implementation, and some preliminary results of a hybrid CNN.

Updated: 2024-05-08 15:39:38

Categories: cs.AI

Download: http://arxiv.org/abs/2405.05146v1

Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks

Due to its simplicity and efficiency, the first-order gradient method has been extensively employed in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that the first-order method is capable of attaining a global minimum when training over-parameterized neural networks, where the number of parameters is significantly larger than that of training instances. Momentum methods, including the heavy ball (HB) method and Nesterov's accelerated gradient (NAG) method, are the workhorses of first-order gradient methods owing to their accelerated convergence. In practice, NAG often exhibits superior performance to HB. However, current theoretical works fail to distinguish their convergence difference in training neural networks. To fill this gap, we consider the training problem of the two-layer ReLU neural network under over-parameterization and random initialization. Leveraging high-resolution dynamical systems and neural tangent kernel (NTK) theory, our result not only establishes tighter upper bounds on the convergence rate for both HB and NAG, but also provides the first theoretical guarantee for the acceleration of NAG over HB in training neural networks. Finally, we validate our theoretical results on three benchmark datasets.
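The two momentum methods the abstract compares differ in a single line: HB evaluates the gradient at the current iterate, while NAG evaluates it at a "look-ahead" point. A minimal numpy sketch on a toy ill-conditioned quadratic (illustrative step sizes, not the paper's over-parameterized NTK setting):

```python
import numpy as np

# Heavy ball (HB) vs Nesterov accelerated gradient (NAG) on
# f(x) = 0.5 * x @ A @ x with an ill-conditioned diagonal A.
A = np.diag([1.0, 50.0])
grad = lambda x: A @ x

def run(method: str, eta: float = 0.01, beta: float = 0.9, steps: int = 300):
    x, v = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(steps):
        # The only difference between the two updates:
        # NAG takes the gradient at the look-ahead point x + beta*v,
        # HB takes it at the current iterate x.
        g = grad(x + beta * v) if method == "nag" else grad(x)
        v = beta * v - eta * g
        x = x + v
    return float(x @ A @ x / 2)  # final loss

print(run("hb"), run("nag"))
```

On this quadratic, both iterations converge, and the look-ahead gradient gives NAG better damping of the stiff mode, which is the qualitative behavior the paper's high-resolution dynamical-systems analysis makes precise for two-layer ReLU networks.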

Updated: 2024-05-08 15:34:45

Categories: cs.LG,cs.AI,math.OC

Download: http://arxiv.org/abs/2208.03941v4

Bake off redux: a review and experimental evaluation of recent time series classification algorithms

In 2017, a research paper compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study, commonly referred to as a `bake off', identified that only nine algorithms performed significantly better than the Dynamic Time Warping (DTW) and Rotation Forest benchmarks that were used. The study categorised each algorithm by the type of feature they extract from time series data, forming a taxonomy of five main algorithm types. This categorisation of algorithms alongside the provision of code and accessible results for reproducibility has helped fuel an increase in popularity of the TSC field. Over six years have passed since this bake off, the UCR archive has expanded to 112 datasets and there have been a large number of new algorithms proposed. We revisit the bake off, seeing how each of the proposed categories have advanced since the original publication, and evaluate the performance of newer algorithms against the previous best-of-category using an expanded UCR archive. We extend the taxonomy to include three new categories to reflect recent developments. Alongside the originally proposed distance, interval, shapelet, dictionary and hybrid based algorithms, we compare newer convolution and feature based algorithms as well as deep learning approaches. We introduce 30 classification datasets either recently donated to the archive or reformatted to the TSC format, and use these to further evaluate the best performing algorithm from each category. Overall, we find that two recently proposed algorithms, Hydra+MultiROCKET and HIVE-COTEv2, perform significantly better than other approaches on both the current and new TSC problems.

Updated: 2024-05-08 15:33:11

Categories: cs.LG

Download: http://arxiv.org/abs/2304.13029v3

Ethical Implications of ChatGPT in Higher Education: A Scoping Review

This scoping review explores the ethical challenges of using ChatGPT in higher education. By reviewing recent academic articles in English, Chinese, and Japanese, we aimed to provide a deep dive review and identify gaps in the literature. Drawing on Arksey and O'Malley's (2005) scoping review framework, we defined search terms and identified relevant publications from four databases in the three target languages. The research results showed that the majority of the papers were discussion papers, but there was some early empirical work. The ethical issues highlighted in these works mainly concern academic integrity, assessment issues, and data protection. Given the rapid deployment of generative artificial intelligence, it is imperative for educators to conduct more empirical studies to develop sound ethical policies for its use.

Updated: 2024-05-08 15:24:17

Categories: cs.AI,cs.CY

Download: http://arxiv.org/abs/2311.14378v2

QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs

Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model (LLM), utilizes textual queries and multiple tables to generate query-dependent table summaries tailored to users' information needs. To facilitate research in this area, we present a comprehensive dataset specifically tailored for this task, consisting of 4909 query-summary pairs, each associated with multiple tables. Through extensive experiments using our curated dataset, we demonstrate the effectiveness of our proposed method compared to baseline approaches. Our findings offer insights into the challenges of complex table reasoning for precise summarization, contributing to the advancement of research in query-focused multi-table summarization.

Updated: 2024-05-08 15:05:55

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.05109v1

Unveiling Molecular Moieties through Hierarchical Graph Explainability

Background: Graph Neural Networks (GNNs) have emerged in recent years as a powerful tool for supporting in silico virtual screening (VS). In this work we present a GNN which uses graph convolutional architectures to achieve very accurate multi-target screening. We also devised a hierarchical Explainable Artificial Intelligence (XAI) technique to catch information directly at the atom, ring, and whole-molecule level by leveraging the message passing mechanism. In this way, we find the most relevant moieties involved in bioactivity prediction. Results: We report a state-of-the-art GNN classifier on twenty Cyclin-dependent Kinase targets in support of VS. Our classifier outperforms previous SOTA approaches proposed by the authors. Moreover, a CDK1-only high-sensitivity version of the GNN has been designed to use our explainer in order to avoid the inherent bias of multi-class models. The hierarchical explainer has been validated by an expert chemist on 19 approved drugs on CDK1. Our explainer provided information in accordance with the docking analysis for 17 out of the 19 test drugs. Conclusion: Our approach is a valid support for shortening both the screening and the hit-to-lead phases. Detailed knowledge about the molecular substructures that play a role in the inhibitory action can help the computational chemist to gain insights into the pharmacophoric function of the molecule, also for repurposing purposes. Scientific Contribution Statement: The core scientific innovation of our work is the use of a hierarchical XAI approach on a GNN trained for a ligand-based VS task. The application of the hierarchical explainer allows for eliciting also structural information...

Updated: 2024-05-08 15:04:37

Categories: q-bio.QM,cs.AI,cs.LG,q-bio.MN

Download: http://arxiv.org/abs/2402.01744v3

Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional neural networks

Popular artificial neural networks (ANNs) optimize parameters for unidirectional value propagation, assuming some guessed parametrization type such as the Multi-Layer Perceptron (MLP) or the Kolmogorov-Arnold Network (KAN). In contrast, for biological neurons, "it is not uncommon for axonal propagation of action potentials to happen in both directions" \cite{axon}, suggesting they are optimized to continuously operate in a multidirectional way. Additionally, the statistical dependencies a single neuron could model are not just (expected-)value dependencies, but entire joint distributions, including higher moments. Such an agnostic joint-distribution neuron would allow for multidirectional propagation (of distributions or values), e.g. $\rho(x|y,z)$ or $\rho(y,z|x)$, by substituting into $\rho(x,y,z)$ and normalizing. We discuss Hierarchical Correlation Reconstruction (HCR) for such a neuron model: assuming a $\rho(x,y,z)=\sum_{ijk} a_{ijk} f_i(x) f_j(y) f_k(z)$ parametrization of the joint distribution with a polynomial basis $f_i$, which allows for flexible, inexpensive processing including nonlinearities, direct model estimation and updates, trained through standard backpropagation or novel methods for such structures, up to tensor decomposition. Using only pairwise (input-output) dependencies, its expected-value prediction becomes KAN-like, with trained activation functions as polynomials; it can be extended by adding higher-order dependencies through the included products in a conscious, interpretable way, allowing for multidirectional propagation of both values and probability densities.
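The parametrization $\rho(x,y,z)=\sum_{ijk} a_{ijk} f_i(x) f_j(y) f_k(z)$ can be sketched concretely: with an orthonormal polynomial basis on $[0,1]$, each coefficient is estimated as a simple sample mean, and conditioning is literal substitution plus normalization. A minimal illustration with only the first two basis functions and synthetic data (not code from the paper):

```python
import numpy as np

# HCR sketch: joint density on [0,1]^3 as
#   rho(x,y,z) = sum_ijk a_ijk f_i(x) f_j(y) f_k(z)
# with the first two orthonormal (Legendre-like) basis functions on [0,1].
f = [lambda u: np.ones_like(u),                 # f_0 = 1
     lambda u: np.sqrt(3.0) * (2.0 * u - 1.0)]  # f_1, orthonormal on [0,1]

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 10_000)
y = 0.7 * x + 0.3 * rng.uniform(0, 1, x.size)   # correlated with x, in [0,1]
z = rng.uniform(0, 1, x.size)                   # independent of x

# Coefficient estimation is just a mean: a_ijk = E[f_i(x) f_j(y) f_k(z)]
a = np.array([[[np.mean(f[i](x) * f[j](y) * f[k](z))
                for k in range(2)] for j in range(2)] for i in range(2)])

print("x-y dependence a_110:", a[1, 1, 0])   # clearly nonzero
print("x-z dependence a_101:", a[1, 0, 1])   # near zero

# Multidirectional use: rho(x | y=0.9, z=0.5), up to normalization,
# by substituting y and z into the joint parametrization.
xs = np.linspace(0, 1, 5)
cond = sum(a[i, j, k] * f[i](xs) * f[j](np.array(0.9)) * f[k](np.array(0.5))
           for i in range(2) for j in range(2) for k in range(2))
print(cond)  # density rises with x, since y is large and correlated with x
```

The same substitution works in any direction ($\rho(y,z|x)$, etc.), which is what the abstract means by multidirectional propagation; the trained $a_{ijk}$ tensor plays the role of the neuron's parameters.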

Updated: 2024-05-08 14:49:27

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.05097v1

Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions

The deployment of large language models (LLMs) raises concerns regarding their cultural misalignment and potential ramifications on individuals and societies with diverse cultural backgrounds. While the discourse has focused mainly on political and social biases, our research proposes a Cultural Alignment Test (Hofstede's CAT) to quantify cultural alignment using Hofstede's cultural dimension framework, which offers an explanatory cross-cultural comparison through latent variable analysis. We apply our approach to quantitatively evaluate LLMs, namely Llama 2, GPT-3.5, and GPT-4, against the cultural dimensions of regions like the United States, China, and Arab countries, using different prompting styles and exploring the effects of language-specific fine-tuning on the models' behavioural tendencies and cultural values. Our results quantify the cultural alignment of LLMs and reveal the difference between LLMs in explanatory cultural dimensions. Our study demonstrates that while all LLMs struggle to grasp cultural values, GPT-4 shows a unique capability to adapt to cultural nuances, particularly in Chinese settings. However, it faces challenges with American and Arab cultures. The research also highlights that fine-tuning Llama 2 models with different languages changes their responses to cultural questions, emphasizing the need for culturally diverse development in AI for worldwide acceptance and ethical use. For more details or to contribute to this research, visit our GitHub page https://github.com/reemim/Hofstedes_CAT/

Updated: 2024-05-08 14:48:39

Categories: cs.CY,cs.CL,cs.LG

Download: http://arxiv.org/abs/2309.12342v2

Legally Binding but Unfair? Towards Assessing Fairness of Privacy Policies

Privacy policies are expected to inform data subjects about their data protection rights and should explain the data controller's data management practices. Privacy policies only fulfill their purpose if they are correctly interpreted, understood, and trusted by the data subject. This implies that a privacy policy is written in a fair way, e.g., it does not use polarizing terms, does not require a certain education, or does not assume a particular social background. We outline our approach to assessing fairness in privacy policies. Drawing on fundamental legal sources and fairness research, we identify how the dimensions of informational fairness, representational fairness, and ethics/morality relate to privacy policies. We propose options to automatically assess policies in these fairness dimensions, based on text statistics, linguistic methods, and artificial intelligence. We conduct initial experiments with German privacy policies to provide evidence that our approach is applicable. Our experiments indicate that there are issues in all three dimensions of fairness. This is important, as future privacy policies may be used in a corpus for legal artificial intelligence models.

Updated: 2024-05-08 14:47:39

Categories: cs.CY,cs.AI,cs.CL,K.4.m

Download: http://arxiv.org/abs/2403.08115v2

Unravelling Responsibility for AI

It is widely acknowledged that we need to establish where responsibility lies for the outputs and impacts of AI-enabled systems. But without a clear and precise understanding of what "responsibility" means, deliberations about where responsibility lies will be, at best, unfocused and incomplete and, at worst, misguided. To address this concern, this paper draws upon central distinctions in philosophy and law to clarify the concept of responsibility for AI for policymakers, practitioners, researchers and students from non-philosophical and non-legal backgrounds. Taking the three-part formulation "Actor A is responsible for Occurrence O," the paper unravels the concept of responsibility to clarify that there are different possibilities of who is responsible for AI, the senses in which they are responsible, and aspects of events they are responsible for. Criteria and conditions for fitting attributions of responsibility in the core senses (causal responsibility, role-responsibility, liability responsibility and moral responsibility) are articulated to promote an understanding of when responsibility attributions would be inappropriate or unjust. The analysis is presented with a graphical notation to facilitate informal diagrammatic reasoning and discussion about specific cases. It is illustrated by application to a scenario of a fatal collision between an autonomous AI-enabled ship and a traditional, crewed vessel at sea.

Updated: 2024-05-08 14:37:59

Categories: cs.AI,cs.CY,cs.RO

Download: http://arxiv.org/abs/2308.02608v2

Variational Self-Supervised Contrastive Learning Using Beta Divergence

Learning a discriminative semantic space using unlabelled and noisy data remains unaddressed in a multi-label setting. We present a contrastive self-supervised learning method which is robust to data noise, grounded in the domain of variational methods. The method (VCL) utilizes variational contrastive learning with beta-divergence to learn robustly from unlabelled datasets, including uncurated and noisy datasets. We demonstrate the effectiveness of the proposed method through rigorous experiments including linear evaluation and fine-tuning scenarios with multi-label datasets in the face understanding domain. In almost all tested scenarios, VCL surpasses the performance of state-of-the-art self-supervised methods, achieving a noteworthy increase in accuracy.

Updated: 2024-05-08 14:27:20

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2312.00824v3

Robust deep learning from weakly dependent data

Recent developments in deep learning have established some theoretical properties of deep neural network estimators. However, most of the existing works on this topic are restricted to bounded loss functions or (sub-)Gaussian or bounded input. This paper considers robust deep learning from weakly dependent observations, with unbounded loss function and unbounded input/output. It is only assumed that the output variable has a finite $r$-th order moment, with $r > 1$. Non-asymptotic bounds for the expected excess risk of the deep neural network estimator are established under strong mixing and $\psi$-weak dependence assumptions on the observations. We derive a relationship between these bounds and $r$, and when the data have moments of any order (that is, $r=\infty$), the convergence rate is close to some well-known results. When the target predictor belongs to the class of H\"older smooth functions with sufficiently large smoothness index, the rate of the expected excess risk for exponentially strongly mixing data is close to or the same as that obtained with i.i.d. samples. Applications to robust nonparametric regression and robust nonparametric autoregression are considered. The simulation study for models with heavy-tailed errors shows that robust estimators with absolute loss and Huber loss functions outperform the least squares method.
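The simulation finding at the end of the abstract can be illustrated on a toy location-estimation problem: with heavy-tailed errors, the least-squares estimator (the sample mean) is dragged around by outliers, while a Huber-loss M-estimator stays close to the true value. A minimal sketch with Student-t noise, not the paper's neural-network setup:

```python
import numpy as np

# Heavy-tailed sample: Student-t with 1.5 degrees of freedom has a finite
# mean (r > 1) but infinite variance, matching the paper's moment regime.
rng = np.random.default_rng(0)
true_loc = 2.0
sample = true_loc + rng.standard_t(df=1.5, size=5_000)

ls_estimate = sample.mean()  # minimizer of the squared loss

def huber_psi(r, delta=1.0):
    """Derivative of the Huber loss: linear near 0, clipped in the tails."""
    return np.clip(r, -delta, delta)

theta = np.median(sample)    # robust starting point
for _ in range(200):         # fixed-point iteration on the Huber objective
    theta += 0.5 * huber_psi(sample - theta).mean()

print(f"least squares: {ls_estimate:.3f}, Huber: {theta:.3f} (true 2.0)")
```

Because the Huber influence function is bounded, single extreme observations move the estimate by at most a fixed amount, which is the mechanism behind the robustness results for the absolute and Huber losses in the paper.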

Updated: 2024-05-08 14:25:40

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2405.05081v1

Concerns on Bias in Large Language Models when Creating Synthetic Personae

This position paper explores the benefits, drawbacks, and ethical considerations of incorporating synthetic personae in HCI research, particularly focusing on the customization challenges beyond the limitations of current Large Language Models (LLMs). These perspectives are derived from the initial results of a sub-study employing vignettes to showcase the existence of bias within black-box LLMs and explore methods for manipulating them. The study aims to establish a foundation for understanding the challenges associated with these models, emphasizing the necessity of thorough testing before utilizing them to create synthetic personae for HCI research.

Updated: 2024-05-08 14:24:11

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2405.05080v1

Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations

This work studies sparse adversarial perturbations bounded by $l_0$ norm. We propose a white-box PGD-like attack method named sparse-PGD to effectively and efficiently generate such perturbations. Furthermore, we combine sparse-PGD with a black-box attack to comprehensively and more reliably evaluate the models' robustness against $l_0$ bounded adversarial perturbations. Moreover, the efficiency of sparse-PGD enables us to conduct adversarial training to build robust models against sparse perturbations. Extensive experiments demonstrate that our proposed attack algorithm exhibits strong performance in different scenarios. More importantly, compared with other robust models, our adversarially trained model demonstrates state-of-the-art robustness against various sparse attacks. Codes are available at https://github.com/CityU-MLO/sPGD.
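The core of a PGD-like attack under an $l_0$ budget is a projection step: take a gradient-ascent step on the loss, then keep only the k largest-magnitude entries of the perturbation (plus a per-coordinate box constraint). A toy sketch on a linear "model" to show the mechanics; the paper's sparse-PGD is considerably more elaborate (see the linked repository):

```python
import numpy as np

def project_l0(delta, k, eps=0.5):
    """Project onto {delta : ||delta||_0 <= k, |delta_i| <= eps}."""
    out = np.zeros_like(delta)
    idx = np.argsort(np.abs(delta))[-k:]       # k largest-magnitude entries
    out[idx] = np.clip(delta[idx], -eps, eps)  # box constraint per coordinate
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=100)                # toy linear model: score = w @ x
x = rng.normal(size=100)
y = 1.0 if w @ x > 0 else -1.0          # label matching the clean prediction

delta = np.zeros_like(x)
for _ in range(50):
    grad = -y * w                       # gradient of the margin loss -y*w@(x+delta)
    delta = project_l0(delta + 0.05 * grad, k=10)

print("nonzero coordinates:", np.count_nonzero(delta))  # at most 10
print("margin before:", y * (w @ x), "after:", y * (w @ (x + delta)))
```

The attack concentrates its budget on the k most influential coordinates and strictly reduces the classification margin while perturbing at most 10 of the 100 inputs; for deep networks the gradient step is taken through backpropagation instead of the closed-form expression used here.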

Updated: 2024-05-08 14:18:13

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05075v1

Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology

Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause serious harm, even death to patients. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images to reduce risks of sub-optimally or critically placed NGTs being missed or delayed in their detection, but gaps remain in clinical practice integration. In this study, we present a human-centered approach to the problem and describe insights derived following contextual inquiry and in-depth interviews with 15 clinical stakeholders. The interviews helped understand challenges in existing workflows, and how best to align technical capabilities with user needs and expectations. We discovered the trade-offs and complexities that need consideration when choosing suitable workflow stages, target users, and design configurations for different AI proposals. We explored how to balance AI benefits and risks for healthcare staff and patients within broader organizational and medical-legal constraints. We also identified data issues related to edge cases and data biases that affect model training and evaluation; how data documentation practices influence data preparation and labelling; and how to measure relevant AI outcomes reliably in future evaluations. We discuss how our work informs design and development of AI applications that are clinically useful, ethical, and acceptable in real-world healthcare services.

Updated: 2024-05-08 14:16:22

Categories: cs.HC,cs.AI,H.5.m; I.2.m

Download: http://arxiv.org/abs/2405.05299v1

Physics-Enhanced Machine Learning: a position paper for dynamical systems investigations

This position paper takes a broad look at Physics-Enhanced Machine Learning (PEML) -- also known as Scientific Machine Learning -- with particular focus on those PEML strategies developed to tackle dynamical systems' challenges. The need to go beyond Machine Learning (ML) strategies is driven by: (i) the limited volume of informative data; (ii) avoiding accurate-but-wrong predictions; (iii) dealing with uncertainties; and (iv) providing explainable and interpretable inferences. A general definition of PEML is provided by considering four physics and domain knowledge biases, and three broad groups of PEML approaches are discussed: physics-guided, physics-encoded and physics-informed. The advantages and challenges of developing PEML strategies for guiding high-consequence decision making in engineering applications involving complex dynamical systems are presented.

Updated: 2024-05-08 14:15:51

领域: cs.LG,cs.CE

下载: http://arxiv.org/abs/2405.05987v1

Novel Actor-Critic Algorithm for Robust Decision Making of CAV under Delays and Loss of V2X Data

Current autonomous driving systems heavily rely on V2X communication data to enhance situational awareness and cooperation between vehicles. However, a major challenge when using V2X data is that it may not be available periodically because of unpredictable delays and data loss during wireless transmission between road stations and the receiving vehicle. This issue should be considered when designing control strategies for connected and autonomous vehicles. Therefore, this paper proposes a novel 'Blind Actor-Critic' algorithm that guarantees robust driving performance in V2X environments with delayed and/or lost data. The algorithm incorporates three key mechanisms: a virtual fixed sampling period, a combination of Temporal-Difference and Monte Carlo learning, and a numerical approximation of immediate reward values. To address the temporal aperiodicity of V2X data, we first illustrate this challenge. Then, we provide a detailed explanation of the Blind Actor-Critic algorithm, highlighting the components proposed to compensate for the temporal aperiodicity of V2X data. We evaluate the performance of our algorithm in a simulation environment and compare it to benchmark approaches. The results demonstrate that training metrics are improved compared to conventional actor-critic algorithms. Additionally, testing results show that our approach provides robust control, even under low V2X network reliability levels.

Updated: 2024-05-08 14:14:03

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.05072v1

Particle density and critical point for studying site percolation by finite size scaling

Machine learning has recently achieved remarkable success in studying phase transitions. It is generally believed that the latent variables of unsupervised learning can capture the information related to phase transitions, which is usually conveyed through the so-called order parameter. In most models, for instance the Ising model, the order parameter is simply the particle number density. Percolation, however, the simplest model that can generate a phase transition, has an order parameter that is not the particle number density. In this paper, we use unsupervised learning to study the relationship between particle number density, the critical point, and latent variables in the site percolation model. We find that if the input of learning is the original configuration, the output of unsupervised learning does not convey any information related to the phase transition. Therefore, the maximum cluster is employed instead, in order to effectively capture the critical point of the model. Unsupervised learning then yields reliable results consistent with Monte Carlo simulations. We also propose a method called Fake Finite Size Scaling (FFSS) to calculate the critical value, which substantially improves the fitting accuracy.
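
The maximum-cluster preprocessing step is easy to reproduce. The sketch below assumes nothing from the paper beyond the standard site-percolation setup (each site occupied independently with probability p, clusters defined by 4-connectivity) and extracts the largest cluster size with a breadth-first search; `max_cluster_size` and the 20x20 grid are illustrative choices, not the authors' code.

```python
import random
from collections import deque

def max_cluster_size(grid):
    """Return the size of the largest 4-connected cluster of occupied sites."""
    n, m = len(grid), len(grid[0])
    seen = [[False] * m for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(m):
            if grid[i][j] and not seen[i][j]:
                # Breadth-first search over this cluster.
                size, q = 0, deque([(i, j)])
                seen[i][j] = True
                while q:
                    x, y = q.popleft()
                    size += 1
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < n and 0 <= ny < m and grid[nx][ny] and not seen[nx][ny]:
                            seen[nx][ny] = True
                            q.append((nx, ny))
                best = max(best, size)
    return best

random.seed(0)
p, n = 0.8, 20  # occupation probability well above the 2D site threshold (~0.593)
grid = [[random.random() < p for _ in range(n)] for _ in range(n)]
largest = max_cluster_size(grid)
```

Above the threshold a giant cluster spans most occupied sites, which is exactly the structure the unsupervised learner can latch onto.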

Updated: 2024-05-08 14:13:14

Categories: cond-mat.stat-mech,cs.LG

Download: http://arxiv.org/abs/2311.14725v2

Causal Flow-based Variational Auto-Encoder for Disentangled Causal Representation Learning

Disentangled representation learning aims to learn low-dimensional representations of data, where each dimension corresponds to an underlying generative factor. Currently, Variational Auto-Encoders (VAEs) are widely used for disentangled representation learning, with the majority of methods assuming independence among generative factors. However, in real-world scenarios, generative factors typically exhibit complex causal relationships. We thus design a new VAE-based framework named Disentangled Causal Variational Auto-Encoder (DCVAE), which includes a variant of autoregressive flows known as causal flows, capable of learning effective causal disentangled representations. We provide a theoretical analysis of the disentanglement identifiability of DCVAE, ensuring that our model can effectively learn causal disentangled representations. The performance of DCVAE is evaluated on both synthetic and real-world datasets, demonstrating its outstanding capability in achieving causal disentanglement and performing intervention experiments. Moreover, DCVAE exhibits remarkable performance on downstream tasks and has the potential to learn the true causal structure among factors.
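
To make the autoregressive-flow building block concrete, here is a minimal sketch of an affine autoregressive transform and its dimension-by-dimension inverse. The linear conditioners `W` and `S` are hypothetical stand-ins for the learned networks; causal flows restrict which earlier dimensions each conditioner may read, following a causal graph.

```python
import math

def flow_forward(x, W, S):
    """Affine autoregressive transform: z_i = (x_i - mu_i(x_<i)) * exp(-s_i(x_<i)).
    Here mu and s are toy linear functions of the preceding dimensions."""
    z = []
    for i, xi in enumerate(x):
        mu = sum(w * xj for w, xj in zip(W[i], x[:i]))
        s = sum(si * xj for si, xj in zip(S[i], x[:i]))
        z.append((xi - mu) * math.exp(-s))
    return z

def flow_inverse(z, W, S):
    """Invert dimension by dimension: the triangular (autoregressive)
    structure means x_<i is already known when recovering x_i."""
    x = []
    for i, zi in enumerate(z):
        mu = sum(w * xj for w, xj in zip(W[i], x))
        s = sum(si * xj for si, xj in zip(S[i], x))
        x.append(zi * math.exp(s) + mu)
    return x

# Lower-triangular "causal" weights: dim 1 depends on dim 0, dim 2 on dims 0 and 1.
W = [[], [0.5], [0.3, -0.2]]
S = [[], [0.1], [-0.1, 0.2]]
x = [0.7, -1.2, 0.4]
z = flow_forward(x, W, S)
```

The triangular structure is what keeps both the Jacobian and the inverse tractable, which is why autoregressive flows suit this setting.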

Updated: 2024-05-08 14:12:47

Categories: cs.LG,stat.ME

Download: http://arxiv.org/abs/2304.09010v4

Designing Skill-Compatible AI: Methodologies and Frameworks in Chess

Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achieving superhuman performance alone is not sufficient; they also need to account for suboptimal actions or idiosyncratic style from their less-skilled counterparts. We propose a formal evaluation framework for assessing the compatibility of near-optimal AI with interaction partners who may have much lower levels of skill; we use popular collaborative chess variants as model systems to study and develop AI agents that can successfully interact with lower-skill entities. Traditional chess engines designed to output near-optimal moves prove to be inadequate partners when paired with engines of various lower skill levels in this domain, as they are not designed to consider the presence of other agents. We contribute three methodologies to explicitly create skill-compatible AI agents in complex decision-making settings, and two chess game frameworks designed to foster collaboration between powerful AI agents and less-skilled partners. On these frameworks, our agents outperform state-of-the-art chess AI (based on AlphaZero) despite being weaker in conventional chess, demonstrating that skill-compatibility is a tangible trait that is qualitatively and measurably distinct from raw performance. Our evaluations further explore and clarify the mechanisms by which our agents achieve skill-compatibility.

Updated: 2024-05-08 14:04:35

Categories: cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2405.05066v1

Scalable Mechanism Design for Multi-Agent Path Finding

Multi-Agent Path Finding (MAPF) involves determining paths for multiple agents to travel simultaneously and collision-free through a shared area toward given goal locations. This problem is computationally complex, especially when dealing with large numbers of agents, as is common in realistic applications like autonomous vehicle coordination. Finding an optimal solution is often computationally infeasible, making the use of approximate, suboptimal algorithms essential. Adding to the complexity, agents might act in a self-interested and strategic way, possibly misrepresenting their goals to the MAPF algorithm if it benefits them. Although the field of mechanism design offers tools to align incentives, using these tools without careful consideration can fail when only having access to approximately optimal outcomes. In this work, we introduce the problem of scalable mechanism design for MAPF and propose three strategyproof mechanisms, two of which even use approximate MAPF algorithms. We test our mechanisms on realistic MAPF domains with problem sizes ranging from dozens to hundreds of agents. We find that they improve welfare beyond a simple baseline.

Updated: 2024-05-08 14:03:20

Categories: cs.AI,cs.GT,cs.MA

Download: http://arxiv.org/abs/2401.17044v2

ADELT: Transpilation Between Deep Learning Frameworks

We propose the Adversarial DEep Learning Transpiler (ADELT), a novel approach to source-to-source transpilation between deep learning frameworks. ADELT uniquely decouples code skeleton transpilation and API keyword mapping. For code skeleton transpilation, it uses few-shot prompting on large language models (LLMs), while for API keyword mapping, it uses contextual embeddings from a code-specific BERT. These embeddings are trained in a domain-adversarial setup to generate a keyword translation dictionary. ADELT is trained on an unlabeled web-crawled deep learning corpus, without relying on any hand-crafted rules or parallel data. It outperforms state-of-the-art transpilers, improving pass@1 rate by 17.4 pts and 15.0 pts for PyTorch-Keras and PyTorch-MXNet transpilation pairs respectively. We provide open access to our code at https://github.com/gonglinyuan/adelt.

Updated: 2024-05-08 13:51:44

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2303.03593v3

Rethinking recidivism through a causal lens

Predictive modeling of criminal recidivism, or whether people will re-offend in the future, has a long and contentious history. Modern causal inference methods allow us to move beyond prediction and target the "treatment effect" of a specific intervention on an outcome in an observational dataset. In this paper, we look specifically at the effect of incarceration (prison time) on recidivism, using a well-known dataset from North Carolina. Two popular causal methods for addressing confounding bias are explained and demonstrated: directed acyclic graph (DAG) adjustment and double machine learning (DML), including a sensitivity analysis for unobserved confounders. We find that incarceration has a detrimental effect on recidivism, i.e., longer prison sentences make it more likely that individuals will re-offend after release, although this conclusion should not be generalized beyond the scope of our data. We hope that this case study can inform future applications of causal inference to criminal justice analysis.
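
The partialling-out idea behind DML can be sketched in a few lines. The toy below uses hypothetical variable names and plain linear regressions standing in for the ML nuisance learners (and omits cross-fitting): residualize both the treatment and the outcome on the confounder, then regress residual on residual.

```python
import random

def ols_slope_intercept(x, y):
    """Simple least-squares fit y ~ a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    a = cov / var
    return a, my - a * mx

def partialling_out_effect(x, t, y):
    """Estimate the effect of t on y after removing the confounder x from
    both (the core idea behind double machine learning; here the nuisance
    models are linear regressions rather than flexible ML learners)."""
    at, bt = ols_slope_intercept(x, t)
    ay, by = ols_slope_intercept(x, y)
    t_res = [ti - (at * xi + bt) for xi, ti in zip(x, t)]
    y_res = [yi - (ay * xi + by) for xi, yi in zip(x, y)]
    theta, _ = ols_slope_intercept(t_res, y_res)
    return theta

# Synthetic data: confounder x drives both treatment t and outcome y;
# the true causal effect of t on y is 2.0 by construction.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(2000)]
t = [1.5 * xi + random.gauss(0, 1) for xi in x]
y = [2.0 * ti - 3.0 * xi + random.gauss(0, 1) for ti, xi in zip(t, x)]
theta = partialling_out_effect(x, t, y)
```

A naive regression of y on t alone is badly biased on this data, which is the confounding problem the partialling-out step removes.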

Updated: 2024-05-08 13:48:23

Categories: cs.LG,cs.CY,stat.AP

Download: http://arxiv.org/abs/2011.11483v4

StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables

StepMix is an open-source Python package for the pseudo-likelihood estimation (one-, two- and three-step approaches) of generalized finite mixture models (latent profile and latent class analysis) with external variables (covariates and distal outcomes). In many applications in social sciences, the main objective is not only to cluster individuals into latent classes, but also to use these classes to develop more complex statistical models. These models generally divide into a measurement model that relates the latent classes to observed indicators, and a structural model that relates covariates and outcome variables to the latent classes. The measurement and structural models can be estimated jointly using the so-called one-step approach or sequentially using stepwise methods, which present significant advantages for practitioners regarding the interpretability of the estimated latent classes. In addition to the one-step approach, StepMix implements the most important stepwise estimation methods from the literature, including the bias-adjusted three-step methods with Bolck-Croon-Hagenaars and maximum likelihood corrections and the more recent two-step approach. These pseudo-likelihood estimators are presented in this paper under a unified framework as specific expectation-maximization subroutines. To facilitate and promote their adoption among the data science community, StepMix follows the object-oriented design of the scikit-learn library and provides an additional R wrapper.

Updated: 2024-05-08 13:22:34

Categories: stat.ME,cs.LG,stat.ML

Download: http://arxiv.org/abs/2304.03853v5

Gröbner Basis Cryptanalysis of Ciminion and Hydra

Ciminion and Hydra are two recently introduced symmetric key Pseudo-Random Functions for Multi-Party Computation applications. For efficiency both primitives utilize quadratic permutations at round level. Therefore, polynomial system solving-based attacks pose a serious threat to these primitives. For Ciminion we construct a quadratic degree reverse lexicographic (DRL) Gröbner basis for the iterated polynomial model via affine transformations. For Hydra we provide a computer-aided proof in SageMath that a quadratic DRL Gröbner basis is already contained within the iterated polynomial system for the Hydra heads after affine transformations and a linear change of coordinates. Our Ciminion DRL Gröbner basis simplifies cryptanalysis, since one does not need to impose genericity assumptions, like being regular or semi-regular, anymore to derive complexity estimates on key recovery attacks. In the Hydra proposal it was claimed that $r_\mathcal{H} = 31$ rounds for the heads are sufficient to achieve $128$ bits of security against Gröbner basis attacks for key recovery. However, for $r_\mathcal{H} = 31$ standard term order conversion to a lexicographic (LEX) Gröbner basis for our Hydra DRL Gröbner basis requires just $126$ bits. Moreover, via the Eigenvalue Method up to $r_\mathcal{H} = 33$ rounds can be attacked below $128$ bits.

Updated: 2024-05-08 13:14:04

Categories: cs.CR

Download: http://arxiv.org/abs/2405.05040v1

Multi-fidelity Hamiltonian Monte Carlo

Numerous applications in biology, statistics, science, and engineering require generating samples from high-dimensional probability distributions. In recent years, the Hamiltonian Monte Carlo (HMC) method has emerged as a state-of-the-art Markov chain Monte Carlo technique, exploiting the shape of such high-dimensional target distributions to efficiently generate samples. Despite its impressive empirical success and increasing popularity, its wide-scale adoption remains limited due to the high computational cost of gradient calculation. Moreover, applying this method is impossible when the gradient of the posterior cannot be computed (for example, with black-box simulators). To overcome these challenges, we propose a novel two-stage Hamiltonian Monte Carlo algorithm with a surrogate model. In this multi-fidelity algorithm, the acceptance probability is computed in the first stage via a standard HMC proposal using an inexpensive differentiable surrogate model, and if the proposal is accepted, the posterior is evaluated in the second stage using the high-fidelity (HF) numerical solver. Splitting the standard HMC algorithm into these two stages allows for approximating the gradient of the posterior efficiently, while producing accurate posterior samples by using HF numerical solvers in the second stage. We demonstrate the effectiveness of this algorithm for a range of problems, including linear and nonlinear Bayesian inverse problems with in-silico data and experimental data. The proposed algorithm is shown to seamlessly integrate with various low-fidelity and HF models, priors, and datasets. Remarkably, our proposed method outperforms the traditional HMC algorithm in both computational and statistical efficiency by several orders of magnitude, all while retaining or improving the accuracy in computed posterior statistics.
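
The two-stage accept/reject structure can be illustrated with a random-walk proposal standing in for HMC. In the sketch below (toy Gaussian densities, not the paper's numerical solvers), the cheap surrogate screens proposals in stage one, the "high-fidelity" density is only evaluated for proposals that survive, and the stage-two correction keeps the chain targeting p exactly.

```python
import math, random

def delayed_acceptance_mh(log_p, log_p_surr, x0, steps, step_size):
    """Two-stage Metropolis: stage 1 accepts/rejects with a cheap surrogate;
    stage 2 corrects with the expensive density so the chain still targets p.
    (Illustrates only the two-stage idea; the paper uses HMC proposals.)"""
    x, samples, hf_evals = x0, [], 0
    lp, lps = log_p(x0), log_p_surr(x0)
    for _ in range(steps):
        y = x + random.gauss(0, step_size)
        lps_y = log_p_surr(y)
        # Stage 1: screen with the surrogate (symmetric proposal).
        if math.log(random.random()) < lps_y - lps:
            hf_evals += 1
            lp_y = log_p(y)
            # Stage 2: correct with the high-fidelity density.
            if math.log(random.random()) < (lp_y - lp) - (lps_y - lps):
                x, lp, lps = y, lp_y, lps_y
        samples.append(x)
    return samples, hf_evals

random.seed(2)
log_p = lambda x: -0.5 * x * x                  # "high-fidelity" target: N(0, 1)
log_p_surr = lambda x: -0.5 * x * x / 1.5 ** 2  # cheap surrogate: N(0, 1.5^2)
samples, hf_evals = delayed_acceptance_mh(log_p, log_p_surr, 0.0, 20000, 1.0)
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean ** 2
```

Stage-one rejections never touch the expensive density, which is where the computational savings come from; the stage-two ratio restores detailed balance with respect to p.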

Updated: 2024-05-08 13:03:55

Categories: cs.CE,cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.05033v1

Scalable Decentralized Algorithms for Online Personalized Mean Estimation

In numerous settings, agents lack sufficient data to directly learn a model. Collaborating with other agents may help, but it introduces a bias-variance trade-off, when local data distributions differ. A key challenge is for each agent to identify clients with similar distributions while learning the model, a problem that remains largely unresolved. This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean. Existing algorithms face impractical space and time complexities (quadratic in the number of agents A). To address scalability challenges, we propose a framework where agents self-organize into a graph, allowing each agent to communicate with only a selected number of peers r. We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach, with complexities of O(r|A| log |A|) and O(r|A|), respectively. We establish conditions under which both algorithms yield asymptotically optimal estimates and offer a theoretical characterization of their performance.
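
The consensus-based variant can be illustrated with a synchronous gossip sketch: agents on a ring repeatedly average with r = 2 peers, and every local estimate converges to the network mean. This is a generic consensus iteration for intuition, not the paper's algorithm (which interleaves averaging with ongoing sampling and peer selection).

```python
def gossip_round(values, neighbors):
    """One synchronous consensus round: each agent moves to the average of
    itself and its neighbors (a doubly stochastic mixing step on the graph)."""
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]

# Ring of 8 agents, each talking to r = 2 peers; local estimates start far apart.
n = 8
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
values = [float(i) for i in range(n)]  # true network mean is 3.5
for _ in range(200):
    values = gossip_round(values, neighbors)
```

Because every agent touches only r neighbors per round, the per-round communication is O(r|A|), which is the scalability property the abstract emphasizes.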

Updated: 2024-05-08 12:59:45

Categories: cs.LG,cs.DC

Download: http://arxiv.org/abs/2402.12812v3

StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer

We present StyleMamba, an efficient image style transfer framework that translates text prompts into corresponding visual styles while preserving the content integrity of the original images. Existing text-guided stylization requires hundreds of training iterations and takes a lot of computing resources. To speed up the process, we propose a conditional State Space Model for Efficient Text-driven Image Style Transfer, dubbed StyleMamba, that sequentially aligns the image features to the target text prompts. To enhance the local and global style consistency between text and image, we propose masked and second-order directional losses to optimize the stylization direction to significantly reduce the training iterations by 5 times and the inference time by 3 times. Extensive experiments and qualitative evaluation confirm the robust and superior stylization performance of our methods compared to the existing baselines.

Updated: 2024-05-08 12:57:53

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.05027v1

Learning Structural Causal Models through Deep Generative Models: Methods, Guarantees, and Challenges

This paper provides a comprehensive review of deep structural causal models (DSCMs), particularly focusing on their ability to answer counterfactual queries using observational data within known causal structures. It delves into the characteristics of DSCMs by analyzing the hypotheses, guarantees, and applications inherent to the underlying deep learning components and structural causal models, fostering a finer understanding of their capabilities and limitations in addressing different counterfactual queries. Furthermore, it highlights the challenges and open questions in the field of deep structural causal modeling. It sets the stage for researchers to identify future work directions and for practitioners to get an overview and find the most appropriate methods for their needs.

Updated: 2024-05-08 12:56:33

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2405.05025v1

DKINet: Medication Recommendation via Domain Knowledge Informed Deep Learning

Medication recommendation is a fundamental yet crucial branch of healthcare that presents opportunities to assist physicians in making more accurate medication prescriptions for patients with complex health conditions. Previous studies have primarily focused on learning patient representation from electronic health records (EHR). While considering the clinical manifestations of the patient is important, incorporating domain-specific prior knowledge is equally significant in diagnosing the patient's health conditions. However, effectively integrating domain knowledge with the patient's clinical manifestations can be challenging, particularly when dealing with complex clinical manifestations. Therefore, in this paper, we first identify comprehensive domain-specific prior knowledge, namely the Unified Medical Language System (UMLS), which is a comprehensive repository of biomedical vocabularies and standards, for knowledge extraction. Subsequently, we propose a knowledge injection module that addresses the effective integration of domain knowledge with complex clinical manifestations, enabling an effective characterization of the health conditions of the patient. Furthermore, considering the significant impact of a patient's medication history on their current medication, we introduce a historical medication-aware patient representation module to capture the longitudinal influence of historical medication information on the representation of current patients. Extensive experiments on three publicly benchmark datasets verify the superiority of our proposed method, which outperformed other methods by a significant margin. The code is available at: https://github.com/sherry6247/DKINet.

Updated: 2024-05-08 12:49:20

Categories: cs.AI,cs.IR,cs.LG

Download: http://arxiv.org/abs/2305.19604v4

HackCar: a test platform for attacks and defenses on a cost-contained automotive architecture

In this paper, we introduce the design of HackCar, a testing platform for replicating attacks and defenses on a generic automotive system without requiring access to a complete vehicle. This platform empowers security researchers to illustrate the consequences of attacks targeting an automotive system on a realistic platform, facilitating the development and testing of security countermeasures against both existing and novel attacks. The HackCar platform is built upon an F1-10th model, to which various automotive-grade microcontrollers are connected through automotive communication protocols. This solution is crafted to be entirely modular, allowing for the creation of diverse test scenarios. Researchers and practitioners can thus develop innovative security solutions while adhering to the constraints of automotive-grade microcontrollers. We showcase our design by comparing it with a real, licensed, and unmodified vehicle. Additionally, we analyze the behavior of the HackCar in both an attack-free scenario and a scenario where an attack on in-vehicle communication is deployed.

Updated: 2024-05-08 12:48:01

Categories: cs.CR

Download: http://arxiv.org/abs/2405.05023v1

Adversarial Threats to Automatic Modulation Open Set Recognition in Wireless Networks

Automatic Modulation Open Set Recognition (AMOSR) is a crucial technological approach for cognitive radio communications, wireless spectrum management, and interference monitoring within wireless networks. Numerous studies have shown that automatic modulation recognition (AMR) is highly susceptible to minimal perturbations carefully designed by malicious attackers, leading to misclassification of signals. However, the adversarial security issue of AMOSR has not yet been explored. This paper adopts the perspective of attackers and proposes an Open Set Adversarial Attack (OSAttack), aiming at investigating the adversarial vulnerabilities of various AMOSR methods. Initially, an adversarial threat model for AMOSR scenarios is established. Subsequently, by analyzing the decision criteria of both discriminative and generative open set recognition, OSFGSM and OSPGD are proposed to reduce the performance of AMOSR. Finally, the influence of OSAttack on AMOSR is evaluated utilizing a range of qualitative and quantitative indicators. The results indicate that despite the increased resistance of AMOSR models to conventional interference signals, they remain vulnerable to attacks by adversarial examples.
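
OSFGSM builds on the classic Fast Gradient Sign Method. As background, here is a minimal FGSM sketch on a hypothetical linear (logistic) classifier; the weights and signal values are illustrative and unrelated to the paper's AMOSR models.

```python
import math

def logistic_loss(w, x, y):
    """Binary cross-entropy for label y in {-1, +1} under a linear model w.x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log(1.0 + math.exp(-margin))

def fgsm(w, x, y, eps):
    """Fast Gradient Sign Method: perturb x by eps in the sign of the input
    gradient of the loss (the building block that OSFGSM adapts to open-set
    decision criteria)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    s = -y / (1.0 + math.exp(margin))   # d loss / d(w.x), so d loss / d x_i = s * w_i
    grad_x = [s * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_x)]

w = [1.0, -2.0, 0.5]        # toy "trained" linear classifier (assumed)
x, y = [0.8, -0.3, 1.1], 1  # a correctly classified clean signal
x_adv = fgsm(w, x, y, eps=0.2)
```

The perturbation is bounded by eps per feature yet strictly increases the loss, which is why such attacks are hard to notice at the signal level.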

Updated: 2024-05-08 12:46:18

Categories: cs.CR,cs.SI

Download: http://arxiv.org/abs/2405.05022v1

Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning

Training deep neural networks is a challenging task. In order to speed up training and enhance the performance of deep neural networks, we rectify the vanilla conjugate gradient into a conjugate-gradient-like direction and incorporate it into generic Adam, thus proposing a new optimization algorithm named CG-like-Adam for deep learning. Specifically, both the first-order and the second-order moment estimates of generic Adam are replaced by the conjugate-gradient-like direction. The convergence analysis handles the cases where the exponential moving average coefficient of the first-order moment estimation is constant and the first-order moment estimation is unbiased. Numerical experiments on the CIFAR10/100 datasets show the superiority of the proposed algorithm.
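
A minimal sketch of the idea, under the assumption (ours, not the paper's exact update) that the raw gradient fed to Adam's moment estimates is replaced by a Fletcher-Reeves-style conjugate-gradient-like direction, shown on a 1-D quadratic:

```python
def cg_like_adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam where the gradient fed to the moment estimates is replaced by a
    conjugate-gradient-like direction d_t = g_t + beta_fr * d_{t-1}.
    (Sketch of the idea only; the paper's exact update and beta differ.)"""
    x, m, v, d, g_prev_sq = x0, 0.0, 0.0, 0.0, None
    for t in range(1, steps + 1):
        g = grad(x)
        g_sq = g * g
        # Fletcher-Reeves-style coefficient, clipped to keep the direction bounded.
        beta_fr = 0.0 if g_prev_sq in (None, 0.0) else min(g_sq / g_prev_sq, 1.0)
        d = g + beta_fr * d
        m = beta1 * m + (1 - beta1) * d        # first moment of the CG-like direction
        v = beta2 * v + (1 - beta2) * d * d    # second moment of the CG-like direction
        m_hat = m / (1 - beta1 ** t)           # standard Adam bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (v_hat ** 0.5 + eps)
        g_prev_sq = g_sq
    return x

x_min = cg_like_adam(grad=lambda x: 2.0 * x, x0=3.0)  # minimize f(x) = x^2
```

Because both moment estimates track the same modified direction, the usual Adam step-size normalization still applies unchanged.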

Updated: 2024-05-08 12:39:59

Categories: cs.LG,cs.AI,cs.CV,math.OC

Download: http://arxiv.org/abs/2404.01714v2

Online Long-run Constrained Optimization

A novel Follow-the-Perturbed-Leader type algorithm is proposed and analyzed for solving general long-term constrained optimization problems in an online manner, where the objective and constraints are arbitrarily generated and not necessarily convex. In each period, a random linear perturbation and a strongly concave perturbation are incorporated into the offline oracle in the primal and dual directions, respectively, and a global minimax point is searched for as the solution. Based on a proposed expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to a long-term (extreme value) constrained river pollutant source identification problem, validating the theoretical results and exhibiting superior performance compared to existing methods.
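
The Follow-the-Perturbed-Leader core is easy to state for a finite action set: each round, play the action minimizing cumulative loss plus a fresh random perturbation. The sketch below shows only this primal perturbation on a toy two-action problem; the paper's algorithm additionally perturbs the dual direction and handles the long-term constraints.

```python
import random

def ftpl(loss_rows, eta):
    """Follow-the-Perturbed-Leader over a finite action set: each round pick
    the action minimizing cumulative past loss minus a fresh random bonus."""
    n_actions = len(loss_rows[0])
    cum = [0.0] * n_actions
    total = 0.0
    for losses in loss_rows:
        noise = [random.uniform(0.0, eta) for _ in range(n_actions)]
        a = min(range(n_actions), key=lambda i: cum[i] - noise[i])
        total += losses[a]                         # pay this round's loss first
        cum = [c + l for c, l in zip(cum, losses)] # then observe the losses
    return total

random.seed(3)
T = 500
# Action 0 is slightly better on average; per-round losses lie in [0, 1].
loss_rows = [[random.uniform(0.0, 0.8), random.uniform(0.2, 1.0)] for _ in range(T)]
alg_loss = ftpl(loss_rows, eta=10.0)
best_fixed = min(sum(r[i] for r in loss_rows) for i in (0, 1))
```

The perturbation randomizes near-ties between leaders, which is what yields sublinear regret against the best fixed action in hindsight.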

Updated: 2024-05-08 12:37:12

Categories: math.OC,cs.LG

Download: http://arxiv.org/abs/2311.02426v2

Persistent Homology for High-dimensional Data Based on Spectral Methods

Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for existing refinements of persistent homology. As a remedy, we find that spectral distances on the $k$-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, make it possible to detect the correct topology even in the presence of high-dimensional noise. Moreover, we derive a novel closed-form formula for effective resistance and describe its relation to diffusion distances. Finally, we apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell cycle loops.
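Effective resistance, one of the spectral distances used here, can be computed from the graph Laplacian. Below is a small pure-Python sketch that grounds one node and solves the reduced linear system with hand-rolled Gaussian elimination (the paper's novel closed-form formula is not reproduced):

```python
def effective_resistance(n, edges, u, v):
    """Effective resistance between nodes u and v of a unit-resistance graph.

    Builds the Laplacian, grounds node v (deletes its row/column), and solves
    L_reduced x = e_u by Gaussian elimination; the answer is x[u].
    """
    # Laplacian of the unweighted graph.
    L = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1
        L[b][b] += 1
        L[a][b] -= 1
        L[b][a] -= 1
    # Ground node v: keep all other indices.
    idx = [i for i in range(n) if i != v]
    A = [[L[i][j] for j in idx] for i in idx]
    rhs = [1.0 if i == u else 0.0 for i in idx]
    m = len(idx)
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (rhs[r] - s) / A[r][r]
    return x[idx.index(u)]

# Triangle of unit resistors: R_eff between any two nodes is 2/3.
r = effective_resistance(3, [(0, 1), (1, 2), (0, 2)], 0, 2)
```

The triangle value 2/3 matches the classic series/parallel calculation (1 ohm in parallel with 2 ohms), which makes this a handy sanity check.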

Updated: 2024-05-08 12:34:54

Categories: cs.LG,math.AT

Download: http://arxiv.org/abs/2311.03087v2

Concrete Dense Network for Long-Sequence Time Series Clustering

Time series clustering is fundamental in data analysis for discovering temporal patterns. Despite recent advancements, learning cluster-friendly representations is still challenging, particularly with long and complex time series. Deep temporal clustering methods have been trying to integrate the canonical k-means into end-to-end training of neural networks but fall back on surrogate losses due to the non-differentiability of the hard cluster assignment, yielding sub-optimal solutions. In addition, the autoregressive strategy used in state-of-the-art RNNs is subject to error accumulation and slow training, while recent research findings have revealed that Transformers are less effective because individual time points lack semantic meaning, the permutation invariance of attention discards chronological order, and the computational cost is high. In light of these observations, we present LoSTer, a novel dense autoencoder architecture for the long-sequence time series clustering (LSTC) problem that optimizes the k-means objective via the Gumbel-softmax reparameterization trick and is designed specifically for accurate and fast clustering of long time series. Extensive experiments on numerous benchmark datasets and two real-world applications prove the effectiveness of LoSTer over state-of-the-art RNNs and Transformer-based deep clustering methods.
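The Gumbel-softmax reparameterization behind the differentiable k-means assignment can be sketched as follows; the logits here are hypothetical negative squared distances to centroids, not the paper's architecture:

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """Differentiable approximation of a categorical sample.

    Adds Gumbel(0, 1) noise to each logit and applies a temperature-tau
    softmax; as tau -> 0 the output approaches a one-hot assignment.
    """
    g = [-math.log(-math.log(rng.random())) for _ in logits]  # Gumbel noise
    z = [(l + n) / tau for l, n in zip(logits, g)]
    mx = max(z)                                  # for numerical stability
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]

rng = random.Random(42)
# Negative squared distances to 3 hypothetical centroids act as logits.
probs = gumbel_softmax([-0.2, -4.0, -9.0], tau=0.5, rng=rng)
```

Because the output is a soft probability vector rather than a hard argmax, gradients can flow through the cluster assignment during end-to-end training.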

Updated: 2024-05-08 12:31:35

Categories: cs.LG

Download: http://arxiv.org/abs/2405.05015v1

Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

In noisy label learning, estimating noisy class posteriors plays a fundamental role for developing consistent classifiers, as it forms the basis for estimating clean class posteriors and the transition matrix. Existing methods typically learn noisy class posteriors by training a classification model with noisy labels. However, when labels are incorrect, these models may be misled to overemphasize the feature parts that do not reflect the instance characteristics, resulting in significant errors in estimating noisy class posteriors. To address this issue, this paper proposes to augment the supervised information with part-level labels, encouraging the model to focus on and integrate richer information from various parts. Specifically, our method first partitions features into distinct parts by cropping instances, yielding part-level labels associated with these various parts. Subsequently, we introduce a novel single-to-multiple transition matrix to model the relationship between the noisy and part-level labels, which incorporates part-level labels into a classifier-consistent framework. Utilizing this framework with part-level labels, we can learn the noisy class posteriors more precisely by guiding the model to integrate information from various parts, ultimately improving the classification performance. Our method is theoretically sound, while experiments show that it is empirically effective in synthetic and real-world noisy benchmarks.
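The classifier-consistent idea rests on relating clean and noisy class posteriors through a transition matrix. A sketch with a hypothetical 2-class symmetric noise matrix (the paper's single-to-multiple part-level matrix generalizes this):

```python
def noisy_posterior(clean, T):
    """P(noisy=j|x) = sum_i P(clean=i|x) * T[i][j],
    where T[i][j] = P(noisy label = j | clean label = i)."""
    k = len(T[0])
    return [sum(clean[i] * T[i][j] for i in range(len(clean)))
            for j in range(k)]

# Hypothetical symmetric 20% label noise on 2 classes.
T = [[0.8, 0.2],
     [0.2, 0.8]]
clean = [0.9, 0.1]
noisy = noisy_posterior(clean, T)
```

Given an estimate of T and of the noisy posterior, the clean posterior can in principle be recovered by inverting this relation, which is why estimating the noisy posterior accurately matters so much.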

Updated: 2024-05-08 12:13:40

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.05714v1

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Large Language Models (LLMs) inherently encode a wealth of knowledge within their parameters through pre-training on extensive corpora. While prior research has delved into operations on these parameters to manipulate the underlying implicit knowledge (encompassing detection, editing, and merging), there remains an ambiguous understanding regarding their transferability across models with varying scales. In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective. To achieve this, we employ sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs. Moreover, the LoRA module is used as the intermediary mechanism for injecting the extracted knowledge into smaller models. Evaluations across four benchmarks validate the efficacy of our proposed method. Our findings highlight the critical factors contributing to the process of parametric knowledge transfer, underscoring the transferability of model parameters across LLMs of different scales. Project website: https://maszhongming.github.io/ParaKnowTransfer.
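Sensitivity-based parameter extraction can be sketched as scoring each parameter by a first-order importance estimate. The |param * grad| score and the toy values below are illustrative assumptions, not the paper's exact criterion:

```python
def top_k_sensitive(params, grads, k):
    """Indices of the k parameters with the largest |param * grad| score,
    a common first-order sensitivity (importance) estimate."""
    scores = [abs(p * g) for p, g in zip(params, grads)]
    return sorted(range(len(params)), key=lambda i: scores[i], reverse=True)[:k]

# Toy parameter vector and its gradient on some knowledge-probing loss.
params = [0.5, -2.0, 0.1, 1.0]
grads  = [0.2,  0.3, 4.0, 0.1]
picked = top_k_sensitive(params, grads, k=2)
```

In the paper's pipeline, the parameters selected this way would then be aligned across models and injected into the smaller model through a LoRA module.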

Updated: 2024-05-08 12:11:00

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2310.11451v2

Health Index Estimation Through Integration of General Knowledge with Unsupervised Learning

Accurately estimating a Health Index (HI) from condition monitoring (CM) data is essential for reliable and interpretable prognostics and health management (PHM) in complex systems. In most scenarios, complex systems operate under varying operating conditions and can exhibit different fault modes, making unsupervised inference of an HI from CM data a significant challenge. Hybrid models combining prior knowledge about degradation with deep learning models have been proposed to overcome this challenge. However, previously suggested hybrid models for HI estimation usually rely heavily on system-specific information, limiting their transferability to other systems. In this work, we propose an unsupervised hybrid method for HI estimation that integrates general knowledge about degradation into the convolutional autoencoder's model architecture and learning algorithm, enhancing its applicability across various systems. The effectiveness of the proposed method is demonstrated in two case studies from different domains: turbofan engines and lithium batteries. The results show that the proposed method outperforms other competitive alternatives, including residual-based methods, in terms of HI quality and their utility for Remaining Useful Life (RUL) predictions. The case studies also highlight the comparable performance of our proposed method with a supervised model trained with HI labels.

Updated: 2024-05-08 11:54:15

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.04990v1

Finite Groundings for ASP with Functions: A Journey through Consistency

Answer set programming (ASP) is a logic programming formalism used in various areas of artificial intelligence like combinatorial problem solving and knowledge representation and reasoning. It is known that enhancing ASP with function symbols makes basic reasoning problems highly undecidable. However, even in simple cases, state-of-the-art reasoners, specifically those relying on a ground-and-solve approach, fail to produce a result. Therefore, we reconsider consistency as a basic reasoning problem for ASP. We show reductions that give an intuition for the high level of undecidability. These insights allow for a more fine-grained analysis in which we characterize ASP programs as "frugal" and "non-proliferous". For such programs, we are not only able to semi-decide consistency, but we also propose a grounding procedure that, using the concept of "forbidden" facts, yields finite groundings for a larger class of ASP programs.

Updated: 2024-05-08 11:50:08

Categories: cs.AI,cs.LO

Download: http://arxiv.org/abs/2405.15794v1

An Artificial Intelligence Approach for Interpreting Creative Combinational Designs

Combinational creativity, a form of creativity involving the blending of familiar ideas, is pivotal in design innovation. While most research focuses on how combinational creativity in design is achieved through blending elements, this study focuses on the computational interpretation, specifically identifying the 'base' and 'additive' components that constitute a creative design. To achieve this goal, the authors propose a heuristic algorithm integrating computer vision and natural language processing technologies, and implement multiple approaches based on both discriminative and generative artificial intelligence architectures. A comprehensive evaluation was conducted on a dataset created for studying combinational creativity. Among the implementations of the proposed algorithm, the most effective approach demonstrated a high accuracy in interpretation, achieving 87.5% for identifying 'base' and 80% for 'additive'. We conduct a modular analysis and an ablation experiment to assess the performance of each part in our implementations. Additionally, the study includes an analysis of error cases and bottleneck issues, providing critical insights into the limitations and challenges inherent in the computational interpretation of creative designs.

Updated: 2024-05-08 11:47:32

Categories: cs.AI,cs.CE

Download: http://arxiv.org/abs/2405.04985v1

Dynamic Data Layout Optimization with Worst-case Guarantees

Many data analytics systems store and process large datasets in partitions containing millions of rows. By mapping rows to partitions in an optimized way, it is possible to improve query performance by skipping over large numbers of irrelevant partitions during query processing. This mapping is referred to as a data layout. Recent works have shown that customizing the data layout to the anticipated query workload greatly improves query performance, but the performance benefits may disappear if the workload changes. Reorganizing data layouts to accommodate workload drift can resolve this issue, but reorganization costs could exceed query savings if not done carefully. In this paper, we present an algorithmic framework OReO that makes online reorganization decisions to balance the benefits of improved query performance with the costs of reorganization. Our framework extends results from Metrical Task Systems to provide a tight bound on the worst-case performance guarantee for online reorganization, without prior knowledge of the query workload. Through evaluation on real-world datasets and query workloads, our experiments demonstrate that online reorganization with OReO can lead to an up to 32% improvement in combined query and reorganization time compared to using a single, optimized data layout for the entire workload.
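The intuition for balancing reorganization cost against query savings is the classic rent-or-buy rule: pay per-query overhead until it totals the one-off reorganization cost, then reorganize. This is a simplified sketch (the paper's Metrical-Task-Systems-based algorithm is more general), with hypothetical cost values:

```python
def should_reorganize(excess_cost_so_far, reorg_cost):
    """Rent-or-buy rule: reorganize once the accumulated excess query cost
    of the stale layout matches the one-off reorganization cost.
    This simple rule is 2-competitive against an offline optimum."""
    return excess_cost_so_far >= reorg_cost

excess, reorg_cost, reorganized_at = 0.0, 10.0, None
for query_id in range(100):
    excess += 0.5            # per-query overhead of the stale layout
    if should_reorganize(excess, reorg_cost):
        reorganized_at = query_id
        break
```

Whatever the workload does, this rule never pays more than twice what the best offline decision would, which is the worst-case flavor of guarantee the paper generalizes.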

Updated: 2024-05-08 11:46:00

Categories: cs.DB,cs.DS,cs.LG

Download: http://arxiv.org/abs/2405.04984v1

A View on Out-of-Distribution Identification from a Statistical Testing Theory Perspective

We study the problem of efficiently detecting Out-of-Distribution (OOD) samples at test time in supervised and unsupervised learning contexts. While ML models are typically trained under the assumption that training and test data stem from the same distribution, this is often not the case in realistic settings, thus reliably detecting distribution shifts is crucial at deployment. We re-formulate the OOD problem under the lenses of statistical testing and then discuss conditions that render the OOD problem identifiable in statistical terms. Building on this framework, we study convergence guarantees of an OOD test based on the Wasserstein distance, and provide a simple empirical evaluation.
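In one dimension the empirical Wasserstein-1 distance reduces to the mean absolute difference of sorted samples. A sketch of using it as an OOD test statistic, with a hypothetical decision threshold (the paper's test and its convergence guarantees are more general):

```python
def wasserstein_1d(xs, ys):
    """Empirical W1 distance for equal-size 1-D samples:
    mean |x_(i) - y_(i)| over sorted order statistics."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

in_dist = [0.0, 1.0, 2.0, 3.0]
shifted = [5.0, 6.0, 7.0, 8.0]   # same shape, shifted by 5
d_same  = wasserstein_1d(in_dist, in_dist)
d_shift = wasserstein_1d(in_dist, shifted)
is_ood  = d_shift > 1.0          # hypothetical threshold
```

A distribution shift by a constant offset shows up directly as that offset in the statistic, which is what makes the distance a natural test quantity.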

Updated: 2024-05-08 11:44:10

Categories: cs.LG

Download: http://arxiv.org/abs/2405.03052v2

Deep Reinforcement Learning with Spiking Q-learning

With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. Combining SNNs with deep reinforcement learning (RL) provides a promising energy-efficient approach to realistic control tasks. There are only a few existing SNN-based RL methods at present. Most of them either lack generalization ability or employ Artificial Neural Networks (ANNs) to estimate the value function during training. The former need to tune numerous hyper-parameters for each scenario, and the latter limit the application of different types of RL algorithms and ignore the large energy consumption of training. To develop a robust spike-based RL method, we draw inspiration from non-spiking interneurons found in insects and propose the deep spiking Q-network (DSQN), which uses the membrane voltage of non-spiking neurons as the representation of the Q-value and can directly learn robust policies from high-dimensional sensory inputs using end-to-end RL. Experiments conducted on 17 Atari games demonstrate that the DSQN is effective and even outperforms the ANN-based deep Q-network (DQN) in most games. Moreover, the experiments show the superior learning stability and robustness to adversarial attacks of the DSQN.
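The key readout idea, using the membrane voltage of a non-spiking leaky integrator as the Q-value, can be sketched as follows; the decay constant and input currents are hypothetical:

```python
def membrane_q_value(input_currents, decay=0.9, v0=0.0):
    """Leaky integration without a spiking threshold: V <- decay*V + I each
    step. The final membrane voltage serves as the Q-value readout, so the
    output stays continuous and differentiable for end-to-end RL."""
    v = v0
    for i in input_currents:
        v = decay * v + i
    return v

# Three simulation steps of constant unit input current.
q = membrane_q_value([1.0, 1.0, 1.0])
```

Because the neuron never fires, no non-differentiable spike event sits between the network output and the Q-learning loss, which is what makes the training end-to-end.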

Updated: 2024-05-08 11:29:48

Categories: cs.NE,cs.AI,cs.LG

Download: http://arxiv.org/abs/2201.09754v3

Investigating Self-Supervised Image Denoising with Denaturation

Self-supervised learning for image denoising in the presence of denatured noisy data is a crucial approach in machine learning. However, theoretical understanding of the performance of approaches that use denatured data is lacking. To provide a better understanding, in this paper we analyze in depth a self-supervised denoising algorithm that uses denatured data, through theoretical analysis and numerical experiments. Through the theoretical analysis, we show that the algorithm finds desired solutions to the optimization problem with the population risk, while the guarantee for the empirical risk depends on the hardness of the denoising task in terms of denaturation levels. We also conduct several experiments to investigate the performance of an extended algorithm in practice. The results indicate that the algorithm trained with denatured images works, and the empirical performance aligns with the theoretical results. These results suggest several insights for further improving self-supervised image denoising that uses denatured data.

Updated: 2024-05-08 11:29:35

Categories: stat.ML,cs.CV,cs.LG,eess.IV,math.ST,stat.TH

Download: http://arxiv.org/abs/2405.01124v2

The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews

Systematic review (SR) is a popular research method in software engineering (SE). However, conducting an SR takes an average of 67 weeks. Thus, automating any step of the SR process could reduce the effort associated with SRs. Our objective is to investigate whether Large Language Models (LLMs) can accelerate title-abstract screening, either by simplifying abstracts for human screeners or by automating the screening itself. We performed an experiment in which humans screened titles and abstracts for 20 papers, with both original and simplified abstracts, from a prior SR. The experiment was reproduced with the GPT-3.5 and GPT-4 LLMs performing the same screening tasks. We also studied whether different prompting techniques (Zero-shot (ZS), One-shot (OS), Few-shot (FS), and Few-shot with Chain-of-Thought (FS-CoT)) improve the screening performance of LLMs. Lastly, we studied whether redesigning the prompt used in the LLM reproduction of screening leads to improved performance. Text simplification did not increase the screeners' performance but reduced the time used in screening. Screeners' scientific literacy skills and researcher status predict screening performance. Some LLM-and-prompt combinations perform as well as human screeners in the screening tasks. Our results indicate that the GPT-4 LLM is better than its predecessor, GPT-3.5. Additionally, Few-shot and One-shot prompting outperform Zero-shot prompting. Using LLMs for text simplification in the screening process does not significantly improve human performance. Using LLMs to automate title-abstract screening seems promising, but current LLMs are not significantly more accurate than human screeners. To recommend the use of LLMs in the screening process of SRs, more research is needed. We recommend that future SR studies publish replication packages with screening data to enable more conclusive experiments with LLM screening.

Updated: 2024-05-08 11:28:50

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.15667v4

Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI

Diffusion probabilistic models (DPMs) have exhibited significant effectiveness in computer vision tasks, particularly in image generation. However, their notable performance heavily relies on labelled datasets, which limits their application to medical images due to the high cost of annotation. Current DPM-related methods for lesion detection in medical imaging, which can be categorized into two distinct approaches, primarily rely on image-level annotations. The first approach, based on anomaly detection, involves learning reference healthy brain representations and identifying anomalies based on the difference in inference results. In contrast, the second approach, resembling a segmentation task, employs only the original brain multi-modalities as prior information for generating pixel-level annotations. In this paper, our proposed model for lesion detection in brain MRI, discrepancy distribution medical diffusion (DDMD), introduces a novel framework by incorporating distinctive discrepancy features, deviating from the conventional direct reliance on image-level annotations or the original brain modalities. In our method, the inconsistency in image-level annotations is translated into distribution discrepancies among heterogeneous samples while preserving information within homogeneous samples. This property retains pixel-wise uncertainty and facilitates an implicit ensemble of segmentation, ultimately enhancing the overall detection performance. Thorough experiments conducted on the BRATS2020 benchmark dataset, which contains multimodal MRI scans for brain tumour detection, demonstrate the strong performance of our approach in comparison to state-of-the-art methods.

Updated: 2024-05-08 11:26:49

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.04974v1

Detection of Sleep Oxygen Desaturations from Electroencephalogram Signals

In this work, we leverage machine learning techniques to identify potential biomarkers of oxygen desaturation during sleep, exclusively from electroencephalogram (EEG) signals, in pediatric patients with sleep apnea. A machine learning technique that can identify EEG signals from patients with sleep apnea, as well as latent EEG signals that come from subjects who experience oxygen desaturations but were not themselves recorded during desaturation events, would be a strong step towards a brain-based biomarker for sleep apnea and thus easier diagnosis of the disease. We leverage a large corpus of data and show that machine learning enables us to classify EEG signals as occurring or not occurring during oxygen desaturations with an average balanced accuracy of 66.8%. We furthermore investigate the ability of machine learning models to identify subjects who experience oxygen desaturations from EEG data that was not recorded during desaturations. We conclude that there is a potential biomarker for oxygen desaturation in EEG data.
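The balanced accuracy reported above is the mean of per-class recalls, which makes it insensitive to class imbalance; a small sketch on hypothetical labels:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean recall over classes: each class contributes equally,
    regardless of how many samples it has."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy example: 4 negatives, 2 positives.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
ba = balanced_accuracy(y_true, y_pred)
```

Plain accuracy on this example would be 5/6 ≈ 0.83, while balanced accuracy is 0.75, showing how the metric penalizes missing the rare class.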

Updated: 2024-05-08 11:25:12

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2405.09566v1

Overcoming Anchoring Bias: The Potential of AI and XAI-based Decision Support

Information systems (IS) are frequently designed to leverage the negative effect of anchoring bias to influence individuals' decision-making (e.g., by manipulating purchase decisions). Recent advances in Artificial Intelligence (AI) and the explanations of its decisions through explainable AI (XAI) have opened new opportunities for mitigating biased decisions. So far, the potential of these technological advances to overcome anchoring bias remains widely unclear. To this end, we conducted two online experiments with a total of N=390 participants in the context of purchase decisions to examine the impact of AI and XAI-based decision support on anchoring bias. Our results show that AI alone and its combination with XAI help to mitigate the negative effect of anchoring bias. Ultimately, our findings have implications for the design of AI and XAI-based decision support and IS to overcome cognitive biases.

Updated: 2024-05-08 11:25:04

Categories: cs.CY,cs.AI,cs.HC,cs.LG,econ.GN,q-fin.EC

Download: http://arxiv.org/abs/2405.04972v1

A review on discriminative self-supervised learning methods

In the field of computer vision, self-supervised learning has emerged as a method to extract robust features from unlabeled data, where models derive labels autonomously from the data itself, without the need for manual annotation. This paper provides a comprehensive review of discriminative approaches to self-supervised learning within the domain of computer vision, examining their evolution and current status. Through an exploration of various methods, including contrastive, self-distillation, knowledge distillation, feature decorrelation, and clustering techniques, we investigate how these approaches leverage the abundance of unlabeled data. Finally, we compare self-supervised learning methods on the standard ImageNet classification benchmark.

Updated: 2024-05-08 11:15:20

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.04969v1

AMPLIFY: Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer

Mixup is an effective data augmentation method that generates new augmented samples by aggregating linear combinations of different original samples. However, if there are noises or aberrant features in the original samples, Mixup may propagate them to the augmented samples, leading to over-sensitivity of the model to these outliers. To solve this problem, this paper proposes a new Mixup method called AMPLIFY. This method uses the attention mechanism of the Transformer itself to reduce the influence of noises and aberrant values in the original samples on the prediction results, without adding extra trainable parameters, and its computational cost is very low, thereby avoiding the high resource consumption of common Mixup methods such as Sentence Mixup. The experimental results show that, at a smaller computational cost, AMPLIFY outperforms other Mixup methods in text classification tasks on 7 benchmark datasets, providing new ideas and new ways to further improve the performance of pre-trained models based on the attention mechanism, such as BERT, ALBERT, RoBERTa, and GPT. Our code can be obtained at https://github.com/kiwi-lilo/AMPLIFY.
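For reference, the vanilla Mixup baseline that AMPLIFY improves upon blends two samples and their labels with a Beta-distributed coefficient; a minimal sketch (AMPLIFY itself operates on the Transformer's attention weights, which is not reproduced here):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Vanilla Mixup: blend two samples and their one-hot labels with
    lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

rng = random.Random(0)
x, y, lam = mixup([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1], rng=rng)
```

The mixed label is a probability vector rather than a hard one-hot target, which is also where the label-smoothing effect in the title comes from.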

Updated: 2024-05-08 11:14:41

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2309.12689v3

SmmPack: Obfuscation for SMM Modules with TPM Sealed Key

System Management Mode (SMM) is the highest-privileged operating mode of x86 and x86-64 processors. Through SMM exploitation, attackers can tamper with the Unified Extensible Firmware Interface (UEFI) firmware, disabling the security mechanisms implemented by the operating system and hypervisor. Vulnerabilities enabling SMM code execution are often reported as Common Vulnerabilities and Exposures (CVEs); however, no security mechanisms currently exist to prevent attackers from analyzing those vulnerabilities. To increase the cost of vulnerability analysis of SMM modules, we introduced SmmPack. The core concept of SmmPack involves encrypting an SMM module with the key securely stored in a Trusted Platform Module (TPM). We assessed the effectiveness of SmmPack in preventing attackers from obtaining and analyzing SMM modules using various acquisition methods. Our results show that SmmPack significantly increases the cost by narrowing down the means of module acquisition. Furthermore, we demonstrated that SmmPack operates without compromising the performance of the original SMM modules. We also clarified the management and adoption methods of SmmPack, as well as the procedure for applying BIOS updates, and demonstrated that the implementation of SmmPack is realistic.

Updated: 2024-05-08 11:13:03

Categories: cs.CR

Download: http://arxiv.org/abs/2405.04355v2

Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely-used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. Yet, to fully understand a decision, not only knowledge about relevant features is needed; awareness of irrelevant information also contributes substantially to a user's mental model of an AI system. To this end, a novel approach for explaining AI systems called alterfactual explanations was recently proposed on a conceptual level. It is based on showing an alternative reality where irrelevant features of an AI's input are altered. By doing so, the user directly sees which input data characteristics can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that it is possible to apply this idea to black box models based on neural networks. To this end, we present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives interesting insights on how alterfactual explanations can complement counterfactual explanations.
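A toy sketch of the core idea, assuming a hypothetical classifier that ignores all but its first two input features: an alterfactual alters the irrelevant features, even drastically, and verifies the decision is unchanged.

```python
def classifier(x):
    # Toy black-box stand-in: only the first two features matter here
    # (an assumption for illustration, not the paper's image classifier).
    return 1 if x[0] + x[1] > 1.0 else 0

def alterfactual(x, irrelevant_idx, new_values):
    # Alter the designated irrelevant features and check the prediction is
    # unchanged, showing the user what does NOT influence the AI.
    x_alt = list(x)
    for i, v in zip(irrelevant_idx, new_values):
        x_alt[i] = v
    assert classifier(x_alt) == classifier(x), "a relevant feature was altered"
    return x_alt
```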

Updated: 2024-05-08 11:03:22

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.05295v1

Improving Long Text Understanding with Knowledge Distilled from Summarization Model

Long text understanding is important yet challenging for natural language processing. A long article or document usually contains many redundant words that are not pertinent to its gist and sometimes can be regarded as noise. With recent advances in abstractive summarization, we propose our \emph{Gist Detector} to leverage the gist detection ability of a summarization model and integrate the extracted gist into downstream models to enhance their long text understanding ability. Specifically, Gist Detector first learns the gist detection knowledge distilled from a summarization model, and then produces gist-aware representations to augment downstream models. We evaluate our method on three different tasks: long document classification, distantly supervised open-domain question answering, and non-parallel text style transfer. The experimental results show that our method can significantly improve the performance of baseline models on all tasks.
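The distillation step can be pictured with the standard temperature-scaled KL objective; the abstract does not spell out the Gist Detector's training details, so this is a generic sketch under that assumption:

```python
import math

def softened(logits, T):
    # Temperature-softened softmax distribution.
    exps = [l and math.exp(l / T) or math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) at temperature T, the standard knowledge
    # distillation objective, here in the spirit of distilling gist
    # detection from a summarization teacher into a smaller student.
    p = softened(teacher_logits, T)
    q = softened(student_logits, T)
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss vanishes when the student matches the teacher and is strictly positive otherwise, which is what makes it usable as a training signal.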

Updated: 2024-05-08 10:49:39

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.04955v1

Introducing PetriRL: An Innovative Framework for JSSP Resolution Integrating Petri nets and Event-based Reinforcement Learning

Resource utilization and production process optimization are crucial for companies in today's competitive industrial landscape. Addressing the complexities of job shop scheduling problems (JSSP) is essential to improving productivity, reducing costs, and ensuring timely delivery. We propose PetriRL, a novel framework integrating Petri nets and deep reinforcement learning (DRL) for JSSP optimization. PetriRL capitalizes on the inherent strengths of Petri nets in modelling discrete event systems while leveraging the advantages of a graph structure. The Petri net governs automated components of the process, ensuring adherence to JSSP constraints. This allows for synergistic collaboration with optimization algorithms such as DRL, particularly in critical decision-making. Unlike traditional methods, PetriRL eliminates the need to preprocess JSSP instances into disjunctive graphs and enhances the explainability of process status through its graphical structure based on places and transitions. Additionally, the inherent graph structure of Petri nets enables the dynamic addition of job operations during the inference phase without requiring agent retraining, thus enhancing flexibility. Experimental results demonstrate PetriRL's robust generalization across various instance sizes and its competitive performance on public test benchmarks and randomly generated instances. Results are compared to a wide range of optimization solutions such as heuristics, metaheuristics, and learning-based algorithms. Finally, the added value of the framework's key elements, such as event-based control and action masking, is studied in an ablation study.
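The interplay the abstract describes, a Petri net enforcing scheduling constraints while supplying an action mask to the agent, can be sketched in a few lines (the two-transition net is an illustrative toy, not the paper's model):

```python
def enabled(marking, pre):
    # A transition is enabled iff every input place holds enough tokens.
    return all(marking.get(p, 0) >= n for p, n in pre.items())

def fire(marking, pre, post):
    # Fire a transition: consume input tokens, produce output tokens.
    m = dict(marking)
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] = m.get(p, 0) + n
    return m

def action_mask(marking, transitions):
    # The mask handed to the RL agent: only enabled transitions are legal,
    # so constraints encoded in the net are enforced by construction.
    return [enabled(marking, pre) for pre, _post in transitions]

# A tiny two-transition net: claim a machine, then finish and release it.
transitions = [
    ({"wait": 1, "machine": 1}, {"busy": 1}),   # start job
    ({"busy": 1}, {"done": 1, "machine": 1}),   # finish job
]
m0 = {"wait": 1, "machine": 1, "busy": 0, "done": 0}
m1 = fire(m0, *transitions[0])
```

As the marking evolves from `m0` to `m1`, the mask flips from "only start is legal" to "only finish is legal".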

Updated: 2024-05-08 10:47:57

Categories: cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.00046v2

Enhancing Social Media Post Popularity Prediction with Visual Content

Our study presents a framework for predicting image-based social media content popularity that focuses on addressing complex image information and a hierarchical data structure. We utilize the Google Cloud Vision API to effectively extract key image and color information from users' postings, achieving 6.8% higher accuracy compared to using non-image covariates alone. For prediction, we explore a wide range of prediction models, including Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study demonstrates that models that are capable of capturing the underlying nonlinear interactions between covariates outperform other methods.

Updated: 2024-05-08 10:47:28

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2405.02367v2

Supervised Anomaly Detection for Complex Industrial Images

Automating visual inspection in industrial production lines is essential for increasing product quality across various industries. Anomaly detection (AD) methods serve as robust tools for this purpose. However, existing public datasets primarily consist of images without anomalies, limiting the practical application of AD methods in production settings. To address this challenge, we present (1) the Valeo Anomaly Dataset (VAD), a novel real-world industrial dataset comprising 5000 images, including 2000 instances of challenging real defects across more than 20 subclasses. Acknowledging that traditional AD methods struggle with this dataset, we introduce (2) Segmentation-based Anomaly Detector (SegAD). First, SegAD leverages anomaly maps as well as segmentation maps to compute local statistics. Next, SegAD uses these statistics and an optional supervised classifier score as input features for a Boosted Random Forest (BRF) classifier, yielding the final anomaly score. Our SegAD achieves state-of-the-art performance on both VAD (+2.1% AUROC) and the VisA dataset (+0.4% AUROC). The code and the models are publicly available.
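SegAD's first step, turning anomaly and segmentation maps into per-segment local statistics, can be sketched on flattened maps (mean and max per segment are plausible examples; the paper's exact feature set feeding the Boosted Random Forest may differ):

```python
def local_statistics(anomaly_map, segmentation_map):
    # Group flattened anomaly scores by segment id and compute simple
    # per-segment statistics (mean, max). These local features then become
    # inputs to a downstream classifier that produces the anomaly score.
    per_segment = {}
    for score, seg in zip(anomaly_map, segmentation_map):
        per_segment.setdefault(seg, []).append(score)
    return {seg: (sum(v) / len(v), max(v)) for seg, v in per_segment.items()}
```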

Updated: 2024-05-08 10:47:28

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.04953v1

VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context. Yet, a challenging type of visual math lies in the multimodal graph theory problem, which demands that LMMs understand the graphical structures accurately and perform multi-step reasoning on the visual graph. Additionally, exploring multimodal graph theory problems will lead to more effective strategies in fields like biology, transportation, and robotics planning. To step forward in this direction, we are the first to design a benchmark named VisionGraph, used to explore the capabilities of advanced LMMs in solving multimodal graph theory problems. It encompasses eight complex graph problem tasks, from connectivity to shortest path problems. Subsequently, we present a Description-Program-Reasoning (DPR) chain to enhance the logical accuracy of reasoning processes through graphical structure description generation and algorithm-aware multi-step reasoning. Our extensive study shows that 1) GPT-4V outperforms Gemini Pro in multi-step graph reasoning; 2) All LMMs exhibit inferior perception accuracy for graphical structures, whether in zero/few-shot settings or with supervised fine-tuning (SFT), which further affects problem-solving performance; 3) DPR significantly improves the multi-step graph reasoning capabilities of LMMs and the GPT-4V (DPR) agent achieves SOTA performance.
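The "Program" step of a DPR-style chain can delegate to an exact graph algorithm once the structure has been described, rather than asking the LMM to trace paths implicitly; a plain BFS for shortest path length is a minimal example of such an algorithm-aware step:

```python
from collections import deque

def shortest_path_length(edges, n, src, dst):
    # Exact BFS over the (undirected, unweighted) graph recovered in the
    # Description step; returns the hop count, or -1 if dst is unreachable.
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return -1
```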

Updated: 2024-05-08 10:42:48

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.04950v1

Comparative Study of Recurrent Neural Networks for Virtual Analog Audio Effects Modeling

Analog electronic circuits are at the core of an important category of musical devices. The nonlinear features of their electronic components give analog musical devices a distinctive timbre and sound quality, making them highly desirable. Artificial neural networks have rapidly gained popularity for the emulation of analog audio effects circuits, particularly recurrent networks. While neural approaches have been successful in accurately modeling distortion circuits, they require architectural improvements that account for parameter conditioning and low latency response. In this article, we explore the application of recent machine learning advancements for virtual analog modeling. We compare State Space models and Linear Recurrent Units against the more common Long Short-Term Memory networks. These have shown promising ability in sequence-to-sequence modeling tasks, showing a notable improvement in signal history encoding. Our comparative study uses these black box neural modeling techniques with a variety of audio effects. We evaluate the performance and limitations using multiple metrics aiming to assess the models' ability to accurately replicate energy envelopes, frequency contents, and transients in the audio signal. To incorporate control parameters we employ the Feature-wise Linear Modulation (FiLM) method. Long Short-Term Memory networks exhibit better accuracy in emulating distortions and equalizers, while the State Space model, followed by Long Short-Term Memory networks when integrated in an encoder-decoder structure, outperforms others in emulating saturation and compression. When considering long time-variant characteristics, the State Space model demonstrates the greatest accuracy. The Long Short-Term Memory and, in particular, Linear Recurrent Unit networks show a greater tendency to introduce audio artifacts.
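Feature-wise Linear Modulation itself is compact enough to sketch: a conditioning network (here a hypothetical stub) predicts a per-channel scale and shift applied to the effect model's features.

```python
def film(features, gamma, beta):
    # Feature-wise Linear Modulation: scale and shift every feature channel
    # with parameters predicted from the conditioning input.
    return [g * f + b for f, g, b in zip(features, gamma, beta)]

def condition(knob):
    # Hypothetical conditioning stub: maps one control parameter (e.g. a
    # drive knob in [0, 1]) to per-channel (gamma, beta). In practice this
    # would be a small learned network.
    gamma = [1.0 + knob, 1.0 - knob]
    beta = [knob, 0.0]
    return gamma, beta
```

Because the modulation is applied per channel, a single recurrent backbone can emulate the effect across its whole control range without retraining per setting.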

Updated: 2024-05-08 10:35:02

Categories: cs.SD,cs.AI

Download: http://arxiv.org/abs/2405.04124v2

A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs, linearized Laplace approximations, and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) deep ensembles perform relatively poorly; (v) infinite-width BNNs are particularly promising, especially in high dimensions.
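For intuition on one of the compared surrogates, a deep ensemble reduces to the empirical moments of its members' predictions, which an acquisition function then consumes (a generic sketch, not the paper's experimental setup):

```python
def ensemble_predict(members, x):
    # Deep-ensemble surrogate: predictive mean and variance are the
    # empirical moments of the members' point predictions at x.
    preds = [m(x) for m in members]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var

def ucb(mean, var, kappa=1.0):
    # A simple upper-confidence-bound acquisition value built from the
    # surrogate's two moments; kappa trades exploration vs. exploitation.
    return mean + kappa * var ** 0.5
```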

Updated: 2024-05-08 10:30:22

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2305.20028v2

A Sparse Tensor Generator with Efficient Feature Extraction

Sparse tensor operations are gaining attention in emerging applications such as social networks, deep learning, diagnosis, crime, and review analysis. However, a major obstacle for research in sparse tensor operations is the lack of a broad-scale sparse tensor dataset. Another challenge in sparse tensor operations is examining the sparse tensor features, which are not only important for revealing its nonzero pattern but also have a significant impact on determining the best-suited storage format, the decomposition algorithm, and the reordering methods. However, due to the large sizes of real tensors, even extracting these features can be costly unless done with care. To address these gaps in the literature, we have developed a smart sparse tensor generator that mimics the substantial features of real sparse tensors. Moreover, we propose various methods for efficiently extracting an extensive set of features for sparse tensors. The effectiveness of our generator is validated through the quality of features and the performance of decomposition in the generated tensors. Both the sparse tensor feature extractor and the tensor generator are open source with all the artifacts available at https://github.com/sparcityeu/feaTen and https://github.com/sparcityeu/genTen, respectively.
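A few of the cheaper features can be read directly off a COO (coordinate-list) representation; the sketch below is illustrative and far narrower than the feature set the paper extracts:

```python
def tensor_features(coords, shape):
    # Basic features of a sparse tensor given in COO form: nnz, density,
    # and the nonzero count of every slice along each mode.
    nnz = len(coords)
    size = 1
    for d in shape:
        size *= d
    per_slice = []
    for mode, d in enumerate(shape):
        counts = [0] * d
        for c in coords:
            counts[c[mode]] += 1
        per_slice.append(counts)
    return {"nnz": nnz, "density": nnz / size, "nnz_per_slice": per_slice}
```

Slice-level nonzero counts are the kind of signal that informs storage-format and reordering choices, since they expose imbalance across modes.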

Updated: 2024-05-08 10:28:20

Categories: cs.MS,cs.LG,G.4

Download: http://arxiv.org/abs/2405.04944v1

Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Partially observable Markov decision processes (POMDPs) rely on the key assumption that probability distributions are precisely known. Robust POMDPs (RPOMDPs) alleviate this concern by defining imprecise probabilities, referred to as uncertainty sets. While robust MDPs have been studied extensively, work on RPOMDPs is limited and primarily focuses on algorithmic solution methods. We expand the theoretical understanding of RPOMDPs by showing that 1) different assumptions on the uncertainty sets affect optimal policies and values; 2) RPOMDPs have a partially observable stochastic game (POSG) semantic; and 3) the same RPOMDP with different assumptions leads to semantically different POSGs and, thus, different policies and values. These novel semantics for RPOMDPS give access to results for the widely studied POSG model; concretely, we show the existence of a Nash equilibrium. Finally, we classify the existing RPOMDP literature using our semantics, clarifying under which uncertainty assumptions these existing works operate.
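One common concretization of an uncertainty set is an interval model per transition distribution; the worst-case expectation over such a set, the basic operation inside a robust backup, can be computed greedily (the interval form is just one of the assumptions the paper distinguishes):

```python
def worst_case_expectation(values, lower, upper):
    # Adversarial nature picks a distribution inside the box [lower, upper]
    # (intersected with the probability simplex) that minimizes the expected
    # value: start from the lower bounds, then pour the remaining probability
    # mass into the lowest-valued outcomes first.
    p = list(lower)
    remaining = 1.0 - sum(lower)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(upper[i] - lower[i], remaining)
        p[i] += extra
        remaining -= extra
    return sum(pi * vi for pi, vi in zip(p, values))
```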

Updated: 2024-05-08 10:22:49

Categories: cs.AI,cs.GT

Download: http://arxiv.org/abs/2405.04941v1

Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinatorial matching to address the one-to-one mapping property that exists in many documents, where one field has only one corresponding entity. Moreover, our method is highly parameter-efficient, requiring far fewer trainable parameters than existing approaches. Despite this, experiments across various datasets show our method outperforms baselines in most entity types. Many real-world documents exhibit combinatorial properties which can be leveraged as inductive biases to improve extraction accuracy, but existing datasets do not cover these documents. To facilitate future research into these types of documents, we release a new ID document dataset that covers diverse templates and languages. We also release enhanced annotations for an existing dataset.
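The spatial bias can be sketched as a boolean attention mask built from entity positions: each entity attends only to itself and its k nearest neighbours (a minimal sketch of the masking idea, not the model's full attention computation):

```python
def knn_attention_mask(positions, k):
    # Build a boolean mask from 2D entity positions: entry [i][j] is True
    # iff entity i may attend to entity j, i.e. j is i itself or one of
    # i's k spatially nearest neighbours (the local radius of the KNN graph).
    n = len(positions)
    mask = [[False] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(positions):
        neighbours = sorted(
            ((xj - xi) ** 2 + (yj - yi) ** 2, j)
            for j, (xj, yj) in enumerate(positions) if j != i
        )
        mask[i][i] = True
        for _dist, j in neighbours[:k]:
            mask[i][j] = True
    return mask
```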

Updated: 2024-05-08 10:10:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.06701v1

Fault Identification Enhancement with Reinforcement Learning (FIERL)

This letter presents a novel approach in the field of Active Fault Detection (AFD), by explicitly separating the task into two parts: Passive Fault Detection (PFD) and control input design. This formulation is very general, and most existing AFD literature can be viewed through this lens. By recognizing this separation, PFD methods can be leveraged to provide components that make efficient use of the available information, while the control input is designed in order to optimize the gathering of information. The core contribution of this work is FIERL, a general simulation-based approach for the design of such control strategies, using Constrained Reinforcement Learning (CRL) to optimize the performance of arbitrary passive detectors. The control policy is learned without the need of knowing the passive detector inner workings, making FIERL broadly applicable. However, it is especially useful when paired with the design of an efficient passive component. Unlike most AFD approaches, FIERL can handle fairly complex scenarios such as continuous sets of fault modes. The effectiveness of FIERL is tested on a benchmark problem for actuator fault diagnosis, where FIERL is shown to be fairly robust, being able to generalize to fault dynamics not seen in training.

Updated: 2024-05-08 10:10:24

Categories: cs.LG

Download: http://arxiv.org/abs/2405.04938v1

Simultaneous identification of models and parameters of scientific simulators

Many scientific models are composed of multiple discrete components, and scientists often make heuristic decisions about which components to include. Bayesian inference provides a mathematical framework for systematically selecting model components, but defining prior distributions over model components and developing associated inference schemes has been challenging. We approach this problem in a simulation-based inference framework: We define model priors over candidate components and, from model simulations, train neural networks to infer joint probability distributions over both model components and associated parameters. Our method, simulation-based model inference (SBMI), represents distributions over model components as a conditional mixture of multivariate binary distributions in the Grassmann formalism. SBMI can be applied to any compositional stochastic simulator without requiring likelihood evaluations. We evaluate SBMI on a simple time series model and on two scientific models from neuroscience, and show that it can discover multiple data-consistent model configurations, and that it reveals non-identifiable model components and parameters. SBMI provides a powerful tool for data-driven scientific inquiry which will allow scientists to identify essential model components and make uncertainty-informed modelling decisions.

Updated: 2024-05-08 10:09:58

Categories: cs.LG

Download: http://arxiv.org/abs/2305.15174v2

Developing trustworthy AI applications with foundation models

The trustworthiness of AI applications has been the subject of recent research and is also addressed in the EU's recently adopted AI Regulation. The currently emerging foundation models in the field of text, speech and image processing offer completely new possibilities for developing AI applications. This whitepaper shows how the trustworthiness of an AI application developed with foundation models can be evaluated and ensured. For this purpose, the application-specific, risk-based approach for testing and ensuring the trustworthiness of AI applications, as developed in the 'AI Assessment Catalog - Guideline for Trustworthy Artificial Intelligence' by Fraunhofer IAIS, is transferred to the context of foundation models. Special consideration is given to the fact that specific risks of foundation models can have an impact on the AI application and must also be taken into account when checking trustworthiness. Chapter 1 of the white paper explains the fundamental relationship between foundation models and AI applications based on them in terms of trustworthiness. Chapter 2 provides an introduction to the technical construction of foundation models and Chapter 3 shows how AI applications can be developed based on them. Chapter 4 provides an overview of the resulting risks regarding trustworthiness. Chapter 5 shows which requirements for AI applications and foundation models are to be expected according to the draft of the European Union's AI Regulation and Chapter 6 finally shows the system and procedure for meeting trustworthiness requirements.

Updated: 2024-05-08 10:08:45

Categories: cs.AI,I.2.0

Download: http://arxiv.org/abs/2405.04937v1

Harmonizing Program Induction with Rate-Distortion Theory

Many aspects of human learning have been proposed as a process of constructing mental programs: from acquiring symbolic number representations to intuitive theories about the world. In parallel, there is a long-tradition of using information processing to model human cognition through Rate Distortion Theory (RDT). Yet, it is still poorly understood how to apply RDT when mental representations take the form of programs. In this work, we adapt RDT by proposing a three way trade-off among rate (description length), distortion (error), and computational costs (search budget). We use simulations on a melody task to study the implications of this trade-off, and show that constructing a shared program library across tasks provides global benefits. However, this comes at the cost of sensitivity to curricula, which is also characteristic of human learners. Finally, we use methods from partial information decomposition to generate training curricula that induce more effective libraries and better generalization.

Updated: 2024-05-08 10:03:50

Categories: cs.HC,cs.CL,cs.LG,cs.SC,stat.ML

Download: http://arxiv.org/abs/2405.05294v1

Bounding Causal Effects with Leaky Instruments

Instrumental variables (IVs) are a popular and powerful tool for estimating causal effects in the presence of unobserved confounding. However, classical approaches rely on strong assumptions such as the $\textit{exclusion criterion}$, which states that instrumental effects must be entirely mediated by treatments. This assumption often fails in practice. When IV methods are improperly applied to data that do not meet the exclusion criterion, estimated causal effects may be badly biased. In this work, we propose a novel solution that provides $\textit{partial}$ identification in linear systems given a set of $\textit{leaky instruments}$, which are allowed to violate the exclusion criterion to some limited degree. We derive a convex optimization objective that provides provably sharp bounds on the average treatment effect under some common forms of information leakage, and implement inference procedures to quantify the uncertainty of resulting estimates. We demonstrate our method in a set of experiments with simulated data, where it performs favorably against the state of the art. An accompanying $\texttt{R}$ package, $\texttt{leakyIV}$, is available from $\texttt{CRAN}$.

Updated: 2024-05-08 09:59:09

Categories: stat.ME,cs.AI

Download: http://arxiv.org/abs/2404.04446v2

Deep-learning-based decomposition of overlapping-sparse images: application at the vertex of neutrino interactions

Image decomposition plays a crucial role in various computer vision tasks, enabling the analysis and manipulation of visual content at a fundamental level. Overlapping images, which occur when multiple objects or scenes partially occlude each other, pose unique challenges for decomposition algorithms. The task intensifies when working with sparse images, where the scarcity of meaningful information complicates the precise extraction of components. This paper presents a solution that leverages the power of deep learning to accurately extract individual objects within multi-dimensional overlapping-sparse images, with a direct application in high-energy physics with decomposition of overlaid elementary particles obtained from imaging detectors. In particular, the proposed approach tackles a highly complex yet unsolved problem: identifying and measuring independent particles at the vertex of neutrino interactions, where one expects to observe detector images with multiple indiscernible overlapping charged particles. By decomposing the image of the detector activity at the vertex through deep learning, it is possible to infer the kinematic parameters of the identified low-momentum particles - which otherwise would remain neglected - and enhance the reconstructed energy resolution of the neutrino event. We also present an additional step - that can be tuned directly on detector data - combining the above method with a fully-differentiable generative model to improve the image decomposition further and, consequently, the resolution of the measured parameters, achieving unprecedented results. This improvement is crucial for precisely measuring the parameters that govern neutrino flavour oscillations and searching for asymmetries between matter and antimatter.

Updated: 2024-05-08 09:50:38

Categories: cs.CV,cs.LG,hep-ex

Download: http://arxiv.org/abs/2310.19695v3

DataSP: A Differential All-to-All Shortest Path Algorithm for Learning Costs and Predicting Paths with Context

Learning the latent costs of transitions on graphs from trajectory demonstrations under various contextual features is challenging but useful for path planning. Yet, existing methods either oversimplify cost assumptions or scale poorly with the number of observed trajectories. This paper introduces DataSP, a differentiable all-to-all shortest path algorithm that facilitates learning latent costs from trajectories. It allows learning from a large number of trajectories in each learning step without additional computation. Complex latent cost functions of contextual features can be represented in the algorithm through a neural network approximation. We further propose a method to sample paths from DataSP in order to reconstruct and mimic the distribution of observed paths. We prove that the inferred distribution follows the maximum entropy principle. We show that DataSP outperforms state-of-the-art differentiable combinatorial solvers and classical machine learning approaches in predicting paths on graphs.
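
The hard min inside a classical all-to-all shortest path routine is what blocks gradients; a standard way to make it differentiable is to replace the min with a softmin. As a rough illustration of that idea only (not the authors' DataSP implementation; the temperature `beta` and the toy graph are invented here), a smoothed Floyd-Warshall might look like:

```python
import numpy as np

def softmin(a, b, beta):
    """Smooth, differentiable stand-in for min(a, b):
    softmin(a, b) = -(1/beta) * log(exp(-beta*a) + exp(-beta*b))."""
    m = np.minimum(a, b)  # subtract the minimum for numerical stability
    return m - np.log(np.exp(-beta * (a - m)) + np.exp(-beta * (b - m))) / beta

def smoothed_floyd_warshall(cost, beta=50.0):
    """All-to-all shortest-path costs with the hard min replaced by softmin,
    so the output matrix is differentiable in the edge costs."""
    D = cost.astype(float).copy()
    n = D.shape[0]
    for k in range(n):
        # Compare going direct (D[i, j]) with routing through node k.
        D = softmin(D, D[:, [k]] + D[[k], :], beta)
    return D

# Toy 4-node graph; a large value stands in for "no direct edge".
INF = 1e3
C = np.array([[0.0, 1.0, 4.0, INF],
              [1.0, 0.0, 2.0, 6.0],
              [4.0, 2.0, 0.0, 3.0],
              [INF, 6.0, 3.0, 0.0]])
D = smoothed_floyd_warshall(C)
# As beta grows, D approaches the exact shortest-path matrix
# (e.g. D[0, 3] is close to 1 + 2 + 3 = 6 via the path 0 -> 1 -> 2 -> 3).
```

With a hard `min` this is exactly Floyd-Warshall; the softmin trades a small bias (shrinking with `beta`) for usable gradients with respect to the edge costs.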

Updated: 2024-05-08 09:45:54

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.04923v1

Fast Computation of Leave-One-Out Cross-Validation for $k$-NN Regression

We describe a fast computation method for leave-one-out cross-validation (LOOCV) for $k$-nearest neighbours ($k$-NN) regression. We show that, under a tie-breaking condition for nearest neighbours, the LOOCV estimate of the mean square error for $k$-NN regression is identical to the mean square error of $(k+1)$-NN regression evaluated on the training data, multiplied by the scaling factor $(k+1)^2/k^2$. Therefore, to compute the LOOCV score, one needs to fit $(k+1)$-NN regression only once, rather than repeating training and validation of $k$-NN regression once per training point. Numerical experiments confirm the validity of the fast computation method.
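
The identity is easy to check numerically: leaving point $i$ out, its $k$ nearest neighbours are exactly the $(k+1)$ nearest neighbours of $i$ in the full data minus $i$ itself, which yields the $(k+1)^2/k^2$ rescaling of the residuals. The sketch below (with invented toy data; `knn_predict` is a hypothetical brute-force helper, not the paper's code) compares brute-force LOOCV against the one-shot shortcut:

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k, exclude_self=False):
    """Plain k-NN regression: average the targets of the k nearest training points."""
    preds = []
    for x in X_query:
        order = np.argsort(np.linalg.norm(X_train - x, axis=1))
        if exclude_self:
            order = order[1:]  # drop the query point itself (distance 0)
        preds.append(y_train[order[:k]].mean())
    return np.array(preds)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)
k = 5

# Brute-force LOOCV: predict each point from its k nearest *other* points.
loocv_mse = np.mean((y - knn_predict(X, y, X, k, exclude_self=True)) ** 2)

# Shortcut: one (k+1)-NN fit on the full data, rescaled by (k+1)^2 / k^2.
insample_mse = np.mean((y - knn_predict(X, y, X, k + 1)) ** 2)
shortcut_mse = insample_mse * (k + 1) ** 2 / k ** 2

assert np.isclose(loocv_mse, shortcut_mse)
```

Continuous features make distance ties vanishingly unlikely here, which plays the role of the paper's tie-breaking condition.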

Updated: 2024-05-08 09:41:25

Categories: stat.ML,cs.DS,cs.LG,stat.CO,stat.ME

Download: http://arxiv.org/abs/2405.04919v1

A Review on Fragment-based De Novo 2D Molecule Generation

In the field of computational molecule generation, an essential task in the discovery of new chemical compounds, fragment-based deep generative models are a leading approach, consistently achieving state-of-the-art results in molecular design benchmarks as of 2023. We present a detailed comparative assessment of their architectures, highlighting their unique approaches to molecular fragmentation and generative modeling. This review also includes comparisons of output quality, generation speed, and the current limitations of specific models. We also highlight promising avenues for future research that could bridge fragment-based models to real-world applications.

Updated: 2024-05-08 09:38:38

Categories: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2405.05293v1

Delve into Base-Novel Confusion: Redundancy Exploration for Few-Shot Class-Incremental Learning

Few-shot class-incremental learning (FSCIL) aims to acquire knowledge from novel classes with limited samples while retaining information about base classes. Existing methods address catastrophic forgetting and overfitting by freezing the feature extractor during novel-class learning. However, these methods usually tend to cause the confusion between base and novel classes, i.e., classifying novel-class samples into base classes. In this paper, we delve into this phenomenon to study its cause and solution. We first interpret the confusion as the collision between the novel-class and the base-class region in the feature space. Then, we find the collision is caused by the label-irrelevant redundancies within the base-class feature and pixel space. Through qualitative and quantitative experiments, we identify this redundancy as the shortcut in the base-class training, which can be decoupled to alleviate the collision. Based on this analysis, to alleviate the collision between base and novel classes, we propose a method for FSCIL named Redundancy Decoupling and Integration (RDI). RDI first decouples redundancies from base-class space to shrink the intra-base-class feature space. Then, it integrates the redundancies as a dummy class to enlarge the inter-base-class feature space. This process effectively compresses the base-class feature space, creating buffer space for novel classes and alleviating the model's confusion between the base and novel classes. Extensive experiments across benchmark datasets, including CIFAR-100, miniImageNet, and CUB-200-2011 demonstrate that our method achieves state-of-the-art performance.

Updated: 2024-05-08 09:38:16

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.04918v1

Verified Neural Compressed Sensing

We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on neural network verification has focused on partial specifications that, even when satisfied, are not sufficient to ensure that a neural network never makes errors. We focus on applying neural network verification to computational tasks with a precise notion of correctness, where a verifiably correct neural network provably solves the task at hand with no caveats. In particular, we develop an approach to train and verify the first provably correct neural networks for compressed sensing, i.e., recovering sparse vectors from a number of measurements smaller than the dimension of the vector. We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements. Furthermore, we show that the complexity of the network (number of neurons/layers) can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.

Updated: 2024-05-08 09:38:15

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.04260v2

Towards Secure Virtual Elections: Multiparty Computation of Order Based Voting Rules

Electronic voting systems are essential for holding virtual elections, and the need for such systems increases due to the COVID-19 pandemic and the social distancing that it mandates. One of the main challenges in e-voting systems is to secure the voting process: namely, to certify that the computed results are consistent with the cast ballots, and that the privacy of the voters is preserved. We propose herein a secure voting protocol for elections that are governed by order-based voting rules. Our protocol offers perfect ballot secrecy, in the sense that it issues only the required output, while no other information on the cast ballots is revealed. Such perfect secrecy, which is achieved by employing secure multiparty computation tools, may increase the voters' confidence and, consequently, encourage them to vote according to their true preferences. Evaluation of the protocol's computational costs establishes that it is lightweight and can be readily implemented in real-life electronic elections.
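
Order-based voting rules aggregate full candidate rankings rather than single choices. For concreteness, here is a plain (non-secure) sketch of one such rule, the Borda count, on hypothetical ballots; the paper's contribution is computing rules of this kind under secure multiparty computation so that individual ballots stay hidden, which this snippet does not attempt:

```python
from collections import defaultdict

def borda(ballots):
    """Borda count, a classic order-based voting rule: with m candidates,
    a ballot awards m-1 points to its top choice, m-2 to the next, and so on."""
    scores = defaultdict(int)
    for ranking in ballots:
        m = len(ranking)
        for pos, cand in enumerate(ranking):
            scores[cand] += m - 1 - pos
    return max(scores, key=scores.get)

# Three full rankings over candidates a, b, c.
ballots = [["a", "b", "c"], ["a", "c", "b"], ["b", "c", "a"]]
winner = borda(ballots)  # a scores 2+2+0=4, b scores 1+0+2=3, c scores 0+1+1=2
```

In the secure protocol, only an output like `winner` would be revealed; the per-ballot rankings above are exactly what perfect ballot secrecy keeps private.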

Updated: 2024-05-08 09:36:56

Categories: cs.CR

Download: http://arxiv.org/abs/2205.10580v5

Learning with Posterior Sampling for Revenue Management under Time-varying Demand

This paper discusses the revenue management (RM) problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries. In particular, the time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managing the remaining inventory and estimating the demand. To tackle this challenge, we first introduce an episodic generalization of the RM problem motivated by typical application scenarios. We then propose a computationally efficient algorithm based on posterior sampling, which effectively optimizes prices by solving linear programming. We derive a Bayesian regret upper bound of this algorithm for general models where demand parameters can be correlated between time periods, while also deriving a regret lower bound for generic algorithms. Our empirical study shows that the proposed algorithm performs better than other benchmark algorithms and comparably to the optimal policy in hindsight. We also propose a heuristic modification of the proposed algorithm, which further efficiently learns the pricing policy in the experiments.

Updated: 2024-05-08 09:28:26

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2405.04910v1

Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Though existing notable efforts have resulted in impressive performance improvements, a gap persists in scene cognition and understanding of complex traffic semantics. This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) without explicit prompt engineering to generate future motion from agents' past/observed trajectories and scene semantics. Traj-LLM starts with sparse context joint coding to dissect the agent and scene features into a form that LLMs understand. On this basis, we innovatively explore LLMs' powerful comprehension abilities to capture a spectrum of high-level scene knowledge and interactive information. Emulating the human-like lane-focus cognitive function and enhancing Traj-LLM's scene comprehension, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module. Finally, a multi-modal Laplace decoder is designed to achieve scene-compliant multi-modal predictions. Extensive experiments show that Traj-LLM, fortified by LLMs' strong prior knowledge and understanding prowess, together with lane-aware probabilistic learning, outstrips state-of-the-art methods across evaluation metrics. Moreover, a few-shot analysis further substantiates Traj-LLM's performance: with just 50% of the dataset, it outperforms the majority of benchmarks that rely on complete data utilization. This study explores equipping the trajectory prediction task with advanced capabilities inherent in LLMs, furnishing a more universal and adaptable solution for forecasting agent motion in a new way.

Updated: 2024-05-08 09:28:04

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2405.04909v1

Imbalanced Graph Classification with Multi-scale Oversampling Graph Neural Networks

One main challenge in imbalanced graph classification is to learn expressive representations of the graphs in under-represented (minority) classes. Existing generic imbalanced learning methods, such as oversampling and imbalanced learning loss functions, can be adopted for enabling graph representation learning models to cope with this challenge. However, these methods often directly operate on the graph representations, ignoring rich discriminative information within the graphs and their interactions. To tackle this issue, we introduce a novel multi-scale oversampling graph neural network (MOSGNN) that learns expressive minority graph representations based on intra- and inter-graph semantics resulting from oversampled graphs at multiple scales - subgraph, graph, and pairwise graphs. It achieves this by jointly optimizing subgraph-level, graph-level, and pairwise-graph learning tasks to learn the discriminative information embedded within and between the minority graphs. Extensive experiments on 16 imbalanced graph datasets show that MOSGNN i) significantly outperforms five state-of-the-art models, and ii) offers a generic framework, in which different advanced imbalanced learning loss functions can be easily plugged in and obtain significantly improved classification performance.

Updated: 2024-05-08 09:16:54

Categories: cs.LG

Download: http://arxiv.org/abs/2405.04903v1

Permutation invariant functions: statistical tests, density estimation, and computationally efficient embedding

Permutation invariance is among the most common symmetries that can be exploited to simplify complex problems in machine learning (ML). There has been a tremendous surge of research activities in building permutation invariant ML architectures. However, less attention is given to: (1) how to statistically test for permutation invariance of coordinates in a random vector where the dimension is allowed to grow with the sample size; (2) how to leverage permutation invariance in estimation problems and how it helps reduce dimensions. In this paper, we take a step back and examine these questions in several fundamental problems: (i) testing the assumption of permutation invariance of multivariate distributions; (ii) estimating permutation invariant densities; (iii) analyzing the metric entropy of permutation invariant function classes and comparing them with their counterparts without imposing permutation invariance; (iv) deriving an embedding of permutation invariant reproducing kernel Hilbert spaces for efficient computation. In particular, our methods for (i) and (iv) are based on a sorting trick and (ii) is based on an averaging trick. These tricks substantially simplify the exploitation of permutation invariance.
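
The sorting trick can be illustrated in a few lines: mapping a vector to its order statistics gives a canonical representative of its permutation orbit, so any function composed with that map is permutation invariant by construction. A toy sketch (not the paper's code):

```python
import numpy as np

def sorted_embedding(x):
    """The 'sorting trick': map x to its order statistics. Any two vectors that
    are permutations of one another map to the same point."""
    return np.sort(x)

def perm_invariant(f):
    """Wrap an arbitrary function of a vector so it becomes permutation invariant."""
    return lambda x: f(sorted_embedding(x))

rng = np.random.default_rng(1)
x = rng.normal(size=6)
x_perm = rng.permutation(x)

# A deliberately order-sensitive function: position-weighted sum.
g = perm_invariant(lambda v: np.dot(v, np.arange(len(v))))
assert np.allclose(g(x), g(x_perm))  # invariant after the sorting trick
```

The same observation underlies the testing and kernel-embedding uses above: two vectors are permutations of each other exactly when their sorted versions coincide, so all computations can be done on the (much smaller) space of sorted vectors.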

Updated: 2024-05-08 09:12:52

Categories: cs.LG,62G07 62G10 62G07 (Primary), 62G08, 62G10 (Secondary)

Download: http://arxiv.org/abs/2403.01671v2

Machine Learning-based NLP for Emotion Classification on a Cholera X Dataset

Recent social media posts on the cholera outbreak in Hammanskraal have highlighted the diverse range of emotions people experienced in response to such an event. The extent of people's opinions varies greatly depending on their level of knowledge and information about the disease. The documented research about Cholera lacks investigations into the classification of emotions. This study aims to examine the emotions expressed in social media posts about Cholera. A dataset of 23,000 posts was extracted and pre-processed. The Python Natural Language Toolkit (NLTK) sentiment analyzer library was applied to determine the emotional significance of each text. Additionally, Machine Learning (ML) models were applied for emotion classification, including Long short-term memory (LSTM), Logistic regression, Decision trees, and the Bidirectional Encoder Representations from Transformers (BERT) model. The results of this study demonstrated that LSTM achieved the highest accuracy of 75%. Emotion classification presents a promising tool for gaining a deeper understanding of the impact of Cholera on society. The findings of this study might contribute to the development of effective interventions in public health strategies.

Updated: 2024-05-08 09:05:02

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.04897v1

LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and Opportunities

As large language models (LLMs) continue to make significant strides, their better integration into agent-based simulations offers a transformational potential for understanding complex social systems. However, such integration is not trivial and poses numerous challenges. Based on this observation, in this paper, we explore architectures and methods to systematically develop LLM-augmented social simulations and discuss potential research directions in this field. We conclude that integrating LLMs with agent-based simulations offers a powerful toolset for researchers and scientists, allowing for more nuanced, realistic, and comprehensive models of complex systems and human behaviours.

Updated: 2024-05-08 08:57:54

Categories: physics.soc-ph,cs.AI

Download: http://arxiv.org/abs/2405.06700v1

Lattice-preserving $\mathcal{ALC}$ ontology embeddings

Generating vector representations (embeddings) of OWL ontologies is a growing task due to its applications in predicting missing facts and knowledge-enhanced learning in fields such as bioinformatics. The underlying semantics of OWL ontologies is expressed using Description Logics (DLs). Initial approaches to generate embeddings relied on constructing a graph out of ontologies, neglecting the semantics of the logic therein. Recent semantic-preserving embedding methods often target lightweight DL languages like $\mathcal{EL}^{++}$, ignoring more expressive information in ontologies. Although some approaches aim to embed more descriptive DLs like $\mathcal{ALC}$, those methods require the existence of individuals, while many real-world ontologies are devoid of them. We propose an ontology embedding method for the $\mathcal{ALC}$ DL language that considers the lattice structure of concept descriptions. We use connections between DL and Category Theory to materialize the lattice structure and embed it using an order-preserving embedding method. We show that our method outperforms state-of-the-art methods in several knowledge base completion tasks. We make our code and data available at https://github.com/bio-ontology-research-group/catE.

Updated: 2024-05-08 08:57:15

Categories: cs.LO,cs.AI

Download: http://arxiv.org/abs/2305.07163v2

A fuzzy reward and punishment scheme for vehicular ad hoc networks

Trust management is an important security approach for the successful implementation of Vehicular Ad Hoc Networks (VANETs). Trust models evaluate messages to assign reward or punishment. This can be used to influence a driver's future behaviour. In the author's previous work, a sender side based trust management framework is developed which avoids the receiver evaluation of messages. However, this does not guarantee that a trusted driver will not lie. These "untrue attacks" are resolved by the RSUs using collaboration to rule on a dispute, providing a fixed amount of reward and punishment. The lack of sophistication is addressed in this paper with a novel fuzzy RSU controller considering the severity of incident, driver past behaviour, and RSU confidence to determine the reward or punishment for the conflicted drivers. Although any driver can lie in any situation, it is expected that trustworthy drivers are more likely to remain so, and vice versa. This behaviour is captured in a Markov chain model for sender and reporter drivers where their lying characteristics depend on trust score and trust state. Each trust state defines the driver's likelihood of lying using different probability distribution. An extensive simulation is performed to evaluate the performance of the fuzzy assessment and examine the Markov chain driver behaviour model with changing the initial trust score of all or some drivers in Veins simulator. The fuzzy and the fixed RSU assessment schemes are compared, and the result shows that the fuzzy scheme can encourage drivers to improve their behaviour.

Updated: 2024-05-08 08:55:39

Categories: cs.CR

Download: http://arxiv.org/abs/2405.04892v1

Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

Recent advances in natural language processing (NLP) can be largely attributed to the advent of pre-trained language models such as BERT and RoBERTa. While these models demonstrate remarkable performance on general datasets, they can struggle in specialized domains such as medicine, where unique domain-specific terminologies, domain-specific abbreviations, and varying document structures are common. This paper explores strategies for adapting these models to domain-specific requirements, primarily through continuous pre-training on domain-specific data. We pre-trained several German medical language models on 2.4B tokens derived from translated public English medical data and 3B tokens of German clinical data. The resulting models were evaluated on various German downstream tasks, including named entity recognition (NER), multi-label classification, and extractive question answering. Our results suggest that models augmented by clinical and translation-based pre-training typically outperform general domain models in medical contexts. We conclude that continuous pre-training has demonstrated the ability to match or even exceed the performance of clinical models trained from scratch. Furthermore, pre-training on clinical data or leveraging translated texts have proven to be reliable methods for domain adaptation in medical NLP tasks.

Updated: 2024-05-08 08:53:53

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.05694v2

GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

For autonomous robotics applications, it is crucial that robots are able to accurately measure their potential state and perceive their environment, including other agents within it (e.g., cobots interacting with humans). The redundancy of these measurements is important, as it allows for planning and execution of recovery protocols in the event of sensor failure or external disturbances. Visual estimation can provide this redundancy through the use of low-cost sensors and serve as a standalone source of proprioception when no encoder-based sensing is available. Therefore, we estimate the configuration of the robot jointly with its pose, which provides a complete spatial understanding of the observed robot. We present GISR - a method for deep configuration and robot-to-camera pose estimation that prioritizes real-time execution. GISR is comprised of two modules: (i) a geometric initialization module, efficiently computing an approximate robot pose and configuration, and (ii) an iterative silhouette-based refinement module that refines the initial solution in only a few iterations. We evaluate our method on a publicly available dataset and show that GISR performs competitively with existing state-of-the-art approaches, while being significantly faster compared to existing methods of the same class. Our code is available at https://github.com/iwhitey/GISR-robot.

Updated: 2024-05-08 08:39:25

Categories: cs.RO,cs.AI,cs.CV

Download: http://arxiv.org/abs/2405.04890v1

A trust management framework for vehicular ad hoc networks

Vehicular Ad Hoc Networks (VANETs) enable road users and public infrastructure to share information that improves the operation of roads and driver experience. However, these are vulnerable to poorly behaved authorized users. Trust management is used to address attacks from authorized users in accordance with their trust score. By removing the dissemination of trust metrics in the validation process, communication overhead and response time are lowered. In this paper, we propose a new Tamper-Proof Device (TPD) based trust management framework for controlling trust at the sender side vehicle that regulates driver behaviour. Moreover, the dissemination of feedback is only required when there is conflicting information in the VANET. If a conflict arises, the Road-Side Unit (RSU) decides, using the weighted voting system, whether the originator is to be believed, or not. The framework is evaluated against a centralized reputation approach and the results demonstrate that it outperforms the latter.
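
As a toy illustration of the weighted voting idea (the scores, threshold, and data here are invented, not the paper's scheme), an RSU might resolve a conflict by summing the trust scores of the witnesses on each side of the dispute:

```python
def weighted_vote(reports):
    """Toy weighted-vote dispute resolution: each report is a pair
    (trust_score, supports_sender); the RSU believes whichever side
    carries more total trust weight."""
    support = sum(w for w, s in reports if s)
    against = sum(w for w, s in reports if not s)
    return support >= against

# Three low-trust witnesses back the sender, one high-trust witness disputes it.
reports = [(0.2, True), (0.3, True), (0.1, True), (0.9, False)]
believed = weighted_vote(reports)  # 0.6 vs 0.9: the sender is not believed
```

The point of weighting by trust is visible in the example: a single highly trusted witness can outweigh several untrusted ones, which a plain head count would not allow.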

Updated: 2024-05-08 08:35:48

Categories: cs.CR

Download: http://arxiv.org/abs/2405.04885v1

Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion

Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces. In this work, we propose Molecule-Space, an idea that treats multimodal representation spaces as "molecules", and augments pre-trained unified space by integrating knowledge from extra expert spaces via "molecules space reactions". Specifically, we introduce two kinds of basic space reactions: 1) Space Displacement Reaction and 2) Space Combination Reaction. Based on these defined basic reactions, we design Complex Sequential & Parallel Reactions to effectively integrate multiple spaces simultaneously. Benefiting from the modularization concept, we further propose a coarse-to-fine customized inference strategy to flexibly adjust the enhanced unified space for different purposes. Experimentally, we fuse the audio-image-text space of ImageBind with the image-text and audio-text expert spaces. The resulting space outperforms ImageBind on 5 downstream tasks across 9 datasets. Moreover, via customized inference, it even surpasses the used image-text and audio-text expert spaces.

Updated: 2024-05-08 08:32:34

标题: 分子空间:通过知识融合在统一的多模态空间中获得免费午餐

摘要: 统一的多模型表示空间是多模态理解和生成的基础。然而,数十亿个模型参数和灾难性遗忘问题使进一步增强预训练的统一空间变得具有挑战性。在这项工作中,我们提出了分子空间(Molecule-Space)的概念,将多模态表示空间视为“分子”,并通过“分子空间反应”集成来自额外专家空间的知识来增强预训练的统一空间。具体来说,我们引入了两种基本空间反应:1)空间位移反应和2)空间组合反应。基于这些定义的基本反应,我们设计了复杂的顺序和并行反应,以有效地同时集成多个空间。受益于模块化概念,我们进一步提出了一种由粗到细的定制推理策略,以灵活调整增强的统一空间以适应不同的目的。在实验中,我们将ImageBind的音频-图像-文本空间与图像-文本和音频-文本专家空间融合。结果空间在9个数据集的5个下游任务中表现优于ImageBind。而且,通过定制推理,它甚至超越了使用的图像-文本和音频-文本专家空间。

更新时间: 2024-05-08 08:32:34

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.04883v1

Gödel Number based Clustering Algorithm with Decimal First Degree Cellular Automata

In this paper, a decimal first degree cellular automata (FDCA) based clustering algorithm is proposed in which clusters are created based on reachability. Cyclic spaces are created, and configurations that lie in the same cycle are treated as the same cluster. Here, real-life data objects are encoded into decimal strings using G\"odel number based encoding. The benefit of the scheme is that it reduces the encoded string length while preserving the feature properties. Candidate CA rules are identified based on theoretical criteria such as self-replication and information flow. An iterative algorithm is developed to generate the desired number of clusters over three stages. The results of the clustering are evaluated using benchmark clustering metrics such as the Silhouette score, the Davies-Bouldin index, the Calinski-Harabasz index, and the Dunn index. In comparison with existing state-of-the-art clustering algorithms, our proposed algorithm gives better performance.

Updated: 2024-05-08 08:30:34

标题: 基于哥德尔数的十进制一阶元胞自动机聚类算法

摘要: 本文提出了一种基于十进制一阶细胞自动机(FDCA)的聚类算法,其中基于可达性创建聚类。循环空间被创建,处于同一周期的配置被视为同一聚类。在这里,真实数据对象被编码为十进制字符串,使用Gödel编码。该方案的优点是,它减少了编码字符串的长度,同时保持了特征属性。候选CA规则是基于一些理论标准(如自我复制和信息流)确定的。开发了一种迭代算法,以在三个阶段生成所需数量的聚类。根据基准聚类度量标准(如轮廓分数、戴维斯-宝丁指数、卡林斯基-哈拉巴斯指数和邓恩指数)评估了聚类的结果。与现有最先进的聚类算法相比,我们提出的算法表现更好。

更新时间: 2024-05-08 08:30:34

领域: cs.FL,cs.ET,cs.LG

下载: http://arxiv.org/abs/2405.04881v1

The Codecfake Dataset and Countermeasures for the Universal Detection of Deepfake Audio

With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for effective detection methods. Unlike traditional deepfake audio generation, which often involves multi-step processes culminating in vocoder usage, ALM directly utilizes neural codec methods to decode discrete codes into audio. Moreover, driven by large-scale data, ALMs exhibit remarkable robustness and versatility, posing a significant challenge to current audio deepfake detection (ADD) models. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method: the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source large-scale dataset, including two languages, millions of audio samples, and various test conditions, tailored for ALM-based audio detection. Additionally, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn domain-balanced and generalized minima. Experimental results demonstrate that co-training on the Codecfake dataset and the vocoded dataset with the CSAM strategy yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models.
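The Equal Error Rate quoted above is the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch of its computation from detector scores:

```python
# Hedged sketch of EER: sweep thresholds over the detector's scores and
# report the error rate where false acceptance (real audio flagged as
# fake) equals false rejection (fake audio passed as real).
import numpy as np

def eer(scores, labels):
    """scores: higher = more likely fake; labels: 1 = fake, 0 = real."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    best_gap, best_eer = float("inf"), 1.0
    for t in np.unique(scores):
        far = float(np.mean(scores[labels == 0] >= t))  # false acceptance
        frr = float(np.mean(scores[labels == 1] < t))   # false rejection
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

print(eer([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 0.0 (perfect separation)
```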

Updated: 2024-05-08 08:28:40

标题: 《Codecfake数据集及用于普遍检测Deepfake音频的对策》

摘要: 随着基于音频语言模型(ALM)的deepfake音频的广泛传播,迫切需要有效的检测方法。与传统的deepfake音频生成不同,后者通常涉及多步骤过程,最终使用声码器,ALM直接利用神经编解码器方法将离散编码解码为音频。此外,由大规模数据驱动,ALM表现出卓越的鲁棒性和多功能性,给当前音频deepfake检测(ADD)模型带来了重大挑战。为了有效检测基于ALM的deepfake音频,我们关注ALM音频生成方法的机制,即从神经编解码器转换为波形。我们最初构建了Codecfake数据集,这是一个开源的大规模数据集,包括两种语言、数百万音频样本和各种测试条件,专门针对基于ALM的音频检测。此外,为了实现对deepfake音频的普遍检测并解决原始SAM的领域升级偏差问题,我们提出了CSAM策略,学习一个领域平衡和泛化最小值。实验结果表明,在Codecfake数据集和声码器数据集上进行联合训练,并采用CSAM策略,与基准模型相比,在所有测试条件下平均等误差率(EER)为0.616%,达到了最低水平。

更新时间: 2024-05-08 08:28:40

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2405.04880v1

The Need Of Trustworthy Announcements To Achieve Driving Comfort

Intelligent Transport Systems (ITS) are in growing demand nowadays, and they can be realised by deploying Vehicular Ad Hoc Networks (VANETs). Vehicles and Roadside Units (RSUs) exchange traffic events. Malicious drivers generate false events, so they need to be identified to maintain trustworthy communication. When an authorised user acts maliciously, the security scheme typically fails; a trust model, however, can isolate false messages. In this paper, the significance of trustworthy announcements for VANETs is analysed. To this end, a series of experiments is conducted in Veins to illustrate how the trustworthiness of announcements affects travel time. A traffic scenario is created in which vehicles detour to an alternate route upon an announcement from the leading vehicle. Both true and false announcements are considered. Results confirm that false announcements, as well as refraining from announcements, increase travel time, whereas travel time is reduced with trustworthy announcements. From this analysis, it can be concluded that trustworthy announcements facilitate driver comfort.

Updated: 2024-05-08 08:23:25

标题: 实现驾驶舒适需要可靠的公告

摘要: 智能交通系统(ITS)目前更加需要,并且可以通过部署车辆自组网(VANETs)来实现。车辆和路边单元(RSUs)交换交通事件。恶意驾驶员生成虚假事件。因此,需要识别他们以保持可信通信。当授权用户恶意行为时,安全方案通常会失败。然而,信任模型可以隔离虚假消息。本文分析了对VANETs的可信公告的重要性。为此,在Veins中进行了一系列实验,以说明公告的可信度如何影响行车时间。创建了一个交通场景,在该场景中,车辆根据领先车辆的公告绕行到另一条路线。考虑了真实和虚假公告。结果证实,虚假公告和不发出公告会增加行车时间。但是,通过可信的公告,行车时间会减少。从这个分析可以得出结论,可信的公告有助于提高驾驶员的舒适度。

更新时间: 2024-05-08 08:23:25

领域: cs.CR

下载: http://arxiv.org/abs/2405.04878v1

Smart Portable Computer

Amidst the COVID-19 pandemic, with many organizations, schools, colleges, and universities transitioning to virtual platforms, students encountered difficulties in acquiring PCs such as desktops or laptops. Entry-level machines, starting at around 15,000 INR, often failed to offer adequate system specifications, posing a challenge for consumers. Additionally, those reliant on laptops for work found the conventional approach cumbersome. Enter the "Portable Smart Computer," a leap into the future of computing. This innovative device boasts speed and performance comparable to traditional desktops, but in a compact, energy-efficient, and cost-effective package. It delivers a seamless desktop experience, whether one is editing documents, browsing multiple tabs, managing spreadsheets, or creating presentations. Moreover, it supports programming languages such as Python, C, and C++, as well as toolchains such as Keil and Xilinx, catering to the needs of programmers.

Updated: 2024-05-08 08:20:27

标题: 智能便携式计算机

摘要: 在COVID-19大流行期间,许多组织、学校、学院和大学转向虚拟平台,学生们在获取台式机或笔记本电脑等PC方面遇到了困难。起步价格约为15,000卢比,通常无法提供足够的系统规格,给消费者带来了挑战。此外,那些依赖笔记本电脑工作的人发现传统方法繁琐。推出了“便携式智能计算机”,这是计算机未来的一大飞跃。这种创新设备具有速度和性能,可与传统台式机相媲美,但体积小巧、节能高效、价格实惠。它提供了无缝的桌面体验,无论是编辑文件、浏览多个标签页、管理电子表格还是创建演示文稿。此外,它还支持Python、C、C++等编程语言,以及Keil和Xilinx等编译器,满足程序员的需求。

更新时间: 2024-05-08 08:20:27

领域: cs.HC,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.05292v1

SCALA: Split Federated Learning with Concatenated Activations and Logit Adjustments

Split Federated Learning (SFL) is a distributed machine learning framework which strategically divides the learning process between a server and clients and collaboratively trains a shared model by aggregating local models updated based on data from distributed clients. However, data heterogeneity and partial client participation result in label distribution skew, which severely degrades the learning performance. To address this issue, we propose SFL with Concatenated Activations and Logit Adjustments (SCALA). Specifically, the activations from the client-side models are concatenated as the input of the server-side model so as to centrally adjust label distribution across different clients, and logit adjustments of loss functions on both server-side and client-side models are performed to deal with the label distribution variation across different subsets of participating clients. Theoretical analysis and experimental results verify the superiority of the proposed SCALA on public datasets.
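The logit-adjustment component of SCALA can be made concrete with a standard standalone sketch: shift each class logit by the log of the label prior of the current client subset before the softmax, so skewed label distributions stop dominating the loss. Where exactly SCALA applies this on the server and client sides follows the paper; the version below is illustrative only, and the temperature `tau` is an assumption.

```python
# Hedged sketch of logit adjustment: add tau * log(prior) to the logits,
# then take a numerically stable log-softmax.
import numpy as np

def logit_adjusted_log_probs(logits, label_counts, tau=1.0):
    prior = np.asarray(label_counts, float)
    prior = prior / prior.sum()                    # empirical label prior
    adjusted = np.asarray(logits, float) + tau * np.log(prior)
    adjusted -= adjusted.max()                     # numerical stability
    return adjusted - np.log(np.exp(adjusted).sum())  # log-softmax

# Uninformative logits on a 9:1 skewed client subset follow the prior.
probs = np.exp(logit_adjusted_log_probs([0.0, 0.0], [9, 1]))
print(probs.round(2))  # [0.9 0.1]
```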

Updated: 2024-05-08 08:12:21

标题: SCALA:使用连接激活和logit调整的分裂式联邦学习

摘要: 分裂联邦学习(SFL)是一种分布式机器学习框架,它在服务器和客户端之间战略性地分割学习过程,并通过聚合基于分布式客户端数据更新的本地模型来协同训练共享模型。然而,数据异质性和部分客户参与导致标签分布倾斜,严重降低了学习性能。为了解决这个问题,我们提出了具有连接激活和对数调整(SCALA)的SFL。具体地,客户端模型的激活被连接为服务器端模型的输入,以便在不同客户端之间集中调整标签分布,并对服务器端和客户端模型上的损失函数进行对数调整,以处理不同参与客户端子集之间的标签分布变化。理论分析和实验结果验证了提出的SCALA在公共数据集上的优越性。

更新时间: 2024-05-08 08:12:21

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.04875v1

Learning to Detect Critical Nodes in Sparse Graphs via Feature Importance Awareness

Detecting critical nodes in sparse graphs is important in a variety of application domains, such as network vulnerability assessment, epidemic control, and drug design. The critical node problem (CNP) aims to find a set of critical nodes in a network whose deletion maximally degrades the pairwise connectivity of the residual network. Due to its general NP-hard nature, state-of-the-art CNP solutions are based on heuristic approaches. Domain knowledge and trial-and-error are usually required when designing such approaches, consuming considerable effort and time. This work proposes a feature importance-aware graph attention network for node representation and combines it with a dueling double deep Q-network to create, for the first time, an end-to-end algorithm for solving CNP. It does not need any problem-specific knowledge or labeled datasets, as required by most existing methods. Once the model is trained, it can be generalized to cope with various types of CNPs (with different sizes and topological structures) without re-training. Computational experiments on 28 real-world networks show that the proposed method is highly comparable to state-of-the-art methods. Because it requires no problem-specific knowledge, it is applicable to many settings, including those that are infeasible with existing approaches. It can also be combined with local search methods to further improve solution quality. Extensive comparison results demonstrate its effectiveness in solving CNP.
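The CNP objective above, pairwise connectivity of the residual graph after deleting a candidate node set, is the sum of |S|(|S|-1)/2 over connected components S. A minimal sketch:

```python
# Hedged sketch of the CNP objective: delete a node set, find the
# connected components of what remains, and sum the connected pairs.

def pairwise_connectivity(n, edges, deleted):
    alive = set(range(n)) - set(deleted)
    adj = {v: set() for v in alive}
    for u, v in edges:
        if u in alive and v in alive:
            adj[u].add(v)
            adj[v].add(u)
    seen, total = set(), 0
    for s in alive:                      # DFS over each component
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += size * (size - 1) // 2  # connected pairs in this component
    return total

# 5-node path 0-1-2-3-4: deleting the middle node leaves two pairs.
print(pairwise_connectivity(5, [(0, 1), (1, 2), (2, 3), (3, 4)], {2}))  # 2
```

A learned CNP solver scores candidate deletions by how far this quantity drops; the learning components (graph attention, dueling DQN) sit on top of exactly this objective.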

Updated: 2024-05-08 08:11:58

标题: 学习通过特征重要性认知在稀疏图中检测关键节点

摘要: 在稀疏图中检测关键节点在各种应用领域中至关重要,例如网络脆弱性评估、疫情控制和药物设计。关键节点问题(CNP)旨在从一个网络中找到一组关键节点,其删除会最大程度地降低剩余网络的成对连接性。由于其一般的NP难性质,当前最先进的CNP解决方案都基于启发式方法。在设计这些方法时通常需要领域知识和反复尝试,因此消耗了大量的精力和时间。本文提出了一种基于特征重要性的图注意力网络用于节点表示,并将其与对抗双深度Q网络相结合,创建了一个端到端算法首次解决CNP。它不需要任何特定问题的知识或标记数据集,这是大多数现有方法所要求的。一旦模型训练完成,它可以推广到处理各种类型的CNP(具有不同大小和拓扑结构),而无需重新训练。对28个真实网络的计算实验表明,所提出的方法与最先进的方法具有高度可比性。它不需要任何特定问题的知识,因此可以适用于许多应用,包括那些使用现有方法无法实现的应用。它可以与一些局部搜索方法结合,进一步提高其解决方案质量。给出了广泛的比较结果,以展示其在解决CNP方面的有效性。

更新时间: 2024-05-08 08:11:58

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2112.03404v2

Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities

Critical National Infrastructure (CNI) encompasses a nation's essential assets that are fundamental to the operation of society and the economy, ensuring the provision of vital utilities such as energy, water, transportation, and communication. Nevertheless, growing cybersecurity threats targeting these infrastructures can potentially interfere with operations and seriously risk national security and public safety. In this paper, we examine the intricate issues raised by cybersecurity risks to vital infrastructure, highlighting these systems' vulnerability to different types of cyberattacks. We analyse the significance of trust, privacy, and resilience for Critical Infrastructure Protection (CIP), examining the diverse standards and regulations to manage these domains. We also scrutinise the co-analysis of safety and security, offering innovative approaches for their integration and emphasising the interdependence between these fields. Furthermore, we introduce a comprehensive method for CIP leveraging Generative AI and Large Language Models (LLMs), giving a tailored lifecycle and discussing specific applications across different critical infrastructure sectors. Lastly, we discuss potential future directions that promise to enhance the security and resilience of critical infrastructures. This paper proposes innovative strategies for CIP from evolving attacks and enhances comprehension of cybersecurity concerns related to critical infrastructure.

Updated: 2024-05-08 08:08:50

标题: 关键基础设施保护:生成式人工智能、挑战和机遇

摘要: Critical National Infrastructure (CNI) 涵盖了一个国家的基本资产,这些资产对社会和经济的运作至关重要,确保提供关键公共设施,如能源、水、交通和通信。然而,针对这些基础设施的日益增长的网络安全威胁可能会干扰运营,并严重危及国家安全和公共安全。在本文中,我们审视了网络安全风险对关键基础设施带来的复杂问题,突显这些系统对不同类型网络攻击的脆弱性。我们分析了信任、隐私和弹性对关键基础设施保护(CIP)的重要性,考察了管理这些领域的多样标准和规定。我们还审查了安全和安全性的协同分析,提供了它们整合的创新方法,并强调了这些领域之间的相互依存关系。此外,我们引入了一种利用生成式人工智能和大型语言模型(LLMs)的综合CIP方法,提供了定制的生命周期,并讨论了在不同关键基础设施领域的具体应用。最后,我们讨论了有望增强关键基础设施安全性和弹性的潜在未来方向。本文提出了创新的CIP策略,以应对不断发展的攻击,并加深了对与关键基础设施相关的网络安全问题的理解。

更新时间: 2024-05-08 08:08:50

领域: cs.CR

下载: http://arxiv.org/abs/2405.04874v1

Logical Negation Augmenting and Debiasing for Prompt-based Methods

Prompt-based methods have gained increasing attention in NLP and have proven effective on many downstream tasks. Many works have focused on mining these methods' potential for knowledge extraction, but few explore their ability to perform logical reasoning. In this work, we focus on the effectiveness of prompt-based methods for first-order logical reasoning and find that the bottleneck lies in logical negation. Based on our analysis, logical negation tends to result in spurious correlations with negative answers, while propositions without logical negation correlate with positive answers. To solve this problem, we propose a simple but effective method, Negation Augmenting and Negation Debiasing (NAND), which introduces negative propositions into prompt-based methods without updating parameters. Specifically, these negative propositions counteract spurious correlations by providing "not" for all instances, so that models cannot make decisions solely on whether an expression contains a logical negation. Experiments on three datasets show that NAND not only solves the problem of calibrating logical negation but also significantly enhances prompt-based logical reasoning without model retraining.
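The augmentation idea, ensuring every instance is accompanied by a negated counterpart so "not" carries no signal on its own, can be sketched with a deliberately naive string template. The template is my illustrative assumption; NAND constructs logically negated propositions, not mere string edits.

```python
# Hedged sketch of negation augmentation: pair each proposition with a
# negated form so the mere presence of "not" cannot predict the answer.

def augment_with_negation(propositions):
    out = []
    for p in propositions:
        out.append(p)
        out.append(p.replace(" is ", " is not ", 1))  # toy negation template
    return out

print(augment_with_negation(["Socrates is mortal"]))
# ['Socrates is mortal', 'Socrates is not mortal']
```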

Updated: 2024-05-08 08:05:47

标题: 逻辑否定增强和去偏倚对于基于提示的方法

摘要: 基于提示的方法在自然语言处理领域越来越受到关注,并在许多下游任务中显示出有效性。许多研究关注于挖掘这些方法在知识提取方面的潜力,但很少有人探索它们进行逻辑推理的能力。在这项工作中,我们关注基于提示的方法在一阶逻辑推理上的有效性,并发现瓶颈在于逻辑否定。根据我们的分析,逻辑否定往往会导致对负答案的虚假相关性,而没有逻辑否定的命题则会与正答案相关。为了解决这个问题,我们提出了一种简单但有效的方法,即否定增强和否定去偏差(NAND),它在不更新参数的情况下向基于提示的方法引入负命题。具体来说,这些负命题可以通过为所有实例提供“不”来抵消虚假相关性,从而使模型不能仅通过表达是否包含逻辑否定来做出决策。对三个数据集的实验结果表明,NAND不仅解决了逻辑否定校准的问题,还显著增强了基于提示的逻辑推理方法,而无需重新训练模型。

更新时间: 2024-05-08 08:05:47

领域: cs.CL,cs.AI,cs.LO

下载: http://arxiv.org/abs/2405.04872v1

Enhancing Geometric Ontology Embeddings for $\mathcal{EL}^{++}$ with Negative Sampling and Deductive Closure Filtering

Ontology embeddings map classes, relations, and individuals in ontologies into $\mathbb{R}^n$, where similarity between entities can be computed or new axioms inferred. For ontologies in the Description Logic $\mathcal{EL}^{++}$, several embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations: they do not distinguish between statements that are unprovable and statements that are provably false, and therefore they may use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for $\mathcal{EL}^{++}$ ontologies based on a high-dimensional ball representation of concept descriptions, incorporating several modifications that aim to make use of the ontology's deductive closure. In particular, we designed novel negative losses that account for both the deductive closure and different types of negatives. We demonstrate that our embedding methods improve over the baseline ontology embedding on the task of knowledge base or ontology completion.
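The ball representation mentioned above encodes each class as an n-ball (center, radius); a subsumption C ⊑ D holds geometrically when C's ball lies inside D's. A minimal sketch of the containment loss, in the style of ELEmbeddings-type methods (the margin term is an assumption):

```python
# Hedged sketch of the n-ball subsumption loss: penalize C ⊑ D unless
# the ball of C fits entirely inside the ball of D.
import numpy as np

def subsumption_loss(c_center, c_radius, d_center, d_radius, gamma=0.0):
    dist = np.linalg.norm(np.asarray(c_center, float) - np.asarray(d_center, float))
    # Containment holds iff dist + r_C <= r_D; hinge on the violation.
    return max(0.0, dist + c_radius - d_radius + gamma)

print(subsumption_loss([0.0, 0.0], 1.0, [0.0, 0.0], 2.0))  # 0.0 (contained)
print(subsumption_loss([3.0, 0.0], 1.0, [0.0, 0.0], 2.0))  # 2.0 (violated)
```

Negative sampling then drives this loss up for axioms taken to be false; the paper's contribution is choosing those negatives outside the deductive closure rather than merely outside the asserted axioms.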

Updated: 2024-05-08 07:50:21

标题: 使用负采样和演绎闭包过滤增强$\mathcal{EL}^{++}$的几何本体嵌入

摘要: 本体嵌入将本体中的类、关系和个体映射到$\mathbb{R}^n$中,在$\mathbb{R}^n$内可以计算实体之间的相似性或推断新的公理。对于描述逻辑$\mathcal{EL}^{++}$中的本体,已经开发了几种嵌入方法,明确生成本体的模型。然而,这些方法存在一些局限性;它们不能区分不可证明和可以证明为假的陈述,因此它们可能将推出的陈述用作否定。此外,它们不利用本体的演绎闭包来识别被推断但未断言的陈述。我们评估了一组基于概念描述的高维球表示的$\mathcal{EL}^{++}$本体的嵌入方法,结合了几种旨在利用本体演绎闭包的修改。特别是,我们设计了新颖的负损失,既考虑演绎闭包又考虑不同类型的否定。我们证明了我们的嵌入方法在知识库或本体补全任务中优于基线本体嵌入。

更新时间: 2024-05-08 07:50:21

领域: cs.AI

下载: http://arxiv.org/abs/2405.04868v1

Systematic review, analysis, and characterisation of malicious industrial network traffic datasets for aiding Machine Learning algorithm performance testing

The adoption of the Industrial Internet of Things (IIoT) as a complementary technology to Operational Technology (OT) has enabled a new level of standardised data access and process visibility. This convergence of Information Technology (IT), OT, and IIoT has also created new cybersecurity vulnerabilities and risks that must be managed. Artificial Intelligence (AI) is emerging as a powerful tool to monitor OT/IIoT networks for malicious activity and is a highly active area of research. AI researchers are applying advanced Machine Learning (ML) and Deep Learning (DL) techniques to the detection of anomalous or malicious activity in network traffic. They typically use datasets derived from IoT/IIoT/OT network traffic captures to measure the performance of their proposed approaches. Therefore, there is a widespread need for datasets for algorithm testing. This work systematically reviews publicly available network traffic capture-based datasets, including categorisation of contained attack types, review of metadata, and statistical as well as complexity analysis. Each dataset is analysed to provide researchers with metadata that can be used to select the best dataset for their research question. This results in an added benefit to the community as researchers can select the best dataset for their research more easily and according to their specific Machine Learning goals.

Updated: 2024-05-08 07:48:40

标题: 系统性审查、分析和特征化恶意工业网络流量数据集,以帮助机器学习算法性能测试

摘要: 工业物联网(IIoT)作为操作技术(OT)的补充技术的采用,实现了一种新的标准化数据访问和流程可见性水平。信息技术(IT)、OT和IIoT的融合也产生了新的网络安全漏洞和风险,必须加以管理。人工智能(AI)正逐渐成为监视OT/IIoT网络恶意活动的强大工具,并且是一个高度活跃的研究领域。AI研究人员将先进的机器学习(ML)和深度学习(DL)技术应用于网络流量中的异常或恶意活动检测。他们通常使用从物联网/IIoT/OT网络流量捕获中导出的数据集来衡量其提出方法的性能。因此,对于算法测试存在广泛的数据集需求。本研究系统地审查了公开可用的基于网络流量捕获的数据集,包括包含的攻击类型的分类、元数据的审查以及统计和复杂性分析。对每个数据集进行分析,以为研究人员提供可用于选择最佳数据集的元数据,从而使研究人员更容易选择最佳数据集,根据其具体的机器学习目标。这将使社区获益,因为研究人员可以更轻松地根据其研究问题选择最佳数据集。

更新时间: 2024-05-08 07:48:40

领域: cs.CR

下载: http://arxiv.org/abs/2405.04866v1

TrafficGPT: Towards Multi-Scale Traffic Analysis and Generation with Spatial-Temporal Agent Framework

Precise multi-scale traffic prediction is a ubiquitous challenge of the urbanization process for car owners, road administrators, and governments. In complex road networks, current and past traffic information from both upstream and downstream roads is crucial, since different road networks carry different semantic information about traffic. Rational utilization of this semantic information enables short-term, long-term, and unseen-road traffic prediction. As demand for multi-scale traffic analysis increases, on-demand interactions and visualizations are expected to be available to transportation participants. We have designed a multi-scale traffic generation system, namely TrafficGPT, using three AI agents to process multi-scale traffic data, conduct multi-scale traffic analysis, and present multi-scale visualization results. TrafficGPT consists of three essential AI agents: 1) a text-to-demand agent that employs Question & Answer AI to interact with users and extract prediction tasks from text; 2) a traffic prediction agent that leverages multi-scale traffic data to generate temporal features and similarity, and fuses them with limited spatial features and similarity to achieve accurate prediction of three tasks; and 3) a suggestion and visualization agent that uses the prediction results to generate suggestions and visualizations, providing users with a comprehensive understanding of traffic conditions. Our TrafficGPT system focuses on addressing transportation participants' concerns about traffic prediction, and we conducted extensive experiments on five real-world road datasets to demonstrate its superior predictive and interactive performance.

Updated: 2024-05-08 07:48:40

标题: TrafficGPT:基于时空代理框架的多尺度交通分析与生成

摘要: 多尺度交通精确预测是城市化进程中对车主、道路管理者和政府普遍面临的挑战。在复杂道路网络的情况下,来自上游和下游道路的当前和历史交通信息至关重要,因为各种道路网络对交通具有不同的语义信息。合理利用语义信息可以实现短期、长期和未知道路交通预测。随着多尺度交通分析需求的增加,预计将提供按需交互和可视化功能给交通参与者。我们设计了一个多尺度交通生成系统,即TrafficGPT,使用三个人工智能代理处理多尺度交通数据,进行多尺度交通分析,并呈现多尺度可视化结果。TrafficGPT包括三个基本人工智能代理:1)一个文本到需求代理,与问答人工智能一起与用户互动,并通过文本提取预测任务;2)一个交通预测代理,利用多尺度交通数据生成时间特征和相似性,并将它们与有限的空间特征和相似性融合,以实现三项任务的准确预测;3)一个建议和可视化代理,利用预测结果生成建议和可视化,为用户提供对交通状况的全面理解。我们的TrafficGPT系统专注于解决交通参与者对交通预测的关注,并在五个真实道路数据集上进行了广泛实验,以展示其卓越的预测和交互性能。

更新时间: 2024-05-08 07:48:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.05985v1

The Power of Training: How Different Neural Network Setups Influence the Energy Demand

This work offers a heuristic evaluation of the effects of variations in machine learning training regimes and learning paradigms on the energy consumption of computing, especially HPC hardware, from a life-cycle-aware perspective. While increasing data availability and innovation in high-performance hardware fuel the training of sophisticated models, they also erode awareness of energy consumption and carbon emissions. Therefore, the goal of this work is to raise awareness of the energy impact of general training parameters and processes, from learning rate and batch size to knowledge transfer. Multiple setups with different hyperparameter configurations are evaluated on three different hardware systems. Among many results, we found that even with the same model and hardware reaching the same accuracy, improperly set training hyperparameters consume up to 5 times the energy of the optimal setup. We also extensively examined the energy-saving benefits of learning paradigms, including recycling knowledge through pretraining and sharing knowledge through multitask training.

Updated: 2024-05-08 07:44:25

标题: 培训的力量:不同神经网络设置如何影响能量需求

摘要: 这项工作提供了对机器学习训练方案和学习范式变化对计算能耗的启发式评估,特别关注具有生命周期意识的高性能计算硬件的能耗。随着数据可用性的增加和高性能硬件创新推动复杂模型的训练,也助长了能耗和碳排放的消退认知。因此,本研究的目标是提高人们对训练参数和过程的能源影响的认识,从学习速率到批处理大小再到知识转移。在三种不同的硬件系统上评估了不同超参数配置的多个设置。在众多结果中,我们发现即使使用相同的模型和硬件达到相同的准确性,不正确设置的训练超参数消耗的能源可高达最佳设置的5倍。我们还广泛研究了通过预训练回收知识和通过多任务训练共享知识的学习范式的节能益处。

更新时间: 2024-05-08 07:44:25

领域: cs.LG,cs.AI,cs.PF

下载: http://arxiv.org/abs/2401.01851v3

Regime Learning for Differentiable Particle Filters

Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference. This paper concerns the case where the system may switch between a finite set of state-space models, i.e. regimes. No prior approaches effectively learn both the individual regimes and the switching process simultaneously. In this paper, we propose the neural network based regime learning differentiable particle filter (RLPF) to address this problem. We further design a training procedure for the RLPF and other related algorithms. We demonstrate competitive performance compared to the previous state-of-the-art algorithms on a pair of numerical experiments.
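One step of a regime-switching bootstrap particle filter in the setting described above can be sketched as follows: each particle carries a discrete regime index that evolves by a Markov switching matrix, and the state transition depends on the particle's regime. The two linear-Gaussian regimes and the switching matrix below are illustrative assumptions, not the paper's learned models.

```python
# Hedged sketch of one regime-switching particle filter step:
# propagate regimes, propagate states under each regime's dynamics,
# weight by the observation likelihood, resample.
import numpy as np

rng = np.random.default_rng(0)

def rspf_step(states, regimes, obs, switch, dynamics, obs_std=0.5):
    n = len(states)
    # 1) propagate each particle's regime by the Markov switching matrix
    regimes = np.array([rng.choice(len(switch), p=switch[r]) for r in regimes])
    # 2) propagate states under the per-regime AR(1) dynamics (a, noise std)
    a, q = dynamics[:, 0][regimes], dynamics[:, 1][regimes]
    states = a * states + rng.normal(0.0, q, n)
    # 3) weight by a Gaussian observation likelihood and resample
    w = np.exp(-0.5 * ((obs - states) / obs_std) ** 2)
    w /= w.sum()
    idx = rng.choice(n, n, p=w)
    return states[idx], regimes[idx]

switch = np.array([[0.95, 0.05], [0.10, 0.90]])  # regime transition probs
dynamics = np.array([[0.9, 0.1], [0.5, 0.4]])    # (a, process noise) per regime
states, regimes = rng.normal(0, 1, 500), np.zeros(500, int)
states, regimes = rspf_step(states, regimes, obs=0.2, switch=switch, dynamics=dynamics)
print(states.shape)  # (500,)
```

In the differentiable setting, RLPF would parameterize the switching matrix and dynamics with neural networks and learn them end to end, which this hand-set sketch omits.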

Updated: 2024-05-08 07:43:43

标题: Differentiable Particle Filters的制度学习

摘要: Differentiable particle filters是一类新兴的模型,将顺序蒙特卡洛技术与神经网络的灵活性结合起来,用于执行状态空间推断。本文涉及系统可能在有限状态空间模型之间切换的情况,即不同的制度。以往的方法未能有效地同时学习个体制度和切换过程。在本文中,我们提出了基于神经网络的制度学习可微粒子滤波器(RLPF)来解决这个问题。我们进一步设计了RLPF和其他相关算法的训练程序。通过一对数值实验,我们展示了与先前的最先进算法相比具有竞争力的性能。

更新时间: 2024-05-08 07:43:43

领域: cs.LG,eess.SP,68T37,I.2.6

下载: http://arxiv.org/abs/2405.04865v1

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Multi-modal large language models(MLLMs) have achieved remarkable progress and demonstrated powerful knowledge comprehension and reasoning abilities. However, the mastery of domain-specific knowledge, which is essential for evaluating the intelligence of MLLMs, continues to be a challenge. Current multi-modal benchmarks for domain-specific knowledge concentrate on multiple-choice questions and are predominantly available in English, which imposes limitations on the comprehensiveness of the evaluation. To this end, we introduce CMMU, a novel benchmark for multi-modal and multi-type question understanding and reasoning in Chinese. CMMU consists of 3,603 questions in 7 subjects, covering knowledge from primary to high school. The questions can be categorized into 3 types: multiple-choice, multiple-response, and fill-in-the-blank, bringing greater challenges to MLLMs. In addition, we propose an evaluation strategy called Positional Error Variance for assessing multiple-choice questions. The strategy aims to perform a quantitative analysis of position bias. We evaluate seven open-source MLLMs along with GPT4-V, Gemini-Pro, and Qwen-VL-Plus. The results demonstrate that CMMU poses a significant challenge to the recent MLLMs. The data and code are available at https://github.com/FlagOpen/CMMU.
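One reading of the Positional Error Variance idea is: evaluate the same multiple-choice items with the gold answer rotated through each option position, then measure how much accuracy varies across positions. This exact computation is my interpretation of the abstract, not the paper's definition.

```python
# Hedged sketch: quantify position bias as the variance of per-position
# accuracy (accuracy when the gold answer sits at option A, B, C, D).
import numpy as np

def positional_error_variance(correct_by_position):
    """correct_by_position[p] = accuracy with the gold answer at position p."""
    return float(np.var(np.asarray(correct_by_position, float)))

# A position-biased model: far more accurate when the answer is option A.
print(positional_error_variance([0.9, 0.5, 0.5, 0.5]))
```

A position-agnostic model yields a variance near zero; larger values indicate the model is keying on option order rather than content.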

Updated: 2024-05-08 07:34:06

标题: CMMU:中文多模态多类型问题理解和推理的基准Benchmark

摘要: 多模态大型语言模型(MLLMs)取得了显著进展,并展示出强大的知识理解和推理能力。然而,掌握领域特定知识对评估MLLMs的智能至关重要,但仍然是一个挑战。当前用于领域特定知识的多模态基准主要集中在英语的多项选择题上,这对评估的全面性造成了限制。因此,我们引入了CMMU,一个用于中文多模态和多类型问题理解和推理的新基准。CMMU包括7个学科的3,603个问题,涵盖从小学到高中的知识。这些问题可以分为三种类型:多项选择、多项响应和填空,给MLLMs带来更大的挑战。此外,我们提出了一种评估策略,称为位置误差方差,用于评估多项选择题。该策略旨在对位置偏差进行定量分析。我们评估了七种开源MLLMs以及GPT4-V、Gemini-Pro和Qwen-VL-Plus。结果表明,CMMU对最近的MLLMs构成了重大挑战。数据和代码可在https://github.com/FlagOpen/CMMU获取。

更新时间: 2024-05-08 07:34:06

领域: cs.CL,cs.AI,cs.MM

下载: http://arxiv.org/abs/2401.14011v3

Anomaly Detection with Variance Stabilized Density Estimation

We propose a modified density estimation problem that is highly effective for detecting anomalies in tabular data. Our approach assumes that the density function is relatively stable (with lower variance) around normal samples. We have verified this hypothesis empirically using a wide range of real-world data. Then, we present a variance-stabilized density estimation problem for maximizing the likelihood of the observed samples while minimizing the variance of the density around normal samples. To obtain a reliable anomaly detector, we introduce a spectral ensemble of autoregressive models for learning the variance-stabilized distribution. We have conducted an extensive benchmark with 52 datasets, demonstrating that our method leads to state-of-the-art results while alleviating the need for data-specific hyperparameter tuning. Finally, we have used an ablation study to demonstrate the importance of each of the proposed components, followed by a stability analysis evaluating the robustness of our model.
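The variance-stabilised objective described above can be written as a standard negative log-likelihood plus a penalty on the variance of the log-density over (normal) training samples. The penalty weight `lam` and this simple additive form are assumptions for illustration.

```python
# Hedged sketch of the variance-stabilized density estimation objective:
# maximize likelihood while keeping the density flat around normal samples.
import numpy as np

def vs_loss(log_density, lam=1.0):
    """log_density: model log p(x) evaluated at the observed samples."""
    ld = np.asarray(log_density, float)
    nll = -ld.mean()          # usual maximum-likelihood term
    return nll + lam * ld.var()  # variance-stabilization penalty

print(vs_loss([-1.0, -1.0, -1.0]))  # 1.0 (zero variance, pure NLL)
```

At test time, anomalies are then flagged by low log-density; the variance penalty sharpens the gap because normal samples are pushed onto a near-constant density level.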

Updated: 2024-05-08 07:27:18

标题: 使用方差稳定化密度估计进行异常检测

摘要: 我们提出了一个修改后的密度估计问题,对于在表格数据中检测异常非常有效。我们的方法假设密度函数在正常样本周围相对稳定(方差较小)。我们通过广泛的真实数据验证了这一假设。然后,我们提出了一个方差稳定的密度估计问题,以最大化观察样本的似然性,同时最小化正常样本周围的方差。为了获得可靠的异常检测器,我们引入了自回归模型的谱集成来学习方差稳定的分布。我们对52个数据集进行了广泛的基准测试,证明我们的方法取得了最先进的结果,同时减轻了对数据特定超参数调整的需求。最后,我们进行了消融研究,以展示每个提出的组件的重要性,然后进行了稳定性分析,评估我们模型的稳健性。

更新时间: 2024-05-08 07:27:18

领域: cs.LG,cs.AI,I.2

下载: http://arxiv.org/abs/2306.00582v2

ChatSOS: Vector Database Augmented Generative Question Answering Assistant in Safety Engineering

With the rapid advancement of natural language processing technologies, generative artificial intelligence techniques, represented by large language models (LLMs), are gaining increasing prominence and demonstrating significant potential for applications in safety engineering. However, fundamental LLMs face constraints such as limited training data coverage and unreliable responses. This study develops a vector database from 117 explosion accident reports in China spanning 2013 to 2023, employing techniques such as corpus segmentation and vector embedding. By utilizing the vector database, which outperforms a relational database in information-retrieval quality, we provide LLMs with richer, more relevant knowledge. Comparative analysis of LLMs demonstrates that ChatSOS significantly enhances reliability, accuracy, and comprehensiveness, and improves the adaptability and clarity of responses. These results illustrate the effectiveness of supplementing LLMs with an external database, highlighting their potential to handle professional queries in safety engineering and laying a foundation for broader applications.
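The retrieval pattern ChatSOS relies on can be sketched as: embed the query, pull the top-k most similar report segments by cosine similarity, and prepend them to the LLM prompt. The toy vectors below stand in for real embeddings of report segments; they are placeholders, not the system's data.

```python
# Hedged sketch of vector-database retrieval by cosine similarity.
import numpy as np

def top_k_segments(query_vec, segment_vecs, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    s = segment_vecs / np.linalg.norm(segment_vecs, axis=1, keepdims=True)
    sims = s @ q                      # cosine similarity to every segment
    return np.argsort(-sims)[:k]      # indices of the k closest segments

segments = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
print(top_k_segments(np.array([1.0, 0.1]), segments))  # [0 1]
```

The retrieved segments would then be concatenated into the prompt so the LLM answers grounded in the accident reports rather than from parametric memory alone.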

Updated: 2024-05-08 07:21:26

标题: ChatSOS:在安全工程中增强的向量数据库生成式问答助手

摘要: 随着自然语言处理技术的快速发展,以大型语言模型(LLMs)为代表的生成人工智能技术越来越受到重视,并展示出在安全工程应用中具有重要潜力。然而,基本的LLMs面临诸如有限的训练数据覆盖范围和不可靠的响应等限制。本研究从中国2013年至2023年的117起爆炸事故报告中开发了一个向量数据库,采用语料库切分和向量嵌入等技术。通过利用这个向量数据库,它在信息检索质量上优于关系数据库,我们为LLMs提供了更丰富、更相关的知识。LLMs的比较分析表明,ChatSOS显著提高了可靠性、准确性和全面性,改善了响应的适应性和澄清性。这些结果说明了通过外部数据库来补充LLMs的有效性,突显了它们处理安全工程专业查询的潜力,并为更广泛的应用奠定了基础。

更新时间: 2024-05-08 07:21:26

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.06699v1

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning

Fine-tuning on task-specific datasets is a widely embraced paradigm for harnessing the powerful capabilities of pretrained LLMs for various downstream tasks. Due to the popularity of LLM fine-tuning and its accompanying privacy concerns, differentially private (DP) fine-tuning of pretrained LLMs has been widely used to safeguard the privacy of task-specific datasets. At the design core of DP LLM fine-tuning methods lies a satisfactory tradeoff among privacy, utility, and scalability. Most existing methods build upon the seminal work of DP-SGD. Despite pushing the scalability of DP-SGD to its limit, DP-SGD-based fine-tuning methods are unfortunately limited by the inherent inefficiency of SGD. In this paper, we investigate the potential of DP zeroth-order methods for LLM fine-tuning, which avoid the scalability bottleneck of SGD by approximating the gradient with the more efficient zeroth-order gradient. Rather than treating the zeroth-order method as a drop-in replacement for SGD, this paper presents a comprehensive study, both theoretical and empirical. First, we propose the stagewise DP zeroth-order method (DP-ZOSO), which dynamically schedules key hyperparameters. This design is grounded in the synergy between DP random perturbation and the gradient-approximation error of the zeroth-order method, and its effect on the fine-tuning trajectory. We provide theoretical analysis for both proposed methods. We conduct extensive empirical analysis on both an encoder-only masked language model and a decoder-only autoregressive language model, achieving impressive results in terms of scalability and utility (compared with DPZero, DP-ZOPO improves 4.5% on SST-5 and 5.5% on MNLI with RoBERTa-Large, and 9.2% on CB and 3.9% on BoolQ with OPT-2.7B when $\epsilon=4$).
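A generic DP zeroth-order update of the kind discussed above can be sketched as: estimate the gradient from two loss evaluations along a random direction, clip the scalar estimate to bound sensitivity, and add Gaussian noise. The clipping bound, noise scale, and smoothing parameter below are illustrative assumptions, not the paper's DP-ZOSO/DP-ZOPO schedules.

```python
# Hedged sketch of one DP two-point zeroth-order step on a toy objective.
import numpy as np

rng = np.random.default_rng(0)

def dp_zo_step(theta, loss, lr=0.1, mu=1e-3, clip=1.0, sigma=0.5):
    z = rng.standard_normal(theta.shape)                 # random direction
    g = (loss(theta + mu * z) - loss(theta - mu * z)) / (2 * mu)
    g = float(np.clip(g, -clip, clip))                   # bound sensitivity
    noisy_grad = g * z + sigma * rng.standard_normal(theta.shape)  # DP noise
    return theta - lr * noisy_grad

loss = lambda t: float(np.sum(t ** 2))  # toy quadratic stand-in for the LLM loss
theta = np.array([1.0, -1.0])
for _ in range(50):
    theta = dp_zo_step(theta, loss)
print(theta.shape)  # (2,)
```

The appeal for LLMs is that only forward passes are needed, so memory scales like inference rather than like backpropagation; the DP analysis then rides on the clipped scalar and the Gaussian mechanism.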

Updated: 2024-05-08 07:14:42

标题: 可伸缩的大型语言模型微调的差分隐私零阶方法

摘要: 在特定任务数据集上进行微调是一种广泛接受的利用预训练LLMs强大能力的范式,用于各种下游任务。由于LLMs微调的普及以及伴随的隐私问题,差分隐私(DP)微调预训练LLMs已被广泛用于保护特定任务数据集的隐私。DP LLM微调方法的设计核心在于隐私、效用和可伸缩性之间的令人满意的权衡。大多数现有方法建立在DP-SGD的开创性工作之上。尽管将DP-SGD的可伸缩性推至极限,但基于DP-SGD的微调方法不幸地受限于SGD的固有低效性。 在本文中,我们研究了DP零阶方法在LLM预训练中的潜力,通过使用更高效的零阶梯度来近似梯度,避免了SGD的可伸缩性瓶颈。本文提出了分阶DP零阶方法(DP-ZOSO),动态调度关键超参数。这一设计基于DP随机扰动与零阶方法的梯度近似误差之间的协同作用,以及对微调轨迹的影响。 我们为提出的两种方法提供了理论分析。我们对仅编码器掩码语言模型和仅解码器自回归语言模型进行了广泛的实证分析,在可伸缩性和效用方面取得了令人瞩目的结果(与DPZero相比,DP-ZOPO在SST-5上提高了4.5%,在MNLI上提高了5.5%,在CB上提高了9.2%,在BoolQ上提高了3.9%,当ε=4时与RoBERTa-Large和OPT-2.7B)。

更新时间: 2024-05-08 07:14:42

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2402.07818v3

Explaining Clustering of Ecological Momentary Assessment Data Through Temporal and Feature Attention

In the field of psychopathology, Ecological Momentary Assessment (EMA) studies offer rich individual data on psychopathology-relevant variables (e.g., affect, behavior, etc) in real-time. EMA data is collected dynamically, represented as complex multivariate time series (MTS). Such information is crucial for a better understanding of mental disorders at the individual- and group-level. More specifically, clustering individuals in EMA data facilitates uncovering and studying the commonalities as well as variations of groups in the population. Nevertheless, since clustering is an unsupervised task and true EMA grouping is not commonly available, the evaluation of clustering is quite challenging. An important aspect of evaluation is clustering explainability. Thus, this paper proposes an attention-based interpretable framework to identify the important time-points and variables that play primary roles in distinguishing between clusters. A key part of this study is to examine ways to analyze, summarize, and interpret the attention weights as well as evaluate the patterns underlying the important segments of the data that differentiate across clusters. To evaluate the proposed approach, an EMA dataset of 187 individuals grouped in 3 clusters is used for analyzing the derived attention-based importance attributes. More specifically, this analysis provides the distinct characteristics at the cluster-, feature- and individual level. Such clustering explanations could be beneficial for generalizing existing concepts of mental disorders, discovering new insights, and even enhancing our knowledge at an individual level.

Updated: 2024-05-08 07:09:43

Categories: cs.LG

Download: http://arxiv.org/abs/2405.04854v1

Shedding Light on CVSS Scoring Inconsistencies: A User-Centric Study on Evaluating Widespread Security Vulnerabilities

The Common Vulnerability Scoring System (CVSS) is a popular method for evaluating the severity of vulnerabilities in vulnerability management. In the evaluation process, a numeric score between 0 and 10 is calculated, 10 being the most severe (critical) value. The goal of CVSS is to provide comparable scores across different evaluators. However, previous works indicate that CVSS might not reach this goal: If a vulnerability is evaluated by several analysts, their scores often differ. This raises the following questions: Are CVSS evaluations consistent? Which factors influence CVSS assessments? We systematically investigate these questions in an online survey with 196 CVSS users. We show that specific CVSS metrics are inconsistently evaluated for widespread vulnerability types, including Top 3 vulnerabilities from the "2022 CWE Top 25 Most Dangerous Software Weaknesses" list. In a follow-up survey with 59 participants, we found that for the same vulnerabilities from the main study, 68% of these users gave different severity ratings. Our study reveals that most evaluators are aware of the problematic aspects of CVSS, but they still see CVSS as a useful tool for vulnerability assessment. Finally, we discuss possible reasons for inconsistent evaluations and provide recommendations on improving the consistency of scoring.

Updated: 2024-05-08 07:08:48

Categories: cs.CR

Download: http://arxiv.org/abs/2308.15259v2

Convergence Analysis of Sequential Federated Learning on Heterogeneous Data

There are two categories of methods in Federated Learning (FL) for joint training across multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) sequential FL (SFL), where clients train models in a sequential manner. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. In this paper, we establish the convergence guarantees of SFL for strongly/general/non-convex objectives on heterogeneous data. The convergence guarantees of SFL are better than that of PFL on heterogeneous data with both full and partial client participation. Experimental results validate the counterintuitive analysis result that SFL outperforms PFL on extremely heterogeneous data in cross-device settings.
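
The PFL/SFL distinction can be sketched on a toy one-dimensional objective. This is a minimal sketch, assuming FedAvg-style averaging for PFL and model hand-off for SFL; the client objectives and hyperparameters are illustrative, not the paper's setup:

```python
def local_step(w, data, lr=0.1):
    """One gradient step on the client objective f_k(w) = mean((w - x)^2)."""
    grad = sum(2.0 * (w - x) for x in data) / len(data)
    return w - lr * grad

def parallel_round(w, clients, lr=0.1, steps=5):
    """PFL: every client starts from the same global model in parallel;
    the server averages the resulting local models."""
    locals_ = []
    for data in clients:
        wk = w
        for _ in range(steps):
            wk = local_step(wk, data, lr)
        locals_.append(wk)
    return sum(locals_) / len(locals_)

def sequential_round(w, clients, lr=0.1, steps=5):
    """SFL: the model is handed from one client to the next."""
    for data in clients:
        for _ in range(steps):
            w = local_step(w, data, lr)
    return w
```

With two heterogeneous clients holding data at 0 and 2 (global optimum 1), both round types move an initial model toward the optimum; in this particular toy run the sequential round lands closer.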

Updated: 2024-05-08 07:08:34

Categories: cs.LG

Download: http://arxiv.org/abs/2311.03154v2

Machine Learning Optimized Orthogonal Basis Piecewise Polynomial Approximation

Piecewise Polynomials (PPs) are utilized in several engineering disciplines, like trajectory planning, to approximate position profiles given in the form of a set of points. While the approximation target along with domain-specific requirements, like Ck-continuity, can be formulated as a system of equations and a result can be computed directly, such closed-form solutions possess limited flexibility with respect to polynomial degrees, polynomial bases or adding further domain-specific requirements. Sufficiently complex optimization goals soon call for the use of numerical methods, like gradient descent. Since gradient descent lies at the heart of training Artificial Neural Networks (ANNs), modern Machine Learning (ML) frameworks like TensorFlow come with a set of gradient-based optimizers potentially suitable for a wide range of optimization problems beyond the training task for ANNs. Our approach is to utilize the versatility of PP models and combine it with the potential of modern ML optimizers for the use in function approximation in 1D trajectory planning in the context of electronic cam design. We utilize available optimizers of the ML framework TensorFlow directly, outside of the scope of ANNs, to optimize model parameters of our PP model. In this paper, we show how an orthogonal polynomial basis contributes to improving approximation and continuity optimization performance. Utilizing Chebyshev polynomials of the first kind, we develop a novel regularization approach enabling clearly improved convergence behavior. We show that, using this regularization approach, Chebyshev basis performs better than power basis for all relevant optimizers in the combined approximation and continuity optimization setting and demonstrate usability of the presented approach within the electronic cam domain.
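
A minimal sketch of the Chebyshev-basis idea: the first-kind polynomials are evaluated via the three-term recurrence, and the coefficients are fit by plain gradient descent standing in for a TensorFlow optimizer. Function names and hyperparameters are illustrative, and the paper's continuity constraints and regularization are omitted:

```python
def cheb_basis(x, degree):
    """Chebyshev polynomials of the first kind on [-1, 1] via the recurrence
    T0 = 1, T1 = x, T_{n+1} = 2*x*T_n - T_{n-1}."""
    T = [1.0, x]
    for _ in range(degree - 1):
        T.append(2.0 * x * T[-1] - T[-2])
    return T[:degree + 1]

def fit_gd(xs, ys, degree=3, lr=0.05, iters=2000):
    """Least-squares fit of Chebyshev coefficients by gradient descent on the
    mean squared error."""
    c = [0.0] * (degree + 1)
    n = len(xs)
    for _ in range(iters):
        grads = [0.0] * (degree + 1)
        for x, y in zip(xs, ys):
            T = cheb_basis(x, degree)
            err = sum(ci * ti for ci, ti in zip(c, T)) - y
            for j, tj in enumerate(T):
                grads[j] += 2.0 * err * tj / n
        c = [ci - lr * gi for ci, gi in zip(c, grads)]
    return c
```

The orthogonality of the Chebyshev basis keeps the normal equations well conditioned compared with the power basis, which is the mechanism behind the improved convergence the abstract reports.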

Updated: 2024-05-08 07:07:25

Categories: cs.LG

Download: http://arxiv.org/abs/2403.08579v3

Distribution-aware Fairness Test Generation

Ensuring that all classes of objects are detected with equal accuracy is essential in AI systems. For instance, being unable to identify any one class of objects could have fatal consequences in autonomous driving systems. Hence, ensuring the reliability of image recognition systems is crucial. This work addresses how to validate group fairness in image recognition software. We propose a distribution-aware fairness testing approach (called DistroFair) that systematically exposes class-level fairness violations in image classifiers via a synergistic combination of out-of-distribution (OOD) testing and semantic-preserving image mutation. DistroFair automatically learns the distribution (e.g., number/orientation) of objects in a set of images. Then it systematically mutates objects in the images to become OOD using three semantic-preserving image mutations - object deletion, object insertion and object rotation. We evaluate DistroFair using two well-known datasets (CityScapes and MS-COCO) and three major, commercial image recognition software (namely, Amazon Rekognition, Google Cloud Vision and Azure Computer Vision). Results show that about 21% of images generated by DistroFair reveal class-level fairness violations using either ground truth or metamorphic oracles. DistroFair is up to 2.3x more effective than two main baselines, i.e., (a) an approach which focuses on generating images only within the distribution (ID) and (b) fairness analysis using only the original image dataset. We further observed that DistroFair is efficient, it generates 460 images per hour, on average. Finally, we evaluate the semantic validity of our approach via a user study with 81 participants, using 30 real images and 30 corresponding mutated images generated by DistroFair. We found that images generated by DistroFair are 80% as realistic as real-world images.

Updated: 2024-05-08 07:06:46

Categories: cs.CV,cs.LG,cs.SE

Download: http://arxiv.org/abs/2305.13935v4

Noisy Node Classification by Bi-level Optimization based Multi-teacher Distillation

Previous graph neural networks (GNNs) usually assume that the graph data is with clean labels for representation learning, but it is not true in real applications. In this paper, we propose a new multi-teacher distillation method based on bi-level optimization (namely BO-NNC), to conduct noisy node classification on the graph data. Specifically, we first employ multiple self-supervised learning methods to train diverse teacher models, and then aggregate their predictions through a teacher weight matrix. Furthermore, we design a new bi-level optimization strategy to dynamically adjust the teacher weight matrix based on the training progress of the student model. Finally, we design a label improvement module to improve the label quality. Extensive experimental results on real datasets show that our method achieves the best results compared to state-of-the-art methods.
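
The teacher-aggregation step can be sketched as a weighted average of per-teacher class probabilities. This is a minimal stand-in for one slice of the paper's teacher weight matrix; names are illustrative, and the bi-level weight update itself is omitted:

```python
def aggregate_teachers(probs, weights):
    """Combine per-teacher class-probability vectors with one weight per
    teacher, normalized so the result is again a probability vector."""
    n_classes = len(probs[0])
    total_w = sum(weights)
    combined = [0.0] * n_classes
    for p, w in zip(probs, weights):
        for j in range(n_classes):
            combined[j] += w * p[j] / total_w
    return combined
```

In the paper's framework, these weights would themselves be adjusted by the outer level of the bi-level optimization as the student trains.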

Updated: 2024-05-08 06:56:53

Categories: cs.LG

Download: http://arxiv.org/abs/2404.17875v2

Solving Elliptic Optimal Control Problems via Neural Networks and Optimality System

In this work, we investigate a neural network based solver for optimal control problems (without / with box constraint) for linear and semilinear second-order elliptic problems. It utilizes a coupled system derived from the first-order optimality system of the optimal control problem, and employs deep neural networks to represent the solutions to the reduced system. We present an error analysis of the scheme, and provide $L^2(\Omega)$ error bounds on the state, control and adjoint in terms of neural network parameters (e.g., depth, width, and parameter bounds) and the numbers of sampling points. The main tools in the analysis include offset Rademacher complexity and boundedness and Lipschitz continuity of neural network functions. We present several numerical examples to illustrate the method and compare it with two existing ones.

Updated: 2024-05-08 06:54:20

Categories: math.OC,cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2308.11925v2

Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0

Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and reliability to enhance block propagation performance. In this paper, we design a Graph Attention Network (GAT)-based reliable block propagation optimization framework for blockchain-enabled Web 3.0. We first innovatively apply a data-freshness metric called age of block to measure block propagation efficiency in public blockchains. To achieve the reliability of block propagation, we introduce a reputation mechanism based on the subjective logic model, including the local and recommended opinions to calculate the miner reputation value. Moreover, considering that the GAT possesses the excellent ability to process graph-structured data, we utilize the GAT with reinforcement learning to obtain the optimal block propagation trajectory. Numerical results demonstrate that the proposed scheme exhibits the most outstanding block propagation efficiency and reliability compared with traditional routing mechanisms.
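
A minimal sketch of combining a local and a recommended miner opinion, assuming the standard subjective-logic representation (belief, disbelief, uncertainty) and the standard cumulative fusion rule; the paper's exact reputation formula may differ, and names are illustrative:

```python
def expected(op, base_rate=0.5):
    """Projected probability of an opinion (b, d, u): E = b + base_rate * u."""
    b, d, u = op
    return b + base_rate * u

def cumulative_fuse(op1, op2):
    """Cumulative fusion of two subjective-logic opinions (b, d, u)."""
    b1, d1, u1 = op1
    b2, d2, u2 = op2
    k = u1 + u2 - u1 * u2
    b = (b1 * u2 + b2 * u1) / k
    d = (d1 * u2 + d2 * u1) / k
    u = (u1 * u2) / k
    return (b, d, u)
```

Fusing two independent opinions lowers the uncertainty mass, which is why combining local and recommended opinions yields a more confident miner reputation value.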

Updated: 2024-05-08 06:40:19

Categories: cs.CR,math.OC

Download: http://arxiv.org/abs/2403.13237v2

xMTrans: Temporal Attentive Cross-Modality Fusion Transformer for Long-Term Traffic Prediction

Traffic predictions play a crucial role in intelligent transportation systems. The rapid development of IoT devices allows us to collect different kinds of data with high correlations to traffic predictions, fostering the development of efficient multi-modal traffic prediction models. Until now, there are few studies focusing on utilizing advantages of multi-modal data for traffic predictions. In this paper, we introduce a novel temporal attentive cross-modality transformer model for long-term traffic predictions, namely xMTrans, with capability of exploring the temporal correlations between the data of two modalities: one target modality (for prediction, e.g., traffic congestion) and one support modality (e.g., people flow). We conducted extensive experiments to evaluate our proposed model on traffic congestion and taxi demand predictions using real-world datasets. The results showed the superiority of xMTrans against recent state-of-the-art methods on long-term traffic predictions. In addition, we also conducted a comprehensive ablation study to further analyze the effectiveness of each module in xMTrans.

Updated: 2024-05-08 06:29:26

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.04841v1

Enhancing Data Integrity and Traceability in Industry Cyber Physical Systems (ICPS) through Blockchain Technology: A Comprehensive Approach

Blockchain technology, heralded as a transformative innovation, has far-reaching implications beyond its initial application in cryptocurrencies. This study explores the potential of blockchain in enhancing data integrity and traceability within Industry Cyber-Physical Systems (ICPS), a crucial aspect in the era of Industry 4.0. ICPS, integrating computational and physical components, is pivotal in managing critical infrastructure like manufacturing, power grids, and transportation networks. However, they face challenges in security, privacy, and reliability. With its inherent immutability, transparency, and distributed consensus, blockchain presents a groundbreaking approach to address these challenges. It ensures robust data reliability and traceability across ICPS, enhancing transaction transparency and facilitating secure data sharing. This research unearths various blockchain applications in ICPS, including supply chain management, quality control, contract management, and data sharing. Each application demonstrates blockchain's capacity to streamline processes, reduce fraud, and enhance system efficiency. In supply chain management, blockchain provides real-time auditing and compliance. For quality control, it establishes tamper-proof records, boosting consumer confidence. In contract management, smart contracts automate execution, enhancing efficiency. Blockchain also fosters secure collaboration in ICPS, which is crucial for system stability and safety. This study emphasizes the need for further research on blockchain's practical implementation in ICPS, focusing on challenges like scalability, system integration, and security vulnerabilities. It also suggests examining blockchain's economic and organizational impacts in ICPS to understand its feasibility and long-term advantages.

Updated: 2024-05-08 06:22:37

Categories: cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2405.04837v1

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing devices have become ubiquitous, greatly expanding the boundaries of IPAs. However, due to the lack of capabilities such as user intent understanding, task planning, tool using, and personal data management etc., existing IPAs still have limited practicality and scalability. Recently, the emergence of foundation models, represented by large language models (LLMs), brings new opportunities for the development of IPAs. With the powerful semantic understanding and reasoning capabilities, LLM can enable intelligent agents to solve complex problems autonomously. In this paper, we focus on Personal LLM Agents, which are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. We envision that Personal LLM Agents will become a major software paradigm for end-users in the upcoming era. To realize this vision, we take the first step to discuss several important questions about Personal LLM Agents, including their architecture, capability, efficiency and security. We start by summarizing the key components and design choices in the architecture of Personal LLM Agents, followed by an in-depth analysis of the opinions collected from domain experts. Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.

Updated: 2024-05-08 06:16:23

Categories: cs.HC,cs.AI,cs.SE

Download: http://arxiv.org/abs/2401.05459v2

Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution

Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties `inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks, including harmfulness and ambiguity. The former indicates that they introduce maliciously controllable misclassification behaviors ($i.e.$, backdoor) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity. In this paper, we argue that both limitations stem from the `zero-bit' nature of existing watermarking schemes, where they exploit the status ($i.e.$, misclassified) of predictions for verification. Motivated by this understanding, we design a new watermarking paradigm, $i.e.$, Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution instead of model predictions. Specifically, EaaW embeds a `multi-bit' watermark into the feature attribution explanation of specific trigger samples without changing the original prediction. We correspondingly design the watermark embedding and extraction algorithms inspired by explainable artificial intelligence. In particular, our approach can be used for different tasks ($e.g.$, image classification and text generation). Extensive experiments verify the effectiveness and harmlessness of our EaaW and its resistance to potential attacks.
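
The `multi-bit' idea can be sketched as reading one bit per chosen position from the sign of a feature-attribution vector. This is a deliberately simplified stand-in for EaaW's embedding/extraction (the positions, the attribution method, and the embedding procedure are all illustrative assumptions):

```python
def extract_bits(attribution, positions):
    """Read one watermark bit per position: positive attribution -> 1,
    non-positive -> 0."""
    return [1 if attribution[i] > 0 else 0 for i in positions]

def bit_accuracy(extracted, watermark):
    """Fraction of watermark bits recovered correctly; ownership is claimed
    when this accuracy is far above the 0.5 chance level."""
    matches = sum(int(a == b) for a, b in zip(extracted, watermark))
    return matches / len(watermark)
```

Because verification reads a multi-bit string from the explanation rather than checking for a misclassification, a third party cannot pass it by merely finding some other misclassified sample.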

Updated: 2024-05-08 05:49:46

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.04825v1

Quantum-Edge Cloud Computing: A Future Paradigm for IoT Applications

The Internet of Things (IoT) is expanding rapidly, which has created a need for sophisticated computational frameworks that can handle the data and security requirements inherent in modern IoT applications. However, traditional cloud computing frameworks have struggled with latency, scalability, and security vulnerabilities. Quantum-Edge Cloud Computing (QECC) is a new paradigm that effectively addresses these challenges by combining the computational power of quantum computing, the low-latency benefits of edge computing, and the scalable resources of cloud computing. This study has been conducted based on a published literature review, performance improvements, and metrics data from Bangladesh on smart city infrastructure, healthcare monitoring, and the industrial IoT sector. We have discussed the integration of quantum cryptography to enhance data integrity, the role of edge computing in reducing response times, and how cloud computing's resource abundance can support large IoT networks. We examine case studies, such as the use of quantum sensors in self-driving vehicles, to illustrate the real-world impact of QECC. Furthermore, the paper identifies future research directions, including developing quantum-resistant encryption and optimizing quantum algorithms for edge computing. The convergence of these technologies in QECC promises to overcome the existing limitations of IoT frameworks and set a new standard for the future of IoT applications.

Updated: 2024-05-08 05:42:26

Categories: cs.CR

Download: http://arxiv.org/abs/2405.04824v1

Exploiting Symmetry in Dynamics for Model-Based Reinforcement Learning with Asymmetric Rewards

Recent work in reinforcement learning has leveraged symmetries in the model to improve sample efficiency in training a policy. A commonly used simplifying assumption is that the dynamics and reward both exhibit the same symmetry. However, in many real-world environments, the dynamical model exhibits symmetry independent of the reward model: the reward may not satisfy the same symmetries as the dynamics. In this paper, we investigate scenarios where only the dynamics are assumed to exhibit symmetry, extending the scope of problems in reinforcement learning and learning in control theory where symmetry techniques can be applied. We use Cartan's moving frame method to introduce a technique for learning dynamics which, by construction, exhibit specified symmetries. We demonstrate through numerical experiments that the proposed method learns a more accurate dynamical model.

Updated: 2024-05-08 05:41:07

Categories: cs.LG,cs.AI,cs.RO,cs.SY,eess.SY

Download: http://arxiv.org/abs/2403.19024v2

APrompt4EM: Augmented Prompt Tuning for Generalized Entity Matching

Generalized Entity Matching (GEM), which aims at judging whether two records represented in different formats refer to the same real-world entity, is an essential task in data management. The prompt tuning paradigm for pre-trained language models (PLMs), including the recent PromptEM model, effectively addresses the challenges of low-resource GEM in practical applications, offering a robust solution when labeled data is scarce. However, existing prompt tuning models for GEM face the challenges of prompt design and information gap. This paper introduces an augmented prompt tuning framework for the challenges, which consists of two main improvements. The first is an augmented contextualized soft token-based prompt tuning method that extracts a guiding soft token benefit for the PLMs' prompt tuning, and the second is a cost-effective information augmentation strategy leveraging large language models (LLMs). Our approach performs well on the low-resource GEM challenges. Extensive experiments show promising advancements of our basic model without information augmentation over existing methods based on moderate-size PLMs (average 5.24%+), and our model with information augmentation achieves comparable performance compared with fine-tuned LLMs, using less than 14% of the API fee.

Updated: 2024-05-08 05:38:56

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.04820v1

DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on studying Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at https://github.com/David-Li0406/DALK.

Updated: 2024-05-08 05:38:20

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.04819v1

Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks

Machine learning models, in particular deep neural networks, are currently an integral part of various applications, from healthcare to finance. However, using sensitive data to train these models raises concerns about privacy and security. One method that has emerged to verify if the trained models are privacy-preserving is Membership Inference Attacks (MIA), which allows adversaries to determine whether a specific data point was part of a model's training dataset. While a series of MIAs have been proposed in the literature, only a few can achieve high True Positive Rates (TPR) in the low False Positive Rate (FPR) region (0.01%~1%). This is a crucial factor to consider for an MIA to be practically useful in real-world settings. In this paper, we present a novel approach to MIA that is aimed at significantly improving TPR at low FPRs. Our method, named learning-based difficulty calibration for MIA (LDC-MIA), characterizes data records by their hardness levels using a neural network classifier to determine membership. The experiment results show that LDC-MIA can improve TPR at low FPR by up to 4x compared to the other difficulty calibration based MIAs. It also has the highest Area Under ROC curve (AUC) across all datasets. Our method's cost is comparable with most of the existing MIAs, but is orders of magnitude more efficient than one of the state-of-the-art methods, LiRA, while achieving similar performance.
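
TPR at a fixed low FPR, the evaluation metric emphasized here, can be computed by sweeping decision thresholds over attack scores. This is a minimal sketch of the metric only, not of the LDC-MIA scoring itself:

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """Among thresholds whose false positive rate on non-members stays at or
    below target_fpr, report the best true positive rate on members
    (higher score means 'predicted member')."""
    thresholds = sorted(set(member_scores + nonmember_scores), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= target_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr
```

An attack with a high AUC can still be useless in practice if its TPR collapses at FPR below 1%, which is why the abstract reports both.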

Updated: 2024-05-08 05:32:06

Categories: cs.CR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.04929v2

On the existence of the maximum likelihood estimate and convergence rate under gradient descent for multi-class logistic regression

We revisit the problem of the existence of the maximum likelihood estimate for multi-class logistic regression. We show that one way to ensure its existence is to assign positive probability to every class in the sample dataset. The notion of data separability is not needed, in contrast to the classical setup of multi-class logistic regression in which each data sample belongs to exactly one class. We also provide a general and constructive estimate of the convergence rate to the maximum likelihood estimate when gradient descent is used as the optimizer. Our estimate involves bounding the condition number of the Hessian of the maximum likelihood function. The approaches used in this article rely on a simple operator-theoretic framework.
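
The existence condition can be seen concretely in a small gradient-descent run: when every class receives positive probability in the (soft) labels, the negative log-likelihood is minimized at a finite weight matrix and plain gradient descent converges toward it. This is a toy numpy sketch of that setup, not the paper's operator-theoretic analysis; dimensions and the step size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3
X = rng.normal(size=(n, d))
# Soft labels: every class receives positive probability for every sample,
# the condition that guarantees a finite maximum likelihood estimate.
Y = rng.dirichlet(np.ones(k), size=n)

W = np.zeros((d, k))

def nll(W):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(Y * log_p).sum(axis=1).mean()

losses = []
for _ in range(1000):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    W -= 0.5 * X.T @ (P - Y) / n                      # one gradient descent step
    losses.append(nll(W))

print(f"NLL: {losses[0]:.4f} -> {losses[-1]:.4f}")    # decreases toward the MLE
```

With hard one-hot labels on separable data the iterates would diverge in norm; the positive-probability labels keep the minimizer finite.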

Updated: 2024-05-08 05:31:36

Categories: cs.LG,math.ST,stat.TH,62J12, 65K10, 47N10

Download: http://arxiv.org/abs/2012.04576v5

Proportion Estimation by Masked Learning from Label Proportion

The PD-L1 rate, the number of PD-L1 positive tumor cells over the total number of all tumor cells, is an important metric for immunotherapy. This metric is recorded as diagnostic information with pathological images. In this paper, we propose a proportion estimation method requiring only a small amount of cell-level annotation and proportion annotation, both of which can be easily collected. Since the PD-L1 rate is calculated from 'tumor cells' only, not from 'non-tumor cells', we first detect tumor cells with a detection model. Then, we estimate the PD-L1 proportion by introducing a masking technique into 'learning from label proportions.' In addition, we propose a weighted focal proportion loss to address data imbalance problems. Experiments using clinical data demonstrate the effectiveness of our method, which achieved the best performance in our comparisons.
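
The masking idea reduces to a simple computation: estimate the positive proportion over detected tumor cells only, then penalize its gap to the annotated rate. The snippet below uses a plain L1 gap for clarity; the paper's actual objective is a weighted focal proportion loss, and all numbers here are made up.

```python
import numpy as np

def masked_proportion_loss(pos_probs, tumor_mask, true_ratio):
    """L1 gap between the predicted and the annotated PD-L1 rate.

    pos_probs  : per-cell predicted probability of being PD-L1 positive
    tumor_mask : 1 for cells the detector flags as tumor cells, 0 otherwise;
                 non-tumor cells are masked out, since the PD-L1 rate is
                 defined over tumor cells only
    """
    masked = pos_probs * tumor_mask
    predicted_ratio = masked.sum() / tumor_mask.sum()
    return abs(predicted_ratio - true_ratio)

probs = np.array([0.9, 0.8, 0.2, 0.95, 0.1])  # per-cell PD-L1-positive scores
mask = np.array([1, 1, 1, 0, 0])              # last two cells are non-tumor
loss = masked_proportion_loss(probs, mask, true_ratio=0.6)
print(round(loss, 4))  # → 0.0333
```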

Updated: 2024-05-08 05:29:38

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.04815v1

ToDo: Token Downsampling for Efficient Generation of High-Resolution Images

Attention mechanisms have been crucial for image diffusion models; however, their quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the importance of dense attention in generative image models, which often contain redundant features, making them suitable for sparser attention mechanisms. We propose ToDo, a novel training-free method that relies on token downsampling of key and value tokens to accelerate Stable Diffusion inference by up to 2x for common sizes and up to 4.5x or more for high resolutions like 2048x2048. We demonstrate that our approach outperforms previous methods in balancing efficient throughput and fidelity.
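
The core trick is that only keys and values are downsampled, so the output sequence length (and hence the image resolution) is preserved while the attention matrix shrinks. This is a minimal 1D numpy sketch under that assumption; the actual method downsamples 2D latent tokens inside Stable Diffusion's attention layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def downsampled_attention(Q, K, V, stride=2):
    """Attention where only keys/values are downsampled: queries are kept,
    so the output length is unchanged while the score matrix shrinks."""
    K_ds, V_ds = K[::stride], V[::stride]          # token downsampling
    scores = Q @ K_ds.T / np.sqrt(Q.shape[-1])     # (n_q, n_k // stride)
    return softmax(scores) @ V_ds

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(64, 16)) for _ in range(3))
out = downsampled_attention(Q, K, V, stride=2)
print(out.shape)  # full output length at roughly half the attention cost
```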

Updated: 2024-05-08 05:09:48

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.13573v3

A Novel Technique for Query Plan Representation Based on Graph Neural Networks

Learning representations for query plans plays a pivotal role in machine learning-based query optimizers of database management systems. To this end, particular model architectures have been proposed in the literature to convert tree-structured query plans into representations with formats learnable by downstream machine learning models. However, existing research rarely compares and analyzes the query plan representation capabilities of these tree models and their direct impact on the performance of the overall optimizer. To address this problem, we perform a comparative study to explore the effect of using different state-of-the-art tree models on the optimizer's cost estimation and plan selection performance in relatively complex workloads. Additionally, we explore the possibility of using graph neural networks (GNNs) in the query plan representation task. We propose a novel tree model combining directed GNNs with Gated Recurrent Units (GRUs) and demonstrate experimentally that the new tree model provides significant improvements in cost estimation tasks and relatively excellent plan selection performance compared to state-of-the-art tree models.

Updated: 2024-05-08 04:59:59

Categories: cs.DB,cs.AI

Download: http://arxiv.org/abs/2405.04814v1

Graphon Mean Field Games with A Representative Player: Analysis and Learning Algorithm

We propose a discrete-time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium under mild assumptions, and show that this equilibrium can be used to construct an approximate solution for the finite-player game on networks, which is challenging to analyze and solve due to the curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and a sample complexity analysis is provided for its convergence.

Updated: 2024-05-08 04:44:16

Categories: math.OC,cs.AI

Download: http://arxiv.org/abs/2405.08005v1

Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports

As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study because financial reports not only are long but also use numbers and tables extensively. We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Command. We find that GPT-3.5 and Command fail to perform this summarization task meaningfully. For Claude 2 and GPT-4, we analyze the extractiveness of the summary and identify a position bias in LLMs. This position bias disappears after shuffling the input for Claude, which suggests that Claude has the ability to recognize important information. We also conduct a comprehensive investigation on the use of numeric data in LLM-generated summaries and offer a taxonomy of numeric hallucination. We employ prompt engineering to improve GPT-4's use of numbers with limited success. Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared to GPT-4.

Updated: 2024-05-08 04:36:07

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.06162v2

On the power of graph neural networks and the role of the activation function

In this article we present new results about the expressivity of Graph Neural Networks (GNNs). We prove that for any GNN with piecewise polynomial activations, whose architecture size does not grow with the graph input sizes, there exists a pair of non-isomorphic rooted trees of depth two such that the GNN cannot distinguish their root vertices up to an arbitrary number of iterations. In contrast, it was already known that unbounded GNNs (those whose size is allowed to change with the graph sizes) with piecewise polynomial activations can distinguish these vertices in only two iterations. It was also known prior to our work that with ReLU (piecewise linear) activations, bounded GNNs are weaker than unbounded GNNs [ACI+22]. Our approach adds to this result by extending it to handle any piecewise polynomial activation function, which goes towards answering more completely an open question formulated by Grohe [2021]. Our second result states that if one allows activations that are not piecewise polynomial, then in two iterations a single-neuron perceptron can distinguish the root vertices of any pair of non-isomorphic trees of depth two (our results hold for activations like the sigmoid, the hyperbolic tangent, and others). This shows how the power of graph neural networks can change drastically if one changes the activation function of the neural networks. The proof of this result utilizes the Lindemann-Weierstrass theorem from transcendental number theory.

Updated: 2024-05-08 04:34:47

Categories: cs.LG

Download: http://arxiv.org/abs/2307.04661v6

Blockchains for Internet of Things: Fundamentals, Applications, and Challenges

Internet of Things (IoT) services necessitate the storage, transmission, and analysis of diverse data for inference, autonomy, and control. Blockchains, with their inherent properties of decentralization and security, offer efficient database solutions for these devices through consensus-based data sharing. However, it is essential to recognize that not every blockchain system is suitable for every IoT application, and some are best avoided altogether where privacy is a concern. For example, public blockchains are not suitable for storing sensitive data. This paper presents a detailed review of three distinct blockchains tailored for enhancing IoT applications. We initially delve into the foundational aspects of three blockchain systems, highlighting their strengths, limitations, and implementation needs. Additionally, we discuss the security issues in different blockchains. Subsequently, we explore the blockchain's application in three pivotal IoT areas: edge AI, communications, and healthcare. We underscore potential challenges and future directions for integrating different blockchains in IoT. Ultimately, this paper aims to offer a comprehensive perspective on the synergies between blockchains and the IoT ecosystem, highlighting the opportunities and complexities involved.

Updated: 2024-05-08 04:25:57

Categories: cs.CR,cs.NI

Download: http://arxiv.org/abs/2405.04803v1

An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT

The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians, and it is typically written by radiologists based on the 'Findings' section. However, writing numerous impressions can be laborious and error-prone for radiologists. Although recent studies have achieved promising results in automatic impression generation using large-scale medical text data for pre-training and fine-tuning pre-trained language models, such models often require substantial amounts of medical text data and have poor generalization performance. While large language models (LLMs) like ChatGPT have shown strong generalization capabilities and performance, their performance in specific domains, such as radiology, remains under-investigated and potentially limited. To address this limitation, we propose ImpressionGPT, which leverages the in-context learning capability of LLMs by constructing dynamic contexts using domain-specific, individualized data. This dynamic prompt approach enables the model to learn contextual knowledge from semantically similar examples from existing data. Additionally, we design an iterative optimization algorithm that performs automatic evaluation on the generated impression results and composes the corresponding instruction prompts to further optimize the model. The proposed ImpressionGPT model achieves state-of-the-art performance on both MIMIC-CXR and OpenI datasets without requiring additional training data or fine-tuning the LLMs. This work presents a paradigm for localizing LLMs that can be applied in a wide range of similar application scenarios, bridging the gap between general-purpose LLMs and the specific language processing needs of various domains.
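
The dynamic-context idea can be sketched as retrieving the most similar existing reports and composing them into a few-shot prompt. This is a hypothetical reconstruction, not the authors' code: the toy bag-of-words cosine similarity stands in for whatever retriever ImpressionGPT actually uses, and the report snippets are invented.

```python
import numpy as np

def bow_vector(text, vocab):
    """Toy bag-of-words embedding; a real system would use a learned encoder."""
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def build_dynamic_prompt(findings, corpus, k=2):
    """Pick the k most similar (findings, impression) pairs from existing data
    and format them as in-context examples, followed by the new findings."""
    vocab = sorted({w for f, _ in corpus for w in f.lower().split()}
                   | set(findings.lower().split()))
    q = bow_vector(findings, vocab)
    sims = []
    for f, _ in corpus:
        v = bow_vector(f, vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(v) or 1.0
        sims.append(float(q @ v) / denom)
    top = np.argsort(sims)[::-1][:k]
    examples = "\n\n".join(
        f"Findings: {corpus[i][0]}\nImpression: {corpus[i][1]}" for i in top)
    return f"{examples}\n\nFindings: {findings}\nImpression:"

corpus = [
    ("Clear lungs. No effusion.", "No acute cardiopulmonary abnormality."),
    ("Right lower lobe opacity.", "Findings concerning for pneumonia."),
    ("Mild cardiomegaly. Clear lungs.", "Stable mild cardiomegaly."),
]
prompt = build_dynamic_prompt("Opacity in right lower lobe.", corpus, k=1)
print(prompt)
```

The iterative part of the framework would then score the LLM's completion and revise this prompt; only the retrieval step is shown here.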

Updated: 2024-05-08 04:22:26

Categories: cs.CL,cs.AI,68T50, 68T37, 68T20,I.2.7

Download: http://arxiv.org/abs/2304.08448v3

DeepDamageNet: A two-step deep-learning model for multi-disaster building damage segmentation and classification using satellite imagery

Satellite imagery has played an increasingly important role in post-disaster building damage assessment. Unfortunately, current methods still rely on manual visual interpretation, which is often time-consuming and can yield very low accuracy. To address the limitations of manual interpretation, there has been a significant increase in efforts to automate the process. We present a solution that performs the two most important tasks in building damage assessment, segmentation and classification, through deep-learning models. We show our results submitted as part of the xView2 Challenge, a competition to design better models for identifying buildings and their damage level after exposure to multiple kinds of natural disasters. Our best model couples a building identification semantic segmentation convolutional neural network (CNN) to a building damage classification CNN, with a combined F1 score of 0.66, surpassing the xView2 challenge baseline F1 score of 0.28. We find that though our model was able to identify buildings with relatively high accuracy, building damage classification across various disaster types is a difficult task due to the visual similarity between different damage levels and the different damage distributions across disaster types, highlighting that a probabilistic prior estimate of disaster damage may be important for obtaining accurate predictions.

Updated: 2024-05-08 04:21:03

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.04800v1

From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

Hierarchical control for robotics has long been plagued by the need to have a well defined interface layer to communicate between high-level task planners and low-level policies. With the advent of LLMs, language has been emerging as a prospective interface layer. However, this has several limitations. Not all tasks can be decomposed into steps that are easily expressible in natural language (e.g. performing a dance routine). Further, it makes end-to-end finetuning on embodied data challenging due to domain shift and catastrophic forgetting. We introduce our method -- Learnable Latent Codes as Bridges (LCB) -- as an alternate architecture to overcome these limitations. LCB uses a learnable latent code to act as a bridge between LLMs and low-level policies. This enables LLMs to flexibly communicate goals in the task plan without being entirely constrained by language limitations. Additionally, it enables end-to-end finetuning without destroying the embedding space of word tokens learned during pre-training. Through experiments on Language Table and Calvin, two common language based benchmarks for embodied agents, we find that LCB outperforms baselines (including those w/ GPT-4V) that leverage pure language as the interface layer on tasks that require reasoning and multi-step behaviors.

Updated: 2024-05-08 04:14:06

Categories: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.04798v1

Automated Conversion of Static to Dynamic Scheduler via Natural Language

In this paper, we explore the potential application of Large Language Models (LLMs) that will automatically model constraints and generate code for dynamic scheduling problems given an existing static model. Static scheduling problems are modelled and coded by optimization experts. These models can quickly become obsolete, as the underlying constraints may need to be fine-tuned to reflect changes in the scheduling rules. Furthermore, it may be necessary to turn a static model into a dynamic one in order to cope with disturbances in the environment. In this paper, we propose a Retrieval-Augmented Generation (RAG) based LLM model to automate the process of implementing constraints for Dynamic Scheduling (RAGDyS), without seeking help from an optimization modeling expert. Our framework aims to minimize technical complexities related to mathematical modelling and computational workload for end-users, thereby allowing end-users to quickly obtain a new schedule close to the original schedule, with changes reflected by natural language constraint descriptions.

Updated: 2024-05-08 04:07:38

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.06697v1

Variational Schrödinger Diffusion Models

Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the costly implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize the forward score functions (variational scores) of SB and restore simulation-free properties in training backward scores. We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. Theoretically, we use stochastic approximation to prove the convergence of the variational scores and show the convergence of the adaptively generated samples based on the optimal variational scores. Empirically, we test the algorithm in simulated examples and observe that VSDM is efficient in generations of anisotropic shapes and yields straighter sample trajectories compared to the single-variate diffusion. We also verify the scalability of the algorithm in real-world data and achieve competitive unconditional generation performance in CIFAR10 and conditional generation in time series modeling. Notably, VSDM no longer depends on warm-up initializations and has become tuning-friendly in training large-scale experiments.

Updated: 2024-05-08 04:01:40

Categories: cs.LG

Download: http://arxiv.org/abs/2405.04795v1

Zero-shot LLM-guided Counterfactual Generation for Text

Counterfactual examples are frequently used for model development and evaluation in many natural language processing (NLP) tasks. Although methods for automated counterfactual generation have been explored, such methods depend on models such as pre-trained language models that are then fine-tuned on auxiliary, often task-specific datasets. Collecting and annotating such datasets for counterfactual generation is labor intensive and therefore infeasible in practice. Therefore, in this work, we focus on a novel problem setting: zero-shot counterfactual generation. To this end, we propose a structured way to utilize large language models (LLMs) as general purpose counterfactual example generators. We hypothesize that the instruction-following and textual understanding capabilities of recent LLMs can be effectively leveraged for generating high quality counterfactuals in a zero-shot manner, without requiring any training or fine-tuning. Through comprehensive experiments on various downstream tasks in natural language processing (NLP), we demonstrate the efficacy of LLMs as zero-shot counterfactual generators in evaluating and explaining black-box NLP models.

Updated: 2024-05-08 03:57:45

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.04793v1

Learning Social Graph for Inactive User Recommendation

Social relations have been widely incorporated into recommender systems to alleviate the data sparsity problem. However, raw social relations don't always benefit recommendation due to their inferior quality and insufficient quantity, especially for inactive users, whose interacted items are limited. In this paper, we propose a novel social recommendation method called LSIR (Learning Social Graph for Inactive User Recommendation) that learns an optimal social graph structure for social recommendation, especially for inactive users. LSIR recursively aggregates user and item embeddings to collaboratively encode item and user features. Then, graph structure learning (GSL) is employed to refine the raw user-user social graph, by removing noisy edges and adding new edges based on the enhanced embeddings. Meanwhile, mimic learning is implemented to guide active users in mimicking inactive users during model training, which improves the construction of new edges for inactive users. Extensive experiments on real-world datasets demonstrate that LSIR achieves significant improvements of up to 129.58% on NDCG in inactive user recommendation. Our code is available at https://github.com/liun-online/LSIR.
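
The GSL step can be pictured as simple edge surgery driven by embedding similarity: drop existing edges between dissimilar users, add edges between very similar ones. The sketch below is a hypothetical reconstruction using cosine similarity and hand-picked thresholds, not LSIR's learned refinement.

```python
import numpy as np

def refine_social_graph(adj, emb, drop_thresh=0.2, add_thresh=0.8):
    """Refine a user-user adjacency matrix from user embeddings:
    remove edges whose endpoints are dissimilar ('noisy' edges) and
    add edges between highly similar, currently unconnected users."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = norm @ norm.T                              # pairwise cosine similarity
    refined = adj.copy()
    refined[(adj == 1) & (sim < drop_thresh)] = 0    # drop noisy edges
    refined[(adj == 0) & (sim > add_thresh)] = 1     # add confident new edges
    np.fill_diagonal(refined, 0)
    return refined

adj = np.array([[0, 0, 1, 0],
                [0, 0, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 0]])                       # one noisy edge: users 0-2
emb = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [0.1, 0.9]])                         # toy user embeddings
refined = refine_social_graph(adj, emb)
print(refined)
```

In this toy case the dissimilar pair (0, 2) loses its edge, while the similar pairs (0, 1) and (2, 3) gain one, which is exactly the kind of repair that helps inactive users with few reliable links.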

Updated: 2024-05-08 03:40:36

Categories: cs.SI,cs.LG

Download: http://arxiv.org/abs/2405.05288v1

Few-Shot Class Incremental Learning via Robust Transformer Approach

Few-Shot Class-Incremental Learning extends the Class-Incremental Learning problem: a model must learn from scarce data while still coping with catastrophic forgetting. The problem remains open because recent works are built upon convolutional neural networks, which perform sub-optimally compared to transformer approaches. Our paper presents a Robust Transformer Approach built upon the Compact Convolution Transformer. The issue of overfitting due to few samples is overcome with the notion of a stochastic classifier, where the classifier's weights are sampled from a distribution with mean and variance vectors, thus increasing the likelihood of correct classifications, and a batch-norm layer to stabilize the training process. The issue of catastrophic forgetting is addressed with the idea of delta parameters, small task-specific trainable parameters, while keeping the backbone networks frozen. A non-parametric approach is developed to infer the delta parameters for the model's predictions. A prototype rectification approach is applied to avoid biased prototype calculations caused by data scarcity. The advantage of ROBUSTA is demonstrated through a series of experiments on benchmark problems, where it outperforms prior arts by large margins without any data augmentation protocols.
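
The stochastic classifier amounts to sampling the classification weights from a learned Gaussian at every forward pass instead of using one fixed matrix. Below is a minimal numpy sketch of that idea; the class name, initialization, and sizes are illustrative, not ROBUSTA's implementation.

```python
import numpy as np

class StochasticClassifier:
    """Classifier head whose weights are drawn from N(mu, sigma^2) on every
    forward pass (reparameterization trick) rather than being fixed."""

    def __init__(self, dim, n_classes, rng):
        self.mu = rng.normal(scale=0.1, size=(dim, n_classes))   # mean vectors
        self.log_sigma = np.full((dim, n_classes), -2.0)         # variance vectors
        self.rng = rng

    def logits(self, x):
        eps = self.rng.normal(size=self.mu.shape)
        W = self.mu + np.exp(self.log_sigma) * eps               # sampled weights
        return x @ W

rng = np.random.default_rng(0)
clf = StochasticClassifier(dim=8, n_classes=3, rng=rng)
x = rng.normal(size=(4, 8))
print(clf.logits(x).shape)                         # (4, 3)
print(np.allclose(clf.logits(x), clf.logits(x)))   # False: weights are resampled
```

Averaging predictions over several sampled weight matrices acts as an implicit ensemble, which is what makes the head robust with few samples.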

Updated: 2024-05-08 03:35:52

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.05984v1

Multi-level Shared Knowledge Guided Learning for Knowledge Graph Completion

In the task of Knowledge Graph Completion (KGC), the existing datasets and their inherent subtasks carry a wealth of shared knowledge that can be utilized to enhance the representation of knowledge triplets and overall performance. However, no current studies specifically address the shared knowledge within KGC. To bridge this gap, we introduce a multi-level Shared Knowledge Guided learning method (SKG) that operates at both the dataset and task levels. On the dataset level, SKG-KGC broadens the original dataset by identifying shared features within entity sets via text summarization. On the task level, for the three typical KGC subtasks - head entity prediction, relation prediction, and tail entity prediction - we present an innovative multi-task learning architecture with dynamically adjusted loss weights. This approach allows the model to focus on more challenging and underperforming tasks, effectively mitigating the imbalance of knowledge sharing among subtasks. Experimental results demonstrate that SKG-KGC outperforms existing text-based methods significantly on three well-known datasets, with the most notable improvement on WN18RR.
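
One simple way to realize "dynamically adjusted loss weights" is to softmax the current per-subtask losses, so the worst-performing subtask dominates the combined objective. This is an assumed weighting scheme for illustration only; SKG-KGC's exact adjustment rule may differ, and the loss values are invented.

```python
import numpy as np

def dynamic_task_weights(losses, temperature=1.0):
    """Softmax over per-subtask losses: harder (higher-loss) subtasks
    receive larger weights in the combined objective."""
    losses = np.asarray(losses, dtype=float)
    w = np.exp(losses / temperature)
    return w / w.sum()

# Hypothetical current losses for the three KGC subtasks:
# head-entity, relation, and tail-entity prediction.
subtask_losses = [0.4, 1.2, 0.9]
weights = dynamic_task_weights(subtask_losses)
total = float(np.dot(weights, subtask_losses))  # weighted multi-task loss
print(weights.round(3), round(total, 3))
```

Because the weights increase with the loss, the combined objective always upweights the currently underperforming subtask, mitigating the imbalance the abstract describes.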

Updated: 2024-05-08 03:27:46

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.06696v1

CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer

The advent of data-driven weather forecasting models, which learn from hundreds of terabytes (TB) of reanalysis data, has significantly advanced forecasting capabilities. However, the substantial costs associated with data storage and transmission present a major challenge for data providers and users, affecting resource-constrained researchers and limiting their accessibility to participate in AI-based meteorological research. To mitigate this issue, we introduce an efficient neural codec, the Variational Autoencoder Transformer (VAEformer), for extreme compression of climate data to significantly reduce data storage cost, making AI-based meteorological research portable to researchers. Our approach diverges from recent complex neural codecs by utilizing a low-complexity Auto-Encoder transformer. This encoder produces a quantized latent representation through variance inference, which reparameterizes the latent space as a Gaussian distribution. This method improves the estimation of distributions for cross-entropy coding. Extensive experiments demonstrate that our VAEformer outperforms existing state-of-the-art compression methods in the context of climate data. By applying our VAEformer, we compressed the most popular ERA5 climate dataset (226 TB) into a new dataset, CRA5 (0.7 TB). This translates to a compression ratio of over 300 while retaining the dataset's utility for accurate scientific analysis. Further, downstream experiments show that global weather forecasting models trained on the compact CRA5 dataset achieve forecasting accuracy comparable to the model trained on the original dataset. Code, the CRA5 dataset, and the pre-trained model are available at https://github.com/taohan10200/CRA5.
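
Two small calculations make the abstract concrete: the Gaussian reparameterization that gives the entropy model an explicit latent distribution, and the headline compression ratio. The reparameterization code is a generic VAE sketch, not VAEformer's architecture; the latent sizes are arbitrary.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): the latent is an explicit
    Gaussian, so its density is available for cross-entropy coding."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.zeros(10_000)
log_var = np.full(10_000, -1.0)   # sigma = exp(-0.5) ~= 0.607
z = reparameterize(mu, log_var, rng)
print(round(float(z.std()), 2))   # close to exp(-0.5)

# Headline numbers from the abstract: ERA5 (226 TB) -> CRA5 (0.7 TB).
print(round(226 / 0.7))           # compression ratio, "over 300"
```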

Updated: 2024-05-08 03:27:04

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2405.03376v2

Robust Model Selection of Gaussian Graphical Models

In Gaussian graphical model selection, noise-corrupted samples present significant challenges. It is known that even minimal amounts of noise can obscure the underlying structure, leading to fundamental identifiability issues. A recent line of work addressing this "robust model selection" problem narrows its focus to tree-structured graphical models. Even within this specific class of models, exact structure recovery is shown to be impossible. However, several algorithms have been developed that are known to provably recover the underlying tree-structure up to an (unavoidable) equivalence class. In this paper, we extend these results beyond tree-structured graphs. We first characterize the equivalence class up to which general graphs can be recovered in the presence of noise. Despite the inherent ambiguity (which we prove is unavoidable), the structure that can be recovered reveals local clustering information and global connectivity patterns in the underlying model. Such information is useful in a range of real-world problems, including power grids, social networks, protein-protein interactions, and neural structures. We then propose an algorithm which provably recovers the underlying graph up to the identified ambiguity. We further provide finite sample guarantees in the high-dimensional regime for our algorithm and validate our results through numerical simulations.

Updated: 2024-05-08 03:26:22

Categories: stat.ML,cs.LG,math.ST,stat.TH

Download: http://arxiv.org/abs/2211.05690v2

Real-Time Pill Identification for the Visually Impaired Using Deep Learning

The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to accurately recognize and differentiate between various pill types through real-time image processing on mobile devices. The system incorporates Text-to-Speech (TTS) to provide immediate auditory feedback, enhancing usability and independence for visually impaired users. Our study evaluates the application's effectiveness in terms of detection accuracy and user experience, highlighting its potential to improve medication management and safety among the visually impaired community. Keywords: Deep Learning; YOLO Framework; Mobile Application; Visual Impairment; Pill Identification; Healthcare

Updated: 2024-05-08 03:18:46

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.05983v1

Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks

An important issue impacting healthcare is a lack of available experts. Machine learning (ML) models could resolve this by aiding in diagnosing patients. However, creating datasets large enough to train these models is expensive. We evaluated large language models (LLMs) for data creation. Using Autism Spectrum Disorders (ASD), we prompted ChatGPT and GPT-Premium to generate 4,200 synthetic observations to augment existing medical data. Our goal is to label behaviors corresponding to autism criteria and improve model accuracy with synthetic training data. We used a BERT classifier pre-trained on biomedical literature to assess differences in performance between models. A random sample (N=140) from the LLM-generated data was evaluated by a clinician and found to contain 83% correct example-label pairs. Augmenting data increased recall by 13% but decreased precision by 16%, correlating with higher quality and lower accuracy across pairs. Future work will analyze how different synthetic data traits affect ML outcomes.

Updated: 2024-05-08 03:18:12

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.06695v1

Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning

Stochastic optimization algorithms, particularly stochastic policy gradient (SPG), have achieved significant success in reinforcement learning (RL). Nevertheless, how to speedily obtain an optimal solution in RL remains a challenge. To tackle this issue, this work develops a fast SPG algorithm built on momentum, coined SPG-NM, in which a novel negative momentum (NM) technique is applied to the classical SPG algorithm. Unlike existing NM techniques, SPG-NM requires only a few hyper-parameters. Moreover, its computational complexity is nearly the same as that of modern SPG-type algorithms such as accelerated policy gradient (APG), which equips SPG with Nesterov's accelerated gradient (NAG). We evaluate the resulting algorithm on two classical tasks, the bandit setting and the Markov decision process (MDP). Numerical results on these tasks demonstrate a faster convergence rate than state-of-the-art algorithms, confirming the positive impact of NM in accelerating SPG for RL. Numerical experiments under different settings also confirm the robustness of SPG-NM to certain crucial hyper-parameters, leaving the user free to tune them in practice.
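
The abstract does not state the exact update rule, but one plausible reading of "negative momentum" is a heavy-ball step with a negative momentum coefficient. The sketch below applies such a step to a toy softmax-bandit policy gradient; all constants and the update form are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.9])   # hypothetical 3-armed bandit reward means
theta = np.zeros(3)                  # softmax policy parameters
prev_update = np.zeros(3)
lr, beta = 0.1, -0.2                 # beta < 0: assumed "negative momentum" form

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for t in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)           # sample an arm from the policy
    r = rng.normal(means[a], 0.1)    # observe a noisy reward
    grad = -p * r
    grad[a] += r                     # single-sample REINFORCE gradient r*(e_a - p)
    update = beta * prev_update + lr * grad
    theta += update                  # heavy-ball ascent step with negative beta
    prev_update = update

print(softmax(theta))                # often concentrates on the best arm (index 2)
```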

Updated: 2024-05-08 03:01:05

Categories: cs.LG

Download: http://arxiv.org/abs/2405.12228v1

Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning

Many processes in biology and drug discovery involve various 3D interactions between molecules, such as protein and protein, protein and small molecule, etc. Given that different molecules are usually represented at different granularities, existing methods usually encode each type of molecule independently with a different model, making it difficult to learn the various underlying interaction physics. In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, shedding light on encoding all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. To be specific, GET consists of a bilevel attention module, a feed-forward module and a layer normalization module, where each module is E(3) equivariant and specialized for handling sets of variable sizes. Notably, in contrast to conventional pooling-based hierarchical models, our GET is able to retain fine-grained information of all levels. Extensive experiments on the interactions between proteins, small molecules and RNA/DNAs verify the effectiveness and generalization capability of our proposed method across different domains.

Updated: 2024-05-08 02:56:43

Categories: cs.LG,q-bio.BM

Download: http://arxiv.org/abs/2306.01474v5

Rhizomes and Diffusions for Processing Highly Skewed Graphs on Fine-Grain Message-Driven Systems

The paper provides a unified co-design of 1) a programming and execution model that allows spawning tasks from within the vertex data at runtime, 2) language constructs for \textit{actions} that send work to where the data resides, combining parallel expressiveness of local control objects (LCOs) to implement asynchronous graph processing primitives, 3) and an innovative vertex-centric data-structure, using the concept of Rhizomes, that parallelizes both the out and in-degree load of vertex objects across many cores and yet provides a single programming abstraction to the vertex objects. The data structure hierarchically parallelizes the out-degree load of vertices and the in-degree load laterally. The rhizomes internally communicate and remain consistent, using event-driven synchronization mechanisms, to provide a unified and correct view of the vertex. Simulated experimental results show performance gains for BFS, SSSP, and Page Rank on large chip sizes for the tested input graph datasets containing highly skewed degree distribution. The improvements come from the ability to express and create fine-grain dynamic computing task in the form of \textit{actions}, language constructs that aid the compiler to generate code that the runtime system uses to optimally schedule tasks, and the data structure that shares both in and out-degree compute workload among memory-processing elements.

Updated: 2024-05-08 02:48:35

Categories: cs.DC,cs.AI,cs.DS,C.1.4; C.3; C.4; D.1.3

Download: http://arxiv.org/abs/2402.06086v2

Chain of Thoughtlessness: An Analysis of CoT in Planning

Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated by modifying prompts to include examples with chains of thought--demonstrations of solution procedures--with the intuition that it is possible to in-context teach an LLM an algorithm for solving the problem. This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain, and examines the performance of two state-of-the-art LLMs across two axes: generality of examples given in the prompt, and complexity of problems queried with each prompt. Although our problems are very simple, we find meaningful performance improvements from chain of thought prompts only when those prompts are exceedingly specific to their problem class, and those improvements quickly deteriorate as the size n of the query-specified stack grows past the size of stacks shown in the examples. Our results hint that, contrary to previous claims in the literature, CoT's performance improvements do not stem from the model learning general algorithmic procedures via demonstrations and instead depend on carefully engineering highly problem-specific prompts. This spotlights drawbacks of chain of thought, especially because of the sharp tradeoff between possible performance gains and the amount of human labor necessary to generate examples with correct reasoning traces.

Updated: 2024-05-08 02:48:28

Categories: cs.AI

Download: http://arxiv.org/abs/2405.04776v1

Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

Although social media platforms are a prominent arena for users to engage in interpersonal discussions and express opinions, the facade and anonymity offered by social media may allow users to spew hate speech and offensive content. Given the massive scale of such platforms, there arises a need to automatically identify and flag instances of hate speech. Although several hate speech detection methods exist, most of these black-box methods are not interpretable or explainable by design. To address the lack of interpretability, in this paper, we propose to use state-of-the-art Large Language Models (LLMs) to extract features in the form of rationales from the input text, to train a base hate speech classifier, thereby enabling faithful interpretability by design. Our framework effectively combines the textual understanding capabilities of LLMs and the discriminative power of state-of-the-art hate speech classifiers to make these classifiers faithfully interpretable. Our comprehensive evaluation on a variety of English language social media hate speech datasets demonstrates: (1) the goodness of the LLM-extracted rationales, and (2) the surprising retention of detector performance even after training to ensure interpretability. All code and data will be made available at https://github.com/AmritaBh/shield.

Updated: 2024-05-08 02:47:36

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2403.12403v2

Hypergraph-enhanced Dual Semi-supervised Graph Classification

In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreover, GNNs are inherently limited to encoding local neighborhood information using message-passing mechanisms, thus lacking the ability to model higher-order dependencies among nodes. To tackle these challenges, we propose a Hypergraph-Enhanced DuAL framework named HEAL for semi-supervised graph classification, which captures graph semantics from the perspective of the hypergraph and the line graph, respectively. Specifically, to better explore the higher-order relationships among nodes, we design a hypergraph structure learning to adaptively learn complex node dependencies beyond pairwise relations. Meanwhile, based on the learned hypergraph, we introduce a line graph to capture the interaction between hyperedges, thereby better mining the underlying semantic structures. Finally, we develop a relational consistency learning to facilitate knowledge transfer between the two branches and provide better mutual guidance. Extensive experiments on real-world graph datasets verify the effectiveness of the proposed method against existing state-of-the-art methods.

Updated: 2024-05-08 02:44:13

Categories: cs.LG,cs.AI,cs.IR,cs.SI

Download: http://arxiv.org/abs/2405.04773v1

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

Updated: 2024-05-08 02:43:34

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2405.04434v2

Inference With Combining Rules From Multiple Differentially Private Synthetic Datasets

Differential privacy (DP) has been accepted as a rigorous criterion for measuring the privacy protection offered by random mechanisms used to obtain statistics or, as we will study here, synthetic datasets from confidential data. Methods to generate such datasets are increasingly numerous, using varied tools including Bayesian models, deep neural networks and copulas. However, little is still known about how to properly perform statistical inference with these differentially private synthetic (DIPS) datasets. The challenge is for the analyses to take into account the variability from the synthetic data generation in addition to the usual sampling variability. A similar challenge also occurs when missing data is imputed before analysis, and statisticians have developed appropriate inference procedures for this case, which we extend to the case of synthetic datasets for privacy. In this work, we study the applicability of these procedures, based on combining rules, to the analysis of DIPS datasets. Our empirical experiments show that the proposed combining rules may offer accurate inference in certain contexts, but not in all cases.
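
For concreteness, the classical combining rules for multiply-released synthetic data (in the Reiter-style partially-synthetic form: combined estimate q̄, between-dataset variance b, average within-dataset variance ū, total variance T = ū + b/m) can be sketched as follows; the numbers here are hypothetical, and the paper studies when such rules remain valid for DIPS data:

```python
import numpy as np

# Hypothetical per-dataset estimates of some scalar quantity (e.g. a mean)
# and their estimated variances, one pair per synthetic dataset (m = 5).
q = np.array([10.2, 9.8, 10.5, 10.1, 9.9])   # point estimates q_l
u = np.array([0.40, 0.38, 0.42, 0.41, 0.39]) # within-dataset variances u_l
m = len(q)

q_bar = q.mean()                  # combined point estimate
b = q.var(ddof=1)                 # between-dataset variance
u_bar = u.mean()                  # average within-dataset variance
T = u_bar + b / m                 # total variance (partially synthetic rule)

print(q_bar, T)                   # 10.1 and 0.415 for these inputs
```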

Updated: 2024-05-08 02:33:35

Categories: stat.ME,cs.CR,cs.LG,stat.AP

Download: http://arxiv.org/abs/2405.04769v1

Test-Time Augmentation for Traveling Salesperson Problem

We propose Test-Time Augmentation (TTA) as an effective technique for addressing combinatorial optimization problems, including the Traveling Salesperson Problem. In general, deep learning models possessing the property of invariance, where the output is uniquely determined regardless of the node indices, have been proposed to learn graph structures efficiently. In contrast, we interpret the permutation of node indices, which exchanges the elements of the distance matrix, as a TTA scheme. The results demonstrate that our method is capable of obtaining shorter solutions than the latest models. Furthermore, we show that the probability of finding a solution closer to an exact solution increases depending on the augmentation size.
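
The permutation-based TTA idea can be sketched with a simple stand-in solver (a nearest-neighbor heuristic in place of the learned model; everything below is an illustrative assumption, not the paper's implementation). Each augmentation relabels the nodes, solves the permuted instance, maps the tour back, and keeps the shortest result:

```python
import numpy as np

def tour_length(D, tour):
    return sum(D[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def nearest_neighbor(D):
    # Stand-in for a learned solver; any index-dependent heuristic works here.
    n = len(D)
    tour, seen = [0], {0}
    while len(tour) < n:
        last = tour[-1]
        nxt = min((j for j in range(n) if j not in seen), key=lambda j: D[last, j])
        tour.append(nxt); seen.add(nxt)
    return tour

def tta_solve(D, n_aug=16, seed=0):
    rng = np.random.default_rng(seed)
    n = len(D)
    # Include the identity so TTA is never worse than the plain heuristic.
    perms = [np.arange(n)] + [rng.permutation(n) for _ in range(n_aug - 1)]
    best_tour, best_len = None, np.inf
    for perm in perms:
        Dp = D[np.ix_(perm, perm)]                       # relabel the nodes
        tour = [int(perm[i]) for i in nearest_neighbor(Dp)]  # map back
        L = tour_length(D, tour)
        if L < best_len:
            best_tour, best_len = tour, L
    return best_tour, best_len

rng = np.random.default_rng(1)
pts = rng.random((12, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
tour, L = tta_solve(D)
base = tour_length(D, nearest_neighbor(D))
print(L <= base)  # True: the augmented search includes the unaugmented run
```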

Updated: 2024-05-08 02:31:51

Categories: cs.LG

Download: http://arxiv.org/abs/2405.04767v1

One-Wayness in Quantum Cryptography

The existence of one-way functions is one of the most fundamental assumptions in classical cryptography. In the quantum world, on the other hand, there are evidences that some cryptographic primitives can exist even if one-way functions do not exist. We therefore have the following important open problem in quantum cryptography: What is the most fundamental element in quantum cryptography? In this direction, Brakerski, Canetti, and Qian recently defined a notion called EFI pairs, which are pairs of efficiently generatable states that are statistically distinguishable but computationally indistinguishable, and showed its equivalence with some cryptographic primitives including commitments, oblivious transfer, and general multi-party computations. However, their work focuses on decision-type primitives and does not cover search-type primitives like quantum money and digital signatures. In this paper, we study properties of one-way state generators (OWSGs), which are a quantum analogue of one-way functions. We first revisit the definition of OWSGs and generalize it by allowing mixed output states. Then we show the following results. (1) We define a weaker version of OWSGs, weak OWSGs, and show that they are equivalent to OWSGs. (2) Quantum digital signatures are equivalent to OWSGs. (3) Private-key quantum money schemes (with pure money states) imply OWSGs. (4) Quantum pseudo one-time pad schemes imply both OWSGs and EFI pairs. (5) We introduce an incomparable variant of OWSGs, which we call secretly-verifiable and statistically-invertible OWSGs, and show that they are equivalent to EFI pairs.

Updated: 2024-05-08 02:31:38

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2210.03394v3

When Foresight Pruning Meets Zeroth-Order Optimization: Efficient Federated Learning for Low-Memory Devices

Although Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design, it fails to work on low-memory AIoT devices due to its heavy memory usage. To address this problem, various federated pruning methods are proposed to reduce memory usage during inference. However, few of them can substantially mitigate the memory burdens during pruning and training. As an alternative, zeroth-order or backpropagation-free (BP-Free) methods can partially alleviate the memory consumption, but they scale poorly and incur large computation overheads, since the gradient estimation error and floating point operations (FLOPs) increase as the dimensionality of the model parameters grows. In this paper, we propose a federated foresight pruning method based on the Neural Tangent Kernel (NTK), which can seamlessly integrate with federated BP-Free training frameworks. We present an approximation to the computation of the federated NTK by using the local NTK matrices. Moreover, we demonstrate that the data-free property of our method can substantially reduce the approximation error in extreme data heterogeneity scenarios. Since our approach improves the performance of the vanilla BP-Free method with fewer FLOPs and truly alleviates memory pressure during training and inference, it makes FL more friendly to low-memory devices. Comprehensive experimental results obtained from simulation- and real test-bed-based platforms show that our federated foresight-pruning method not only preserves the ability of the dense model with a memory reduction of up to 9x but also boosts the performance of the vanilla BP-Free method with dramatically fewer FLOPs.

Updated: 2024-05-08 02:24:09

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2405.04765v1

Secure Transformer Inference Protocol

Security of model parameters and user data is critical for Transformer-based services, such as ChatGPT. While recent strides in secure two-party protocols have successfully addressed security concerns in serving Transformer models, their adoption is practically infeasible due to the prohibitive cryptographic overheads involved. Drawing insights from our hands-on experience in developing two real-world Transformer-based services, we identify the inherent efficiency bottleneck in the two-party assumption. To overcome this limitation, we propose a novel three-party threat model. Within this framework, we design a semi-symmetric permutation-based protection scheme and present STIP, the first secure Transformer inference protocol without any inference accuracy loss. Experiments on representative Transformer models in real systems show that STIP has practical security and outperforms state-of-the-art secure two-party protocols in efficiency by millions of times.
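
STIP's actual semi-symmetric scheme is more involved, but the basic algebraic fact that makes permutation-based protection possible can be sketched directly: permuting input features while inversely permuting the weight columns leaves a linear layer's output unchanged, so a computing party can operate on permuted tensors without ever seeing the originals. The code below illustrates only this invariance, not the protocol itself:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W = rng.standard_normal((d_out, d_in))   # model weights (to be protected)
x = rng.standard_normal(d_in)            # user input (to be protected)

perm = rng.permutation(d_in)             # shared secret permutation
P = np.eye(d_in)[perm]                   # permutation matrix: (P @ x)[i] = x[perm[i]]

x_prot = P @ x                           # input holder shares only permuted input
W_prot = W @ P.T                         # weight holder shares column-permuted weights

# The computing party sees only permuted tensors, yet W_prot @ x_prot = W @ P^T P @ x = W @ x.
assert np.allclose(W_prot @ x_prot, W @ x)
print("outputs match")
```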

Updated: 2024-05-08 02:20:09

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2312.00025v2

Nearly-Optimal Consensus Tolerating Adaptive Omissions: Why is a Lot of Randomness Needed?

We study the problem of reaching agreement in a synchronous distributed system by $n$ autonomous parties, when the communication links from/to faulty parties can omit messages. The faulty parties are selected and controlled by an adaptive, full-information, computationally unbounded adversary. We design a randomized algorithm that works in $O(\sqrt{n}\log^2 n)$ rounds and sends $O(n^2\log^3 n)$ communication bits, where the number of faulty parties is $\Theta(n)$. Our result is simultaneously tight for both these measures within polylogarithmic factors: due to the $\Omega(n^2)$ lower bound on communication by Abraham et al. (PODC'19) and $\Omega(\sqrt{n/\log n})$ lower bound on the number of rounds by Bar-Joseph and Ben-Or (PODC'98). We also quantify how much randomness is necessary and sufficient to reduce time complexity to a certain value, while keeping the communication complexity (nearly) optimal. We prove that no MC algorithm can work in less than $\Omega(\frac{n^2}{\max\{R,n\}\log n})$ rounds if it uses less than $O(R)$ calls to a random source, assuming a constant fraction of faulty parties. This can be contrasted with a long line of work on consensus against an {\em adversary limited to polynomial computation time}, thus unable to break cryptographic primitives, culminating in a work by Ghinea et al. (EUROCRYPT'22), where an optimal $O(r)$-round solution with probability $1-(cr)^{-r}$ is given. Our lower bound strictly separates these two regimes, by excluding such results if the adversary is computationally unbounded. On the upper bound side, we show that for $R\in\tilde{O}(n^{3/2})$ there exists an algorithm solving consensus in $\tilde{O}(\frac{n^2}{R})$ rounds with high probability, where tilde notation hides a polylogarithmic factor. The communication complexity of the algorithm does not depend on the amount of randomness $R$ and stays optimal within polylogarithmic factor.

Updated: 2024-05-08 02:17:10

Categories: cs.DC,cs.CR,cs.DS

Download: http://arxiv.org/abs/2405.04762v1

Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models

Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying fundamental reasons of multimodal hallucinations remain poorly explored. In this paper, we propose a new perspective, suggesting that the inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic shift bias related to paragraph breaks (\n\n), where the content before and after '\n\n' in the training data frequently exhibit significant semantic changes. This pattern leads the model to infer that the contents following '\n\n' should be obviously different from the preceding contents with less hallucinatory descriptions, thereby increasing the probability of hallucinatory descriptions subsequent to the '\n\n'. We have validated this hypothesis on multiple publicly available LVLMs. Besides, we find that deliberately inserting '\n\n' at the generated description can induce more hallucinations. A simple method is proposed to effectively mitigate the hallucination of LVLMs by skipping the output of '\n'.
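
The proposed mitigation — skipping the '\n' output — amounts to suppressing that token at decoding time. A toy sketch over a hypothetical four-token vocabulary (greedy decoding for determinism; the vocabulary and logits are invented for illustration):

```python
import math

# Toy next-token distribution over a tiny hypothetical vocabulary.
vocab = ["the", "cat", "\n", "sat"]
logits = [2.0, 1.0, 3.0, 0.5]

def sample(logits, banned=()):
    # Mask banned tokens by setting their logit to -inf before softmax.
    masked = [l if vocab[i] not in banned else -math.inf
              for i, l in enumerate(logits)]
    m = max(masked)
    probs = [math.exp(l - m) for l in masked]
    z = sum(probs)
    probs = [p / z for p in probs]
    # Greedy pick for determinism; a real decoder would sample from probs.
    return vocab[max(range(len(vocab)), key=lambda i: probs[i])]

print(repr(sample(logits)))                 # '\n' (the highest-logit token)
print(repr(sample(logits, banned={"\n"})))  # 'the' once '\n' is suppressed
```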

Updated: 2024-05-08 02:15:45

Fields: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2402.01345v6

Coseparable Nonnegative Tensor Factorization With T-CUR Decomposition

Nonnegative Matrix Factorization (NMF) is an important unsupervised learning method to extract meaningful features from data. To address the NMF problem within a polynomial time framework, researchers have introduced a separability assumption, which has recently evolved into the concept of coseparability. This advancement offers a more efficient core representation for the original data. However, in the real world, the data is more natural to be represented as a multi-dimensional array, such as images or videos. The NMF's application to high-dimensional data involves vectorization, which risks losing essential multi-dimensional correlations. To retain these inherent correlations in the data, we turn to tensors (multidimensional arrays) and leverage the tensor t-product. This approach extends the coseparable NMF to the tensor setting, creating what we term coseparable Nonnegative Tensor Factorization (NTF). In this work, we provide an alternating index selection method to select the coseparable core. Furthermore, we validate the t-CUR sampling theory and integrate it with the tensor Discrete Empirical Interpolation Method (t-DEIM) to introduce an alternative, randomized index selection process. These methods have been tested on both synthetic and facial analysis datasets. The results demonstrate the efficiency of coseparable NTF when compared to coseparable NMF.
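
The tensor t-product the abstract builds on can be computed by taking an FFT along the third mode, multiplying matching frontal slices, and inverting the FFT. A minimal NumPy sketch of that operation (an illustration of the t-product itself, not the paper's factorization code):

```python
import numpy as np

def t_product(A, B):
    """t-product of tensors A (n1 x n2 x n3) and B (n2 x n4 x n3):
    FFT along the third mode, slice-wise matrix products in the
    frequency domain, then inverse FFT."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    return np.real(np.fft.ifft(Cf, axis=2))
```

The identity tensor under this product has the identity matrix as its first frontal slice and zeros elsewhere, which makes the operation easy to sanity-check.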

Updated: 2024-05-08 02:14:51

Fields: cs.LG,cs.NA,math.NA

Download: http://arxiv.org/abs/2401.16836v2

Quantifying the Capabilities of LLMs across Scale and Precision

Scale is often attributed as one of the factors that cause an increase in the performance of LLMs, resulting in models with billion and trillion parameters. One of the limitations of such large models is the high computational requirements that limit their usage, deployment, and debugging in resource-constrained scenarios. Two commonly used alternatives to bypass these limitations are to use the smaller versions of LLMs (e.g. Llama 7B instead of Llama 70B) and lower the memory requirements by using quantization. While these approaches effectively address the limitation of resources, their impact on model performance needs thorough examination. In this study, we perform a comprehensive evaluation to investigate the effect of model scale and quantization on the performance. We experiment with two major families of open-source instruct models ranging from 7 billion to 70 billion parameters. Our extensive zero-shot experiments across various tasks including natural language understanding, reasoning, misinformation detection, and hallucination reveal that larger models generally outperform their smaller counterparts, suggesting that scale remains an important factor in enhancing performance. We found that larger models show exceptional resilience to precision reduction and can maintain high accuracy even at 4-bit quantization for numerous tasks and they serve as a better solution than using smaller models at high precision under similar memory requirements.
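
As a rough illustration of what reducing precision does to weights (a toy model, not the paper's LLM quantization pipeline), symmetric uniform b-bit quantization rounds each value to one of a small number of signed levels, and the worst-case error grows as the bit width shrinks:

```python
import numpy as np

def quantize(w, bits=4):
    """Symmetric uniform quantization: round weights to at most
    2**(bits-1) - 1 integer levels per side, then dequantize back."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale
```

Comparing the reconstruction error at 4 and 8 bits makes the accuracy/memory trade-off the study measures concrete.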

Updated: 2024-05-08 02:10:36

Fields: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.03146v2

Large Language Models for Cyber Security: A Systematic Literature Review

The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we conduct a comprehensive review of the literature on the application of LLMs in cybersecurity (LLM4Security). By comprehensively collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Through our analysis, we identify several key findings. First, we observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, we find that the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides a comprehensive overview of the current state-of-the-art in LLM4Security and identifies several promising directions for future research.

Updated: 2024-05-08 02:09:17

Fields: cs.CR,cs.AI

Download: http://arxiv.org/abs/2405.04760v1

Multi-Label Out-of-Distribution Detection with Spectral Normalized Joint Energy

In today's interconnected world, achieving reliable out-of-distribution (OOD) detection poses a significant challenge for machine learning models. While numerous studies have introduced improved approaches for multi-class OOD detection tasks, the investigation into multi-label OOD detection tasks has been notably limited. We introduce Spectral Normalized Joint Energy (SNoJoE), a method that consolidates label-specific information across multiple labels through the theoretically justified concept of an energy-based function. Throughout the training process, we employ spectral normalization to manage the model's feature space, thereby enhancing model efficacy and generalization, in addition to bolstering robustness. Our findings indicate that the application of spectral normalization to joint energy scores notably amplifies the model's capability for OOD detection. We perform OOD detection experiments utilizing PASCAL-VOC as the in-distribution dataset and ImageNet-22K or Texture as the out-of-distribution datasets. Our experimental results reveal that, in comparison to prior top performances, SNoJoE achieves 11% and 54% relative reductions in FPR95 on the respective OOD datasets, thereby defining the new state of the art in this field of study.
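
A sketch of the two ingredients as they are commonly defined — a joint energy score summed over labels, and spectral normalization of a weight matrix. The exact SNoJoE formulation may differ in details; this only illustrates the concepts the abstract combines:

```python
import numpy as np

def joint_energy(logits):
    """Multi-label joint energy: sum over labels of log(1 + exp(logit_j)),
    computed stably via logaddexp. Higher scores suggest in-distribution."""
    return np.logaddexp(0.0, np.asarray(logits, dtype=float)).sum(axis=-1)

def spectral_normalize(W):
    """Divide a weight matrix by its largest singular value so that its
    spectral norm equals 1 (the normalization applied during training)."""
    return W / np.linalg.svd(W, compute_uv=False)[0]
```

Confident multi-label predictions yield a large joint energy, while uniformly uncertain logits (typical of OOD inputs) yield a small one.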

Updated: 2024-05-08 02:05:38

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.04759v1

Honeyfile Camouflage: Hiding Fake Files in Plain Sight

Honeyfiles are a particularly useful type of honeypot: fake files deployed to detect and infer information from malicious behaviour. This paper considers the challenge of naming honeyfiles so they are camouflaged when placed amongst real files in a file system. Based on cosine distances in semantic vector spaces, we develop two metrics for filename camouflage: one based on simple averaging and one on clustering with mixture fitting. We evaluate and compare the metrics, showing that both perform well on a publicly available GitHub software repository dataset.
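
The simpler of the two metrics — the averaging one — can be sketched as the cosine similarity between a candidate filename's embedding and the centroid of the real filenames' embeddings in the same directory. The embedding model and vectors here are placeholder assumptions:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def camouflage_score(candidate_vec, real_vecs):
    """Averaging-based camouflage metric: similarity of a candidate
    honeyfile-name embedding to the mean embedding of real filenames.
    Higher scores mean the fake name blends in better."""
    centroid = np.mean(real_vecs, axis=0)
    return cosine(candidate_vec, centroid)
```

A candidate embedded near the directory's semantic cluster scores higher than one far from it.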

Updated: 2024-05-08 02:01:17

Fields: cs.CR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.04758v1

BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. The language model can also learn social biases, which has a significant potential for societal harm. There have been many mitigation strategies proposed for LLM safety, but it is unclear how effective they are for eliminating social biases. In this work, we propose a new methodology for attacking language models with knowledge graph augmented generation. We refactor natural language stereotypes into a knowledge graph, and use adversarial attacking strategies to induce biased responses from several open- and closed-source language models. We find our method increases bias in all models, even those trained with safety guardrails. This demonstrates the need for further research in AI safety, and further work in this new adversarial space.

Updated: 2024-05-08 01:51:29

Fields: cs.CL,cs.LG

Download: http://arxiv.org/abs/2405.04756v1

Conditional Local Feature Encoding for Graph Neural Networks

Graph neural networks (GNNs) have shown great success in learning from graph-based data. The key mechanism of current GNNs is message passing, where a node's feature is updated based on the information passing from its local neighbourhood. A limitation of this mechanism is that node features become increasingly dominated by the information aggregated from the neighbourhood as we use more rounds of message passing. Consequently, as the GNN layers become deeper, adjacent node features tend to be similar, making it more difficult for GNNs to distinguish adjacent nodes, thereby limiting the performance of GNNs. In this paper, we propose conditional local feature encoding (CLFE) to help prevent node features from being dominated by the information from the local neighbourhood. The idea of our method is to extract the node hidden state embedding from the message passing process and concatenate it with the node features from the previous stage; we then utilise a linear transformation to form a CLFE based on the concatenated vector. The CLFE forms the layer output to better preserve node-specific information, thus helping to improve the performance of GNN models. To verify the feasibility of our method, we conducted extensive experiments on seven benchmark datasets for four graph domain tasks: super-pixel graph classification, node classification, link prediction, and graph regression. The experimental results consistently demonstrate that our method improves model performance across a variety of baseline GNN models for all four tasks.
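
The core CLFE step described above — concatenate the message-passing hidden state with the previous-stage node features, then apply a linear map — can be sketched as follows (dimensions and the NumPy framing are illustrative assumptions):

```python
import numpy as np

def clfe_layer(h_msg, h_prev, W, b):
    """Conditional local feature encoding: concatenate the message-passing
    hidden state with the previous-stage node features, then apply a
    linear transformation to produce the layer output."""
    z = np.concatenate([h_msg, h_prev], axis=-1)  # (num_nodes, d_msg + d_prev)
    return z @ W + b                               # (num_nodes, d_out)
```

Because `h_prev` bypasses the aggregation step, node-specific information survives even when the message-passing term is dominated by the neighbourhood.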

Updated: 2024-05-08 01:51:19

Fields: cs.LG

Download: http://arxiv.org/abs/2405.04755v1

Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning

Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality and reasoning-focused training datasets. Addressing this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar practices from authentic data sources. KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability. As a result, we present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs. Utilizing KPMath and augmenting it with additional reasoning-intensive corpora, we create the comprehensive KPMath-Plus dataset. The Qwen1.5-72B model, fine-tuned on KPMath-Plus, achieves 87.0% PASS@1 accuracy on GSM8K and 58.3% on MATH, surpassing competitors in the 7B to 70B range and best commercial models like GPT-4 across multiple math reasoning datasets.

Updated: 2024-05-08 01:48:46

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2403.02333v3

AquaSonic: Acoustic Manipulation of Underwater Data Center Operations and Resource Management

Underwater datacenters (UDCs) hold promise as next-generation data storage due to their energy efficiency and environmental sustainability benefits. While the natural cooling properties of water save power, the isolated aquatic environment and long-range sound propagation in water create unique vulnerabilities which differ from those of on-land data centers. Our research discovers the unique vulnerabilities of fault-tolerant storage devices, resource allocation software, and distributed file systems to acoustic injection attacks in UDCs. With a realistic testbed approximating UDC server operations, we empirically characterize the capabilities of acoustic injection underwater and find that an attacker can reduce fault-tolerant RAID 5 storage system throughput by 17% up to 100%. Our closed-water analyses reveal that attackers can (i) cause unresponsiveness and automatic node removal in a distributed filesystem with only 2.4 minutes of sustained acoustic injection, (ii) induce a distributed database's latency to increase by up to 92.7% to reduce system reliability, and (iii) induce load-balance managers to redirect up to 74% of resources to a target server to cause overload or force resource colocation. Furthermore, we perform open-water experiments in a lake and find that an attacker can cause controlled throughput degradation at a maximum allowable distance of 6.35 m using a commercial speaker. We also investigate and discuss the effectiveness of standard defenses against acoustic injection attacks. Finally, we formulate a novel machine learning-based detection system that reaches 0% False Positive Rate and 98.2% True Positive Rate trained on our dataset of profiled hard disk drives under 30-second FIO benchmark execution. With this work, we aim to help manufacturers proactively protect UDCs against acoustic injection attacks and ensure the security of subsea computing infrastructures.

Updated: 2024-05-08 01:42:26

Fields: cs.CR

Download: http://arxiv.org/abs/2404.11815v2

AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models

Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types as well as requirement of expertise in model design and tuning. Addressing these limitations, we seek to utilize Large Language Models (LLMs), which have achieved enormous success in a broad range of tasks given exceptional capabilities in both language understanding and zero-shot task fulfillment. Thus, we propose a fully automatic LLM-based framework to construct attack knowledge graphs named: AttacKG+. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each of which is implemented by instruction prompting and in-context learning empowered by LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation, including behavior graph, MITRE TTP labels, and state summary. Extensive evaluation demonstrates that: 1) our formulation seamlessly satisfies the information needs in threat event analysis, 2) our construction framework is effective in faithfully and accurately extracting the information defined by AttacKG+, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.

Updated: 2024-05-08 01:41:25

Fields: cs.CR,cs.AI

Download: http://arxiv.org/abs/2405.04753v1

Generative-Enhanced Heterogeneous Graph Contrastive Learning

Heterogeneous Graphs (HGs) can effectively model complex relationships in the real world by multi-type nodes and edges. In recent years, inspired by self-supervised learning, contrastive Heterogeneous Graphs Neural Networks (HGNNs) have shown great potential by utilizing data augmentation and contrastive discriminators for downstream tasks. However, data augmentation is still limited due to the graph data's integrity. Furthermore, the contrastive discriminators remain sampling bias and lack local heterogeneous information. To tackle the above limitations, we propose a novel Generative-Enhanced Heterogeneous Graph Contrastive Learning (GHGCL). Specifically, we first propose a heterogeneous graph generative learning enhanced contrastive paradigm. This paradigm includes: 1) A contrastive view augmentation strategy by using a masked autoencoder. 2) Position-aware and semantics-aware positive sample sampling strategy for generating hard negative samples. 3) A hierarchical contrastive learning strategy for capturing local and global information. Furthermore, the hierarchical contrastive learning and sampling strategies aim to constitute an enhanced contrastive discriminator under the generative-contrastive perspective. Finally, we compare our model with seventeen baselines on eight real-world datasets. Our model outperforms the latest contrastive and generative baselines on node classification and link prediction tasks. To reproduce our work, we have open-sourced our code at https://anonymous.4open.science/r/GC-HGNN-E50C.

Updated: 2024-05-08 01:40:25

Fields: cs.LG,cs.IR

Download: http://arxiv.org/abs/2404.02810v2

Differentially Private Linear Regression with Linked Data

There has been increasing demand for establishing privacy-preserving methodologies for modern statistics and machine learning. Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, with nontrivial upstream pre-processing typically not incorporated. An important example is when record linkage is done prior to downstream modeling. Record linkage refers to the statistical task of linking two or more data sets of the same group of entities without a unique identifier. This probabilistic procedure brings additional uncertainty to the subsequent task. In this paper, we present two differentially private algorithms for linear regression with linked data. In particular, we propose a noisy gradient method and a sufficient statistics perturbation approach for the estimation of regression coefficients. We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, which allows us to understand the relative contributions of linkage error, estimation error, and the cost of privacy. The variances of the estimators are also discussed. We demonstrate the performance of the proposed algorithms through simulations and an application to synthetic data.
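
A minimal sketch of the sufficient-statistics perturbation idea mentioned above: add Gaussian noise to $X^\top X$ and $X^\top y$ and solve the perturbed normal equations. The privacy calibration (sensitivity analysis, the $(\epsilon,\delta)$ accounting, and the paper's treatment of linkage error) is deliberately omitted, and `noise_scale` is a placeholder:

```python
import numpy as np

def dp_ssp_regression(X, y, noise_scale, rng):
    """Sufficient-statistics perturbation for linear regression:
    perturb X^T X (symmetrically) and X^T y with Gaussian noise,
    then solve the noisy normal equations."""
    d = X.shape[1]
    N = rng.normal(0.0, noise_scale, (d, d))
    xtx = X.T @ X + (N + N.T) / 2.0   # keep the perturbed matrix symmetric
    xty = X.T @ y + rng.normal(0.0, noise_scale, d)
    return np.linalg.solve(xtx, xty)
```

With vanishing noise the estimator reduces to ordinary least squares, which is a useful correctness check.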

Updated: 2024-05-08 01:36:39

Fields: stat.ME,cs.CR,68P27, 62-XX,G.3; I.0

Download: http://arxiv.org/abs/2308.00836v2

Navigating Chemical Space with Latent Flows

Recent progress of deep generative models in the vision and language domain has stimulated significant interest in more structured data generation such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, to traverse chemical space through navigating the latent space learned by molecule generative models through flows. We introduce a dynamical system perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution to the region with desired molecular properties or structure diversity. Under this framework, we unify previous approaches on molecule latent space traversal and optimization and propose alternative competing methods incorporating different physical priors. We validate the efficacy of ChemFlow on molecule manipulation and single- and multi-objective molecule optimization tasks under both supervised and unsupervised molecular discovery settings. Codes and demos are publicly available on GitHub at https://github.com/garywei944/ChemFlow.

Updated: 2024-05-08 01:34:25

Fields: cs.LG,physics.chem-ph

Download: http://arxiv.org/abs/2405.03987v2

SVD-AE: Simple Autoencoders for Collaborative Filtering

Collaborative filtering (CF) methods for recommendation systems have been extensively researched, ranging from matrix factorization and autoencoder-based to graph filtering-based methods. Recently, lightweight methods that require almost no training have been proposed to reduce overall computation. However, existing methods still have room to improve the trade-offs among accuracy, efficiency, and robustness. In particular, there are no well-designed closed-form studies for \emph{balanced} CF in terms of the aforementioned trade-offs. In this paper, we design SVD-AE, a simple yet effective singular value decomposition (SVD)-based linear autoencoder, whose closed-form solution can be defined based on SVD for CF. SVD-AE does not require iterative training processes, as its closed-form solution can be calculated at once. Furthermore, given the noisy nature of the rating matrix, we explore the robustness against such noisy interactions of existing CF methods and our SVD-AE. As a result, we demonstrate that our simple design choice based on truncated SVD can be used to strengthen the noise robustness of the recommendation while improving efficiency. Code is available at https://github.com/seoyoungh/svd-ae.
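
The closed-form flavor of such a design can be sketched with a truncated SVD: encode the rating matrix with the top-k item-side singular vectors and decode with their transpose, with no iterative training. This is an illustrative reading of the abstract, not the paper's exact estimator:

```python
import numpy as np

def svd_ae(R, k):
    """Closed-form linear autoencoder from a truncated SVD: project the
    rating matrix onto the top-k item-side singular vectors (encode) and
    map back (decode), all in one computation."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    Vk = Vt[:k]            # top-k right singular vectors (k x num_items)
    return R @ Vk.T @ Vk   # rank-k reconstruction of the scores
```

On an exactly rank-k matrix the reconstruction is lossless, which verifies the projection is the intended one.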

Updated: 2024-05-08 01:22:47

Fields: cs.IR,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.04746v1

A Survey of Constraint Formulations in Safe Reinforcement Learning

Safety is critical when applying reinforcement learning (RL) to real-world problems. As a result, safe RL has emerged as a fundamental and powerful paradigm for optimizing an agent's policy while incorporating notions of safety. A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to specific safety constraints. Despite recent effort to enhance safety in RL, a systematic understanding of the field remains difficult. This challenge stems from the diversity of constraint representations and little exploration of their interrelations. To bridge this knowledge gap, we present a comprehensive review of representative constraint formulations, along with a curated selection of algorithms designed specifically for each formulation. In addition, we elucidate the theoretical underpinnings that reveal the mathematical mutual relations among common problem formulations. We conclude with a discussion of the current state and future directions of safe reinforcement learning research.
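
The constrained criterion mentioned above is typically written as a constrained Markov decision process objective. One common discounted form (the survey covers several variants) is:

```latex
\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\, c_i(s_t,a_t)\right] \le d_i,
\qquad i=1,\dots,m,
```

where $r$ is the reward, each $c_i$ is a cost signal encoding a safety notion, and $d_i$ is its budget.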

Updated: 2024-05-08 00:59:16

Fields: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.02025v2

Cryptanalysis of the SIMON Cypher Using Neo4j

The exponential growth in the number of Internet of Things (IoT) devices has seen the introduction of several Lightweight Encryption Algorithms (LEA). While LEAs are designed to enhance the integrity, privacy and security of data collected and transmitted by IoT devices, it is hazardous to assume that all LEAs are secure and exhibit similar levels of protection. To improve encryption strength, cryptanalysts and algorithm designers routinely probe LEAs using various cryptanalysis techniques to identify vulnerabilities and limitations of LEAs. Despite recent improvements in the efficiency of cryptanalysis utilising heuristic methods and a Partial Difference Distribution Table (PDDT), the process remains inefficient, with the random nature of the heuristic inhibiting reproducible results. However, the use of a PDDT presents opportunities to identify relationships between differentials utilising knowledge graphs, leading to the identification of efficient paths throughout the PDDT. This paper introduces the novel use of knowledge graphs to identify intricate relationships between differentials in the SIMON LEA, allowing for the identification of optimal paths throughout the differentials, and increasing the effectiveness of the differential security analyses of SIMON.
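
For context, the SIMON round whose differentials are being analyzed is a Feistel step built from rotations, AND, and XOR: the new left word is $y \oplus f(x) \oplus k$ with $f(x) = (x \lll 1 \,\&\, x \lll 8) \oplus (x \lll 2)$. A self-contained sketch for 16-bit words (key schedule omitted):

```python
def rotl(x, r, width=16):
    """Rotate a width-bit word left by r positions."""
    mask = (1 << width) - 1
    return ((x << r) | (x >> (width - r))) & mask

def simon_round(x, y, k, width=16):
    """One SIMON Feistel round on the word pair (x, y) with round key k."""
    f = (rotl(x, 1, width) & rotl(x, 8, width)) ^ rotl(x, 2, width)
    return y ^ f ^ k, x
```

The round is invertible given the round key, which is what differential cryptanalysis exploits when tracing difference propagation through the PDDT.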

Updated: 2024-05-08 00:52:57

Fields: cs.CR,cs.DS,cs.IR

Download: http://arxiv.org/abs/2405.04735v1

S-EQA: Tackling Situational Queries in Embodied Question Answering

We present and tackle the problem of Embodied Question Answering (EQA) with Situational Queries (S-EQA) in a household environment. Unlike prior EQA work tackling simple queries that directly reference target objects and quantifiable properties pertaining to them, EQA with situational queries (such as "Is the bathroom clean and dry?") is more challenging, as the agent needs to figure out not just which target objects pertain to the query, but also requires a consensus on their states for the query to be answerable. Towards this objective, we first introduce a novel Prompt-Generate-Evaluate (PGE) scheme that wraps around an LLM's output to create a dataset of unique situational queries, corresponding consensus object information, and predicted answers. PGE maintains uniqueness among the generated queries using multiple forms of semantic similarity. We validate the generated dataset via a large-scale user study conducted on M-Turk, and introduce it as S-EQA, the first dataset tackling EQA with situational queries. Our user study establishes the authenticity of S-EQA, with a high 97.26% of the generated queries being deemed answerable given the consensus object data. Conversely, we observe a low correlation of 46.2% between the LLM-predicted answers and human-evaluated ones, indicating the LLM's poor capability in directly answering situational queries, while establishing S-EQA's usability in providing a human-validated consensus for an indirect solution. We evaluate S-EQA via Visual Question Answering (VQA) on VirtualHome, which, unlike other simulators, contains several objects with modifiable states that also visually appear different upon modification, enabling us to set a quantitative benchmark for S-EQA. To the best of our knowledge, this is the first work to introduce EQA with situational queries, and also the first to use a generative approach for query creation.

Updated: 2024-05-08 00:45:20

Domains: cs.RO,cs.AI

Download: http://arxiv.org/abs/2405.04732v1

Forecasting Ferry Passenger Flow Using Long-Short Term Memory Neural Networks

Building on recent studies applying neural networks to forecasting and time-series problems, this study extends these methods to ferry passenger traffic. The primary objective of the study is to investigate and evaluate an LSTM-based neural network's capability to forecast ferry passengers at two ports in the Philippines. The proposed model is fitted and evaluated on monthly passenger traffic data for the two ports from 2016 to 2022, acquired from the Philippine Ports Authority (PPA). This work uses Mean Absolute Percentage Error (MAPE) as its primary metric to evaluate the model's forecasting capability. The proposed LSTM-based neural network model achieved 72% forecasting accuracy on the Batangas port ferry passenger data and 74% forecasting accuracy on the Mindoro port ferry passenger data. Implemented with the Keras and Scikit-learn Python libraries, the presented LSTM model achieves reasonable forecasting performance. Beyond these findings, this study also recommends further investigation of other statistical, machine learning, and deep learning methods for forecasting ferry passenger flows.
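MAPE, the paper's primary metric, is straightforward to compute; a reported 72% "forecasting accuracy" is consistent with reading accuracy as 100 minus MAPE. A minimal sketch (the accuracy interpretation is an assumption, not stated in the abstract):

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Actual values must be nonzero (division by the actual value).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

# Illustrative monthly passenger counts (made up, not PPA data):
actual = [1000.0, 1200.0, 1100.0]
forecast = [900.0, 1300.0, 1000.0]
accuracy = 100.0 - mape(actual, forecast)  # under the 100 - MAPE reading
```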

Updated: 2024-05-08 00:27:29

Domains: cs.LG

Download: http://arxiv.org/abs/2405.02098v2

Espresso: Robust Concept Filtering in Text-to-Image Models

Diffusion-based text-to-image (T2I) models generate high-fidelity images for given textual prompts. They are trained on large datasets scraped from the Internet, potentially containing unacceptable concepts (e.g., copyright infringing or unsafe). Retraining T2I models after filtering out unacceptable concepts in the training data is inefficient and degrades utility. Hence, there is a need for concept removal techniques (CRTs) which are effective in removing unacceptable concepts, utility-preserving on acceptable concepts, and robust against evasion with adversarial prompts. None of the prior filtering and fine-tuning CRTs satisfy all these requirements simultaneously. We introduce Espresso, the first robust concept filter based on Contrastive Language-Image Pre-Training (CLIP). It identifies unacceptable concepts by projecting the generated image's embedding onto the vector connecting unacceptable and acceptable concepts in the joint text-image embedding space. This ensures robustness by restricting the adversary to adding noise only along this vector, in the direction of the acceptable concept. Further fine-tuning Espresso to separate embeddings of acceptable and unacceptable concepts, while preserving their pairing with image embeddings, ensures both effectiveness and utility. We evaluate Espresso on eleven concepts to show that it is effective (~5% CLIP accuracy on unacceptable concepts), utility-preserving (~93% normalized CLIP score on acceptable concepts), and robust (~4% CLIP accuracy on adversarial prompts for unacceptable concepts). Finally, we present theoretical bounds for the certified robustness of Espresso against adversarial prompts, and an empirical analysis.
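The core filtering idea, projecting the generated image's embedding onto the vector connecting the unacceptable and acceptable concept embeddings, can be sketched in a few lines. This is a geometric illustration with 2-D toy vectors, not the paper's CLIP-based implementation; the sign convention (positive score means closer to the unacceptable concept) is an assumption for the sketch:

```python
import numpy as np

def filter_score(image_emb, unacceptable_emb, acceptable_emb):
    """Signed projection of the image embedding onto the unit vector
    pointing from the acceptable concept to the unacceptable one,
    measured from the midpoint between the two concepts."""
    direction = unacceptable_emb - acceptable_emb
    direction = direction / np.linalg.norm(direction)
    midpoint = (unacceptable_emb + acceptable_emb) / 2.0
    return float((image_emb - midpoint) @ direction)

acceptable = np.array([1.0, 0.0])
unacceptable = np.array([0.0, 1.0])
score = filter_score(np.array([0.1, 0.9]), unacceptable, acceptable)
# score > 0: the embedding lies on the unacceptable side, so it is flagged
```

Restricting the decision to this one direction is what limits an adversary to adding noise along the vector toward the acceptable concept, as the abstract describes.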

Updated: 2024-05-08 00:22:32

Domains: cs.CV,cs.CR

Download: http://arxiv.org/abs/2404.19227v3

CleanGraph: Human-in-the-loop Knowledge Graph Refinement and Completion

This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at https://github.com/nlp-tlp/CleanGraph under the MIT License.
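The CRUD operations on semantic triples that the abstract describes can be illustrated with a minimal in-memory store. This is a hypothetical sketch of the data model only; CleanGraph itself is a web application with plugin-based refinement, and none of the names below come from its codebase:

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples
    supporting Create, Read, Update, and Delete."""

    def __init__(self):
        self.triples = set()

    def create(self, s, p, o):
        self.triples.add((s, p, o))

    def read(self, s=None, p=None, o=None):
        # None acts as a wildcard on that position.
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    def update(self, old, new):
        if old in self.triples:
            self.triples.discard(old)
            self.triples.add(new)

    def delete(self, s, p, o):
        self.triples.discard((s, p, o))
```

A refinement plugin in this picture would be a function that reads matching triples, flags likely extraction errors, and applies updates or deletes, keeping a human in the loop to confirm each change.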

Updated: 2024-05-08 00:18:45

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2405.03932v2

By Xinhai (Sean) Zou.