DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Decoding the linguistic intricacies of the genome is a crucial problem in biology, and pre-trained foundational models such as DNABERT and Nucleotide Transformer have made significant strides in this area. Existing works have largely hinged on k-mer, fixed-length permutations of A, T, C, and G, as the token of the genome language due to its simplicity. However, we argue that the computation and sample inefficiencies introduced by k-mer tokenization are primary obstacles in developing large genome foundational models. We provide conceptual and empirical insights into genome tokenization, building on which we propose to replace k-mer tokenization with Byte Pair Encoding (BPE), a statistics-based data compression algorithm that constructs tokens by iteratively merging the most frequent co-occurring genome segment in the corpus. We demonstrate that BPE not only overcomes the limitations of k-mer tokenization but also benefits from the computational efficiency of non-overlapping tokenization. Based on these insights, we introduce DNABERT-2, a refined genome foundation model that adapts an efficient tokenizer and employs multiple strategies to overcome input length constraints, reduce time and memory expenditure, and enhance model capability. Furthermore, we identify the absence of a comprehensive and standardized benchmark for genome understanding as another significant impediment to fair comparative analysis. In response, we propose the Genome Understanding Evaluation (GUE), a comprehensive multi-species genome classification dataset that amalgamates $36$ distinct datasets across $9$ tasks, with input lengths ranging from $70$ to $10000$. Through comprehensive experiments on the GUE benchmark, we demonstrate that DNABERT-2 achieves comparable performance to the state-of-the-art model with $21 \times$ fewer parameters and approximately $92 \times$ less GPU time in pre-training.
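As a toy illustration of the BPE procedure described above, the following Python sketch iteratively merges the most frequent adjacent token pair over a handful of DNA strings. It is a didactic loop only, not DNABERT-2's actual tokenizer, and the corpus and merge count are made up.

```python
from collections import Counter

def train_bpe(sequences, num_merges):
    """Toy BPE trainer: start from single-nucleotide tokens and repeatedly
    merge the most frequent adjacent token pair in the corpus."""
    corpus = [list(seq) for seq in sequences]  # "ATCG" -> ["A","T","C","G"]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks in corpus:
            for a, b in zip(toks, toks[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        merged = a + b
        for i, toks in enumerate(corpus):
            out, j = [], 0
            while j < len(toks):
                if j + 1 < len(toks) and toks[j] == a and toks[j + 1] == b:
                    out.append(merged)
                    j += 2
                else:
                    out.append(toks[j])
                    j += 1
            corpus[i] = out
    return merges, corpus

merges, tokenized = train_bpe(["ATCGATCGTT", "ATCGGGATCG"], num_merges=4)
print(merges)        # learned merges, e.g. [('A', 'T'), ('AT', 'C'), ...]
print(tokenized[0])  # variable-length, non-overlapping tokens
```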
Updated: 2024-03-18 23:59:29
Categories: q-bio.GN, cs.AI, cs.CE, cs.CL
Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
Efficient and biologically plausible alternatives to backpropagation in neural network training remain a challenge due to issues such as high computational complexity and additional assumptions about neural networks, which limit scalability to deeper networks. The likelihood ratio method offers a promising gradient estimation strategy but is constrained by significant memory consumption, especially when deploying multiple copies of data to reduce estimation variance. In this paper, we introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation. By exploiting the natural parallelism during the backward pass using LR, we further provide a high-performance training strategy, which pipelines both the forward and backward pass, to make it more suitable for the computation on specialized hardware. Extensive experiments demonstrate the effectiveness of the approximation technique in neural network training. This work underscores the potential of the likelihood ratio method in achieving high-performance neural network training, suggesting avenues for further exploration.
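The following NumPy sketch shows the generic score-function (likelihood-ratio) gradient estimator underlying this line of work: the gradient is estimated from forward-only loss evaluations under Gaussian perturbations, with multiple copies to reduce variance. The function name and the mean-baseline are illustrative; the paper's approximation technique goes further than this.

```python
import numpy as np

def lr_gradient(f, w, sigma=0.1, n_copies=2000, rng=None):
    """Likelihood-ratio estimate of d E[f(w + sigma*eps)] / dw.
    Forward-only: needs loss evaluations, never a backward pass."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((n_copies,) + w.shape)
    losses = np.array([f(w + sigma * e) for e in eps])
    losses -= losses.mean()  # baseline to reduce estimation variance
    return np.tensordot(losses, eps, axes=1) / (n_copies * sigma)

# Sanity check against the analytic gradient of a quadratic loss.
w = np.array([1.0, -2.0])
f = lambda v: float(np.sum(v ** 2))
print(lr_gradient(f, w))  # approximately [2, -4]
```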
Updated: 2024-03-18 23:23:50
Categories: cs.LG, cs.AI
Improving LoRA in Privacy-preserving Federated Learning
Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods on pre-trained language models for its good performance and computational efficiency. LoRA injects a product of two trainable rank decomposition matrices over the top of each frozen pre-trained model module. However, when applied in the setting of privacy-preserving federated learning (FL), LoRA may become unstable due to the following facts: 1) the effects of data heterogeneity and multi-step local updates are non-negligible, 2) additive noise enforced on updating gradients to guarantee differential privacy (DP) can be amplified and 3) the final performance is susceptible to hyper-parameters. A key factor leading to these phenomena is the discordance between jointly optimizing the two low-rank matrices by local clients and separately aggregating them by the central server. Thus, this paper proposes an efficient and effective version of LoRA, Federated Freeze A LoRA (FFA-LoRA), to alleviate these challenges and further halve the communication cost of federated fine-tuning LLMs. The core idea of FFA-LoRA is to fix the randomly initialized non-zero matrices and only fine-tune the zero-initialized matrices. Compared to LoRA, FFA-LoRA is motivated by practical and theoretical benefits in privacy-preserved FL. Our experiments demonstrate that FFA-LoRA provides more consistent performance with better computational efficiency over vanilla LoRA in various FL tasks.
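A minimal PyTorch sketch of the core FFA-LoRA idea, assuming a plain linear layer: the pre-trained weight and the randomly initialized A are frozen, and only the zero-initialized B is trained, so federated aggregation touches B alone. The class below is hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FFALoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained weights
            p.requires_grad = False
        in_f, out_f = base.in_features, base.out_features
        # A: random init, frozen; B: zero init, the only trainable part.
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01, requires_grad=False)
        self.B = nn.Parameter(torch.zeros(out_f, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = FFALoRALinear(nn.Linear(64, 64))
print([n for n, p in layer.named_parameters() if p.requires_grad])  # ['B']
```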
Updated: 2024-03-18 23:20:08
Categories: cs.LG, cs.CR, cs.DC
Reinforcement Learning from Delayed Observations via World Models
In standard Reinforcement Learning settings, agents typically assume immediate feedback about the effects of their actions after taking them. However, in practice, this assumption may not hold true due to physical constraints and can significantly impact the performance of RL algorithms. In this paper, we focus on addressing observation delays in partially observable environments. We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays. By reducing delayed POMDPs to delayed MDPs with world models, our methods can effectively handle partial observability, where existing approaches achieve sub-optimal performance or even degrade quickly as observability decreases. Experiments suggest that one of our methods can outperform a naive model-based approach by up to 30%. Moreover, we evaluate our methods on a visual-input-based delayed environment, for the first time showcasing delay-aware reinforcement learning on visual observations.
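A small sketch of the setting, assuming a toy environment with reset()/step(): wrapping it so the agent always receives the observation from `delay` steps in the past turns an ordinary MDP into the delayed problem studied here.

```python
from collections import deque

class DelayedObservations:
    """Deliver the observation from `delay` steps ago (delayed POMDP)."""
    def __init__(self, env, delay):
        self.env, self.delay = env, delay
    def reset(self):
        self.buf = deque([self.env.reset()] * (self.delay + 1),
                         maxlen=self.delay + 1)
        return self.buf[0]
    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.buf.append(obs)               # newest observation enters...
        return self.buf[0], reward, done   # ...agent sees the oldest one

class CounterEnv:  # toy env whose observation is just a step counter
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 5

env = DelayedObservations(CounterEnv(), delay=2)
print(env.reset())                            # 0
print([env.step(None)[0] for _ in range(4)])  # [0, 0, 1, 2] -- lagged by 2
```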
Updated: 2024-03-18 23:18:27
Categories: cs.LG, cs.AI
Gradient-based Fuzzy System Optimisation via Automatic Differentiation -- FuzzyR as a Use Case
Since their introduction, fuzzy sets and systems have become an important area of research known for its versatility in modelling, knowledge representation and reasoning, and, increasingly, its potential within the context of explainable AI. While the applications of fuzzy systems are diverse, there has been comparatively little advancement in their design from a machine learning perspective. In other words, while representations such as neural networks have benefited from a boom in learning capability driven by an increase in computational performance in combination with advances in their training mechanisms and available tools, in particular gradient descent, the impact on fuzzy system design has been limited. In this paper, we discuss gradient-descent-based optimisation of fuzzy systems, focussing in particular on automatic differentiation -- crucial to neural network learning -- with a view to freeing fuzzy system designers from intricate derivative computations, allowing for more focus on the functional and explainability aspects of their design. As a starting point, we present a use case in FuzzyR which demonstrates how current fuzzy inference system implementations can be adjusted to leverage powerful features of automatic differentiation tool sets, discussing its potential for the future of fuzzy system design.
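FuzzyR itself is an R package; as a language-neutral illustration of the same point, the PyTorch snippet below tunes the centre and width of a single Gaussian membership function purely through automatic differentiation, with no hand-derived derivative expressions. All values are made up.

```python
import torch

# Gaussian membership function mu(x) = exp(-(x - c)^2 / (2 s^2)); autodiff
# supplies the derivatives w.r.t. c and s, so gradient descent can tune them.
c = torch.tensor(0.0, requires_grad=True)
s = torch.tensor(1.0, requires_grad=True)
x = torch.linspace(-3, 3, 64)
target = torch.exp(-(x - 1.0) ** 2 / (2 * 0.5 ** 2))  # "ground truth" MF

opt = torch.optim.Adam([c, s], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    mu = torch.exp(-(x - c) ** 2 / (2 * s ** 2))
    loss = torch.mean((mu - target) ** 2)
    loss.backward()  # derivatives computed automatically
    opt.step()
print(float(c), float(s))  # approaches c = 1.0, s = 0.5
```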
Updated: 2024-03-18 23:18:16
Categories: cs.AI
Molecular Classification Using Hyperdimensional Graph Classification
Our work introduces an innovative approach to graph learning by leveraging Hyperdimensional Computing. Graphs serve as a widely embraced method for conveying information, and their utilization in learning has gained significant attention. This is notable in the field of chemoinformatics, where learning from graph representations plays a pivotal role. An important application within this domain involves the identification of cancerous cells across diverse molecular structures. We propose an HDC-based model that demonstrates comparable Area Under the Curve results when compared to state-of-the-art models like Graph Neural Networks (GNNs) or the Weisfeiler-Lehman graph kernel (WL). Moreover, it outperforms previously proposed hyperdimensional computing graph learning methods. Furthermore, it achieves noteworthy speed enhancements, boasting a 40x acceleration in the training phase and a 15x improvement in inference time compared to GNN and WL models. This not only underscores the efficacy of the HDC-based method, but also highlights its potential for expedited and resource-efficient graph learning.
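A generic hyperdimensional-computing sketch of graph encoding, not the paper's exact scheme: each node gets a random bipolar hypervector, an edge is encoded by binding (elementwise product) its endpoints, and the graph by bundling (summing and binarizing) its edges. The dimensionality and graphs below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

def random_hv():
    return rng.choice([-1, 1], size=D)

def encode_graph(edges, node_hvs):
    """Bundle the bindings of endpoint hypervectors, then binarize."""
    acc = np.zeros(D)
    for u, v in edges:
        acc += node_hvs[u] * node_hvs[v]  # binding
    return np.sign(acc + 1e-9)            # bundling + binarization

nodes = {i: random_hv() for i in range(4)}
g_path = encode_graph([(0, 1), (1, 2), (2, 3)], nodes)
g_cycle = encode_graph([(0, 1), (1, 2), (2, 0)], nodes)
print(g_path @ g_cycle / D)  # cosine-like similarity of graph encodings
```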
Updated: 2024-03-18 23:16:17
Categories: cs.LG, cs.AI, cs.NE, q-bio.QM
Language Modeling Is Compression
It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
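The prediction-compression equivalence can be made concrete with ideal code lengths: an arithmetic coder driven by a predictive model spends about -log2 p(symbol | context) bits per symbol, so better prediction means better compression. The adaptive unigram model below is a deliberately weak stand-in for a language model.

```python
import math
from collections import Counter

def code_length_bits(text, predict):
    """Ideal (arithmetic-coding) length of `text` under a predictive model:
    the sum of -log2 p(next symbol | context) over the sequence."""
    return sum(-math.log2(predict(text[:i], ch)) for i, ch in enumerate(text))

def unigram_predict(context, ch):
    """Laplace-smoothed adaptive unigram over a 256-symbol alphabet."""
    counts = Counter(context)
    return (counts[ch] + 1) / (len(context) + 256)

text = "abracadabra abracadabra"
bits = code_length_bits(text, unigram_predict)
print(f"{bits / 8:.1f} coded bytes vs {len(text)} raw bytes")
```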
Updated: 2024-03-18 23:15:47
Categories: cs.LG, cs.AI, cs.CL, cs.IT, math.IT
Fisher Mask Nodes for Language Model Merging
Fine-tuning pre-trained models provides significant advantages in downstream performance. The ubiquitous nature of pre-trained models such as BERT and its derivatives in natural language processing has also led to a proliferation of task-specific fine-tuned models. As these models typically only perform one task well, additional training or ensembling is required in multi-task scenarios. The growing field of model merging provides a solution, dealing with the challenge of combining multiple task-specific models into a single multi-task model. In this study, we introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning. Utilizing the Fisher information of mask nodes within the Transformer architecture, we devise a computationally efficient weighted-averaging scheme. Our method exhibits a regular and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging in a fraction of the computational cost, with baseline performance improvements of up to +6.5 and a speedup of 57.4x in the biggest model. Our results prove the potential of our method in current multi-task learning environments and suggest its scalability and adaptability to new model architectures and learning scenarios.
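A diagonal-Fisher merging sketch, which the paper's mask-node method makes far cheaper: each model's parameters are averaged with weights proportional to their (diagonal) Fisher information, so the coordinates a model is "certain" about dominate the merge. All values here are illustrative.

```python
import numpy as np

def fisher_merge(params, fishers, eps=1e-8):
    """Fisher-weighted average: theta* = sum_i F_i * theta_i / sum_i F_i."""
    num = sum(F * p for F, p in zip(fishers, params))
    den = sum(fishers) + eps
    return num / den

theta_a = np.array([1.0, 0.0, 2.0])
theta_b = np.array([0.0, 1.0, 2.0])
F_a = np.array([10.0, 0.1, 1.0])  # model A is confident about coordinate 0
F_b = np.array([0.1, 10.0, 1.0])  # model B about coordinate 1
print(fisher_merge([theta_a, theta_b], [F_a, F_b]))  # ~[0.99, 0.99, 2.0]
```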
Updated: 2024-03-18 23:10:24
Categories: cs.CL, cs.AI, cs.LG
Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach
Substance use disorder (SUD) poses a major concern due to its detrimental effects on health and society. SUD identification and treatment depend on a variety of factors such as severity, co-determinants (e.g., withdrawal symptoms), and social determinants of health. Existing diagnostic coding systems used by American insurance providers, like the International Classification of Diseases (ICD-10), lack granularity for certain diagnoses, but clinicians will add this granularity (as that found within the Diagnostic and Statistical Manual of Mental Disorders classification or DSM-5) as supplemental unstructured text in clinical notes. Traditional natural language processing (NLP) methods face limitations in accurately parsing such diverse clinical language. Large Language Models (LLMs) offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of LLMs for extracting severity-related information for various SUD diagnoses from clinical notes. We propose a workflow employing zero-shot learning of LLMs with carefully crafted prompts and post-processing techniques. Through experimentation with Flan-T5, an open-source LLM, we demonstrate its superior recall compared to the rule-based approach. Focusing on 11 categories of SUD diagnoses, we show the effectiveness of LLMs in extracting severity information, contributing to improved risk assessment and treatment planning for SUD patients.
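A minimal zero-shot sketch with the Hugging Face transformers pipeline, assuming the library and model weights are available; the prompt wording and the clinical note are synthetic stand-ins, not the paper's template or data.

```python
from transformers import pipeline

pipe = pipeline("text2text-generation", model="google/flan-t5-base")
note = "Pt reports daily ETOH use; meets criteria for severe alcohol use disorder."
prompt = (
    "Extract the substance use disorder severity (mild, moderate, severe, "
    f"or none) from this clinical note: {note}"
)
# Zero-shot: no fine-tuning, the instruction alone steers the model.
print(pipe(prompt, max_new_tokens=8)[0]["generated_text"])
```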
Updated: 2024-03-18 22:39:03
Categories: cs.CL, cs.AI
A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features
We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly in ReLU networks, a fourth layer creates features that represent reflections of training data about themselves. The Lasso representation sheds insight to globally optimal networks and the solution landscape.
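As a generic illustration of what "fitting 1-D data over a fixed feature dictionary" looks like, the scikit-learn snippet below solves a standard Lasso over ReLU ramp features with breakpoints at the data points; the paper's dictionaries are richer (including the reflection features), so this conveys only the flavor of the construction.

```python
import numpy as np
from sklearn.linear_model import Lasso

x = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])      # 1-D inputs
y = np.array([0.5, 0.1, -0.2, 0.4, 1.0])       # targets
Phi = np.maximum(0.0, x[:, None] - x[None, :])  # ramp features max(0, x - b)
model = Lasso(alpha=1e-3, max_iter=50_000).fit(Phi, y)
print(model.coef_)  # sparse weights over the ramp dictionary
```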
Updated: 2024-03-18 22:11:45
Categories: cs.LG, cs.AI, cs.NE, math.OC, stat.ML
Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making
In this work, we study the effects of feature-based explanations on distributive fairness of AI-assisted decisions, specifically focusing on the task of predicting occupations from short textual bios. We also investigate how any effects are mediated by humans' fairness perceptions and their reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, we see that such explanations do not enable humans to discern correct and incorrect AI recommendations. Instead, we show that they may affect reliance irrespective of the correctness of AI recommendations. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, this prompts overrides that counter AI recommendations that align with gender stereotypes. Meanwhile, if explanations appear task-relevant, this induces reliance behavior that reinforces stereotype-aligned errors. These results imply that feature-based explanations are not a reliable mechanism to improve distributive fairness.
Updated: 2024-03-18 22:11:33
Categories: cs.HC, cs.AI
Parasitic Circus: On the Feasibility of Golden Free PCB Verification
Printed circuit boards (PCBs) are an integral part of electronic systems. Hence, verifying their physical integrity in the presence of supply chain attacks (e.g., tampering and counterfeiting) is of utmost importance. Recently, tamper detection techniques grounded in impedance characterization of PCB's Power Delivery Network (PDN) have gained prominence due to their global detection coverage, non-invasive, and low-cost nature. Similar to other physical verification methods, these techniques rely on the existence of a physical golden sample for signature comparisons. However, having access to a physical golden sample for golden signature extraction is not feasible in many real-world scenarios. In this work, we assess the feasibility of eliminating a physical golden sample and replacing it with a simulated golden signature obtained by the PCB design files. By performing extensive simulation and measurements on an in-house designed PCB, we demonstrate how the parasitic impedance of the PCB components plays a major role in reaching a successful verification. Based on the obtained results and using statistical metrics, we show that we can mitigate the discrepancy between collected signatures from simulation and measurements.
Updated: 2024-03-18 21:04:02
Categories: cs.CR
Reference-based Metrics Disprove Themselves in Question Generation
Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicated the annotation process and collected another reference. A good metric was expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.
Updated: 2024-03-18 20:47:10
Categories: cs.CL, cs.AI, cs.LG
RLIF: Interactive Imitation Learning as Reinforcement Learning
Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online to collect correction data for addressing the distributional shift challenges that afflict naïve behavioral cloning, can enjoy good performance both in theory and practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert. We also provide a unified framework to analyze our RL method and DAgger; for which we present the asymptotic analysis of the suboptimal gap for both methods as well as the non-asymptotic sample complexity bound of our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io
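The reward relabeling at the heart of this idea can be stated in a few lines: steps on which the expert intervened receive a negative reward and all others zero, so RL learns to avoid triggering interventions rather than to imitate an assumed-optimal expert. The transition format below is hypothetical.

```python
def rlif_rewards(transitions):
    """Label each step: -1 where the human intervened, 0 elsewhere."""
    return [(-1.0 if t["intervened"] else 0.0) for t in transitions]

print(rlif_rewards([{"intervened": False}, {"intervened": True}]))  # [0.0, -1.0]
```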
Updated: 2024-03-18 20:45:17
Categories: cs.AI, cs.RO
Large language models in 6G security: challenges and opportunities
The rapid integration of Generative AI (GenAI) and Large Language Models (LLMs) in sectors such as education and healthcare has marked a significant advancement in technology. However, this growth has also brought a largely unexplored aspect: their security vulnerabilities. As the ecosystem that includes both offline and online models, various tools, browser plugins, and third-party applications continues to expand, it significantly widens the attack surface, thereby escalating the potential for security breaches. These expansions in the 6G and beyond landscape provide new avenues for adversaries to manipulate LLMs for malicious purposes. We focus on the security aspects of LLMs from the viewpoint of potential adversaries. We aim to dissect their objectives and methodologies, providing an in-depth analysis of known security weaknesses. This will include the development of a comprehensive threat taxonomy, categorizing various adversary behaviors. Also, our research will concentrate on how LLMs can be integrated into cybersecurity efforts by defense teams, also known as blue teams. We will explore the potential synergy between LLMs and blockchain technology, and how this combination could lead to the development of next-generation, fully autonomous security solutions. This approach aims to establish a unified cybersecurity strategy across the entire computing continuum, enhancing overall digital security infrastructure.
Updated: 2024-03-18 20:39:34
Categories: cs.CR, cs.DC
Efficient Transformer-based Hyper-parameter Optimization for Resource-constrained IoT Environments
The hyper-parameter optimization (HPO) process is imperative for finding the best-performing Convolutional Neural Networks (CNNs). The automation process of HPO is characterized by its sizable computational footprint and its lack of transparency; both important factors in a resource-constrained Internet of Things (IoT) environment. In this paper, we address these problems by proposing a novel approach that combines transformer architecture and actor-critic Reinforcement Learning (RL) model, TRL-HPO, equipped with multi-headed attention that enables parallelization and progressive generation of layers. These assumptions are founded empirically by evaluating TRL-HPO on the MNIST dataset and comparing it with state-of-the-art approaches that build CNN models from scratch. The results show that TRL-HPO outperforms the classification results of these approaches by 6.8% within the same time frame, demonstrating the efficiency of TRL-HPO for the HPO process. The analysis of the results identifies the main culprit for performance degradation attributed to stacking fully connected layers. This paper identifies new avenues for improving RL-based HPO processes in resource-constrained environments.
Updated: 2024-03-18 20:35:35
Categories: cs.LG, cs.AI
Human-AI collaboration is not very collaborative yet: A taxonomy of interaction patterns in AI-assisted decision making from a systematic review
Leveraging Artificial Intelligence (AI) in decision support systems has disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. A human-centered perspective attempts to alleviate this concern by designing AI solutions for seamless integration with existing processes. Determining what information AI should provide to aid humans is vital, a concept underscored by explainable AI's efforts to justify AI predictions. However, how the information is presented, e.g., the sequence of recommendations and solicitation of interpretations, is equally crucial as complex interactions may emerge between humans and AI. While empirical studies have evaluated human-AI dynamics across domains, a common vocabulary for human-AI interaction protocols is lacking. To promote more deliberate consideration of interaction designs, we introduce a taxonomy of interaction patterns that delineate various modes of human-AI interactivity. We summarize the results of a systematic review of AI-assisted decision making literature and identify trends and opportunities in existing interactions across application domains from 105 articles. We find that current interactions are dominated by simplistic collaboration paradigms, leading to little support for truly interactive functionality. Our taxonomy offers a tool to understand interactivity with AI in decision-making and foster interaction designs for achieving clear communication, trustworthiness, and collaboration.
Updated: 2024-03-18 20:31:05
Categories: cs.HC, cs.AI
Thwarting Cybersecurity Attacks with Explainable Concept Drift
Cyber-security attacks pose a significant threat to the operation of autonomous systems. Particularly impacted are the Heating, Ventilation, and Air Conditioning (HVAC) systems in smart buildings, which depend on data gathered by sensors and on Machine Learning (ML) models that use the captured data. As such, attacks that alter the readings of these sensors can severely affect the HVAC system operations, impacting residents' comfort and energy reduction goals. Such attacks may induce changes in the online data distribution being fed to the ML models, violating the fundamental assumption of similarity in training and testing data distribution. This leads to a degradation in model prediction accuracy due to a phenomenon known as Concept Drift (CD) - the alteration in the relationship between input features and the target variable. Addressing CD requires identifying the source of drift to apply targeted mitigation strategies, a process termed drift explanation. This paper proposes a Feature Drift Explanation (FDE) module to identify the drifting features. FDE utilizes an Auto-encoder (AE) that reconstructs the activation of the first layer of the regression Deep Learning (DL) model and finds their latent representations. When a drift is detected, each feature of the drifting data is replaced by its representative counterpart from the training data. The Minkowski distance is then used to measure the divergence between the altered drifting data and the original training data. The results show that FDE successfully identifies 85.77% of drifting features and showcases its utility in the DL adaptation method under the CD phenomenon. As a result, the FDE method is an effective strategy for identifying drifting features towards thwarting cyber-security attacks.
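The replace-and-measure logic can be sketched as follows; the actual FDE module operates on autoencoder latents of the regression model's first-layer activations rather than on raw features, so this is illustrative only. Replacing the truly drifted feature with its training-set counterpart yields the smallest remaining Minkowski divergence.

```python
import numpy as np

def minkowski(u, v, p=3):
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

train = np.random.default_rng(0).normal(0.0, 1.0, size=(500, 3))
drift = train.copy()
drift[:, 1] += 3.0  # feature 1 has drifted

for j in range(drift.shape[1]):
    patched = drift.copy()
    patched[:, j] = train[:, j]  # swap in the training-data counterpart
    print(j, round(minkowski(patched.mean(axis=0), train.mean(axis=0)), 3))
# The smallest divergence appears when the truly drifted feature is replaced.
```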
Updated: 2024-03-18 20:20:00
Categories: cs.CR, cs.LG
On student-teacher deviations in distillation: does it pay to disobey?
Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being trained to fit the teacher's probabilities, the student may not only significantly deviate from the teacher probabilities, but may even outdo the teacher in performance. Our work aims to reconcile this seemingly paradoxical observation. Specifically, we characterize the precise nature of the student-teacher deviations, and argue how they can co-occur with better generalization. First, through experiments on image and language data, we identify that these probability deviations correspond to the student systematically exaggerating the confidence levels of the teacher. Next, we theoretically and empirically establish another form of exaggeration in some simple settings: KD exaggerates the implicit bias of gradient descent in converging faster along the top eigendirections of the data. Finally, we tie these two observations together: we demonstrate that the exaggerated bias of KD can simultaneously result in both (a) the exaggeration of confidence and (b) the improved generalization of the student, thus offering a resolution to the apparent paradox. Our analysis brings existing theory and practice closer by considering the role of gradient descent in KD and by demonstrating the exaggerated bias effect in both theoretical and empirical settings.
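For reference, the standard distillation objective in which such student-teacher deviations arise, written as a PyTorch function (a textbook formulation, not the paper's experimental code):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as is conventional."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

s = torch.randn(8, 10)  # student logits: batch of 8, 10 classes
t = torch.randn(8, 10)  # teacher logits
print(kd_loss(s, t))
```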
Updated: 2024-03-18 20:15:51
Categories: cs.LG, cs.AI, stat.ML
Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their significant performance in tasks demanding a broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLMs for MT and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments find that fine-tuning for translation purposes even led to performance degradation. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than direct translators. Building on LLMs' exceptional ability to process and generate lengthy sequences, we also propose extending our approach to document-level translation. We show that leveraging Low-Rank-Adapter fine-tuning for APE can yield significant improvements across both sentence and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy rate of 89% on the ContraPro test set, which specifically assesses the model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations (Interactive Demo for integrating manual feedback can be found here: https://huggingface.co/spaces/skoneru/contextual_refinement_ende).
Updated: 2024-03-18 20:11:03
Categories: cs.CL, cs.AI
Span-Oriented Information Extraction -- A Unifying Perspective on Information Extraction
Information Extraction refers to a collection of tasks within Natural Language Processing (NLP) that identifies sub-sequences within text and their labels. These tasks have been used for many years to extract relevant information and to link free text to structured data. However, the heterogeneity among information extraction tasks impedes progress in this area. We therefore offer a unifying perspective centered on what we define to be spans in text. We then re-orient these seemingly incongruous tasks into this unified perspective and then re-present the wide assortment of information extraction tasks as variants of the same basic Span-Oriented Information Extraction task.
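The unifying abstraction is small enough to state in code: every such task reduces to predicting labeled character (or token) offsets over the text. A minimal sketch with an invented example:

```python
from dataclasses import dataclass

@dataclass
class Span:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # task-specific label (entity type, relation arg, ...)

text = "Sundar Pichai leads Google."
spans = [Span(0, 13, "PERSON"), Span(20, 26, "ORG")]
print([(text[s.start:s.end], s.label) for s in spans])
```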
Updated: 2024-03-18 20:10:44
Categories: cs.CL, cs.AI, cs.IR
Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias
We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model unfairly prioritizing information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLM models such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on performance and position bias of models for zero-shot summarization tasks.
Updated: 2024-03-18 20:09:01
Categories: cs.CL, cs.AI
Evaluating Named Entity Recognition: Comparative Analysis of Mono- and Multilingual Transformer Models on Brazilian Corporate Earnings Call Transcriptions
Named Entity Recognition (NER) is a Natural Language Processing technique for extracting information from textual documents. However, much of the existing research on NER has been centered around English-language documents, leaving a gap in the availability of datasets tailored to the financial domain in Portuguese. This study addresses the need for NER within the financial domain, focusing on Portuguese-language texts extracted from earnings call transcriptions of Brazilian banks. By curating a comprehensive dataset comprising 384 transcriptions and leveraging weak supervision techniques for annotation, we evaluate the performance of monolingual models trained on Portuguese (BERTimbau and PTT5) and multilingual models (mBERT and mT5). Notably, we introduce a novel approach that reframes the token classification task as a text generation problem, enabling fine-tuning and evaluation of T5 models. Following the fine-tuning of the models, we conduct an evaluation on the test dataset, employing performance and error metrics. Our findings reveal that BERT-based models consistently outperform T5-based models. Furthermore, while the multilingual models exhibit comparable macro F1-scores, BERTimbau demonstrates superior performance over PTT5. A manual analysis of sentences generated by PTT5 and mT5 unveils a degree of similarity ranging from 0.89 to 1.0, between the original and generated sentences. However, critical errors emerge as both models exhibit discrepancies, such as alterations to monetary and percentage values, underscoring the importance of accuracy and consistency in the financial domain. Despite these challenges, PTT5 and mT5 achieve impressive macro F1-scores of 98.52% and 98.85%, respectively, with our proposed approach. Furthermore, our study sheds light on notable disparities in memory and time consumption for inference across the models.
Updated: 2024-03-18 19:53:56
Categories: cs.CL, cs.AI, cs.LG, 68T50
Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat?
The evolution of artificial intelligence (AI) has catalyzed a transformation in digital content generation, with profound implications for cyber influence operations. This report delves into the potential and limitations of generative deep learning models, such as diffusion models, in fabricating convincing synthetic images. We critically assess the accessibility, practicality, and output quality of these tools and their implications in threat scenarios of deception, influence, and subversion. Notably, the report generates content for several hypothetical cyber influence operations to demonstrate the current capabilities and limitations of these AI-driven methods for threat actors. While generative models excel at producing illustrations and non-realistic imagery, creating convincing photo-realistic content remains a significant challenge, limited by computational resources and the necessity for human-guided refinement. Our exploration underscores the delicate balance between technological advancement and its potential for misuse, prompting recommendations for ongoing research, defense mechanisms, multi-disciplinary collaboration, and policy development. These recommendations aim to leverage AI's potential for positive impact while safeguarding against its risks to the integrity of information, especially in the context of cyber influence.
Updated: 2024-03-18 19:44:30
Categories: cs.CY, cs.AI, cs.CV, K.4.0; I.2.0; I.4.0
Compositional learning of functions in humans and machines
The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions with interacting functions, where output production depends on context changes induced by different function orderings. Extending the investigation into the visual domain, we developed a function learning paradigm to explore the capacity of humans and neural network models in learning and reasoning with compositional functions under varied interaction conditions. Following brief training on individual functions, human participants were assessed on composing two learned functions, in ways covering four main interaction types, including instances in which the application of the first function creates or removes the context for applying the second function. Our findings indicate that humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes. A comparison with a neural network model on the same task reveals that, through the meta-learning for compositionality (MLC) approach, a standard sequence-to-sequence Transformer can mimic human generalization patterns in composing functions.
Updated: 2024-03-18 19:22:53
Categories: cs.AI, cs.HC, cs.LG
E2F-Net: Eyes-to-Face Inpainting via StyleGAN Latent Space
Face inpainting, the technique of restoring missing or damaged regions in facial images, is pivotal for applications like face recognition in occluded scenarios and image analysis with poor-quality captures. This process not only needs to produce realistic visuals but also preserve individual identity characteristics. The aim of this paper is to inpaint a face given the periocular region (eyes-to-face) through a proposed new Generative Adversarial Network (GAN)-based model called Eyes-to-Face Network (E2F-Net). The proposed approach extracts identity and non-identity features from the periocular region using two dedicated encoders. The extracted features are then mapped to the latent space of a pre-trained StyleGAN generator to benefit from its state-of-the-art performance and its rich, diverse and expressive latent space without any additional training. We further improve the StyleGAN output to find the optimal code in the latent space using a new optimization for GAN inversion technique. Our E2F-Net requires a minimal training process, reducing the computational complexity as a secondary benefit. Through extensive experiments, we show that our method successfully reconstructs the whole face with high quality, surpassing current techniques, despite significantly less training and supervision efforts. We have generated seven eyes-to-face datasets based on well-known public face datasets for training and verifying our proposed methods. The code and datasets are publicly available.
Updated: 2024-03-18 19:11:34
Categories: cs.CV, cs.AI
Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models
The Gartner 2022 report predicts that 45% of organizations worldwide will encounter software supply chain attacks by 2025, highlighting the urgency to improve software supply chain security for community and national interests. Current malware detection techniques aid in the manual review process by filtering benign and malware packages, yet such techniques have high false-positive rates and limited automation support. Therefore, malware detection techniques could benefit from advanced, more automated approaches for accurate and minimally false-positive results. The goal of this study is to assist security analysts in identifying malicious packages through the empirical study of large language models (LLMs) to detect potential malware in the npm ecosystem. We present SocketAI Scanner, a multi-stage decision-maker malware detection workflow using iterative self-refinement and zero-shot-role-play-Chain of Thought (CoT) prompting techniques for ChatGPT. We studied 5,115 npm packages (of which 2,180 are malicious) and performed a baseline comparison of the GPT-3 and GPT-4 models with a static analysis tool. Our findings showed promising results for GPT models with low misclassification alert rates. Our baseline comparison demonstrates a notable improvement over static analysis in precision scores above 25% and F1 scores above 15%. We attained precision and F1 scores of 91% and 94%, respectively, for the GPT-3 model. Overall, GPT-4 demonstrates superior performance in precision (99%) and F1 (97%) scores, while GPT-3 presents a cost-effective balance between performance and expenditure.
Updated: 2024-03-18 19:10:12
Categories: cs.CR, cs.AI
Privacy Perceptions and Behaviors of Google Personal Account Holders in Saudi Arabia
While privacy perceptions and behaviors have been investigated in Western societies, little is known about these issues in non-Western societies. To bridge this gap, we interviewed 30 Google personal account holders in Saudi Arabia about their privacy perceptions and behaviors regarding the activity data that Google saves about them. Our study focuses on Google's Activity Controls, which enable users to control whether, and how, Google saves their Web & App Activity, Location History, and YouTube History. Our results show that although most participants have some level of awareness about Google's data practices and the Activity Controls, many have only vague awareness, and the majority have not used the available controls. When participants viewed their saved activity data, many were surprised by what had been saved. While many participants find Google's use of their data to improve the services provided to them acceptable, the majority find the use of their data for ad purposes unacceptable. We observe that our Saudi participants exhibit similar trends and patterns in privacy awareness, attitudes, preferences, concerns, and behaviors to what has been found in studies in the US. Our results emphasize the need for: 1) improved techniques to inform users about privacy settings during account sign-up, to remind users about their settings, and to raise awareness about privacy settings; 2) improved privacy setting interfaces to reduce the costs that deter many users from changing the settings; and 3) further research to explore privacy concerns in non-Western cultures.
Updated: 2024-03-18 19:07:02
Categories: cs.CY, cs.CR, cs.HC
MAC Advice for Facility Location Mechanism Design
Algorithms with predictions have attracted much attention in the last years across various domains, including variants of facility location, as a way to surpass traditional worst-case analyses. We study the $k$-facility location mechanism design problem, where the $n$ agents are strategic and might misreport their location. Unlike previous models, where predictions are for the $k$ optimal facility locations, we receive $n$ predictions for the locations of each of the agents. However, these predictions are only "mostly" and "approximately" correct (or MAC for short) -- i.e., some $\delta$-fraction of the predicted locations are allowed to be arbitrarily incorrect, and the remainder of the predictions are allowed to be correct up to an $\varepsilon$-error. We make no assumption on the independence of the errors. Can such predictions allow us to beat the current best bounds for strategyproof facility location? We show that the $1$-median (geometric median) of a set of points is naturally robust under corruptions, which leads to an algorithm for single-facility location with MAC predictions. We extend the robustness result to a "balanced" variant of the $k$ facilities case. Without balancedness, we show that robustness completely breaks down, even for the setting of $k=2$ facilities on a line. For this "unbalanced" setting, we devise a truthful random mechanism that outperforms the best known result of Lu et al. [2010], which does not use predictions. En route, we introduce the problem of "second" facility location (when the first facility's location is already fixed). Our findings on the robustness of the $1$-median and more generally $k$-medians may be of independent interest, as quantitative versions of classic breakdown-point results in robust statistics.
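The robust aggregator this result builds on, the geometric median, is computable with the classic Weiszfeld iteration; a compact NumPy sketch:

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-9):
    """Weiszfeld iteration for the 1-median of a point set."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        w = 1.0 / np.maximum(d, eps)         # inverse-distance weights
        y = (points * w[:, None]).sum(axis=0) / w.sum()
    return y

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
print(geometric_median(pts))  # stays near the cluster, unlike the mean
```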
Updated: 2024-03-18 18:52:04
标题: MAC建议用于设施位置机制设计
摘要: 具有预测的算法在过去几年中在各个领域引起了广泛关注,包括设施位置问题的变体,被视为超越传统最坏情况分析的一种方式。我们研究了$k$-设施位置机制设计问题,其中$n$个代理是策略性的,可能会误报他们的位置。与先前模型(其中预测针对的是$k$个最优设施位置)不同,我们收到的是关于每个代理位置的$n$个预测。然而,这些预测只是“大多数”且“近似”正确(简称MAC),即允许$\delta$比例的预测位置任意错误,其余预测最多允许$\varepsilon$的误差。我们不对误差的独立性做任何假设。这样的预测能否让我们突破当前防策略(strategyproof)设施位置的最佳界限?我们证明一组点的$1$-中位数(几何中位数)在数据损坏下具有天然的鲁棒性,由此得到一个利用MAC预测的单设施位置算法。我们将该鲁棒性结果推广到$k$个设施的“平衡”变体。在没有平衡性的情况下,我们证明鲁棒性会完全失效,即使是线上$k=2$个设施的情形也是如此。针对这种“不平衡”情形,我们设计了一个真实(truthful)的随机机制,优于Lu等人[2010]不使用预测的已知最佳结果。在此过程中,我们引入了“第二”设施位置问题(即第一个设施的位置已固定时)。我们关于$1$-中位数以及更一般的$k$-中位数的鲁棒性结果可能具有独立的价值,可视为稳健统计中经典崩溃点(breakdown point)结果的定量版本。
更新时间: 2024-03-18 18:52:04
领域: cs.GT,cs.AI
3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View Planning
3D reconstruction is a fundamental task in robotics that gained attention due to its major impact in a wide variety of practical settings, including agriculture, underwater, and urban environments. This task can be carried out via view planning (VP), which aims to optimally place a certain number of cameras in positions that maximize the visual information, improving the resulting 3D reconstruction. Nonetheless, in most real-world settings, existing environmental noise can significantly affect the performance of 3D reconstruction. To that end, this work advocates a novel geometric-based reconstruction quality function for VP, that accounts for the existing noise of the environment, without requiring its closed-form expression. With no analytic expression of the objective function, this work puts forth an adaptive Bayesian optimization algorithm for accurate 3D reconstruction in the presence of noise. Numerical tests on noisy agricultural environments showcase the merits of the proposed approach for 3D reconstruction with even a small number of available cameras.
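For readers unfamiliar with the optimization loop referenced above, here is a generic Bayesian-optimization sketch over a single camera parameter with a noisy black-box objective. The Gaussian-process surrogate, the expected-improvement acquisition, and the toy objective are stand-ins of ours, not the paper's geometric quality function.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def reconstruction_quality(angle):        # toy stand-in for the noisy objective
    return np.sin(3 * angle) + 0.1 * np.random.randn()

angles = np.random.uniform(0, np.pi, 3)[:, None]       # initial camera placements
values = np.array([reconstruction_quality(a) for a in angles.ravel()])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2)

for _ in range(20):
    gp.fit(angles, values)
    cand = np.linspace(0, np.pi, 200)[:, None]
    mu, sigma = gp.predict(cand, return_std=True)
    z = (mu - values.max()) / np.maximum(sigma, 1e-9)
    ei = (mu - values.max()) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = cand[np.argmax(ei)]
    angles = np.vstack([angles, x_next[None, :]])
    values = np.append(values, reconstruction_quality(x_next[0]))

print("best camera angle found:", angles[np.argmax(values)].item())
```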
Updated: 2024-03-18 18:51:09
标题: 在嘈杂的农业环境中的3D重建:基于贝叶斯优化视角的视图规划
摘要: 三维重建是机器人领域的一个基本任务,由于其在包括农业、水下和城市环境在内的各种实际场景中产生的重大影响而受到关注。这项任务可以通过视图规划(VP)来完成,其目标是将一定数量的摄像头最优地放置在能最大化视觉信息的位置,从而改善最终的三维重建效果。然而,在大多数实际场景中,现有的环境噪声可能会显著影响三维重建的性能。为此,本研究提出了一种基于几何的视图规划重建质量函数,该函数考虑了环境中存在的噪声,且不需要其闭式表达式。在没有目标函数解析表达式的情况下,本研究提出了一种自适应贝叶斯优化算法,用于在存在噪声的情况下进行准确的三维重建。在嘈杂农业环境上进行的数值测试表明,即使可用摄像头数量很少,所提方法也能实现良好的三维重建。
更新时间: 2024-03-18 18:51:09
领域: cs.RO,cs.AI
Safety Implications of Explainable Artificial Intelligence in End-to-End Autonomous Driving
The end-to-end learning pipeline is gradually creating a paradigm shift in the ongoing development of highly autonomous vehicles, largely due to advances in deep learning, the availability of large-scale training datasets, and improvements in integrated sensor devices. However, a lack of interpretability in real-time decisions with contemporary learning methods impedes user trust and attenuates the widespread deployment and commercialization of such vehicles. Moreover, the issue is exacerbated when these cars are involved in or cause traffic accidents. Such drawback raises serious safety concerns from societal and legal perspectives. Consequently, explainability in end-to-end autonomous driving is essential to enable the safety of vehicular automation. However, the safety and explainability aspects of autonomous driving have generally been investigated disjointly by researchers in today's state of the art. In this paper, we aim to bridge the gaps between these topics and seek to answer the following research question: When and how can explanations improve safety of autonomous driving? In this regard, we first revisit established safety and state-of-the-art explainability techniques in autonomous driving. Furthermore, we present three critical case studies and show the pivotal role of explanations in enhancing self-driving safety. Finally, we describe our empirical investigation and reveal potential value, limitations, and caveats with practical explainable AI methods on their role of assuring safety and transparency for vehicle autonomy.
Updated: 2024-03-18 18:49:20
标题: 端到端自动驾驶中可解释人工智能的安全性影响
摘要: 端到端的学习管道正在逐渐引发高度自主车辆的发展中的范式转变,这在很大程度上是由于深度学习的进步、大规模训练数据集的可用性以及集成传感器设备的改进。然而,当代学习方法在实时决策中缺乏可解释性,这阻碍了用户的信任并削弱了这些车辆的广泛部署和商业化。此外,当这些车辆卷入或导致交通事故时,这个问题变得更加严重。这种缺陷从社会和法律的角度引发了严重的安全担忧。因此,在端到端自动驾驶中,解释性是确保车辆自动化安全的关键。然而,在当今的技术水平上,自动驾驶的安全性和可解释性方面通常由研究人员分别进行研究。本文旨在弥合这些主题之间的差距,并试图回答以下研究问题:何时以及如何解释能够提高自动驾驶的安全性?在这方面,我们首先重新审视自动驾驶中已建立的安全性和最新的可解释性技术。此外,我们提出三个关键案例研究,并展示解释在增强自动驾驶安全性方面的关键作用。最后,我们描述了我们的实证调查,并揭示了可解释人工智能方法在确保车辆自主性安全和透明性方面的潜在价值、局限性和注意事项。
更新时间: 2024-03-18 18:49:20
领域: cs.RO,cs.AI
TnT-LLM: Text Mining at Scale with Large Language Models
Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine. Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.
Updated: 2024-03-18 18:45:28
标题: TnT-LLM:使用大型语言模型进行规模化文本挖掘
摘要: 将非结构化文本转化为结构化和有意义的形式,并通过有用的类别标签进行组织,是文本挖掘中用于下游分析和应用的基本步骤。然而,目前大多数用于生成标签分类体系和构建基于文本的标签分类器的方法仍然严重依赖领域专业知识和手动整理,使得这一过程昂贵且耗时。当标签空间未明确规定且大规模数据标注不可用时,这一挑战尤为严峻。在本文中,我们利用大型语言模型(LLMs)来应对这些挑战,其基于提示的接口促进了大规模伪标签的归纳和使用。我们提出了TnT-LLM,一个两阶段框架,利用LLMs自动化地完成端到端的标签生成与分配,最大程度减少了针对任何给定用例的人力投入。在第一阶段,我们引入了一种零样本、多阶段推理方法,使LLMs能够迭代地生成和完善标签分类体系。在第二阶段,LLMs被用作数据标注器,产生训练样本,以便可以可靠地构建、部署和大规模提供轻量级监督分类器。我们将TnT-LLM应用于Bing Copilot(原Bing Chat)这一开放域对话式搜索引擎的用户意图和对话领域分析。通过人工和自动评估指标进行的大量实验表明,与最新基线相比,TnT-LLM生成的标签分类体系更准确和相关,同时在大规模分类方面实现了准确性和效率之间的良好平衡。我们还分享了在实际应用中使用LLMs进行大规模文本挖掘时所面临挑战与机遇的实践经验和见解。
更新时间: 2024-03-18 18:45:28
领域: cs.CL,cs.AI,cs.IR
Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
Skeleton-based video anomaly detection (SVAD) is a crucial task in computer vision. Accurately identifying abnormal patterns or events enables operators to promptly detect suspicious activities, thereby enhancing safety. Achieving this demands a comprehensive understanding of human motions, both at body and region levels, while also accounting for the wide variations of performing a single action. However, existing studies fail to simultaneously address these crucial properties. This paper introduces a novel, practical and lightweight framework, namely Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection (GiCiSAD) to overcome the challenges associated with SVAD. GiCiSAD consists of three novel modules: the Graph Attention-based Forecasting module to capture the spatio-temporal dependencies inherent in the data, the Graph-level Jigsaw Puzzle Maker module to distinguish subtle region-level discrepancies between normal and abnormal motions, and the Graph-based Conditional Diffusion model to generate a wide spectrum of human motions. Extensive experiments on four widely used skeleton-based video datasets show that GiCiSAD outperforms existing methods with significantly fewer training parameters, establishing it as the new state-of-the-art.
Updated: 2024-03-18 18:42:32
标题: 基于骨架的视频异常检测的图拼图条件扩散模型
摘要: 骨架视频异常检测(SVAD)是计算机视觉中的关键任务。准确识别异常模式或事件使操作员能够及时检测可疑活动,从而提高安全性。实现这一目标需要全面了解人体运动,包括身体和区域级别,同时还要考虑执行单个动作时的广泛变化。然而,现有研究未能同时解决这些关键特性。本文介绍了一种新颖、实用且轻量级的框架,即基于骨架视频异常检测的图拼图条件扩散模型(GiCiSAD),以克服与SVAD相关的挑战。GiCiSAD包括三个新颖模块:基于图注意力的预测模块,捕获数据中固有的时空依赖关系;基于图级别的拼图制作模块,区分正常和异常运动之间微小的区域级别差异;基于图的条件扩散模型,生成广泛的人体运动谱。对四个广泛使用的基于骨架视频的数据集进行的大量实验表明,GiCiSAD在训练参数显著减少的情况下优于现有方法,确立其为新的最先进技术。
更新时间: 2024-03-18 18:42:32
领域: cs.CV,cs.AI
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper introduces EasyJailbreak, a unified framework simplifying the construction and evaluation of jailbreak attacks against LLMs. It builds jailbreak attacks using four components: Selector, Mutator, Constraint, and Evaluator. This modular framework enables researchers to easily construct attacks from combinations of novel and existing components. So far, EasyJailbreak supports 11 distinct jailbreak methods and facilitates the security validation of a broad spectrum of LLMs. Our validation across 10 distinct LLMs reveals a significant vulnerability, with an average breach probability of 60% under various jailbreaking attacks. Notably, even advanced models like GPT-3.5-Turbo and GPT-4 exhibit average Attack Success Rates (ASR) of 57% and 33%, respectively. We have released a wealth of resources for researchers, including a web platform, PyPI published package, screencast video, and experimental outputs.
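To make the modular design concrete, the following hypothetical sketch shows how the four named components could compose into an attack loop. The class interfaces and the `target_llm` callable are our assumptions for illustration; the actual EasyJailbreak API differs in its details.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    prompt: str
    response: str = ""
    score: float = 0.0

class Selector:      # picks promising seed prompts from the pool
    def select(self, pool): return pool[:1]

class Mutator:       # rewrites a prompt into candidate attack variants
    def mutate(self, inst): return [Instance(inst.prompt + " (rephrased)")]

class Constraint:    # filters out candidates that violate attack constraints
    def check(self, inst): return len(inst.prompt) < 2048

class Evaluator:     # queries the target LLM and scores the response
    def evaluate(self, inst, target_llm):
        inst.response = target_llm(inst.prompt)
        inst.score = float("refuse" not in inst.response.lower())
        return inst

def attack_loop(seed_prompts, target_llm, n_rounds=3):
    pool = [Instance(p) for p in seed_prompts]
    sel, mut, con, ev = Selector(), Mutator(), Constraint(), Evaluator()
    for _ in range(n_rounds):
        candidates = [c for s in sel.select(pool) for c in mut.mutate(s)]
        pool += [ev.evaluate(c, target_llm) for c in candidates if con.check(c)]
    return max(pool, key=lambda i: i.score)   # best-scoring jailbreak attempt
```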
Updated: 2024-03-18 18:39:53
标题: EasyJailbreak:一种用于越狱大型语言模型的统一框架
摘要: 越狱攻击对于识别和缓解大型语言模型(LLMs)的安全漏洞至关重要。它们旨在绕过保护措施并引发被禁止的输出。然而,由于各种越狱方法之间存在显著差异,社区尚无标准的实现框架可用,这限制了全面的安全评估。本文介绍了EasyJailbreak,这是一个统一框架,简化了对LLMs的越狱攻击的构建和评估。它使用四个组件构建越狱攻击:选择器、变异器、约束器和评估器。这种模块化框架使研究人员能够轻松地通过新的和现有组件的组合构建攻击。到目前为止,EasyJailbreak支持11种不同的越狱方法,并促进了对广泛范围的LLMs进行安全验证。我们对10种不同的LLMs进行验证,发现存在显著的漏洞,各种越狱攻击下平均违规概率为60%。值得注意的是,即使像GPT-3.5-Turbo和GPT-4这样的先进模型,平均攻击成功率(ASR)分别为57%和33%。我们为研究人员提供了大量资源,包括一个网络平台、PyPI发布的软件包、演示视频和实验结果。
更新时间: 2024-03-18 18:39:53
领域: cs.CL,cs.AI
Intelligent Execution through Plan Analysis
Intelligent robots need to generate and execute plans. In order to deal with the complexity of real environments, planning makes some assumptions about the world. When executing plans, the assumptions are usually not met. Most works have focused on the negative impact of this fact and the use of replanning after execution failures. Instead, we focus on the positive impact, or opportunities to find better plans. When planning, the proposed technique finds and stores those opportunities. Later, during execution, the monitoring system can use them to focus perception and repair the plan, instead of replanning from scratch. Experiments in several paradigmatic robotic tasks show how the approach outperforms standard replanning strategies.
Updated: 2024-03-18 18:23:36
标题: 通过计划分析实现智能执行
摘要: 智能机器人需要生成和执行计划。为了处理真实环境的复杂性,规划对世界做出一些假设。在执行计划时,这些假设通常无法满足。大多数研究集中在这一事实的负面影响以及在执行失败后重新规划的使用上。相反,我们关注积极影响,即寻找更好计划的机会。在规划时,所提出的技术找到并存储这些机会。在执行过程中,监控系统可以利用它们来聚焦感知并修复计划,而不是从头开始重新规划。在几种典型机器人任务的实验中,该方法表现出优于标准重新规划策略的效果。
更新时间: 2024-03-18 18:23:36
领域: cs.AI,cs.RO
Safety Cases: How to Justify the Safety of Advanced AI Systems
As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe. We propose a framework for organizing a safety case and discuss four categories of arguments to justify safety: total inability to cause a catastrophe, sufficiently strong control measures, trustworthiness despite capability to cause harm, and -- if AI systems become much more powerful -- deference to credible AI advisors. We evaluate concrete examples of arguments in each category and outline how arguments could be combined to justify that AI systems are safe to deploy.
Updated: 2024-03-18 18:11:46
标题: 安全案例:如何证明先进人工智能系统的安全性
摘要: 随着人工智能系统变得越来越先进,公司和监管机构将面临是否可以安全地训练和部署它们的艰难决策。为了为这些决策做好准备,我们研究了开发者如何制定一个“安全案例”,即一种结构化的论证,说明人工智能系统不太可能造成灾难。我们提出了一个组织安全案例的框架,并讨论了证明安全性的四类论据:完全没有造成灾难的能力、足够强的控制措施、尽管具备造成伤害的能力但值得信赖,以及(若人工智能系统变得更加强大)听从可信的人工智能顾问的意见。我们评估了每个类别中论据的具体例子,并概述了如何组合这些论据来证明人工智能系统可以安全部署。
更新时间: 2024-03-18 18:11:46
领域: cs.CY,cs.AI
Routing and Scheduling in Answer Set Programming applied to Multi-Agent Path Finding: Preliminary Report
We present alternative approaches to routing and scheduling in Answer Set Programming (ASP), and explore them in the context of Multi-agent Path Finding. The idea is to capture the flow of time in terms of partial orders rather than time steps attached to actions and fluents. This also abolishes the need for fixed upper bounds on the length of plans. The trade-off for this avoidance is that (parts of) temporal trajectories must be acyclic, since multiple occurrences of the same action or fluent cannot be distinguished anymore. While this approach provides an interesting alternative for modeling routing, it is without alternative for scheduling since fine-grained timings cannot be represented in ASP in a feasible way. This is different for partial orders that can be efficiently handled by external means such as acyclicity and difference constraints. We formally elaborate upon this idea and present several resulting ASP encodings. Finally, we demonstrate their effectiveness via an empirical analysis.
Updated: 2024-03-18 18:09:47
标题: 答案集编程中的路由与调度在多智能体路径规划中的应用:初步报告
摘要: 我们提出了在答案集编程(ASP)中进行路由和调度的替代方法,并在多智能体路径规划的背景下探讨这些方法。其思想是以偏序的形式刻画时间流动,而不是将时间步附加到动作和流(fluent)上。这也消除了对计划长度固定上界的需求。这种做法的代价是(部分)时间轨迹必须是无环的,因为同一动作或流的多次出现已无法再被区分。虽然这种方法为路由建模提供了一个有趣的替代方案,但对调度而言它是唯一可行的选择,因为细粒度的时间无法以可行的方式在ASP中表示。偏序则不同,它可以通过无环约束和差分约束等外部手段高效处理。我们对这一思想进行了形式化阐述,并给出了由此得到的几种ASP编码。最后,我们通过实证分析证明了它们的有效性。
更新时间: 2024-03-18 18:09:47
领域: cs.AI,cs.LO,cs.SC
Align and Distill: Unifying and Improving Domain Adaptive Object Detection
Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to outperform a fair baseline), and +2.0 AP50 on CFC Kenai to Channel. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research. Code and data are available: https://github.com/justinkay/aldi and https://github.com/visipedia/caltech-fish-counting.
Updated: 2024-03-18 17:58:02
标题: 对齐和提炼:统一和改进领域自适应目标检测
摘要: 目标检测器在与训练集分布不同的数据上通常表现不佳。最近,域自适应目标检测(DAOD)方法在应对这一挑战上展示了强大的结果。不幸的是,我们发现了一些系统性的基准测试陷阱,使过去的结果受到质疑,并阻碍了进一步的进展:(a)由于基线过弱导致性能被高估,(b)不一致的实现做法阻碍了方法之间的透明比较,(c)骨干网络过时且基准缺乏多样性导致通用性不足。我们通过引入以下内容来解决这些问题:(1)统一的基准测试和实现框架Align and Distill(ALDI),支持DAOD方法的比较并为未来发展提供支撑;(2)针对DAOD的公平且现代的训练和评估协议,解决上述基准测试陷阱;(3)一个新的DAOD基准数据集CFC-DAOD,使得能够在多样化的真实世界数据上进行评估;(4)一种新方法ALDI++,以较大优势取得了最先进的结果。ALDI++在Cityscapes到Foggy Cityscapes上比先前的最先进结果提高了+3.5 AP50,在Sim10k到Cityscapes上提高了+5.7 AP50(我们是唯一超越公平基线的方法),在CFC Kenai到Channel上提高了+2.0 AP50。我们的框架、数据集和最先进的方法为DAOD提供了一次关键的重置,并为未来的研究奠定了坚实基础。代码和数据可在以下链接获取:https://github.com/justinkay/aldi 和 https://github.com/visipedia/caltech-fish-counting。
更新时间: 2024-03-18 17:58:02
领域: cs.CV,cs.AI,cs.LG
Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail
3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and difficult to capture the detailed appearance of the human body. In this paper, we propose a new method called \emph{Ultraman} for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, \emph{Ultraman} greatly improves the reconstruction speed and accuracy while preserving high-quality texture details. We present a set of new frameworks for human reconstruction consisting of three parts, geometric reconstruction, texture generation and texture mapping. Firstly, a mesh reconstruction framework is used, which accurately extracts 3D human shapes from a single image. At the same time, we propose a method to generate a multi-view consistent image of the human body based on a single image. This is finally combined with a novel texture mapping method to optimize texture details and ensure color consistency during reconstruction. Through extensive experiments and evaluations, we demonstrate the superior performance of \emph{Ultraman} on various standard datasets. In addition, \emph{Ultraman} outperforms state-of-the-art methods in terms of human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.
Updated: 2024-03-18 17:57:30
标题: Ultraman:超高速且高细节的单图像3D人体重建
摘要: 3D人体重建一直是计算机视觉领域的一个挑战。先前的方法通常耗时且难以捕捉人体的细节外观。本文提出了一种名为Ultraman的新方法,用于从单张图像快速重建带纹理的3D人体模型。与现有技术相比,Ultraman大幅提高了重建速度和准确性,同时保留了高质量的纹理细节。我们提出了一套新的人体重建框架,包括几何重建、纹理生成和纹理映射三个部分。首先,使用网格重建框架从单张图像中精确提取3D人体形状。同时,我们提出了一种基于单张图像生成多视角一致人体图像的方法。最后,结合一种新颖的纹理映射方法,优化纹理细节并确保重建过程中的颜色一致性。通过广泛的实验和评估,我们展示了Ultraman在各种标准数据集上的卓越性能。此外,Ultraman在人体渲染质量和速度方面优于最先进的方法。在文章被接受后,我们将公开发布代码和数据。
更新时间: 2024-03-18 17:57:30
领域: cs.CV,cs.AI,eess.IV
Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem
Job shop scheduling problems represent a significant and complex facet of combinatorial optimization problems, which have traditionally been addressed through either exact or approximate solution methodologies. However, the practical application of these solutions is often challenged due to the complexity of real-world problems. Even when utilizing an approximate solution approach, the time required to identify a near-optimal solution can be prohibitively extensive, and the solutions derived are generally not applicable to new problems. This study proposes an innovative attention-based reinforcement learning method specifically designed for the category of job shop scheduling problems. This method integrates a policy gradient reinforcement learning approach with a modified transformer architecture. A key finding of this research is the ability of our trained learners within the proposed method to be repurposed for larger-scale problems that were not part of the initial training set. Furthermore, empirical evidence demonstrates that our approach surpasses the results of recent studies and outperforms commonly implemented heuristic rules. This suggests that our method offers a promising avenue for future research and practical application in the field of job shop scheduling problems.
Updated: 2024-03-18 17:57:22
标题: 基于注意力的强化学习用于组合优化问题:作业车间调度问题应用
摘要: Job shop scheduling problems are a challenging aspect of combinatorial optimization problems, typically addressed through exact or approximate solutions. However, the complexity of real-world problems often hinders the practical application of these solutions. Even with approximate solutions, the time needed to find a near-optimal solution can be excessive, and these solutions may not be transferable to new problems. This study introduces a novel attention-based reinforcement learning method tailored for job shop scheduling problems, combining policy gradient reinforcement learning with a modified transformer architecture. A key discovery is the ability of our trained learners to handle larger-scale problems beyond the initial training set. Empirical evidence shows that our method outperforms recent studies and commonly used heuristic rules, suggesting a promising avenue for future research and practical applications in job shop scheduling problems.
更新时间: 2024-03-18 17:57:22
领域: cs.AI
FlexCap: Generating Rich, Localized, and Flexible Captions in Images
We introduce a versatile $\textit{flexible-captioning}$ vision-language model (VLM) capable of generating region-specific descriptions of varying lengths. The model, FlexCap, is trained to produce length-conditioned captions for input bounding boxes, and this allows control over the information density of its output, with descriptions ranging from concise object labels to detailed captions. To achieve this we create large-scale training datasets of image region descriptions of varying length, starting from captioned images. This flexible-captioning capability has several valuable applications. First, FlexCap demonstrates superior performance in dense captioning tasks on the Visual Genome dataset. Second, a visual question answering (VQA) system can be built by employing FlexCap to generate localized descriptions as inputs to a large language model. The resulting system achieves state-of-the-art zero-shot performance on a number of VQA datasets. We also demonstrate a $\textit{localize-then-describe}$ approach with FlexCap can be better at open-ended object detection than a $\textit{describe-then-localize}$ approach with other VLMs. We highlight a novel characteristic of FlexCap, which is its ability to extract diverse visual information through prefix conditioning. Finally, we qualitatively demonstrate FlexCap's broad applicability in tasks such as image labeling, object attribute recognition, and visual dialog. Project webpage: https://flex-cap.github.io .
Updated: 2024-03-18 17:57:02
标题: FlexCap:在图像中生成丰富、本地化和灵活的标题
摘要: 我们介绍了一种多功能的$\textit{灵活字幕}$视觉语言模型(VLM),能够生成不同长度的特定区域描述。该模型FlexCap经过训练,可为输入边界框生成长度条件化的字幕,从而可以控制其输出的信息密度,描述范围从简洁的对象标签到详细的字幕。为了实现这一点,我们从带字幕的图像出发,创建了包含不同长度图像区域描述的大规模训练数据集。这种灵活字幕能力具有若干有价值的应用。首先,FlexCap在Visual Genome数据集上的密集字幕任务中表现出优越性能。其次,通过利用FlexCap生成局部化描述作为大型语言模型的输入,可以构建视觉问答(VQA)系统,该系统在多个VQA数据集上实现了最先进的零样本性能。我们还展示了FlexCap的$\textit{先定位再描述}$方法在开放式目标检测方面优于其他VLM的$\textit{先描述再定位}$方法。我们强调了FlexCap的一个新颖特性,即通过前缀条件化(prefix conditioning)提取多样视觉信息的能力。最后,我们定性地展示了FlexCap在图像标注、对象属性识别和视觉对话等任务中的广泛适用性。项目网页:https://flex-cap.github.io。
更新时间: 2024-03-18 17:57:02
领域: cs.CV,cs.AI,cs.CL,cs.LG
Supervised Fine-Tuning as Inverse Reinforcement Learning
The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback and assumes access to specific types of preference datasets. In our work, we question the efficacy of such datasets and explore various scenarios where alignment with expert demonstrations proves more realistic. We build a sequential decision-making framework to formulate the problem of aligning LLMs using demonstration datasets. Drawing insights from inverse reinforcement learning and imitation learning, we introduce various approaches for divergence minimization in the LLM alignment tasks. Our analysis highlights the mass-covering and mode-seeking behaviors of these different approaches. Inclusively, we examine the pros and cons of the classical supervised fine-tuning method, elaborating on scenarios where different methods shine.
Updated: 2024-03-18 17:52:57
标题: 监督微调作为逆强化学习
摘要: 目前对齐大型语言模型(LLMs)的主流方法通常依赖于人类或人工智能的反馈,并假定可以访问特定类型的偏好数据集。在我们的工作中,我们质疑这些数据集的有效性,并探讨与专家演示对齐更为现实的各种情景。我们构建了一个序贯决策框架,用于刻画使用演示数据集对齐LLMs的问题。借鉴逆强化学习和模仿学习的见解,我们引入了多种用于LLM对齐任务中散度最小化的方法。我们的分析突显了这些不同方法的质量覆盖(mass-covering)与模式寻求(mode-seeking)行为。此外,我们审视了经典监督微调方法的优缺点,并详细阐述了不同方法各自表现出色的情景。
更新时间: 2024-03-18 17:52:57
领域: cs.LG,cs.AI,cs.CL
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Recent SOTA approaches for embodied learning via interaction directly employ large language models (LLMs) as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller embodied RL agents learn useful skills that they are weak at? We propose EnvGen, a novel framework to address this question. First, we prompt an LLM to generate training environments that allow agents to quickly learn different tasks in parallel. Concretely, the LLM is given the task description and simulator objectives that the agents should learn and is then asked to generate a set of environment configurations (e.g., different terrains, items given to agents, etc.). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We show qualitatively how the LLM adapts training environments to help improve RL agents' weaker skills over time. Additionally, EnvGen is substantially more efficient as it only uses a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of LLM calls. Lastly, we present detailed ablation studies for our design choices.
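A rough sketch of the adaptive loop described above, assuming a hypothetical `call_llm` API, a JSON environment-config format, and an `agent` object exposing `train`, `skills`, and `success_rate` (all illustrative, not the paper's code):

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call; returns a JSON list of env configs.
    return '[{"terrain": "forest", "items": ["wood_pickaxe"]}]'

def generate_environments(task_desc, feedback=None):
    prompt = f"Task: {task_desc}\n"
    if feedback:
        prompt += f"Agent success rates so far: {feedback}\n"
    prompt += "Propose 4 training environment configs as JSON (terrain, items)."
    return json.loads(call_llm(prompt))

def train_cycle(agent, task_desc, original_envs, n_cycles=4):
    feedback = None
    for _ in range(n_cycles):                  # only a handful of LLM calls overall
        llm_envs = generate_environments(task_desc, feedback)
        for env in llm_envs + original_envs:   # mixture of original and generated envs
            agent.train(env)
        feedback = {s: agent.success_rate(s) for s in agent.skills}  # weak-skill signal
    return agent
```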
Updated: 2024-03-18 17:51:16
标题: EnvGen:利用LLMs生成和调整环境以训练具身智能体
摘要: 最近的最先进方法直接将大型语言模型(LLMs)用作代理,通过交互进行具身学习,以决定环境中的下一步行动。凭借其世界知识和推理能力,LLM代理的性能强于先前基于强化学习(RL)的较小代理;然而,频繁调用LLMs既缓慢又昂贵。与其直接将LLMs用作代理,我们能否利用LLMs的推理能力自适应地创建训练环境,帮助较小的具身RL代理学习其薄弱的有用技能?我们提出了EnvGen这一新颖框架来回答这个问题。首先,我们提示LLM生成训练环境,使代理能够并行地快速学习不同任务。具体来说,LLM获得任务描述和代理应学习的模拟器目标,然后被要求生成一组环境配置(例如不同的地形、提供给代理的物品等)。接下来,我们在原始环境和LLM生成环境的混合中训练一个小型RL代理。随后,我们以代理表现作为反馈提供给LLM,使其能够持续调整所生成的环境,逐步改善代理薄弱的技能。我们通过在Crafter和Heist环境中的全面实验展示了EnvGen的实用性。我们发现,使用EnvGen训练的小型RL代理可以胜过包括GPT-4代理在内的SOTA方法,并且学习长时程任务的速度明显更快。我们定性地展示了LLM如何随时间调整训练环境以帮助改善RL代理的薄弱技能。此外,EnvGen的效率显著更高,因为它只使用少量LLM调用(例如总共4次),而LLM代理需要数千次LLM调用。最后,我们对设计选择进行了详细的消融研究。
更新时间: 2024-03-18 17:51:16
领域: cs.CL,cs.AI,cs.LG
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is aigc3d.github.io/VideoMV.
Updated: 2024-03-18 17:48:15
标题: VideoMV: 基于大型视频生成模型的一致多视角生成
摘要: 基于文本或单图像提示生成多视图图像是创建3D内容的关键能力。这一主题的两个基本问题是:用什么数据进行训练,以及如何确保多视图一致性。本文介绍了一个新颖的框架,对这两个问题都做出了根本性贡献。与利用2D扩散模型的图像进行训练不同,我们提出了一个稠密一致的多视图生成模型,该模型由现成的视频生成模型微调而来。视频生成模型生成的图像更适合多视图生成,因为其底层网络架构使用时间模块来强制帧间一致性。此外,用于训练这些模型的视频数据集丰富多样,从而缩小了训练与微调之间的域差距。为了增强多视图一致性,我们引入了3D感知去噪采样:首先使用前馈重建模块得到显式的全局3D模型,然后采用一种采样策略,将由全局3D模型渲染的图像有效地纳入去噪采样循环,以改善最终图像的多视图一致性。作为副产品,该模块还提供了一种在几秒钟内创建以3D高斯表示的3D资产的快速方法。我们的方法可以生成24个稠密视图,训练收敛速度远快于最先进的方法(4个GPU小时对比数千个GPU小时),且视觉质量和一致性相当。通过进一步微调,我们的方法在定量指标和视觉效果上都优于现有最先进的方法。我们的项目页面是aigc3d.github.io/VideoMV。
更新时间: 2024-03-18 17:48:15
领域: cs.CV,cs.AI,cs.GR
DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception
The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception. To address this critical need, this paper introduces Dynamic Routing Network (DyRoNet), a low-rank enhanced dynamic routing framework designed for streaming perception in autonomous driving systems. DyRoNet integrates a suite of pre-trained branch networks, each meticulously fine-tuned to function under distinct environmental conditions. At its core, the framework offers a speed router module, developed to assess and route input data to the most suitable branch for processing. This approach not only addresses the inherent limitations of conventional models in adapting to diverse driving conditions but also ensures the balance between performance and efficiency. Extensive experimental evaluations demonstrate the adaptability of DyRoNet to diverse branch selection strategies, resulting in significant performance enhancements across different scenarios. This work not only establishes a new benchmark for streaming perception but also provides valuable engineering insights for future work.
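As a structural sketch (our illustration, not the released DyRoNet code), the router-plus-branches split described above might look like this in PyTorch:

```python
import torch
import torch.nn as nn

class SpeedRouter(nn.Module):
    """Tiny gating network that scores each pre-trained branch for a frame."""
    def __init__(self, feat_dim, n_branches):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_branches))

    def forward(self, feats):                  # feats: (B, feat_dim) cheap features
        return self.gate(feats).argmax(dim=-1)

class DynamicRoutingNet(nn.Module):
    def __init__(self, branches, feat_dim):
        super().__init__()
        self.branches = nn.ModuleList(branches)   # each fine-tuned per condition
        self.router = SpeedRouter(feat_dim, len(branches))

    def forward(self, frames, feats):
        idx = self.router(feats)               # route each sample to one branch
        return torch.stack([self.branches[i](frames[b:b + 1]).squeeze(0)
                            for b, i in enumerate(idx.tolist())])
```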
Updated: 2024-03-18 17:39:34
标题: DyRoNet:自动驾驶流式感知的动态路由和低秩适配器
摘要: 自动驾驶系统的进步取决于实现低延迟和高准确性感知的能力。为了应对这一关键需求,本文介绍了Dynamic Routing Network(DyRoNet),这是一个专为自动驾驶系统中的流式感知设计的低秩增强动态路由框架。DyRoNet集成了一套经过精心微调以适应不同环境条件的预训练分支网络。该框架的核心是一个速度路由器模块,用于评估输入数据并将其路由到最适合的分支进行处理。这种方法不仅解决了传统模型在适应多种驾驶条件方面的固有局限,还确保了性能和效率之间的平衡。大量实验评估表明,DyRoNet能够适应不同的分支选择策略,并在不同场景下显著提升性能。这项工作不仅为流式感知建立了新的基准,还为未来工作提供了宝贵的工程见解。
更新时间: 2024-03-18 17:39:34
领域: cs.CV,cs.AI,cs.MM
Modular Neural Networks for Time Series Forecasting: Interpretability and Feature Selection using Attention
Multivariate time series have many applications, from healthcare and meteorology to life science. Although deep learning models have shown excellent predictive performance for time series, they have been criticised for being "black-boxes" or non-interpretable. This paper proposes a novel modular neural network model for multivariate time series prediction that is interpretable by construction. A recurrent neural network learns the temporal dependencies in the data while an attention-based feature selection component selects the most relevant features and suppresses redundant features used in the learning of the temporal dependencies. A modular deep network is trained from the selected features independently to show the users how features influence outcomes, making the model interpretable. Experimental results show that this approach can outperform state-of-the-art interpretable Neural Additive Models (NAM) and variations thereof in both regression and classification of time series tasks, achieving a predictive performance that is comparable to the top non-interpretable methods for time series, LSTM and XGBoost.
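A minimal PyTorch sketch of the two ideas in this abstract, attention-based feature weighting followed by a recurrent forecaster; the layer sizes and mean-pooling choice are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentionFeatureSelector(nn.Module):
    """Learns per-feature weights so redundant inputs are suppressed before the RNN."""
    def __init__(self, n_features):
        super().__init__()
        self.score = nn.Linear(n_features, n_features)

    def forward(self, x):                       # x: (batch, time, n_features)
        attn = torch.softmax(self.score(x.mean(dim=1)), dim=-1)   # (batch, n_features)
        return x * attn.unsqueeze(1), attn      # weighted inputs + interpretable weights

class InterpretableForecaster(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.selector = AttentionFeatureSelector(n_features)
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        x, attn = self.selector(x)
        out, _ = self.rnn(x)                    # temporal dependencies
        return self.head(out[:, -1]), attn      # forecast + feature attributions
```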
Updated: 2024-03-18 17:39:11
标题: 模块化神经网络用于时间序列预测:利用注意力进行可解释性和特征选择
摘要: 多元时间序列有许多应用,从医疗保健和气象学到生命科学。尽管深度学习模型在时间序列预测方面表现出色,但它们被批评为“黑盒子”或不可解释。本文提出了一种新颖的模块化神经网络模型用于多元时间序列预测,该模型在构造上即具有可解释性。循环神经网络学习数据中的时间依赖性,而基于注意力的特征选择组件选择最相关的特征,并抑制学习时间依赖性时用到的冗余特征。模块化深度网络基于所选特征独立训练,向用户展示特征如何影响结果,使模型具有可解释性。实验结果表明,这种方法在时间序列的回归和分类任务中均可胜过最先进的可解释神经加性模型(NAM)及其变体,取得了与时间序列领域顶尖的不可解释方法LSTM和XGBoost相当的预测性能。
更新时间: 2024-03-18 17:39:11
领域: cs.LG,cs.AI
DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing
Text-driven diffusion-based video editing presents a unique challenge not encountered in image editing literature: establishing real-world motion. Unlike existing video editing approaches, here we focus on score distillation sampling to circumvent the standard reverse diffusion process and initiate optimization from videos that already exhibit natural motion. Our analysis reveals that while video score distillation can effectively introduce new content indicated by target text, it can also cause significant structure and motion deviation. To counteract this, we propose to match space-time self-similarities of the original video and the edited video during the score distillation. Thanks to the use of score distillation, our approach is model-agnostic, which can be applied for both cascaded and non-cascaded video diffusion frameworks. Through extensive comparisons with leading methods, our approach demonstrates its superiority in altering appearances while accurately preserving the original structure and motion.
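The self-similarity matching can be written down compactly. The sketch below is our simplification, with random tensors standing in for diffusion features: it compares cosine self-similarity matrices of the original and edited videos and penalizes their difference, leaving appearance free to change.

```python
import torch
import torch.nn.functional as F

def self_similarity(feats):
    """Cosine self-similarity over space-time locations. feats: (T*H*W, C)."""
    f = F.normalize(feats, dim=-1)
    return f @ f.t()                            # (N, N) space-time structure matrix

def ssim_matching_loss(orig_feats, edit_feats):
    return F.mse_loss(self_similarity(edit_feats), self_similarity(orig_feats))

orig = torch.randn(8 * 16 * 16, 64)             # e.g. 8 frames of 16x16 feature maps
edit = orig + 0.1 * torch.randn_like(orig)      # lightly edited features
print(ssim_matching_loss(orig, edit))
```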
Updated: 2024-03-18 17:38:53
标题: DreamMotion:用于零样本视频编辑的时空自相似性得分蒸馏
摘要: 基于文本驱动的基于扩散的视频编辑在图像编辑文献中提出了一个独特的挑战:建立现实世界的运动。与现有的视频编辑方法不同,我们在这里专注于得分蒸馏采样,以绕过标准的逆扩散过程,并从已经展示自然运动的视频开始优化。我们的分析表明,虽然视频得分蒸馏可以有效地引入目标文本指示的新内容,但它也可能导致重大的结构和运动偏差。为了对抗这一问题,我们提出在得分蒸馏过程中匹配原始视频和编辑视频的时空自相似性。由于使用了得分蒸馏,我们的方法是与模型无关的,可以应用于级联和非级联视频扩散框架。通过与领先方法的广泛比较,我们的方法展示了在准确保留原始结构和运动的同时改变外观的优越性。
更新时间: 2024-03-18 17:38:53
领域: cs.CV,cs.AI
Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance
Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds. Yet they have been little-studied in a performance setting, where the results of user actions typically ought to feel instantaneous. To enable such study, we designed Notochord, a deep probabilistic model for sequences of structured events, and trained an instance of it on the Lakh MIDI dataset. Our probabilistic formulation allows interpretable interventions at a sub-event level, which enables one model to act as a backbone for diverse interactive musical functions including steerable generation, harmonization, machine improvisation, and likelihood-based interfaces. Notochord can generate polyphonic and multi-track MIDI, and respond to inputs with latency below ten milliseconds. Training code, model checkpoints and interactive examples are provided as open source software.
Updated: 2024-03-18 17:35:02
标题: Notochord:一个灵活的实时MIDI演奏概率模型
摘要: 基于深度学习的音乐数据概率模型正在产生越来越逼真的结果,并有望进入多种创意工作流程。然而,它们在演奏场景中尚未得到充分研究,而在这种场景中,用户操作的结果通常应当是即时的。为了开展此类研究,我们设计了Notochord,一个针对结构化事件序列的深度概率模型,并在Lakh MIDI数据集上训练了它的一个实例。我们的概率建模允许在子事件级别进行可解释的干预,这使得一个模型可以充当多种交互式音乐功能的主干,包括可操控的生成、配和声、机器即兴演奏和基于似然的界面。Notochord可以生成复调和多音轨MIDI,并以低于十毫秒的延迟响应输入。训练代码、模型检查点和交互式示例以开源软件形式提供。
更新时间: 2024-03-18 17:35:02
领域: cs.SD,cs.AI,eess.AS
Using Generative Text Models to Create Qualitative Codebooks for Student Evaluations of Teaching
Feedback is a critical aspect of improvement. Unfortunately, when there is a lot of feedback from multiple sources, it can be difficult to distill the information into actionable insights. Consider student evaluations of teaching (SETs), which are important sources of feedback for educators. They can give instructors insights into what worked during a semester. A collection of SETs can also be useful to administrators as signals for courses or entire programs. However, on a large scale as in high-enrollment courses or administrative records over several years, the volume of SETs can render them difficult to analyze. In this paper, we discuss a novel method for analyzing SETs using natural language processing (NLP) and large language models (LLMs). We demonstrate the method by applying it to a corpus of 5,000 SETs from a large public university. We show that the method can be used to extract, embed, cluster, and summarize the SETs to identify the themes they express. More generally, this work illustrates how to use the combination of NLP techniques and LLMs to generate a codebook for SETs. We conclude by discussing the implications of this method for analyzing SETs and other types of student writing in teaching and research settings.
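One plausible embed-and-cluster step of such a pipeline is sketched below; the encoder name, the example comments, and the cluster count are our illustrative assumptions, and in the paper's pipeline LLMs additionally extract and summarize the comments into codebook entries.

```python
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

comments = [
    "The weekly quizzes kept me on track.",
    "Lectures moved too fast to take notes.",
    "Office hours were the most helpful part.",
    "Pacing was rushed near the end of term.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence encoder works
emb = embedder.encode(comments)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)
for c in range(2):
    members = [t for t, lab in zip(comments, km.labels_) if lab == c]
    # An LLM would summarize each cluster into a theme label here.
    print(f"theme {c}:", members)
```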
Updated: 2024-03-18 17:21:35
标题: 使用生成文本模型为学生对教学的评价创建定性编码手册
摘要: 反馈是改进的关键环节。不幸的是,当来自多个来源的反馈很多时,将信息提炼为可操作的见解可能会很困难。以学生教学评价(SETs)为例,它是教育工作者的重要反馈来源,可以让教师了解一个学期中哪些做法行之有效。一批SETs也可以作为课程或整个项目的信号,对管理者有用。然而,在高选课人数课程或跨多年的行政记录等大规模情形下,SETs的数量可能使其难以分析。在本文中,我们讨论了一种使用自然语言处理(NLP)和大型语言模型(LLMs)分析SETs的新方法,并将其应用于一所大型公立大学的5,000条SETs语料库以进行演示。我们展示了该方法可用于提取、嵌入、聚类和总结SETs,以识别其表达的主题。更一般地,这项工作说明了如何结合NLP技术和LLMs为SETs生成编码手册。最后,我们讨论了该方法对在教学和研究环境中分析SETs及其他类型学生写作的意义。
更新时间: 2024-03-18 17:21:35
领域: cs.CL,cs.AI,cs.HC
What Are Tools Anyway? A Survey from the Language Model Perspective
Language models (LMs) are powerful yet mostly for text generation tasks. Tools have substantially enhanced their performance for tasks that require complex skills. However, many works adopt the term "tool" in different ways, raising the question: What is a tool anyway? Subsequently, where and how do tools help LMs? In this survey, we provide a unified definition of tools as external programs used by LMs, and perform a systematic review of LM tooling scenarios and approaches. Grounded on this review, we empirically study the efficiency of various tooling methods by measuring their required compute and performance gains on various benchmarks, and highlight some challenges and potential future research in the field.
Updated: 2024-03-18 17:20:07
标题: 工具究竟是什么?一项从语言模型角度的调查
摘要: 语言模型(LMs)功能强大,但主要用于文本生成任务。工具显著增强了它们在需要复杂技能的任务中的性能。然而,许多工作以不同方式使用“工具”一词,引发了一个问题:工具究竟是什么?进而,工具在何处以及如何帮助LMs?在这项综述中,我们给出了工具的统一定义,即LMs所使用的外部程序,并对LM使用工具的场景和方法进行了系统性回顾。基于这一回顾,我们通过测量各种工具使用方法在多个基准上所需的计算量和性能增益,实证研究了它们的效率,并强调了该领域的一些挑战和潜在的未来研究方向。
更新时间: 2024-03-18 17:20:07
领域: cs.CL,cs.AI
Diffusion Denoising as a Certified Defense against Clean-label Poisoning
We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by $denoised$ $smoothing$, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.
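In the spirit of the noise-then-denoise defense described above, a minimal sketch might look like the following. The `denoiser` wrapper and the noise-schedule indexing are assumptions; the paper plugs in an off-the-shelf diffusion model.

```python
import torch

@torch.no_grad()
def sanitize(images, denoiser, alphas_cumprod, t):
    """Noise-and-denoise each training image before ordinary training.

    `denoiser(x_t, t)` is assumed to predict the clean image from a noised one
    (any off-the-shelf diffusion model wrapped accordingly).
    """
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(images)
    x_t = a_bar.sqrt() * images + (1 - a_bar).sqrt() * noise   # forward diffusion
    # Bounded clean-label perturbations are largely destroyed by the added
    # noise, and the denoised images replace the possibly poisoned originals.
    return denoiser(x_t, t)
```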
Updated: 2024-03-18 17:17:07
标题: 扩散去噪作为一种防范干净标签中毒的认证防御方法
摘要: 我们提出了一种针对干净标签中毒攻击的认证防御方案。这类攻击通过向训练数据中注入少量中毒样本(例如1%),其中包含$p$-范数有界的对抗性扰动,以诱导对某个测试输入的定向误分类。受去噪平滑(denoised smoothing)所实现的对抗鲁棒性的启发,我们展示了如何使用现成的扩散模型来净化被篡改的训练数据。我们针对七种干净标签中毒攻击进行了广泛测试,将其攻击成功率降低到0-16%,同时测试准确率几乎没有下降。我们将该防御与现有的干净标签中毒对策进行了比较,结果显示该防御最大程度地降低了攻击成功率,并提供了最佳的模型效用。我们的结果强调了未来需要开发更强的干净标签攻击,并将我们这一经过认证且实用的防御作为评估这些攻击的强基线。
更新时间: 2024-03-18 17:17:07
领域: cs.CR,cs.CV,cs.LG
Informed Spectral Normalized Gaussian Processes for Trajectory Prediction
Prior parameter distributions provide an elegant way to represent prior expert and world knowledge for informed learning. Previous work has shown that using such informative priors to regularize probabilistic deep learning (DL) models increases their performance and data-efficiency. However, commonly used sampling-based approximations for probabilistic DL models can be computationally expensive, requiring multiple inference passes and longer training times. Promising alternatives are compute-efficient last layer kernel approximations like spectral normalized Gaussian processes (SNGPs). We propose a novel regularization-based continual learning method for SNGPs, which enables the use of informative priors that represent prior knowledge learned from previous tasks. Our proposal builds upon well-established methods and requires no rehearsal memory or parameter expansion. We apply our informed SNGP model to the trajectory prediction problem in autonomous driving by integrating prior drivability knowledge. On two public datasets, we investigate its performance under diminishing training data and across locations, and thereby demonstrate an increase in data-efficiency and robustness to location-transfers over non-informed and informed baselines.
Updated: 2024-03-18 17:05:24
标题: 基于信息的谱归一化高斯过程用于轨迹预测
摘要: 先验参数分布提供了一种优雅的方式来表示先验专家和世界知识,以进行有信息的学习。先前的研究表明,使用这样的信息先验来正则化概率深度学习(DL)模型可以提高它们的性能和数据效率。然而,用于概率DL模型的常用基于采样的近似方法可能在计算上昂贵,需要多次推断和更长的训练时间。一个有前景的替代方案是计算高效的最后一层核近似,如谱归一化高斯过程(SNGPs)。我们提出了一种基于正则化的持续学习方法,适用于SNGPs,这种方法使得可以使用能够代表从先前任务中学到的先验知识的信息先验。我们的提议建立在成熟的方法之上,不需要回放记忆或参数扩展。我们将我们的有信息的SNGP模型应用于自动驾驶中的轨迹预测问题,通过整合先验可驾驶性知识。在两个公共数据集上,我们研究了它在训练数据减少和跨位置下的性能,并因此展示了相对于非信息和信息基线的数据效率和对位置转移的稳健性的增加。
更新时间: 2024-03-18 17:05:24
领域: cs.LG,cs.AI
AI for bureaucratic productivity: Measuring the potential of AI to help automate 143 million UK government transactions
There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK central government, and measuring their potential for AI-driven automation. We estimate that UK central government conducts approximately one billion citizen-facing transactions per year in the provision of around 400 services, of which approximately 143 million are complex repetitive transactions. We estimate that 84% of these complex transactions are highly automatable, representing a huge potential opportunity: saving even an average of just one minute per complex transaction would save the equivalent of approximately 1,200 person-years of work every year. We also develop a model to estimate the volume of transactions a government service undertakes, providing a way for government to avoid conducting time consuming transaction volume measurements. Finally, we find that there is high turnover in the types of services government provide, meaning that automation efforts should focus on general procedures rather than services themselves which are likely to evolve over time. Overall, our work presents a novel perspective on the structure and functioning of modern government, and how it might evolve in the age of artificial intelligence.
Updated: 2024-03-18 17:03:17
标题: AI对官僚生产力的作用:衡量AI帮助自动化1.43亿英国政府交易的潜力
摘要: 目前,政府对人工智能提高公共服务生产力的潜力感到相当兴奋,通过自动化复杂但重复的官僚任务,从而节约有技能员工的时间。在这里,我们探讨了这一机会的规模,通过描绘英国中央政府面向公民的官僚决策程序的规模,并测量其AI驱动自动化的潜力。我们估计英国中央政府每年进行大约10亿次面向公民的交易,在提供约400项服务时,其中约有1.43亿次是复杂重复的交易。我们估计这些复杂交易中有84%是高度可自动化的,代表着巨大的潜在机会:即使每个复杂交易仅节省一分钟,每年就可以节省相当于约1200人年的工作量。我们还开发了一个模型来估计政府服务承担的交易量,为政府避免进行耗时的交易量测量提供了一种方法。最后,我们发现政府提供的服务类型有很高的周转率,这意味着自动化工作应该集中在通用程序上,而不是服务本身,因为服务可能随时间演变。总的来说,我们的工作提出了一个关于现代政府结构和运作方式的新颖观点,以及在人工智能时代可能如何发展。
更新时间: 2024-03-18 17:03:17
领域: cs.CY,cs.AI
Enhanced Event-Based Video Reconstruction with Motion Compensation
Deep neural networks for event-based video reconstruction often suffer from a lack of interpretability and have high memory demands. A lightweight network called CISTA-LSTC has recently been introduced showing that high-quality reconstruction can be achieved through the systematic design of its architecture. However, its modelling assumption that input signals and output reconstructed frame share the same sparse representation neglects the displacement caused by motion. To address this, we propose warping the input intensity frames and sparse codes to enhance reconstruction quality. A CISTA-Flow network is constructed by integrating a flow network with CISTA-LSTC for motion compensation. The system relies solely on events, in which predicted flow aids in reconstruction and then reconstructed frames are used to facilitate flow estimation. We also introduce an iterative training framework for this combined system. Results demonstrate that our approach achieves state-of-the-art reconstruction accuracy and simultaneously provides reliable dense flow estimation. Furthermore, our model exhibits flexibility in that it can integrate different flow networks, suggesting its potential for further performance enhancement.
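The motion-compensation step hinges on warping intensity frames and sparse codes by the predicted flow. Below is a generic backward-warping sketch in PyTorch (our illustration; the convention of flow as pixel displacements, x then y, is an assumption):

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    """Backward-warp `frame` (B,C,H,W) by optical `flow` (B,2,H,W) in pixels."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame)       # (2,H,W) base coords
    coords = grid.unsqueeze(0) + flow                           # displaced coordinates
    gx = 2 * coords[:, 0] / (w - 1) - 1                         # normalize to [-1, 1]
    gy = 2 * coords[:, 1] / (h - 1) - 1
    sample_grid = torch.stack((gx, gy), dim=-1)                 # (B,H,W,2)
    return F.grid_sample(frame, sample_grid, align_corners=True)
```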
Updated: 2024-03-18 16:58:23
标题: 通过运动补偿增强的基于事件的视频重建
摘要: 深度神经网络用于基于事件的视频重建往往缺乏可解释性并具有高内存需求。最近引入了一种轻量级网络CISTA-LSTC,显示出通过系统设计其架构可以实现高质量的重建。然而,它的建模假设,即输入信号和输出重建帧共享相同的稀疏表示,忽略了由运动引起的位移。为了解决这个问题,我们提出了对输入强度帧和稀疏编码进行变形以增强重建质量。通过将流网络与CISTA-LSTC集成,构建了一个CISTA-Flow网络用于运动补偿。该系统完全依赖事件,在其中预测的流有助于重建,然后重建帧用于促进流估计。我们还为这个组合系统引入了一个迭代训练框架。结果表明,我们的方法实现了最先进的重建准确性,并同时提供可靠的密集流估计。此外,我们的模型表现出灵活性,可以集成不同的流网络,表明其进一步性能增强的潜力。
更新时间: 2024-03-18 16:58:23
领域: cs.CV,cs.AI,cs.LG
Streamlining Social Media Information Retrieval for COVID-19 Research with Deep Learning
Objective: Social media-based public health research is crucial for epidemic surveillance, but most studies identify relevant corpora with keyword-matching. This study develops a system to streamline the process of curating colloquial medical dictionaries. We demonstrate the pipeline by curating a UMLS-colloquial symptom dictionary from COVID-19-related tweets as proof of concept. Methods: COVID-19-related tweets from February 1, 2020, to April 30, 2022 were used. The pipeline includes three modules: a named entity recognition module to detect symptoms in tweets; an entity normalization module to aggregate detected entities; and a mapping module that iteratively maps entities to Unified Medical Language System concepts. A random 500 entity sample were drawn from the final dictionary for accuracy validation. Additionally, we conducted a symptom frequency distribution analysis to compare our dictionary to a pre-defined lexicon from previous research. Results: We identified 498,480 unique symptom entity expressions from the tweets. Pre-processing reduces the number to 18,226. The final dictionary contains 38,175 unique expressions of symptoms that can be mapped to 966 UMLS concepts (accuracy = 95%). Symptom distribution analysis found that our dictionary detects more symptoms and is effective at identifying psychiatric disorders like anxiety and depression, often missed by pre-defined lexicons. Conclusions: This study advances public health research by implementing a novel, systematic pipeline for curating symptom lexicons from social media data. The final lexicon's high accuracy, validated by medical professionals, underscores the potential of this methodology to reliably interpret and categorize vast amounts of unstructured social media data into actionable medical insights across diverse linguistic and regional landscapes.
Updated: 2024-03-18 16:22:16
标题: 利用深度学习简化COVID-19研究中的社交媒体信息检索
摘要: 目标:基于社交媒体的公共卫生研究对于流行病监测至关重要,但大多数研究通过关键词匹配来识别相关语料库。本研究开发了一个系统,以简化构建口语化医学词典的过程。作为概念验证,我们从COVID-19相关推文中构建了一个UMLS-口语症状词典来演示该流程。方法:使用了2020年2月1日至2022年4月30日的COVID-19相关推文。该流程包括三个模块:命名实体识别模块,用于检测推文中的症状;实体标准化模块,用于聚合检测到的实体;以及映射模块,迭代地将实体映射到统一医学语言系统(UMLS)概念。我们从最终词典中随机抽取500个实体样本进行准确性验证。此外,我们进行了症状频率分布分析,将我们的词典与先前研究中的预定义词表进行比较。结果:我们从推文中识别出498,480个唯一的症状实体表达,预处理后减少到18,226个。最终词典包含38,175个唯一的症状表达,可映射到966个UMLS概念(准确率=95%)。症状分布分析发现,我们的词典能检测到更多症状,并能有效识别焦虑和抑郁等常被预定义词表遗漏的精神障碍。结论:本研究通过实施一种新颖、系统的流程从社交媒体数据中构建症状词表,推进了公共卫生研究。经医学专业人员验证的最终词表具有很高的准确率,凸显了这种方法在不同语言和地区环境中将海量非结构化社交媒体数据可靠地解读并归类为可操作医学洞察的潜力。
更新时间: 2024-03-18 16:22:16
领域: cs.CL,cs.AI,cs.IR
Predict the Next Word: Humans exhibit uncertainty in this task and language models _____
Language models (LMs) are statistical models trained to assign probability to human-generated text. As such, it is reasonable to question whether they approximate linguistic variability exhibited by humans well. This form of statistical assessment is difficult to perform at the passage level, for it requires acceptability judgements (i.e., human evaluation) or a robust automated proxy (which is non-trivial). At the word level, however, given some context, samples from an LM can be assessed via exact matching against a prerecorded dataset of alternative single-word continuations of the available context. We exploit this fact and evaluate the LM's ability to reproduce variability that humans (in particular, a population of English speakers) exhibit in the 'next word prediction' task. This can be seen as assessing a form of calibration, which, in the context of text classification, Baan et al. (2022) termed calibration to human uncertainty. We assess GPT2, BLOOM and ChatGPT and find that they exhibit fairly low calibration to human uncertainty. We also verify the failure of expected calibration error (ECE) to reflect this, and as such, advise the community against relying on it in this setting.
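A toy version of the word-level comparison described above: score a small option set under GPT-2 and compare it with a human next-word distribution. The human probabilities here are invented for illustration; real evaluations use a prerecorded dataset of human continuations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "She poured herself a cup of"
human = {" coffee": 0.55, " tea": 0.35, " water": 0.10}   # hypothetical human counts

with torch.no_grad():
    logits = model(**tok(context, return_tensors="pt")).logits[0, -1]
probs = torch.softmax(logits, dim=-1)

lm = {w: probs[tok.encode(w)[0]].item() for w in human}   # first-subtoken probability
total = sum(lm.values())
lm = {w: p / total for w, p in lm.items()}                # renormalize over options

tvd = 0.5 * sum(abs(human[w] - lm[w]) for w in human)     # total variation distance
print("LM vs. human next-word TVD:", round(tvd, 3))
```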
Updated: 2024-03-18 16:21:24
标题: 预测下一个词:人类在这项任务中表现出不确定性,而语言模型 _____
摘要: 语言模型(LMs)是训练出来的统计模型,用于为人类生成的文本分配概率。因此,可以合理地质疑它们是否很好地逼近人类展示的语言变异性。在段落级别进行这种形式的统计评估是困难的,因为它需要接受性判断(即人类评估)或者一个强大的自动代理(这并不容易)。然而,在词级别上,通过某些上下文,LM的样本可以通过与可用上下文的替代单词延续的预先记录数据集进行精确匹配来进行评估。我们利用这一事实评估LM重现人类(特别是英语使用者群体)在“下一个单词预测”任务中展示的变异性的能力。这可以看作是评估一种校准的形式,即在文本分类的背景下,Baan等人(2022年)将其称为对人类不确定性的校准。我们评估了GPT2、BLOOM和ChatGPT,并发现它们在对人类不确定性的校准方面表现得相当低。我们还验证了预期校准误差(ECE)未能反映这一点,并因此建议社区不要在这种情况下依赖它。
更新时间: 2024-03-18 16:21:24
领域: cs.CL,cs.AI
PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts
Vision-language models like CLIP are widely used in zero-shot image classification due to their ability to understand various visual concepts and natural language descriptions. However, how to fully leverage CLIP's unprecedented human-like understanding capabilities to achieve better performance is still an open question. This paper draws inspiration from the human visual perception process: when classifying an object, humans first infer contextual attributes (e.g., background and orientation) which help separate the foreground object from the background, and then classify the object based on this information. Inspired by it, we observe that providing CLIP with contextual attributes improves zero-shot image classification and mitigates reliance on spurious features. We also observe that CLIP itself can reasonably infer the attributes from an image. With these observations, we propose a training-free, two-step zero-shot classification method PerceptionCLIP. Given an image, it first infers contextual attributes (e.g., background) and then performs object classification conditioning on them. Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and interpretability. Our code is available at https://github.com/umd-huang-lab/perceptionCLIP
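A two-step zero-shot sketch in the spirit of the method, using the open-source CLIP package; the prompt templates, the attribute and class lists, and the image path are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import clip                      # https://github.com/openai/CLIP
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)  # placeholder path

backgrounds = ["on grass", "on water", "indoors"]
classes = ["a dog", "a cat", "a bird"]

def scores(texts):
    with torch.no_grad():
        logits, _ = model(image, clip.tokenize(texts).to(device))
        return logits.softmax(dim=-1)[0]

# Step 1: infer a contextual attribute (background) from the image itself.
bg = backgrounds[scores([f"a photo {b}" for b in backgrounds]).argmax().item()]

# Step 2: classify conditioned on the inferred attribute.
probs = scores([f"a photo of {c} {bg}" for c in classes])
print(classes[probs.argmax().item()], "|", bg)
```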
Updated: 2024-03-18 16:02:10
标题: PerceptionCLIP: 通过推断和条件化上下文进行视觉分类
摘要: 视觉-语言模型如CLIP广泛用于零样本图像分类,因为它们能够理解各种视觉概念和自然语言描述。然而,如何充分利用CLIP前所未有的类人理解能力来实现更好的性能仍然是一个悬而未决的问题。本文从人类视觉感知过程中汲取灵感:在对物体进行分类时,人类首先推断出上下文属性(例如背景和方向),这有助于将前景对象与背景分开,然后根据这些信息对物体进行分类。受此启发,我们观察到为CLIP提供上下文属性可以改善零样本图像分类,并减少对虚假特征的依赖。我们还观察到CLIP本身可以合理地从图像中推断出这些属性。基于这些观察,我们提出了一种无需训练的两步零样本分类方法PerceptionCLIP。给定一幅图像,它首先推断出上下文属性(例如背景),然后在这些属性的基础上进行物体分类。我们的实验证明,PerceptionCLIP实现了更好的泛化性能、群体稳健性和可解释性。我们的代码可在https://github.com/umd-huang-lab/perceptionCLIP找到。
更新时间: 2024-03-18 16:02:10
领域: cs.CV,cs.AI,cs.CL,cs.LG
Larimar: Large Language Models with Episodic Memory Control
Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 4-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting and input context length generalization with Larimar and show their effectiveness.
Updated: 2024-03-18 16:01:42
标题: 拉里玛:具有情节记忆控制的大型语言模型
摘要: 大型语言模型(LLMs)中存储的知识的高效准确更新是当今最紧迫的研究挑战之一。本文介绍了Larimar - 一种新颖的、启发于大脑的架构,用于增强LLMs的分布式情节记忆。Larimar的记忆允许动态、一次性更新知识,无需进行计算昂贵的重新训练或微调。在多个事实编辑基准测试上的实验结果表明,Larimar在挑战性的顺序编辑设置中达到了与大多数竞争基线相当的准确性,但在速度上也表现出色 - 根据基础LLM的不同,速度提升了4-10倍,同时由于所提出的架构简单、与LLM无关,因此具有灵活性和普遍性。我们还提供了用于选择性事实遗忘和输入上下文长度概括的机制,并展示了它们的有效性。
更新时间: 2024-03-18 16:01:42
领域: cs.LG,cs.AI
Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence
Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence -- called the Tsallis KL divergence -- which uses the $q$-logarithm in the definition. The approach is a strict generalization, as $q = 1$ corresponds to the standard KL divergence; $q > 1$ provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when $q > 1$ could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches to incorporate KL regularization. We show that this generalized MVI($q$) obtains significant improvements over the standard MVI($q = 1$) across 35 Atari games.
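For reference, one common convention for the $q$-logarithm and the induced Tsallis KL divergence (our notation; the paper's exact definition may differ in sign or argument order) is

$$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q}, \qquad D^{q}_{\mathrm{KL}}(\pi \,\|\, \mu) = \mathbb{E}_{a \sim \pi}\!\left[-\ln_q \frac{\mu(a)}{\pi(a)}\right],$$

and since $\ln_q(x) \to \ln(x)$ as $q \to 1$, the standard KL divergence $\mathbb{E}_{a \sim \pi}[\ln(\pi(a)/\mu(a))]$ is recovered in that limit.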
Updated: 2024-03-18 15:53:34
标题: 使用Tsallis KL散度的广义Munchausen强化学习
摘要: 强化学习中的许多策略优化方法都包含相对于先前策略的Kullback-Leibler(KL)散度,以防止策略变化过快。这一想法最初是在一篇关于保守策略迭代(Conservative Policy Iteration)的开创性论文中提出的,TRPO和Munchausen值迭代(MVI)等算法给出了其近似实现。我们沿着这一方向继续研究一种广义KL散度,称为Tsallis KL散度,其定义中使用了$q$-对数。这种方法是严格的推广,因为$q = 1$对应标准KL散度,而$q > 1$提供了一系列新的选择。我们刻画了在Tsallis KL下学习到的策略类型,并阐述了$q > 1$何时可能有益。为了得到一个包含Tsallis KL正则化的实用算法,我们扩展了MVI,它是融入KL正则化的最简单方法之一。我们展示了这种广义MVI($q$)在35个Atari游戏中相对于标准MVI($q = 1$)取得了显著改进。
更新时间: 2024-03-18 15:53:34
领域: cs.LG,cs.AI
From explainable to interpretable deep learning for natural language processing in healthcare: how far from reality?
Deep learning (DL) has substantially enhanced healthcare research by addressing various natural language processing (NLP) tasks. Yet, the increasing complexity of DL-based NLP methods necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review on explainable and interpretable DL in healthcare NLP. The term "XIAI" (eXplainable and Interpretable Artificial Intelligence) was introduced to distinguish XAI from IAI. Methods were further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms were the most dominant emerging IAI. Moreover, IAI is increasingly used against XAI. The major challenges identified are that most XIAI do not explore "global" modeling processes, the lack of best practices, and the unmet need for systematic evaluation and benchmarks. Important opportunities were raised such as using "attention" to enhance multi-modal XIAI for personalized medicine and combine DL with causal reasoning. Our discussion encourages the integration of XIAI in LLMs and domain-specific smaller models. Our review can stimulate further research and benchmarks toward improving inherent IAI and engaging complex NLP in healthcare.
Updated: 2024-03-18 15:53:33
Categories: cs.CL,cs.AI,cs.LG
ASOP: A Sovereign and Secure Device Onboarding Protocol for Cloud-based IoT Services
The existing high-friction device onboarding process hinders the promise and potential of the Internet of Things (IoT). Even after several attempts by various device manufacturers and working groups, no widely adopted standard solution has come to fruition. The latest attempt by the Fast Identity Online (FIDO) Alliance promises a zero-touch solution for mass-market IoT customers, but the burden is transferred to the intermediary supply chain (i.e., they have to maintain infrastructure for managing keys and digital signatures, called the 'Ownership Voucher', for all devices). The specification relies on a 'Rendezvous Server' mimicking the notion of a Domain Name System (DNS) server. This essentially means resurrecting all existing possible attack scenarios associated with DNS, including Denial of Service (DoS) and correlation attacks. The 'Ownership Voucher' poses the risk that some intermediary supply chain agents may act maliciously and reject the transfer of ownership or sign with a wrong key. Furthermore, the deliberate use of the weak elliptic curves SECP256r1/SECP384r1 (also known as NIST P-256/384) in the specification raises questions. We introduce ASOP: a sovereign and secure device onboarding protocol for IoT devices that does not blindly trust the device manufacturer, supply chain, or cloud service provider. The ASOP protocol allows onboarding an IoT device to a cloud server with the help of an authenticator owned by the user. This paper outlines the preliminary development of the protocol and its high-level description. Our 'zero-trust' and 'human-in-the-loop' approach guarantees that the device owner does not remain at the mercy of third-party infrastructures, and it utilises the recently standardized post-quantum cryptographic suite (CRYSTALS) to secure connections and messages.
Updated: 2024-03-18 15:45:14
Categories: cs.CR,cs.NI
SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules
Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks especially in the extremely few-parameter regimes.
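For readers unfamiliar with the base technique being unified, the sketch below shows a plain LoRA update on a frozen linear layer, i.e. $y = Wx + (\alpha/r)\,BAx$; SuperLoRA's grouping, folding, shuffling, projection, and tensor factoring generalize this per-layer form. This is an illustrative baseline only, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x), with A of shape (r, in) and B (out, r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only A and B are trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())
```

Zero-initializing `B` makes the adapted layer start out identical to the frozen base, a common design choice so fine-tuning begins from the pre-trained behavior.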
Updated: 2024-03-18 15:40:36
Categories: cs.CV,cs.AI,cs.LG
QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction
Employing Large Language Models (LLMs) for semantic parsing has achieved remarkable success. However, we find existing methods fall short in terms of reliability and efficiency when hallucinations are encountered. In this paper, we address these challenges with a framework called QueryAgent, which solves a question step-by-step and performs step-wise self-correction. We introduce an environmental feedback-based self-correction method called ERASER. Unlike traditional approaches, ERASER leverages rich environmental feedback in the intermediate steps to perform selective and differentiated self-correction only when necessary. Experimental results demonstrate that QueryAgent notably outperforms all previous few-shot methods using only one example on GrailQA and GraphQ by 7.0 and 15.0 F1. Moreover, our approach exhibits superiority in terms of efficiency, including runtime, query overhead, and API invocation costs. By leveraging ERASER, we further improve another baseline (i.e., AgentBench) by approximately 10 points, revealing the strong transferability of our approach.
Updated: 2024-03-18 15:39:14
Categories: cs.CL,cs.AI
Deep Regularized Compound Gaussian Network for Solving Linear Inverse Problems
Incorporating prior information into inverse problems, e.g. via maximum-a-posteriori estimation, is an important technique for facilitating robust inverse problem solutions. In this paper, we devise two novel approaches for linear inverse problems that permit problem-specific statistical prior selections within the compound Gaussian (CG) class of distributions. The CG class subsumes many commonly used priors in signal and image reconstruction methods including those of sparsity-based approaches. The first method developed is an iterative algorithm, called generalized compound Gaussian least squares (G-CG-LS), that minimizes a regularized least squares objective function where the regularization enforces a CG prior. G-CG-LS is then unrolled, or unfolded, to furnish our second method, which is a novel deep regularized (DR) neural network, called DR-CG-Net, that learns the prior information. A detailed computational theory on convergence properties of G-CG-LS and thorough numerical experiments for DR-CG-Net are provided. Due to the comprehensive nature of the CG prior, these experiments show that DR-CG-Net outperforms competitive prior art methods in tomographic imaging and compressive sensing, especially in challenging low-training scenarios.
Updated: 2024-03-18 15:35:52
Categories: eess.SP,cs.AI,cs.NA,math.NA
ReGenNet: Towards Human Action-Reaction Synthesis
Humans constantly interact with their surrounding environments. Current human-centric generative models mainly focus on synthesizing humans plausibly interacting with static scenes and objects, while the dynamic human action-reaction synthesis for ubiquitous causal human-human interactions is less explored. Human-human interactions can be regarded as asymmetric with actors and reactors in atomic interaction periods. In this paper, we comprehensively analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions and propose the first multi-setting human action-reaction synthesis benchmark to generate human reactions conditioned on given human actions. To begin with, we propose to annotate the actor-reactor order of the interaction sequences for the NTU120, InterHuman, and Chi3D datasets. Based on them, a diffusion-based generative model with a Transformer decoder architecture called ReGenNet together with an explicit distance-based interaction loss is proposed to predict human reactions in an online manner, where the future states of actors are unavailable to reactors. Quantitative and qualitative results show that our method can generate instant and plausible human reactions compared to the baselines, and can generalize to unseen actor motions and viewpoint changes.
Updated: 2024-03-18 15:33:06
Categories: cs.CV,cs.AI
CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware
With the recent growth in demand for large-scale deep neural networks, compute in-memory (CiM) has emerged as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, the construction of CiM hardware poses a challenge as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may not be ideally matched to any neural network's attributes such as tensor dimension and arithmetic intensity, thus leading to suboptimal and under-performing systems. Despite the success of neural architecture search (NAS) techniques in yielding efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), they assume the hardware configuration to be frozen, often yielding sub-optimal sub-networks for a given budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures creating a Pareto optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices including bandwidth, processing element size, and memory size. Exhaustive experiments on different model architectures from both CNN and Transformer families demonstrate the efficacy of CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for similar ImageNet classification accuracy as baseline ViT-B, optimizing only the model architecture increases performance (or reduces workload execution time) by 1.7x while optimizing for both the model architecture and hardware configuration increases it by 3.1x.
Updated: 2024-03-18 15:25:30
Categories: cs.AR,cs.AI
Exploring Multi-modal Neural Scene Representations With Applications on Thermal Imaging
Neural Radiance Fields (NeRFs) quickly evolved as the new de-facto standard for the task of novel view synthesis when trained on a set of RGB images. In this paper, we conduct a comprehensive evaluation of neural scene representations, such as NeRFs, in the context of multi-modal learning. Specifically, we present four different strategies of how to incorporate a second modality, other than RGB, into NeRFs: (1) training from scratch independently on both modalities; (2) pre-training on RGB and fine-tuning on the second modality; (3) adding a second branch; and (4) adding a separate component to predict (color) values of the additional modality. We chose thermal imaging as second modality since it strongly differs from RGB in terms of radiosity, making it challenging to integrate into neural scene representations. For the evaluation of the proposed strategies, we captured a new publicly available multi-view dataset, ThermalMix, consisting of six common objects and about 360 RGB and thermal images in total. We employ cross-modality calibration prior to data capturing, leading to high-quality alignments between RGB and thermal images. Our findings reveal that adding a second branch to NeRF performs best for novel view synthesis on thermal images while also yielding compelling results on RGB. Finally, we also show that our analysis generalizes to other modalities, including near-infrared images and depth maps. Project page: https://mert-o.github.io/ThermalNeRF/.
Updated: 2024-03-18 15:18:55
Categories: cs.CV,cs.AI,cs.GR
Executable Code Actions Elicit Better LLM Agents
Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
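A minimal sketch of the code-as-action loop described above: the model emits Python, the snippet is executed in a persistent namespace, and the captured output is fed back as the next observation. The `llm` callable is hypothetical, and a real deployment would sandbox execution rather than call `exec` directly.

```python
import contextlib
import io

def run_code_action(code: str, env: dict) -> str:
    """Execute a model-emitted snippet; return stdout or the error message."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, env)                  # namespace persists across turns
    except Exception as exc:
        buf.write(f"{type(exc).__name__}: {exc}")
    return buf.getvalue()

def codeact_loop(llm, task: str, max_turns: int = 5) -> list:
    """Multi-turn interaction: emit code, observe output, revise later turns."""
    env: dict = {}
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        code = llm(history)                  # hypothetical LLM interface
        observation = run_code_action(code, env)
        history.append({"role": "assistant", "content": code})
        history.append({"role": "user", "content": f"Observation:\n{observation}"})
    return history
```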
Updated: 2024-03-18 15:18:45
Categories: cs.CL,cs.AI
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
Insufficient modeling of human preferences within the reward model is a major obstacle for leveraging human feedback to improve translation quality. Fortunately, quality estimation (QE), which predicts the quality of a given translation without reference, has achieved impressive alignment with human evaluations in the last two years. In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training. We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines. We examine the problem and argue that the vulnerability of the QE model might lead to high rewards for incorrect translations, resulting in overoptimization and error propagation. To address the problem, we adopt a simple yet effective method that uses heuristic rules to detect the incorrect translations and assigns a penalty term to their reward scores. Experimental results show that the proposed QE-based feedback training achieves consistent and significant improvements across various settings, further verified through human preference studies. Our subsequent analysis demonstrates the high data efficiency of the proposed QE-based feedback training: with only a small amount of monolingual data, it outperforms systems trained on larger parallel corpora. Our code is available at: https://github.com/zwhe99/FeedbackMT
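The penalty mechanism can be summarized in a few lines. The sketch below is an assumption-laden illustration: `looks_incorrect` stands in for the paper's unstated heuristic rules, using toy emptiness and length-ratio checks that are our own placeholders, not the authors' rules.

```python
def looks_incorrect(source: str, translation: str) -> bool:
    """Toy stand-in for the detection heuristics (assumption, not the paper's)."""
    if not translation.strip():
        return True
    ratio = len(translation) / max(len(source), 1)
    return ratio < 0.3 or ratio > 3.0        # severe length mismatch

def shaped_reward(qe_score: float, source: str, translation: str,
                  penalty: float = 1.0) -> float:
    """QE reward with a penalty term subtracted for detected incorrect outputs."""
    return qe_score - penalty * float(looks_incorrect(source, translation))
```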
Updated: 2024-03-18 15:16:16
Categories: cs.CL,cs.AI
Towards automated formal security analysis of SAML V2.0 Web Browser SSO standard -- the POST/Artifact use case
Single Sign-On (SSO) protocols streamline user authentication with a unified login for multiple online services, improving usability and security. One of the most common SSO protocol frameworks - the Security Assertion Markup Language V2.0 (SAML) Web SSO Profile - has been in use for more than two decades, primarily in government, education and enterprise environments. Despite its mission-critical nature, only certain deployments and configurations of the Web SSO Profile have been formally analyzed. This paper attempts to bridge this gap by performing a comprehensive formal security analysis of the SAML V2.0 SP-initiated SSO with POST/Artifact Bindings use case. Rather than focusing on a specific deployment and configuration, we closely follow the specification with the goal of capturing many different deployments allowed by the standard. Modeling and analysis are performed using the Tamarin prover - a state-of-the-art tool for automated verification of security protocols in the symbolic model of cryptography. Technically, we build a meta-model of the use case that we instantiate to eight different protocol variants. Using the Tamarin prover, we formally verify a number of critical security properties for those protocol variants, while identifying certain drawbacks and potential vulnerabilities.
Updated: 2024-03-18 15:11:29
Categories: cs.CR
DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response Evaluation
The fields of therapeutic application and drug research and development (R&D) both face substantial challenges, i.e., the therapeutic domain calls for more treatment alternatives, while numerous promising pre-clinical drugs have failed in clinical trials. One of the reasons is the inadequacy of Cross-drug Response Evaluation (CRE) during the late stages of drug R&D. Although in-silico CRE models bring a promising solution, existing methodologies are restricted to early stages of drug R&D, such as target and cell-line levels, offering limited improvement to clinical success rates. Herein, we introduce DeepCRE, a pioneering AI model designed to predict CRE effectively in the late stages of drug R&D. DeepCRE outperforms the existing best models by achieving an average performance improvement of 17.7% in patient-level CRE, and a 5-fold increase in indication-level CRE, facilitating more accurate personalized treatment predictions and better pharmaceutical value assessment for indications, respectively. Furthermore, DeepCRE has identified a set of six drug candidates that show significantly greater effectiveness than a comparator set of two approved drugs in 5/8 colorectal cancer organoids. This demonstrates the capability of DeepCRE to systematically uncover a spectrum of drug candidates with enhanced therapeutic effects, highlighting its potential to transform drug R&D.
Updated: 2024-03-18 15:05:55
Categories: cs.AI,cs.LG,q-bio.QM
Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks
Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they generally do not provide any guarantees, leaving them open to being countered by novel attacks. In contrast, by examining worst-case behaviours, Certified Defences make it possible to provide guarantees of the robustness of a sample against adversarial attacks modifying a finite number of training samples, known as pointwise certification. We achieve this by exploiting both Differential Privacy and the Sampled Gaussian Mechanism to ensure the invariance of prediction for each testing instance against finite numbers of poisoned examples. In doing so, our model provides guarantees of adversarial robustness that are more than twice as large as those provided by prior certifications.
Updated: 2024-03-18 15:05:36
Categories: cs.LG,cs.CR
Expectation Entropy as a Password Strength Metric
The classical combinatorics-based password strength formula provides a result in tens of bits, whereas the NIST Entropy Estimation Suite gives a result between 0 and 1 for Min-entropy. In this work, we present a newly developed metric -- Expectation entropy -- that can be applied to estimate the strength of any random or random-like password. Expectation entropy provides the strength of a password on the same scale as an entropy estimation tool. Having an 'Expectation entropy' of a certain value, for example 0.4, means that an attacker has to exhaustively search at least 40% of the total number of guesses to find the password.
Updated: 2024-03-18 15:03:37
Categories: cs.CR
Spatio-Temporal Fluid Dynamics Modeling via Physical-Awareness and Parameter Diffusion Guidance
This paper proposes a two-stage framework named ST-PAD for spatio-temporal fluid dynamics modeling in the field of earth sciences, aiming to achieve high-precision simulation and prediction of fluid dynamics through spatio-temporal physics awareness and parameter diffusion guidance. In the upstream stage, we design a vector quantization reconstruction module with temporal evolution characteristics, ensuring balanced and resilient parameter distribution by introducing general physical constraints. In the downstream stage, a diffusion probability network involving parameters is utilized to generate high-quality future states of fluids, while enhancing the model's generalization ability by perceiving parameters in various physical setups. Extensive experiments on multiple benchmark datasets have verified the effectiveness and robustness of the ST-PAD framework, which showcase that ST-PAD outperforms current mainstream models in fluid dynamics modeling and prediction, especially in effectively capturing local representations and maintaining significant advantages in OOD generations.
Updated: 2024-03-18 14:57:47
Categories: cs.LG,cs.AI,physics.flu-dyn
Fuzzy Rough Choquet Distances for Classification
This paper introduces a novel Choquet distance using fuzzy rough set based measures. The proposed distance measure combines the attribute information received from fuzzy rough set theory with the flexibility of the Choquet integral. This approach is designed to adeptly capture non-linear relationships within the data, acknowledging the interplay of the conditional attributes towards the decision attribute and resulting in a more flexible and accurate distance. We explore its application in the context of machine learning, with a specific emphasis on distance-based classification approaches (e.g. k-nearest neighbours). The paper examines two fuzzy rough set based measures that are based on the positive region. Moreover, we explore two procedures for monotonizing the measures derived from fuzzy rough set theory, making them suitable for use with the Choquet integral, and investigate their differences.
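As a rough illustration of the distance just described, the sketch below computes a discrete Choquet integral of per-attribute differences with respect to a capacity (a monotone set function). In the paper that capacity would be derived from fuzzy rough set measures; here it is a hand-specified toy dictionary, so treat the numbers as placeholders.

```python
import numpy as np

def choquet_integral(values, capacity):
    """Discrete Choquet integral of non-negative `values` w.r.t. a capacity.

    `capacity` maps frozensets of attribute indices to weights in [0, 1],
    with capacity[frozenset()] = 0 and capacity[all attributes] = 1."""
    order = np.argsort(values)                   # ascending by value
    total, previous = 0.0, 0.0
    for rank, idx in enumerate(order):
        coalition = frozenset(int(i) for i in order[rank:])  # value >= current
        total += (values[idx] - previous) * capacity[coalition]
        previous = values[idx]
    return total

def choquet_distance(a, b, capacity):
    """Choquet distance: integral of per-attribute absolute differences."""
    return choquet_integral(np.abs(np.asarray(a) - np.asarray(b)), capacity)

# Toy capacity over two attributes that interact non-additively.
capacity = {frozenset(): 0.0, frozenset({0}): 0.3,
            frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
print(choquet_distance([1.0, 0.2], [0.0, 0.6], capacity))  # 0.58
```

When the capacity is additive, this reduces to a weighted Manhattan distance; the non-additive case is what lets the measure encode interplay between conditional attributes.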
Updated: 2024-03-18 14:53:48
Categories: cs.LG,cs.AI
Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data
In real-world scenarios, datasets collected from randomized experiments are often constrained by size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achieving high-quality policy learning. However, most existing offline reinforcement learning (RL) methods depend on two key assumptions--unconfoundedness and positivity--which frequently do not hold in observational data contexts. Recognizing these challenges, we propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL). We utilize the mediator variable based on front-door criterion to remove the confounding bias; additionally, we adopt the pessimistic principle to address the distributional shift between the action distributions induced by candidate policies, and the behavior policy that generates the observational data. Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm, by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
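For orientation, the classical front-door adjustment that motivates the mediator-based design can be written in a few lines of NumPy for discrete, fully tabulated variables. PESCAL itself learns a pessimistic lower bound of the mediator distribution rather than evaluating this formula directly, so read this purely as the underlying identity.

```python
import numpy as np

def front_door_effect(p_m_given_x, p_y_given_mx, p_x):
    """Front-door adjustment for discrete variables:

        P(y | do(x)) = sum_m P(m | x) * sum_x' P(y | m, x') * P(x')

    Shapes: p_m_given_x [X, M], p_y_given_mx [M, X, Y], p_x [X].
    Returns an [X, Y] array of interventional outcome distributions."""
    inner = np.einsum('mxy,x->my', p_y_given_mx, p_x)   # marginalize x'
    return np.einsum('xm,my->xy', p_m_given_x, inner)   # marginalize m
```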
Updated: 2024-03-18 14:51:19
Categories: stat.ML,cs.AI,cs.LG
SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator
Machine learning models are vulnerable to maliciously crafted Adversarial Examples (AEs). Training a machine learning model with AEs improves its robustness and stability against adversarial attacks. It is essential to develop models that produce high-quality AEs. Developing such models has been much slower in natural language processing (NLP) than in areas such as computer vision. This paper introduces a practical and efficient adversarial attack model called SSCAE: a Semantic, Syntactic, and Context-aware natural language AEs generator. SSCAE identifies important words and uses a masked language model to generate an early set of substitutions. Next, two well-known language models are employed to evaluate the initial set in terms of semantic and syntactic characteristics. We introduce (1) a dynamic threshold to capture more efficient perturbations and (2) a local greedy search to generate high-quality AEs. As a black-box method, SSCAE generates humanly imperceptible and context-aware AEs that preserve semantic consistency and the source language's syntactical and grammatical requirements. The effectiveness and superiority of the proposed SSCAE model are illustrated with fifteen comparative experiments and extensive sensitivity analysis for parameter optimization. SSCAE outperforms the existing models in all experiments while maintaining a higher semantic consistency with a lower query number and a comparable perturbation rate.
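The substitution step is easy to sketch with an off-the-shelf masked language model. The snippet below only produces the initial candidate set and omits the paper's semantic/syntactic scoring, dynamic threshold, and local greedy search; the choice of `bert-base-uncased` is illustrative, not the paper's model.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def candidate_substitutions(tokens, position, top_k=10):
    """Mask one important word and collect top-k MLM replacement proposals."""
    masked = list(tokens)
    masked[position] = fill_mask.tokenizer.mask_token
    proposals = fill_mask(" ".join(masked), top_k=top_k)
    return [p["token_str"].strip() for p in proposals]

# e.g. candidate_substitutions(["the", "movie", "was", "great"], 3)
```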
Updated: 2024-03-18 14:45:20
Categories: cs.CL,cs.CR,cs.LG
Problem space structural adversarial attacks for Network Intrusion Detection Systems based on Graph Neural Networks
Machine Learning (ML) algorithms have become increasingly popular for supporting Network Intrusion Detection Systems (NIDS). Nevertheless, extensive research has shown their vulnerability to adversarial attacks, which involve subtle perturbations to the inputs of the models aimed at compromising their performance. Recent proposals have effectively leveraged Graph Neural Networks (GNN) to produce predictions based also on the structural patterns exhibited by intrusions to enhance the detection robustness. However, the adoption of GNN-based NIDS introduces new types of risks. In this paper, we propose the first formalization of adversarial attacks specifically tailored for GNN in network intrusion detection. Moreover, we outline and model the problem space constraints that attackers need to consider to carry out feasible structural attacks in real-world scenarios. As a final contribution, we conduct an extensive experimental campaign in which we launch the proposed attacks against state-of-the-art GNN-based NIDS. Our findings demonstrate the increased robustness of the models against classical feature-based adversarial attacks, while highlighting their susceptibility to structure-based attacks.
Updated: 2024-03-18 14:40:33
Categories: cs.CR,cs.AI
Graphs Unveiled: Graph Neural Networks and Graph Generation
Graph Neural Networks (GNNs) are one of the hot topics in machine learning. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. This paper presents a survey providing a comprehensive overview of GNNs. We discuss the applications of graph neural networks across various domains. Finally, we present an advanced field in GNNs: graph generation.
Updated: 2024-03-18 14:37:27
Categories: cs.LG,cs.AI
Causal Intervention for Fairness in Multi-behavior Recommendation
Recommender systems usually learn user interests from various user behaviors, including clicks and post-click behaviors (e.g., like and favorite). However, these behaviors inevitably exhibit popularity bias, leading to some unfairness issues: 1) for items with similar quality, the more popular ones get more exposure; and 2) even worse, items of lower quality but higher popularity might receive more exposure than higher-quality ones. Existing work on mitigating popularity bias blindly eliminates the bias and usually ignores the effect of item quality. We argue that the relationships between different user behaviors (e.g., conversion rate) actually reflect the item quality. Therefore, to handle the unfairness issues, we propose to mitigate the popularity bias by considering multiple user behaviors. In this work, we examine causal relationships behind the interaction generation procedure in multi-behavior recommendation. Specifically, we find that: 1) item popularity is a confounder between the exposed items and users' post-click interactions, leading to the first unfairness; and 2) some hidden confounders (e.g., the reputation of item producers) affect both item popularity and quality, resulting in the second unfairness. To alleviate these confounding issues, we propose a causal framework to estimate the causal effect, which leverages backdoor adjustment to block the backdoor paths caused by the confounders. In the inference stage, we remove the negative effect of popularity and utilize the good effect of quality for recommendation. Experiments on two real-world datasets validate the effectiveness of our proposed framework, which enhances fairness without sacrificing recommendation accuracy.
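For reference, the backdoor adjustment used to block such confounding can be stated for discrete, tabulated variables as below; the paper's estimator is learned from interaction data rather than computed from known tables, so this shows only the underlying identity, with popularity in the role of the confounder z.

```python
import numpy as np

def backdoor_effect(p_y_given_xz, p_z):
    """Backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) * P(z).

    Shapes: p_y_given_xz [X, Z, Y], p_z [Z]; returns an [X, Y] array."""
    return np.einsum('xzy,z->xy', p_y_given_xz, p_z)
```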
Updated: 2024-03-18 14:36:35
Categories: cs.IR,cs.AI
Expressive Losses for Verified Robustness via Convex Combinations
In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
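The convex-combination construction is simple enough to state directly. In the sketch below, `beta` plays the role of the over-approximation coefficient, with `beta = 0` recovering pure adversarial training and `beta = 1` pure IBP training; how `loss_adv` and `loss_ibp` are computed is left abstract, and the exact loss the paper evaluates may differ in detail.

```python
import torch

def expressive_loss(loss_adv: torch.Tensor, loss_ibp: torch.Tensor,
                    beta: float) -> torch.Tensor:
    """Convex combination of an adversarial (lower-bound proxy) loss and an
    IBP over-approximation (upper-bound) loss; 0 <= beta <= 1 spans the
    trade-off between the two."""
    return (1.0 - beta) * loss_adv + beta * loss_ibp
```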
Updated: 2024-03-18 14:35:21
Categories: cs.LG,cs.CR,stat.ML
Bugs in Large Language Models Generated Code: An Empirical Study
Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-lasting dream in Software Engineering (SE), i.e., automatic code generation. Similar to human-written code, LLM-generated code is prone to bugs, and these bugs have not yet been thoroughly examined by the community. Given the increasing adoption of LLM-based code generation tools (e.g., GitHub Copilot) in SE activities, it is critical to understand the characteristics of bugs contained in code generated by LLMs. This paper examines a sample of 333 bugs collected from code generated using three leading LLMs (i.e., CodeGen, PanGu-Coder, and Codex) and identifies the following 10 distinctive bug patterns: Misinterpretations, Syntax Error, Silly Mistake, Prompt-biased code, Missing Corner Case, Wrong Input Type, Hallucinated Object, Wrong Attribute, Incomplete Generation, and Non-Prompted Consideration. The bug patterns are presented in the form of a taxonomy. The identified bug patterns are validated using an online survey with 34 LLM practitioners and researchers. The surveyed participants generally asserted the significance and prevalence of the bug patterns. Researchers and practitioners can leverage these findings to develop effective quality assurance techniques for LLM-generated code. This study sheds light on the distinctive characteristics of LLM-generated code.
Updated: 2024-03-18 14:34:13
Categories: cs.SE,cs.AI
Distributional Reinforcement Learning with Dual Expectile-Quantile Regression
Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and makes better use of environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, hybrid asymmetric $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our approach approximately learns the correct return distribution, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
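To fix notation, here is a minimal PyTorch sketch of the two asymmetric losses involved: the pinball (quantile, asymmetric $L_1$) loss and the expectile (asymmetric $L_2$) loss on residuals. The paper's joint learning scheme builds on these but is not reproduced here.

```python
import torch

def quantile_loss(residuals: torch.Tensor, tau: float) -> torch.Tensor:
    """Pinball / asymmetric L1 loss; residuals = target - prediction."""
    weight = torch.abs(tau - (residuals < 0).float())
    return torch.mean(weight * torch.abs(residuals))

def expectile_loss(residuals: torch.Tensor, tau: float) -> torch.Tensor:
    """Asymmetric L2 loss; tau = 0.5 recovers (half) the mean squared error."""
    weight = torch.abs(tau - (residuals < 0).float())
    return torch.mean(weight * residuals ** 2)
```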
Updated: 2024-03-18 14:27:21
Categories: cs.LG,cs.AI,I.2.8; G.3
A Theoretical and Practical Framework for Evaluating Uncertainty Calibration in Object Detection
The proliferation of Deep Neural Networks has resulted in machine learning systems becoming increasingly more present in various real-world applications. Consequently, there is a growing demand for highly reliable models in many domains, making the problem of uncertainty calibration pivotal when considering the future of deep learning. This is especially true when considering object detection systems, which are commonly present in safety-critical applications such as autonomous driving, robotics and medical diagnosis. For this reason, this work presents a novel theoretical and practical framework to evaluate object detection systems in the context of uncertainty calibration. This encompasses a new comprehensive formulation of this concept through distinct formal definitions, and also three novel evaluation metrics derived from such theoretical foundation. The robustness of the proposed uncertainty calibration metrics is shown through a series of representative experiments.
Updated: 2024-03-18 14:24:34
Categories: cs.CV,cs.AI
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
The revolutionary capabilities of large language models (LLMs) have paved the way for multimodal large language models (MLLMs) and fostered diverse applications across various specialized domains. In the remote sensing (RS) field, however, the diverse geographical landscapes and varied objects in RS imagery are not adequately considered in recent MLLM endeavors. To bridge this gap, we construct a large-scale RS image-text dataset, LHRS-Align, and an informative RS-specific instruction dataset, LHRS-Instruct, leveraging the extensive volunteered geographic information (VGI) and globally available RS images. Building on this foundation, we introduce LHRS-Bot, an MLLM tailored for RS image understanding through a novel multi-level vision-language alignment strategy and a curriculum learning method. Additionally, we introduce LHRS-Bench, a benchmark for thoroughly evaluating MLLMs' abilities in RS image understanding. Comprehensive experiments demonstrate that LHRS-Bot exhibits a profound understanding of RS images and the ability to perform nuanced reasoning within the RS domain.
Updated: 2024-03-18 14:16:29
Categories: cs.CV,cs.AI,cs.LG
Magnushammer: A Transformer-Based Approach to Premise Selection
This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without the engineering overhead. Our method, Magnushammer, outperforms Sledgehammer, the most advanced and widely used automation tool in interactive theorem proving. On the PISA and miniF2F benchmarks Magnushammer achieves $59.5\%$ (against $38.3\%$) and $34.0\%$ (against $20.9\%$) success rates, respectively. By combining Magnushammer with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from $57.0\%$ to $71.0\%$ on the PISA benchmark using $4$x fewer parameters. Moreover, we develop and open source a novel dataset for premise selection, containing textual representations of (proof state, relevant premise) pairs. To the best of our knowledge, this is the largest available premise selection dataset, and the first one for the Isabelle proof assistant.
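The retrieval objective can be sketched as a standard in-batch contrastive (InfoNCE-style) loss over paired proof-state and premise embeddings; the paper's actual loss, encoder, and negative sampling may differ, so read this as the generic recipe rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_retrieval_loss(state_emb: torch.Tensor,
                               premise_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """In-batch InfoNCE: row i of each tensor is a matched (proof state,
    relevant premise) pair; all other rows serve as negatives."""
    s = F.normalize(state_emb, dim=-1)
    p = F.normalize(premise_emb, dim=-1)
    logits = (s @ p.t()) / temperature
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)
```

At retrieval time, premises would then be ranked by cosine similarity to the encoded proof state.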
Updated: 2024-03-18 14:16:22
Categories: cs.LG,cs.AI,cs.LO
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce our framework, GAMA-Bench, including eight classical multi-agent games. We design a scoring scheme to assess a model's performance in these games quantitatively. Through GAMA-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfying robustness, its generalizability is relatively limited. However, its performance can be improved through approaches such as Chain-of-Thought. Additionally, we conduct evaluations across various LLMs and find that GPT-4 outperforms other models on GAMA-Bench, achieving a score of 72.5. Moreover, the increasingly higher scores across the three iterations of GPT-3.5 (0613, 1106, 0125) demonstrate marked advancements in the model's intelligence with each update. The code and experimental results are made publicly available via https://github.com/CUHK-ARISE/GAMABench.
Updated: 2024-03-18 14:04:47
Categories: cs.AI,cs.CL
Is It Really You Who Forgot the Password? When Account Recovery Meets Risk-Based Authentication
Risk-based authentication (RBA) is used in online services to protect user accounts from unauthorized takeover. RBA commonly uses contextual features that indicate a suspicious login attempt when the characteristic attributes of the login context deviate from known and thus expected values. Previous research on RBA and anomaly detection in authentication has mainly focused on the login process. However, recent attacks have revealed vulnerabilities in other parts of the authentication process, specifically in the account recovery function. Consequently, to ensure comprehensive authentication security, the use of anomaly detection in the context of account recovery must also be investigated. This paper presents the first study to investigate risk-based account recovery (RBAR) in the wild. We analyzed the adoption of RBAR by five prominent online services (that are known to use RBA). Our findings confirm the use of RBAR at Google, LinkedIn, and Amazon. Furthermore, we provide insights into the different RBAR mechanisms of these services and explore the impact of multi-factor authentication on them. Based on our findings, we create a first maturity model for RBAR challenges. The goal of our work is to help developers, administrators, and policy-makers gain an initial understanding of RBAR and to encourage further research in this direction.
Updated: 2024-03-18 13:55:24
Categories: cs.CR
Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus
The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process. We introduce a new approach using the Abstract and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates the comparison of model inference abilities with humans. Experimental results confirm that while large language models possess weak inference abilities, they still lag in terms of logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs, proposing development paths for achieving human-level reasoning.
Updated: 2024-03-18 13:50:50
Categories: cs.CL,cs.AI,cs.ET,cs.SC
Deep Medial Voxels: Learned Medial Axis Approximations for Anatomical Shape Modeling
Shape reconstruction from imaging volumes is a recurring need in medical image analysis. Common workflows start with a segmentation step, followed by careful post-processing and, finally, ad hoc meshing algorithms. As this sequence can be time-consuming, neural networks are trained to reconstruct shapes through template deformation. These networks deliver state-of-the-art results without manual intervention, but, so far, they have primarily been evaluated on anatomical shapes with little topological variety between individuals. In contrast, other works favor learning implicit shape models, which have multiple benefits for meshing and visualization. Our work follows this direction by introducing deep medial voxels, a semi-implicit representation that faithfully approximates the topological skeleton from imaging volumes and eventually leads to shape reconstruction via convolution surfaces. Our reconstruction technique shows potential for both visualization and computer simulations.
Updated: 2024-03-18 13:47:18
Categories: cs.CV,cs.AI
Construction of Hyper-Relational Knowledge Graphs Using Pre-Trained Large Language Models
Extracting hyper-relations is crucial for constructing comprehensive knowledge graphs, but there are limited supervised methods available for this task. To address this gap, we introduce a zero-shot prompt-based method using OpenAI's GPT-3.5 model for extracting hyper-relational knowledge from text. Comparing our model with a baseline, we achieved promising results, with a recall of 0.77. Although our precision is currently lower, a detailed analysis of the model outputs has uncovered potential pathways for future research in this area.
Updated: 2024-03-18 13:44:48
Categories: cs.CL,cs.AI
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experiment settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controlling ability and audio quality. Audio samples are available at http://prompt-singer.github.io .
Updated: 2024-03-18 13:39:05
Categories: cs.SD,cs.AI,cs.LG,eess.AS
ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts
Smart contracts are susceptible to various security issues, among which access control (AC) vulnerabilities are particularly critical. While existing research has proposed multiple detection tools, the automatic and appropriate repair of AC vulnerabilities in smart contracts remains a challenge. Unlike commonly supported vulnerability types by existing repair tools, such as reentrancy, which are usually fixed by template-based approaches, the main obstacle of AC lies in identifying the appropriate roles or permissions amid a long list of non-AC-related source code to generate proper patch code, a task that demands human-level intelligence. Leveraging recent advancements in large language models (LLMs), we employ the state-of-the-art GPT-4 model and enhance it with a novel approach called ACFIX. The key insight is that we can mine common AC practices for major categories of code functionality and use them to guide LLMs in fixing code with similar functionality. To this end, ACFIX involves both offline and online phases. First, during the offline phase, ACFIX mines a taxonomy of common Role-based Access Control (RBAC) practices from 344,251 on-chain contracts, categorizing 49 role-permission pairs from the top 1,000 pairs mined. Second, during the online phase, ACFIX tracks AC-related elements across the contract and uses this context information along with a Chain-of-Thought pipeline to guide LLMs in identifying the most appropriate role-permission pair for the subject contract and subsequently generating a suitable patch. This patch will then undergo a validity and effectiveness check. To evaluate ACFIX, we built the first benchmark dataset of 118 real-world AC vulnerabilities, and our evaluation revealed that ACFIX successfully repaired 94.92% of them. This represents a significant improvement compared to the baseline GPT-4, which achieved only 52.54%.
Updated: 2024-03-18 13:37:56
Domains: cs.SE,cs.CR
Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms
Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. An executable software is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on Resnet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performances compared to ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.
Updated: 2024-03-18 13:35:10
Domains: cs.SD,cs.CR,cs.LG,eess.AS
S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention
Motivated by the challenge of seamless cross-dataset transfer in EEG signal processing, this article presents an exploratory study on the use of Joint Embedding Predictive Architectures (JEPAs). In recent years, self-supervised learning has emerged as a promising approach for transfer learning in various domains. However, its application to EEG signals remains largely unexplored. In this article, we introduce Signal-JEPA for representing EEG recordings which includes a novel domain-specific spatial block masking strategy and three novel architectures for downstream classification. The study is conducted on a 54-subject dataset and the downstream performance of the models is evaluated on three different BCI paradigms: motor imagery, ERP and SSVEP. Our study provides preliminary evidence for the potential of JEPAs in EEG signal encoding. Notably, our results highlight the importance of spatial filtering for accurate downstream classification and reveal an influence of the length of the pre-training examples but not of the mask size on the downstream performance.
Updated: 2024-03-18 13:30:12
Domains: cs.LG,cs.AI
Privacy Protection in MRI Scans Using 3D Masked Autoencoders
MRI scans provide valuable medical information, however they also contain sensitive and personally identifiable information that needs to be protected. Whereas MRI metadata is easily sanitized, MRI image data is a privacy risk because it contains information to render highly-realistic 3D visualizations of a patient's head, enabling malicious actors to possibly identify the subject by cross-referencing a database. Data anonymization and de-identification is concerned with ensuring the privacy and confidentiality of individuals' personal information. Traditional MRI de-identification methods remove privacy-sensitive parts (e.g. eyes, nose etc.) from a given scan. This comes at the expense of introducing a domain shift that can throw off downstream analyses. In this work, we propose CP-MAE, a model based on masked autoencoders that de-identifies the face by remodeling it (e.g. changing the face) rather than by removing parts. CP-MAE outperforms all previous approaches in terms of downstream task performance as well as de-identification. With our method we are able to synthesize high-fidelity scans of resolution up to $256^3$ -- compared to $128^3$ with previous approaches -- which constitutes an eight-fold increase in the number of voxels.
Updated: 2024-03-18 13:27:01
Domains: cs.CV,cs.AI
Human-in-the-Loop AI for Cheating Ring Detection
Online exams have become popular in recent years due to their accessibility. However, some concerns have been raised about the security of the online exams, particularly in the context of professional cheating services aiding malicious test takers in passing exams, forming so-called "cheating rings". In this paper, we introduce a human-in-the-loop AI cheating ring detection system designed to detect and deter these cheating rings. We outline the underlying logic of this human-in-the-loop AI system, exploring its design principles tailored to achieve its objectives of detecting cheaters. Moreover, we illustrate the methodologies used to evaluate its performance and fairness, aiming to mitigate the unintended risks associated with the AI system. The design and development of the system adhere to Responsible AI (RAI) standards, ensuring that ethical considerations are integrated throughout the entire development process.
Updated: 2024-03-18 13:25:57
Domains: cs.CY,cs.AI,cs.HC,cs.LG
Invisible Backdoor Attack Through Singular Value Decomposition
With the widespread application of deep learning across various domains, concerns about its security have grown significantly. Among these, backdoor attacks pose a serious security threat to deep neural networks (DNNs). In recent years, backdoor attacks on neural networks have become increasingly sophisticated, aiming to compromise the security and trustworthiness of models by implanting hidden, unauthorized functionalities or triggers, leading to misleading predictions or behaviors. To make triggers imperceptible, various invisible backdoor attacks have been proposed. However, most of them only consider invisibility in the spatial domain, making it easy for recent defense methods to detect the generated toxic images. To address these challenges, this paper proposes an invisible backdoor attack called DEBA. DEBA leverages the mathematical properties of Singular Value Decomposition (SVD) to embed imperceptible backdoors into models during the training phase, thereby causing them to exhibit predefined malicious behavior under specific trigger conditions. Specifically, we first perform SVD on images, and then replace the minor features of trigger images with those of clean images, using them as triggers to ensure the effectiveness of the attack. As minor features are scattered throughout the entire image, the major features of clean images are preserved, making poisoned images visually indistinguishable from clean ones. Extensive experimental evaluations demonstrate that DEBA is highly effective, maintaining high perceptual quality and a high attack success rate for poisoned images. Furthermore, we assess the performance of DEBA under existing defense measures, showing that it is robust and capable of significantly evading and resisting the effects of these defense measures.
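A minimal sketch of the kind of SVD-domain blending the abstract describes, assuming a single-channel image as a 2-D array and a rank threshold r that separates "major" from "minor" singular components (both the threshold and the per-channel treatment are our assumptions, not the paper's exact recipe):

```python
import numpy as np

def svd_blend(clean: np.ndarray, trigger: np.ndarray, r: int) -> np.ndarray:
    """Keep the top-r (major) singular components of the clean image and
    replace the remaining (minor) components with those of the trigger image."""
    Uc, Sc, Vc = np.linalg.svd(clean.astype(np.float64), full_matrices=False)
    Ut, St, Vt = np.linalg.svd(trigger.astype(np.float64), full_matrices=False)
    major = (Uc[:, :r] * Sc[:r]) @ Vc[:r, :]   # dominant structure of the clean image
    minor = (Ut[:, r:] * St[r:]) @ Vt[r:, :]   # low-energy components carrying the trigger
    return np.clip(major + minor, 0, 255)

# Example: poison a random 64x64 "image" with a random trigger pattern.
rng = np.random.default_rng(0)
poisoned = svd_blend(rng.integers(0, 256, (64, 64)), rng.integers(0, 256, (64, 64)), r=8)
```

Because the minor components carry little energy, the blended image stays visually close to the clean one while still differing in the SVD domain.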
Updated: 2024-03-18 13:25:12
Domains: cs.CR,cs.AI,cs.LG
Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient Populations
The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and its implication in the clinical environment remains an underexplored field of research in medical imaging. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce undetectable underdiagnosis bias in DL models. Our results across multiple performance metrics and demographic groups like sex, age, and their intersectional subgroups show that adversarial bias attacks demonstrate high-selectivity for bias in the targeted group by degrading group model performance without impacting overall model performance. Furthermore, our results indicate that adversarial bias attacks result in biased DL models that propagate prediction bias even when evaluated with external datasets.
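As a toy illustration of a demographically targeted label-poisoning attack (the group names and flip rate below are ours, not the paper's), flipping positive labels to negative only within one subgroup induces underdiagnosis for that group while leaving the remaining data untouched:

```python
import numpy as np

def targeted_label_poison(y, group, target_group, flip_rate, seed=0):
    """Flip a fraction of positive labels to negative, but only for samples
    belonging to `target_group`. Toy sketch of a targeted underdiagnosis attack."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    candidates = np.where((group == target_group) & (y == 1))[0]
    n_flip = int(flip_rate * len(candidates))
    y[rng.choice(candidates, size=n_flip, replace=False)] = 0
    return y

y = np.array([1, 1, 0, 1, 1, 0, 1])
group = np.array(["F", "M", "F", "F", "M", "M", "F"])
y_poisoned = targeted_label_poison(y, group, target_group="F", flip_rate=0.5)
```

Because only a small slice of the training set changes, aggregate metrics can stay stable while the targeted subgroup's false-negative rate rises, which is what makes such attacks hard to detect.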
Updated: 2024-03-18 13:19:33
Domains: cs.LG,cs.AI,cs.CV
Why E.T. Can't Phone Home: A Global View on IP-based Geoblocking at VoWiFi
In current cellular network generations (4G, 5G) the IMS (IP Multimedia Subsystem) plays an integral role in terminating voice calls and short messages. Many operators use VoWiFi (Voice over Wi-Fi, also Wi-Fi calling) as an alternative network access technology to complement their cellular coverage in areas where no radio signal is available (e.g., rural territories or shielded buildings). In a mobile world where customers regularly traverse national borders, this can be used to avoid expensive international roaming fees while journeying overseas, since VoWiFi calls are usually invoiced at domestic rates. To not lose this revenue stream, some operators block access to the IMS for customers staying abroad. This work evaluates the current deployment status of VoWiFi among worldwide operators and analyzes existing geoblocking measures on the IP layer. We show that a substantial share (IPv4: 14.6%, IPv6: 65.2%) of operators implement geoblocking at the DNS- or VoWiFi protocol level, and highlight severe drawbacks in terms of emergency calling service availability.
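DNS-level geoblocking of the kind the paper measures can be probed by resolving an operator's ePDG discovery name from vantage points in different countries. The FQDN pattern below follows the 3GPP naming convention (3GPP TS 23.003); the MCC/MNC values in the example are placeholders:

```python
import socket

def epdg_fqdn(mcc: str, mnc: str) -> str:
    # 3GPP ePDG discovery name; the MNC is zero-padded to three digits.
    return f"epdg.epc.mnc{mnc.zfill(3)}.mcc{mcc}.pub.3gppnetwork.org"

def probe(mcc: str, mnc: str):
    """Return resolved ePDG addresses, or None if the name does not resolve
    (which, from a foreign resolver, may indicate DNS-level geoblocking)."""
    try:
        return socket.getaddrinfo(epdg_fqdn(mcc, mnc), 500)  # IKEv2 uses UDP/500
    except socket.gaierror:
        return None

print(probe("262", "01"))  # placeholder MCC/MNC; run from multiple vantage points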
Updated: 2024-03-18 13:12:56
Domains: cs.NI,cs.CR
Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration
Data storytelling is powerful for communicating data insights, but it requires diverse skills and considerable effort from human creators. Recent research has widely explored the potential for artificial intelligence (AI) to support and augment humans in data storytelling. However, there lacks a systematic review to understand data storytelling tools from the perspective of human-AI collaboration, which hinders researchers from reflecting on the existing collaborative tool designs that promote humans' and AI's advantages and mitigate their shortcomings. This paper investigated existing tools with a framework from two perspectives: the stages in the storytelling workflow where a tool serves, including analysis, planning, implementation, and communication, and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. Through our analysis, we recognize the common collaboration patterns in existing tools, summarize lessons learned from these patterns, and further illustrate research opportunities for human-AI collaboration in data storytelling.
Updated: 2024-03-18 13:00:17
Domains: cs.HC,cs.AI
Post-Quantum Cryptography: Securing Digital Communication in the Quantum Era
The advent of quantum computing poses a profound threat to traditional cryptographic systems, exposing vulnerabilities that compromise the security of digital communication channels reliant on RSA, ECC, and similar classical encryption methods. Quantum algorithms, notably Shor's algorithm, exploit the inherent computational power of quantum computers to efficiently solve mathematical problems underlying these cryptographic schemes. In response, post-quantum cryptography (PQC) emerged as a critical field aimed at developing resilient cryptographic algorithms impervious to quantum attacks. This paper delineates the vulnerabilities of classical cryptographic systems to quantum attacks, elucidates the principles of quantum computing, and introduces various PQC algorithms such as lattice-based cryptography, code-based cryptography, hash-based cryptography, and multivariate polynomial cryptography. Highlighting the importance of PQC in securing digital communication amidst quantum computing advancements, this research underscores its pivotal role in safeguarding data integrity, confidentiality, and authenticity in the face of emerging quantum threats.
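To make the hash-based branch concrete, here is a textbook Lamport one-time signature in Python using SHA-256. This is the classical scheme underlying hash-based PQC constructions, shown for illustration only, not a production implementation:

```python
import hashlib, os

H = lambda b: hashlib.sha256(b).digest()

def keygen(n_bits=256):
    sk = [(os.urandom(32), os.urandom(32)) for _ in range(n_bits)]  # two secrets per message bit
    pk = [(H(a), H(b)) for a, b in sk]                              # publish their hashes
    return sk, pk

def bits(msg, n_bits=256):
    d = int.from_bytes(H(msg), "big")
    return [(d >> i) & 1 for i in range(n_bits)]

def sign(sk, msg):
    return [pair[b] for pair, b in zip(sk, bits(msg))]  # reveal one secret per bit

def verify(pk, msg, sig):
    return all(H(s) == pair[b] for s, pair, b in zip(sig, pk, bits(msg)))

sk, pk = keygen()
sig = sign(sk, b"hello post-quantum world")
assert verify(pk, b"hello post-quantum world", sig)  # a Lamport key must sign only once
```

Security rests only on the one-wayness of the hash function, which Shor's algorithm does not break; schemes like SPHINCS+ build many-time signatures from this one-time primitive.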
Updated: 2024-03-18 12:51:56
Domains: cs.CR
Heuristic Reasoning in AI: Instrumental Use and Mimetic Absorption
Deviating from conventional perspectives that frame artificial intelligence (AI) systems solely as logic emulators, we propose a novel program of heuristic reasoning. We distinguish between the 'instrumental' use of heuristics to match resources with objectives, and 'mimetic absorption,' whereby heuristics manifest randomly and universally. Through a series of innovative experiments, including variations of the classic Linda problem and a novel application of the Beauty Contest game, we uncover trade-offs between maximizing accuracy and reducing effort that shape the conditions under which AIs transition between exhaustive logical processing and the use of cognitive shortcuts (heuristics). We provide evidence that AIs manifest an adaptive balancing of precision and efficiency, consistent with principles of resource-rational human cognition as explicated in classical theories of bounded rationality and dual-process theory. Our findings reveal a nuanced picture of AI cognition, where trade-offs between resources and objectives lead to the emulation of biological systems, especially human cognition, despite AIs being designed without a sense of self and lacking introspective capabilities.
Updated: 2024-03-18 12:45:01
Domains: cs.AI
Learning General Policies for Classical Planning Domains: Getting Beyond C$_2$
GNN-based approaches for learning general policies across planning domains are limited by the expressive power of $C_2$, namely, first-order logic with two variables and counting. This limitation can be overcome by transitioning to $k$-GNNs, for $k=3$, wherein object embeddings are substituted with triplet embeddings. Yet, while $3$-GNNs have the expressive power of $C_3$, unlike $1$- and $2$-GNNs that are confined to $C_2$, they require quartic time for message exchange and cubic space for embeddings, rendering them impractical. In this work, we introduce a parameterized version of relational GNNs. When $t$ is infinity, R-GNN[$t$] approximates $3$-GNNs using only quadratic space for embeddings. For lower values of $t$, such as $t=1$ and $t=2$, R-GNN[$t$] achieves a weaker approximation by exchanging fewer messages, yet interestingly, often yields the $C_3$ features required in several planning domains. Furthermore, the new R-GNN[$t$] architecture is the original R-GNN architecture with a suitable transformation applied to the input states only. Experimental results illustrate the clear performance gains of R-GNN[$1$] and R-GNN[$2$] over plain R-GNNs, and also over edge transformers that also approximate $3$-GNNs.
Updated: 2024-03-18 12:42:53
Domains: cs.AI,cs.LG
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
Recent studies show that vision models pre-trained in generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Due to the inherent difference in training objectives and data between the tasks of model pre-training and VPR, how to bridge the gap and fully unleash the capability of pre-trained models for VPR is still a key issue to address. To this end, we propose a novel method to realize seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method to achieve both global and local adaptation efficiently, in which only lightweight adapters are tuned without adjusting the pre-trained model. Besides, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures proper dense local features are produced for local matching and avoids time-consuming spatial verification in re-ranking. Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time, and uses about only 3% retrieval runtime of the two-stage VPR methods with RANSAC-based spatial verification. It ranks 1st on the MSLS challenge leaderboard (at the time of submission). The code is released at https://github.com/Lu-Feng/SelaVPR.
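The mutual nearest neighbor criterion that the local feature loss builds on can be sketched in a few lines: a pair (i, j) is kept only if j is i's nearest neighbor and i is j's nearest neighbor. This is a generic sketch of the matching rule, not the paper's loss itself:

```python
import numpy as np

def mutual_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray):
    """Return index pairs (i, j) where a[i] and b[j] are each other's
    nearest neighbors under Euclidean distance."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = d.argmin(axis=1)   # for each a-feature, its closest b-feature
    nn_ba = d.argmin(axis=0)   # for each b-feature, its closest a-feature
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

rng = np.random.default_rng(0)
print(mutual_nn_matches(rng.normal(size=(5, 128)), rng.normal(size=(6, 128))))
```

Counting such mutual matches gives a cheap, differentiable-friendly consistency signal, which is why it can replace RANSAC-style spatial verification during re-ranking.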
Updated: 2024-03-18 12:28:31
Domains: cs.CV,cs.AI
Use of recommendation models to provide support to dyslexic students
Dyslexia is the most widespread specific learning disorder and significantly impairs different cognitive domains. This, in turn, negatively affects dyslexic students during their learning path. Therefore, specific support must be given to these students. In addition, such a support must be highly personalized, since the problems generated by the disorder can be very different from one to another. In this work, we explored the possibility of using AI to suggest the most suitable supporting tools for dyslexic students, so as to provide a targeted help that can be of real utility. To do this, we relied on recommendation algorithms, which are a branch of machine learning, that aim to detect personal preferences and provide the most suitable suggestions. We hence implemented and trained three collaborative-filtering recommendation models, namely an item-based, a user-based and a weighted-hybrid model, and studied their performance on a large database of 1237 students' information, collected with a self-evaluating questionnaire regarding all the most used supporting strategies and digital tools. Each recommendation model was tested with three different similarity metrics, namely Pearson correlation, Euclidean distance and Cosine similarity. The obtained results showed that a recommendation system is highly effective in suggesting the optimal help tools/strategies for everyone. This demonstrates that the proposed approach is successful and can be used as a new and effective methodology to support students with dyslexia.
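A minimal item-based collaborative filter with cosine similarity, one of the three model families the study compares; the rating matrix below is an invented toy example standing in for the questionnaire data:

```python
import numpy as np

def item_based_scores(R: np.ndarray, user: int) -> np.ndarray:
    """Score every tool for `user` from an (n_students x n_tools) rating matrix
    using cosine similarity between tool columns. Zero entries mean 'unrated'."""
    norms = np.linalg.norm(R, axis=0) + 1e-9
    sim = (R.T @ R) / np.outer(norms, norms)   # tool-tool cosine similarity
    scores = sim @ R[user]                     # weighted sum of the user's ratings
    scores[R[user] > 0] = -np.inf              # do not re-recommend rated tools
    return scores

R = np.array([[5, 0, 3, 0],
              [4, 2, 0, 1],
              [0, 5, 4, 0]], dtype=float)      # toy self-evaluation ratings
print(int(np.argmax(item_based_scores(R, user=0))))  # best unrated tool for student 0
```

Swapping the cosine similarity for Pearson correlation or (negative) Euclidean distance reproduces the other two metric variants the study tests.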
Updated: 2024-03-18 12:12:38
Domains: cs.CY,cs.AI
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding the visual world. Conventional LMMs process images in fixed sizes and limited resolutions, while recent explorations in this direction are limited in adaptivity, efficiency, and even correctness. In this work, we first take GPT-4V and LLaVA-1.5 as representative examples and expose systematic flaws rooted in their visual encoding strategy. To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution. LLaVA-UHD includes three key components: (1) An image modularization strategy that divides native-resolution images into smaller variable-sized slices for efficient and extensible encoding, (2) a compression module that further condenses image tokens from visual encoders, and (3) a spatial schema to organize slice tokens for LLMs. Comprehensive experiments show that LLaVA-UHD outperforms established LMMs trained with 2-3 orders of magnitude more data on 9 benchmarks. Notably, our model built on LLaVA-1.5 336x336 supports 6 times larger (i.e., 672x1088) resolution images using only 94% inference computation, and achieves 6.4 accuracy improvement on TextVQA. Moreover, the model can be efficiently trained in academic settings, within 23 hours on 8 A100 GPUs (vs. 26 hours of LLaVA-1.5). We make the data and code publicly available at https://github.com/thunlp/LLaVA-UHD.
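The modularization step can be pictured as choosing a slice grid whose aspect ratio tracks the native image. The sketch below picks, among grids with at most `max_slices` cells, the one closest to the image's aspect ratio; this selection rule is our simplification, not the paper's exact criterion:

```python
def choose_grid(width: int, height: int, max_slices: int = 6):
    """Pick a (cols, rows) slice grid: among grids with cols*rows <= max_slices,
    minimize the mismatch between grid and image aspect ratios."""
    target = width / height
    candidates = [(c, r) for c in range(1, max_slices + 1)
                  for r in range(1, max_slices + 1) if c * r <= max_slices]
    return min(candidates, key=lambda cr: abs(cr[0] / cr[1] - target))

def slice_boxes(width, height, cols, rows):
    """Yield pixel boxes (left, top, right, bottom) for each variable-sized slice."""
    xs = [round(i * width / cols) for i in range(cols + 1)]
    ys = [round(j * height / rows) for j in range(rows + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(rows) for i in range(cols)]

cols, rows = choose_grid(672, 1088)  # the 672x1088 case from the abstract
print(cols, rows, slice_boxes(672, 1088, cols, rows)[:2])
```

Each slice is then encoded independently, which is what lets the model scale to arbitrary aspect ratios without resampling the image to a fixed square.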
Updated: 2024-03-18 12:04:11
Domains: cs.CV,cs.AI
Transferring Foundation Models for Generalizable Robotic Manipulation
Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robotic data which is costly and time-consuming, such as the RT-1 dataset. However, due to insufficient diversity of data, these approaches typically suffer from limiting their capability in open-domain scenarios with new objects and diverse environments. In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models, to condition robot manipulation tasks. By integrating the mask modality, which incorporates semantic, geometric, and temporal correlation priors derived from vision foundation models, into the end-to-end policy model, our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning, including new object instances, semantic categories, and unseen backgrounds. We first introduce a series of foundation models to ground natural language demands across multiple tasks. Secondly, we develop a two-stream 2D policy model based on imitation learning, which processes raw images and object masks to predict robot actions with a local-global perception manner. Extensive realworld experiments conducted on a Franka Emika robot arm demonstrate the effectiveness of our proposed paradigm and policy architecture. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
Updated: 2024-03-18 11:57:09
Domains: cs.RO,cs.AI
HDLdebugger: Streamlining HDL debugging with Large Language Models
In the domain of chip design, Hardware Description Languages (HDLs) play a pivotal role. However, due to the complex syntax of HDLs and the limited availability of online resources, debugging HDL codes remains a difficult and time-intensive task, even for seasoned engineers. Consequently, there is a pressing need to develop automated HDL code debugging models, which can alleviate the burden on hardware engineers. Despite the strong capabilities of Large Language Models (LLMs) in generating, completing, and debugging software code, their utilization in the specialized field of HDL debugging has been limited and, to date, has not yielded satisfactory results. In this paper, we propose an LLM-assisted HDL debugging framework, namely HDLdebugger, which consists of HDL debugging data generation via a reverse engineering approach, a search engine for retrieval-augmented generation, and a retrieval-augmented LLM fine-tuning approach. Through the integration of these components, HDLdebugger can automate and streamline HDL debugging for chip design. Our comprehensive experiments, conducted on an HDL code dataset sourced from Huawei, reveal that HDLdebugger outperforms 13 cutting-edge LLM baselines, displaying exceptional effectiveness in HDL code debugging.
Updated: 2024-03-18 11:19:37
Domains: cs.AR,cs.AI,cs.CE,cs.LG,cs.SE
Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models
The process of meaning composition, wherein smaller units like morphemes or words combine to form the meaning of phrases and sentences, is essential for human sentence comprehension. Despite extensive neurolinguistic research into the brain regions involved in meaning composition, a computational metric to quantify the extent of composition is still lacking. Drawing on the key-value memory interpretation of transformer feed-forward network blocks, we introduce the Composition Score, a novel model-based metric designed to quantify the degree of meaning composition during sentence comprehension. Experimental findings show that this metric correlates with brain clusters associated with word frequency, structural processing, and general sensitivity to words, suggesting the multifaceted nature of meaning composition during human sentence comprehension.
Updated: 2024-03-18 11:17:48
Domains: cs.CL,cs.AI
Semantic Data Representation for Explainable Windows Malware Detection Models
Ontologies are a standard tool for creating semantic schemata in many knowledge intensive domains of human interest. They are becoming increasingly important also in the areas that have been until very recently dominated by subsymbolic knowledge representation and machine-learning (ML) based data processing. One such area is information security, and specifically, malware detection. We thus propose PE Malware Ontology that offers a reusable semantic schema for Portable Executable (PE - the Windows binary format) malware files. This ontology is inspired by the structure of the EMBER dataset, which focuses on the static malware analysis of PE files. With this proposal, we hope to provide a unified semantic representation for the existing and future PE-malware datasets and facilitate the application of symbolic, neuro-symbolic, or otherwise explainable approaches in the PE-malware-detection domain, which may produce interpretable results described by the terms defined in our ontology. In addition, we also publish semantically treated EMBER data, including fractional datasets, to support the reproducibility of experiments on EMBER. We supplement our work with a preliminary case study, conducted using concept learning, to show the general feasibility of our approach. While we were not able to match the precision of the state-of-the-art ML tools, the learned malware discriminators were interesting and highly interpretable.
Updated: 2024-03-18 11:17:27
Domains: cs.CR
Safety Analysis of Autonomous Railway Systems: An Introduction to the SACRED Methodology
As the railway industry increasingly seeks to introduce autonomy and machine learning (ML), several questions arise. How can safety be assured for such systems and technologies? What is the applicability of current safety standards within this new technological landscape? What are the key metrics to classify a system as safe? Currently, safety analysis for the railway reflects the failure modes of existing technology; in contrast, the primary concern of analysis of automation is typically average performance. Such purely statistical approaches to measuring ML performance are limited, as they may overlook classes of situations that may occur rarely but in which the function performs consistently poorly. To combat these difficulties we introduce SACRED, a safety methodology for producing an initial safety case and determining important safety metrics for autonomous systems. The development of SACRED is motivated by the proposed GoA-4 light-rail system in Berlin.
Updated: 2024-03-18 11:12:19
Domains: cs.SE,cs.AI
Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images
Recently, Multimodal LLMs (MLLMs) have shown a great ability to understand images. However, like traditional vision models, they are still vulnerable to adversarial images. Meanwhile, Chain-of-Thought (CoT) reasoning has been widely explored on MLLMs, which not only improves model's performance, but also enhances model's explainability by giving intermediate reasoning steps. Nevertheless, there is still a lack of study regarding MLLMs' adversarial robustness with CoT and an understanding of what the rationale looks like when MLLMs infer wrong answers with adversarial images. Our research evaluates the adversarial robustness of MLLMs when employing CoT reasoning, finding that CoT marginally improves adversarial robustness against existing attack methods. Moreover, we introduce a novel stop-reasoning attack technique that effectively bypasses the CoT-induced robustness enhancements. Finally, we demonstrate the alterations in CoT reasoning when MLLMs confront adversarial images, shedding light on their reasoning process under adversarial attacks.
Updated: 2024-03-18 10:55:36
Domains: cs.CV,cs.AI,cs.CR,cs.LG
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark but also required up to four times more energy for pre-training.
Updated: 2024-03-18 10:54:15
Domains: cs.CL,cs.AI,cs.SD,eess.AS
Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists
Differentially-private (DP) mechanisms can be embedded into the design of a machine learning algorithm to protect the resulting model against privacy leakage, although this often comes with a significant loss of accuracy. In this paper, we aim at improving this trade-off for rule list models by establishing the smooth sensitivity of the Gini impurity and leveraging it to propose a DP greedy rule list algorithm. In particular, our theoretical analysis and experimental results demonstrate that the DP rule list models integrating smooth sensitivity have higher accuracy than those using other DP frameworks based on global sensitivity.
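For reference, the two quantities combined here are standard. Writing $p_k$ for the class proportions at a node, $LS_f(y)$ for the local sensitivity of a function $f$ at dataset $y$, and $d$ for the Hamming distance between datasets, the Gini impurity and the $\beta$-smooth sensitivity (Nissim, Raskhodnikova and Smith, 2007) are

$$\mathrm{Gini}(p) = 1 - \sum_k p_k^2, \qquad S_f^{\beta}(x) = \max_{y} LS_f(y)\, e^{-\beta\, d(x,y)} = \max_{k \ge 0} e^{-\beta k} \max_{y:\, d(x,y) \le k} LS_f(y).$$

Noise calibrated to $S_f^{\beta}$ instead of the worst-case global sensitivity can be much smaller on typical inputs, which is the source of the accuracy gain claimed above.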
Updated: 2024-03-18 10:44:22
Domains: cs.LG,cs.AI,cs.CR
Guiding the generation of counterfactual explanations through temporal background knowledge for Predictive Process Monitoring
Counterfactual explanations suggest what should be different in the input instance to change the outcome of an AI system. When dealing with counterfactual explanations in the field of Predictive Process Monitoring, however, control flow relationships among events have to be carefully considered. A counterfactual, indeed, should not violate control flow relationships among activities (temporal background knowledege). Within the field of Explainability in Predictive Process Monitoring, there have been a series of works regarding counterfactual explanations for outcome-based predictions. However, none of them consider the inclusion of temporal background knowledge when generating these counterfactuals. In this work, we adapt state-of-the-art techniques for counterfactual generation in the domain of XAI that are based on genetic algorithms to consider a series of temporal constraints at runtime. We assume that this temporal background knowledge is given, and we adapt the fitness function, as well as the crossover and mutation operators, to maintain the satisfaction of the constraints. The proposed methods are evaluated with respect to state-of-the-art genetic algorithms for counterfactual generation and the results are presented. We showcase that the inclusion of temporal background knowledge allows the generation of counterfactuals more conformant to the temporal background knowledge, without however losing in terms of the counterfactual traditional quality metrics.
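A one-line way to see the adaptation: the genetic search keeps the usual counterfactual objectives (proximity to the query trace, flipping the predicted outcome) and adds a penalty for traces that violate the declared temporal constraints. A schematic fitness function, with `dist`, `predict` and the constraint checkers left abstract since they depend on the chosen encoding and temporal-logic formalism (all names here are illustrative):

```python
def fitness(trace, query, constraints, dist, predict, penalty=1e6):
    """Schematic GA fitness for constrained counterfactuals: prefer traces that
    flip the prediction, stay close to the query, and satisfy every temporal
    constraint. `dist`, `predict` and each item of `constraints` are callables."""
    violations = sum(0 if c(trace) else 1 for c in constraints)
    flipped = predict(trace) != predict(query)
    return dist(trace, query) + penalty * violations + (0 if flipped else penalty)
```

Lower is better here; a large penalty effectively turns constraint satisfaction into a hard requirement while keeping the search landscape smooth enough for crossover and mutation to work.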
Updated: 2024-03-18 10:34:40
标题: 通过时间背景知识引导生成对预测流程监控的反事实解释
摘要: 反事实解释表明改变输入实例以改变AI系统结果应该有什么不同。然而,在预测过程监控领域处理反事实解释时,必须仔细考虑事件之间的控制流关系。事实上,反事实不应违反活动之间的控制流关系(时间背景知识)。在预测过程监控可解释性领域中,已经有一系列关于基于结果的预测的反事实解释的研究。然而,其中没有一个在生成这些反事实时考虑包含时间背景知识。在本研究中,我们改编了基于遗传算法的XAI领域中用于反事实生成的最先进技术,以考虑一系列运行时的时间约束。我们假设这些时间背景知识是已知的,并且我们调整适应度函数以及交叉和变异运算符,以保持满足约束的情况。提出的方法与反事实生成的最先进遗传算法进行了评估,并呈现了结果。我们展示了包含时间背景知识可以更符合时间背景知识的反事实生成,同时不会在反事实传统质量指标方面失去。
更新时间: 2024-03-18 10:34:40
Domains: cs.AI,cs.LG
Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and be unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. Finally, we apply our algorithms to match the performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, and also present theory to clarify the links between these methods and classical CCA, laying the groundwork for future insights.
Updated: 2024-03-18 10:32:59
Domains: cs.LG,cs.AI,stat.ML
QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation
The study of music-generated dance is a novel and challenging image generation task. It aims to input a piece of music and seed motions, then generate natural dance movements for the subsequent music. Transformer-based methods face challenges in time series prediction tasks related to human movements and music due to their struggle in capturing the nonlinear relationship and temporal aspects. This can lead to issues like joint deformation, role deviation, floating, and inconsistencies in dance movements generated in response to the music. In this paper, we propose a Quaternion-Enhanced Attention Network (QEAN) for visual dance synthesis from a quaternion perspective, which consists of a Spin Position Embedding (SPE) module and a Quaternion Rotary Attention (QRA) module. First, SPE embeds position information into self-attention in a rotational manner, leading to better learning of features of movement sequences and audio sequences, and improved understanding of the connection between music and dance. Second, QRA represents and fuses 3D motion features and audio features in the form of a series of quaternions, enabling the model to better learn the temporal coordination of music and dance under the complex temporal cycle conditions of dance generation. Finally, we conducted experiments on the dataset AIST++, and the results show that our approach achieves better and more robust performance in generating accurate, high-quality dance movements. Our source code and dataset can be available from https://github.com/MarasyZZ/QEAN and https://google.github.io/aistplusplus_dataset respectively.
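The "rotational" position injection is in the spirit of rotary position embeddings (RoPE): feature pairs of queries and keys are rotated by position-dependent angles so that relative position shows up in their inner products. A minimal sketch of that general mechanism, not the paper's exact SPE module:

```python
import numpy as np

def rotary_embed(x: np.ndarray) -> np.ndarray:
    """Rotate consecutive feature pairs of x (seq_len, dim) by position-dependent
    angles, encoding relative position in query/key inner products."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = 1.0 / (10000 ** (np.arange(half) / half))  # per-pair frequencies
    angles = np.outer(np.arange(seq_len), freqs)       # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = rotary_embed(np.random.default_rng(0).normal(size=(16, 64)))
```

Quaternion attention generalizes the same idea from 2-D rotations to rotations in four-dimensional quaternion space.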
Updated: 2024-03-18 09:58:43
Domains: cs.GR,cs.AI,cs.CV,cs.MM,cs.SD,eess.AS
Assessing the potential of AI-assisted pragmatic annotation: The case of apologies
Certain forms of linguistic annotation, like part of speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores automating pragma-discursive corpus annotation using large language models (LLMs). We compare ChatGPT, the Bing chatbot, and a human coder in annotating apology components in English based on the local grammar framework. We find that the Bing chatbot outperformed ChatGPT, with accuracy approaching that of a human coder. These results suggest that AI can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient and scalable. Keywords: linguistic annotation, function-to-form approaches, large language models, local grammar analysis, Bing chatbot, ChatGPT
Updated: 2024-03-18 09:56:09
Domains: cs.CL,cs.AI
Fair Division of Multi-layered Cakes
We consider multi-layered cake cutting in order to fairly allocate numerous divisible resources (layers of cake) among a group of agents under two constraints: contiguity and feasibility. We first introduce a new computational model in a multi-layered cake named ``a pair of knives''. Then, we show the existence of an exact multi-allocation for two agents and two layers using the new computational model. We demonstrate the computation procedure of a feasible and contiguous proportional multi-allocation over a three-layered cake for more than three agents. Finally, we develop a technique for computing proportional allocations for any number $n \geq 2^a \cdot 3$ of agents and $2^a \cdot 3$ layers, where $a$ is any positive integer.
Updated: 2024-03-18 09:51:27
Domains: cs.AI,cs.GT
Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving
Large Language Models (LLMs) have garnered significant attention for their ability to understand text and images, generate human-like text, and perform complex reasoning tasks. However, their ability to generalize this advanced reasoning with a combination of natural language text for decision-making in dynamic situations requires further exploration. In this study, we investigate how well LLMs can adapt and apply a combination of arithmetic and common-sense reasoning, particularly in autonomous driving scenarios. We hypothesize that LLMs hybrid reasoning abilities can improve autonomous driving by enabling them to analyze detected object and sensor data, understand driving regulations and physical laws, and offer additional context. This addresses complex scenarios, like decisions in low visibility (due to weather conditions), where traditional methods might fall short. We evaluated Large Language Models (LLMs) based on accuracy by comparing their answers with human-generated ground truth inside CARLA. The results showed that when a combination of images (detected objects) and sensor data is fed into the LLM, it can offer precise information for brake and throttle control in autonomous vehicles across various weather conditions. This formulation and answers can assist in decision-making for auto-pilot systems.
Updated: 2024-03-18 09:50:00
Domains: cs.CV,cs.AI
Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning
Toddlers evolve from free exploration with sparse feedback to exploiting prior experiences for goal-directed learning with denser rewards. Drawing inspiration from this Toddler-Inspired Reward Transition, we set out to explore the implications of varying reward transitions when incorporated into Reinforcement Learning (RL) tasks. Central to our inquiry is the transition from sparse to potential-based dense rewards, which share optimal strategies regardless of reward changes. Through various experiments, including those in egocentric navigation and robotic arm manipulation tasks, we found that proper reward transitions significantly influence sample efficiency and success rates. Of particular note is the efficacy of the toddler-inspired Sparse-to-Dense (S2D) transition. Beyond these performance metrics, using Cross-Density Visualizer technique, we observed that transitions, especially the S2D, smooth the policy loss landscape, promoting wide minima that enhance generalization in RL models.
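A sparse-to-dense (S2D) transition can be written directly as a reward wrapper: before the switch step the agent sees only the sparse goal reward, after it the reward is augmented with potential-based shaping, which provably leaves optimal policies unchanged (Ng et al., 1999). The potential below, negative distance to the goal, and the switch step are our illustrative choices:

```python
import numpy as np

def s2d_reward(sparse_r, s, s_next, step, switch_step, phi, gamma=0.99):
    """Sparse reward until `switch_step`, then sparse + potential-based shaping:
    r' = r + gamma * phi(s') - phi(s), which preserves optimal policies."""
    if step < switch_step:
        return sparse_r
    return sparse_r + gamma * phi(s_next) - phi(s)

# Example potential: negative Euclidean distance to a goal position (illustrative).
goal = np.array([1.0, 1.0])
phi = lambda s: -np.linalg.norm(np.asarray(s) - goal)
print(s2d_reward(0.0, [0.0, 0.0], [0.5, 0.5], step=200, switch_step=100, phi=phi))
```

Because the shaping term telescopes along trajectories, the dense phase changes the learning dynamics (and, per the abstract, the loss landscape) without changing what the agent is ultimately optimizing.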
Updated: 2024-03-18 09:43:20
Domains: cs.LG,cs.AI
Deep Homography Estimation for Visual Place Recognition
Visual place recognition (VPR) is a fundamental task for many applications such as robot localization and augmented reality. Recently, the hierarchical VPR methods have received considerable attention due to the trade-off between accuracy and efficiency. They usually first use global features to retrieve the candidate images, then verify the spatial consistency of matched local features for re-ranking. However, the latter typically relies on the RANSAC algorithm for fitting homography, which is time-consuming and non-differentiable. This makes existing methods compromise to train the network only in global feature extraction. Here, we propose a transformer-based deep homography estimation (DHE) network that takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification. Moreover, we design a re-projection error of inliers loss to train the DHE network without additional homography labels, which can also be jointly trained with the backbone network to help it extract the features that are more suitable for local matching. Extensive experiments on benchmark datasets show that our method can outperform several state-of-the-art methods. And it is more than one order of magnitude faster than the mainstream hierarchical VPR methods using RANSAC. The code is released at https://github.com/Lu-Feng/DHE-VPR.
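For contrast, the conventional RANSAC-based verification stage that DHE replaces looks like this with OpenCV; `cv2.findHomography` returns the 3x3 homography and an inlier mask whose count serves as the re-ranking score (the keypoint arrays below are placeholders for real matched local features):

```python
import cv2
import numpy as np

def ransac_inlier_count(pts_query: np.ndarray, pts_candidate: np.ndarray) -> int:
    """Classical geometric verification: fit a homography with RANSAC and
    score the candidate image by its number of inliers."""
    H, mask = cv2.findHomography(pts_query, pts_candidate, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())

# Placeholder matched keypoints (N x 2, float32); in practice these come from
# local feature matching between the query and a retrieved candidate.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 640, size=(50, 2)).astype(np.float32)
print(ransac_inlier_count(pts, pts + 1.0))
```

The random-sampling loop inside RANSAC is exactly the non-differentiable, time-consuming step the paper's learnable homography network is designed to avoid.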
Updated: 2024-03-18 09:33:47
Domains: cs.CV,cs.AI
Optimal Transport for Domain Adaptation through Gaussian Mixture Models
In this paper we explore domain adaptation through optimal transport. We propose a novel approach, where we model the data distributions through Gaussian mixture models. This strategy allows us to solve continuous optimal transport through an equivalent discrete problem. The optimal transport solution gives us a matching between source and target domain mixture components. From this matching, we can map data points between domains, or transfer the labels from the source domain components towards the target domain. We experiment with 2 domain adaptation benchmarks in fault diagnosis, showing that our methods have state-of-the-art performance.
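The reduction being exploited: once each domain is modeled as a Gaussian mixture, continuous OT collapses to a discrete problem between mixture components, with the 2-Wasserstein distance between Gaussians as a natural ground cost. A sketch using SciPy and the POT library, with toy means and covariances of our own invention:

```python
import numpy as np
import ot                                # POT: Python Optimal Transport
from scipy.linalg import sqrtm

def w2_gaussians(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between two Gaussians (Bures metric)."""
    rS2 = sqrtm(S2)
    cross = sqrtm(rS2 @ S1 @ rS2)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * np.real(cross)))

# Toy mixtures: two source components, two target components (invented values).
ms = [np.zeros(2), np.ones(2)];      Ss = [np.eye(2), 2 * np.eye(2)]
mt = [np.ones(2) * 3, np.ones(2)];   St = [np.eye(2), np.eye(2)]
C = np.array([[w2_gaussians(ms[i], Ss[i], mt[j], St[j]) for j in range(2)]
              for i in range(2)])
plan = ot.emd(np.array([0.5, 0.5]), np.array([0.5, 0.5]), C)  # component matching
print(plan)
```

The resulting plan matches source components to target components; data points (or labels) can then be transported along the matched pairs.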
Updated: 2024-03-18 09:32:33
Domains: cs.LG,cs.AI,stat.ML
Graph Neural Modeling of Network Flows
Network flow problems, which involve distributing traffic such that the underlying infrastructure is used effectively, are ubiquitous in transportation and logistics. Among them, the general Multi-Commodity Network Flow (MCNF) problem concerns the distribution of multiple flows of different sizes between several sources and sinks, while achieving effective utilization of the links. Due to the appeal of data-driven optimization, these problems have increasingly been approached using graph learning methods. In this paper, we propose a novel graph learning architecture for network flow problems called Per-Edge Weights (PEW). This method builds on a Graph Attention Network and uses distinctly parametrized message functions along each link. We extensively evaluate the proposed solution through an Internet flow routing case study using $17$ Service Provider topologies and $2$ routing schemes. We show that PEW yields substantial gains over architectures whose global message function constrains the routing unnecessarily. We also find that an MLP is competitive with other standard architectures. Furthermore, we analyze the relationship between graph structure and predictive performance for data-driven routing of flows, an aspect that has not been considered by existing work in the area.
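The core idea, "distinctly parametrized message functions along each link", can be sketched as one weight matrix per edge rather than a single shared transform; the schematic, untrained sketch below simplifies away the attention mechanism PEW actually builds on, and all dimensions are assumed:

```python
import numpy as np

def pew_layer(h, edges, W):
    """One message-passing step with per-edge weights: node features h (n, d),
    edges as (src, dst) pairs, and a distinct weight matrix W[e] (d, d) per edge."""
    out = np.zeros_like(h)
    for e, (src, dst) in enumerate(edges):
        out[dst] += W[e] @ h[src]   # each link applies its own transform
    return np.tanh(out)

rng = np.random.default_rng(0)
n, d = 4, 8
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
W = rng.normal(scale=0.1, size=(len(edges), d, d))
h = pew_layer(rng.normal(size=(n, d)), edges, W)
```

Giving each link its own parameters is what lets the model express routing decisions that a single global message function would constrain unnecessarily.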
Updated: 2024-03-18 09:28:39
Domains: cs.LG,cs.AI,cs.NI
Optimal Layout Synthesis for Deep Quantum Circuits on NISQ Processors with 100+ Qubits
Layout synthesis is mapping a quantum circuit to a quantum processor. SWAP gate insertions are needed for scheduling 2-qubit gates only on connected physical qubits. With the ever-increasing number of qubits in NISQ processors, scalable layout synthesis is of utmost importance. With large optimality gaps observed in heuristic approaches, scalable exact methods are needed. While recent exact and near-optimal approaches scale to moderate circuits, large deep circuits are still out of scope. In this work, we propose a SAT encoding based on parallel plans that apply 1 SWAP and a group of CNOTs at each time step. Using domain-specific information, we maintain optimality in parallel plans while scaling to large and deep circuits. From our results, we show the scalability of our approach which significantly outperforms leading exact and near-optimal approaches (up to 100x). For the first time, we can optimally map several 8, 14, and 16 qubit circuits onto 54, 80, and 127 qubit platforms with up to 17 SWAPs. While adding optimal SWAPs, we also report near-optimal depth in our mapped circuits.
Updated: 2024-03-18 09:19:01
Domains: quant-ph,cs.AI
Formal Security Analysis of the AMD SEV-SNP Software Interface
AMD Secure Encrypted Virtualization technologies enable confidential computing by protecting virtual machines from highly privileged software such as hypervisors. In this work, we develop the first comprehensive symbolic model of the software interface of the latest SEV iteration, SEV Secure Nested Paging (SEV-SNP). Our model covers remote attestation, key derivation, page swap, and live migration. We analyze the security of the SEV-SNP software interface by verifying critical secrecy, authentication, attestation, and freshness properties, and find that the platform-agnostic nature of messages exchanged between SNP guests and the AMD Secure Processor firmware presents a weakness of the design. We show multiple ways of exploiting this weakness, including the compromise of attestation report integrity, and suggest slight modifications to the design that let third parties detect guest migrations to vulnerable platforms.
Updated: 2024-03-18 09:09:11
Domains: cs.CR
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
As the capabilities of large multimodal models (LMMs) continue to advance, evaluating the performance of LMMs emerges as an increasing need. Additionally, there is an even larger gap in evaluating the advanced knowledge and reasoning abilities of LMMs in non-English contexts such as Chinese. We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to evaluate LMMs on tasks demanding college-level subject knowledge and deliberate reasoning in a Chinese context. CMMMU is inspired by and strictly follows the annotation and analysis pattern of MMMU. CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, like its companion, MMMU. These questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. CMMMU focuses on complex perception and reasoning with domain-specific knowledge in the Chinese context. We evaluate 11 open-source LMMs and one proprietary model, GPT-4V(ision). Even GPT-4V only achieves an accuracy of 42%, indicating large room for improvement. CMMMU will boost the community to build the next-generation LMMs towards expert artificial intelligence and promote the democratization of LMMs by providing diverse language contexts.
Updated: 2024-03-18 09:02:03
Domains: cs.CL,cs.AI,cs.CV
Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines
In the ever-evolving landscape of machine learning, seamless translation of natural language descriptions into executable code remains a formidable challenge. This paper introduces Linguacodus, an innovative framework designed to tackle this challenge by deploying a dynamic pipeline that iteratively transforms natural language task descriptions into code through high-level data-shaping instructions. The core of Linguacodus is a fine-tuned large language model (LLM), empowered to evaluate diverse solutions for various problems and select the most fitting one for a given task. This paper details the fine-tuning process, and sheds light on how natural language descriptions can be translated into functional code. Linguacodus represents a substantial leap towards automated code generation, effectively bridging the gap between task descriptions and executable code. It holds great promise for advancing machine learning applications across diverse domains. Additionally, we propose an algorithm capable of transforming a natural description of an ML task into code with minimal human interaction. In extensive experiments on a vast machine learning code dataset originating from Kaggle, we showcase the effectiveness of Linguacodus. The investigations highlight its potential applications across diverse domains, emphasizing its impact on applied machine learning in various scientific fields.
Updated: 2024-03-18 08:58:47
Domains: cs.LG,cs.AI,cs.CL,cs.PL,cs.SE
Is it Really Negative? Evaluating Natural Language Video Localization Performance on Multiple Reliable Videos Pool
With the explosion of multimedia content in recent years, Video Corpus Moment Retrieval (VCMR), which aims to detect a video moment that matches a given natural language query from multiple videos, has become a critical problem. However, existing VCMR studies have a significant limitation since they have regarded all videos not paired with a specific query as negative, neglecting the possibility of including false negatives when constructing the negative video set. In this paper, we propose an MVMR (Massive Videos Moment Retrieval) task that aims to localize video frames within a massive video set, mitigating the possibility of falsely distinguishing positive and negative videos. For this task, we suggest an automatic dataset construction framework by employing textual and visual semantic matching evaluation methods on the existing video moment search datasets and introduce three MVMR datasets. To solve MVMR task, we further propose a strong method, CroCs, which employs cross-directional contrastive learning that selectively identifies the reliable and informative negatives, enhancing the robustness of a model on MVMR task. Experimental results on the introduced datasets reveal that existing video moment search models are easily distracted by negative video frames, whereas our model shows significant performance.
Updated: 2024-03-18 08:55:36
Domains: cs.CV,cs.AI,cs.CL
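The contrastive ingredient can be illustrated with a generic InfoNCE-style loss over one positive moment and a set of already-selected reliable negatives; this is a sketch of the idea only, not the CroCs objective, and the negative-selection step itself is omitted.

```python
# Sketch: InfoNCE over a query, one positive, and K selected negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(query, positive, negatives, tau=0.07):
    # query: (D,), positive: (D,), negatives: (K, D); all L2-normalized below.
    q = F.normalize(query, dim=-1)
    cands = F.normalize(torch.cat([positive.unsqueeze(0), negatives], dim=0), dim=-1)
    logits = cands @ q / tau                      # (K+1,) similarities
    target = torch.zeros(1, dtype=torch.long)     # index 0 is the positive
    return F.cross_entropy(logits.unsqueeze(0), target)
```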
Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations
Achieving high-performance computation on quantum systems presents a formidable challenge that necessitates bridging the capabilities between quantum hardware and classical computing resources. This study introduces an innovative distribution-aware Quantum-Classical-Quantum (QCQ) architecture, which integrates cutting-edge quantum software frameworks with high-performance classical computing resources to address challenges in quantum simulation for materials and condensed matter physics. At the heart of this architecture is the seamless integration of VQE algorithms running on QPUs for efficient quantum state preparation, with Tensor Network states and QCNNs for classifying quantum states on classical hardware. For benchmarking quantum simulators, the QCQ architecture utilizes the cuQuantum SDK to leverage multi-GPU acceleration, integrated with PennyLane's Lightning plugin, demonstrating up to tenfold increases in computational speed for complex phase transition classification tasks compared to traditional CPU-based methods. This significant acceleration enables models such as the transverse-field Ising and XXZ systems to accurately predict phase transitions with 99.5% accuracy. The architecture's ability to distribute computation between QPUs and classical resources addresses critical bottlenecks in Quantum-HPC, paving the way for scalable quantum simulation. The QCQ framework embodies a synergistic combination of quantum algorithms, machine learning, and Quantum-HPC capabilities, enhancing its potential to provide transformative insights into the behavior of quantum systems across different scales. As quantum hardware continues to improve, this hybrid distribution-aware framework will play a crucial role in realizing the full potential of quantum computing by seamlessly integrating distributed quantum resources with state-of-the-art classical computing infrastructure.
Updated: 2024-03-18 08:54:10
Domains: quant-ph,cs.AI,cs.AR,cs.DC
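As a toy stand-in for the QPU-side state preparation the architecture describes, here is a minimal VQE for a transverse-field Ising chain on PennyLane's Lightning backend; the Hamiltonian coefficients and the ansatz are illustrative choices, not the study's configuration.

```python
# Toy VQE on lightning.qubit for H = -sum Z_i Z_{i+1} - h * sum X_i.
import pennylane as qml
from pennylane import numpy as np

n, h = 4, 0.7
dev = qml.device("lightning.qubit", wires=n)

coeffs = [-1.0] * (n - 1) + [-h] * n
ops = ([qml.PauliZ(i) @ qml.PauliZ(i + 1) for i in range(n - 1)]
       + [qml.PauliX(i) for i in range(n)])
H = qml.Hamiltonian(coeffs, ops)

@qml.qnode(dev)
def energy(params):
    for i in range(n):
        qml.RY(params[i], wires=i)       # simple hardware-efficient ansatz
    for i in range(n - 1):
        qml.CNOT(wires=[i, i + 1])
    return qml.expval(H)

params = np.array([0.1] * n, requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.2)
for _ in range(100):
    params = opt.step(energy, params)
print(energy(params))                    # variational ground-state energy estimate
```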
Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale
A syntactic language model (SLM) incrementally generates a sentence with its syntactic tree in a left-to-right manner. We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts with high parallelism. GPST circumvents the limitations of previous SLMs such as relying on gold trees and sequential training. It consists of two components, a usual SLM supervised by a uni-directional language modeling loss, and an additional composition model, which induces syntactic parse trees and computes constituent representations, supervised by a bi-directional language modeling loss. We propose a representation surrogate to enable joint parallel training of the two models in a hard-EM fashion. We pre-train GPST on OpenWebText, a corpus with $9$ billion tokens, and demonstrate the superiority of GPST over GPT-2 with a comparable size in numerous tasks covering both language understanding and language generation. Meanwhile, GPST also significantly outperforms existing unsupervised SLMs on left-to-right grammar induction, while holding a substantial acceleration on training.
Updated: 2024-03-18 08:48:30
Domains: cs.CL,cs.AI
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
We propose to improve transformers of a specific modality with irrelevant data from other modalities, e.g., improve an ImageNet model with audio or point cloud datasets. We would like to highlight that the data samples of the target modality are irrelevant to the other modalities, which distinguishes our method from other works utilizing paired (e.g., CLIP) or interleaved data of different modalities. We propose a methodology named Multimodal Pathway - given a target modality and a transformer designed for it, we use an auxiliary transformer trained with data of another modality and construct pathways to connect components of the two models so that data of the target modality can be processed by both models. In this way, we utilize the universal sequence-to-sequence modeling abilities of transformers obtained from two modalities. As a concrete implementation, we use a modality-specific tokenizer and task-specific head as usual but utilize the transformer blocks of the auxiliary model via a proposed method named Cross-Modal Re-parameterization, which exploits the auxiliary weights without any inference costs. On the image, point cloud, video, and audio recognition tasks, we observe significant and consistent performance improvements with irrelevant data from other modalities. The code and models are available at https://github.com/AILab-CVC/M2PT.
Updated: 2024-03-18 08:45:52
Domains: cs.CV,cs.AI,cs.LG
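A hedged sketch of the re-parameterization idea as described: frozen auxiliary-modality weights are folded into the target layer through a learnable scalar, so after training the sum can be merged once and inference costs match a plain linear layer. The class and parameter names below are mine.

```python
# Sketch: target weight W plus a learnable scale times a frozen auxiliary weight.
import torch
import torch.nn as nn

class CrossModalLinear(nn.Module):
    def __init__(self, target_linear: nn.Linear, aux_weight: torch.Tensor):
        super().__init__()
        self.weight = target_linear.weight              # trainable target weights
        self.bias = target_linear.bias
        self.register_buffer("aux_weight", aux_weight)  # frozen auxiliary weights
        self.scale = nn.Parameter(torch.zeros(1))       # starts as a no-op

    def forward(self, x):
        # (weight + scale * aux_weight) can be precomputed after training,
        # which is what makes the trick free at inference time.
        return nn.functional.linear(x, self.weight + self.scale * self.aux_weight, self.bias)
```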
ProMISe: Promptable Medical Image Segmentation using SAM
With the proposal of the Segment Anything Model (SAM), fine-tuning SAM for medical image segmentation (MIS) has become popular. However, due to the large size of the SAM model and the significant domain gap between natural and medical images, fine-tuning-based strategies are costly with potential risk of instability, feature damage and catastrophic forgetting. Furthermore, some methods of transferring SAM to a domain-specific MIS through fine-tuning strategies disable the model's prompting capability, severely limiting its utilization scenarios. In this paper, we propose an Auto-Prompting Module (APM), which provides SAM-based foundation model with Euclidean adaptive prompts in the target domain. Our experiments demonstrate that such adaptive prompts significantly improve SAM's non-fine-tuned performance in MIS. In addition, we propose a novel non-invasive method called Incremental Pattern Shifting (IPS) to adapt SAM to specific medical domains. Experimental results show that the IPS enables SAM to achieve state-of-the-art or competitive performance in MIS without the need for fine-tuning. By coupling these two methods, we propose ProMISe, an end-to-end non-fine-tuned framework for Promptable Medical Image Segmentation. Our experiments demonstrate that both using our methods individually or in combination achieves satisfactory performance in low-cost pattern shifting, with all of SAM's parameters frozen.
Updated: 2024-03-18 08:40:48
Domains: cs.CV,cs.AI
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Large-kernel convolutional neural networks (ConvNets) have recently received extensive research attention, but two unresolved and critical issues demand further investigation. 1) The architectures of existing large-kernel ConvNets largely follow the design principles of conventional ConvNets or transformers, while the architectural design for large-kernel ConvNets remains under-addressed. 2) As transformers have dominated multiple modalities, it remains to be investigated whether ConvNets also have a strong universal perception ability in domains beyond vision. In this paper, we contribute from two aspects. 1) We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristics of large kernels that distinguish them from small kernels - they can see wide without going deep. Following such guidelines, our proposed large-kernel ConvNet shows leading performance in image recognition (ImageNet accuracy of 88.0%, ADE20K mIoU of 55.6%, and COCO box AP of 56.4%), demonstrating better performance and higher speed than the recent powerful competitors. 2) We discover large kernels are the key to unlocking the exceptional performance of ConvNets in domains where they were originally not proficient. With certain modality-related preprocessing approaches, the proposed model achieves state-of-the-art performance on time-series forecasting and audio recognition tasks even without modality-specific customization to the architecture. All the code and models are publicly available on GitHub and Huggingface.
Updated: 2024-03-18 08:37:24
Domains: cs.CV,cs.AI,cs.LG
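The "see wide without going deep" idea can be illustrated with a depthwise large-kernel block; this is a simplified sketch, not the paper's full block design, and the kernel size is an arbitrary example.

```python
# Sketch: a depthwise 13x13 conv gives a wide receptive field in one layer.
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    def __init__(self, dim, kernel_size=13):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.norm = nn.BatchNorm2d(dim)
        self.pw1 = nn.Conv2d(dim, 4 * dim, 1)   # pointwise feed-forward expansion
        self.act = nn.GELU()
        self.pw2 = nn.Conv2d(4 * dim, dim, 1)

    def forward(self, x):
        return x + self.pw2(self.act(self.pw1(self.norm(self.dw(x)))))
```

Because the large kernel is depthwise, its cost grows with kernel area times channels rather than channels squared, which is what keeps very wide kernels affordable.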
Local Interpretations for Explainable Natural Language Processing: A Survey
As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of the black-box models have increased, resulting in an increased focus on transparency in deep learning models. This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis. We provide a comprehensive discussion on the definition of the term interpretability and its various aspects at the beginning of this work. The methods collected and summarised in this survey are only associated with local interpretation and are specifically divided into three categories: 1) interpreting the model's predictions through related input features; 2) interpreting through natural language explanation; 3) probing the hidden states of models and word representations.
Updated: 2024-03-18 08:29:49
Domains: cs.CL,cs.AI,A.1; I.2.7
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
To meet the requirements of real-world applications, it is essential to control the generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation, while most existing methods suffer from overfitting issues (finetuning-based methods) or semantic collapse (post-processing methods). Moreover, current RL methods are generally guided by coarse-grained (sentence/paragraph-level) feedback, which may lead to suboptimal performance owing to semantic twists or progressions within sentences. To tackle this, we propose a novel reinforcement learning algorithm named TOLE, which formulates TOken-LEvel rewards for controllable text generation and employs a "first-quantize-then-noise" paradigm to enhance the robustness of the RL algorithm. Furthermore, TOLE can be flexibly extended to multiple constraints with little computational expense. Experimental results show that our algorithm can achieve superior performance on both single-attribute and multi-attribute control tasks. We have released our code at https://github.com/WindyLee0822/CTG
Updated: 2024-03-18 08:18:37
Domains: cs.CL,cs.AI
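A minimal sketch of what token-level feedback changes relative to sentence-level RL: each generated token carries its own reward in the policy-gradient loss. Shapes and names are illustrative, and the "first-quantize-then-noise" step is omitted.

```python
# Sketch: per-token rewards weighting per-token log-probabilities.
import torch
import torch.nn.functional as F

def token_level_pg_loss(logits, tokens, token_rewards):
    # logits: (T, V), tokens: (T,), token_rewards: (T,)
    logp = F.log_softmax(logits, dim=-1)
    chosen = logp.gather(1, tokens.unsqueeze(1)).squeeze(1)  # log pi(a_t | s_t)
    return -(token_rewards * chosen).mean()                  # REINFORCE-style objective
```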
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%$\rightarrow$10.9%) and CommonsenseQA (36.3%$\rightarrow$47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
Updated: 2024-03-18 07:56:48
Domains: cs.CL,cs.AI,cs.LG
Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning
We consider the gradient descent flow widely used for the minimization of the $\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two modified versions; one adapted for the overparametrized setting, and the other for the underparametrized setting. Both have a clear and natural invariant geometric meaning, taking into account the pullback vector bundle structure in the overparametrized, and the pushforward vector bundle structure in the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry.
Updated: 2024-03-18 07:51:52
Domains: cs.LG,cs.AI,math-ph,math.MP,math.OC,stat.ML,57R70, 62M45
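Schematically, and in notation of my own choosing, the claim can be written as follows, with $G(\theta)$ standing for the geometrically adapted modification of the gradient and $c > 0$ the uniform rate guaranteed by the rank condition:

```latex
% Sketch of the statement (notation mine, not the paper's):
\[
  \dot{\theta}(t) \;=\; -\,G\!\big(\theta(t)\big)\,\nabla_{\theta}\,\mathcal{C}\big(\theta(t)\big),
  \qquad
  \mathcal{C}\big(\theta(t)\big) \;\le\; \mathcal{C}\big(\theta(0)\big)\, e^{-c\,t},
\]
% so for any prescribed proximity epsilon to the global minimum,
\[
  T(\varepsilon) \;=\; \frac{1}{c}\,\log\frac{\mathcal{C}(\theta(0))}{\varepsilon}
  \quad\text{is an a priori stopping time with}\quad
  \mathcal{C}\big(\theta(T)\big) \le \varepsilon .
\]
```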
OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System
Automatic optical inspection (AOI) plays a pivotal role in the manufacturing process, predominantly leveraging high-resolution imaging instruments for scanning purposes. It detects anomalies by analyzing image textures or patterns, making it an essential tool in industrial manufacturing and quality control. Despite its importance, the deployment of models for AOI often faces challenges. These include limited sample sizes, which hinder effective feature learning, variations among source domains, and sensitivities to changes in lighting and camera positions during imaging. These factors collectively compromise the accuracy of model predictions. Traditional AOI often fails to capitalize on the rich mechanism-parameter information from machines or inside images, including statistical parameters, which typically benefit AOI classification. To address this, we introduce an external modality-guided data mining framework, primarily rooted in optical character recognition (OCR), to extract statistical features from images as a second modality to enhance performance, termed OANet (Ocr-Aoi-Net). A key aspect of our approach is the alignment of external modality features, extracted using a single modality-aware model, with image features encoded by a convolutional neural network. This synergy enables a more refined fusion of semantic representations from different modalities. We further introduce feature refinement and a gating function in our OANet to optimize the combination of these features, enhancing inference and decision-making capabilities. Experimental outcomes show that our methodology considerably boosts the recall rate of the defect detection model and maintains high robustness even in challenging scenarios.
Updated: 2024-03-18 07:41:39
Domains: cs.CV,cs.AI,cs.LG
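A generic gated-fusion module consistent with the description, sketched in PyTorch; the projection dimensions, module names, and gating form are assumptions rather than the published OANet design.

```python
# Sketch: sigmoid gate mixing image features with OCR-derived statistical features.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, img_dim, ocr_dim, dim):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)
        self.ocr_proj = nn.Linear(ocr_dim, dim)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, f_img, f_ocr):
        a, b = self.img_proj(f_img), self.ocr_proj(f_ocr)
        g = self.gate(torch.cat([a, b], dim=-1))   # per-dimension mixing weights
        return g * a + (1 - g) * b                 # gated combination of modalities
```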
Impart: An Imperceptible and Effective Label-Specific Backdoor Attack
Backdoor attacks have been shown to pose severe threats to real security-critical scenarios. Although previous works can achieve high attack success rates, they either require access to the victim model, which significantly reduces their threat in practice, or produce visually noticeable triggers that compromise stealthiness. Besides, there is still room to improve the attack success rate in the scenario where different poisoned samples may have different target labels (a.k.a., the all-to-all setting). In this study, we propose a novel imperceptible backdoor attack framework, named Impart, for the scenario where the attacker has no access to the victim model. Specifically, in order to enhance the attack capability in the all-to-all setting, we first propose a label-specific attack. Different from previous works, which try to find an imperceptible pattern and add it to the source image as the poisoned image, we propose to generate perturbations that align with the target label in the image feature space via a surrogate model. In this way, the generated poisoned images carry knowledge about the target class, which significantly enhances the attack capability.
Updated: 2024-03-18 07:22:56
Domains: cs.CR,cs.AI
Effectiveness Assessment of Recent Large Vision-Language Models
The advent of large vision-language models (LVLMs) represents a noteworthy advancement towards the pursuit of artificial general intelligence. However, the extent of their efficacy across both specialized and general tasks warrants further investigation. This article endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive comprehension of these innovative methodologies. To gauge their efficacy in specialized tasks, we tailor a comprehensive testbed comprising three distinct scenarios: natural, healthcare, and industrial, encompassing six challenging tasks. These tasks include salient, camouflaged, and transparent object detection, as well as polyp and skin lesion detection, alongside industrial anomaly detection. We examine the performance of three recent open-source LVLMs -- MiniGPT-v2, LLaVA-1.5, and Shikra -- in the realm of visual recognition and localization. Moreover, we conduct empirical investigations utilizing the aforementioned models alongside GPT-4V, assessing their multi-modal understanding capacities in general tasks such as object counting, absurd question answering, affordance reasoning, attribute recognition, and spatial relation reasoning. Our investigations reveal that these models demonstrate limited proficiency not only in specialized tasks but also in general tasks. We delve deeper into this inadequacy and suggest several potential factors, including limited cognition in specialized tasks, object hallucination, text-to-image interference, and decreased robustness in complex problems. We hope this study would provide valuable insights for the future development of LVLMs, augmenting their power in coping with both general and specialized applications.
Updated: 2024-03-18 07:21:01
Domains: cs.CV,cs.AI,cs.LG
Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech
Examining the factors that the counter-speech uses is at the core of understanding the optimal methods for confronting hate speech online. Various studies assess the emotional base factor used in counter speech, such as emotion-empathy, offensiveness, and level of hostility. To better understand the counter-speech used in conversational interactions, this study distills persuasion modes into reason, emotion, and credibility and then evaluates their use in two types of conversation interactions: closed (multi-turn) and open (single-turn) conversation interactions concerning racism, sexism, and religion. The evaluation covers the distinct behaviors of human versus generated counter-speech. We also assess the interplay between the replies' stance and each mode of persuasion in the counter-speech. Notably, we observe nuanced differences in the counter-speech persuasion modes for open and closed interactions -- especially on the topic level -- with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. The generated counter-speech tends to exhibit an emotional persuasion mode, while human counters lean towards using reasoning. Furthermore, our study shows that reason as a persuasion mode tends to obtain more supportive replies than do other persuasion types. The findings highlight the potential of incorporating persuasion modes into studies about countering hate speech, as these modes can serve as an optimal means of explainability and paves the way for the further adoption of the reply's stance and the role it plays in assessing what comprises the optimal counter-speech.
Updated: 2024-03-18 07:20:35
Domains: cs.CL,cs.AI
Efficient and Privacy-Preserving Federated Learning based on Full Homomorphic Encryption
Since the first theoretically feasible fully homomorphic encryption (FHE) scheme was proposed in 2009, great progress has been achieved. These improvements have taken FHE schemes from theory to practice, making them quite useful in solving practical problems. In this paper, we propose a set of novel federated learning schemes that utilize the latest homomorphic encryption technologies, so as to improve security, functionality, and practicality at the same time. Comparisons are given on four practical data sets from the medical, business, biometric, and financial fields, covering both horizontal and vertical federated learning scenarios. The experimental results show that our scheme achieves significant improvements in security, efficiency, and practicality compared with classical horizontal and vertical federated learning schemes.
Updated: 2024-03-18 07:13:09
Domains: cs.CR
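One possible realization of privacy-preserving aggregation, sketched with TenSEAL's CKKS scheme; the paper's actual schemes, parameters, and key management may differ, and the vectors below are toy inputs.

```python
# Sketch: server sums encrypted client updates without seeing plaintexts.
import tenseal as ts

ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

client_updates = [[0.10, -0.20, 0.30], [0.05, 0.10, -0.10]]   # toy model deltas
encrypted = [ts.ckks_vector(ctx, u) for u in client_updates]  # encrypted client-side

agg = encrypted[0]
for v in encrypted[1:]:
    agg = agg + v                     # homomorphic addition of ciphertexts
avg = agg * (1.0 / len(encrypted))   # homomorphic scaling
print(avg.decrypt())                 # only the secret-key holder can do this
```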
LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory Optimization
This paper introduces LeTO, a method for learning constrained visuomotor policy via differentiable trajectory optimization. Our approach uniquely integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to end-to-end generate actions in a safe and controlled fashion without extra modules. Our method allows for the introduction of constraints information during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This "gray box" method marries the optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on the real robot. In simulation, LeTO achieves a success rate comparable to state-of-the-art imitation learning methods, but the generated trajectories are of less uncertainty, higher quality, and smoother. In real-world experiments, we deployed LeTO to handle constraints-critical tasks. The results show the effectiveness of LeTO comparing with state-of-the-art imitation learning approaches. We release our code at https://github.com/ZhengtongXu/LeTO.
Updated: 2024-03-18 07:10:02
Domains: cs.RO,cs.AI
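The differentiable-optimization-layer idea can be sketched with cvxpylayers: a constrained QP smooths a network-predicted reference trajectory, and gradients flow back through the solver. This is a generic stand-in under stated assumptions, not LeTO's actual formulation.

```python
# Sketch: QP layer enforcing box constraints and smoothness on a trajectory.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

T, d = 10, 2
x = cp.Variable((T, d))
ref = cp.Parameter((T, d))
prob = cp.Problem(
    cp.Minimize(cp.sum_squares(x - ref) + 0.1 * cp.sum_squares(x[1:] - x[:-1])),
    [x >= -1.0, x <= 1.0],            # hard action limits enforced inside the layer
)
opt_layer = CvxpyLayer(prob, parameters=[ref], variables=[x])

net_output = torch.randn(T, d, requires_grad=True)  # e.g., a policy head's output
traj, = opt_layer(net_output)                       # constrained, smoothed trajectory
traj.sum().backward()                               # gradients flow through the QP
```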
End-To-End Underwater Video Enhancement: Dataset and Model
Underwater video enhancement (UVE) aims to improve the visibility and frame quality of underwater videos, which has significant implications for marine research and exploration. However, existing methods primarily focus on developing image enhancement algorithms to enhance each frame independently. There is a lack of supervised datasets and models specifically tailored for UVE tasks. To fill this gap, we construct the Synthetic Underwater Video Enhancement (SUVE) dataset, comprising 840 diverse underwater-style videos paired with ground-truth reference videos. Based on this dataset, we train a novel underwater video enhancement model, UVENet, which utilizes inter-frame relationships to achieve better enhancement performance. Through extensive experiments on both synthetic and real underwater videos, we demonstrate the effectiveness of our approach. This study represents the first comprehensive exploration of UVE to our knowledge. The code is available at https://anonymous.4open.science/r/UVENet.
Updated: 2024-03-18 06:24:46
Domains: cs.CV,cs.AI
MLVICX: Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning
Self-supervised learning (SSL) is potentially useful in reducing the need for manual annotation and making deep learning models accessible for medical image analysis tasks. By leveraging the representations learned from unlabeled data, self-supervised models perform well on tasks that require little to no fine-tuning. However, for medical images, like chest X-rays, which are characterized by complex anatomical structures and diverse clinical conditions, there arises a need for representation learning techniques that can encode fine-grained details while preserving the broader contextual information. In this context, we introduce MLVICX (Multi-Level Variance-Covariance Exploration for Chest X-ray Self-Supervised Representation Learning), an approach to capture rich representations in the form of embeddings from chest X-ray images. Central to our approach is a novel multi-level variance and covariance exploration strategy that empowers the model to detect diagnostically meaningful patterns while reducing redundancy effectively. By enhancing the variance and covariance of the learned embeddings, MLVICX promotes the retention of critical medical insights by adapting both global and local contextual details. We demonstrate the performance of MLVICX in advancing self-supervised chest X-ray representation learning through comprehensive experiments. The performance enhancements we observe across various downstream tasks highlight the significance of the proposed approach in enhancing the utility of chest X-ray embeddings for precision medical diagnosis and comprehensive image analysis. For pretraining, we used the NIH-Chest X-ray dataset, while for downstream tasks, we utilized the NIH-Chest X-ray, Vinbig-CXR, RSNA pneumonia, and SIIM-ACR Pneumothorax datasets. Overall, we observe more than 3% performance gains over SOTA SSL approaches in various downstream tasks.
Updated: 2024-03-18 06:19:37
Domains: eess.IV,cs.AI,cs.CV,cs.LG
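In the spirit of the multi-level variance-covariance exploration described, a VICReg-style regularizer on a batch of embeddings; the exact MLVICX losses and how they are applied at multiple levels may differ from this sketch.

```python
# Sketch: keep per-dimension variance up, push cross-dimension covariance down.
import torch

def variance_covariance_loss(z, gamma=1.0, eps=1e-4):
    # z: (N, D) batch of embeddings.
    z = z - z.mean(dim=0)
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(gamma - std).mean()        # hinge keeps variance above gamma
    cov = (z.T @ z) / (z.size(0) - 1)                # (D, D) covariance matrix
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = (off_diag ** 2).sum() / z.size(1)     # decorrelate embedding dimensions
    return var_loss + cov_loss
```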
Controllable Data Generation by Deep Learning: A Review
Designing and generating new data under targeted properties has been attracting various critical applications such as molecule design, image editing and speech synthesis. Traditional hand-crafted approaches heavily rely on expertise experience and intensive human efforts, yet still suffer from the insufficiency of scientific knowledge and low throughput to support effective and efficient data generation. Recently, the advancement of deep learning has created the opportunity for expressive methods to learn the underlying representation and properties of data. Such capability provides new ways of determining the mutual relationship between the structural patterns and functional properties of the data and leveraging such relationships to generate structural data, given the desired properties. This article is a systematic review that explains this promising research area, commonly known as controllable deep data generation. First, the article raises the potential challenges and provides preliminaries. Then the article formally defines controllable deep data generation, proposes a taxonomy on various techniques and summarizes the evaluation metrics in this specific domain. After that, the article introduces exciting applications of controllable deep data generation, experimentally analyzes and compares existing works. Finally, this article highlights the promising future directions of controllable deep data generation and identifies five potential challenges.
Updated: 2024-03-18 06:06:48
Domains: cs.LG,cs.AI
MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
Perception plays a crucial role in various robot applications. However, existing well-annotated datasets are biased towards autonomous driving scenarios, while unlabelled SLAM datasets are quickly over-fitted, and often lack environment and domain variations. To expand the frontier of these fields, we introduce a comprehensive dataset named MCD (Multi-Campus Dataset), featuring a wide range of sensing modalities, high-accuracy ground truth, and diverse challenging environments across three Eurasian university campuses. MCD comprises both CCS (Classical Cylindrical Spinning) and NRE (Non-Repetitive Epicyclic) lidars, high-quality IMUs (Inertial Measurement Units), cameras, and UWB (Ultra-WideBand) sensors. Furthermore, in a pioneering effort, we introduce semantic annotations of 29 classes over 59k sparse NRE lidar scans across three domains, thus providing a novel challenge to existing semantic segmentation research upon this largely unexplored lidar modality. Finally, we propose, for the first time to the best of our knowledge, continuous-time ground truth based on optimization-based registration of lidar-inertial data on large survey-grade prior maps, which are also publicly released, each several times the size of existing ones. We conduct a rigorous evaluation of numerous state-of-the-art algorithms on MCD, report their performance, and highlight the challenges awaiting solutions from the research community.
Updated: 2024-03-18 06:00:38
Domains: cs.RO,cs.AI
BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions
Caches are used to reduce the speed differential between the CPU and memory to improve the performance of modern processors. However, attackers can use contention-based cache timing attacks to steal sensitive information from victim processes through carefully designed cache eviction sets. And L1 data cache attacks are widely exploited and pose a significant privacy and confidentiality threat. Existing hardware-based countermeasures mainly focus on cache partitioning, randomization, and cache line flushing, which unfortunately either incur high overhead or can be circumvented by sophisticated attacks. In this paper, we propose a novel hardware-software co-design called BackCache with the idea of always achieving cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache. BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions. To improve the security of BackCache, we introduce a randomly used replacement policy (RURP) and a dynamic backup cache resizing mechanism. We also present a theoretical security analysis to demonstrate the effectiveness of BackCache. Our evaluation on the gem5 simulator shows that BackCache degrades performance by 1.33%, 7.34%, and 7.59% for OS kernel, single-thread, and multi-thread benchmarks, respectively.
Updated: 2024-03-18 05:52:15
Domains: cs.CR,cs.AR
Budget Recycling Differential Privacy
Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results under a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.
Updated: 2024-03-18 03:43:45
Domains: cs.CR,cs.DS,eess.SP
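A purely illustrative sketch of the kernel-plus-recycler interplay with a Laplace kernel; the release rule, bound check, and probabilities here are assumptions for intuition, and the paper's budgeting principle and accounting govern the real mechanism.

```python
# Sketch (not BR-DP itself): redraw out-of-bound Laplace noise a few times.
import numpy as np

def recycled_laplace(true_value, eps, bound, p_release=0.5, max_tries=10, rng=None):
    rng = rng or np.random.default_rng()
    for _ in range(max_tries):
        noisy = true_value + rng.laplace(scale=1.0 / eps)  # sensitivity-1 query assumed
        if abs(noisy - true_value) <= bound:
            return noisy              # in-bound answer: release
        if rng.random() < p_release:
            return noisy              # recycler probabilistically releases anyway
    return noisy                      # tries exhausted: release the last draw
```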
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing
With ChatGPT under the spotlight, utilizing large language models (LLMs) to assist academic writing has drawn a significant amount of debate in the community. In this paper, we aim to present a comprehensive study of the detectability of ChatGPT-generated content within the academic literature, particularly focusing on the abstracts of scientific papers, to offer holistic support for the future development of LLM applications and policies in academia. Specifically, we first present GPABench2, a benchmarking dataset of over 2.8 million comparative samples of human-written, GPT-written, GPT-completed, and GPT-polished abstracts of scientific writing in computer science, physics, and humanities and social sciences. Second, we explore the methodology for detecting ChatGPT content. We start by examining the unsatisfactory performance of existing ChatGPT detecting tools and the challenges faced by human evaluators (including more than 240 researchers or students). We then test the hand-crafted linguistic features models as a baseline and develop a deep neural framework named CheckGPT to better capture the subtle and deep semantic and linguistic patterns in ChatGPT written literature. Last, we conduct comprehensive experiments to validate the proposed CheckGPT framework in each benchmarking task over different disciplines. To evaluate the detectability of ChatGPT content, we conduct extensive experiments on the transferability, prompt engineering, and robustness of CheckGPT.
Updated: 2024-03-18 03:14:54
Domains: cs.CL,cs.CR,cs.LG
Measuring Quantum Information Leakage Under Detection Threat
Gentle quantum leakage is proposed as a measure of information leakage to arbitrary eavesdroppers that aim to avoid detection. Gentle (also sometimes referred to as weak or non-demolition) measurements are used to encode the desire of the eavesdropper to evade detection. The gentle quantum leakage meets important axioms proposed for measures of information leakage including positivity, independence, and unitary invariance. Global depolarizing noise, an important family of physical noise in quantum devices, is shown to reduce gentle quantum leakage (and hence can be used as a mechanism to ensure privacy or security). A lower bound for the gentle quantum leakage based on asymmetric approximate cloning is presented. This lower bound relates information leakage to mutual incompatibility of quantum states. A numerical example, based on the encoding in the celebrated BB84 quantum key distribution algorithm, is used to demonstrate the results.
Updated: 2024-03-18 03:07:09
Domains: quant-ph,cs.CR,cs.IT,cs.SY,eess.SY,math.IT