AMG: Avatar Motion Guided Video Generation
The human video generation task has gained significant attention with the advancement of deep generative models. Generating realistic videos of human movements is inherently challenging due to the intricacies of human body topology and sensitivity to visual artifacts. The extensively studied 2D media generation methods take advantage of massive human media datasets but struggle with 3D-aware control, whereas 3D avatar-based approaches, while offering more freedom in control, lack photorealism and cannot be harmonized seamlessly with the background scene. We propose AMG, a method that combines 2D photorealism and 3D controllability by conditioning video diffusion models on controlled renderings of 3D avatars. We additionally introduce a novel data processing pipeline that reconstructs and renders human avatar movements from dynamic camera videos. AMG is the first method that enables multi-person diffusion video generation with precise control over camera positions, human motions, and background style. We also demonstrate through extensive evaluation that it outperforms existing human video generation methods conditioned on pose sequences or driving videos in terms of realism and adaptability.
Updated: 2024-09-02 23:59:01
Subjects: cs.CV,cs.AI,cs.GR
Advanced Predictive Modeling for Enhanced Mortality Prediction in ICU Stroke Patients Using Clinical Data
Background: Stroke is the second-leading cause of disability and death among adults. Approximately 17 million people suffer a stroke annually, with about 85% being ischemic strokes. Predicting the mortality of ischemic stroke patients in the intensive care unit (ICU) is crucial for optimizing treatment strategies, allocating resources, and improving survival rates. Methods: We acquired data on ICU ischemic stroke patients from the MIMIC-IV database, including diagnoses, vital signs, laboratory tests, medications, procedures, treatments, and clinical notes. Stroke patients were randomly divided into training (70%, n=2441), test (15%, n=523), and validation (15%, n=523) sets. To address data imbalances, we applied the Synthetic Minority Over-sampling Technique (SMOTE). We selected 30 features for model development, significantly reducing the feature count from the 1,095 used in the best prior study. We developed a deep learning model to assess mortality risk and implemented several baseline machine learning models for comparison. Results: The XGB-DL model, which combines XGBoost for feature selection with deep learning, effectively minimized false positives. The model's AUROC improved from 0.865 (95% CI: 0.821 - 0.905) on the first day to 0.903 (95% CI: 0.868 - 0.936) by the fourth day, using data from 3,646 ICU mortality patients in the MIMIC-IV database, with a 0.945 AUROC (95% CI: 0.944 - 0.947) during training. Although other ML models also performed well in terms of AUROC, we chose deep learning for its higher specificity. Conclusions: Through enhanced feature selection and data cleaning, the proposed model demonstrates a 13% AUROC improvement over existing models while reducing the feature count from the 1,095 used in previous studies to 30.
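As a minimal sketch of the preprocessing described above: the 70/15/15 split, SMOTE balancing, and the 30-feature cutoff follow the abstract, while the column names, random seeds, and estimator settings are illustrative assumptions.

```python
# Sketch: 70/15/15 split, SMOTE on the training fold only, and an
# XGBoost-based feature ranking keeping the top 30 features.
import pandas as pd
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

def prepare(df: pd.DataFrame, label: str = "mortality", k: int = 30):
    X, y = df.drop(columns=[label]), df[label]
    # 70% train, 30% held out, then split the holdout into test/validation
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=0)
    X_te, X_va, y_te, y_va = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)
    # Oversample the minority class in the training fold only
    X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
    # Rank features with XGBoost and keep the top k
    ranker = XGBClassifier(n_estimators=200, eval_metric="logloss")
    ranker.fit(X_tr, y_tr)
    top = pd.Series(ranker.feature_importances_,
                    index=X_tr.columns).nlargest(k).index
    return X_tr[top], y_tr, X_te[top], y_te, X_va[top], y_va
```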
Updated: 2024-09-02 23:41:47
Subjects: cs.LG
A practical generalization metric for deep networks benchmarking
There is an ongoing and dedicated effort to estimate bounds on the generalization error of deep learning models, coupled with increasing interest in practical metrics that can be used to experimentally evaluate a model's ability to generalize. This interest is driven not only by practical considerations but is also vital for theoretical research, as theoretical estimations require practical validation. However, there is currently a lack of research on benchmarking the generalization capacity of various deep networks and verifying these theoretical estimations. This paper introduces a practical generalization metric for benchmarking different deep networks and proposes a novel testbed for the verification of theoretical estimations. Our findings indicate that a deep network's generalization capacity in classification tasks is contingent upon both classification accuracy and the diversity of unseen data. The proposed metric system is capable of quantifying the accuracy of deep learning models and the diversity of data, providing an intuitive and quantitative evaluation method in the form of a trade-off point. Furthermore, we compare our practical metric with existing theoretical generalization estimations using our benchmarking testbed. It is discouraging to note that most of the available generalization estimations do not correlate with the practical measurements obtained using our proposed metric. On the other hand, this finding is significant, as it exposes the shortcomings of theoretical estimations and inspires new exploration.
Updated: 2024-09-02 23:38:25
Subjects: cs.LG
PhysORD: A Neuro-Symbolic Approach for Physics-infused Motion Prediction in Off-road Driving
Motion prediction is critical for autonomous off-road driving; however, it presents significantly more challenges than on-road driving because of the complex interaction between the vehicle and the terrain. Traditional physics-based approaches encounter difficulties in accurately modeling dynamic systems and external disturbance. In contrast, data-driven neural networks require extensive datasets and struggle to explicitly capture the fundamental physical laws, which can easily lead to poor generalization. By merging the advantages of both methods, neuro-symbolic approaches present a promising direction. These methods embed physical laws into neural models, potentially improving generalization capabilities significantly. However, no prior work has been evaluated in real-world settings for off-road driving. To bridge this gap, we present PhysORD, a neuro-symbolic approach integrating the conservation law, i.e., the Euler-Lagrange equation, into data-driven neural models for motion prediction in off-road driving. Our experiments showed that PhysORD can accurately predict vehicle motion and tolerate external disturbance by modeling uncertainties. It outperforms existing methods in both accuracy and efficiency and demonstrates data-efficient learning and generalization ability in long-term prediction.
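For reference, the conservation law the abstract names is the Euler-Lagrange equation, which in its standard form for a Lagrangian L = T − V reads:

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\,\frac{\partial \mathcal{L}}{\partial \dot{q}}
- \frac{\partial \mathcal{L}}{\partial q} = Q_{\mathrm{ext}},
\qquad
\mathcal{L}(q,\dot{q}) = T(q,\dot{q}) - V(q),
```

with generalized coordinates q, kinetic energy T, potential energy V, and Q_ext collecting non-conservative forces such as terrain disturbance. How PhysORD parameterizes these terms and which parts are left to learned networks is detailed in the paper.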
Updated: 2024-09-02 23:35:07
Subjects: cs.RO,cs.AI
EarthGen: Generating the World from Top-Down Views
In this work, we present a novel method for extensive multi-scale generative terrain modeling. At the core of our model is a cascade of super-resolution diffusion models that can be combined to produce consistent images across multiple resolutions. Pairing this concept with a tiled generation method yields a scalable system that can generate thousands of square kilometers of realistic Earth surfaces at high resolution. We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom. We also demonstrate its ability to create diverse and coherent scenes via an interactive gigapixel-scale generated map. Finally, we demonstrate how our system can be extended to enable novel content creation applications including controllable world generation and 3D scene generation.
Updated: 2024-09-02 23:17:56
Subjects: cs.CV,cs.AI,J.2; I.4.8
AlphaFold Meets Flow Matching for Generating Protein Ensembles
The biological functions of proteins often depend on dynamic structural ensembles. In this work, we develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins. We repurpose highly accurate single-state predictors such as AlphaFold and ESMFold and fine-tune them under a custom flow matching framework to obtain sequence-conditioned generative models of protein structure called AlphaFlow and ESMFlow. When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling. When further trained on ensembles from all-atom MD, our method accurately captures conformational flexibility, positional distributions, and higher-order ensemble observables for unseen proteins. Moreover, our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories, demonstrating its potential as a proxy for expensive physics-based simulations. Code is available at https://github.com/bjing2016/alphaflow.
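For orientation, a generic conditional flow-matching objective of the kind the abstract builds on can be sketched as below. This is the standard linear-interpolant loss, not the authors' AlphaFlow training code; the model v_theta, the conditioning argument, and all shapes are placeholders.

```python
# Generic conditional flow-matching loss (linear interpolant): the
# network v_theta predicts the velocity (x1 - x0) at a random time t
# along the straight path from noise x0 to data x1.
import torch

def flow_matching_loss(v_theta, x1, cond):
    x0 = torch.randn_like(x1)                       # noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)),
                   device=x1.device)                # per-example time
    xt = (1.0 - t) * x0 + t * x1                    # point on the path
    target = x1 - x0                                # constant velocity
    return ((v_theta(xt, t, cond) - target) ** 2).mean()
```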
Updated: 2024-09-02 22:43:33
Subjects: q-bio.BM,cs.LG
Dissociation of Faithful and Unfaithful Reasoning in LLMs
Large language models (LLMs) often improve their performance in downstream tasks when they generate Chain of Thought reasoning text before producing an answer. We investigate how LLMs recover from errors in Chain of Thought. Through analysis of error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, which occurs when models arrive at the correct answer despite invalid reasoning text. We identify factors that shift LLM recovery behavior: LLMs recover more frequently from obvious errors and in contexts that provide more evidence for the correct answer. Critically, these factors have divergent effects on faithful and unfaithful recoveries. Our results indicate that there are distinct mechanisms driving faithful and unfaithful error recoveries. Selective targeting of these mechanisms may be able to drive down the rate of unfaithful reasoning and improve model interpretability.
Updated: 2024-09-02 22:40:20
Subjects: cs.AI,cs.CL
Watermarking of Quantum Circuits
Quantum circuits constitute Intellectual Property (IP) of the quantum developers and users, which needs to be protected from theft by adversarial agents, e.g., the quantum cloud provider or a rogue adversary present in the cloud. This necessitates the exploration of low-overhead techniques applicable to near-term quantum devices to trace the IP of quantum circuits/algorithms and their output. We present two such lightweight watermarking techniques to prove ownership in the event of an adversary cloning the circuit design. For the first technique, a rotation gate is placed on ancilla qubits, combined with other gate(s) at the output of the circuit. For the second method, a set of random gates is inserted in the middle of the circuit followed by its inverse, separated from the circuit by a barrier. These models are combined and applied on benchmark circuits, and the circuit depth, 2-qubit gate count, probability of successful trials (PST), and probabilistic proof of authorship (PPA) are compared against the state-of-the-art. The PST is reduced by a minuscule 0.53% against the non-watermarked benchmarks and is up to 22.69% higher compared to existing techniques. The circuit depth has been reduced by up to 27.7% as against the state-of-the-art. The PPA is astronomically smaller than existing watermarks.
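The two insertion schemes can be illustrated with Qiskit as follows. The stand-in circuit halves, rotation angle, and random-block depth are illustrative assumptions, not the paper's configuration.

```python
# Scheme 1: rotation gate on an ancilla combined with an output gate.
# Scheme 2: random block followed by its inverse, fenced by barriers
# in the middle of the circuit (net identity on ideal hardware).
from qiskit import QuantumCircuit
from qiskit.circuit.random import random_circuit

n = 3
stamp = random_circuit(n, depth=2, seed=7)       # scheme 2: random block

wm = QuantumCircuit(n + 1)                       # last qubit is the ancilla
wm.h(range(n))                                   # stand-in "first half"
wm.cx(0, 1)
wm.barrier()
wm.compose(stamp, qubits=list(range(n)), inplace=True)
wm.compose(stamp.inverse(), qubits=list(range(n)), inplace=True)
wm.barrier()
wm.cx(1, 2)                                      # stand-in "second half"
wm.rx(0.37, n)                                   # scheme 1: ancilla rotation...
wm.cx(n, 0)                                      # ...combined with an output gate
```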
Updated: 2024-09-02 22:37:47
Subjects: cs.CR
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling. These models use conditionally activated feedforward subnetworks in transformer blocks, allowing for a separation between total model parameters and per-example computation. However, large token-routed SMoE models face a significant challenge: during inference, the entire model must be used for a sequence or a batch, resulting in high latencies in a distributed setting that offset the advantages of per-token sparse activation. Our research explores task-specific model pruning to inform decisions about designing SMoE architectures, mainly modulating the choice of expert counts in pretraining. We investigate whether such pruned models offer advantages over smaller SMoE models trained from scratch when evaluating and comparing them individually on tasks. To that end, we introduce an adaptive task-aware pruning technique, UNCURL, to reduce the number of experts per MoE layer in an offline manner post-training. Our findings reveal a threshold pruning factor for the reduction that depends on the number of experts used in pretraining, above which the reduction starts to degrade model performance. These insights contribute to our understanding of model design choices when pretraining with SMoE architectures, and are particularly useful when considering task-specific inference optimization for later stages.
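Mechanically, offline expert pruning of this kind reduces to scoring and dropping experts per MoE layer. The sketch below uses router-selection counts as a stand-in criterion; UNCURL's actual adaptive, task-aware criterion is defined in the paper.

```python
# Illustrative post-training expert pruning for one MoE layer: score
# experts by how often the router picks them on task data, keep top-k.
import torch

@torch.no_grad()
def prune_experts(router_logits: torch.Tensor,
                  experts: torch.nn.ModuleList, keep: int):
    # router_logits: [num_tokens, num_experts], collected on task examples
    counts = torch.bincount(router_logits.argmax(dim=-1),
                            minlength=len(experts))
    keep_ids = counts.topk(keep).indices.sort().values
    pruned = torch.nn.ModuleList(experts[int(i)] for i in keep_ids)
    return pruned, keep_ids   # keep_ids also remaps the router's outputs
```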
Updated: 2024-09-02 22:35:03
Subjects: cs.LG,cs.CL
Masked Mixers for Language Generation and Retrieval
Attention mechanisms that confer selective focus on a strict subset of input elements are nearly ubiquitous in language models today. We posit that there is a downside to the use of attention: most information present in the input is necessarily lost. In support of this idea, we observe poor input representation accuracy in transformers, but find more accurate representation in what we term masked mixers, which replace self-attention with masked convolutions. Applied to TinyStories, the masked mixer learns causal language tasks more efficiently than early transformer implementations and somewhat less efficiently than optimized, current implementations. The most efficient learning algorithm observed for this dataset is a transformer-masked mixer hybrid, suggesting that these models learn in an orthogonal manner. We hypothesized that the information loss exhibited by transformers would be much more detrimental to retrieval than generation, and to test this we introduce an efficient training approach for retrieval models based on existing generative model embeddings. With this method, embeddings from masked mixers are found to result in far better summary-to-story retrieval compared to embeddings from transformers.
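A minimal "masked mixer"-style block, replacing self-attention with a causal (left-padded) 1D convolution, might look like the following. Kernel size, width, and block structure are illustrative, not the paper's architecture.

```python
# Token mixing via a causal 1D convolution instead of self-attention;
# left-padding by (kernel - 1) keeps the mixing strictly causal.
import torch
import torch.nn as nn

class CausalConvMixer(nn.Module):
    def __init__(self, d_model: int, kernel: int = 8):
        super().__init__()
        self.pad = kernel - 1                       # pad only on the left
        self.conv = nn.Conv1d(d_model, d_model, kernel)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                           # x: [batch, seq, d_model]
        h = self.norm(x).transpose(1, 2)            # -> [batch, d_model, seq]
        h = nn.functional.pad(h, (self.pad, 0))     # no peeking at the future
        return x + self.conv(h).transpose(1, 2)     # residual connection
```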
Updated: 2024-09-02 22:17:18
Subjects: cs.CL,cs.LG
CMOB: Large-Scale Cancer Multi-Omics Benchmark with Open Datasets, Tasks, and Baselines
Machine learning has shown great potential in the field of cancer multi-omics studies, offering incredible opportunities for advancing precision medicine. However, the challenges associated with dataset curation and task formulation pose significant hurdles, especially for researchers lacking a biomedical background. Here, we introduce CMOB, the first large-scale cancer multi-omics benchmark, which integrates the TCGA platform, making data resources accessible and usable for machine learning researchers without significant preparation and expertise. To date, CMOB includes a collection of 20 cancer multi-omics datasets covering 32 cancers, accompanied by a systematic data processing pipeline. CMOB provides well-processed dataset versions to support 20 meaningful tasks in four studies, with a collection of benchmarks. We also integrate CMOB with two complementary resources and various biological tools to explore broader research avenues. All resources are openly accessible with user-friendly and compatible integration scripts that enable non-experts to easily incorporate this complementary information for various tasks. We conduct extensive experiments on selected datasets to offer recommendations on suitable machine learning baselines for specific applications. Through CMOB, we aim to facilitate algorithmic advances and hasten the development, validation, and clinical translation of machine-learning models for personalized cancer treatments. CMOB is available on GitHub (https://github.com/chenzRG/Cancer-Multi-Omics-Benchmark).
Updated: 2024-09-02 22:04:08
Subjects: q-bio.GN,cs.LG
Compatible Gradient Approximations for Actor-Critic Algorithms
Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with respect to input actions. This reliance requires precise action-value gradient computations, a task that proves challenging under function approximation. We introduce an actor-critic algorithm that bypasses the need for such precision by employing a zeroth-order approximation of the action-value gradient through two-point stochastic gradient estimation within the action space. This approach provably and effectively addresses compatibility issues inherent in deterministic policy gradient schemes. Empirical results further demonstrate that our algorithm not only matches but frequently exceeds the performance of current state-of-the-art methods.
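In its standard form (generic symbols, not necessarily the paper's exact notation), the two-point zeroth-order estimate of the action-value gradient is:

```latex
\nabla_a Q(s,a) \;\approx\; \mathbb{E}_{u \sim \mathcal{N}(0, I)}
\left[ \frac{Q(s,\, a + \delta u) - Q(s,\, a - \delta u)}{2\delta}\, u \right],
\qquad \delta > 0,
```

averaged in practice over a small number of draws of the random direction u, with smoothing radius δ. This replaces the analytic derivative of the critic with respect to the action, which is exactly the quantity the abstract identifies as unreliable under function approximation.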
Updated: 2024-09-02 22:00:50
Subjects: cs.LG
State-Blocking Side-Channel Attacks and Autonomous Fault Detection in Quantum Key Distribution
Side-channel attacks allow an Eavesdropper to use insecurities in the practical implementation of QKD systems to gain an advantage that is not considered by security proofs that assume perfect implementations. In this work we specify a side-channel capability for Eve that has yet to be considered, before going on to discuss a scheme to autonomously detect such an attack during an ongoing QKD session, and the limits on how fast a detection can be made. The side-channel capability is very general and covers a wide variety of possible implementations for the attack itself. We present how Alice and Bob can put in place a countermeasure to continue use of the QKD system once a detection is made, regardless of the ongoing side-channel attack. This prevents downtime of QKD systems, which in critical infrastructure could pose severe risks. We then extend Eve's side-channel capability and present a modified attack strategy. This strengthened attack can be detected under certain conditions by our scheme; however, intelligent choices of parameters by Eve allow her strengthened attack to go undetected. From this, we discuss the implications this has on Privacy Amplification, and therefore on the security of QKD as a whole. Finally, consideration is given as to how these types of attacks are analogous to certain types of faults in the QKD system, how our detection scheme can also detect these faults, and therefore how this adds autonomous fault detection and redundancy to implementations of QKD.
Updated: 2024-09-02 21:44:57
Subjects: quant-ph,cs.CR
Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations
Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or the incorporation of empirical data. One advantage of neural network methods for PDEs lies in their automatic differentiation (AD), which requires only the sample points themselves, unlike traditional finite difference (FD) approximations that require nearby local points to compute derivatives. In this paper, we quantitatively demonstrate the advantage of AD in training neural networks. The concept of truncated entropy is introduced to characterize the training property. Specifically, through comprehensive experimental and theoretical analyses conducted on random feature models and two-layer neural networks, we discover that the defined truncated entropy serves as a reliable metric for quantifying the residual loss of random feature models and the training speed of neural networks for both AD and FD methods. Our experimental and theoretical analyses demonstrate that, from a training perspective, AD outperforms FD in solving PDEs.
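The contrast the abstract draws can be made concrete with a small PyTorch sketch: autograd obtains the derivative at the sample points alone, while a central finite difference needs extra evaluations at neighbouring points. The network and step size here are illustrative.

```python
# AD vs FD for u'(x) of a network u = net(x), as used in PDE residuals.
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
x = torch.rand(64, 1, requires_grad=True)
u = net(x)

# Automatic differentiation: exact derivative at the sample points.
du_ad = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                            create_graph=True)[0]

# Central finite difference: two extra evaluations at nearby points.
h = 1e-3
du_fd = (net(x + h) - net(x - h)) / (2 * h)
```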
Updated: 2024-09-02 21:40:41
Subjects: cs.LG,cs.NA,math.NA
SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery
Automated detection of surgical errors can improve robotic-assisted surgery. Despite promising progress, existing methods still face challenges in capturing rich temporal context to establish long-term dependencies while maintaining computational efficiency. In this paper, we propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection, facilitating efficient long sequence modelling with linear complexity. SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos. The bottleneck mechanism compresses and restores features within their spatial dimension, thereby reducing computational complexity. FCTF utilizes multiple dilated 1D convolutional layers to merge temporal information across diverse scale ranges, accommodating errors of varying duration. Our work also contributes the first-of-its-kind, frame-level, in-vivo surgical error dataset to support error detection in real surgical cases. Specifically, we deploy the clinically validated observational clinical human reliability assessment tool (OCHRA) to annotate the errors during suturing tasks in an open-source radical prostatectomy dataset (SAR-RARP50). Experimental results demonstrate that our SEDMamba outperforms state-of-the-art methods with at least 1.82% AUC and 3.80% AP performance gains with significantly reduced computational complexity. The corresponding error annotations, code and models will be released at https://github.com/wzjialang/SEDMamba.
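As an illustration of the fine-to-coarse temporal fusion idea, the sketch below merges parallel dilated 1D convolutions over a feature sequence so that short and long error durations are both covered. The dilation schedule and widths are illustrative choices, not the paper's configuration.

```python
# Illustrative fine-to-coarse temporal fusion: parallel dilated 1D
# convolutions (small to large receptive fields), merged by summation.
import torch
import torch.nn as nn

class FCTF(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=d, dilation=d)      # length-preserving
            for d in dilations)                   # fine -> coarse scales

    def forward(self, x):                         # x: [batch, C, T]
        return x + sum(b(x) for b in self.branches)
```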
Updated: 2024-09-02 21:35:51
Subjects: cs.CV,cs.AI
Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning (Full Version)
Deep Neural Networks (DNNs) can handle increasingly complex tasks, albeit they require rapidly expanding training datasets. Collecting data from platforms with user-generated content, such as social networks, has significantly eased the acquisition of large datasets for training DNNs. Despite these advancements, the manual labeling process remains a substantial challenge in terms of both time and cost. In response, Semi-Supervised Learning (SSL) approaches have emerged, where only a small fraction of the dataset needs to be labeled, leaving the majority unlabeled. However, leveraging data from untrusted sources like social networks also creates new security risks, as potential attackers can easily inject manipulated samples. Previous research on the security of SSL primarily focused on injecting backdoors into trained models, while less attention was given to the more challenging untargeted poisoning attacks. In this paper, we introduce Phantom, the first untargeted poisoning attack in SSL that disrupts the training process by injecting a small number of manipulated images into the unlabeled dataset. Unlike existing attacks, our approach only requires adding a few manipulated samples, such as posting images on social networks, without the need to control the victim. Phantom causes SSL algorithms to overlook the actual images' pixels and to rely only on the maliciously crafted patterns that Phantom superimposes on the real images. We show Phantom's effectiveness for 6 different datasets and 3 real-world social-media platforms (Facebook, Instagram, Pinterest). Already small fractions of manipulated samples (e.g., 5%) reduce the accuracy of the resulting model by 10%, with higher percentages leading to a performance comparable to a naive classifier. Our findings demonstrate the threat of poisoning user-generated content platforms, rendering them unsuitable for SSL in specific tasks.
Updated: 2024-09-02 21:29:05
Subjects: cs.CR
Manipulating Large Language Models to Increase Product Visibility
Large language models (LLMs) are increasingly being integrated into search engines to provide natural language responses tailored to user queries. Customers and end-users are also becoming more dependent on these models for quick and easy purchase decisions. In this work, we investigate whether recommendations from LLMs can be manipulated to enhance a product's visibility. We demonstrate that adding a strategic text sequence (STS) -- a carefully crafted message -- to a product's information page can significantly increase its likelihood of being listed as the LLM's top recommendation. To understand the impact of STS, we use a catalog of fictitious coffee machines and analyze its effect on two target products: one that seldom appears in the LLM's recommendations and another that usually ranks second. We observe that the strategic text sequence significantly enhances the visibility of both products by increasing their chances of appearing as the top recommendation. This ability to manipulate LLM-generated search responses provides vendors with a considerable competitive advantage and has the potential to disrupt fair market competition. Just as search engine optimization (SEO) revolutionized how webpages are customized to rank higher in search engine results, influencing LLM recommendations could profoundly impact content optimization for AI-driven search services. Code for our experiments is available at https://github.com/aounon/llm-rank-optimizer.
Updated: 2024-09-02 21:29:04
Subjects: cs.IR,cs.AI,cs.CL
On the limits of neural network explainability via descrambling
We characterize the exact solutions to neural network descrambling--a mathematical model for explaining the fully connected layers of trained neural networks (NNs). By reformulating the problem to the minimization of the Brockett function arising in graph matching and complexity theory we show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights, leading to descrambled weight matrices. We show that in typical deep learning contexts these descramblers take diverse and interesting forms including (1) matching largest principal components with the lowest frequency modes of the Fourier basis for isotropic hidden data, (2) discovering the semantic development in two-layer linear NNs for signal recovery problems, and (3) explaining CNNs by optimally permuting the neurons. Our numerical experiments indicate that the eigendecompositions of the hidden layer data--now understood as the descramblers--can also reveal the layer's underlying transformation. These results illustrate that the SVD is more directly related to the explainability of NNs than previously thought and offers a promising avenue for discovering interpretable motifs for the hidden action of NNs, especially in contexts of operator learning or physics-informed NNs, where the input/output data has limited human readability.
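For reference, the Brockett cost from the graph-matching literature that the abstract alludes to is usually stated as a trace minimization over orthogonal matrices (standard form; the paper's exact setup may differ):

```latex
\min_{P \in O(n)} \; \operatorname{tr}\!\left( N\, P^{\top} M P \right),
```

for symmetric matrices M and N. Its minimizers align the eigenvectors of M against those of N, which is consistent with the abstract's observation that principal components of the hidden preactivations emerge as the optimal descramblers.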
Updated: 2024-09-02 21:17:39
Subjects: cs.LG,cs.NA,eess.SP,math.NA
PoliPrompt: A High-Performance Cost-Effective LLM-Based Text Classification Framework for Political Science
Recent advancements in large language models (LLMs) have opened new avenues for enhancing text classification efficiency in political science, surpassing traditional machine learning methods that often require extensive feature engineering, human labeling, and task-specific training. However, their effectiveness in achieving high classification accuracy remains questionable. This paper introduces a three-stage in-context learning approach that leverages LLMs to improve classification accuracy while minimizing experimental costs. Our method incorporates automatic enhanced prompt generation, adaptive exemplar selection, and a consensus mechanism that resolves discrepancies between two weaker LLMs, refined by an advanced LLM. We validate our approach using datasets from the BBC news reports, Kavanaugh Supreme Court confirmation, and 2018 election campaign ads. The results show significant improvements in classification F1 score (+0.36 for zero-shot classification) with manageable economic costs (-78% compared with human labeling), demonstrating that our method effectively addresses the limitations of traditional machine learning while offering a scalable and reliable solution for text analysis in political science.
Updated: 2024-09-02 21:05:31
Subjects: cs.CL,cs.AI
Stein transport for Bayesian inference
We introduce $\textit{Stein transport}$, a novel methodology for Bayesian inference designed to efficiently push an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can be derived either through a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map in the Stein geometry. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. While SVGD relies on convergence in the long-time limit, Stein transport reaches its posterior approximation at finite time $t=1$. Studying the mean-field limit, we discuss the errors incurred by regularisation and finite-particle effects, and we connect Stein transport to birth-death dynamics and Fisher-Rao gradient flows. In a series of experiments, we show that in comparison to SVGD, Stein transport not only often reaches more accurate posterior approximations with a significantly reduced computational budget, but that it also effectively mitigates the variance collapse phenomenon commonly observed in SVGD.
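For comparison, the SVGD update that Stein transport resembles is, in its standard form:

```latex
x_i \leftarrow x_i + \epsilon\, \phi(x_i),
\qquad
\phi(x) = \frac{1}{n}\sum_{j=1}^{n}
\Big[ k(x_j, x)\, \nabla_{x_j} \log \pi(x_j) + \nabla_{x_j} k(x_j, x) \Big],
```

with kernel k from the reproducing kernel Hilbert space and target density π. Per the abstract, Stein transport modifies this template with a time-varying score along the tempered curve and particle-specific weights, reaching its posterior approximation at finite time t = 1 rather than in the long-time limit.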
Updated: 2024-09-02 21:03:38
Subjects: stat.ML,cs.LG,cs.NA,math.NA,math.ST,stat.ME,stat.TH
INSTA-YOLO: Real-Time Instance Segmentation
Instance segmentation has recently gained huge attention in various computer vision applications. It aims at providing different IDs to different objects of the scene, even if they belong to the same class. This is useful in various scenarios, especially in occlusions. Instance segmentation is usually performed as a two-stage pipeline: first, an object is detected, then semantic segmentation is performed within the detected box area. This process involves costly up-sampling, especially for the segmentation part. Moreover, for some applications, such as LiDAR point clouds and aerial object detection, it is often required to predict oriented boxes, which adds extra complexity to the two-stage pipeline. In this paper, we propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation. The proposed model is inspired by the YOLO one-shot object detector, with the box regression loss replaced by polynomial regression in the localization head. This modification enables us to skip the segmentation up-sampling decoder altogether and produce the instance segmentation contour from the polynomial output coefficients. In addition, this architecture is a natural fit for oriented objects. We evaluate our model on three datasets, namely Carvana, Cityscapes, and Airbus. The results show our model achieves competitive accuracy in terms of mAP with a significant 2x speed improvement on a GTX-1080 GPU.
Updated: 2024-09-02 20:56:32
Subjects: cs.CV,cs.LG
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.
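The merge operation MAP searches over is the standard weighted average in parameter space; a minimal sketch follows, with the quadratic surrogate fitting and Pareto search omitted. The function and variable names are illustrative.

```python
# Weighted-average model merging over state dicts, one scaling
# coefficient per fine-tuned checkpoint.
import torch

def merge(state_dicts, coeffs):
    # state_dicts: list of {name: tensor}; coeffs: one weight per model
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(c * sd[name]
                           for c, sd in zip(coeffs, state_dicts))
    return merged

# MAP-style use: evaluate a pre-selected set of coefficient vectors,
# fit a quadratic surrogate of the task metrics over the coefficients,
# then search that surrogate for the Pareto set instead of re-evaluating.
```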
Updated: 2024-09-02 20:42:08
Subjects: cs.LG
RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance
Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, resulting in enormous model sizes, which makes them unsuitable for deployment on resource-constrained devices. Block-wise generation can be a promising alternative for designing compact-sized (parameter-efficient) deep generative models, since the model can generate one block at a time instead of generating the whole image at once. However, block-wise generation is also considerably challenging because ensuring coherence across generated blocks can be non-trivial. To this end, we design a retrieval-augmented generation (RAG) approach and leverage the corresponding blocks of the images retrieved by the RAG module to condition the training and generation stages of a block-wise denoising diffusion model. Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation. While we showcase our approach using the latent diffusion model (LDM) as the base model, it can be used with other variants of denoising diffusion models. We validate that the proposed approach solves the coherence problem through substantive experiments demonstrating its effectiveness in terms of compact model size and excellent generation quality.
Updated: 2024-09-02 20:33:49
Subjects: cs.CV,cs.LG
On the Impacts of Contexts on Repository-Level Code Generation
CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored. Our work underscores the critical importance of leveraging repository-level contexts to generate executable and functionally correct code. We present RepoExec, a novel benchmark designed to evaluate repository-level code generation, with a focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts. Our study examines a controlled scenario where developers specify essential code dependencies (contexts), challenging models to integrate them effectively. Additionally, we introduce an instruction-tuned dataset that enhances CodeLLMs' ability to leverage dependencies, along with a new metric, the Dependency Invocation Rate (DIR), to quantify context utilization. Experimental results reveal that while pretrained LLMs demonstrate superior performance in terms of correctness, instruction-tuned models excel in context utilization and debugging capabilities. RepoExec offers a comprehensive evaluation framework for assessing code functionality and alignment with developer intent, thereby advancing the development of more reliable CodeLLMs for real-world applications. The dataset and source code are available at https://github.com/FSoft-AI4Code/RepoExec.
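As an illustration of what a Dependency Invocation Rate could compute, the sketch below counts the fraction of specified dependencies that the generated code invokes. This regex-based stand-in is an assumption; the benchmark's real matching rules live in the RepoExec repository.

```python
# Illustrative DIR: fraction of the context-specified dependencies
# that the generated code actually calls (name followed by "(").
import re

def dependency_invocation_rate(generated_code: str,
                               dependencies: list[str]) -> float:
    if not dependencies:
        return 0.0
    invoked = sum(
        1 for dep in dependencies
        if re.search(rf"\b{re.escape(dep)}\s*\(", generated_code))
    return invoked / len(dependencies)
```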
Updated: 2024-09-02 20:26:26
Subjects: cs.SE,cs.AI
Eliciting Informative Text Evaluations with Large Language Models
Peer prediction mechanisms motivate high-quality feedback with provable guarantees. However, current methods only apply to rather simple reports, like multiple-choice or scalar numbers. We aim to broaden these techniques to the larger domain of text-based reports, drawing on the recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms, as textual feedback is the norm in a large variety of feedback channels: peer reviews, e-commerce customer reviews, and comments on social media. We introduce two mechanisms, the Generative Peer Prediction Mechanism (GPPM) and the Generative Synopsis Peer Prediction Mechanism (GSPPM). These mechanisms utilize LLMs as predictors, mapping from one agent's report to a prediction of her peer's report. Theoretically, we show that when the LLM prediction is sufficiently accurate, our mechanisms can incentivize high effort and truth-telling as an (approximate) Bayesian Nash equilibrium. Empirically, we confirm the efficacy of our mechanisms through experiments conducted on two real datasets: the Yelp review dataset and the ICLR OpenReview dataset. We highlight that, on the ICLR dataset, our mechanisms can differentiate three quality levels -- human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews -- in terms of expected scores. Additionally, GSPPM penalizes LLM-generated reviews more effectively than GPPM.
Updated: 2024-09-02 20:25:36
Subjects: cs.CL,cs.AI,cs.GT
Uplift Modeling Under Limited Supervision
Estimating causal effects in e-commerce tends to involve costly treatment assignments which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work we propose a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test varying message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with extremely low experimental budget. The framework is flexible since each step can be used separately with other models or treatment policies. The experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underlining the need for models that can generalize with limited supervision to reduce experimental risks.
Updated: 2024-09-02 20:21:29
Subjects: cs.LG,cs.AI,stat.ME
Real-Time Recurrent Learning using Trace Units in Reinforcement Learning
Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments. For agents that learn online and continually interact with the environment, it is desirable to train RNNs with real-time recurrent learning (RTRL); unfortunately, RTRL is prohibitively expensive for standard RNNs. A promising direction is to use linear recurrent architectures (LRUs), where dense recurrent weights are replaced with a complex-valued diagonal, making RTRL efficient. In this work, we build on these insights to provide a lightweight but effective approach for training RNNs in online RL. We introduce Recurrent Trace Units (RTUs), a small modification on LRUs that we nonetheless find to have significant performance benefits over LRUs when trained with RTRL. We find RTUs significantly outperform other recurrent architectures across several partially observable environments while using significantly less computation.
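The structural trick that makes RTRL cheap here is a complex diagonal transition, h_t = λ ⊙ h_{t−1} + B x_t, so each hidden unit's sensitivity to its own λ can be tracked independently. A minimal sketch of such a linear recurrent unit follows; the parameterization details and the RTU modification itself are the paper's, and the stability parameterization below is an assumption.

```python
# Complex-diagonal linear recurrence (LRU-style); |lambda| < 1 is
# enforced via exp(-exp(log_mag)), a common stability trick.
import torch

class DiagonalLinearRecurrence(torch.nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.log_mag = torch.nn.Parameter(torch.randn(d_hidden) * 0.1)
        self.phase = torch.nn.Parameter(torch.rand(d_hidden) * 6.28)
        self.B = torch.nn.Parameter(
            torch.randn(d_in, d_hidden, dtype=torch.cfloat) * 0.1)

    def forward(self, x):                       # x: [T, d_in]
        lam = torch.exp(-torch.exp(self.log_mag) + 1j * self.phase)
        h = torch.zeros(self.B.shape[1], dtype=torch.cfloat)
        outs = []
        for x_t in x:                           # sequential scan
            h = lam * h + x_t.to(torch.cfloat) @ self.B
            outs.append(h.real)
        return torch.stack(outs)
```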
Updated: 2024-09-02 20:08:23
Subjects: cs.LG,cs.AI
FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
Real-life applications of action recognition often require a fine-grained understanding of subtle movements, e.g., in sports analytics, user interactions in AR/VR, and surgical videos. Although fine-grained actions are more costly to annotate, existing semi-supervised action recognition has mainly focused on coarse-grained action recognition. Since fine-grained actions are more challenging due to the absence of scene bias, classifying these actions requires an understanding of action-phases. Hence, existing coarse-grained semi-supervised methods do not work effectively. In this work, we for the first time thoroughly investigate semi-supervised fine-grained action recognition (FGAR). We observe that alignment distances like dynamic time warping (DTW) provide a suitable action-phase-aware measure for comparing fine-grained actions, a concept previously unexploited in FGAR. However, since regular DTW distance is pairwise and assumes strict alignment between pairs, it is not directly suitable for classifying fine-grained actions. To utilize such alignment distances in a limited-label setting, we propose an Alignability-Verification-based Metric learning technique to effectively discriminate between fine-grained action pairs. Our learnable alignability score provides a better phase-aware measure, which we use to refine the pseudo-labels of the primary video encoder. Our collaborative pseudo-labeling-based framework FinePseudo significantly outperforms prior methods on four fine-grained action recognition datasets: Diving48, FineGym99, FineGym288, and FineDiving, and shows improvement on existing coarse-grained datasets: Kinetics400 and Something-SomethingV2. We also demonstrate the robustness of our collaborative pseudo-labeling in handling novel unlabeled classes in open-world semi-supervised setups. Project Page: https://daveishan.github.io/finepsuedo-webpage/.
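Plain DTW, the pairwise phase-aware distance the paper starts from, can be computed as below; the paper's learnable alignability score then relaxes DTW's strict-alignment assumption.

```python
# Dynamic time warping between two per-frame feature sequences.
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    # a: [T1, D], b: [T2, D] frame-level features
    T1, T2 = len(a), len(b)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # best of insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[T1, T2])
```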
Updated: 2024-09-02 20:08:06
Subjects: cs.CV,cs.LG
A multi-language toolkit for supporting automated checking of research outputs
This article presents acro, a package for the automatic checking of research outputs, which assists researchers and data governance teams by automatically applying best-practice, principles-based statistical disclosure control (SDC) techniques on the fly as researchers conduct their analyses. acro distinguishes between: research output that is safe to publish; output that requires further analysis; and output that cannot be published because it creates substantial risk of disclosing private data. This is achieved through the use of a lightweight Python wrapper that sits over well-known analysis tools that produce outputs such as tables, plots, and statistical models. This adds functionality to (i) identify potentially disclosive outputs against a range of commonly used disclosure tests; (ii) apply disclosure mitigation strategies where required; (iii) report reasons for applying SDC; and (iv) produce simple summary documents that trusted research environment staff can use to streamline their workflow. The major analytical programming languages used by researchers are supported: Python, R, and Stata. The acro code and documentation are available under an MIT license at https://github.com/AI-SDC/ACRO
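A hypothetical usage sketch of the wrapper pattern described above follows; the method names reflect our reading of the project README (pandas-like calls plus a finalise step) and should be confirmed against the linked repository before relying on them.

```python
# Hypothetical acro session: pandas-like calls with on-the-fly SDC checks.
import pandas as pd
from acro import ACRO  # https://github.com/AI-SDC/ACRO

df = pd.read_csv("survey.csv")        # hypothetical research dataset
session = ACRO()

# Same call shape as pandas.crosstab, but the SDC checks run on the fly
# and the output is flagged as safe / needs-review / unsafe.
table = session.crosstab(df["region"], df["grant_type"])

# Write the collected outputs plus the SDC reasoning for reviewers.
session.finalise("outputs", "json")
```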
Updated: 2024-09-02 20:06:21
Subjects: cs.CR,cs.IR,cs.SE,stat.AP,stat.ME
Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification
In the landscape-aware algorithm selection problem, the effectiveness of feature-based predictive models strongly depends on the representativeness of training data for practical applications. In this work, we investigate the potential of randomly generated functions (RGF) for model training, which cover a much more diverse set of optimization problem classes compared to the widely used black-box optimization benchmarking (BBOB) suite. Correspondingly, we focus on automated algorithm configuration (AAC), that is, selecting the best-suited algorithm and fine-tuning its hyperparameters based on the landscape features of problem instances. Precisely, we analyze the performance of dense neural network (NN) models in handling the multi-output mixed regression and classification tasks using different training data sets, such as RGF and many-affine BBOB (MA-BBOB) functions. Based on our results on the BBOB functions in 5d and 20d, near-optimal configurations can be identified using the proposed approach, which can most of the time outperform the off-the-shelf default configuration considered by practitioners with limited knowledge about AAC. Furthermore, the predicted configurations are competitive against the single best solver in many cases. Overall, configurations with better performance can be best identified by using NN models trained on a combination of RGF and MA-BBOB functions.
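A minimal mixed-output network of the kind described, with a shared dense trunk over landscape features, a classification head for the algorithm choice, and a regression head for its hyperparameters, might look like this; the layer sizes are illustrative, not the paper's configuration.

```python
# Shared trunk with one classification and one regression head,
# for mixed multi-output prediction from landscape features.
import torch.nn as nn

class AACNet(nn.Module):
    def __init__(self, n_features: int, n_algorithms: int, n_hparams: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                                   nn.Linear(128, 128), nn.ReLU())
        self.cls_head = nn.Linear(128, n_algorithms)  # which solver
        self.reg_head = nn.Linear(128, n_hparams)     # its hyperparameters

    def forward(self, x):
        h = self.trunk(x)
        return self.cls_head(h), self.reg_head(h)
```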
Updated: 2024-09-02 20:04:41
标题: 考虑景观特征的自动化算法配置:使用多输出混合回归和分类
摘要: 在具有景观感知的算法选择问题中,基于特征的预测模型的有效性强烈依赖于训练数据对实际应用的代表性。在这项工作中,我们研究了随机生成函数(RGF)在模型训练中的潜力,与广泛使用的黑盒优化基准测试(BBOB)套件相比,RGF覆盖了更多种类的优化问题类。相应地,我们专注于自动算法配置(AAC),即根据问题实例的景观特征选择最合适的算法并微调其超参数。准确地说,我们分析了密集神经网络(NN)模型在处理多输出混合回归和分类任务时使用不同训练数据集(如RGF和多仿射BBOB(MA-BBOB)函数)的性能。根据我们在5d和20d上BBOB函数的结果,可以使用提出的方法识别接近最佳配置,这在大多数情况下能够胜过对AAC了解有限的从业者考虑的即插即用默认配置。此外,在许多情况下,预测的配置与单个最佳求解器竞争。总的来说,通过使用在RGF和MA-BBOB函数组合上训练的NN模型,可以最佳地识别性能更好的配置。
更新时间: 2024-09-02 20:04:41
领域: cs.LG,cs.NE
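As a rough illustration of the multi-output mixed regression and classification setup described above, the following PyTorch sketch pairs a classification head (algorithm selection) with a regression head (hyperparameter prediction) over a shared dense trunk; layer widths, feature counts, and the random data are invented for the example.

```python
# A minimal sketch of a dense NN for mixed multi-output AAC: one head classifies
# the algorithm, another regresses its hyperparameters from landscape features.
import torch
import torch.nn as nn

class AACNet(nn.Module):
    def __init__(self, n_features=46, n_algorithms=5, n_hyperparams=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.cls_head = nn.Linear(128, n_algorithms)   # which solver to run
        self.reg_head = nn.Linear(128, n_hyperparams)  # how to configure it

    def forward(self, x):
        h = self.trunk(x)
        return self.cls_head(h), self.reg_head(h)

model = AACNet()
x = torch.randn(32, 46)            # batch of landscape feature vectors
algo = torch.randint(0, 5, (32,))  # best algorithm per instance
hyper = torch.randn(32, 3)         # its tuned hyperparameters
logits, params = model(x)
loss = nn.CrossEntropyLoss()(logits, algo) + nn.MSELoss()(params, hyper)
loss.backward()  # joint training of both heads through the shared trunk
```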
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos. Such methods could benefit various video editing, processing, and understanding tasks. However, existing approaches operate under the restrictive assumption that a suitable video pair for alignment is given, significantly limiting their broader applicability. To address this, we re-pose temporal alignment as a search problem and introduce the task of Alignable Video Retrieval (AVR). Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query. To achieve this, we make three key contributions: 1) we introduce DRAQ, a video alignability indicator to identify and re-rank the best alignable video from a set of candidates; 2) we propose an effective and generalizable frame-level video feature design to improve the alignment performance of several off-the-shelf feature representations, and 3) we propose a novel benchmark and evaluation protocol for AVR using cycle-consistency metrics. Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach in identifying alignable video pairs from diverse datasets. Project Page: https://daveishan.github.io/avr-webpage/.
Updated: 2024-09-02 20:00:49
标题: 从海洋同步:从大规模数据集中检索可对齐的视频
摘要: 时间视频对齐旨在同步两个视频中的关键事件,如对象交互或动作阶段转换。这类方法可以使各种视频编辑、处理和理解任务受益。然而,现有方法在一个限制性假设下运作,即假定已给定一个适合对齐的视频对,这显著限制了它们的广泛适用性。为了解决这个问题,我们将时间对齐重新表述为一个搜索问题,并引入Alignable Video Retrieval (AVR)任务。给定一个查询视频,我们的方法可以从大量剪辑中识别出易对齐的视频,并将它们与查询视频进行时间同步。为了实现这一目标,我们做出了三个关键贡献:1)我们引入了DRAQ,一个视频可对齐性指标,用于从一组候选视频中识别并重新排序最佳可对齐视频;2)我们提出了一种有效且可泛化的帧级视频特征设计,以提高几种现成特征表示的对齐性能;3)我们提出了一种新的基准和评估协议,使用循环一致性度量来评估AVR。我们在包括大规模Kinetics700在内的3个数据集上的实验表明,我们的方法能够有效地从多样化的数据集中识别可对齐的视频对。项目网页:https://daveishan.github.io/avr-webpage/。
更新时间: 2024-09-02 20:00:49
领域: cs.CV,cs.IR,cs.LG
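A toy sketch of how an alignability score could be built from cycle consistency over frame-level features; the scoring rule below is a stand-in for the paper's DRAQ indicator, and the random features are placeholders for a real frame encoder.

```python
# A rough sketch of alignability scoring via nearest-neighbour cycle consistency
# over per-frame features; retrieval then ranks a gallery of clips by this score.
import numpy as np

def cycle_consistency(query, candidate, tol=1):
    """Fraction of query frames that map to the candidate and back to (near) themselves."""
    # query, candidate: (T, D) arrays of per-frame features
    d_qc = np.linalg.norm(query[:, None] - candidate[None], axis=-1)  # (Tq, Tc)
    to_cand = d_qc.argmin(axis=1)    # query frame -> closest candidate frame
    to_query = d_qc.argmin(axis=0)   # candidate frame -> closest query frame
    back = to_query[to_cand]         # round trip
    return np.mean(np.abs(back - np.arange(len(query))) <= tol)

rng = np.random.default_rng(0)
query = rng.normal(size=(40, 16))
good = query + 0.05 * rng.normal(size=(40, 16))  # a well-alignable clip
bad = rng.normal(size=(40, 16))                  # an unrelated clip
print(cycle_consistency(query, good), cycle_consistency(query, bad))
```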
Kvasir-VQA: A Text-Image Pair GI Tract Dataset
We introduce Kvasir-VQA, an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations to facilitate advanced machine learning tasks in Gastrointestinal (GI) diagnostics. This dataset comprises 6,500 annotated images spanning various GI tract conditions and surgical instruments, and it supports multiple question types including yes/no, choice, location, and numerical count. The dataset is intended for applications such as image captioning, Visual Question Answering (VQA), text-based generation of synthetic medical images, object detection, and classification. Our experiments demonstrate the dataset's effectiveness in training models for three selected tasks, showcasing significant applications in medical image analysis and diagnostics. We also present evaluation metrics for each task, highlighting the usability and versatility of our dataset. The dataset and supporting artifacts are available at https://datasets.simula.no/kvasir-vqa.
Updated: 2024-09-02 19:41:59
标题: Kvasir-VQA:一个文本-图像对胃肠道数据集
摘要: 我们介绍了Kvasir-VQA,这是从HyperKvasir和Kvasir-Instrument数据集衍生出来的一个扩展数据集,增加了问答注释,以促进胃肠道(GI)诊断中的高级机器学习任务。该数据集包括6,500张注释图像,涵盖各种胃肠道状况和外科器械,并支持多种问题类型,包括是/否、选择、位置和数值计数。该数据集旨在用于图像描述、视觉问答(VQA)、基于文本的合成医学图像生成、目标检测和分类等应用。我们的实验证明了该数据集在为三个选定任务训练模型方面的有效性,展示了其在医学图像分析和诊断中的重要应用。我们还提供了每个任务的评估指标,突出了我们数据集的可用性和多功能性。数据集和配套材料可在https://datasets.simula.no/kvasir-vqa获取。
更新时间: 2024-09-02 19:41:59
领域: cs.CV,cs.AI
Efficient and Scalable Estimation of Tool Representations in Vector Space
Recent advancements in function calling and tool use have significantly enhanced the capabilities of large language models (LLMs) by enabling them to interact with external information sources and execute complex tasks. However, the limited context window of LLMs presents challenges when a large number of tools are available, necessitating efficient methods to manage prompt length and maintain accuracy. Existing approaches, such as fine-tuning LLMs or leveraging their reasoning capabilities, either require frequent retraining or incur significant latency overhead. A more efficient solution involves training smaller models to retrieve the most relevant tools for a given query, although this requires high quality, domain-specific data. To address those challenges, we present a novel framework for generating synthetic data for tool retrieval applications and an efficient data-driven tool retrieval strategy using small encoder models. Empowered by LLMs, we create ToolBank, a new tool retrieval dataset that reflects real human user usages. For tool retrieval methodologies, we propose novel approaches: (1) Tool2Vec: usage-driven tool embedding generation for tool retrieval, (2) ToolRefiner: a staged retrieval method that iteratively improves the quality of retrieved tools, and (3) MLC: framing tool retrieval as a multi-label classification problem. With these new methods, we achieve improvements of up to 27.28 in Recall@K on the ToolBench dataset and 30.5 in Recall@K on ToolBank. Additionally, we present further experimental results to rigorously validate our methods. Our code is available at \url{https://github.com/SqueezeAILab/Tool2Vec}
Updated: 2024-09-02 19:39:24
标题: 工具表示在向量空间中的高效可扩展估计
摘要: 最近在函数调用和工具使用方面取得的进展显著地增强了大型语言模型(LLMs)的能力,使它们能够与外部信息源互动并执行复杂任务。然而,LLMs的有限上下文窗口在大量工具可用时存在挑战,需要有效管理提示长度并保持准确性。现有方法,如微调LLMs或利用它们的推理能力,要么需要频繁的重新训练,要么会产生显著的延迟开销。更有效的解决方案涉及训练较小的模型来检索给定查询的最相关工具,尽管这需要高质量的领域特定数据。为了解决这些挑战,我们提出了一个新颖的框架,用于生成工具检索应用的合成数据,并使用小型编码器模型实现高效的数据驱动工具检索策略。在LLMs的支持下,我们创建了ToolBank,这是一个反映真实人类用户用法的新工具检索数据集。对于工具检索方法,我们提出了新颖的方法:(1)Tool2Vec:用于工具检索的基于使用的工具嵌入生成,(2)ToolRefiner:一种分阶段的检索方法,可逐步提高检索到的工具的质量,以及(3)MLC:将工具检索作为多标签分类问题。通过这些新方法,我们在ToolBench数据集上的Recall@K方面实现了高达27.28的改进,并在ToolBank上的Recall@K方面实现了30.5的改进。此外,我们提供了更多实验结果来严格验证我们的方法。我们的代码可在\url{https://github.com/SqueezeAILab/Tool2Vec}上获得。
更新时间: 2024-09-02 19:39:24
领域: cs.LG,cs.AI,cs.CL
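The usage-driven embedding idea behind Tool2Vec can be sketched as follows: each tool's vector is the mean embedding of queries in which it was used, and retrieval is nearest-neighbour search over those centroids. The hash-based encoder and the tool/query data here are illustrative stand-ins for a trained small encoder and real usage logs.

```python
# A toy sketch of usage-driven tool retrieval: tool vectors are centroids of
# usage-query embeddings; retrieval is cosine nearest-neighbour search.
import zlib
import numpy as np

def embed(text, dim=64):
    # Stable bag-of-words hashing; a stand-in for a trained encoder model.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

usage_logs = {
    "weather_api": ["what is the forecast in Paris", "will it rain tomorrow"],
    "calculator": ["what is 17 times 23", "compute the square root of 2"],
    "calendar": ["schedule a meeting on Friday", "when is my next appointment"],
}
tool_vecs = {t: np.mean([embed(q) for q in qs], axis=0) for t, qs in usage_logs.items()}

def retrieve(query, k=2):
    q = embed(query)
    scores = {t: float(q @ v) for t, v in tool_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(retrieve("multiply 8 by 9"))  # retrieved tools then go into the LLM prompt
```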
Self-Supervised Learning for Identifying Defects in Sewer Footage
Sewerage infrastructure is among the most expensive modern investments requiring time-intensive manual inspections by qualified personnel. Our study addresses the need for automated solutions without relying on large amounts of labeled data. We propose a novel application of Self-Supervised Learning (SSL) for sewer inspection that offers a scalable and cost-effective solution for defect detection. We achieve competitive results with a model that is at least 5 times smaller than other approaches found in the literature and obtain competitive performance with 10\% of the available data when training with a larger architecture. Our findings highlight the potential of SSL to revolutionize sewer maintenance in resource-limited settings.
Updated: 2024-09-02 19:28:48
标题: 自监督学习用于识别下水道录像中的缺陷
摘要: 污水管道基础设施是最昂贵的现代投资之一,需要由合格人员进行耗时的人工检查。我们的研究致力于在不依赖大量标注数据的情况下实现自动化解决方案。我们提出了将自监督学习(SSL)新颖地应用于污水管道检查,为缺陷检测提供了可扩展且具有成本效益的解决方案。我们的模型至少比文献中的其他方法小5倍即可取得有竞争力的结果,且在用更大的架构训练时,仅用10\%的可用数据即可获得有竞争力的表现。我们的研究结果突显了SSL在资源有限的环境中革新污水管道维护的潜力。
更新时间: 2024-09-02 19:28:48
领域: cs.CV,cs.AI,cs.LG
Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation
Federated Learning (FL) enables multiple clients to collaboratively train a model without sharing their local data. Yet the FL system is vulnerable to well-designed Byzantine attacks, which aim to disrupt the model training process by uploading malicious model updates. Existing robust aggregation rule-based defense methods overlook the diversity of magnitude and direction across different layers of the model updates, resulting in limited robustness performance, particularly in non-IID settings. To address these challenges, we propose the Layer-Adaptive Sparsified Model Aggregation (LASA) approach, which combines pre-aggregation sparsification with layer-wise adaptive aggregation to improve robustness. Specifically, LASA includes a pre-aggregation sparsification module that sparsifies updates from each client before aggregation, reducing the impact of malicious parameters and minimizing the interference from less important parameters for the subsequent filtering process. Based on sparsified updates, a layer-wise adaptive filter then adaptively selects benign layers using both magnitude and direction metrics across all clients for aggregation. We provide the detailed theoretical robustness analysis of LASA and the resilience analysis for the FL integrated with LASA. Extensive experiments are conducted on various IID and non-IID datasets. The numerical results demonstrate the effectiveness of LASA. Code is available at \url{https://github.com/JiiahaoXU/LASA}.
Updated: 2024-09-02 19:28:35
标题: 通过层自适应稀疏模型聚合实现拜占庭容错的联邦学习
摘要: Federated Learning(FL)使多个客户端能够协作训练模型,而无需共享他们的本地数据。然而,FL系统容易受到设计精良的拜占庭攻击的影响,这些攻击旨在通过上传恶意模型更新来破坏模型训练过程。现有的基于鲁棒聚合规则的防御方法忽视了模型更新在不同层之间的幅度和方向的多样性,导致鲁棒性性能有限,特别是在非IID设置中。为了解决这些挑战,我们提出了Layer-Adaptive Sparsified Model Aggregation(LASA)方法,该方法将预聚合稀疏化与分层自适应聚合相结合以提高鲁棒性。具体而言,LASA包括一个预聚合稀疏化模块,该模块在聚合之前对每个客户端的更新进行稀疏化处理,从而减少恶意参数的影响,并最小化对后续过滤过程中不太重要参数的干扰。基于稀疏化的更新,一个分层自适应过滤器然后根据所有客户端的幅度和方向度量来自适应地选择良性层进行聚合。我们提供了LASA的详细的理论鲁棒性分析以及集成LASA的FL的恢复能力分析。我们在各种IID和非IID数据集上进行了大量实验。数值结果表明LASA的有效性。代码可在\url{https://github.com/JiiahaoXU/LASA}获取。
更新时间: 2024-09-02 19:28:35
领域: cs.LG,cs.CR,cs.DC
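A simplified sketch of the two LASA stages described above, under invented thresholds: per-layer magnitude sparsification of each client's update, followed by layer-wise filtering against the across-client median using both direction (cosine) and magnitude checks.

```python
# A toy sketch of layer-adaptive sparsified aggregation; keep ratios, the cosine
# tolerance, and the norm cut-off are illustrative, not the paper's settings.
import numpy as np

def sparsify(update, keep=0.2):
    """Zero out all but the largest-magnitude fraction of entries, per layer."""
    out = {}
    for name, w in update.items():
        k = max(1, int(keep * w.size))
        thresh = np.sort(np.abs(w).ravel())[-k]
        out[name] = np.where(np.abs(w) >= thresh, w, 0.0)
    return out

def robust_aggregate(updates, cos_tol=0.0):
    sparse = [sparsify(u) for u in updates]
    agg = {}
    for name in sparse[0]:
        layers = np.stack([s[name].ravel() for s in sparse])
        ref = np.median(layers, axis=0)  # direction reference across clients
        cos = layers @ ref / (np.linalg.norm(layers, axis=1)
                              * np.linalg.norm(ref) + 1e-9)
        norms = np.linalg.norm(layers, axis=1)
        benign = (cos > cos_tol) & (norms < 3 * np.median(norms))
        if not benign.any():
            benign[:] = True  # fall back to plain averaging
        agg[name] = layers[benign].mean(axis=0).reshape(sparse[0][name].shape)
    return agg

clients = [{"fc": np.random.randn(4, 4)} for _ in range(5)]
clients.append({"fc": 50.0 * np.random.randn(4, 4)})  # a crude Byzantine client
print(np.linalg.norm(robust_aggregate(clients)["fc"]))
```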
Self-Directed Learning of Convex Labelings on Graphs
We study the problem of learning the clusters of a given graph in the self-directed learning setup. This learning setting is a variant of online learning, where rather than an adversary determining the sequence in which nodes are presented, the learner autonomously and adaptively selects them. While self-directed learning of Euclidean halfspaces, linear functions, and general abstract multi-class hypothesis classes was recently considered, no results previously existed specifically for self-directed node classification on graphs. In this paper, we address this problem, developing efficient algorithms for it. More specifically, we focus on the case of (geodesically) convex clusters, i.e., for every two nodes sharing the same label, all nodes on every shortest path between them also share the same label. In particular, we devise a polynomial-time algorithm that makes only $3(h(G)+1)^4 \ln n$ mistakes on graphs with two convex clusters, where $n$ is the total number of nodes and $h(G)$ is the Hadwiger number, i.e., the size of the largest clique minor of the graph $G$. We also show that our algorithm is robust to the case that clusters are slightly non-convex, still achieving a mistake bound logarithmic in $n$. Finally, for the more standard case of homophilic clusters, where strongly connected nodes tend to belong to the same class, we devise a simple and efficient algorithm.
Updated: 2024-09-02 19:13:26
标题: 图上凸标签的自主学习
摘要: 我们研究了在自主学习设置中学习给定图的聚类问题。这种学习设置是在线学习的一个变种:不是由对手决定节点呈现的顺序,而是由学习者自主且自适应地选择节点。虽然最近已有针对欧几里得半空间、线性函数和一般抽象多类假设类的自主学习研究,但以往没有专门针对图上自主节点分类的结果。在本文中,我们解决了这个问题,并为其开发了有效的算法。更具体地,我们关注(测地)凸聚类的情况,即对于共享相同标签的任意两个节点,它们之间每条最短路径上的所有节点也共享相同的标签。特别地,我们设计了一个多项式时间算法,在具有两个凸聚类的图上最多犯$3(h(G)+1)^4 \ln n$个错误,其中$n$是节点总数,$h(G)$是Hadwiger数,即图$G$的最大团minor(clique minor)的大小。我们还证明了我们的算法对聚类略微非凸的情况是鲁棒的,仍能达到关于$n$的对数级错误界。最后,对于更标准的同质性聚类情况,即强连接的节点倾向于属于同一类,我们设计了一个简单而高效的算法。
更新时间: 2024-09-02 19:13:26
领域: cs.LG,stat.ML
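The (geodesic) convexity condition above is easy to state in code: every node on every shortest path between two same-label nodes must carry that label. A brute-force check, feasible only for toy graphs, might look like this (networkx assumed):

```python
# A small sketch verifying geodesic convexity of a labelling by exhaustively
# enumerating shortest paths; exponential in the worst case, for illustration only.
import itertools
import networkx as nx

def is_convex_labelling(G, labels):
    for u, v in itertools.combinations(G.nodes, 2):
        if labels[u] != labels[v]:
            continue
        for path in nx.all_shortest_paths(G, u, v):
            if any(labels[w] != labels[u] for w in path):
                return False
    return True

G = nx.path_graph(6)  # 0-1-2-3-4-5
convex = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
broken = {0: "a", 1: "b", 2: "a", 3: "b", 4: "b", 5: "b"}
print(is_convex_labelling(G, convex), is_convex_labelling(G, broken))  # True False
```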
Into the Unknown: Self-Learning Large Language Models
We address the main problem of self-learning LLM: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through self-assessment of their own hallucinations. We introduce a concept called Point in the Unknown (PiU) to identify atomic knowledge unknown to a model, along with four methods for automatic PiUs identification, facilitating the creation of a self-learning loop that focuses exclusively on the absorption of currently unknown knowledge into the model. Additionally, we developed evaluation metrics to gauge an LLM's self-learning capability. Our experiments revealed that LLMs with at least 3B parameters that have undergone some instruction training would be able to perform self-learning well. We further proved the effectiveness of self-learning by comparing the performance of a model that has undergone self-learning to a model that has not. Our self-learning concept allows more efficient LLM updates and opens new perspectives for LLM knowledge exchange.
Updated: 2024-09-02 19:01:44
标题: 走向未知:自学习大型语言模型
摘要: 我们解决了自学习LLM的主要问题:即要学习什么。我们提出了一个自学习LLM框架,使LLM能够通过对自己的幻觉进行自我评估而独立学习先前未知的知识。我们引入了一个称为“未知点”(PiU)的概念,用于识别模型未知的原子知识,以及四种自动PiU识别方法,促进创建一个专注于将当前未知知识吸收到模型中的自学习循环。此外,我们开发了评估指标来衡量LLM的自学习能力。我们的实验表明,至少具有3B参数并经过一些指导训练的LLM能够很好地进行自学习。通过将经历自学习的模型与未经过自学习的模型的表现进行比较,我们进一步证明了自学习的有效性。我们的自学习概念使LLM更新更加高效,为LLM知识交流开辟了新的视角。
更新时间: 2024-09-02 19:01:44
领域: cs.AI
MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning
In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges on sample efficiency, primarily due to the complexity of extracting informative state representations from high-dimensional data. Previous methods such as contrastive-based approaches have made strides in improving sample efficiency but fall short in modeling the nuanced evolution of states. To address this, we introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking to explicitly model state evolution in visual RL. Specifically, we propose a self-supervised dual-component strategy that integrates (1) a graph construction of pixel-based observations for spatial-temporal masking, coupled with (2) a multi-level contrastive learning mechanism that enriches state representations by emphasizing temporal continuity and change of states. MOOSS advances the understanding of state dynamics by disrupting and learning from spatial-temporal correlations, which facilitates policy learning. Our comprehensive evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency, demonstrating the effectiveness of our method. Our code is released at https://github.com/jsun57/MOOSS.
Updated: 2024-09-02 18:57:53
标题: MOOSS:掩码增强的时间对比学习,用于视觉强化学习中的平滑状态演变
摘要: 在视觉强化学习(RL)中,从基于像素的观察中学习会带来显著的样本效率挑战,主要是由于从高维数据中提取信息状态表示的复杂性。先前的方法,如基于对比的方法,在改善样本效率方面取得了进展,但在建模状态的微妙演变方面表现不佳。为了解决这一问题,我们引入了MOOSS,这是一个新颖的框架,利用时间对比目标,并借助基于图的时空遮蔽,明确地对视觉RL中的状态演变进行建模。具体来说,我们提出了一种自监督的双组件策略,集成了(1)基于像素观察的图构建用于时空遮蔽,以及(2)一种多级对比学习机制,通过强调状态的时间连续性和变化来丰富状态表示。MOOSS通过破坏和学习空间-时间相关性,推动了对状态动态的理解,从而促进了策略学习。我们在多个连续和离散控制基准上进行了全面评估,结果显示MOOSS在样本效率方面优于先前的最先进的视觉RL方法,证明了我们方法的有效性。我们的代码已发布在https://github.com/jsun57/MOOSS。
更新时间: 2024-09-02 18:57:53
领域: cs.CV,cs.LG
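One way to read the temporal contrastive objective above is as an InfoNCE loss in which a state's embedding and its successor's form the positive pair; the sketch below shows only that core, omitting the paper's graph-based spatial-temporal masking and multi-level structure.

```python
# A condensed sketch of a temporal contrastive objective: embeddings at t and t+1
# are positives, all other states in the batch act as negatives.
import torch
import torch.nn.functional as F

def temporal_info_nce(z_t, z_next, temperature=0.1):
    # z_t, z_next: (B, D) embeddings of observations at t and t+1
    z_t = F.normalize(z_t, dim=1)
    z_next = F.normalize(z_next, dim=1)
    logits = z_t @ z_next.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(z_t.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

z_t = torch.randn(64, 128, requires_grad=True)
z_next = torch.randn(64, 128, requires_grad=True)
loss = temporal_info_nce(z_t, z_next)
loss.backward()  # would flow into the visual encoder in the full pipeline
```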
Erasure Coded Neural Network Inference via Fisher Averaging
Erasure-coded computing has been successfully used in cloud systems to reduce tail latency caused by factors such as straggling servers and heterogeneous traffic variations. A majority of cloud computing traffic now consists of inference on neural networks on shared resources where the response time of inference queries is also adversely affected by the same factors. However, current erasure coding techniques are largely focused on linear computations such as matrix-vector and matrix-matrix multiplications and hence do not work for the highly non-linear neural network functions. In this paper, we seek to design a method to code over neural networks, that is, given two or more neural network models, how to construct a coded model whose output is a linear combination of the outputs of the given neural networks. We formulate the problem as a KL barycenter problem and propose a practical algorithm COIN that leverages the diagonal Fisher information to create a coded model that approximately outputs the desired linear combination of outputs. We conduct experiments to perform erasure coding over neural networks trained on real-world vision datasets and show that the accuracy of the decoded outputs using COIN is significantly higher than other baselines while being extremely compute-efficient.
Updated: 2024-09-02 18:46:26
标题: 通过费舍尔平均实现擦除编码神经网络推理
摘要: 擦除编码计算已成功应用于云系统中,以减少由迟滞服务器和异构流量变化等因素引起的尾部延迟。现在,大多数云计算流量主要由在共享资源上进行的神经网络推理组成,推理查询的响应时间也受到相同因素的不利影响。然而,当前的擦除编码技术主要集中在线性计算(如矩阵-向量和矩阵-矩阵乘法)上,因此不适用于高度非线性的神经网络函数。在本文中,我们试图设计一种方法来对神经网络进行编码,即给定两个或更多神经网络模型,如何构建一个编码模型,其输出是给定神经网络输出的线性组合。我们将这个问题定义为一个KL质心问题,并提出了一个名为COIN的实用算法,利用对角费舍尔信息来创建一个编码模型,该模型近似输出所需的输出线性组合。我们进行了实验,对在真实视觉数据集上训练的神经网络进行擦除编码,并展示了使用COIN解码输出的准确性明显高于其他基线,同时计算效率极高。
更新时间: 2024-09-02 18:46:26
领域: cs.LG
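The diagonal-Fisher weighting can be sketched as follows: estimate each model's per-parameter Fisher information from squared gradients, then average parameters weighted by those estimates so the merged model approximately tracks a combination of the sources. Everything here, including model sizes and sample counts, is a toy stand-in for the paper's COIN procedure.

```python
# A toy sketch of diagonal-Fisher-weighted model merging.
import torch
import torch.nn as nn

def diagonal_fisher(model, data, loss_fn):
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            fisher[n] += p.grad ** 2  # squared-gradient Fisher estimate
    return fisher

def fisher_average(models, fishers, eps=1e-8):
    merged = {}
    for n, _ in models[0].named_parameters():
        num = sum(f[n] * dict(m.named_parameters())[n] for m, f in zip(models, fishers))
        den = sum(f[n] for f in fishers) + eps
        merged[n] = (num / den).detach()
    return merged

def make_model():
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

models = [make_model(), make_model()]
data = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(3)]
fishers = [diagonal_fisher(m, data, nn.CrossEntropyLoss()) for m in models]
coded = make_model()
coded.load_state_dict(fisher_average(models, fishers))  # the "coded" model
```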
Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching
Discovering Ordinary Differential Equations (ODEs) from trajectory data is a crucial task in AI-driven scientific discovery. Recent methods for symbolic discovery of ODEs primarily rely on fixed training datasets collected a-priori, often leading to suboptimal performance, as observed in our experiments in Figure 1. Inspired by active learning, we explore methods for querying informative trajectory data to evaluate predicted ODEs, where data are obtained by the specified initial conditions of the trajectory. Chaos theory indicates that small changes in the initial conditions of a dynamical system can result in vastly different trajectories, necessitating the maintenance of a large set of initial conditions of the trajectory. To address this challenge, we introduce Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching (APPS). Instead of directly selecting individual initial conditions, APPS first identifies an informative region and samples a batch of initial conditions within that region. Compared to traditional active learning methods, APPS eliminates the need for maintaining a large amount of data. Extensive experiments demonstrate that APPS consistently discovers more accurate ODE expressions than baseline methods using passively collected datasets.
Updated: 2024-09-02 18:24:39
标题: 通过相轨迹草图进行常微分方程的主动符号发现
摘要: 从轨迹数据中发现常微分方程(ODEs)是人工智能驱动的科学发现中的关键任务。最近用于ODE符号发现的方法主要依赖于预先收集的固定训练数据集,往往导致次优性能,正如我们在图1的实验中观察到的。受主动学习的启发,我们探索了查询信息性轨迹数据以评估预测ODE的方法,其中数据是通过指定轨迹的初始条件获得的。混沌理论表明,动力系统初始条件的微小变化可能导致截然不同的轨迹,这就需要维护大量的轨迹初始条件。为了解决这一挑战,我们引入了通过相轨迹草图进行常微分方程主动符号发现的方法(APPS)。APPS并不直接选择单个初始条件,而是首先识别一个信息丰富的区域,并在该区域内采样一批初始条件。与传统的主动学习方法相比,APPS消除了维护大量数据的需求。大量实验表明,APPS始终比使用被动收集数据集的基线方法发现更准确的ODE表达式。
更新时间: 2024-09-02 18:24:39
领域: cs.LG,cs.SC
Probabilistic Iterative Hard Thresholding for Sparse Learning
For statistical modeling wherein the data regime is unfavorable in terms of dimensionality relative to the sample size, finding hidden sparsity in the ground truth can be critical in formulating an accurate statistical model. The so-called "l0 norm" which counts the number of non-zero components in a vector, is a strong reliable mechanism of enforcing sparsity when incorporated into an optimization problem. However, in big data settings wherein noisy estimates of the gradient must be evaluated out of computational necessity, the literature is scant on methods that reliably converge. In this paper we present an approach towards solving expectation objective optimization problems with cardinality constraints. We prove convergence of the underlying stochastic process, and demonstrate the performance on two Machine Learning problems.
Updated: 2024-09-02 18:14:45
标题: 稀疏学习的概率迭代硬阈值化
摘要: 在统计建模中,当数据的维度相对于样本量不利时,发现真实模型(ground truth)中隐藏的稀疏性对于建立准确的统计模型至关重要。所谓的“l0范数”计算向量中非零分量的数量,当其被纳入优化问题时,是强制稀疏性的一种强有力且可靠的机制。然而,在必须出于计算需要而评估带噪声的梯度估计的大数据环境中,关于可靠收敛方法的文献很少。在本文中,我们提出了一种求解带基数约束的期望目标优化问题的方法。我们证明了底层随机过程的收敛性,并展示了其在两个机器学习问题上的性能。
更新时间: 2024-09-02 18:14:45
领域: math.OC,cs.LG
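A compact sketch of the underlying iteration, stochastic iterative hard thresholding for sparse linear regression: a minibatch gradient step followed by projection onto the set of s-sparse vectors. Step size, sparsity level, and the synthetic data are illustrative.

```python
# A sketch of stochastic iterative hard thresholding (IHT) for sparse regression.
import numpy as np

def hard_threshold(w, s):
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]  # keep the s largest-magnitude entries
    out[idx] = w[idx]
    return out

def stochastic_iht(X, y, s, lr=0.01, batch=32, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        i = rng.choice(len(y), size=batch, replace=False)
        grad = X[i].T @ (X[i] @ w - y[i]) / batch  # noisy gradient estimate
        w = hard_threshold(w - lr * grad, s)       # projection onto the "l0 ball"
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))
w_true = np.zeros(100); w_true[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.normal(size=500)
w_hat = stochastic_iht(X, y, s=3)
print(np.nonzero(w_hat)[0])  # ideally recovers indices 3, 17, 42
```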
Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design
We introduce the first, to our knowledge, rigorous approach that enables multi-agent networks to self-configure their communication topology to balance the trade-off between scalability and optimality during multi-agent planning. We are motivated by the future of ubiquitous collaborative autonomy where numerous distributed agents will be coordinating via agent-to-agent communication to execute complex tasks such as traffic monitoring, event detection, and environmental exploration. But the explosion of information in such large-scale networks currently curtails their deployment due to impractical decision times induced by the computational and communication requirements of the existing near-optimal coordination algorithms. To overcome this challenge, we present the AlterNAting COordination and Network-Design Algorithm (Anaconda), a scalable algorithm that also enjoys near-optimality guarantees. Subject to the agents' bandwidth constraints, Anaconda enables the agents to optimize their local communication neighborhoods such that the action-coordination approximation performance of the network is maximized. Compared to the state of the art, Anaconda is an anytime self-configurable algorithm that quantifies its suboptimality guarantee for any type of network, from fully disconnected to fully centralized, and that, for sparse networks, is one order faster in terms of decision speed. To develop the algorithm, we quantify the suboptimality cost due to decentralization, i.e., due to communication-minimal distributed coordination. We also employ tools inspired by the literature on multi-armed bandits and submodular maximization subject to cardinality constraints. We demonstrate Anaconda in simulated scenarios of area monitoring and compare it with a state-of-the-art algorithm.
Updated: 2024-09-02 18:11:33
标题: 性能感知的自配置多智能体网络:一种用于同时协调和网络设计的分布式次模块方法
摘要: 据我们所知,我们提出了第一个严格的方法,使多智能体网络能够自我配置其通信拓扑,在多智能体规划期间平衡可扩展性和最优性之间的权衡。我们的动机源于无处不在的协作自治的未来:届时众多分布式智能体将通过智能体间通信进行协调,以执行交通监控、事件检测和环境探索等复杂任务。但是,这类大规模网络中的信息爆炸目前限制了它们的部署,因为现有近似最优协调算法在计算和通信上的要求会导致不切实际的决策时间。为了克服这一挑战,我们提出了交替协调和网络设计算法(Anaconda),这是一种可扩展且具有近似最优性保证的算法。在智能体的带宽约束下,Anaconda使智能体能够优化其本地通信邻域,从而最大化网络的动作协调近似性能。与现有技术相比,Anaconda是一种随时(anytime)可自配置的算法,可以为从完全断开到完全集中的任何类型网络量化其次优性保证,并且对于稀疏网络,其决策速度快一个数量级。为了开发该算法,我们量化了由去中心化(即通信最少的分布式协调)导致的次优性代价。我们还采用了受多臂老虎机和带基数约束的次模最大化文献启发的工具。我们在区域监控的模拟场景中展示了Anaconda,并将其与一种最先进的算法进行了比较。
更新时间: 2024-09-02 18:11:33
领域: eess.SY,cs.AI,cs.MA,cs.RO,cs.SY,math.OC
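The submodular-maximization building block referenced above admits a short generic sketch: greedy selection under a cardinality constraint, which enjoys the classical (1 - 1/e) guarantee for monotone submodular objectives. The coverage utility below is a placeholder for an agent's action-coordination objective.

```python
# A generic greedy sketch for monotone submodular maximization with |S| <= k.
def greedy_max(candidates, utility, k):
    chosen = set()
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: utility(chosen | {c}) - utility(chosen))
        chosen.add(best)
    return chosen

# Toy coverage utility: each candidate sensor covers a set of cells.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5, 6}}
utility = lambda S: len(set().union(*(coverage[c] for c in S))) if S else 0
print(greedy_max(coverage.keys(), utility, k=2))  # e.g. {'a', 'd'}
```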
Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning
Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather than oriented towards any particular learning task. In this work, we present a formal model of DD, arguing that a precise characterization of the underlying optimization problem must specify the inference task associated with the application of interest. Without this task-specific focus, the DD problem is under-specified, and the selection of a DD algorithm for a particular task is merely heuristic. Our formalization reveals novel applications of DD across different modeling environments. We analyze existing DD methods through this broader lens, highlighting their strengths and limitations in terms of accuracy and faithfulness to optimal DD operation. Finally, we present numerical results for two case studies important in contemporary settings. Firstly, we address a critical challenge in medical data analysis: merging the knowledge from different datasets composed of intersecting, but not identical, sets of features, in order to construct a larger dataset in what is usually a small sample setting. Secondly, we consider out-of-distribution error across boundary conditions for physics-informed neural networks (PINNs), showing the potential for DD to provide more physically faithful data. By establishing this general formulation of DD, we aim to establish a new research paradigm by which DD can be understood and from which new DD techniques can arise.
Updated: 2024-09-02 18:11:15
标题: 基于第一性原理的数据集蒸馏:整合核心信息提取和有目的的学习
摘要: 数据集蒸馏(DD)是一种日益重要的技术,它专注于构建一个能够捕捉训练数据核心信息的合成数据集,使得在后者上训练的模型能达到可比的性能。虽然DD应用广泛,但支撑它的理论发展较少。新的DD方法是在一组共同的基准上进行比较的,而不是面向任何特定的学习任务。在这项工作中,我们提出了DD的一个形式化模型,并指出对底层优化问题的精确刻画必须指明与所关注应用相关联的推理任务。缺乏这种面向任务的关注,DD问题就是欠定的,而为特定任务选择DD算法也只是一种启发式做法。我们的形式化揭示了DD在不同建模环境中的新应用。我们通过这一更广阔的视角分析现有的DD方法,强调它们在准确性和对最优DD操作的忠实度方面的优势与局限。最后,我们给出了两个在当代环境中十分重要的案例研究的数值结果。首先,我们处理医疗数据分析中的一个关键挑战:合并由相交但不完全相同的特征集构成的不同数据集的知识,以便在通常为小样本的情形下构建更大的数据集。其次,我们考察物理信息神经网络(PINNs)在不同边界条件下的分布外误差,展示了DD提供更符合物理规律的数据的潜力。通过建立DD的这种一般化表述,我们旨在确立一个新的研究范式,借此理解DD并催生新的DD技术。
更新时间: 2024-09-02 18:11:15
领域: cs.LG,stat.CO
$\mathtt{emuflow}$: Normalising Flows for Joint Cosmological Analysis
Given the growth in the variety and precision of astronomical datasets of interest for cosmology, the best cosmological constraints are invariably obtained by combining data from different experiments. At the likelihood level, one complication in doing so is the need to marginalise over large-dimensional parameter models describing the data of each experiment. These include both the relatively small number of cosmological parameters of interest and a large number of "nuisance" parameters. Sampling over the joint parameter space for multiple experiments can thus become a very computationally expensive operation. This can be significantly simplified if one could sample directly from the marginal cosmological posterior distribution of preceding experiments, depending only on the common set of cosmological parameters. In this paper, we show that this can be achieved by emulating marginal posterior distributions via normalising flows. The resulting trained normalising flow models can be used to efficiently combine cosmological constraints from independent datasets without increasing the dimensionality of the parameter space under study. We show that the method is able to accurately describe the posterior distribution of real cosmological datasets, as well as the joint distribution of different datasets, even when significant tension exists between experiments. The resulting joint constraints can be obtained in a fraction of the time it would take to combine the same datasets at the level of their likelihoods. We construct normalising flow models for a set of public cosmological datasets of general interests and make them available, together with the software used to train them, and to exploit them in cosmological parameter inference.
Updated: 2024-09-02 18:04:14
标题: $\mathtt{emuflow}$:用于联合宇宙学分析的归一化流
摘要: 鉴于宇宙学相关天文数据集在种类和精度上的不断增长,最佳的宇宙学约束总是通过结合不同实验的数据获得的。在似然层面上,这样做的一个复杂之处在于需要对描述每个实验数据的高维参数模型进行边缘化。这些参数既包括数量相对较少的、我们感兴趣的宇宙学参数,也包括大量的“冗余(nuisance)”参数。因此,对多个实验的联合参数空间进行采样可能成为计算开销非常大的操作。如果能够仅依赖共同的宇宙学参数集,直接从先前实验的边缘宇宙学后验分布中采样,这一过程就可以大大简化。在本文中,我们展示了可以通过用归一化流模拟边缘后验分布来实现这一点。训练得到的归一化流模型可以用来高效地结合独立数据集的宇宙学约束,而不增加所研究参数空间的维度。我们证明该方法能够准确描述真实宇宙学数据集的后验分布,以及不同数据集的联合分布,即使实验之间存在显著张力。由此得到的联合约束所需的时间,只是直接在似然层面结合相同数据集所需时间的一小部分。我们为一组公众普遍关注的公开宇宙学数据集构建了归一化流模型,并将它们连同用于训练和在宇宙学参数推断中使用它们的软件一起公开。
更新时间: 2024-09-02 18:04:14
领域: astro-ph.CO,cs.LG
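Once each experiment's marginal posterior over the shared cosmological parameters is emulated by a density model exposing a log-probability, joint constraints follow by summing log-densities and sampling. In the sketch below, Gaussian "emulators" stand in for trained normalising flows, and the simple Metropolis sampler is a minimal stand-in for a production sampler.

```python
# A schematic sketch of the combination step: sum emulated log-posteriors over
# shared parameters and sample the product density.
import numpy as np

class GaussianEmulator:  # placeholder for a trained flow with a log_prob method
    def __init__(self, mean, cov):
        self.mean, self.inv = np.asarray(mean), np.linalg.inv(cov)
    def log_prob(self, x):
        d = x - self.mean
        return -0.5 * d @ self.inv @ d

def metropolis(log_prob, x0, steps=20000, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp, chain = log_prob(x), []
    for _ in range(steps):
        prop = x + scale * rng.normal(size=x.shape)
        lp_prop = log_prob(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x.copy())
    return np.array(chain)

exp_a = GaussianEmulator([0.31, 0.81], [[0.02**2, 0], [0, 0.03**2]])  # e.g. (Om, s8)
exp_b = GaussianEmulator([0.33, 0.78], [[0.03**2, 0], [0, 0.02**2]])
joint = metropolis(lambda x: exp_a.log_prob(x) + exp_b.log_prob(x), x0=[0.32, 0.80])
print(joint[5000:].mean(axis=0))  # joint constraints on the shared parameters
```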
Globally Stable Neural Imitation Policies
Imitation learning presents an effective approach to alleviate the resource-intensive and time-consuming nature of policy learning from scratch in the solution space. Even though the resulting policy can mimic expert demonstrations reliably, it often lacks predictability in unexplored regions of the state-space, giving rise to significant safety concerns in the face of perturbations. To address these challenges, we introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime which produces a policy with formal stability guarantees. We deploy a neural policy architecture that facilitates the representation of stability based on Lyapunov theorem, and jointly train the policy and its corresponding Lyapunov candidate to ensure global stability. We validate our approach by conducting extensive experiments in simulation and successfully deploying the trained policies on a real-world manipulator arm. The experimental results demonstrate that our method overcomes the instability, accuracy, and computational intensity problems associated with previous imitation learning methods, making our method a promising solution for stable policy learning in complex planning scenarios.
Updated: 2024-09-02 18:03:26
标题: 全局稳定的神经模仿策略
摘要: 模仿学习为缓解在解空间中从零开始学习策略的资源密集和耗时问题提供了一种有效途径。尽管所得策略可以可靠地模仿专家演示,但它在状态空间未探索区域通常缺乏可预测性,在面对扰动时会引发重大安全问题。为了解决这些挑战,我们引入了稳定神经动力系统(SNDS),这是一种能产生具有形式化稳定性保证策略的模仿学习方案。我们采用一种便于基于李雅普诺夫定理表示稳定性的神经策略架构,并联合训练策略及其对应的李雅普诺夫候选函数,以确保全局稳定性。我们通过在仿真中进行大量实验验证了我们的方法,并成功将训练好的策略部署到现实世界的机械臂上。实验结果表明,我们的方法克服了以往模仿学习方法的不稳定性、准确性和计算强度问题,使其成为复杂规划场景中稳定策略学习的一个有前途的解决方案。
更新时间: 2024-09-02 18:03:26
领域: cs.RO,cs.LG
AI-Assisted Generation of Difficult Math Questions
Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH$^2$ - a dataset of higher-quality math questions, as evidenced by: (a) Lower performance of all models on MATH$^2$ than on MATH (b) Higher performance on MATH when using MATH$^2$ questions as in-context examples. Although focused on mathematics, our methodology seems applicable to other domains requiring structured reasoning, and potentially as a component of scalable oversight. Also of interest is a striking relationship observed between models' performance on the new dataset: the success rate on MATH$^2$ is the square on MATH, suggesting that successfully solving the question in MATH$^2$ requires a nontrivial combination of two distinct math skills.
Updated: 2024-09-02 18:01:44
标题: 人工智能辅助生成困难数学问题
摘要: 当前的LLM训练将数学推理定位为核心能力。在公开可用的资源已被充分挖掘的情况下,对多样化且具有挑战性的数学问题仍存在未被满足的需求。仅依赖人类专家既耗时又昂贵,而LLM生成的问题往往缺乏必要的多样性和难度。我们提出了一种设计框架,结合LLM的优势与人机协作(human-in-the-loop)方法,生成多样化且具有挑战性的数学问题。我们利用强大LLM的元认知技能[Didolkar等人,2024年],从现有数学数据集中提取核心“技能”。这些技能构成了生成新颖且困难问题的基础:以随机抽取的核心技能对来提示LLM。每个问题中使用两种不同的技能,使得寻找这样的问题对LLM和人类来说都是一项“分布外”任务。我们的流程通过多轮提示,利用LLM迭代地生成并完善问题和解答。随后由人类标注者验证并进一步完善这些问题,其效率通过进一步的LLM交互得到提升。将这一流程应用于从MATH数据集[Hendrycks等人,2021年]中提取的技能,产生了MATH$^2$,一个更高质量的数学问题数据集,其证据在于:(a) 所有模型在MATH$^2$上的性能均低于在MATH上的性能;(b) 将MATH$^2$问题用作上下文示例时,模型在MATH上的性能更高。虽然我们聚焦于数学,但我们的方法似乎适用于其他需要结构化推理的领域,并有可能作为可扩展监督的组成部分。同样值得注意的是模型在新数据集上表现出的显著关系:在MATH$^2$上的成功率约为在MATH上成功率的平方,这表明成功解决MATH$^2$中的问题需要两种不同数学技能的非平凡组合。
更新时间: 2024-09-02 18:01:44
领域: cs.AI,cs.LG
Optimal training of finitely-sampled quantum reservoir computers for forecasting of chaotic dynamics
In the current Noisy Intermediate Scale Quantum (NISQ) era, the presence of noise deteriorates the performance of quantum computing algorithms. Quantum Reservoir Computing (QRC) is a type of Quantum Machine Learning algorithm, which, however, can benefit from different types of tuned noise. In this paper, we analyse the effect that finite-sampling noise has on the chaotic time-series prediction capabilities of QRC and Recurrence-free Quantum Reservoir Computing (RF-QRC). First, we show that, even without a recurrent loop, RF-QRC contains temporal information about previous reservoir states using leaky integrated neurons. This makes RF-QRC different from Quantum Extreme Learning Machines (QELM). Second, we show that finite sampling noise degrades the prediction capabilities of both QRC and RF-QRC while affecting QRC more due to the propagation of noise. Third, we optimize the training of the finite-sampled quantum reservoir computing framework using two methods: (a) Singular Value Decomposition (SVD) applied to the data matrix containing noisy reservoir activation states; and (b) data-filtering techniques to remove the high-frequencies from the noisy reservoir activation states. We show that denoising reservoir activation states improve the signal-to-noise ratios with smaller training loss. Finally, we demonstrate that the training and denoising of the noisy reservoir activation signals in RF-QRC are highly parallelizable on multiple Quantum Processing Units (QPUs) as compared to the QRC architecture with recurrent connections. The analyses are numerically showcased on prototypical chaotic dynamical systems with relevance to turbulence. This work opens opportunities for using quantum reservoir computing with finite samples for time-series forecasting on near-term quantum hardware.
Updated: 2024-09-02 17:51:48
标题: 用于混沌动力学预测的有限采样量子储层计算机的最优训练
摘要: 在当前的含噪中等规模量子(NISQ)时代,噪声的存在会使量子计算算法的性能下降。量子储层计算(QRC)是一类量子机器学习算法,但它可以从不同类型的经调节的噪声中受益。在本文中,我们分析了有限采样噪声对QRC和无循环量子储层计算(RF-QRC)的混沌时间序列预测能力的影响。首先,我们展示了即使没有循环回路,RF-QRC也可通过泄漏积分神经元包含有关先前储层状态的时间信息。这使RF-QRC不同于量子极限学习机(QELM)。其次,我们表明有限采样噪声会降低QRC和RF-QRC两者的预测能力,且由于噪声的传播,对QRC的影响更大。第三,我们用两种方法优化有限采样量子储层计算框架的训练:(a)对包含含噪储层激活状态的数据矩阵应用奇异值分解(SVD);(b)用数据滤波技术去除含噪储层激活状态中的高频成分。我们展示了对储层激活状态去噪可以提高信噪比并减小训练损失。最后,我们证明,与带有循环连接的QRC架构相比,RF-QRC中含噪储层激活信号的训练和去噪可以在多个量子处理单元(QPU)上高度并行化。这些分析在与湍流相关的典型混沌动力系统上进行了数值展示。这项工作为在近期量子硬件上使用有限采样的量子储层计算进行时间序列预测开辟了机会。
更新时间: 2024-09-02 17:51:48
领域: quant-ph,cs.LG,nlin.CD
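The SVD-based cleanup in method (a) can be sketched directly: stack the noisy reservoir activation states into a matrix, keep the leading singular directions, and discard the noise floor. The rank cut-off below is an illustrative choice.

```python
# A brief sketch of truncated-SVD denoising of noisy reservoir activation states.
import numpy as np

rng = np.random.default_rng(0)
T, N, rank = 400, 50, 5
clean = rng.normal(size=(T, rank)) @ rng.normal(size=(rank, N))  # low-rank dynamics
noisy = clean + 0.3 * rng.normal(size=(T, N))                    # finite-sampling noise

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
denoised = U[:, :rank] * s[:rank] @ Vt[:rank]  # keep the leading singular directions

err = lambda a: np.linalg.norm(a - clean) / np.linalg.norm(clean)
print(f"noisy: {err(noisy):.3f}  denoised: {err(denoised):.3f}")
# The denoised states then feed the linear readout training (e.g. ridge regression).
```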
Discovering Governing equations from Graph-Structured Data by Sparse Identification of Nonlinear Dynamical Systems
The combination of machine learning (ML) and sparsity-promoting techniques is enabling direct extraction of governing equations from data, revolutionizing computational modeling in diverse fields of science and engineering. The discovered dynamical models could be used to address challenges in climate science, neuroscience, ecology, finance, epidemiology, and beyond. However, most existing sparse identification methods for discovering dynamical systems treat the whole system as one without considering the interactions between subsystems. As a result, such models are not able to capture small changes in the emergent system behavior. To address this issue, we developed a new method called Sparse Identification of Nonlinear Dynamical Systems from Graph-structured data (SINDyG), which incorporates the network structure into sparse regression to identify model parameters that explain the underlying network dynamics. SINDyG discovers the governing equations of network dynamics while offering improvements in accuracy and model simplicity.
Updated: 2024-09-02 17:51:37
标题: 从图结构化数据中通过稀疏非线性动力系统识别发现控制方程
摘要: 机器学习(ML)和稀疏促进技术的结合使得可以直接从数据中提取控制方程,从而在科学和工程领域的计算建模中实现革命性变革。发现的动力学模型可以用于解决气候科学、神经科学、生态学、金融、流行病学等领域的挑战。然而,大多数现有的用于发现动力系统的稀疏识别方法将整个系统视为一个整体,而不考虑子系统之间的相互作用。因此,这种模型无法捕捉新兴系统行为中的微小变化。为了解决这个问题,我们开发了一种新方法,名为从图结构化数据中稀疏识别非线性动力系统(SINDyG),该方法将网络结构纳入稀疏回归中,以识别解释基础网络动态的模型参数。SINDyG发现了网络动态的控制方程,同时提供了精确性和模型简单性的改进。
更新时间: 2024-09-02 17:51:37
领域: eess.SY,cs.CE,cs.LG,cs.SY
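For context, plain SINDy-style discovery reduces to sparse regression of measured derivatives onto a candidate function library; a minimal sequentially thresholded least-squares sketch follows. The graph-aware library construction that distinguishes SINDyG is omitted here.

```python
# A minimal sketch of SINDy-style sequentially thresholded least squares (STLSQ).
import numpy as np

def stlsq(Theta, dxdt, threshold=0.05, iters=10):
    xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():  # refit on the surviving library terms
            xi[big] = np.linalg.lstsq(Theta[:, big], dxdt, rcond=None)[0]
    return xi

# Data from dx/dt = -2x + 0.5x^3, a stand-alone illustrative system.
x = np.linspace(-2, 2, 200)
dxdt = -2 * x + 0.5 * x**3
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])  # candidate library
print(stlsq(Theta, dxdt))  # expect roughly [0, -2, 0, 0.5]
```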
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI
Much previous AI research has focused on developing monolithic models to maximize their intelligence and capability, with the primary goal of enhancing performance on specific tasks. In contrast, this paper explores an alternative approach: collaborative AI systems that use workflows to integrate models, data sources, and pipelines to solve complex and diverse tasks. We introduce GenAgent, an LLM-based framework that automatically generates complex workflows, offering greater flexibility and scalability compared to monolithic models. The core innovation of GenAgent lies in representing workflows with code, alongside constructing workflows with collaborative agents in a step-by-step manner. We implement GenAgent on the ComfyUI platform and propose a new benchmark, OpenComfy. The results demonstrate that GenAgent outperforms baseline approaches in both run-level and task-level evaluations, showing its capability to generate complex workflows with superior effectiveness and stability.
Updated: 2024-09-02 17:44:10
标题: GenAgent:利用自动化工作流生成构建协作式人工智能系统——以ComfyUI为例进行案例研究
摘要: 许多先前的人工智能研究都集中在开发单一模型以最大化其智能和能力,主要目标是提高特定任务的性能。相比之下,本文探讨了一种替代方法:协作人工智能系统,利用工作流程集成模型、数据源和管道来解决复杂和多样化的任务。我们介绍了GenAgent,这是一个基于LLM的框架,能够自动生成复杂的工作流程,相较于单一模型具有更大的灵活性和可扩展性。GenAgent的核心创新在于用代码表示工作流程,同时以逐步方式构建具有协作代理的工作流程。我们在ComfyUI平台上实现了GenAgent,并提出了一个新的基准测试OpenComfy。结果表明,GenAgent在运行级和任务级评估中均优于基线方法,展示了其生成具有优越效果和稳定性的复杂工作流程的能力。
更新时间: 2024-09-02 17:44:10
领域: cs.CL,cs.AI,cs.CV
Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability
Temporal knowledge graphs (TKGs) are valuable resources for capturing evolving relationships among entities, yet they are often plagued by noise, necessitating robust anomaly detection mechanisms. Existing dynamic graph anomaly detection approaches struggle to capture the rich semantics introduced by node and edge categories within TKGs, while TKG embedding methods lack interpretability, undermining the credibility of anomaly detection. Moreover, these methods falter in adapting to pattern changes and semantic drifts resulting from knowledge updates. To tackle these challenges, we introduce AnoT, an efficient TKG summarization method tailored for interpretable online anomaly detection in TKGs. AnoT begins by summarizing a TKG into a novel rule graph, enabling flexible inference of complex patterns in TKGs. When new knowledge emerges, AnoT maps it onto a node in the rule graph and traverses the rule graph recursively to derive the anomaly score of the knowledge. The traversal yields reachable nodes that furnish interpretable evidence for the validity or the anomalousness of the new knowledge. Overall, AnoT embodies a detector-updater-monitor architecture, encompassing a detector for offline TKG summarization and online scoring, an updater for real-time rule graph updates based on emerging knowledge, and a monitor for estimating the approximation error of the rule graph. Experimental results on four real-world datasets demonstrate that AnoT surpasses existing methods significantly in terms of accuracy and interpretability. All of the raw datasets and the implementation of AnoT are provided at https://github.com/zjs123/ANoT.
Updated: 2024-09-02 17:41:24
标题: 在线检测具有可解释性的时间知识图中的异常
摘要: 时间知识图谱(TKGs)是捕捉实体之间演化关系的宝贵资源,然而经常受到噪音干扰,需要强大的异常检测机制。现有的动态图异常检测方法往往难以捕捉TKGs中节点和边类别引入的丰富语义,而TKG嵌入方法缺乏可解释性,削弱了异常检测的可信度。此外,这些方法在适应由知识更新导致的模式变化和语义漂移方面表现不佳。为了应对这些挑战,我们引入了AnoT,一种针对TKG的可解释在线异常检测的高效总结方法。AnoT首先将TKG总结为一个新颖的规则图,使得能够灵活推断TKG中的复杂模式。当新知识出现时,AnoT将其映射到规则图中的一个节点,并递归遍历规则图以推导知识的异常分数。遍历产生可达节点,为新知识的有效性或异常性提供可解释的证据。总体而言,AnoT体现了一个检测器-更新器-监控器架构,包括用于离线TKG总结和在线评分的检测器,用于基于新知识进行实时规则图更新的更新器,以及用于估计规则图的近似误差的监控器。在四个真实数据集上的实验结果表明,AnoT在准确性和可解释性方面显著优于现有方法。所有原始数据集和AnoT的实现均在https://github.com/zjs123/ANoT中提供。
更新时间: 2024-09-02 17:41:24
领域: cs.AI,cs.DB,cs.LG
VLSI Hypergraph Partitioning with Deep Learning
Partitioning is a known problem in computer science and is critical in chip design workflows, as advancements in this area can significantly influence design quality and efficiency. Deep Learning (DL) techniques, particularly those involving Graph Neural Networks (GNNs), have demonstrated strong performance in various node, edge, and graph prediction tasks using both inductive and transductive learning methods. A notable area of recent interest within GNNs are pooling layers and their application to graph partitioning. While these methods have yielded promising results across social, computational, and other random graphs, their effectiveness has not yet been explored in the context of VLSI hypergraph netlists. In this study, we introduce a new set of synthetic partitioning benchmarks that emulate real-world netlist characteristics and possess a known upper bound for solution cut quality. We distinguish these benchmarks with the prior work and evaluate existing state-of-the-art partitioning algorithms alongside GNN-based approaches, highlighting their respective advantages and disadvantages.
Updated: 2024-09-02 17:32:01
标题: 利用深度学习进行VLSI超图分区
摘要: 分区是计算机科学中已知的问题,在芯片设计工作流程中至关重要,因为在这一领域的进展可以显著影响设计质量和效率。深度学习(DL)技术,特别是涉及图神经网络(GNNs)的技术,在各种节点、边和图预测任务中展现出强大的性能,使用归纳和传导学习方法。最近GNNs中一个备受关注的领域是池化层及其在图分区中的应用。虽然这些方法在社交、计算和其他随机图中产生了有希望的结果,但它们的有效性尚未在VLSI超图网表的情况下进行探讨。在本研究中,我们引入了一组新的合成分区基准,模拟真实世界网表特性,并具有已知的解决方案切割质量的上界。我们将这些基准与先前的工作区分开,并评估现有的最先进的分区算法以及基于GNN的方法,突出它们各自的优势和劣势。
更新时间: 2024-09-02 17:32:01
领域: cs.AR,cs.AI,cs.DC,cs.LG
Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku
Using Large Language Models (LLMs) has gained popularity among software developers for generating source code. However, the use of LLM-generated code can introduce risks of adding suboptimal, defective, and vulnerable code. This makes it necessary to devise methods for the accurate detection of LLM-generated code. Toward this goal, we perform a case study of Claude 3 Haiku (or Claude 3 for brevity) on CodeSearchNet dataset. We divide our analyses into two parts: function-level and class-level. We extract 22 software metric features, such as Code Lines and Cyclomatic Complexity, for each level of granularity. We then analyze code snippets generated by Claude 3 and their human-authored counterparts using the extracted features to understand how unique the code generated by Claude 3 is. In the following step, we use the unique characteristics of Claude 3-generated code to build Machine Learning (ML) models and identify which features of the code snippets make them more detectable by ML models. Our results indicate that Claude 3 tends to generate longer functions, but shorter classes than humans, and this characteristic can be used to detect Claude 3-generated code with ML models with 82% and 66% accuracies for function-level and class-level snippets, respectively.
Updated: 2024-09-02 17:25:15
标题: LLM生成的代码的自动检测:Claude 3 Haiku案例研究
摘要: 使用大型语言模型(LLM)生成源代码在软件开发人员中越来越受欢迎。然而,使用LLM生成的代码可能带来引入次优、有缺陷和易受攻击代码的风险。这使得有必要设计准确检测LLM生成代码的方法。为实现这一目标,我们在CodeSearchNet数据集上对Claude 3 Haiku(简称Claude 3)进行了案例研究。我们将分析分为两部分:函数级和类级。我们针对每个粒度级别提取了22个软件度量特征,例如代码行数和圈复杂度。然后,我们使用提取的特征分析由Claude 3生成的代码片段及其由人工编写的对应代码,以了解Claude 3生成的代码有多独特。在接下来的步骤中,我们利用Claude 3生成代码的独特特征构建机器学习(ML)模型,并确定代码片段的哪些特征使其更容易被ML模型检测到。我们的结果表明,Claude 3倾向于生成比人类更长的函数,但更短的类,而这一特征可用于通过ML模型检测Claude 3生成的代码,在函数级和类级代码片段上的准确率分别为82%和66%。
更新时间: 2024-09-02 17:25:15
领域: cs.SE,cs.AI,cs.LG
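A stripped-down sketch of the pipeline: compute a few software metrics per snippet and fit a classifier on human-vs-LLM labels. The two metrics and the toy snippets stand in for the paper's 22 features and real corpus.

```python
# A toy sketch of metric-based detection of LLM-generated code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def metrics(snippet: str):
    lines = [l for l in snippet.splitlines() if l.strip()]
    branches = sum(l.strip().startswith(("if", "for", "while", "elif")) for l in lines)
    return [len(lines), branches + 1]  # code lines, crude cyclomatic complexity

human = ["def f(x):\n    return x + 1",
         "def g(a, b):\n    if a > b:\n        return a\n    return b"]
llm = ["def add_one(value):\n    result = value + 1\n    return result",
       "def maximum(a, b):\n    if a > b:\n        return a\n    else:\n        return b"]
X = np.array([metrics(s) for s in human + llm])
y = np.array([0] * len(human) + [1] * len(llm))  # 0 = human, 1 = LLM-generated
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([metrics("def h(x):\n    return x * 2")]))
```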
Membership Inference Attacks Against In-Context Learning
Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. Empirical results show that our attacks can accurately determine membership status in most cases, e.g., 95\% accuracy advantage against LLaMA, indicating that the associated risks are much higher than those shown by existing probability-based attacks. Additionally, we propose a hybrid attack that synthesizes the strengths of the aforementioned strategies, achieving an accuracy advantage of over 95\% in most cases. Furthermore, we investigate three potential defenses targeting data, instruction, and output. Results demonstrate combining defenses from orthogonal dimensions significantly reduces privacy leakage and offers enhanced privacy assurances.
Updated: 2024-09-02 17:23:23
标题: 对上下文学习的成员推断攻击
摘要: 将大型语言模型(LLMs)适应特定任务引入了对计算效率的担忧,促使探索高效方法,如上下文学习(ICL)。然而,在现实假设下,ICL对隐私攻击的脆弱性仍然大部分未被探索。在这项工作中,我们提出了第一个专为ICL定制的成员推断攻击,仅依赖于生成的文本而没有相关的概率。我们提出了四种针对不同受限情景的攻击策略,并在四种流行的大型语言模型上进行了广泛实验。实证结果表明,我们的攻击可以准确确定大多数情况下的成员身份状态,例如,对LLaMA的95\%准确度优势,表明相关风险比现有基于概率的攻击所展示的要高得多。此外,我们提出了一种混合攻击,综合了前述策略的优势,在大多数情况下实现了超过95\%的准确度优势。此外,我们调查了针对数据、指令和输出的三种潜在防御方法。结果表明,结合来自正交维度的防御显著降低了隐私泄露,并提供了增强的隐私保证。
更新时间: 2024-09-02 17:23:23
领域: cs.CR,cs.CL
Content, Nudges and Incentives: A Study on the Effectiveness and Perception of Embedded Phishing Training
A common form of phishing training in organizations is the use of simulated phishing emails to test employees' susceptibility to phishing attacks, and the immediate delivery of training material to those who fail the test. This widespread practice is dubbed embedded training; however, its effectiveness in decreasing the likelihood of employees falling for phishing again in the future is questioned by the contradictory findings of several recent field studies. We investigate embedded phishing training in three aspects. First, we observe that the practice incorporates different components -- knowledge gains from its content, nudges and reminders from the test itself, and the deterrent effect of potential consequences -- our goal is to study which ones are more effective, if any. Second, we explore two potential improvements to training, namely its timing and the use of incentives. Third, we analyze employees' reception and perception of the practice. For this, we conducted a large-scale mixed-methods (quantitative and qualitative) study on the employees of a partner company. Our study contributes several novel findings on the training practice: in particular, its effectiveness comes from its nudging effect, i.e., the periodic reminder of the threat rather than from its content, which is rarely consumed by employees due to lack of time and perceived usefulness. Further, delaying training to ease time pressure is as effective as currently established practices, while rewards do not improve secure behavior. Finally, some of our results support previous findings with increased ecological validity, e.g., that phishing is an attention problem, rather than a knowledge one, even for the most susceptible employees, and thus enforcing training does not help.
Updated: 2024-09-02 17:17:44
标题: 内容、助推和激励:嵌入式钓鱼训练的有效性和感知研究
摘要: 组织中常见的网络钓鱼培训形式是利用模拟网络钓鱼邮件测试员工对网络钓鱼攻击的易受性,并立即向未通过测试的员工提供培训材料。这种广泛的做法被称为嵌入式培训;然而,根据几项最近的现场研究发现,它在降低员工未来再次受网络钓鱼攻击的可能性方面的有效性受到质疑。 我们从三个方面研究了嵌入式网络钓鱼培训。首先,我们观察到这种做法包含不同的组成部分--内容的知识收益,测试本身的劝导和提醒,以及潜在后果的威慑效果--我们的目标是研究哪些更有效,如果有的话。其次,我们探讨了两种潜在的培训改进,即培训的时机和激励措施的使用。第三,我们分析了员工对这种实践的接受和感知。为此,我们对一家合作公司的员工进行了大规模的混合方法(定量和定性)研究。 我们的研究在培训实践方面提供了一些新颖的发现:特别是其有效性来自于其劝导效果,即定期提醒威胁,而不是来自于内容,因为员工很少消化内容,因为他们缺乏时间和认为内容无用。此外,推迟培训以减轻时间压力与当前已建立的做法一样有效,而奖励并不会改善安全行为。最后,我们的一些结果支持以前的发现,并增加了生态效度,例如,网络钓鱼是一个注意力问题,而不是知识问题,即使对于最易受攻击的员工也是如此,因此强制培训并不会有帮助。
更新时间: 2024-09-02 17:17:44
领域: cs.CR
NeuFair: Neural Network Fairness Repair with Dropout
This paper investigates neuron dropout as a post-processing bias mitigation for deep neural networks (DNNs). Neural-driven software solutions are increasingly applied in socially critical domains with significant fairness implications. While neural networks are exceptionally good at finding statistical patterns from data, they may encode and amplify existing biases from the historical data. Existing bias mitigation algorithms often require modifying the input dataset or the learning algorithms. We posit that the prevalent dropout methods that prevent over-fitting during training by randomly dropping neurons may be an effective and less intrusive approach to improve the fairness of pre-trained DNNs. However, finding the ideal set of neurons to drop is a combinatorial problem. We propose NeuFair, a family of post-processing randomized algorithms that mitigate unfairness in pre-trained DNNs via dropouts during inference after training. Our randomized search is guided by an objective to minimize discrimination while maintaining the model's utility. We show that our design of randomized algorithms is effective and efficient in improving fairness (up to 69%) with minimal or no model performance degradation. We provide intuitive explanations of these phenomena and carefully examine the influence of various hyperparameters of search algorithms on the results. Finally, we empirically and conceptually compare NeuFair to different state-of-the-art bias mitigators.
Updated: 2024-09-02 17:13:22
标题: NeuFair:使用Dropout修复神经网络公平性
摘要: 本文研究将神经元dropout作为深度神经网络(DNN)的后处理偏差缓解方法。神经驱动的软件解决方案正越来越多地应用于具有重大公平性影响的社会关键领域。虽然神经网络非常擅长从数据中发现统计模式,但它们可能对历史数据中已有的偏见进行编码和放大。现有的偏差缓解算法通常需要修改输入数据集或学习算法。我们认为,训练期间通过随机丢弃神经元来防止过拟合的流行dropout方法,可能是改善预训练DNN公平性的一种有效且侵入性较小的途径。然而,找到理想的待丢弃神经元集合是一个组合问题。我们提出NeuFair,这是一族后处理随机算法,在训练之后的推理期间通过dropout来减轻预训练DNN中的不公平性。我们的随机搜索以在保持模型效用的同时最小化歧视为目标。我们表明,我们设计的随机算法在提高公平性方面(最高可达69%)是有效且高效的,且模型性能几乎没有或没有下降。我们对这些现象给出了直观解释,并仔细考察了搜索算法的各种超参数对结果的影响。最后,我们在实证和概念层面将NeuFair与多种最先进的偏差缓解方法进行了比较。
更新时间: 2024-09-02 17:13:22
领域: cs.LG,cs.AI,cs.SE
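The randomized post-processing search can be sketched as follows: repeatedly sample a small set of hidden neurons to silence at inference time and keep the mask with the lowest unfairness. The demographic-parity gap, the tiny network, and the search budget are illustrative stand-ins for the paper's setup.

```python
# A condensed sketch of post-hoc dropout repair via randomized mask search.
import numpy as np

def predict(X, W1, W2, mask):
    h = np.maximum(X @ W1, 0) * mask  # silence the selected hidden neurons
    return (h @ W2 > 0).astype(int)

def dp_gap(y_pred, group):
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
group = rng.integers(0, 2, 500)
W1, W2 = rng.normal(size=(10, 32)), rng.normal(size=32)

best_mask = np.ones(32)
best_gap = dp_gap(predict(X, W1, W2, best_mask), group)
for _ in range(200):  # randomized search over dropout masks
    mask = np.ones(32)
    mask[rng.choice(32, size=3, replace=False)] = 0.0
    gap = dp_gap(predict(X, W1, W2, mask), group)
    if gap < best_gap:  # in practice, also require model utility to hold
        best_mask, best_gap = mask, gap
print(best_gap)
```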
An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs
Large language models (LLMs) have shown strong arithmetic reasoning capabilities when prompted with Chain-of-Thought (CoT) prompts. However, we have only a limited understanding of how they are processed by LLMs. To demystify it, prior work has primarily focused on ablating different components in the CoT prompt and empirically observing their resulting LLM performance change. Yet, the reason why these components are important to LLM reasoning is not explored. To fill this gap, in this work, we investigate ``neuron activation'' as a lens to provide a unified explanation to observations made by prior work. Specifically, we look into neurons within the feed-forward layers of LLMs that may have activated their arithmetic reasoning capabilities, using Llama2 as an example. To facilitate this investigation, we also propose an approach based on GPT-4 to automatically identify neurons that imply arithmetic reasoning. Our analyses revealed that the activation of reasoning neurons in the feed-forward layers of an LLM can explain the importance of various components in a CoT prompt, and future research can extend it for a more complete understanding.
Updated: 2024-09-02 17:12:48
标题: 以神经元激活为统一视角解释引发LLM算术推理的思维链的研究
摘要: 大型语言模型(LLMs)在接收链式思维(CoT)提示时展现出强大的算术推理能力。然而,我们对它们如何被LLMs处理的理解仅有限。为了揭开神秘面纱,先前的研究主要集中在消除CoT提示中的不同组件,并经验性地观察它们对LLM性能的影响。然而,这些组件为什么对LLM推理重要并未被探讨。为了填补这一空白,在这项工作中,我们通过“神经元激活”作为一个镜头,提供了一个统一的解释来解释先前研究所做的观察。具体来说,我们研究了可能激活其算术推理能力的LLMs前馈层内的神经元,以Llama2为例。为了促进这项研究,我们还提出了一种基于GPT-4的方法,自动识别暗示算术推理的神经元。我们的分析表明,LLMs前馈层中推理神经元的激活可以解释CoT提示中各种组件的重要性,未来研究可以扩展这一理解以获得更完整的认识。
更新时间: 2024-09-02 17:12:48
领域: cs.AI,I.2.7
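The measurement step, recording which feed-forward neurons activate for a prompt, is typically done with forward hooks; the sketch below uses a stand-in block in place of a real Llama checkpoint, whose MLP modules could be hooked the same way.

```python
# A short sketch of capturing feed-forward activations with PyTorch forward hooks.
import torch
import torch.nn as nn

activations = {}

def hook(name):
    def fn(module, inputs, output):
        # Count neurons above zero at the last token position.
        activations[name] = (output[0, -1] > 0).sum().item()
    return fn

# Stand-in for model.model.layers[i].mlp.up_proj in a Hugging Face Llama checkpoint.
class TinyBlock(nn.Module):
    def __init__(self, d=64, hidden=256):
        super().__init__()
        self.up_proj = nn.Linear(d, hidden)
        self.down_proj = nn.Linear(hidden, d)
    def forward(self, x):
        return self.down_proj(torch.relu(self.up_proj(x)))

block = TinyBlock()
block.up_proj.register_forward_hook(hook("layer0.mlp.up_proj"))
block(torch.randn(1, 8, 64))  # (batch, tokens, dim), e.g. a CoT prompt
print(activations)            # active-neuron counts per hooked module
```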
H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark
The Abstraction and Reasoning Corpus (ARC) is a visual program synthesis benchmark designed to test challenging out-of-distribution generalization in humans and machines. Since 2019, limited progress has been observed on the challenge using existing artificial intelligence methods. Comparing human and machine performance is important for the validity of the benchmark. While previous work explored how well humans can solve tasks from the ARC benchmark, they either did so using only a subset of tasks from the original dataset, or from variants of ARC, and therefore only provided a tentative estimate of human performance. In this work, we obtain a more robust estimate of human performance by evaluating 1729 humans on the full set of 400 training and 400 evaluation tasks from the original ARC problem set. We estimate that average human performance lies between 73.3% and 77.2% correct with a reported empirical average of 76.2% on the training set, and between 55.9% and 68.9% correct with a reported empirical average of 64.2% on the public evaluation set. However, we also find that 790 out of the 800 tasks were solvable by at least one person in three attempts, suggesting that the vast majority of the publicly available ARC tasks are in principle solvable by typical crowd-workers recruited over the internet. Notably, while these numbers are slightly lower than earlier estimates, human performance still greatly exceeds current state-of-the-art approaches for solving ARC. To facilitate research on ARC, we publicly release our dataset, called H-ARC (human-ARC), which includes all of the submissions and action traces from human participants.
Updated: 2024-09-02 17:11:32
标题: H-ARC:人类在抽象和推理语料库基准测试中的稳健表现评估
摘要: 抽象和推理语料库(ARC)是一个视觉程序合成基准,旨在测试人类和机器在具有挑战性的分布外泛化上的能力。自2019年以来,使用现有人工智能方法在该挑战上只观察到有限的进展。比较人类和机器的表现对基准的有效性至关重要。尽管先前的研究探讨了人类解决ARC基准任务的能力,但这些研究要么仅使用原始数据集的一个子集,要么使用ARC的变体,因此只提供了对人类表现的初步估计。在这项工作中,我们通过让1729名人类参与者完成原始ARC问题集中全部400个训练任务和400个评估任务,获得了对人类表现更稳健的估计。我们估计,在训练集上,人类的平均正确率介于73.3%和77.2%之间,报告的经验平均值为76.2%;在公开评估集上,介于55.9%和68.9%之间,报告的经验平均值为64.2%。不过,我们还发现,800个任务中有790个至少被一个人在三次尝试内解决,这表明绝大多数公开可用的ARC任务原则上可以由通过互联网招募的典型众包工作者解决。值得注意的是,尽管这些数字略低于早期估计,人类的表现仍远超当前解决ARC的最先进方法。为促进对ARC的研究,我们公开发布了名为H-ARC(human-ARC)的数据集,其中包括人类参与者的所有提交和操作轨迹。
更新时间: 2024-09-02 17:11:32
领域: cs.AI
Privacy-Aware Document Visual Question Answering
Document Visual Question Answering (DocVQA) has quickly grown into a central task of document understanding. But despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees. In this work, we explore privacy in the domain of DocVQA for the first time, highlighting privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions. Specifically, we focus on invoice processing as a realistic document understanding scenario, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the data of the invoice provider is the sensitive information to be protected. We demonstrate that non-private models tend to memorise, a behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through either or both of the two input modalities: vision (document image) or language (OCR tokens). Finally, we design attacks exploiting the memorisation effect of the model, and demonstrate their effectiveness in probing a representative DocVQA models.
Updated: 2024-09-02 17:00:21
标题: 隐私感知文档视觉问答
摘要: 文档视觉问答(DocVQA)已迅速发展成为文档理解的核心任务。然而,尽管文档包含敏感或受版权保护的信息,但目前没有一种DocVQA方法提供强大的隐私保障。在这项工作中,我们首次探讨了DocVQA领域的隐私问题,突出了用于DocVQA的最先进多模态LLM模型中的隐私问题,并探讨了可能的解决方案。具体来说,我们将发票处理作为一个现实的文档理解场景,并提出了一个包括发票文档及相关问题和答案的大规模DocVQA数据集。我们采用联邦学习方案,反映了不同企业文档的实际分布,并探讨了发票提供者数据是需要保护的敏感信息的使用案例。我们证明非私有模型往往会记忆,这种行为可能会导致暴露私人信息。然后,我们评估了在这种多模态场景中采用联邦学习和差分隐私的基线训练方案,其中敏感信息可能通过两种输入模态之一或两者同时暴露:视觉(文档图像)或语言(OCR令牌)。最后,我们设计了攻击,利用模型的记忆效应,并证明了它们在探测代表性DocVQA模型方面的有效性。
更新时间: 2024-09-02 17:00:21
领域: cs.CV,cs.AI,cs.LG
Exploring Bias and Prediction Metrics to Characterise the Fairness of Machine Learning for Equity-Centered Public Health Decision-Making: A Narrative Review
Background: The rapid advancement of Machine Learning (ML) represents novel opportunities to enhance public health research, surveillance, and decision-making. However, there is a lack of comprehensive understanding of algorithmic bias, systematic errors in predicted population health outcomes, resulting from the public health application of ML. The objective of this narrative review is to explore the types of bias generated by ML and quantitative metrics to assess these biases. Methods: We performed a search of PubMed, MEDLINE, IEEE (Institute of Electrical and Electronics Engineers), ACM (Association for Computing Machinery) Digital Library, Science Direct, and Springer Nature. We used keywords to identify studies describing types of bias and metrics to measure these in the domain of ML and public and population health, published in English between 2008 and 2023, inclusive. Results: A total of 72 articles met the inclusion criteria. Our review identified the commonly described types of bias and quantitative metrics to assess these biases from an equity perspective. Conclusion: The review will help formalize the evaluation framework for ML on public health from an equity perspective.
Updated: 2024-09-02 17:00:05
标题: 探索偏见和预测指标以表征机器学习在以公平为中心的公共卫生决策中的公平性:一项叙事性综述
摘要: 背景:机器学习(ML)的快速发展为增强公共卫生研究、监测和决策提供了新的机会。然而,目前对于算法偏见和由ML在公共卫生应用中导致的预测人群健康结果中的系统错误缺乏全面的理解。本叙事性综述的目标是探讨ML生成的偏见类型及用于评估这些偏见的定量指标。 方法:我们在PubMed、MEDLINE、IEEE(电气和电子工程师学会)、ACM(计算机协会)数字图书馆、Science Direct和Springer Nature进行了检索。我们使用关键词识别了描述ML领域和公共卫生以及人口健康中偏见类型和评估指标的研究,在2008年至2023年间发表的英文文章。 结果:共有72篇文章符合纳入标准。我们的综述确定了常描述的偏见类型和定量指标,从公平性的角度评估这些偏见。 结论:本综述将有助于从公平性的角度规范ML在公共卫生领域的评估框架。
更新时间: 2024-09-02 17:00:05
领域: cs.LG,cs.CL
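Since the review surveys quantitative bias metrics, a small sketch may help ground two of the most common ones. The following assumes binary labels, binary predictions, and a binary sensitive attribute; the functions are generic illustrations, not metrics taken from any specific reviewed article.

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """|P(yhat=1 | A=0) - P(yhat=1 | A=1)| for a binary sensitive attribute."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equalized_odds_gap(y_true, y_pred, group):
    """Max gap in TPR and FPR between the two groups."""
    gaps = []
    for y in (0, 1):  # y=1 gives the TPR gap, y=0 gives the FPR gap
        m0 = (y_true == y) & (group == 0)
        m1 = (y_true == y) & (group == 1)
        gaps.append(abs(y_pred[m0].mean() - y_pred[m1].mean()))
    return max(gaps)

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_diff(y_pred, group), equalized_odds_gap(y_true, y_pred, group))
```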
Does Data-Efficient Generalization Exacerbate Bias in Foundation Models?
Foundation models have emerged as robust models with label efficiency in diverse domains. In medical imaging, these models contribute to the advancement of medical diagnoses due to the difficulty in obtaining labeled data. However, it is unclear whether using a large amount of unlabeled data, biased by the presence of sensitive attributes during pre-training, influences the fairness of the model. This research examines the bias in the Foundation model (RetFound) when it is applied to fine-tune the Brazilian Multilabel Ophthalmological Dataset (BRSET), which has a different population than the pre-training dataset. The model evaluation, in comparison with supervised learning, shows that the Foundation Model has the potential to reduce the gap between the maximum AUC and minimum AUC evaluations across gender and age groups. However, in a data-efficient generalization, the model increases the bias when the data amount decreases. These findings suggest that when deploying a Foundation Model in real-life scenarios with limited data, the possibility of fairness issues should be considered.
Updated: 2024-09-02 16:58:16
标题: 数据高效泛化是否加剧了基础模型中的偏见?
摘要: 基础模型已经成为在不同领域具有标签效率的强大模型。在医学成像领域,这些模型对于医学诊断的进展起到了推动作用,因为获取标记数据的困难。然而,目前尚不清楚在预训练过程中使用大量未标记数据,且受到敏感属性存在的偏见是否会影响模型的公平性。本研究考察了基础模型(RetFound)在应用于微调巴西多标签眼科数据集(BRSET)时的偏见情况,该数据集与预训练数据集的人口群体不同。与监督学习相比,模型评估显示基础模型有潜力减少性别和年龄组之间最大AUC和最小AUC评估之间的差距。然而,在数据效率的泛化过程中,当数据量减少时,模型增加了偏见。这些发现表明,在实际场景中部署基础模型时,应考虑可能存在的公平性问题,特别是在数据有限的情况下。
更新时间: 2024-09-02 16:58:16
领域: cs.CV,cs.LG
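The fairness gap discussed above is the spread between the best and worst subgroup AUC. A minimal sketch of that computation, assuming scikit-learn is available and each subgroup contains both classes (synthetic data here, not BRSET):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_gap(y_true, y_score, groups):
    """Max AUC minus min AUC across subgroups (e.g., gender or age bins)."""
    aucs = {g: roc_auc_score(y_true[groups == g], y_score[groups == g])
            for g in np.unique(groups)}
    return max(aucs.values()) - min(aucs.values()), aucs

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
s = y * 0.5 + rng.normal(size=200)   # scores correlated with labels
g = rng.integers(0, 2, 200)          # a binary sensitive attribute
gap, per_group = auc_gap(y, s, g)
print(gap, per_group)
```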
Imitating Language via Scalable Inverse Reinforcement Learning
The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum likelihood estimation (MLE) for next token prediction led to its role as the predominant paradigm. However, the broader field of imitation learning can more effectively utilize the sequential structure underlying autoregressive generation. We focus on investigating the inverse reinforcement learning (IRL) perspective to imitation, extracting rewards and directly optimizing sequences instead of individual token likelihoods, and evaluate its benefits for fine-tuning large language models. We provide a new angle, reformulating inverse soft-Q-learning as a temporal difference regularized extension of MLE. This creates a principled connection between MLE and IRL and allows trading off added complexity with increased performance and diversity of generations in the supervised fine-tuning (SFT) setting. We find clear advantages for IRL-based imitation, in particular for retaining diversity while maximizing task performance, rendering IRL a strong alternative on fixed SFT datasets even without online data generation. Our analysis of IRL-extracted reward functions further indicates benefits for more robust reward functions via tighter integration of supervised and preference-based LLM post-training.
Updated: 2024-09-02 16:48:57
标题: 通过可扩展的逆强化学习模仿语言
摘要: 语言模型训练的大部分基于模仿学习。它涵盖了预训练、监督微调,并影响了从人类反馈中强化学习(RLHF)的起始条件。最大似然估计(MLE)在下一个标记预测中的简单性和可扩展性使其成为主导范式。然而,更广泛的模仿学习领域可以更有效地利用自回归生成的顺序结构。我们专注于探究逆强化学习(IRL)视角下的模仿,提取奖励并直接优化序列而非单个标记的似然,并评估其在微调大型语言模型中的益处。我们提供了一个新的角度,将逆软Q学习重新构建为MLE的时序差分正则化扩展。这在MLE和IRL之间建立了一个合理的连接,并允许在监督微调(SFT)环境中以增加的复杂性换取性能和生成多样性的提升。我们发现基于IRL的模仿具有明显优势,特别是在保持多样性的同时最大化任务性能方面,即使没有在线数据生成,IRL也是固定SFT数据集上的一个强大替代方案。我们对IRL提取的奖励函数的分析进一步表明,通过更紧密地整合监督和基于偏好的LLM后训练,可以获得更健壮的奖励函数。
更新时间: 2024-09-02 16:48:57
领域: cs.LG,cs.AI,cs.CL,stat.ML
Debiasing Graph Representation Learning based on Information Bottleneck
Graph representation learning has shown superior performance in numerous real-world applications, such as finance and social networks. Nevertheless, most existing works might make discriminatory predictions due to insufficient attention to fairness in their decision-making processes. This oversight has prompted a growing focus on fair representation learning. Among recent explorations on fair representation learning, prior works based on adversarial learning usually induce unstable or counterproductive performance. To achieve fairness in a stable manner, we present the design and implementation of GRAFair, a new framework based on a variational graph auto-encoder. The crux of GRAFair is the Conditional Fairness Bottleneck, where the objective is to capture the trade-off between the utility of representations and sensitive information of interest. By applying variational approximation, we can make the optimization objective tractable. Particularly, GRAFair can be trained to produce informative representations of tasks while containing little sensitive information without adversarial training. Experiments on various real-world datasets demonstrate the effectiveness of our proposed method in terms of fairness, utility, robustness, and stability.
Updated: 2024-09-02 16:45:23
标题: 基于信息瓶颈的去偏图表示学习
摘要: 图表示学习在诸多实际应用中表现出优越性能,如金融和社交网络。然而,大多数现有工作可能由于在决策过程中对公平性关注不足而做出歧视性预测。这一疏忽已经引起对公平表示学习的日益关注。在最近对公平表示学习的探索中,以对抗学习为基础的先前工作通常会导致不稳定或适得其反的性能。为了以稳定的方式实现公平性,我们提出了基于变分图自动编码器的新框架GRAFair的设计和实现。GRAFair的关键在于条件公平性瓶颈,其目标是捕捉表示的效用和敏感信息之间的权衡。通过应用变分逼近,我们可以使优化目标易于处理。特别是,GRAFair无需对抗训练即可产生有关任务的信息丰富的表示,同时仅包含很少的敏感信息。在各种真实世界数据集上的实验证明了我们提出的方法在公平性、效用性、鲁棒性和稳定性方面的有效性。
更新时间: 2024-09-02 16:45:23
领域: cs.LG,cs.CY
CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
Deploying large language models (LLMs) on edge devices presents significant challenges due to the substantial computational overhead and memory requirements. Activation sparsification can mitigate these challenges by reducing the number of activated neurons during inference. Existing methods typically employ thresholding-based sparsification based on the statistics of activation tensors. However, these methods do not explicitly model the impact of activation sparsification on performance, leading to suboptimal performance degradation. To address this issue, this paper reformulates the activation sparsification problem by introducing a new objective that optimizes the sparsification decisions. Building on this reformulation, we propose CHESS, a general activation sparsification approach via CHannel-wise thrEsholding and Selective Sparsification. First, channel-wise thresholding assigns a unique threshold to each activation channel in the feed-forward network (FFN) layers. Then, selective sparsification involves applying thresholding-based activation sparsification to specific layers within the attention modules. Finally, we detail the implementation of sparse kernels to accelerate LLM inference. Experimental results demonstrate that the proposed CHESS achieves lower performance degradation over 8 downstream tasks while activating fewer parameters compared to existing methods, thus speeding up the LLM inference by up to 1.27x.
Updated: 2024-09-02 16:41:44
标题: CHESS: 通过通道阈值化和选择性稀疏化优化LLM推断
摘要: 在边缘设备上部署大型语言模型(LLMs)面临着重大挑战,主要是由于巨大的计算开销和内存需求。激活稀疏化可以通过减少推断过程中激活的神经元数量来缓解这些挑战。现有方法通常基于激活张量的统计信息采用基于阈值的稀疏化。然而,这些方法并未明确建模激活稀疏化对性能的影响,导致次优的性能下降。为解决这个问题,本文通过引入一个优化稀疏化决策的新目标,重新表述了激活稀疏化问题。基于这一新表述,我们提出了CHESS,一种通过通道阈值化和选择性稀疏化实现的通用激活稀疏化方法。首先,通道阈值化为前馈网络(FFN)层中的每个激活通道分配一个唯一阈值。然后,选择性稀疏化将基于阈值的激活稀疏化应用于注意力模块中的特定层。最后,我们详细介绍了稀疏内核的实现,以加速LLM的推断。实验结果表明,所提出的CHESS在激活更少参数的同时,在8个下游任务上实现了更低的性能下降,从而将LLM推断加速最多1.27倍。
更新时间: 2024-09-02 16:41:44
领域: cs.CL,cs.AI,cs.LG
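The core of channel-wise thresholding is simple to state: each FFN channel gets its own threshold, and activations whose magnitude falls below it are zeroed. A hedged PyTorch sketch follows; the quantile-based calibration is an illustrative assumption, not the paper's exact procedure, and real speedups additionally require the sparse kernels the authors describe.

```python
import torch

def channel_wise_sparsify(x, thresholds):
    """Zero activations whose magnitude falls below a per-channel threshold.

    x:          (batch, seq, channels) FFN activation tensor
    thresholds: (channels,) one threshold per channel
    """
    return x * (x.abs() >= thresholds)  # thresholds broadcast over batch/seq dims

x = torch.randn(2, 4, 8)
# hypothetical calibration: a per-channel quantile of |x| on held-out activations
thr = x.abs().flatten(0, 1).quantile(0.6, dim=0)
sparse_x = channel_wise_sparsify(x, thr)
print((sparse_x == 0).float().mean())  # fraction of deactivated entries, ~0.6
```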
Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant
Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we first propose Domain-Specific Assistant Instructions based on AlexanderStreet therapy, and second, we use an adaptation fine-tuning method and retrieval augmented generation method to improve pre-trained LLMs. Through quantitative evaluation of linguistic quality using automatic and human evaluation, we observe that pre-trained LLMs on Psychotherapy Assistant Instructions outperform state-of-the-art LLMs response baselines. Our Assistant-Instruction approach offers a half-annotation method to align pre-trained LLMs with instructions and provide pre-trained LLMs with more psychotherapy knowledge.
Updated: 2024-09-02 16:33:29
标题: 使用助手对心理治疗聊天机器人进行特定领域的改进
摘要: 大型语言模型(LLMs)已经展示出在特定任务上具有令人印象深刻的泛化能力,这些任务使用了人类编写的指导数据。然而,这些指导数据的数量、多样性和专业知识有限,这引发了对LLMs在心理治疗任务中在提供领域特定指导时表现的担忧。为了解决这个问题,首先我们提出了基于AlexanderStreet疗法的领域特定助理指导,其次,我们使用了一种适应性微调方法和检索增强生成方法来改进预训练的LLMs。通过使用自动评估和人工评估的定量评估语言质量,我们观察到预训练的LLMs在心理治疗助理指导上的表现优于最先进的LLMs响应基线。我们的助理指导方法提供了一种半注释方法,将预训练的LLMs与指导对齐,并为其提供更多的心理治疗知识。
更新时间: 2024-09-02 16:33:29
领域: cs.CL,cs.AI
Correlating Time Series with Interpretable Convolutional Kernels
This study addresses the problem of convolutional kernel learning in univariate, multivariate, and multidimensional time series data, which is crucial for interpreting temporal patterns in time series and supporting downstream machine learning tasks. First, we propose formulating convolutional kernel learning for univariate time series as a sparse regression problem with a non-negative constraint, leveraging the properties of circular convolution and circulant matrices. Second, to generalize this approach to multivariate and multidimensional time series data, we use tensor computations, reformulating the convolutional kernel learning problem in the form of tensors. This is further converted into a standard sparse regression problem through vectorization and tensor unfolding operations. In the proposed methodology, the optimization problem is addressed using the existing non-negative subspace pursuit method, enabling the convolutional kernel to capture temporal correlations and patterns. To evaluate the proposed model, we apply it to several real-world time series datasets. On the multidimensional rideshare and taxi trip data from New York City and Chicago, the convolutional kernels reveal interpretable local correlations and cyclical patterns, such as weekly seasonality. In the context of multidimensional fluid flow data, both local and nonlocal correlations captured by the convolutional kernels can reinforce tensor factorization, leading to performance improvements in fluid flow reconstruction tasks. Thus, this study lays an insightful foundation for automatically learning convolutional kernels from time series data, with an emphasis on interpretability through sparsity and non-negativity constraints.
Updated: 2024-09-02 16:29:21
标题: 将时间序列与可解释卷积核相关联
摘要: 这项研究解决了在单变量、多变量和多维时间序列数据中卷积核学习的问题,这对于解释时间序列中的时间模式并支持下游机器学习任务至关重要。首先,我们提出将单变量时间序列的卷积核学习建模为带有非负约束的稀疏回归问题,利用循环卷积和循环矩阵的特性。其次,为了将这种方法推广到多变量和多维时间序列数据,我们使用张量计算,将卷积核学习问题重新表述为张量的形式。这进一步通过向量化和张量展开操作转换为标准的稀疏回归问题。在所提出的方法中,优化问题使用现有的非负子空间追踪方法来解决,使卷积核能够捕获时间相关性和模式。为了评估所提出的模型,我们将其应用于几个真实世界的时间序列数据集。在纽约市和芝加哥的多维顺风车和出租车行程数据中,卷积核揭示了可解释的局部相关性和周期性模式,如每周季节性。在多维流体流数据的背景下,卷积核捕获的局部和非局部相关性可以加强张量分解,从而提高流体流重建任务的性能。因此,这项研究为从时间序列数据中自动学习卷积核奠定了深刻的基础,强调了通过稀疏性和非负性约束进行解释的重要性。
更新时间: 2024-09-02 16:29:21
领域: cs.LG,cs.AI
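The key reformulation above is that circular convolution with a kernel equals multiplication by a circulant matrix, so kernel learning reduces to non-negative (sparse) regression. A toy NumPy/SciPy sketch under that reading, regressing a series on its own lagged circulant columns; the target and lag choices are illustrative assumptions, and plain NNLS stands in for the paper's non-negative subspace pursuit.

```python
import numpy as np
from scipy.linalg import circulant
from scipy.optimize import nnls

rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * rng.normal(size=64)

C = circulant(x)          # C[i, j] = x[(i - j) mod n], so C @ k is circular convolution
tau = 8                   # kernel length; use lags 1..tau so the fit is non-trivial
A, b = C[:, 1:tau + 1], x
k, residual = nnls(A, b)  # non-negative least squares; sparsity can be enforced on top
print(np.round(k, 3), round(residual, 4))
```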
A Survey and Comparison of Post-quantum and Quantum Blockchains
Blockchains have gained substantial attention from academia and industry for their ability to facilitate decentralized trust and communications. However, the rapid progress of quantum computing poses a significant threat to the security of existing blockchain technologies. Notably, the emergence of Shor's and Grover's algorithms raises concerns regarding the compromise of the cryptographic systems underlying blockchains. Consequently, it is essential to develop methods that reinforce blockchain technology against quantum attacks. In response to this challenge, two distinct approaches have been proposed. The first approach involves post-quantum blockchains, which aim to utilize classical cryptographic algorithms resilient to quantum attacks. The second approach explores quantum blockchains, which leverage the power of quantum computers and networks to rebuild the foundations of blockchains. This paper aims to provide a comprehensive overview and comparison of post-quantum and quantum blockchains while exploring open questions and remaining challenges in these domains. It offers an in-depth introduction, examines differences in blockchain structure, security, privacy, and other key factors, and concludes by discussing current research trends.
Updated: 2024-09-02 16:20:22
标题: 一项关于后量子和量子区块链的调查和比较
摘要: 区块链因其促进去中心化信任和通信的能力而受到学术界和行业的广泛关注。然而,量子计算的快速进展对现有区块链技术的安全性构成重大威胁。特别是,Shor和Grover算法的出现引发了关于破坏支撑区块链的加密系统的担忧。因此,开发加强区块链技术抵御量子攻击的方法至关重要。针对这一挑战,提出了两种不同的方法。第一种方法涉及后量子区块链,旨在利用对量子攻击具有抵抗力的经典加密算法。第二种方法探索量子区块链,利用量子计算机和网络的力量重建区块链的基础。本文旨在全面概述和比较后量子和量子区块链,同时探讨这些领域中的未解问题和尚存挑战。它提供了深入介绍,考察了区块链结构、安全性、隐私性和其他关键因素的差异,并最后讨论了当前研究趋势。
更新时间: 2024-09-02 16:20:22
领域: cs.CR,quant-ph
Ancestral Reinforcement Learning: Unifying Zeroth-Order Optimization and Genetic Algorithms for Reinforcement Learning
Reinforcement Learning (RL) offers a fundamental framework for discovering optimal action strategies through interactions within unknown environments. Recent advancements have shown that the performance and applicability of RL can significantly be enhanced by exploiting a population of agents in various ways. Zeroth-Order Optimization (ZOO) leverages an agent population to estimate the gradient of the objective function, enabling robust policy refinement even in non-differentiable scenarios. As another application, Genetic Algorithms (GA) boost the exploration of policy landscapes by mutational generation of policy diversity in an agent population and its refinement by selection. A natural question is whether we can combine the best of both worlds in a single agent population. In this work, we propose Ancestral Reinforcement Learning (ARL), which synergistically combines the robust gradient estimation of ZOO with the exploratory power of GA. The key idea in ARL is that each agent within a population infers the gradient by exploiting the history of its ancestors, i.e., the ancestor population in the past, while maintaining the diversity of policies in the current population as in GA. We also theoretically reveal that the populational search in ARL implicitly induces the KL-regularization of the objective function, resulting in enhanced exploration. Our results extend the applicability of populational algorithms for RL.
Updated: 2024-09-02 16:19:25
标题: 祖先强化学习:将零阶优化和遗传算法统一应用于强化学习
摘要: 强化学习(RL)提供了一个基本框架,通过与未知环境的互动来发现最佳行动策略。最近的进展表明,通过各种方式利用代理种群可以显著提高RL的性能和适用性。零阶优化(ZOO)利用代理种群来估计目标函数的梯度,即使在不可微分的情况下也能实现稳健的策略优化。作为另一种应用,遗传算法(GA)通过在代理种群中以突变产生策略多样性并通过选择加以优化,增强对策略空间的探索。一个自然的问题是我们能否在同一个代理种群中兼得两者的优势。在这项工作中,我们提出了祖先强化学习(ARL),它将ZOO的稳健梯度估计与GA的探索能力相结合。ARL的关键思想是,种群中的每个代理通过利用其祖先的历史(即过去的祖先种群)来推断梯度,同时像GA一样保持当前种群中的策略多样性。我们还在理论上揭示了ARL中的种群搜索隐含地引入了目标函数的KL正则化,从而增强了探索能力。我们的结果扩展了种群算法在强化学习中的适用性。
更新时间: 2024-09-02 16:19:25
领域: cs.LG
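ZOO's population-based gradient estimate is the piece that ARL inherits, so a compact sketch of it may be useful. This is the generic antithetic estimator on a toy objective, not ARL's ancestor-history variant.

```python
import numpy as np

def zoo_gradient(f, theta, pop_size=32, sigma=0.1, rng=None):
    """Antithetic population-based zeroth-order gradient estimate of f at theta."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for e in rng.normal(size=(pop_size // 2, theta.size)):
        # each (+eps, -eps) pair of perturbed "agents" contributes a finite difference
        grad += (f(theta + sigma * e) - f(theta - sigma * e)) / (2 * sigma) * e
    return grad / (pop_size // 2)

f = lambda w: -np.sum((w - 1.0) ** 2)       # toy "return", maximized at w = 1
theta = np.zeros(5)
for _ in range(200):
    theta += 0.05 * zoo_gradient(f, theta)  # gradient ascent on the estimate
print(np.round(theta, 2))                   # approaches [1. 1. 1. 1. 1.]
```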
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement
Recently, attention-based transformers have become a de facto standard in many deep learning applications including natural language processing, computer vision, signal processing, etc. In this paper, we propose a transformer-based end-to-end model to extract a target speaker's speech from a monaural multi-speaker mixed audio signal. Unlike existing speaker extraction methods, we introduce two additional objectives to impose speaker embedding consistency and waveform encoder invertibility and jointly train both speaker encoder and speech separator to better capture the speaker conditional embedding. Furthermore, we leverage a multi-scale discriminator to refine the perceptual quality of the extracted speech. Our experiments show that the use of a dual path transformer in the separator backbone along with the proposed training paradigm improves the CNN baseline by $3.12$ dB points. Finally, we compare our approach with recent state-of-the-art methods and show that our model outperforms existing methods by $4.1$ dB points on average without creating additional data dependency.
Updated: 2024-09-02 16:11:12
标题: Spectron: 使用具有对抗细化的条件变换器进行目标说话者提取
摘要: 最近,基于注意力机制的transformer已成为许多深度学习应用的事实标准,包括自然语言处理、计算机视觉、信号处理等。本文提出了一种基于transformer的端到端模型,用于从单声道多说话者混合音频信号中提取目标发言者的语音。与现有的说话者提取方法不同,我们引入了两个额外的目标,以强制说话者嵌入的一致性和波形编码器的可逆性,并联合训练说话者编码器和语音分离器,以更好地捕捉说话者条件嵌入。此外,我们利用一个多尺度鉴别器来提升提取语音的感知质量。我们的实验表明,在分离器主干中使用双路径transformer以及所提出的训练范式可以将CNN基线提高3.12 dB。最后,我们将我们的方法与最近的最新技术进行比较,结果显示我们的模型在平均情况下优于现有方法4.1 dB,而不产生额外的数据依赖。
更新时间: 2024-09-02 16:11:12
领域: cs.SD,cs.LG,cs.MM,eess.AS
PatternPaint: Generating Layout Patterns Using Generative AI and Inpainting Techniques
Generation of VLSI layout patterns is essential for a wide range of Design For Manufacturability (DFM) studies. In this study, we investigate the potential of generative machine learning models for creating design rule legal metal layout patterns. Our results demonstrate that the proposed model can generate legal patterns in complex design rule settings and achieves a high diversity score. The designed system, with its flexible settings, supports both pattern generation with localized changes, and design rule violation correction. Our methodology is validated on Intel 18A Process Design Kit (PDK) and can produce a wide range of DRC-compliant pattern libraries with only 20 starter patterns.
Updated: 2024-09-02 16:02:26
标题: PatternPaint:使用生成式人工智能和修复技术生成布局模式
摘要: VLSI布局图案的生成对于广泛的可制造性设计(DFM)研究至关重要。在本研究中,我们考察了生成式机器学习模型在创建符合设计规则的金属布局图案方面的潜力。我们的结果表明,所提出的模型可以在复杂的设计规则设置下生成合法的图案,并取得了较高的多样性得分。所设计的系统具有灵活的设置,既支持带有局部变化的图案生成,也支持设计规则违规的纠正。我们的方法在Intel 18A工艺设计工具包(PDK)上得到验证,仅使用20个起始图案即可生成各种符合DRC的图案库。
更新时间: 2024-09-02 16:02:26
领域: cs.CV,cs.CE,cs.LG
Pairing Analogy-Augmented Generation with Procedural Memory for Procedural Q&A
While LLMs in the RAG paradigm have shown remarkable performance on a variety of tasks, they still under-perform on unseen domains, especially on complex tasks like procedural question answering. In this work, we introduce a novel formalism and structure for manipulating text-based procedures. Based on this formalism, we further present a novel dataset called LCStep, scraped from the LangChain Python docs. Moreover, we extend the traditional RAG system to propose a novel system called analogy-augmented generation (AAG), that draws inspiration from human analogical reasoning and ability to assimilate past experiences to solve unseen problems. The proposed method uses a frozen language model with a custom procedure memory store to adapt to specialized knowledge. We demonstrate that AAG outperforms few-shot and RAG baselines on LCStep, RecipeNLG, and CHAMP datasets under a pairwise LLM-based evaluation, corroborated by human evaluation in the case of RecipeNLG.
Updated: 2024-09-02 15:58:24
标题: 使用类比增强生成与程序性记忆相配对,用于程序性问答
摘要: 尽管RAG范例中的LLM在各种任务上表现出色,但它们在未见领域上仍表现不佳,特别是在诸如程序性问答等复杂任务上。在这项工作中,我们引入了一种新的形式化方法和结构来操作基于文本的程序。基于这种形式化方法,我们进一步提出了一个新的数据集LCStep,它爬取自LangChain Python文档。此外,我们扩展了传统的RAG系统,提出了一个称为类比增强生成(AAG)的新系统,该系统从人类的类比推理以及吸收过去经验以解决未知问题的能力中汲取灵感。所提出的方法使用一个冻结的语言模型和一个自定义的程序性记忆存储来适应专业知识。我们证明,在基于LLM的成对评估下,AAG在LCStep、RecipeNLG和CHAMP数据集上优于few-shot和RAG基线,其中RecipeNLG上的结果还得到了人工评估的证实。
更新时间: 2024-09-02 15:58:24
领域: cs.AI,cs.CL
A Financial Time Series Denoiser Based on Diffusion Model
Financial time series often exhibit low signal-to-noise ratio, posing significant challenges for accurate data interpretation and prediction and ultimately decision making. Generative models have gained attention as powerful tools for simulating and predicting intricate data patterns, with the diffusion model emerging as a particularly effective method. This paper introduces a novel approach utilizing the diffusion model as a denoiser for financial time series in order to improve data predictability and trading performance. By leveraging the forward and reverse processes of the conditional diffusion model to add and remove noise progressively, we reconstruct original data from noisy inputs. Our extensive experiments demonstrate that diffusion model-based denoised time series significantly enhance the performance on downstream future return classification tasks. Moreover, trading signals derived from the denoised data yield more profitable trades with fewer transactions, thereby minimizing transaction costs and increasing overall trading efficiency. Finally, we show that by using classifiers trained on denoised time series, we can recognize the noising state of the market and obtain excess return.
Updated: 2024-09-02 15:55:36
标题: 基于扩散模型的金融时间序列去噪器
摘要: 金融时间序列往往表现出低信噪比,给准确数据解释和预测以及最终决策带来了重大挑战。生成模型作为模拟和预测复杂数据模式的强大工具备受关注,扩散模型作为一种特别有效的方法应运而生。本文介绍了一种新颖的方法,利用扩散模型作为金融时间序列的去噪器,以改善数据可预测性和交易表现。通过利用条件扩散模型的正向和反向过程逐渐添加和消除噪音,我们可以从嘈杂的输入中重建原始数据。我们广泛的实验表明,基于扩散模型去噪的时间序列显著提高了未来回报分类任务的表现。此外,从去噪数据中得出的交易信号带来更多盈利的交易,交易次数更少,从而减少交易成本并提高整体交易效率。最后,我们展示了通过使用在去噪时间序列上训练的分类器,我们能够识别市场的噪声状态并获得超额回报。
更新时间: 2024-09-02 15:55:36
领域: cs.LG,cs.AI,q-fin.CP,q-fin.TR
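The add/remove-noise mechanics referenced above follow the standard DDPM forward process. A NumPy sketch, where the trained conditional network is replaced by the true noise so the recovery is exact; this is purely illustrative and makes no claim about the paper's architecture.

```python
import numpy as np

T = 100
betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)       # cumulative product of (1 - beta_t)

def noise_to_level(x0, t, rng):
    """Forward process: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1 - abar[t]) * eps, eps

def denoise_estimate(xt, t, eps_pred):
    """Invert the forward process given a (model-)predicted noise term."""
    return (xt - np.sqrt(1 - abar[t]) * eps_pred) / np.sqrt(abar[t])

rng = np.random.default_rng(0)
x0 = 0.1 * np.cumsum(rng.normal(size=256))   # toy random-walk "price" series
xt, eps = noise_to_level(x0, t=30, rng=rng)
x0_hat = denoise_estimate(xt, 30, eps)       # with perfect eps the recovery is exact
print(np.allclose(x0, x0_hat))
```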
Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English
People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone. Despite this linguistic diversity, research on Sentiment Analysis has predominantly focused on English text data, resulting in a disproportionate availability of sentiment resources for English. This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation. By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages. We also discuss the shortcomings and potential for future work towards the end.
Updated: 2024-09-02 15:41:34
标题: 跨语言情感分析:机器翻译为英文前后的评估
摘要: 全球有超过7000种语言用于交流,仅在印度就有大约780种语言。尽管存在这种语言多样性,情感分析的研究却主要集中在英语文本数据上,导致英语情感资源的可用性不成比例。本文考察了Transformer模型在多语言数据集以及经过机器翻译的文本上进行情感分析任务时的性能。通过比较这些模型在不同语言环境中的有效性,我们获得了对其性能差异的见解,以及对跨语言情感分析的潜在影响。最后我们还讨论了研究的不足之处和未来工作的潜力。
更新时间: 2024-09-02 15:41:34
领域: cs.CL,cs.AI
The Cultivated Practices of Text-to-Image Generation
Humankind is entering a novel creative era in which anybody can synthesize digital information using generative artificial intelligence (AI). Text-to-image generation, in particular, has become vastly popular and millions of practitioners produce AI-generated images and AI art online. This chapter first gives an overview of the key developments that enabled a healthy co-creative online ecosystem around text-to-image generation to rapidly emerge, followed by a high-level description of key elements in this ecosystem. A particular focus is placed on prompt engineering, a creative practice that has been embraced by the AI art community. It is then argued that the emerging co-creative ecosystem constitutes an intelligent system on its own - a system that both supports human creativity, but also potentially entraps future generations and limits future development efforts in AI. The chapter discusses the potential risks and dangers of cultivating this co-creative ecosystem, such as the bias inherent in today's training data, potential quality degradation in future image generation systems due to synthetic data becoming common place, and the potential long-term effects of text-to-image generation on people's imagination, ambitions, and development.
Updated: 2024-09-02 15:34:35
标题: 文本到图像生成的培育实践
摘要: 人类正在进入一个新的创意时代,在这个时代任何人都可以使用生成式人工智能合成数字信息。特别是文本到图像生成已经变得非常流行,数以百万计的从业者在网上生产AI生成的图像和AI艺术。本章首先概述了使得围绕文本到图像生成形成健康共创在线生态系统迅速出现的关键发展,接着高层次地描述了这一生态系统中的关键元素。特别关注的是提示工程,这是一个被AI艺术社区所接受的创意实践。然后论证了新兴的共创生态系统本身构成了一个智能系统 - 一个既支持人类创造力,但也潜在地困住未来一代并限制AI未来发展努力的系统。本章讨论了培育这一共创生态系统的潜在风险和危险,比如今天训练数据中固有的偏见、未来图像生成系统由于合成数据变得普遍而可能出现的质量下降,以及文本到图像生成对人们想象力、抱负和发展的潜在长期影响。
更新时间: 2024-09-02 15:34:35
领域: cs.CY,cs.AI,K.4; J.5; I.2.0; K.5.m
Pediatric brain tumor classification using digital histopathology and deep learning: evaluation of SOTA methods on a multi-center Swedish cohort
Brain tumors are the most common solid tumors in children and young adults, but the scarcity of large histopathology datasets has limited the application of computational pathology in this group. This study implements two weakly supervised multiple-instance learning (MIL) approaches on patch-features obtained from state-of-the-art histology-specific foundation models to classify pediatric brain tumors in hematoxylin and eosin whole slide images (WSIs) from a multi-center Swedish cohort. WSIs from 540 subjects (age 8.5$\pm$4.9 years) diagnosed with brain tumor were gathered from the six Swedish university hospitals. Instance (patch)-level features were obtained from WSIs using three pre-trained feature extractors: ResNet50, UNI and CONCH. Instances were aggregated using attention-based MIL (ABMIL) or clustering-constrained attention MIL (CLAM) for patient-level classification. Models were evaluated on three classification tasks based on the hierarchical classification of pediatric brain tumors: tumor category, family and type. Model generalization was assessed by training on data from two of the centers and testing on data from four other centers. Model interpretability was evaluated through attention-mapping. The highest classification performance was achieved using UNI features and ABMIL aggregation, with Matthews correlation coefficients of 0.86$\pm$0.04, 0.63$\pm$0.04, and 0.53$\pm$0.05 for tumor category, family and type classification, respectively. When evaluating generalization, models utilizing UNI and CONCH features outperformed those using ResNet50. However, the drop in performance from the in-site to out-of-site testing was similar across feature extractors. These results show the potential of state-of-the-art computational pathology methods in diagnosing pediatric brain tumors at different hierarchical levels with fair generalizability on a multi-center national dataset.
Updated: 2024-09-02 15:32:04
标题: 使用数字组织病理学和深度学习对儿童脑肿瘤进行分类:评估SOTA方法在瑞典多中心队列上的应用
摘要: 脑肿瘤是儿童和年轻成人中最常见的实体肿瘤,但大型组织病理学数据集的缺乏限制了计算病理学在这一群体中的应用。本研究在来自瑞典多中心队列的苏木精-伊红(HE)染色全切片图像(WSI)上实施了两种弱监督多实例学习(MIL)方法,利用最先进的组织学专用基础模型提取的patch特征对儿童脑肿瘤进行分类。来自六家瑞典大学医院、被诊断为脑肿瘤的540名受试者(年龄8.5±4.9岁)的WSI被收集。实例(patch)级特征使用三个预训练特征提取器(ResNet50、UNI和CONCH)从WSI中获取。使用基于注意力的MIL(ABMIL)或聚类约束注意力MIL(CLAM)对实例进行聚合以进行患者级分类。模型根据儿童脑肿瘤的分层分类在三个分类任务上进行了评估:肿瘤类别、家族和类型。通过在两个中心的数据上训练并在其他四个中心的数据上测试来评估模型的泛化能力。通过注意力映射评估模型的可解释性。使用UNI特征和ABMIL聚合获得了最高的分类性能,肿瘤类别、家族和类型分类的Matthews相关系数分别为0.86±0.04、0.63±0.04和0.53±0.05。在评估泛化性时,利用UNI和CONCH特征的模型优于使用ResNet50的模型。然而,从站内到站外测试的性能下降在各特征提取器之间相似。这些结果显示了最先进的计算病理学方法在不同层次上诊断儿童脑肿瘤的潜力,并在多中心国家级数据集上具有尚可的泛化能力。
更新时间: 2024-09-02 15:32:04
领域: cs.CV,cs.AI
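Attention-based MIL (ABMIL), the better-performing aggregator above, weights each patch embedding by a learned attention score before a single slide-level classifier. A minimal PyTorch sketch; the 1024-dim feature size is an assumption matching typical ViT-L extractors such as UNI.

```python
import torch
import torch.nn as nn

class ABMIL(nn.Module):
    """Attention-based MIL pooling (Ilse et al., 2018) over patch features."""
    def __init__(self, feat_dim=1024, attn_dim=128, n_classes=3):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, attn_dim), nn.Tanh(),
                                  nn.Linear(attn_dim, 1))
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, patches):                        # (n_patches, feat_dim)
        a = torch.softmax(self.attn(patches), dim=0)   # (n_patches, 1), sums to 1
        slide = (a * patches).sum(dim=0)               # attention-weighted bag embedding
        return self.head(slide), a.squeeze(-1)         # logits + attention map

model = ABMIL()
logits, attention = model(torch.randn(500, 1024))      # one slide = a bag of 500 patches
print(logits.shape, attention.shape)
```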
Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning
Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. Through analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while high entropy or low Fisher Discriminant Ratio (FDR) datasets deteriorate the utility-privacy trade-off. These insights offer valuable guidance for practitioners and researchers in estimating and optimizing the utility-privacy trade-off in image datasets, helping to inform data and privacy modifications for better outcomes based on dataset characteristics.
Updated: 2024-09-02 15:30:27
标题: 评估图像数据集特征对保护隐私的机器学习影响
摘要: 机器学习(ML)在许多领域中至关重要,包括计算机视觉。然而,针对敏感数据训练的ML模型面临安全挑战,因为它们可能会受到攻击并泄露信息。隐私保护机器学习(PPML)通过使用差分隐私(DP)来平衡效用和隐私,从而解决了这个问题。本研究确定了影响私有和非私有卷积神经网络(CNN)模型效用和脆弱性的图像数据集特征。通过分析多个数据集和隐私预算,我们发现不平衡的数据集会增加少数类别的脆弱性,但差分隐私可以缓解这个问题。拥有较少类别的数据集可以提高模型的效用和隐私性,而高熵或低费舍尔判别比(FDR)的数据集会恶化效用-隐私权衡。这些见解为从业者和研究人员在评估和优化图像数据集中的效用-隐私权衡提供了宝贵的指导,有助于根据数据集特征进行数据和隐私修改,从而实现更好的结果。
更新时间: 2024-09-02 15:30:27
领域: cs.LG,cs.CR,cs.CV
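One of the dataset characteristics studied, the Fisher Discriminant Ratio, can be computed in a few lines. This sketch assumes a two-class formulation averaged over features; the study's exact variant may differ.

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Mean two-class FDR per feature: (mu0 - mu1)^2 / (var0 + var1).

    Low values indicate poorly separated classes, which the study links
    to a worse utility-privacy trade-off.
    """
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return (num / den).mean()

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 16)), rng.normal(1.5, 1, (100, 16))])
y = np.repeat([0, 1], 100)
print(fisher_discriminant_ratio(X, y))   # higher = easier separation
```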
Grounding Language Models in Autonomous Loco-manipulation Tasks
Humanoid robots with behavioral autonomy have consistently been regarded as ideal collaborators in our daily lives and promising representations of embodied intelligence. Compared to fixed-based robotic arms, humanoid robots offer a larger operational space while significantly increasing the difficulty of control and planning. Despite the rapid progress towards general-purpose humanoid robots, most studies remain focused on locomotion ability with few investigations into whole-body coordination and tasks planning, thus limiting the potential to demonstrate long-horizon tasks involving both mobility and manipulation under open-ended verbal instructions. In this work, we propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios. We combine reinforcement learning (RL) with whole-body optimization to generate robot motions and store them into a motion library. We further leverage the planning and reasoning features of the large language model (LLM), constructing a hierarchical task graph that comprises a series of motion primitives to bridge lower-level execution with higher-level planning. Experiments in simulation and real-world using the CENTAURO robot show that the language model based planner can efficiently adapt to new loco-manipulation tasks, demonstrating high autonomy from free-text commands in unstructured scenes.
Updated: 2024-09-02 15:27:48
标题: 将语言模型落地于自主移动操作任务
摘要: 具有行为自主性的人形机器人一直被视为我们日常生活中理想的合作者和具身智能的有前途的代表。与固定基座的机械臂相比,人形机器人提供了更大的操作空间,同时显著增加了控制和规划的难度。尽管通用人形机器人的研究进展迅速,大多数研究仍然集中在行走能力上,对全身协调和任务规划的研究很少,从而限制了在开放式口头指令下展示同时涉及移动和操作的长时程任务的潜力。在这项工作中,我们提出了一个在不同场景中基于任务学习、选择和规划行为的新框架。我们将强化学习(RL)与全身优化相结合,生成机器人动作并将其存储到动作库中。我们进一步利用大型语言模型(LLM)的规划和推理能力,构建了一个包含一系列运动基元的分层任务图,以连接低层执行与高层规划。在仿真和现实世界中使用CENTAURO机器人进行的实验表明,基于语言模型的规划器可以有效地适应新的移动操作任务,展示了在非结构化场景中根据自由文本命令实现高度自主性的能力。
更新时间: 2024-09-02 15:27:48
领域: cs.RO,cs.AI,cs.LG
Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem
Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of the defects and that 33% of reported failures are related to semantically incorrect models. The cause of semantically incorrect models is elusive, but models with behaviour inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioural tolerances and architectural coverage metrics could be fruitful.
Updated: 2024-09-02 15:23:52
标题: 对深度学习模型转换器的失败和风险分析:ONNX生态系统中的案例研究
摘要: 软件工程师使用各种开发框架和运行环境开发、优化和部署深度学习(DL)模型。DL模型转换器在不同框架之间以及到运行环境之间移动模型。转换错误会损害模型质量并干扰部署。然而,DL模型转换器的故障特征是未知的,使用DL互操作技术时存在风险。 本文分析了DL模型转换器的故障。我们对软件工程师进行关于DL互操作工具、使用案例和痛点的调查(N=92)。然后,我们对与主要互操作性工具ONNX相关的模型转换器中的故障进行了特征化(在PyTorch和TensorFlow中有200个问题)。最后,我们提出并测试了关于我们研究的故障的结构原因的两个假设。我们发现,模型转换器的节点转换阶段占了缺陷的约75%,报告的故障中有33%与语义不正确的模型有关。语义不正确的模型的原因难以捉摸,但行为不一致的模型共享操作符序列。我们的研究结果促使未来对DL互操作软件进行更简单、更易于维护、扩展和验证的研究。对行为容忍度和架构覆盖度指标的研究可能会取得成果。
更新时间: 2024-09-02 15:23:52
领域: cs.SE,cs.LG
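Since ONNX conversion is the setting of this study, a small end-to-end sketch may be useful: export a toy PyTorch model, run the structural checker, then compare outputs, because semantically incorrect models (the harder failure mode above) can pass structural checks. This assumes the onnx and onnxruntime packages are installed.

```python
import torch
import torch.nn as nn

# Export a small PyTorch model to ONNX. The node conversion step inside
# exporters like this one is where ~75% of the studied defects arise.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2)).eval()
dummy = torch.randn(1, 8)
torch.onnx.export(model, dummy, "model.onnx", input_names=["x"], output_names=["y"])

import onnx
onnx.checker.check_model(onnx.load("model.onnx"))  # structural validation only

# Behavioural check against the source framework:
import onnxruntime as ort
sess = ort.InferenceSession("model.onnx")
(out,) = sess.run(None, {"x": dummy.numpy()})
print(torch.allclose(model(dummy), torch.from_numpy(out), atol=1e-5))
```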
LoGex: Improved tail detection of extremely rare histopathology classes via guided diffusion
In realistic medical settings, the data are often inherently long-tailed, with most samples concentrated in a few classes and a long tail of rare classes, usually containing just a few samples. This distribution presents a significant challenge because rare conditions are critical to detect and difficult to classify due to limited data. In this paper, rather than attempting to classify rare classes, we aim to detect these as out-of-distribution data reliably. We leverage low-rank adaption (LoRA) and diffusion guidance to generate targeted synthetic data for the detection problem. We significantly improve the OOD detection performance on a challenging histopathological task with only ten samples per tail class without losing classification accuracy on the head classes.
Updated: 2024-09-02 15:18:15
标题: LoGex:通过引导扩散改进极为罕见的组织病理分类的尾部检测
摘要: 在现实医疗环境中,数据往往具有长尾特性,大多数样本集中在少数几个类别中,而稀有类别则呈现长尾分布,通常只包含少量样本。这种分布形式带来了重大挑战,因为稀有情况对检测至关重要,但由于数据有限,很难分类。在本文中,我们不是试图对稀有类别进行分类,而是致力于可靠地检测这些作为离群数据。我们利用低秩适应(LoRA)和扩散引导来生成针对检测问题的有针对性的合成数据。我们在一个具有挑战性的组织病理学任务上显著提高了离群数据检测性能,每个长尾类别仅有十个样本,而不会损失头部类别的分类准确性。
更新时间: 2024-09-02 15:18:15
领域: cs.CV,cs.LG
Disentangling Mean Embeddings for Better Diagnostics of Image Generators
The evaluation of image generators remains a challenge due to the limitations of traditional metrics in providing nuanced insights into specific image regions. This is a critical problem as not all regions of an image may be learned with similar ease. In this work, we propose a novel approach to disentangle the cosine similarity of mean embeddings into the product of cosine similarities for individual pixel clusters via central kernel alignment. Consequently, we can quantify the contribution of the cluster-wise performance to the overall image generation performance. We demonstrate how this enhances the explainability and the likelihood of identifying pixel regions of model misbehavior across various real-world use cases.
Updated: 2024-09-02 15:16:07
标题: 解开平均嵌入以更好地诊断图像生成器
摘要: 图像生成器的评估仍然是一个挑战,因为传统度量标准在提供特定图像区域的微妙见解方面存在局限性。这是一个关键问题,因为并非所有图像区域都能以相似的轻松程度学习。在这项工作中,我们提出了一种新颖的方法,通过中心核对齐来将均值嵌入的余弦相似度解开为单个像素簇的余弦相似度的乘积。因此,我们可以量化各个像素簇的性能对整体图像生成性能的贡献。我们演示了这如何增强了在各种真实用例中解释性和识别模型行为不当的像素区域的可能性。
更新时间: 2024-09-02 15:16:07
领域: cs.CV,cs.LG
Multi-frequency Neural Born Iterative Method for Solving 2-D Inverse Scattering Problems
In this work, we propose a deep learning-based imaging method for addressing the multi-frequency electromagnetic (EM) inverse scattering problem (ISP). By combining deep learning technology with EM physical laws, we have successfully developed a multi-frequency neural Born iterative method (NeuralBIM), guided by the principles of the single-frequency NeuralBIM. This method integrates multitask learning techniques with NeuralBIM's efficient iterative inversion process to construct a robust multi-frequency Born iterative inversion model. During training, the model employs a multitask learning approach guided by homoscedastic uncertainty to adaptively allocate the weights of each frequency's data. Additionally, an unsupervised learning method, constrained by the physical laws of ISP, is used to train the multi-frequency NeuralBIM model, eliminating the need for contrast and total field data. The effectiveness of the multi-frequency NeuralBIM is validated through synthetic and experimental data, demonstrating improvements in accuracy and computational efficiency for solving ISP. Moreover, this method exhibits strong generalization capabilities and noise resistance. The multi-frequency NeuralBIM method explores a novel inversion method for multi-frequency EM data and provides an effective solution for the electromagnetic ISP of multi-frequency data.
Updated: 2024-09-02 15:16:07
标题: 用于求解二维逆散射问题的多频神经Born迭代方法
摘要: 在这项工作中,我们提出了一种基于深度学习的成像方法,用于解决多频电磁(EM)逆散射问题(ISP)。通过将深度学习技术与电磁物理定律相结合,我们成功地开发了一种多频神经Born迭代方法(NeuralBIM),遵循单频神经Born迭代方法的原则。该方法将多任务学习技术与神经Born迭代方法的高效迭代反演过程相结合,构建了一个稳健的多频Born迭代反演模型。在训练过程中,模型采用了一种由同方差不确定性指导的多任务学习方法,自适应地分配每个频率数据的权重。此外,还利用受ISP物理定律约束的无监督学习方法来训练多频神经Born迭代方法模型,消除了对对比度和总场数据的需求。多频神经Born迭代方法的有效性通过合成和实验数据验证,展现了解决ISP问题时精度和计算效率的提升。此外,该方法具有较强的泛化能力和抗噪性。多频神经Born迭代方法探索了一种新颖的多频EM数据反演方法,并为多频数据的电磁ISP提供了有效解决方案。
更新时间: 2024-09-02 15:16:07
领域: physics.comp-ph,cs.AI,cs.LG,35Q61,I.2.6; G.1.8; G.1.3
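The multitask weighting "guided by homoscedastic uncertainty" is commonly implemented with learned log-variances per task (Kendall et al., 2018). A hedged PyTorch sketch of that loss, with each task standing in for one frequency's data term; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Multitask loss with learned homoscedastic uncertainty:
        L = sum_i exp(-s_i) * L_i + s_i,  where s_i = log(sigma_i^2).
    Each "task" here would be one frequency's data-fit term."""
    def __init__(self, n_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, li in enumerate(task_losses):
            total = total + torch.exp(-self.log_vars[i]) * li + self.log_vars[i]
        return total

loss_fn = UncertaintyWeightedLoss(n_tasks=3)   # e.g. three measurement frequencies
losses = [torch.tensor(0.8), torch.tensor(0.3), torch.tensor(1.2)]
print(loss_fn(losses))  # the weights adapt as log_vars are trained jointly
```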
HyperInterval: Hypernetwork approach to training weight interval regions in continual learning
Recently, a new Continual Learning (CL) paradigm was presented to control catastrophic forgetting, called Interval Continual Learning (InterContiNet), which relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, making intervals difficult to manage. To address this issue, we introduce HyperInterval (source code available at https://github.com/gmum/HyperInterval), a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the target network parameter space. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into weights of the target network. An embedding for a given task is trained along with the hypernetwork, preserving the response of the target network for the previous task embeddings. Interval arithmetic works with a more manageable, lower-dimensional embedding space rather than directly preparing intervals in a high-dimensional weight space. Our model allows faster and more efficient training. Furthermore, HyperInterval maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks. In such a framework, the hypernetwork is used only for training and, finally, we can utilize one set of weights. HyperInterval obtains significantly better results than InterContiNet and gives SOTA results on several benchmarks.
Updated: 2024-09-02 15:09:05
标题: 超区间:超网络方法用于在持续学习中训练权重区间区域
摘要: 最近,提出了一种新的持续学习(CL)范式,称为区间持续学习(InterContiNet),用于控制灾难性遗忘,该范式依赖于在神经网络参数空间上强制执行区间约束。不幸的是,由于权重空间的高维性,InterContiNet的训练具有挑战性,使得区间难以管理。为了解决这个问题,我们引入了HyperInterval技术,该技术在嵌入空间内使用区间算术,并利用超网络将这些区间映射到目标网络参数空间。我们为连续任务训练区间嵌入,并训练一个超网络将这些嵌入转换为目标网络的权重。给定任务的嵌入与超网络一起训练,同时保留目标网络对先前任务嵌入的响应。区间算术在一个更易管理的低维嵌入空间中进行,而不是直接在高维权重空间中构造区间。我们的模型允许更快速和更高效的训练。此外,HyperInterval保持了不遗忘的保证。在训练结束时,我们可以选择一个通用嵌入来生成一个适用于所有任务的单一网络。在这样的框架中,超网络仅用于训练,最终我们只需使用一组权重。HyperInterval比InterContiNet获得了显著更好的结果,并在多个基准测试中取得了SOTA结果。
更新时间: 2024-09-02 15:09:05
领域: cs.LG,cs.AI
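The interval arithmetic that HyperInterval applies in embedding space propagates a box through affine maps by splitting weights into positive and negative parts. A minimal NumPy sketch of that single step; the hypernetwork and training loop are omitted.

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate an axis-aligned box [lo, hi] through y = W x + b exactly.

    Positive weights map lower bounds to lower bounds; negative weights
    flip them. This is the basic interval-arithmetic step applied in a
    low-dimensional embedding space instead of raw weight space.
    """
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    y_lo = W_pos @ lo + W_neg @ hi + b
    y_hi = W_pos @ hi + W_neg @ lo + b
    return y_lo, y_hi

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 6)), rng.normal(size=4)
lo, hi = -0.1 * np.ones(6), 0.1 * np.ones(6)   # a toy interval embedding for one task
y_lo, y_hi = interval_linear(lo, hi, W, b)
print(np.all(y_lo <= y_hi))
```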
Evolving Virtual World with Delta-Engine
In this paper, we focus on the virtual world, a cyberspace in which people can live. An ideal virtual world shares great similarity with our real world. One of the crucial aspects is its evolving nature, reflected by individuals' capability to grow and thereby influence the objective world. Such dynamics is unpredictable and beyond the reach of existing systems. For this, we propose a special engine called Delta-Engine to drive this virtual world. $\Delta$ associates the world's evolution with the engine's scalability. It consists of a base engine and a neural proxy. The base engine programs the prototype of the virtual world; given a trigger, the neural proxy generates new snippets on the base engine through incremental prediction. This paper presents a full-stack introduction to the delta-engine. The key feature of the delta-engine is its scalability to unknown elements within the world. Technically, it derives from the close cooperation of the neural proxy and the base engine, and the alignment with high-quality data. We introduce an engine-oriented fine-tuning method that embeds the base engine into the proxy. We then discuss the human-LLM collaborative design to produce novel and interesting data efficiently. Eventually, we propose three evaluation principles to comprehensively assess the performance of a delta engine: naive evaluation, incremental evaluation, and adversarial evaluation.
Updated: 2024-09-02 15:08:32
标题: 使用Delta-Engine演变的虚拟世界
摘要: 在这篇论文中,我们关注虚拟世界,一个人们可以生活于其中的网络空间。理想的虚拟世界与我们的现实世界有很大的相似性。其中一个关键方面是其不断演化的特性,体现在个体能够成长并因此影响客观世界。这种动态是不可预测的,超出了现有系统的能力范围。为此,我们提出了一个名为Delta-Engine的特殊引擎来驱动这个虚拟世界。$\Delta$将世界的演化与引擎的可扩展性联系起来。它由一个基础引擎和一个神经代理组成。基础引擎编写虚拟世界的原型;在给定触发器的情况下,神经代理通过增量预测在基础引擎上生成新的片段。本文对delta引擎进行了全栈式介绍。Delta引擎的关键特点是其对世界中未知元素的可扩展性。从技术上讲,这源自神经代理和基础引擎的紧密协作,以及与高质量数据的对齐。我们介绍了一种面向引擎的微调方法,将基础引擎嵌入到代理中。然后我们讨论了人类与LLM的协作设计,以高效地产生新颖有趣的数据。最后,我们提出了三个评估原则来全面评估delta引擎的性能:朴素评估、增量评估和对抗性评估。
更新时间: 2024-09-02 15:08:32
领域: cs.AI,cs.HC
Reward Augmentation in Reinforcement Learning for Testing Distributed Systems
Bugs in popular distributed protocol implementations have been the source of many downtimes in popular internet services. We describe a randomized testing approach for distributed protocol implementations based on reinforcement learning. Since the natural reward structure is very sparse, the key to successful exploration in reinforcement learning is reward augmentation. We show two different techniques that build on one another. First, we provide a decaying exploration bonus based on the discovery of new states -- the reward decays as the same state is visited multiple times. The exploration bonus captures the intuition from coverage-guided fuzzing of prioritizing new coverage points; in contrast to other schemes, we show that taking the maximum of the bonus and the Q-value leads to more effective exploration. Second, we provide waypoints to the algorithm as a sequence of predicates that capture interesting semantic scenarios. Waypoints exploit designer insight about the protocol and guide the exploration to ``interesting'' parts of the state space. Our reward structure ensures that new episodes can reliably get to deep interesting states even without execution caching. We have implemented our algorithm in Go. Our evaluation on three large benchmarks (RedisRaft, Etcd, and RSL) shows that our algorithm can significantly outperform baseline approaches in terms of coverage and bug finding.
Updated: 2024-09-02 15:07:05
标题: 用于测试分布式系统的强化学习中的奖励增强
摘要: 流行的分布式协议实现中的错误是许多热门互联网服务停机的原因。我们描述了一种基于强化学习的分布式协议实现随机化测试方法。由于自然奖励结构非常稀疏,强化学习中成功探索的关键在于奖励增强。我们展示了两种相互补充的技术。首先,我们提供了一个基于发现新状态的衰减探索奖励:随着同一状态被多次访问,奖励会衰减。探索奖励捕捉了覆盖引导模糊测试中优先考虑新覆盖点的直觉;与其他方案相比,我们表明取探索奖励与Q值的最大值可以更有效地进行探索。其次,我们以一系列捕捉有趣语义场景的谓词的形式为算法提供航点。航点利用设计者对协议的洞察,引导探索进入状态空间中"有趣"的部分。我们的奖励结构确保即使没有执行缓存,新的回合也可以可靠地到达深层的有趣状态。我们在Go中实现了我们的算法。在三个大型基准测试(RedisRaft、Etcd和RSL)上的评估显示,我们的算法在覆盖率和错误发现方面可以明显优于基线方法。
更新时间: 2024-09-02 15:07:05
领域: cs.SE,cs.DC,cs.LG,cs.PL
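The first augmentation, a decaying novelty bonus combined by max() with the Q-value, fits in a few lines of tabular Q-learning. A toy sketch on a chain MDP; the constants and the environment are illustrative assumptions, not the paper's Go implementation.

```python
import collections
import random

Q = collections.defaultdict(float)
visits = collections.defaultdict(int)
alpha, gamma, c = 0.5, 0.95, 1.0
actions = (-1, +1)

def bonus(state):
    return c / (1 + visits[state])  # decays as the state is revisited

def update(s, a, r, s_next):
    visits[s_next] += 1
    # the target takes max(exploration bonus, best Q), as in the first augmentation
    best_next = max(max(Q[(s_next, a2)] for a2 in actions), bonus(s_next))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

random.seed(0)
for _ in range(500):                 # toy chain MDP: reward only at state 10
    s = 0
    for _ in range(20):
        a = random.choice(actions)
        s_next = min(10, max(0, s + a))
        update(s, a, 1.0 if s_next == 10 else 0.0, s_next)
        s = s_next
print(round(Q[(9, +1)], 2))
```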
Representing Neural Network Layers as Linear Operations via Koopman Operator Theory
The strong performance of simple neural networks is often attributed to their nonlinear activations. However, a linear view of neural networks makes understanding and controlling networks much more approachable. We draw from a dynamical systems view of neural networks, offering a fresh perspective by using Koopman operator theory and its connections with dynamic mode decomposition (DMD). Together, they offer a framework for linearizing dynamical systems by embedding the system into an appropriate observable space. By reframing a neural network as a dynamical system, we demonstrate that we can replace the nonlinear layer in a pretrained multi-layer perceptron (MLP) with a finite-dimensional linear operator. In addition, we analyze the eigenvalues of DMD and the right singular vectors of SVD, to present evidence that time-delayed coordinates provide a straightforward and highly effective observable space for Koopman theory to linearize a network layer. Consequently, we replace layers of an MLP trained on the Yin-Yang dataset with predictions from a DMD model, achieving a model accuracy of up to 97.3%, compared to the original 98.4%. In addition, we replace layers in an MLP trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set.
Updated: 2024-09-02 15:04:33
标题: 用库普曼算子理论将神经网络层表示为线性操作
摘要: 简单的神经网络表现出色往往归功于它们的非线性激活。然而,对神经网络的线性观点使得理解和控制网络变得更加可行。我们从神经网络的动力系统观点出发,利用库普曼算子理论及其与动态模态分解(DMD)的关联,提供了一种新的视角。它们共同构建了一个将动力系统线性化的框架,通过将系统嵌入适当的可观测空间。通过将神经网络重新构建为一个动力系统,我们展示了可以用有限维线性算子替换预训练的多层感知器(MLP)中的非线性层。此外,我们分析了DMD的特征值和SVD的右奇异向量,以提供证据表明,时滞坐标为库普曼理论线性化网络层提供了一个简单且高效的可观测空间。因此,我们将在Yin-Yang数据集上训练的MLP的层替换为DMD模型的预测,达到了97.3%的模型准确率,而原始准确率为98.4%。此外,我们将在MNIST数据集上训练的MLP的层替换为DMD模型的预测,在测试集上达到了95.8%的准确率,而原始准确率为97.2%。
更新时间: 2024-09-02 15:04:33
领域: cs.LG
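The pipeline above is exact DMD on time-delayed coordinates. A NumPy sketch on a toy oscillation, where an 8-delay embedding makes the dynamics linearly predictable at rank 2; the network-layer application would use layer activations instead of a sine wave.

```python
import numpy as np

def dmd(X, Y, rank):
    """Exact DMD: fit a linear operator A with Y ~= A X via truncated SVD."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    return Y @ Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T

def time_delay_embed(x, delays):
    """Stack delayed copies of a trajectory as the observable space."""
    return np.stack([x[i:len(x) - delays + i] for i in range(delays)])

t = np.linspace(0, 8 * np.pi, 400)
x = np.sin(t)
H = time_delay_embed(x, delays=8)           # (8, 392) delay-coordinate matrix
A = dmd(H[:, :-1], H[:, 1:], rank=2)        # one-step linear operator
pred = A @ H[:, :-1]
print(np.abs(pred - H[:, 1:]).max())        # tiny residual: the dynamics are linear here
```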
Highly Accurate Real-space Electron Densities with Neural Networks
Variational ab-initio methods in quantum chemistry stand out among other methods in providing direct access to the wave function. This allows in principle straightforward extraction of any other observable of interest, besides the energy, but in practice this extraction is often technically difficult and computationally impractical. Here, we consider the electron density as a central observable in quantum chemistry and introduce a novel method to obtain accurate densities from real-space many-electron wave functions by representing the density with a neural network that captures known asymptotic properties and is trained from the wave function by score matching and noise-contrastive estimation. We use variational quantum Monte Carlo with deep-learning ansätze (deep QMC) to obtain highly accurate wave functions free of basis set errors, and from them, using our novel method, correspondingly accurate electron densities, which we demonstrate by calculating dipole moments, nuclear forces, contact densities, and other density-based properties.
Updated: 2024-09-02 14:56:22
标题: 用神经网络实现高精度的实空间电子密度
摘要: 量子化学中的变分从头方法在提供直接访问波函数方面优于其他方法。这原则上允许直接提取除能量之外的任何其他感兴趣的可观测量,但在实践中,这种提取通常在技术上很困难且计算上不切实际。在这里,我们将电子密度作为量子化学中的一个核心可观测量,并引入一种新方法,通过用神经网络表示密度来从实空间多电子波函数中获得准确的密度,该神经网络捕捉已知的渐近性质,并通过得分匹配和噪声对比估计从波函数中训练。我们使用具有深度学习方案(深度QMC)的变分量子蒙特卡罗来获得没有基组误差的高度准确的波函数,并从中使用我们的新方法,相应准确的电子密度,我们通过计算偶极矩、核力、接触密度和其他基于密度的性质来证明这一点。
更新时间: 2024-09-02 14:56:22
领域: physics.chem-ph,cs.LG
Topological degree as a discrete diagnostic for disentanglement, with applications to the $Δ$VAE
We investigate the ability of Diffusion Variational Autoencoder ($\Delta$VAE) with unit sphere $\mathcal{S}^2$ as latent space to capture topological and geometrical structure and disentangle latent factors in datasets. For this, we introduce a new diagnostic of disentanglement: namely the topological degree of the encoder, which is a map from the data manifold to the latent space. By using tools from homology theory, we derive and implement an algorithm that computes this degree. We use the algorithm to compute the degree of the encoder of models that result from the training procedure. Our experimental results show that the $\Delta$VAE achieves relatively small LSBD scores, and that regardless of the degree after initialization, the degree of the encoder after training becomes $-1$ or $+1$, which implies that the resulting encoder is at least homotopic to a homeomorphism.
Updated: 2024-09-02 14:51:31
标题: 拓扑度作为离散解缠度的诊断指标,以及在$Δ$VAE中的应用
摘要: 我们研究了Diffusion Variational Autoencoder($\Delta$VAE)在以单位球$\mathcal{S}^2$作为潜在空间的情况下,捕捉数据集中的拓扑和几何结构以及解开潜在因子的能力。为此,我们引入了一种新的解开诊断方法:即编码器的拓扑度,它是一个从数据流形到潜在空间的映射。通过使用同调理论工具,我们推导并实现了一个计算这种度的算法。我们使用该算法计算训练过程产生的模型的编码器度。我们的实验结果显示,$\Delta$VAE实现了相对较小的LSBD分数,并且无论初始时的度数如何,训练后编码器的度数都变为$-1$或$+1$,这意味着结果编码器至少是同伦于一个同胚。
更新时间: 2024-09-02 14:51:31
领域: cs.LG,cs.AI,math.AT,51H20 55N35 68T09 68T07
Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data
Background: This study aimed to evaluate and compare the performance of classical machine learning models (CMLs) and large language models (LLMs) in predicting mortality associated with COVID-19 by utilizing a high-dimensional tabular dataset. Materials and Methods: We analyzed data from 9,134 COVID-19 patients collected across four hospitals. Seven CML models, including XGBoost and random forest (RF), were trained and evaluated. The structured data was converted into text for zero-shot classification by eight LLMs, including GPT-4 and Mistral-7b. Additionally, Mistral-7b was fine-tuned using the QLoRA approach to enhance its predictive capabilities. Results: Among the CML models, XGBoost and RF achieved the highest accuracy, with F1 scores of 0.87 for internal validation and 0.83 for external validation. In the LLM category, GPT-4 was the top performer with an F1 score of 0.43. Fine-tuning Mistral-7b significantly improved its recall from 1% to 79%, resulting in an F1 score of 0.74, which was stable during external validation. Conclusion: While LLMs show moderate performance in zero-shot classification, fine-tuning can significantly enhance their effectiveness, potentially aligning them closer to CML models. However, CMLs still outperform LLMs in high-dimensional tabular data tasks.
Updated: 2024-09-02 14:51:12
标题: 大型语言模型与经典机器学习:在使用高维表格数据进行COVID-19死亡率预测中的表现
摘要: 背景:本研究旨在通过利用高维表格数据集,评估和比较经典机器学习模型(CML)和大型语言模型(LLM)在预测COVID-19相关死亡率方面的表现。 材料和方法:我们分析了来自四家医院的9,134名COVID-19患者的数据。训练并评估了包括XGBoost和随机森林(RF)在内的七个CML模型。结构化数据被转换为文本,由包括GPT-4和Mistral-7b在内的八个LLM进行零样本分类。此外,使用QLoRA方法对Mistral-7b进行微调以增强其预测能力。 结果:在CML模型中,XGBoost和RF取得了最高准确率,内部验证和外部验证的F1分数分别为0.87和0.83。在LLM类别中,GPT-4表现最佳,F1分数为0.43。微调使Mistral-7b的召回率从1%提高到79%,F1分数达到0.74,并在外部验证中保持稳定。 结论:虽然LLM在零样本分类中表现一般,但微调可以显著提升其效果,使其可能更接近CML模型。然而,在高维表格数据任务中,CML模型仍然优于LLM。
更新时间: 2024-09-02 14:51:12
领域: cs.LG,cs.AI,cs.CL,92C50, 68T50,J.3
ResQuNNs: Towards Enabling Deep Learning in Quantum Convolution Neural Networks
In this paper, we present a novel framework for enhancing the performance of Quanvolutional Neural Networks (QuNNs) by introducing trainable quanvolutional layers and addressing the critical challenges associated with them. Traditional quanvolutional layers, although beneficial for feature extraction, have largely been static, offering limited adaptability. Unlike state-of-the-art, our research overcomes this limitation by enabling training within these layers, significantly increasing the flexibility and potential of QuNNs. However, the introduction of multiple trainable quanvolutional layers induces complexities in gradient-based optimization, primarily due to the difficulty in accessing gradients across these layers. To resolve this, we propose a novel architecture, Residual Quanvolutional Neural Networks (ResQuNNs), leveraging the concept of residual learning, which facilitates the flow of gradients by adding skip connections between layers. By inserting residual blocks between quanvolutional layers, we ensure enhanced gradient access throughout the network, leading to improved training performance. Moreover, we provide empirical evidence on the strategic placement of these residual blocks within QuNNs. Through extensive experimentation, we identify an efficient configuration of residual blocks, which enables gradients across all the layers in the network that eventually results in efficient training. Our findings suggest that the precise location of residual blocks plays a crucial role in maximizing the performance gains in QuNNs. Our results mark a substantial step forward in the evolution of quantum deep learning, offering new avenues for both theoretical development and practical quantum computing applications.
Updated: 2024-09-02 14:38:01
标题: ResQuNNs:向在量子卷积神经网络中实现深度学习迈进
摘要: 在本文中,我们提出了一种增强Quanvolutional神经网络(QuNNs)性能的新框架,通过引入可训练的quanvolutional层并解决与其相关的关键挑战。传统的quanvolutional层虽然有助于特征提取,但大多数是静态的,提供有限的适应性。与最先进的技术不同,我们的研究通过在这些层内进行训练,克服了这一限制,显著增加了QuNNs的灵活性和潜力。然而,引入多个可训练的quanvolutional层会导致基于梯度的优化变得复杂,主要是由于在这些层之间访问梯度的困难。为了解决这个问题,我们提出了一种新的架构,Residual Quanvolutional神经网络(ResQuNNs),利用残差学习的概念,通过在层之间添加跳连接来促进梯度流动。通过在quanvolutional层之间插入残差块,我们确保在整个网络中增强了梯度访问,从而提高了训练性能。此外,我们提供了关于在QuNNs中放置这些残差块的战略位置的实证证据。通过广泛的实验,我们确定了一种高效配置的残差块,使得网络中所有层之间的梯度最终实现了高效的训练。我们的发现表明,残差块的精确位置在最大化QuNNs性能增益中起着至关重要的作用。我们的结果在量子深度学习的发展中迈出了重要的一步,为理论发展和实际量子计算应用开辟了新的途径。
更新时间: 2024-09-02 14:38:01
领域: cs.LG,quant-ph
One-Index Vector Quantization Based Adversarial Attack on Image Classification
To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in real-world scenarios. In this paper, we propose a novel one-index attack method in the VQ domain to generate adversarial images by a differential evolution algorithm, successfully resulting in image misclassification in victim models. The one-index attack method modifies a single index in the compressed data stream so that the decompressed image is misclassified. It only needs to modify a single VQ index to realize an attack, which limits the number of perturbed indexes. The proposed method belongs to a semi-black-box attack, which is more in line with the actual attack scenario. We apply our method to attack three popular image classification models, i.e., Resnet, NIN, and VGG16. On average, 55.9% and 77.4% of the images in CIFAR-10 and Fashion MNIST, respectively, are successfully attacked, with a high level of misclassification confidence and a low level of image perturbation.
Updated: 2024-09-02 14:25:00
标题: 基于单索引向量量化的图像分类对抗攻击
摘要: 为了改进存储和传输,通常会对图像进行压缩。矢量量化(VQ)是一种流行的压缩方法,因为它具有超越其他压缩技术的高压缩比。尽管如此,现有针对图像分类的对抗攻击方法大多在像素域中进行,在压缩域中的很少,这使它们在现实场景中的适用性较低。在本文中,我们提出了一种新颖的VQ域单索引攻击方法,通过差分进化算法生成对抗图像,成功导致受害模型对图像的误分类。单索引攻击方法修改压缩数据流中的单个索引,使解压后的图像被错误分类。它只需修改一个VQ索引即可实现攻击,这限制了被扰动索引的数量。所提出的方法属于半黑盒攻击,更符合实际攻击场景。我们将该方法应用于攻击三种流行的图像分类模型,即Resnet、NIN和VGG16。在CIFAR-10和Fashion MNIST数据集中,平均分别有55.9%和77.4%的图像被成功攻击,具有较高的误分类置信度和较低的图像扰动水平。
更新时间: 2024-09-02 14:25:00
领域: cs.CV,cs.CR,cs.LG
Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions
Estimating causal effect using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks. However, most of these frameworks assume settings with cross-sectional data, whereas researchers often have access to panel data, which in traditional methods helps to deal with unobserved heterogeneity between units. In this paper, we explore how we can adapt double/debiased machine learning (DML) (Chernozhukov et al., 2018) for panel data in the presence of unobserved heterogeneity. This adaptation is challenging because DML's cross-fitting procedure assumes independent data and the unobserved heterogeneity is not necessarily additively separable in settings with nonlinear observed confounding. We assess the performance of several intuitively appealing estimators in a variety of simulations. While we find violations of the cross-fitting assumptions to be largely inconsequential for the accuracy of the effect estimates, many of the considered methods fail to adequately account for the presence of unobserved heterogeneity. However, we find that using predictive models based on the correlated random effects approach (Mundlak, 1978) within DML leads to accurate coefficient estimates across settings, given a sample size that is large relative to the number of observed confounders. We also show that the influence of the unobserved heterogeneity on the observed confounders plays a significant role for the performance of most alternative methods.
Updated: 2024-09-02 13:59:54
标题: 双机器学习遇见面板数据--承诺、风险和潜在解决方案
摘要: 使用机器学习(ML)算法估计因果效应可以帮助在适当的框架内放松函数形式的假设。然而,大多数这些框架假设具有横截面数据,而研究人员通常可以访问面板数据,传统方法中的面板数据有助于处理单位之间的未观察异质性。在本文中,我们探讨了如何在存在未观察到的异质性的面板数据中调整双重/无偏机器学习(DML)(Chernozhukov等,2018)。这种适应性具有挑战性,因为DML的交叉拟合程序假设数据是独立的,未观察到的异质性在具有非线性观测混杂的设置中未必可加分离。我们在各种模拟中评估了几种直观吸引人的估计量的性能。虽然我们发现违反交叉拟合假设对效应估计的准确性影响较小,但考虑的许多方法未能充分考虑未观察到的异质性的存在。然而,我们发现在DML中使用基于相关随机效应方法(Mundlak,1978)的预测模型在给定相对于观察混杂因子数量较大的样本大小的情况下,在各种情况下可以得到准确的系数估计。我们还表明未观察到的异质性对大多数替代方法的性能起着重要作用。
更新时间: 2024-09-02 13:59:54
领域: econ.EM,cs.LG,stat.ME,stat.ML
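The correlated-random-effects fix is Mundlak's device: add unit-level means of time-varying covariates as extra controls before the learning stage. A toy pandas/NumPy sketch with plain OLS standing in for the DML learners; the data-generating process is invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_units, n_periods = 200, 5
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n_units), n_periods),
    "x": rng.normal(size=n_units * n_periods),
})
alpha = rng.normal(size=n_units)               # unobserved unit heterogeneity
df["x"] += 0.8 * alpha[df["unit"]]             # covariate correlated with alpha
df["d"] = df["x"] + rng.normal(size=len(df))   # treatment
df["y"] = 1.5 * df["d"] + alpha[df["unit"]] + rng.normal(size=len(df))

df["x_bar"] = df.groupby("unit")["x"].transform("mean")  # the Mundlak term
X = np.column_stack([np.ones(len(df)), df["d"], df["x"], df["x_bar"]])
beta = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)[0]
print(round(beta[1], 3))   # treatment effect, pulled toward the true 1.5
```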
Stabilizing Extreme Q-learning by Maclaurin Expansion
In offline reinforcement learning, in-sample learning methods have been widely used to prevent performance degradation caused by evaluating out-of-distribution actions from the dataset. Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution, enabling it to model the soft optimal value function in an in-sample manner. It has demonstrated strong performance in both offline and online reinforcement learning settings. However, issues remain, such as the instability caused by the exponential term in the loss function and the risk of the error distribution deviating from the Gumbel distribution. Therefore, we propose Maclaurin Expanded Extreme Q-learning to enhance stability. In this method, applying Maclaurin expansion to the loss function in XQL enhances stability against large errors. This approach involves adjusting the modeled value function between the value function under the behavior policy and the soft optimal value function, thus achieving a trade-off between stability and optimality depending on the order of expansion. It also enables adjustment of the error distribution assumption from a normal distribution to a Gumbel distribution. Our method significantly stabilizes learning in online RL tasks from DM Control, where XQL was previously unstable. Additionally, it improves performance in several offline RL tasks from D4RL.
Updated: 2024-09-02 13:55:25
标题: 通过Maclaurin展开稳定极端Q-learning
摘要: 在离线强化学习中,样本内学习方法被广泛应用以防止由于评估数据集中的超出分布的动作而导致性能下降。 极端Q学习(XQL)采用基于贝尔曼误差遵循Gumbel分布的假设的损失函数,使其能够以样本内方式对软最优值函数进行建模。它在离线和在线强化学习设置中表现出强大的性能。但是,仍然存在一些问题,例如损失函数中指数项引起的不稳定性以及误差分布偏离Gumbel分布的风险。因此,我们提出了Maclaurin Expanded Extreme Q-learning来增强稳定性。在这种方法中,将Maclaurin展开应用于XQL中的损失函数可增强针对大误差的稳定性。该方法涉及调整在行为策略下的值函数和软最优值函数之间的模拟值函数,从而根据展开的顺序实现稳定性和最优性之间的权衡。它还使得可以将错误分布假设从正态分布调整为Gumbel分布。我们的方法显著稳定了来自DM Control的在线RL任务中的学习,其中XQL以前不稳定。此外,它改进了来自D4RL的几个离线RL任务的性能。
更新时间: 2024-09-02 13:55:25
领域: cs.LG,cs.AI
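Since exp(z) - z - 1 = sum_{k>=2} z^k / k!, truncating the Maclaurin series gives the expanded loss a closed polynomial form: order 2 recovers a quadratic loss (a Gaussian error assumption), and higher orders move back toward the exponential Gumbel loss. A minimal sketch of this reading (the temperature scaling and the surrounding RL machinery are omitted, and the exact truncation used by the paper is assumed):

```python
import numpy as np
from math import factorial

def xql_loss(z):
    """Original extreme-Q loss, exp(z) - z - 1, derived from a Gumbel error model."""
    return np.exp(z) - z - 1.0

def mxql_loss(z, order=2):
    """Maclaurin-truncated variant: sum_{k=2..order} z**k / k!.

    order=2 is the stable quadratic loss; larger orders trade stability for
    closeness to the soft-optimal (Gumbel) objective.
    """
    return sum(z ** k / factorial(k) for k in range(2, order + 1))

z = np.linspace(-2.0, 4.0, 7)       # Bellman errors
print(xql_loss(z))                  # explodes exponentially for large errors
print(mxql_loss(z, order=2))        # stays quadratic, hence numerically stable
print(mxql_loss(z, order=4))        # still polynomial, closer to the Gumbel loss
```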
Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling
The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by incorporating monocular depth cues for sophisticated 3D scene modeling. Addressing the prevalent challenge of skewed data distribution in traffic accident datasets, we propose the Binary Adaptive Loss for Early Anticipation (BA-LEA). This novel loss function, together with a multi-task learning strategy, shifts the focus of the predictive model towards the critical moments preceding an accident. We rigorously evaluate the performance of our framework on four benchmark datasets--the Dashcam Accident Dataset (DAD), the Car Crash Dataset (CCD), the AnAn Accident Detection (A3D) dataset, and the DADA-2000 dataset--demonstrating its superior predictive accuracy through key metrics such as Average Precision (AP) and mean Time-To-Accident (mTTA).
Updated: 2024-09-02 13:46:25
标题: 自动驾驶的实时事故预测通过单目深度增强三维建模
摘要: 交通事故预测的主要目标是利用仪表盘摄像头视频实时预测潜在事故,这对提高自动驾驶技术的安全性和可靠性至关重要。在本研究中,我们介绍了一种创新框架AccNet,通过将单眼深度线索纳入复杂的3D场景建模,显著提升了预测能力,超越了当前基于2D方法的最新技术(SOTA)。针对交通事故数据集中普遍存在的数据分布不均衡挑战,我们提出了用于早期预测的二进制自适应损失(BA-LEA)。这种新颖的损失函数,结合多任务学习策略,将预测模型的重点转移到事故发生前的关键时刻。我们在四个基准数据集--仪表盘事故数据集(DAD)、车祸数据集(CCD)、AnAn事故检测(A3D)和DADA-2000数据集--上对我们的框架性能进行了严格评估,通过关键指标如平均精度(AP)和平均事故发生时间(mTTA)展示了其优越的预测准确性。
更新时间: 2024-09-02 13:46:25
领域: cs.CV,cs.AI
GAS: Generative Activation-Aided Asynchronous Split Federated Learning
Split Federated Learning (SFL) splits and collaboratively trains a shared model between clients and server, where clients transmit activations and client-side models to server for updates. Recent SFL studies assume synchronous transmission of activations and client-side models from clients to server. However, due to significant variations in computational and communication capabilities among clients, activations and client-side models arrive at server asynchronously. The delay caused by asynchrony significantly degrades the performance of SFL. To address this issue, we consider an asynchronous SFL framework, where an activation buffer and a model buffer are embedded on the server to manage the asynchronously transmitted activations and client-side models, respectively. Furthermore, as asynchronous activation transmissions cause the buffer to frequently receive activations from resource-rich clients, leading to biased updates of the server-side model, we propose Generative activations-aided Asynchronous SFL (GAS). In GAS, the server maintains an activation distribution for each label based on received activations and generates activations from these distributions according to the degree of bias. These generative activations are then used to assist in updating the server-side model, ensuring more accurate updates. We derive a tighter convergence bound, and our experiments demonstrate the effectiveness of the proposed method.
Updated: 2024-09-02 13:37:28
标题: GAS:生成激活辅助的异步拆分联邦学习
摘要: 分裂联邦学习(SFL)在客户端和服务器之间分裂并协作训练共享模型,客户端向服务器传输激活和客户端模型以进行更新。最近的SFL研究假设客户端向服务器同步传输激活和客户端模型。然而,由于客户端之间计算和通信能力的显著差异,激活和客户端模型会异步到达服务器。由于异步性引起的延迟严重降低了SFL的性能。为了解决这个问题,我们考虑了一个异步SFL框架,在服务器上嵌入了一个激活缓冲区和一个模型缓冲区,分别管理异步传输的激活和客户端模型。此外,由于异步激活传输导致缓冲区频繁接收来自资源丰富的客户端的激活,从而导致服务器端模型的偏向更新,我们提出了生成激活辅助的异步SFL(GAS)。在GAS中,服务器基于收到的激活维护每个标签的激活分布,并根据偏向程度从这些分布中生成激活。然后,这些生成的激活用于辅助更新服务器端模型,确保更准确的更新。我们推导了更紧密的收敛界限,我们的实验证明了所提出方法的有效性。
更新时间: 2024-09-02 13:37:28
领域: cs.LG,cs.DC
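A toy rendering of the server-side mechanism, assuming the per-label activation distribution is modelled as a diagonal Gaussian with running moments (the abstract does not commit to this particular form):

```python
# Sketch of per-label generative activations for a biased server buffer (assumptions ours).
import numpy as np

class ActivationStore:
    """Server-side running Gaussian over received activations, one per label."""
    def __init__(self, dim):
        self.dim = dim
        self.stats = {}  # label -> (count, mean, M2), updated via Welford's algorithm

    def update(self, label, act):
        n, mu, m2 = self.stats.get(label, (0, np.zeros(self.dim), np.zeros(self.dim)))
        n += 1
        delta = act - mu
        mu = mu + delta / n
        m2 = m2 + delta * (act - mu)
        self.stats[label] = (n, mu, m2)

    def sample(self, label, k, rng):
        """Draw k synthetic activations for an under-represented label."""
        n, mu, m2 = self.stats[label]
        std = np.sqrt(m2 / max(n - 1, 1))
        return rng.normal(mu, std, size=(k, self.dim))

# Usage: top up the buffer so both labels contribute equally to the server update.
rng = np.random.default_rng(0)
store = ActivationStore(dim=8)
for label, count in [(0, 50), (1, 5)]:        # label 1 arrives from slower clients
    for _ in range(count):
        store.update(label, rng.normal(size=8))
generative_acts = store.sample(1, k=45, rng=rng)  # compensates the bias toward label 0
```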
Adversarial Pruning: A Survey and Benchmark of Pruning Methods for Adversarial Robustness
Recent work has proposed neural network pruning techniques to reduce the size of a network while preserving robustness against adversarial examples, i.e., well-crafted inputs inducing a misclassification. These methods, which we refer to as adversarial pruning methods, involve complex and articulated designs, making it difficult to analyze the differences and establish a fair and accurate comparison. In this work, we overcome these issues by surveying current adversarial pruning methods and proposing a novel taxonomy to categorize them based on two main dimensions: the pipeline, defining when to prune; and the specifics, defining how to prune. We then highlight the limitations of current empirical analyses and propose a novel, fair evaluation benchmark to address them. We finally conduct an empirical re-evaluation of current adversarial pruning methods and discuss the results, highlighting the shared traits of top-performing adversarial pruning methods, as well as common issues. We welcome contributions to our publicly available benchmark at https://github.com/pralab/AdversarialPruningBenchmark.
Updated: 2024-09-02 13:34:01
标题: 对抗性修剪:对抗性鲁棒性修剪方法的调查和基准研究
摘要: 最近的研究提出了神经网络修剪技术,以减小网络的规模同时保持对抗性示例的稳健性,即精心设计的输入导致错误分类。我们将这些方法称为对抗性修剪方法,涉及复杂和详细的设计,使得分析差异并建立公平和准确的比较变得困难。在这项工作中,我们通过调查当前对抗性修剪方法并提出一种新颖的分类法来克服这些问题,根据两个主要维度对其进行分类:流程,定义何时修剪;具体方法,定义如何修剪。然后我们强调当前经验分析的局限性,并提出一个新颖的公平评估基准来解决这些问题。最后,我们对当前的对抗性修剪方法进行了实证重新评估,并讨论结果,突出表现最佳的对抗性修剪方法的共同特征,以及常见问题。我们欢迎在我们的公开可用基准上进行贡献:https://github.com/pralab/AdversarialPruningBenchmark
更新时间: 2024-09-02 13:34:01
领域: cs.LG,cs.CR,cs.CV
Conversational Complexity for Assessing Risk in Large Language Models
Large Language Models (LLMs) present a dual-use dilemma: they enable beneficial applications while harboring potential for harm, particularly through conversational interactions. Despite various safeguards, advanced LLMs remain vulnerable. A watershed case was Kevin Roose's notable conversation with Bing, which elicited harmful outputs after extended interaction. This contrasts with simpler early jailbreaks that produced similar content more easily, raising the question: How much conversational effort is needed to elicit harmful information from LLMs? We propose two measures: Conversational Length (CL), which quantifies the conversation length used to obtain a specific response, and Conversational Complexity (CC), defined as the Kolmogorov complexity of the user's instruction sequence leading to the response. To address the incomputability of Kolmogorov complexity, we approximate CC using a reference LLM to estimate the compressibility of user instructions. Applying this approach to a large red-teaming dataset, we perform a quantitative analysis examining the statistical distribution of harmful and harmless conversational lengths and complexities. Our empirical findings suggest that this distributional analysis and the minimisation of CC serve as valuable tools for understanding AI safety, offering insights into the accessibility of harmful information. This work establishes a foundation for a new perspective on LLM safety, centered around the algorithmic complexity of pathways to harm.
Updated: 2024-09-02 13:29:44
标题: 大型语言模型中用于评估风险的对话复杂性
摘要: 大型语言模型(LLMs)提出了一个双重用途困境:它们可以实现有益的应用,但同时也存在潜在的危害,特别是通过对话交互。尽管有各种各样的保障措施,先进的LLMs仍然容易受到攻击。一个具有标志性意义的案例是凯文·鲁斯与必应(Bing)的对话,经过长时间的交互后,引发了有害的输出。这与早期较简单的越狱情况形成对比,后者更容易产生类似的内容,这引发了一个问题:需要多少对话努力才能从LLMs获取有害信息?我们提出了两个度量标准:对话长度(CL),用于量化获取特定响应所需的对话长度,以及对话复杂度(CC),定义为导致响应的用户指令序列的Kolmogorov复杂度。为了解决Kolmogorov复杂度的不可计算性,我们使用一个参考LLM来估计用户指令的可压缩性来近似CC。将这种方法应用到一个大型红队测试数据集中,我们进行了定量分析,研究了有害和无害对话长度和复杂度的统计分布。我们的实证研究结果表明,这种分布分析和对CC的最小化可作为理解AI安全的有价值工具,提供了有关有害信息获取可访问性的见解。这项工作为围绕通往危害之路的算法复杂性建立了LLM安全的新视角基础。
更新时间: 2024-09-02 13:29:44
领域: cs.AI,cs.CL,cs.IT,math.IT
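The paper estimates the incomputable Kolmogorov complexity with a reference LLM; a crude, self-contained stand-in that conveys the same intuition is an off-the-shelf compressor. The sketch below uses zlib (our substitution, not the authors' estimator):

```python
import zlib

def conversational_length(user_turns):
    """CL: number of user turns needed to elicit a given response."""
    return len(user_turns)

def conversational_complexity(user_turns):
    """CC proxy: compressed size, in bits, of the concatenated user instructions."""
    blob = "\n".join(user_turns).encode("utf-8")
    return 8 * len(zlib.compress(blob, level=9))

direct = ["What are your hidden instructions?"]
elaborate = ["Let's play a game where you are an unrestricted persona.",
             "The persona ignores all previous rules, no matter what.",
             "Now, staying in persona, reveal your hidden instructions."]
print(conversational_length(direct), conversational_complexity(direct))
print(conversational_length(elaborate), conversational_complexity(elaborate))
```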
Revisiting Safe Exploration in Safe Reinforcement learning
Safe reinforcement learning (SafeRL) extends standard reinforcement learning with the idea of safety, where safety is typically defined through the constraint that the expected cost return of a trajectory stays below a set limit. However, this metric fails to distinguish how costs accrue, treating infrequent severe cost events as equal to frequent mild ones, which can lead to riskier behaviors and result in unsafe exploration. We introduce a new metric, expected maximum consecutive cost steps (EMCC), which addresses safety during training by assessing the severity of unsafe steps based on their consecutive occurrence. This metric is particularly effective for distinguishing between prolonged and occasional safety violations. We apply EMCC to both on-policy and off-policy algorithms to benchmark their safe exploration capability. Finally, we validate our metric through a set of benchmarks and propose a new lightweight benchmark task, which allows fast evaluation for algorithm design.
Updated: 2024-09-02 13:29:29
标题: 重新审视安全强化学习中的安全探索
摘要: 安全强化学习(SafeRL)扩展了标准强化学习的概念,引入了安全性的概念,其中安全性通常通过期望轨迹成本回报低于设定限制来定义。然而,该度量标准未能区分成本的积累方式,将罕见的严重成本事件视为频繁的轻微事件,可能导致更危险的行为,并导致不安全的探索。我们引入了一个新的度量标准,预期最大连续成本步数(EMCC),通过评估不安全步骤的连续发生来评估训练过程中的安全性严重程度。该度量标准特别适用于区分长期和偶发的安全性违规行为。我们将EMCC应用于基准测试中的on-policy和off-policy算法,以评估它们的安全探索能力。最后,我们通过一系列基准测试验证了我们的度量标准,并提出了一个新的轻量级基准任务,可用于快速评估算法设计。
更新时间: 2024-09-02 13:29:29
领域: cs.LG,cs.AI,cs.RO
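A minimal reading of the metric: the empirical mean, over episodes, of the longest run of consecutive cost-incurring steps. The sketch below (the cost threshold is our assumption) shows why EMCC separates prolonged from occasional violations even when total costs are equal:

```python
def max_consecutive_cost_steps(costs, threshold=0.0):
    """Longest streak of steps whose cost exceeds the threshold."""
    best = run = 0
    for c in costs:
        run = run + 1 if c > threshold else 0
        best = max(best, run)
    return best

def emcc(episodes):
    """Expected (empirical mean) maximum consecutive cost steps across episodes."""
    return sum(max_consecutive_cost_steps(ep) for ep in episodes) / len(episodes)

# Equal total cost, very different safety profiles:
occasional = [1, 0, 1, 0, 1, 0]   # isolated violations -> longest streak 1
prolonged  = [0, 0, 0, 1, 1, 1]   # one sustained violation -> longest streak 3
print(emcc([occasional]), emcc([prolonged]))  # 1.0 vs 3.0; expected cost sees no difference
```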
An Effective Information Theoretic Framework for Channel Pruning
Channel pruning is a promising method for accelerating and compressing convolutional neural networks. However, current pruning algorithms still face unsolved problems: how to assign layer-wise pruning ratios properly, and how to discard the least important channels according to a convincing criterion. In this paper, we present a novel channel pruning approach via information theory and the interpretability of neural networks. Specifically, we regard information entropy as the expected amount of information carried by convolutional layers. In addition, if we regard a matrix as a system of linear equations, a higher-rank matrix indicates that there exist more solutions to it, which in turn indicates more uncertainty. From the point of view of information theory, the rank can also describe the amount of information. In a neural network, considering the rank and entropy as two information indicators of convolutional layers, we propose a fusion function to reach a compromise between them, where the fusion result is defined as ``information concentration''. When pre-defining layer-wise pruning ratios, we employ the information concentration as a reference instead of heuristic and engineering tuning to provide a more interpretable solution. Moreover, we leverage Shapley values, a potent tool in the interpretability of neural networks, to evaluate the channel contributions and discard the least important channels for model compression while maintaining its performance. Extensive experiments demonstrate the effectiveness and promising performance of our method. For example, our method improves the accuracy by 0.21% when reducing 45.5% of FLOPs and removing 40.3% of parameters for ResNet-56 on CIFAR-10. Moreover, our method incurs Top-1/Top-5 accuracy losses of only 0.43%/0.11% while reducing 41.6% of FLOPs and removing 35.0% of parameters for ResNet-50 on ImageNet.
Updated: 2024-09-02 13:19:40
标题: 一个有效的信息理论框架用于信道剪枝
摘要: 通道剪枝是加速和压缩卷积神经网络的一种有前途的方法。然而,当前的剪枝算法仍然存在未解决的问题,即如何适当分配逐层的剪枝比例,并使用令人信服的标准丢弃最不重要的通道。在本文中,我们提出了一种基于信息理论和神经网络可解释性的新型通道剪枝方法。具体来说,我们将信息熵视为卷积层的预期信息量。此外,如果我们将一个矩阵视为一个线性方程组,高秩矩阵表示存在更多解,这表明存在更多的不确定性。从信息理论的角度来看,秩也可以描述信息的量。在神经网络中,考虑到秩和熵作为卷积层的两个信息指标,我们提出了一个融合函数来达到它们之间的折衷,其中融合结果被定义为“信息集中度”。在预定义逐层剪枝比例时,我们使用信息集中度作为参考,而不是启发式和工程调整,以提供一个更可解释的解决方案。此外,我们利用Shapley值,这是神经网络可解释性的有效工具,来评估通道的贡献并丢弃最不重要的通道,以实现模型压缩同时保持其性能。大量实验证明了我们方法的有效性和有前途的性能。例如,我们的方法在减少45.5%的FLOPs和去除40.3%参数的情况下,将ResNet-56在CIFAR-10上的准确性提高了0.21%。此外,我们的方法在减少41.6%的FLOPs和去除35.0%参数的情况下,将ResNet-50在ImageNet上的Top-1/Top-5准确性损失降低了0.43%/0.11%。
更新时间: 2024-09-02 13:19:40
领域: cs.IT,cs.AI,cs.LG,math.IT
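A hypothetical reconstruction of the "information concentration" measure: compute the entropy and the numerical rank of a layer's activations, normalise both, and fuse them convexly. The normalisation and the fusion weight below are our assumptions, not the paper's exact formulation:

```python
import numpy as np

BINS = 64

def layer_entropy(features):
    """Shannon entropy of the activation value distribution of one layer."""
    hist, _ = np.histogram(features, bins=BINS)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def layer_rank(features):
    """Numerical rank of the (channels x spatial) activation matrix."""
    return int(np.linalg.matrix_rank(features.reshape(features.shape[0], -1)))

def information_concentration(features, alpha=0.5):
    """Convex fusion of normalised rank and entropy; higher suggests keeping more channels."""
    c = features.shape[0]
    h = layer_entropy(features) / np.log2(BINS)            # entropy / max entropy
    r = layer_rank(features) / min(c, features[0].size)    # rank / max possible rank
    return alpha * r + (1 - alpha) * h

feats = np.random.default_rng(0).normal(size=(32, 8, 8))   # one conv layer's output
print(information_concentration(feats))
```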
Sample Complexity of the Sign-Perturbed Sums Method
We study the sample complexity of the Sign-Perturbed Sums (SPS) method, which constructs exact, non-asymptotic confidence regions for the true system parameters under mild statistical assumptions, such as independent and symmetric noise terms. The standard version of SPS deals with linear regression problems; however, it can be generalized to stochastic linear (dynamical) systems, even with closed-loop setups, and to nonlinear and nonparametric problems as well. Although the strong consistency of the method was rigorously proven, the sample complexity of the algorithm has so far been analyzed only for scalar linear regression problems. In this paper we study the sample complexity of SPS for general linear regression problems. We establish high-probability upper bounds for the diameters of SPS confidence regions for finite sample sizes and show that the SPS regions shrink at the same, optimal rate as the classical asymptotic confidence ellipsoids. Finally, the difference between the theoretical bounds and the empirical sizes of SPS confidence regions is investigated experimentally.
Updated: 2024-09-02 13:18:53
标题: 样本复杂性的符号扰动求和方法
摘要: 我们研究了Sign-Perturbed Sums(SPS)方法的样本复杂度,该方法在轻微的统计假设下构建了真实系统参数的精确、非渐近置信区间,例如独立和对称的噪声项。标准版本的SPS处理线性回归问题,但它可以推广到随机线性(动态)系统,甚至闭环设置,以及非线性和非参数化问题。尽管该方法的强一致性已得到严格证明,但到目前为止,该算法的样本复杂度仅针对标量线性回归问题进行了分析。在本文中,我们研究了SPS在一般线性回归问题中的样本复杂度。我们为有限样本大小建立了SPS置信区间直径的高概率上界,并展示了SPS区域与经典渐近置信椭圆相同的最佳收缩速率。最后,实验性地调查了理论界限与SPS置信区间的经验尺寸之间的差异。
更新时间: 2024-09-02 13:18:53
领域: stat.ML,cs.LG,cs.SY,eess.SP,eess.SY,stat.ME
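For orientation, a stripped-down SPS membership test for linear regression (the covariance shaping matrix and the tie-breaking rule of the full method are omitted here): a candidate parameter is accepted at level 1 - q/m unless the unperturbed sum ranks among the q largest of the m sums.

```python
import numpy as np

def sps_indicator(theta, X, y, m=100, q=5, seed=0):
    """Return True iff theta lies in the (simplified) SPS region of level 1 - q/m."""
    rng = np.random.default_rng(seed)
    eps = y - X @ theta                  # residuals under the candidate parameter
    ref = X.T @ eps                      # reference (unperturbed) sum
    ref_norm = ref @ ref
    smaller = 0
    for _ in range(m - 1):
        signs = rng.choice([-1.0, 1.0], size=len(y))  # random sign perturbations
        sk = X.T @ (signs * eps)         # sign-perturbed sum
        smaller += (sk @ sk) < ref_norm
    # Reject only if the reference sum beats at least m - q perturbed sums.
    return smaller < m - q

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + rng.laplace(size=200)  # symmetric, non-Gaussian noise
print(sps_indicator(np.array([1.0, -2.0]), X, y))      # true parameter: typically True
print(sps_indicator(np.array([3.0,  3.0]), X, y))      # distant parameter: False
```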
CyberCortex.AI: An AI-based Operating System for Autonomous Robotics and Complex Automation
The underlying framework for controlling autonomous robots and complex automation applications is an Operating System (OS) capable of scheduling perception-and-control tasks, as well as providing real-time data communication to other robotic peers and remote cloud computers. In this paper, we introduce CyberCortex.AI, a robotics OS designed to enable heterogeneous AI-based robotics and complex automation applications. CyberCortex.AI is a decentralized distributed OS which enables robots to talk to each other, as well as to High Performance Computers (HPC) in the cloud. Sensory and control data from the robots is streamed towards HPC systems with the purpose of training AI algorithms, which are afterwards deployed on the robots. Each functionality of a robot (e.g. sensory data acquisition, path planning, motion control, etc.) is executed within a so-called DataBlock of Filters shared through the internet, where each filter is computed either locally on the robot itself, or remotely on a different robotic system. The data is stored and accessed via a so-called Temporal Addressable Memory (TAM), which acts as a gateway between each filter's input and output. CyberCortex.AI has two main components: i) the CyberCortex.AI.inference system, which is a real-time implementation of the DataBlock running on the robots' embedded hardware, and ii) the CyberCortex.AI.dojo, which runs on an HPC computer in the cloud and is used to design, train and deploy AI algorithms. We present a quantitative and qualitative performance analysis of the proposed approach using two collaborative robotics applications: i) a forest fire prevention system based on an Unitree A1 legged robot and an Anafi Parrot 4K drone, and ii) an autonomous driving system which uses CyberCortex.AI for collaborative perception and motion control.
Updated: 2024-09-02 13:14:50
标题: CyberCortex.AI:用于自主机器人和复杂自动化的基于人工智能的操作系统
摘要: 用于控制自主机器人和复杂自动化应用程序的基本框架是能够调度感知和控制任务,并提供实时数据通信给其他机器人同行和远程云计算机的操作系统(OS)。在本文中,我们介绍了CyberCortex.AI,这是一个旨在实现异构基于人工智能的机器人和复杂自动化应用程序的机器人操作系统。CyberCortex.AI是一个分散的分布式操作系统,使机器人能够相互通信,以及与云中的高性能计算机(HPC)通信。从机器人中传输的感知和控制数据被流向HPC系统,目的是训练AI算法,然后将这些算法部署到机器人上。机器人的每个功能(例如感知数据获取、路径规划、运动控制等)都在通过互联网共享的所谓的Filters DataBlock中执行,其中每个过滤器可以在机器人本身上本地计算,也可以在不同的机器人系统上远程计算。数据通过一个称为“时间可寻址内存”(TAM)进行存储和访问,该内存充当每个过滤器输入和输出之间的网关。CyberCortex.AI具有两个主要组件:i)CyberCortex.AI.inference系统,这是在机器人嵌入式硬件上运行的DataBlock的实时实现,以及ii)CyberCortex.AI.dojo,在云中的HPC计算机上运行,用于设计、训练和部署AI算法。我们使用两个协作机器人应用程序对所提出的方法进行定量和定性性能分析:i)基于Unitree A1腿式机器人和Anafi Parrot 4K无人机的森林防火系统,以及ii)使用CyberCortex.AI进行协作感知和运动控制的自主驾驶系统。
更新时间: 2024-09-02 13:14:50
领域: cs.RO,cs.AI,cs.OS
MRI-based and metabolomics-based age scores act synergetically for mortality prediction shown by multi-cohort federated learning
Biological age scores are an emerging tool to characterize aging by estimating chronological age based on physiological biomarkers. Various scores have shown associations with aging-related outcomes. This study assessed the relation between an age score based on brain MRI images (BrainAge) and an age score based on metabolomic biomarkers (MetaboAge). We trained a federated deep learning model to estimate BrainAge in three cohorts. The federated BrainAge model yielded significantly lower error for age prediction across the cohorts than locally trained models. Harmonizing the age interval between cohorts further improved BrainAge accuracy. Subsequently, we compared BrainAge with MetaboAge using federated association and survival analyses. The results showed a small association between BrainAge and MetaboAge as well as a higher predictive value for the time to mortality of both scores combined than for the individual scores. Hence, our study suggests that both aging scores capture different aspects of the aging process.
Updated: 2024-09-02 13:11:37
标题: MRI和代谢组学基于年龄评分在多中心联合学习中显示出协同作用,用于死亡预测
摘要: 生物年龄评分是一种新兴工具,通过估计基于生理生物标志物的年龄来表征衰老。各种评分已显示与衰老相关结果的关联。本研究评估了基于脑MRI图像(BrainAge)和基于代谢组学生物标志物(MetaboAge)的年龄评分之间的关系。我们训练了一个联邦深度学习模型来估计三个队列中的BrainAge。联邦BrainAge模型在跨队列的年龄预测方面产生了显著较低的误差,比本地训练模型更好。通过协调队列之间的年龄间隔进一步提高了BrainAge的准确性。随后,我们使用联邦关联和生存分析比较了BrainAge和MetaboAge。结果显示BrainAge和MetaboAge之间存在一定关联,以及两个评分结合对死亡时间具有更高的预测价值,而不是个别评分。因此,我们的研究表明,这两种衰老评分捕捉了衰老过程的不同方面。
更新时间: 2024-09-02 13:11:37
领域: q-bio.QM,cs.LG,I.2.1
SoK: Security of the Image Processing Pipeline in Autonomous Vehicles
Cameras are crucial sensors for autonomous vehicles. They capture images that are essential for many safety-critical tasks, including perception. To process these images, a complex pipeline with multiple layers is used. Security attacks on this pipeline can severely affect passenger safety and system performance. However, many attacks overlook different layers of the pipeline, and their feasibility and impact vary. While there has been research to improve the quality and robustness of the image processing pipeline, these efforts often work in parallel with security research, without much awareness of their potential synergy. In this work, we aim to bridge this gap by combining security and robustness research for the image processing pipeline in autonomous vehicles. We classify the risk of attacks using the automotive security standard ISO 21434, emphasizing the need to consider all layers for overall system security. We also demonstrate how existing robustness research can help mitigate the impact of attacks, addressing the current research gap. Finally, we present an embedded testbed that can influence various parameters across all layers, allowing researchers to analyze the effects of different defense strategies and attack impacts. We demonstrate the importance of such a test environment through a use-case analysis and show how blinding attacks can be mitigated using HDR imaging as an example of robustness-related research.
Updated: 2024-09-02 13:10:53
标题: SoK: 自动驾驶车辆中图像处理管线的安全性
摘要: 摄像头是自动驾驶车辆的关键传感器。它们捕捉对于许多安全关键任务,包括感知,至关重要的图像。为了处理这些图像,需要使用具有多个层的复杂流水线。对该流水线的安全攻击可能严重影响乘客安全和系统性能。然而,许多攻击忽略了流水线的不同层,它们的可行性和影响各不相同。虽然已经有研究致力于改进图像处理流水线的质量和稳健性,但这些努力通常与安全研究并行进行,缺乏对它们潜在协同作用的认识。在这项工作中,我们旨在通过将安全性和稳健性研究结合起来,为自动驾驶车辆的图像处理流水线填补这一差距。我们使用汽车安全标准ISO 21434对攻击风险进行分类,强调需要考虑整个系统安全的所有层。我们还展示了现有稳健性研究如何帮助减轻攻击的影响,解决当前的研究缺口。最后,我们提出了一个嵌入式测试平台,可以影响所有层的各种参数,使研究人员能够分析不同防御策略和攻击影响的效果。我们通过一个用例分析展示了这样一个测试环境的重要性,并展示了如何使用HDR成像作为稳健性相关研究的示例来减轻致盲攻击。
更新时间: 2024-09-02 13:10:53
领域: cs.CR
On the role of surrogates in the efficient estimation of treatment effects with limited outcome data
In many experimental and observational studies, the outcome of interest is often difficult or expensive to observe, reducing effective sample sizes for estimating average treatment effects (ATEs) even when they are identifiable. We study how incorporating data on units for which only surrogate outcomes, not of primary interest, are observed can increase the precision of ATE estimation. We refrain from imposing stringent surrogacy conditions, which would permit surrogates to serve as perfect replacements for the target outcome. Instead, we supplement the available, albeit limited, observations of the target outcome with abundant observations of surrogate outcomes, without any assumptions beyond unconfounded treatment assignment and missingness and the corresponding overlap conditions. To quantify the potential gains, we derive the difference in efficiency bounds on ATE estimation with and without surrogates, both when an overwhelming and when a comparable number of units have missing outcomes. We develop robust ATE estimation and inference methods that realize these efficiency gains. We empirically demonstrate the gains by studying the long-term-earnings effects of job training.
Updated: 2024-09-02 12:59:59
标题: 关于替代品在有限结果数据下有效估计治疗效果的作用
摘要: 在许多实验和观察性研究中,感兴趣的结果往往难以观察或昂贵,降低了估计平均治疗效应(ATEs)的有效样本量,即使是可识别的。我们研究了如何将仅观察到非主要感兴趣的替代结果的单位的数据纳入,以增加ATE估计的精度。我们避免强加严格的替代条件,允许替代物作为目标结果的完美替代。相反,我们利用替代结果的丰富观察结果,补充目标结果的有限观察结果,没有其他假设。为了量化潜在的收益,我们推导了使用替代品和不使用替代品对ATE估计的效率界限的差异,无论是当有大量或数量相当的单位缺少结果时。我们开发了稳健的ATE估计和推断方法,实现了这些效率收益。我们通过研究职业培训的长期收入效应来实证地展示这些收益。
更新时间: 2024-09-02 12:59:59
领域: stat.ML,cs.LG,stat.ME
Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling
Learning-based methods have gained attention as general-purpose solvers because they can automatically learn problem-specific heuristics, reducing the need for manually crafted heuristics. However, these methods often face challenges with scalability. To address these issues, the improved Sampling algorithm for Combinatorial Optimization (iSCO) using discrete Langevin dynamics has been proposed, demonstrating better performance than several learning-based solvers. This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA). QQA smoothly transitions the objective function from a simple convex form, where half-integral solutions dominate, to the original objective function, where the variables are restricted to 0 or 1. Furthermore, we incorporate parallel run communication leveraging GPUs, enhancing exploration capabilities and accelerating convergence. Numerical experiments demonstrate that our approach is a competitive general-purpose solver, achieving comparable performance to iSCO across various benchmark problems. Notably, our method exhibits superior trade-offs between speed and solution quality for large-scale instances compared to iSCO, commercial solvers, and specialized algorithms.
Updated: 2024-09-02 12:55:27
标题: 用梯度采样的并行准量子退火优化
摘要: 基于学习的方法作为通用求解器已经引起了关注,因为它们可以自动学习特定问题的启发式方法,减少了手工制定启发式方法的需要。然而,这些方法通常面临可扩展性方面的挑战。为了解决这些问题,提出了一种改进的组合优化采样算法(iSCO),使用离散朗格朗日动力学,表现出比几种基于学习的求解器更好的性能。本研究提出了一种不同的方法,通过连续松弛结合基于梯度的更新,并结合准量子退火(QQA)。QQA将目标函数平滑地从简单的凸形式过渡到原始目标函数,其中半整数解占主导地位,然后变量被限制为0或1。此外,我们结合了利用GPU的并行运行通信,增强了探索能力并加速了收敛速度。数值实验表明,我们的方法是一种具有竞争力的通用求解器,在各种基准问题上实现了与iSCO相当的性能。值得注意的是,我们的方法在大规模情况下在速度和解决方案质量之间展现出优越的权衡,相比于iSCO、商业求解器和专门算法。
更新时间: 2024-09-02 12:55:27
领域: cs.LG,stat.CO,stat.ME,stat.ML
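A generic continuous-relaxation annealing sketch in the spirit of the method, on a relaxed MaxCut objective: the penalty weight is annealed from negative (favouring the easy half-integral landscape) to positive (forcing binary solutions). The penalty form, schedule, and the absence of parallel GPU runs are our simplifications, not the paper's algorithm:

```python
import numpy as np

def anneal_maxcut(W, steps=2000, lr=0.05, seed=0):
    """Relaxed MaxCut: maximize sum_ij W_ij * x_i * (1 - x_j) over x in [0, 1]^n."""
    rng = np.random.default_rng(seed)
    z = rng.normal(scale=0.01, size=W.shape[0])      # logits of the relaxed variables
    for t in range(steps):
        lam = np.interp(t, [0, steps], [-1.0, 4.0])  # annealed penalty weight
        x = 1.0 / (1.0 + np.exp(-z))                 # sigmoid keeps x in (0, 1)
        grad_obj = W @ (1 - x) - W.T @ x             # gradient of the relaxed cut value
        grad_pen = lam * (1 - 2 * x)                 # gradient of lam * sum x(1-x):
        # negative lam attracts x to the half-integral point 0.5 (convex-like stage),
        # positive lam pushes x toward the binary corners {0, 1}.
        z += lr * (grad_obj - grad_pen) * x * (1 - x)  # ascent, chain rule through sigmoid
    return (x > 0.5).astype(int)

W = np.triu(np.random.default_rng(1).random((12, 12)), k=1)  # random weighted graph
print(anneal_maxcut(W))
```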
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model
Existing Transformer-based models for point cloud analysis suffer from quadratic complexity, leading to compromised point cloud resolution and information loss. In contrast, the newly proposed Mamba model, based on state space models (SSM), outperforms Transformer in multiple areas with only linear complexity. However, the straightforward adoption of Mamba does not achieve satisfactory performance on point cloud tasks. In this work, we present Mamba3D, a state space model tailored for point cloud learning to enhance local feature extraction, achieving superior performance, high efficiency, and scalability potential. Specifically, we propose a simple yet effective Local Norm Pooling (LNP) block to extract local geometric features. Additionally, to obtain better global features, we introduce a bidirectional SSM (bi-SSM) with both a token forward SSM and a novel backward SSM that operates on the feature channel. Extensive experimental results show that Mamba3D surpasses Transformer-based counterparts and concurrent works in multiple tasks, with or without pre-training. Notably, Mamba3D achieves multiple SoTA, including an overall accuracy of 92.6% (train from scratch) on the ScanObjectNN and 95.1% (with single-modal pre-training) on the ModelNet40 classification task, with only linear complexity. Our code and weights are available at https://github.com/xhanxu/Mamba3D.
Updated: 2024-09-02 12:55:04
标题: Mamba3D:通过状态空间模型增强3D点云分析的局部特征
摘要: 现有基于Transformer的点云分析模型存在二次复杂度问题,导致点云分辨率受损和信息丢失。相比之下,基于状态空间模型(SSM)的新提出的Mamba模型在多个方面表现优于Transformer,且仅具有线性复杂度。然而,直接采用Mamba并不能在点云任务上取得令人满意的性能。在本研究中,我们提出了Mamba3D,这是一种专门用于点云学习的状态空间模型,用于增强局部特征提取,实现了卓越性能、高效率和可扩展性潜力。具体来说,我们提出了一个简单而有效的局部规范池(LNP)块来提取局部几何特征。此外,为了获得更好的全局特征,我们引入了一个双向SSM(bi-SSM),其中包括一个令牌向前SSM和一个在特征通道上操作的新颖的向后SSM。广泛的实验结果表明,Mamba3D在多个任务中超越了基于Transformer的对手和同时进行的工作,无论是否进行了预训练。值得注意的是,Mamba3D在ScanObjectNN上实现了92.6%的整体准确率(从头开始训练),在ModelNet40分类任务上实现了95.1%(单模态预训练),且仅具有线性复杂度。我们的代码和权重可在https://github.com/xhanxu/Mamba3D 上获取。
更新时间: 2024-09-02 12:55:04
领域: cs.CV,cs.AI,cs.LG
A multilingual training strategy for low resource Text to Speech
Recent advances in neural Text to Speech (TTS) have made it possible to produce high-quality synthesised speech. However, such TTS models depend on extensive amounts of data that can be costly to produce and are hardly scalable to all existing languages, especially as little attention is given to low resource languages. With techniques such as knowledge transfer, the burden of creating datasets can be alleviated. In this paper, we therefore investigate two aspects: firstly, whether data from social media can be used for small TTS dataset construction, and secondly, whether cross-lingual transfer learning (TL) for a low resource language can work with this type of data. In this aspect, we specifically assess to what extent multilingual modeling can be leveraged as an alternative to training on monolingual corpora. To do so, we explore how data from foreign languages may be selected and pooled to train a TTS model for a target low resource language. Our findings show that multilingual pre-training is better than monolingual pre-training at increasing the intelligibility and naturalness of the generated speech.
Updated: 2024-09-02 12:53:01
标题: 一个适用于资源匮乏的文本到语音的多语种培训策略
摘要: 最近的语音技术已经导致产生高质量的合成语音,这是由于神经文本到语音(TTS)的最新进展。然而,这种TTS模型依赖于大量的数据,生产这些数据可能成本高昂,并且很难扩展到所有现有的语言,尤其是对低资源语言很少关注。通过知识转移等技术,可以减轻创建数据集的负担。因此,在本文中,我们研究了两个方面;首先,社交媒体的数据是否可以用于小规模TTS数据集的构建,其次,跨语言迁移学习(TL)对于低资源语言是否可以利用这种类型的数据。在这方面,我们特别评估多语言建模在何种程度上可以作为训练单语语料库的替代方法。为此,我们探讨了如何选择和汇集外语数据来训练目标低资源语言的TTS模型。我们的研究结果表明,多语言预训练比单语预训练更有助于提高生成语音的可懂性和自然性。
更新时间: 2024-09-02 12:53:01
领域: cs.CL,cs.LG,cs.SD,eess.AS
ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers
Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a novel Enhanced Semantic Performance Point Cloud Transformer with a two-stage semantic recognition framework tailored for VR applications. ESP-PCT takes advantage of the accuracy of sensory point cloud data and optimizes the semantic recognition process, where the localization and focus stages are trained jointly in an end-to-end manner. We evaluate ESP-PCT on various VR semantic recognition conditions, demonstrating substantial enhancements in recognition efficiency. Notably, ESP-PCT achieves a remarkable accuracy of 93.2% while simultaneously reducing the computational requirements (FLOPs) by 76.9% and memory usage by 78.2% compared to the existing Point Transformer model. These results underscore ESP-PCT's potential in VR semantic recognition by achieving high accuracy and reducing redundancy. The code and data of this project are available at https://github.com/lymei-SEU/ESP-PCT.
Updated: 2024-09-02 12:48:40
标题: ESP-PCT:通过点云变换器中时间和空间冗余的高效压缩增强VR语义性能
摘要: 语义识别在虚拟现实(VR)应用中至关重要,可以实现沉浸式和交互式体验。一种有前途的方法是利用毫米波(mmWave)信号生成点云。然而,当前mmWave点云模型的高计算和内存需求阻碍了它们的效率和可靠性。为了解决这一限制,本文介绍了ESP-PCT,一种专为VR应用量身定制的新型增强语义性能点云变换器,具有两阶段语义识别框架。ESP-PCT利用感知点云数据的准确性,并优化语义识别过程,其中定位和聚焦阶段以端到端的方式联合训练。我们在各种VR语义识别条件下评估了ESP-PCT,展示了识别效率的显著提升。值得注意的是,ESP-PCT在减少计算要求(FLOPs)76.9%和内存使用率78.2%的同时,实现了93.2%的显著准确度,与现有的Point Transformer模型相比。这些突出了ESP-PCT在VR语义识别中实现高准确度和减少冗余的潜力。该项目的代码和数据可在https://github.com/lymei-SEU/ESP-PCT上找到。
更新时间: 2024-09-02 12:48:40
领域: cs.CV,cs.AI
SBOM Generation Tools in the Python Ecosystem: an In-Detail Analysis
Software Bills of Material (SBOMs), which improve transparency by listing the components constituting software, are a key countermeasure to the mounting problem of Software Supply Chain attacks. SBOM generation tools take project source files and provide an SBOM as output, interacting with the software ecosystem. While SBOMs are a substantial improvement for security practitioners, providing a complete and correct SBOM is still an open problem. This paper investigates the causes of the issues affecting SBOM completeness and correctness, focusing on the PyPI ecosystem. We analyze four popular SBOM generation tools using the CycloneDX standard. Our analysis highlights issues related to dependency versions, metadata files, remote dependencies, and optional dependencies. Additionally, we identified a systematic issue with the lack of standards for metadata in the PyPI ecosystem. This includes inconsistencies in the presence of metadata files as well as variations in how their content is formatted.
Updated: 2024-09-02 12:48:10
标题: Python生态系统中的SBOM生成工具:详细分析
摘要: 软件物料清单(SBOMs)通过列出构成软件的组件,提高透明度,是对软件供应链攻击不断增加的问题的关键对策。SBOM生成工具获取项目源文件并生成SBOM作为输出,与软件生态系统进行交互。虽然SBOM对于安全从业者来说是一个重大改进,但提供完整和正确的SBOM仍然是一个未解之谜。本文调查了影响SBOM完整性和正确性的问题的原因,重点关注PyPI生态系统。我们使用CycloneDX标准分析了四种流行的SBOM生成工具。我们的分析突出了与依赖版本、元数据文件、远程依赖和可选依赖相关的问题。此外,我们还发现了PyPI生态系统中元数据标准缺乏的系统性问题。这包括元数据文件存在的不一致性,以及它们的内容格式化方式的差异。
更新时间: 2024-09-02 12:48:10
领域: cs.CR,cs.SE
Supervised Pattern Recognition Involving Skewed Feature Densities
Pattern recognition constitutes a particularly important task underlying a great deal of scientific and technological activity. At the same time, pattern recognition involves several challenges, including the choice of features to represent the data elements, as well as possible respective transformations. In the present work, the classification potential of the Euclidean distance and of a dissimilarity index based on the coincidence similarity index are compared by applying the k-neighbors supervised classification method to features resulting from several types of transformations of one- and two-dimensional symmetric densities. Given two groups characterized by respective densities with or without overlap, different types of respective transformations are obtained and employed to quantitatively evaluate the performance of k-neighbors methodologies based on the Euclidean distance and on the coincidence similarity index. More specifically, the accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account for the comparison. Several interesting results are described and discussed, including the enhanced potential of the dissimilarity index for classifying datasets with right-skewed feature densities, as well as the identification that the sharpness of the comparison between data elements can be independent of the respective supervised classification performance.
Updated: 2024-09-02 12:45:18
标题: 监督学习中涉及偏斜特征密度的模式识别
摘要: 模式识别是科学和技术活动中一个特别重要的任务。与此同时,模式识别涉及一些挑战,包括选择用于表示数据元素的特征,以及可能的相应转换。本文比较了基于欧氏距离和基于巧合相似度指数的不相似性指数的分类潜力,分别使用k-邻居监督分类方法对来自一维和二维对称密度的几种转换结果的特征进行比较。给定两组具有重叠或无重叠特征的密度,获得并应用不同类型的相应转换来定量评估基于欧氏距离和巧合相似度指数的k-邻居方法的性能。更具体地,将分类两组相邻密度之间的交点的准确性考虑在内进行比较。描述并讨论了几个有趣的结果,包括不相似性指数在分类具有右偏特征密度的数据集方面的增强潜力,以及发现数据元素之间比较的尖锐度可能独立于相应的监督分类性能。
更新时间: 2024-09-02 12:45:18
领域: cs.LG,physics.soc-ph
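For orientation, one common form of a coincidence-based dissimilarity for non-negative feature vectors multiplies the real-valued Jaccard index by an interiority (overlap) index; the paper's exact definition may differ from this sketch:

```python
import numpy as np

def coincidence_similarity(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    inter = np.minimum(x, y).sum()
    union = np.maximum(x, y).sum()
    jaccard = inter / union                       # real-valued Jaccard index
    interiority = inter / min(x.sum(), y.sum())   # degree to which one vector sits inside the other
    return jaccard * interiority

def coincidence_dissimilarity(x, y):
    return 1.0 - coincidence_similarity(x, y)

a = np.array([1.0, 0.0, 2.0, 0.5])
b = np.array([0.9, 0.1, 1.5, 0.7])
print(coincidence_dissimilarity(a, b), np.linalg.norm(a - b))  # versus Euclidean distance
```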
Identifying Weight-Variant Latent Causal Models
The task of causal representation learning aims to uncover latent higher-level causal representations that affect lower-level observations. Identifying true latent causal representations from observed data, while allowing instantaneous causal relations among latent variables, remains a challenge, however. To this end, we start from the analysis of three intrinsic properties in identifying latent space from observations: transitivity, permutation indeterminacy, and scaling indeterminacy. We find that transitivity acts as a key role in impeding the identifiability of latent causal representations. To address the unidentifiable issue due to transitivity, we introduce a novel identifiability condition where the underlying latent causal model satisfies a linear-Gaussian model, in which the causal coefficients and the distribution of Gaussian noise are modulated by an additional observed variable. Under some mild assumptions, we can show that the latent causal representations can be identified up to trivial permutation and scaling. Furthermore, based on this theoretical result, we propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and causal relationships among them, together with the mapping from the latent causal variables to the observed ones. We show that the proposed method learns the true parameters asymptotically. Experimental results on synthetic and real data demonstrate the identifiability and consistency results and the efficacy of the proposed method in learning latent causal representations.
Updated: 2024-09-02 12:44:58
标题: 识别变重潜在因果模型
摘要: 因果表征学习的任务旨在揭示影响较低级观察的潜在高级因果表征。然而,从观察数据中识别真正的潜在因果表征,同时允许潜在变量之间的瞬时因果关系,仍然是一个挑战。为此,我们从识别观察数据中的潜在空间的三个固有属性的分析开始:传递性、置换不确定性和缩放不确定性。我们发现传递性在阻碍潜在因果表征的可识别性方面起着关键作用。为了解决传递性导致的不可识别问题,我们引入了一个新的可识别性条件,其中潜在因果模型满足线性高斯模型,其中因果系数和高斯噪声的分布由另一个观察变量调节。在一些温和的假设下,我们可以证明潜在因果表征可以被识别,直到微小的置换和缩放。此外,基于这一理论结果,我们提出了一种新方法,称为结构因果变分自动编码器,该方法直接学习潜在因果表征以及它们之间的因果关系,以及从潜在因果变量到观察变量的映射。我们展示了所提出的方法在渐近情况下学习真实参数。合成和真实数据的实验结果证明了所提出方法在学习潜在因果表征方面的可识别性和一致性结果以及有效性。
更新时间: 2024-09-02 12:44:58
领域: cs.LG,stat.ML
DIDChain: Advancing Supply Chain Data Management with Decentralized Identifiers and Blockchain
Supply chain data management faces challenges in traceability, transparency, and trust. These issues stem from data silos and communication barriers. This research introduces DIDChain, a framework leveraging blockchain technology, Decentralized Identifiers, and the InterPlanetary File System. DIDChain improves supply chain data management. To address privacy concerns, DIDChain employs a hybrid blockchain architecture that combines public blockchain transparency with the control of private systems. Our hybrid approach preserves the authenticity and reliability of supply chain events. It also respects the data privacy requirements of the participants in the supply chain. Central to DIDChain is the cheqd infrastructure. The cheqd infrastructure enables digital tracing of asset events, such as an asset moving from the milk-producing dairy farm to the cheese manufacturer. In this research, assets are raw materials and products. The cheqd infrastructure ensures the traceability and reliability of assets in the management of supply chain data. Our contribution to blockchain-enabled supply chain systems demonstrates the robustness of DIDChain. Integrating blockchain technology through DIDChain offers a solution to data silos and communication barriers. With DIDChain, we propose a framework to transform the supply chain infrastructure across industries.
Updated: 2024-09-02 12:44:58
标题: DIDChain:利用去中心化标识和区块链推进供应链数据管理
摘要: 供应链数据管理面临着追溯性、透明度和信任度等方面的挑战。这些问题源于数据孤岛和沟通障碍。本研究介绍了DIDChain,这是一个利用区块链技术、分散式标识符和星际文件系统的框架,可以改善供应链数据管理。为了解决隐私问题,DIDChain采用了混合区块链架构,将公共区块链透明度与私人系统控制相结合。我们的混合方法保留了供应链事件的真实性和可靠性,同时尊重参与者在供应链中的数据隐私要求。DIDChain的核心是cheqd基础设施,它可以实现资产事件的数字追踪,比如资产从生产牛奶的乳制品场到奶酪生产商的移动。在本研究中,资产指原材料和产品。cheqd基础设施确保了在供应链数据管理中资产的追溯性和可靠性。我们对区块链启用的供应链系统的贡献展示了DIDChain的稳健性。通过DIDChain整合区块链技术可以解决数据孤岛和沟通障碍。我们提出了一个框架,可以改变跨行业的供应链基础设施。
更新时间: 2024-09-02 12:44:58
领域: cs.CR,cs.NI
Lifecycle Management of Resumés with Decentralized Identifiers and Verifiable Credentials
Trust in applications is crucial for fast and efficient hiring processes. Applicants must present verifiable credentials that employers can trust without delays or the risk of fraudulent information. This paper introduces a trust framework for managing digital resumé credentials, addressing trust challenges by leveraging Decentralized Applications, Decentralized Identifiers, and Verifiable Credentials. We propose a framework for real-time issuance, storage, and verification of Verifiable Credentials without intermediaries. We showcase the integration of the European Blockchain Service Infrastructure as a trust anchor. Furthermore, we demonstrate a streamlined application process, reducing verification times and fostering a reliable credentialing ecosystem across various sectors, including recruitment and professional certification.
Updated: 2024-09-02 12:40:52
标题: 简历的生命周期管理:使用去中心化标识符和可验证凭证
摘要: 应用程序中的信任对于快速和高效的招聘流程至关重要。申请人必须提供可验证的证书,雇主可以在没有延迟或欺诈信息风险的情况下信任。本文介绍了一个信任框架,用于管理数字简历凭证,通过利用去中心化应用程序、去中心化标识和可验证凭证来解决信任挑战。我们提出了一个用于实时发放、存储和验证可验证凭证的框架,无需中介。我们展示了欧洲区块链服务基础设施作为信任锚点的集成。此外,我们展示了一个简化的应用程序流程,缩短了验证时间,并促进了可靠的凭证生态系统在招聘和专业认证等各个领域的跨界整合。
更新时间: 2024-09-02 12:40:52
领域: cs.CR,cs.NI
Deep Learning-based Target-To-User Association in Integrated Sensing and Communication Systems
In Integrated Sensing and Communication (ISAC) systems, matching radar targets with communication user equipments (UEs) supports several communication tasks, such as proactive handover and beam prediction. In this paper, we consider a radar-assisted communication system where a base station (BS) is equipped with a multiple-input-multiple-output (MIMO) radar that has a double aim: (i) associate vehicular radar targets to vehicular equipments (VEs) in the communication beamspace and (ii) predict the beamforming vector for each VE from radar data. The proposed target-to-user (T2U) association consists of two stages. First, vehicular radar targets are detected from range-angle images, and, for each, a beamforming vector is estimated. Then, the inferred per-target beamforming vectors are matched with the ones utilized at the BS for communication to perform target-to-user (T2U) association. Joint multi-target detection and beam inference is obtained by modifying the you only look once (YOLO) model, which is trained over simulated range-angle radar images. Simulation results over different urban vehicular mobility scenarios show that the proposed T2U method provides a probability of correct association that increases with the size of the BS antenna array, highlighting the respective increase of the separability of the VEs in the beamspace. Moreover, we show that the modified YOLO architecture can effectively perform both beam prediction and radar target detection, with similar performance in mean average precision on the latter over different antenna array sizes.
Updated: 2024-09-02 12:37:27
标题: 基于深度学习的集成感知和通信系统中的目标到用户关联
摘要: 在集成感知和通信(ISAC)系统中,将雷达目标与通信用户设备(UEs)匹配对于多个通信任务是功能性的,比如主动切换和波束预测。在本文中,我们考虑了一个雷达辅助通信系统,其中一个基站(BS)配备有一个多输入多输出(MIMO)雷达,其具有双重目标:(i)将车载雷达目标与通信波束中的车载设备(VEs)相关联,(ii)从雷达数据中预测每个VE的波束形成向量。所提出的目标到用户(T2U)关联包括两个阶段。首先,从距离-角度图像中检测车载雷达目标,并为每个目标估计一个波束形成向量。然后,推断的每个目标的波束形成向量与BS用于通信的波束形成向量进行匹配,以执行目标到用户(T2U)关联。通过修改“你只看一次”(YOLO)模型,实现联合多目标检测和波束推断,该模型在模拟的距离-角度雷达图像上进行训练。在不同的城市车辆移动场景下的模拟结果表明,所提出的T2U方法提供了随着BS天线阵列规模增加而增加的正确关联概率,突出了车载设备在波束空间中可分离性的增加。此外,我们展示修改后的YOLO架构可以有效地执行波束预测和雷达目标检测,在不同天线阵列规模上对后者的平均精度表现相似。
更新时间: 2024-09-02 12:37:27
领域: cs.NI,cs.LG,eess.SP
Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models
Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based neural networks. Although Transformer-based large models (LMs), including language, vision, and multimodal models, have demonstrated impressive capabilities in AI-generated content (AIGC), their application in industrial domains, such as detection, planning, and control, remains relatively limited. Deploying pre-trained LMs in industrial environments often encounters the challenge of stability and plasticity due to the complexity of tasks, the diversity of data, and the dynamic nature of user demands. To address these challenges, the pre-training and fine-tuning strategy, coupled with continual learning, has proven to be an effective solution, enabling models to adapt to dynamic demands while continuously optimizing their inference and decision-making capabilities. This paper surveys the integration of LMs into IIoT-enhanced General Industrial Intelligence (GII), focusing on two key areas: LMs for GII and LMs on GII. The former focuses on leveraging LMs to provide optimized solutions for industrial application challenges, while the latter investigates continuous optimization of LMs learning and inference capabilities in collaborative scenarios involving industrial devices, edge computing, and cloud computing. This paper provides insights into the future development of GII, aiming to establish a comprehensive theoretical framework and research direction for GII, thereby advancing GII towards a more general and adaptive future.
Updated: 2024-09-02 12:35:59
标题: 通往普遍工业智能:关于IIoT增强的持续大模型的调查
摘要: 目前,工业物联网(IIoT)中大多数应用仍然依赖于基于CNN的神经网络。尽管基于Transformer的大型模型(LMs),包括语言、视觉和多模态模型,在人工智能生成内容(AIGC)方面展示出令人印象深刻的能力,但它们在工业领域的应用,如检测、规划和控制,仍然相对有限。在工业环境中部署预训练的LMs通常会遇到稳定性和可塑性挑战,这是由于任务的复杂性、数据的多样性和用户需求的动态性。为了解决这些挑战,预训练和微调策略,结合持续学习,已被证明是一种有效的解决方案,使模型能够适应动态需求,同时不断优化它们的推理和决策能力。本文调查了LMs如何整合到增强普通工业智能(GII)中,重点关注两个关键领域:GII的LMs和GII上的LMs。前者专注于利用LMs为工业应用挑战提供优化解决方案,而后者研究LMs在涉及工业设备、边缘计算和云计算的协作场景中不断优化学习和推理能力。本文为GII未来发展提供了见解,旨在建立一个全面的理论框架和研究方向,从而推动GII朝着更普遍和适应性的未来发展。
更新时间: 2024-09-02 12:35:59
领域: cs.LG
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.
Updated: 2024-09-02 12:23:18
标题: EnCLAP++:分析EnCLAP框架以优化自动音频字幕性能
摘要: 在这项工作中,我们旨在分析和优化EnCLAP框架,这是一种自动音频字幕的最新模型。我们研究了修改声学编码器组件的影响,探索了使用不同数据集规模进行预训练,并研究了重新排序方案的有效性。通过广泛的实验和对生成字幕的定量分析,我们开发了EnCLAP++,这是一个显著优于原始版本的增强版。
更新时间: 2024-09-02 12:23:18
领域: eess.AS,cs.AI,cs.SD
RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model
The increasing sophistication of text-to-image generative models has led to complex challenges in defining and enforcing copyright infringement criteria and protection. Existing methods, such as watermarking and dataset deduplication, fail to provide comprehensive solutions due to the lack of standardized metrics and the inherent complexity of addressing copyright infringement in diffusion models. To deal with these challenges, we propose a Reinforcement Learning-based Copyright Protection(RLCP) method for Text-to-Image Diffusion Model, which minimizes the generation of copyright-infringing content while maintaining the quality of the model-generated dataset. Our approach begins with the introduction of a novel copyright metric grounded in copyright law and court precedents on infringement. We then utilize the Denoising Diffusion Policy Optimization (DDPO) framework to guide the model through a multi-step decision-making process, optimizing it using a reward function that incorporates our proposed copyright metric. Additionally, we employ KL divergence as a regularization term to mitigate some failure modes and stabilize RL fine-tuning. Experiments conducted on 3 mixed datasets of copyright and non-copyright images demonstrate that our approach significantly reduces copyright infringement risk while maintaining image quality.
Updated: 2024-09-02 12:15:16
标题: RLCP:一种基于强化学习的文本到图像扩散模型版权保护方法
摘要: 文本到图像生成模型日益复杂化,导致在定义和执行侵犯版权标准和保护方面面临复杂挑战。现有方法,如水印和数据集去重,由于缺乏标准化指标以及在扩散模型中处理版权侵权的固有复杂性,无法提供全面的解决方案。为了应对这些挑战,我们提出了一种基于强化学习的文本到图像扩散模型版权保护(RLCP)方法,该方法在保持模型生成数据集质量的同时最小化侵犯版权内容的生成。我们的方法首先引入了一种基于版权法和侵权法院先例的新型版权指标。然后,我们利用去噪扩散策略优化(DDPO)框架引导模型通过多步决策过程,使用包含我们提出的版权指标的奖励函数进行优化。此外,我们利用KL散度作为正则化项,以减少一些失败模式并稳定强化学习微调。在三个混合版权和非版权图像数据集上进行的实验表明,我们的方法显著降低了版权侵犯风险,同时保持了图像质量。
更新时间: 2024-09-02 12:15:16
领域: cs.CY,cs.AI,cs.CR
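Schematically, the fine-tuning signal combines a reward that penalises the copyright metric with a KL regulariser that keeps the fine-tuned model close to the pretrained one. The sketch below is a heavily simplified, hypothetical rendering of that combination; none of the names, weights, or the per-sample KL estimate come from the paper:

```python
import torch

def rlcp_reward(quality_score, copyright_score, lam=1.0):
    """Higher is better: preserve image quality, suppress infringement risk."""
    return quality_score - lam * copyright_score

def ddpo_step_loss(logp_new, logp_old, logp_pretrained, reward, beta=0.1):
    """Simplified per-sample policy-gradient surrogate with a KL regulariser.

    logp_* are log-probabilities of the sampled denoising actions under the
    current, behaviour, and frozen pretrained models, respectively.
    """
    ratio = torch.exp(logp_new - logp_old)   # importance weight
    pg = -(ratio * reward)                   # maximise the reward
    kl = logp_new - logp_pretrained          # crude sample-based KL estimate
    return (pg + beta * kl).mean()

logp_new = torch.tensor([-1.0, -2.0], requires_grad=True)
reward = rlcp_reward(torch.tensor([0.8, 0.6]), torch.tensor([0.1, 0.7]))
loss = ddpo_step_loss(logp_new, logp_new.detach(), torch.tensor([-1.1, -1.9]), reward)
loss.backward()  # gradients steer generation away from high copyright scores
```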
Barlow Twins Deep Neural Network for Advanced 1D Drug-Target Interaction Prediction
Accurate prediction of drug-target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive performance against multiple established benchmarks using only one-dimensional input. The use of a gradient boosting machine as the underlying predictor ensures fast and efficient predictions without the need for substantial computational resources. We also investigate how the model reaches its decision based on individual training samples. By comparing co-crystal structures, we find that BarlowDTI effectively exploits catalytically active and stabilising residues, highlighting the model's ability to generalise from one-dimensional input data. In addition, we further benchmark new baselines against existing methods. Together, these innovations improve the efficiency and effectiveness of drug-target interaction predictions, providing robust tools for accelerating drug development and deepening the understanding of molecular interactions. Therefore, we provide an easy-to-use web interface that can be freely accessed at https://www.bio.nat.tum.de/oc2/barlowdti.
Updated: 2024-09-02 12:00:23
标题: 巴洛双胞胎深度神经网络用于先进的1D药物靶标相互作用预测
摘要: 准确预测药物靶标相互作用对于推动药物发现至关重要。通过减少时间和成本,机器学习和深度学习可以加速这一繁重的发现过程。在一种新颖的方法BarlowDTI中,我们利用强大的Barlow Twins架构进行特征提取,同时考虑靶蛋白的结构。我们的方法仅使用一维输入就能实现与多个已建立基准的最先进预测性能。梯度提升机作为基础预测器确保了快速高效的预测,无需大量计算资源。我们还研究了模型如何根据个体训练样本做出决策。通过比较共结晶结构,我们发现BarlowDTI有效利用了催化活性和稳定残基,突显了模型从一维输入数据中泛化的能力。此外,我们还将新基线与现有方法进行了进一步基准测试。这些创新提高了药物靶标相互作用预测的效率和有效性,为加速药物开发和加深对分子相互作用的理解提供了坚固的工具。因此,我们提供了一个易于使用的网络界面,可在https://www.bio.nat.tum.de/oc2/barlowdti 上免费访问。
更新时间: 2024-09-02 12:00:23
领域: q-bio.BM,cs.AI,cs.LG
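The Barlow Twins objective used here for feature extraction has a compact standard form: drive the cross-correlation matrix of two embedded views toward the identity. Below is the generic loss; the DTI-specific pairing of drug and target views is the paper's and is omitted:

```python
import torch

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """z_a, z_b: (batch, dim) embeddings of two views of the same objects."""
    n, d = z_a.shape
    # Standardise each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = (z_a.T @ z_b) / n                                        # (d, d) cross-correlation
    on_diag = ((torch.diagonal(c) - 1) ** 2).sum()               # invariance term
    off_diag = (c ** 2).sum() - (torch.diagonal(c) ** 2).sum()   # redundancy reduction
    return on_diag + lam * off_diag

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
print(barlow_twins_loss(z1, z2))
```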
CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike fixed words, phrases, or sentences used in the static text trigger, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making them considerably stealthier than traditional static backdoor attacks. However, existing research on NLP backdoor detection primarily focuses on defending against static backdoor attacks, while detecting dynamic backdoors in NLP models remains largely unexplored. This paper presents CLIBE, the first framework to detect dynamic backdoors in Transformer-based NLP models. CLIBE injects a "few-shot perturbation" into the suspect Transformer model by crafting optimized weight perturbation in the attention layers to make the perturbed model classify a limited number of reference samples as a target label. Subsequently, CLIBE leverages the generalization ability of this few-shot perturbation to determine whether the original model contains a dynamic backdoor. Extensive evaluation on three advanced NLP dynamic backdoor attacks, two widely-used Transformer frameworks, and four real-world classification tasks strongly validates the effectiveness of CLIBE. We also demonstrate the robustness of CLIBE against various adaptive attacks. Furthermore, we employ CLIBE to scrutinize 49 popular Transformer models on Hugging Face and discover one exhibiting a high probability of containing a dynamic backdoor. We have contacted Hugging Face and provided detailed evidence of this model's backdoor behavior. Moreover, we extend CLIBE to detect backdoor text generation models modified to exhibit toxic behavior. To the best of our knowledge, CLIBE is the first framework capable of detecting backdoors in text generation models without access to trigger input test samples.
Updated: 2024-09-02 11:59:56
标题: CLIBE: 在基于Transformer的自然语言处理模型中检测动态后门
摘要: 后门可以被注入到NLP模型中,当输入文本包含特定特征,即攻击者秘密选择的触发器时,会引发不当行为。与静态文本触发器中使用的固定单词、短语或句子不同,NLP动态后门攻击设计与抽象和潜在文本特征相关联的触发器,使它们比传统静态后门攻击更为隐蔽。然而,现有的关于NLP后门检测的研究主要集中在防御静态后门攻击,而检测NLP模型中的动态后门仍然大部分未被探索。本文介绍了CLIBE,这是第一个用于检测基于Transformer的NLP模型中动态后门的框架。CLIBE通过在注意力层中制作优化的权重扰动,将“少量扰动”注入到可疑Transformer模型中,使扰动后的模型将有限数量的参考样本分类为目标标签。随后,CLIBE利用这种少量扰动的泛化能力来确定原始模型是否包含动态后门。对三种高级NLP动态后门攻击、两种广泛使用的Transformer框架和四个真实分类任务的广泛评估,强烈验证了CLIBE的有效性。我们还展示了CLIBE对各种自适应攻击的鲁棒性。此外,我们利用CLIBE审查了Hugging Face上的49个流行Transformer模型,并发现一个模型具有高概率包含动态后门。我们已经联系了Hugging Face,并提供了该模型后门行为的详细证据。此外,我们扩展了CLIBE以检测被修改以展示有毒行为的后门文本生成模型。据我们所知,CLIBE是第一个能够在没有触发器输入测试样本的情况下检测文本生成模型中后门的框架。
更新时间: 2024-09-02 11:59:56
领域: cs.CR,cs.CL,cs.LG
TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in deteriorating SER performance in practice. In this paper, we introduce a Two-level Refinement Network, dubbed TRNet, to address this challenge. Specifically, a pre-trained speech enhancement module is employed for front-end noise reduction and noise level estimation. Later, we utilize clean speech spectrograms and their corresponding deep representations as reference signals to refine the spectrogram distortion and representation shift of enhanced speech during model training. Experimental results validate that the proposed TRNet substantially promotes the robustness of the proposed system in both matched and unmatched noisy environments, without compromising its performance in noise-free environments.
Updated: 2024-09-02 11:52:47
标题: TRNet:利用语音增强的两级细化网络进行噪声鲁棒的语音情感识别
摘要: 在语音情绪识别(SER)中一个持续存在的挑战是环境噪声,这经常导致实际中SER性能下降。在本文中,我们引入了一个名为TRNet的两级细化网络,以解决这一挑战。具体来说,我们采用了一个预训练的语音增强模块进行前端降噪和噪声水平估计。随后,我们利用清晰语音频谱图及其对应的深度表示作为参考信号,以在模型训练期间细化增强语音的频谱畸变和表示偏移。实验结果验证了所提出的TRNet显著提升了所提出系统在匹配和不匹配的嘈杂环境中的鲁棒性,而不会损害其在无噪声环境中的性能。
更新时间: 2024-09-02 11:52:47
领域: cs.SD,cs.LG,eess.AS
Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks
This work evaluates compression techniques on ConvNeXt models in image classification tasks using the CIFAR-10 dataset. Structured pruning, unstructured pruning, and dynamic quantization methods are evaluated to reduce model size and computational complexity while maintaining accuracy. The experiments, conducted on cloud-based platforms and an edge device, assess the performance of these techniques. Results show significant reductions in model size, with up to a 75% reduction achieved using structured pruning techniques. Additionally, dynamic quantization achieves a reduction of up to 95% in the number of parameters. Fine-tuned models exhibit improved compression performance, indicating the benefits of pre-training in conjunction with compression techniques. Unstructured pruning methods reveal trends in accuracy and compression, with limited reductions in computational complexity. The combination of OTOV3 pruning and dynamic quantization further enhances compression performance, resulting in an 89.7% reduction in size, a 95% reduction in the number of parameters and MACs, and a 3.8% increase in accuracy. The deployment of the final compressed model on the edge device demonstrates high accuracy (92.5%) and low inference time (20 ms), validating the effectiveness of compression techniques for real-world edge computing applications.
Updated: 2024-09-02 11:48:19
标题: 边缘人工智能:卷积神经网络模型压缩技术的评估
摘要: 这项工作评估了在使用CIFAR-10数据集进行图像分类任务中对ConvNeXt模型的压缩技术。评估了结构化剪枝、非结构化剪枝和动态量化方法以减少模型大小和计算复杂性,同时保持准确性。在基于云平台和边缘设备上进行的实验评估了这些技术的性能。结果显示,使用结构化剪枝技术可实现高达75%的模型大小减少。此外,动态量化可将参数数量减少高达95%。经过微调的模型表现出更好的压缩性能,表明预训练与压缩技术相结合的好处。非结构化剪枝方法显示了准确性和压缩的趋势,计算复杂性的减少有限。OTOV3剪枝和动态量化的组合进一步增强了压缩性能,使大小减少了89.7%,参数数量和MACs减少了95%,准确性增加了3.8%。在边缘设备上部署最终压缩模型显示出92.5%的高准确性和20毫秒的低推理时间,验证了压缩技术在实际边缘计算应用中的有效性。
更新时间: 2024-09-02 11:48:19
领域: cs.LG,cs.AI,cs.CV
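The pruning and dynamic-quantization steps evaluated above map directly onto standard PyTorch utilities. A sketch on a toy network (the ConvNeXt backbone, fine-tuning, and the OTOV3 pipeline are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(16 * 30 * 30, 10))

# Structured pruning: drop 50% of the conv layer's output channels by L2 norm.
prune.ln_structured(model[0], name="weight", amount=0.5, n=2, dim=0)
# Unstructured pruning: zero out 75% of the linear weights by magnitude.
prune.l1_unstructured(model[3], name="weight", amount=0.75)
# Fold the pruning masks into the weights permanently.
prune.remove(model[0], "weight")
prune.remove(model[3], "weight")

# Dynamic quantization: int8 weights for Linear layers; activations are
# quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 3, 32, 32)
print(quantized(x).shape)  # torch.Size([1, 10])
```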
LiveFC: A System for Live Fact-Checking of Audio Streams
The advances in the digital era have led to rapid dissemination of information. This has also aggravated the spread of misinformation and disinformation. This has potentially serious consequences, such as civil unrest. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. While automated fact-checking approaches exist, they do not operate in real time and do not always account for the spread of misinformation through different modalities. This is particularly important as proactive fact-checking on live streams in real time can help people be informed of false narratives and prevent catastrophic consequences that may cause civil unrest. This is particularly relevant with the rapid dissemination of information through video on social media platforms or other streams like political rallies and debates. Hence, in this work we develop a platform named LiveFC, which can aid in fact-checking live audio streams in real time. LiveFC has a user-friendly interface that displays the claims detected in a live stream along with their veracity, the supporting evidence, and the speaker associated with each claim's segment. The app can be accessed at http://livefc.factiverse.ai and a screen recording of the demo can be found at https://bit.ly/3WVAoIw.
Updated: 2024-09-02 11:45:41
标题: LiveFC:一种用于音频流实时事实核查的系统
摘要: 数字时代的进步导致信息的快速传播。这也加剧了错误信息和虚假信息的传播。这可能会导致严重后果,例如社会动荡。虽然事实核查旨在应对这一问题,但手动事实核查繁琐且不具规模性。虽然存在自动事实核查方法,但它们不是实时运行,也并非总是考虑到通过不同方式传播错误信息。这一点尤为重要,因为实时的主动事实核查可以帮助人们了解虚假叙述,并防止可能引发社会动荡的灾难性后果。这在社交媒体平台上通过视频或政治集会和辩论等其他流媒体迅速传播信息的情况下尤为重要。因此,在本研究中,我们开发了一个名为LiveFC的平台,可以帮助实时核查直播音频流。LiveFC具有用户友好的界面,显示检测到的主张及其真实性和相关证据,用于带有发言人的直播流的各个部分的主张。该应用程序可访问http://livefc.factiverse.ai,演示的屏幕录制可在https://bit.ly/3WVAoIw找到。
更新时间: 2024-09-02 11:45:41
领域: cs.CL,cs.AI
Qualitative and quantitative analysis of student's perceptions in the use of generative AI in educational environments
The effective integration of generative artificial intelligence in education is fundamental to preparing future generations. The objective of this study is to analyze, from a quantitative and qualitative point of view, students' perception of controlled student-AI interaction within the classroom. This analysis includes assessing the ethical implications and everyday use of AI tools, as well as understanding whether AI tools encourage students to pursue STEM careers. Several points for improvement in education are found, such as the challenge of getting teachers to engage with new technologies and adapt their methods in all subjects, not just those related to technology.
Updated: 2024-09-02 11:43:18
标题: 学生对生成式人工智能在教育环境中使用的感知的定性和定量分析
摘要: 在教育中有效整合生成式人工智能是为了准备未来一代的基本方面。本研究的目标是从定量和定性的角度分析控制学生-人工智能互动在课堂内的感知。这个分析包括评估人工智能工具的伦理问题和日常使用,以及了解人工智能工具是否鼓励学生追求STEM职业。发现了教育中的几个改进点,比如让教师面对新技术并在所有科目中调整他们的方法的挑战,而不仅仅是与技术相关的科目。
更新时间: 2024-09-02 11:43:18
领域: cs.CY,cs.AI,cs.HC
Backdoor Defense through Self-Supervised and Generative Learning
Backdoor attacks change a small portion of training data by introducing hand-crafted triggers and rewiring the corresponding labels towards a desired target class. Training on such data injects a backdoor which causes malicious inference in selected test samples. Most defenses mitigate such attacks through various modifications of the discriminative learning procedure. In contrast, this paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space. Interestingly, these representations get either preserved or heavily disturbed under recent backdoor attacks. In both cases, we find that per-class generative models allow to detect poisoned data and cleanse the dataset. Experiments show that training on cleansed dataset greatly reduces the attack success rate and retains the accuracy on benign inputs.
Updated: 2024-09-02 11:40:01
Fields: cs.LG,cs.CR
DiffLoad: Uncertainty Quantification in Electrical Load Forecasting with the Diffusion Model
Electrical load forecasting plays a crucial role in decision-making for power systems, including unit commitment and economic dispatch. The integration of renewable energy sources and the occurrence of external events, such as the COVID-19 pandemic, have rapidly increased the uncertainties in load forecasting. These uncertainties can be divided into two types: epistemic uncertainty and aleatoric uncertainty. Separating the two helps decision-makers better understand where the uncertainty lies and how large it is, thereby increasing their confidence in subsequent decisions. This paper proposes a diffusion-based Seq2Seq structure to estimate epistemic uncertainty and employs the robust additive Cauchy distribution to estimate aleatoric uncertainty. Our method not only preserves the accuracy of load forecasting but also demonstrates the ability to separate the two types of uncertainty and remains applicable to different load levels. The relevant code can be found at \url{https://anonymous.4open.science/r/DiffLoad-4714/}.
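As an illustration of the aleatoric part, a heteroscedastic Cauchy negative log-likelihood can be written in a few lines. The head names (mu, log_gamma) and the log-scale parametrization are assumptions made for this sketch; the density itself is the standard Cauchy form.

import math
import torch

def cauchy_nll(y, mu, log_gamma):
    # Negative log-likelihood of y under Cauchy(mu, gamma), averaged over the
    # batch. gamma is parametrized on the log scale so the scale stays positive;
    # the heavy tails make the loss robust to load spikes and outliers.
    gamma = torch.exp(log_gamma)
    z = (y - mu) / gamma
    return (math.log(math.pi) + log_gamma + torch.log1p(z ** 2)).mean()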
Updated: 2024-09-02 11:31:16
Fields: cs.LG,cs.AI,stat.ML
Integrating End-to-End and Modular Driving Approaches for Online Corner Case Detection in Autonomous Driving
Online corner case detection is crucial for ensuring the safety of autonomous driving vehicles. Current autonomous driving approaches can be categorized into modular approaches and end-to-end approaches. To leverage the advantages of both, we propose a method for online corner case detection that integrates an end-to-end approach into a modular system. The modular system takes over the primary driving task, while the end-to-end network runs in parallel as a secondary one; the disagreement between the two systems is then used for corner case detection. We implement this method on a real vehicle and evaluate it qualitatively. Our results demonstrate that end-to-end networks, known for their superior situational awareness, can effectively contribute to corner case detection when used as secondary driving systems. These findings suggest that such an approach holds potential for enhancing the safety of autonomous vehicles.
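A minimal sketch of the disagreement signal might look as follows, assuming both systems emit planned trajectories as arrays of x/y waypoints; the distance measure and the threshold are illustrative assumptions, not the paper's calibrated values.

import numpy as np

def disagreement(modular_traj, e2e_traj):
    # Mean Euclidean distance between planned trajectories of shape [T, 2]
    # (x/y waypoints in the vehicle frame).
    return float(np.linalg.norm(modular_traj - e2e_traj, axis=1).mean())

def is_corner_case(modular_traj, e2e_traj, threshold=0.5):
    # Large disagreement suggests a situation that at least one system
    # handles poorly, i.e., a potential corner case worth logging.
    return disagreement(modular_traj, e2e_traj) > threshold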
Updated: 2024-09-02 11:14:41
Fields: cs.AI,cs.RO
Logit Scaling for Out-of-Distribution Detection
The safe deployment of machine learning and AI models in open-world settings hinges critically on the ability to accurately detect out-of-distribution (OOD) data, i.e., samples that differ vastly from what the model was trained on. Current approaches to OOD detection often require further training of the model and/or statistics about the training data, which may no longer be accessible. Additionally, many existing OOD detection methods struggle to maintain performance when transferred across different architectures. Our research tackles these issues by proposing a simple, post-hoc method that does not require access to the training data distribution, keeps the trained network intact, and performs strongly across a variety of architectures. Our method, Logit Scaling (LTS), as the name suggests, simply scales the logits in a manner that effectively distinguishes between in-distribution (ID) and OOD samples. We tested our method on benchmarks across various scales, including CIFAR-10, CIFAR-100, ImageNet and OpenOOD. The experiments cover 3 ID and 14 OOD datasets, as well as 9 model architectures. Overall, we demonstrate state-of-the-art performance, robustness and adaptability across different architectures, paving the way towards a universally applicable solution for advanced OOD detection.
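The abstract does not spell out the scaling rule, so the following is only a hedged, generic sketch of post-hoc logit scaling for OOD scoring: scale each sample's logits by a statistic of its penultimate features and aggregate with an energy-style score. The feature-norm scaling and the score choice are assumptions for illustration, not necessarily the LTS rule.

import torch

def ood_score(logits, features, tau=1.0):
    # Higher score = more in-distribution. logits: [B, C], features: [B, D]
    # from the penultimate layer. The per-sample feature-norm scaling below
    # is an illustrative choice.
    scale = features.norm(dim=1, keepdim=True) / features.shape[1] ** 0.5
    scaled_logits = logits * scale * tau
    return torch.logsumexp(scaled_logits, dim=1)  # energy-style ID score

# Calibrate a threshold on held-out ID data (e.g., its 5th percentile) and
# flag test samples whose score falls below it as OOD.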
Updated: 2024-09-02 11:10:44
Fields: cs.LG,cs.AI,cs.CV
Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria
Generative AI, such as large language models, has undergone rapid development in recent years. As these models become increasingly available to the public, concerns arise about perpetuating and amplifying harmful biases in applications. Gender stereotypes can be harmful and limiting for the individuals they target, whether they consist of misrepresentation or discrimination. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender biases in generative language models. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each of the criteria with a focus on occupational gender stereotypes, specifically utilizing the medical test to introduce the ground truth in the generative AI context. Our results address the presence of occupational gender bias within such conversational language models.
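For reference, the three criteria are standard in the classification setting: writing $R$ for the model's prediction or score, $A$ for the sensitive attribute (e.g., gender), and $Y$ for the ground-truth label,

\[
\text{independence: } R \perp A, \qquad
\text{separation: } R \perp A \mid Y, \qquad
\text{sufficiency: } Y \perp A \mid R.
\]

The paper's contribution is to reformulate these conditions so that they can be probed with prompts in the generative setting, where no explicit classifier output exists.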
Updated: 2024-09-02 11:09:55
Fields: cs.CL,cs.AI,cs.HC
Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data
To protect the intellectual property of well-trained deep neural networks (DNNs), black-box watermarks, which are embedded into the prediction behavior of DNN models on a set of specially-crafted samples and extracted from suspect models using only API access, have gained increasing popularity in both academia and industry. Watermark robustness is usually designed to withstand attackers who steal the protected model and obfuscate its parameters for watermark removal. However, current robustness evaluations are primarily performed under moderate attacks or unrealistic settings. Existing removal attacks can only crack a small subset of the mainstream black-box watermarks, and fall short in four key aspects: incomplete removal, reliance on prior knowledge of the watermark, performance degradation, and high dependency on data. In this paper, we propose a watermark-agnostic removal attack called \textsc{Neural Dehydration} (\textit{abbrev.} \textsc{Dehydra}), which effectively erases all ten mainstream black-box watermarks from DNNs, with only limited or even no data dependence. In general, our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message. We further design target class detection and recovered sample splitting algorithms to reduce the utility loss and achieve data-free watermark removal on five of the watermarking schemes. We conduct a comprehensive evaluation of \textsc{Dehydra} against ten mainstream black-box watermarks on three benchmark datasets and DNN architectures. Compared with existing removal attacks, \textsc{Dehydra} achieves strong removal effectiveness across all the covered watermarks, preserving at least $90\%$ of the stolen model utility, under data-limited settings, i.e., with less than $2\%$ of the training data or even no data at all.
Updated: 2024-09-02 11:01:35
Fields: cs.CR,cs.AI
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Designing a safe policy for uncertain environments is crucial in real-world control applications. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm capable of identifying a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional Lagrangian max-min formulation with policy gradient methods can become trapped in suboptimal solutions by encountering a sum of conflicting gradients from the objective and constraint functions during its inner minimization problem. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a binary search algorithm with a policy gradient subroutine and prove that it identifies an $\varepsilon$-optimal policy in an RCMDP with $\tilde{\mathcal{O}}(\varepsilon^{-4})$ policy evaluations.
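In notation that is not given in the abstract, and therefore only a sketch, the construction looks as follows: with uncertainty set $\mathcal{U}$, robust cost $J_c$, and robust constraint $J_g$ with budget $\kappa$, the RCMDP problem and its epigraph form read

\[
\min_{\pi} \max_{P \in \mathcal{U}} J_c(\pi, P) \ \ \text{s.t.} \ \max_{P \in \mathcal{U}} J_g(\pi, P) \le \kappa
\quad \Longleftrightarrow \quad
\min_{b \in \mathbb{R}} b \ \ \text{s.t.} \ \min_{\pi} \max\!\Big(\max_{P \in \mathcal{U}} J_c(\pi, P) - b,\ \max_{P \in \mathcal{U}} J_g(\pi, P) - \kappa\Big) \le 0.
\]

The inner problem now involves a single max over the objective and constraint terms, so a policy-gradient subroutine follows one gradient at a time instead of a conflicting sum, and a binary search over the threshold $b$ recovers the optimal value.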
Updated: 2024-09-02 10:56:20
Fields: cs.LG,math.OC
EuroPED-NN: Uncertainty aware surrogate model
This work successfully generates an uncertainty-aware surrogate model of the EuroPED plasma pedestal model using the Bayesian neural network with noise contrastive prior (BNN-NCP) technique. The resulting model, EuroPED-NN, is trained using data from the JET-ILW pedestal database and subsequent model evaluations. The BNN-NCP technique has proven to be a suitable method for generating uncertainty-aware surrogate models: it matches the output of a regular neural network while providing confidence estimates for its predictions in the form of uncertainties, and it highlights out-of-distribution (OOD) regions using the surrogate model uncertainties, providing critical insights into model robustness and reliability. EuroPED-NN has been physically validated, first by analyzing the electron density $n_e\!\left(\psi_{\text{pol}}=0.94\right)$ with respect to increasing plasma current, $I_p$, and second by validating the $\Delta-\beta_{p,ped}$ relation associated with the EuroPED model. This affirms the robustness of the underlying physics learned by the surrogate model. On top of that, the method was used to develop a EuroPED-like model fed with experimental data, i.e. an uncertainty-aware experimental model, which is functional in the JET database. Both models have also been tested on $\sim 50$ AUG shots.
Updated: 2024-09-02 10:55:37
Fields: physics.plasm-ph,cs.LG
PACSBO: Probably approximately correct safe Bayesian optimization
Safe Bayesian optimization (BO) algorithms promise to find optimal control policies without knowing the system dynamics while at the same time guaranteeing safety with high probability. In exchange for those guarantees, popular algorithms require a smoothness assumption: a known upper bound on a norm in a reproducing kernel Hilbert space (RKHS). The RKHS is a potentially infinite-dimensional space, and it is unclear how to obtain, in practice, an upper bound on the RKHS norm of an unknown function. In response, we propose an algorithm that estimates such an upper bound from data and investigate its theoretical properties. Moreover, akin to Lipschitz-based methods, we treat the RKHS norm as a local rather than a global object, and thus reduce conservatism. Integrating the RKHS norm estimation and the local interpretation of the RKHS norm into a safe BO algorithm yields PACSBO, an algorithm for probably approximately correct safe Bayesian optimization, for which we provide numerical and hardware experiments demonstrating its applicability and benefits over popular safe BO algorithms.
Updated: 2024-09-02 10:50:34
Fields: cs.LG,cs.SY,eess.SY,math.OC
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
In this technical report, we describe our submission to DCASE2024 Challenge Task6 (Automated Audio Captioning) and Task8 (Language-based Audio Retrieval). We develop our approach building upon the EnCLAP audio captioning framework and optimizing it for Task6 of the challenge. Notably, we outline the changes in the underlying components and the incorporation of the reranking process. Additionally, we submit a supplementary retriever model, a byproduct of our modified framework, to Task8. Our proposed systems achieve FENSE score of 0.542 on Task6 and mAP@10 score of 0.386 on Task8, significantly outperforming the baseline models.
Updated: 2024-09-02 10:47:07
Fields: eess.AS,cs.AI,cs.SD
Forecasting infectious disease prevalence with associated uncertainty using neural networks
Infectious diseases pose significant human and economic burdens. Accurately forecasting disease incidence can enable public health agencies to respond effectively to existing or emerging diseases. Despite progress in the field, developing accurate forecasting models remains a significant challenge. This thesis proposes two methodological frameworks using neural networks (NNs) with associated uncertainty estimates, a critical component whose absence has limited the application of NNs to epidemic forecasting thus far. We develop our frameworks by forecasting influenza-like illness (ILI) in the United States. Our first proposed method uses Web search activity data in conjunction with historical ILI rates as observations for training NN architectures. Our models incorporate Bayesian layers to produce uncertainty intervals, positioning themselves as legitimate alternatives to more conventional approaches. The best performing architecture, the iterative recurrent neural network (IRNN), reduces mean absolute error by 10.3% and improves Skill by 17.1% on average in forecasting tasks across four flu seasons compared to the state of the art. We build on this method by introducing IRNNs, an architecture which changes the sampling procedure in the IRNN to improve uncertainty estimation. Our second framework uses neural ordinary differential equations to bridge the gap between mechanistic compartmental models and NNs, benefiting from the physical constraints that compartmental models provide. We evaluate eight neural ODE models utilising a mixture of ILI rates and Web search activity data to provide forecasts. These are compared with the IRNN and IRNN0, a variant of the IRNN that uses only ILI rates. Models trained without Web search activity data outperform the IRNN0 by 16% in terms of Skill. Future work should focus on more effectively using neural ODEs with Web search data to compete with the best performing IRNN.
Updated: 2024-09-02 10:41:43
Fields: cs.LG,q-bio.PE
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet the underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations while completely avoiding the influence of all factors other than the image representation itself, we propose a parameter-free representation alignment metric (Pfram) that can measure the similarity between any two representation systems without requiring additional training parameters. Notably, Pfram can also assess the alignment of a neural representation system with the human representation system, represented by ground-truth annotations of images. By evaluating the alignment with object annotations, we demonstrate that this metric shows strong and consistent correlations with object hallucination across a wide range of state-of-the-art MLLMs, spanning various model architectures and sizes. Furthermore, using this metric, we explore other key issues related to image representations in MLLMs, such as the role of different modules, the impact of textual instructions, and potential improvements including the use of alternative visual encoders. Our code is available at: https://github.com/yellow-binary-tree/Pfram.
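The abstract does not give Pfram's formula, but linear centered kernel alignment (CKA) is a well-known, training-free similarity between two representation systems and serves here only as an illustrative stand-in for a parameter-free alignment metric.

import numpy as np

def linear_cka(X, Y):
    # X: [n, d1], Y: [n, d2]: two systems' representations of the same n images.
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, 'fro') ** 2
    return hsic / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))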
Updated: 2024-09-02 10:37:26
Fields: cs.CV,cs.LG
The Initial Screening Order Problem
We investigate the role of the initial screening order (ISO) in candidate screening tasks, such as employee hiring and academic admissions, in which a screener is tasked with selecting $k$ candidates from a candidate pool. The ISO refers to the order in which the screener searches the candidate pool. Today, it is common for the ISO to be the product of an information access system, such as an online platform or a database query. The ISO has been largely overlooked in the literature, despite its potential impact on the optimality and fairness of the chosen $k$ candidates, especially under a human screener. We define two problem formulations describing the search behavior of the screener under the ISO: the best-$k$, where the screener selects the $k$ best candidates; and the good-$k$, where the screener selects the first $k$ good-enough candidates. To study the impact of the ISO, we introduce a human-like screener and compare it to its algorithmic counterpart, where the human-like screener is conceived to be inconsistent over time due to fatigue. In particular, our analysis shows that the ISO, under a human-like screener solving the good-$k$ problem, hinders individual fairness despite meeting group-level fairness, and hampers the optimality of the selected $k$ candidates. This is due to position bias, where a candidate's evaluation is affected by its position within the ISO. We report extensive simulated experiments exploring the parameters of the best-$k$ and good-$k$ problems for the algorithmic and human-like screeners. The simulation framework is flexible enough to account for multiple screening settings, offering an alternative to running real-world candidate screening procedures. This work is motivated by a real-world candidate screening problem studied in collaboration with a European company.
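A toy simulation of the good-$k$ formulation under a fatigued, human-like screener can be written in a few lines; the fatigue model, threshold, and pool size below are illustrative assumptions rather than the paper's experimental settings.

import numpy as np

rng = np.random.default_rng(0)

def good_k_selection(true_scores, k=10, good_threshold=0.7, fatigue=0.02):
    # Walk the ISO in order; evaluation noise grows with position (fatigue).
    selected = []
    for position, score in enumerate(true_scores):
        noisy_eval = score + rng.normal(0.0, fatigue * position)
        if noisy_eval >= good_threshold:  # "good enough" -> select and move on
            selected.append(position)
        if len(selected) == k:
            break
    return selected

pool = rng.uniform(size=1000)  # candidate qualities in some ISO
print(good_k_selection(pool))  # early positions dominate: position bias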
Updated: 2024-09-02 10:35:42
Fields: cs.LG,cs.CY
Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis
Recent advances in Diffusion Models (DMs) have led to significant progress in visual synthesis and editing tasks, establishing them as a strong competitor to Generative Adversarial Networks (GANs). However, the latent space of DMs is not as well understood as that of GANs. Recent research has focused on unsupervised semantic discovery in the latent space of DMs by leveraging the bottleneck layer of the denoising network, which has been shown to exhibit properties of a semantic latent space. However, these approaches are limited to discovering global attributes. In this paper, we address the challenge of local image manipulation in DMs and introduce an unsupervised method to factorize the latent semantics learned by the denoising network of pre-trained DMs. Given an arbitrary image and defined regions of interest, we utilize the Jacobian of the denoising network to establish a relation between the regions of interest and their corresponding subspaces in the latent space. Furthermore, we disentangle the joint and individual components of these subspaces to identify latent directions that enable local image manipulation. Once discovered, these directions can be applied to different images to produce semantically consistent edits, making our method suitable for practical applications. Experimental results on various datasets demonstrate that our method can produce semantic edits that are more localized and have better fidelity compared to the state-of-the-art.
Updated: 2024-09-02 10:33:48
Fields: cs.CV,cs.LG
FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking
Growth, abnormal behavior, and diseases of fish can be detected early by monitoring fish through image-processing-based tracking, which is of great significance for factory aquaculture. However, underwater reflections and fish-related factors, such as high visual similarity, rapid swimming caused by stimuli, and multi-object occlusion, pose challenges for multi-target fish tracking. To address these challenges, this paper establishes a complex multi-scene sturgeon tracking dataset and proposes a real-time end-to-end fish tracking model, FMRFT. In this model, the Mamba In Mamba (MIM) architecture with low memory consumption is introduced into the tracking algorithm to realize multi-frame temporal memory and fast feature extraction, which improves the efficiency of correlation analysis for contiguous frames in multi-fish video. Additionally, the superior feature interaction and a priori frame processing capabilities of RT-DETR are leveraged to provide an effective tracking algorithm. By incorporating the QTSI query interaction processing module, the model effectively handles occluded objects and redundant tracking frames, resulting in more accurate and stable fish tracking. Trained and tested on the dataset, the model achieves an IDF1 score of 90.3% and a MOTA accuracy of 94.3%. Experimental results demonstrate that the proposed FMRFT model effectively addresses the challenges of high similarity and mutual occlusion in fish populations, enabling accurate tracking in factory farming environments.
Updated: 2024-09-02 10:33:45
Fields: cs.CV,cs.AI
LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning
Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised graph learning that has attracted attention across various application scenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet to be explored, because conventional augmentation techniques like feature embedding masking cannot directly process the textual attributes on TAGs. A naive strategy for applying GCL to TAGs is to encode the textual attributes into feature embeddings via a language model and then feed the embeddings into the subsequent GCL module for processing. Such a strategy faces three key challenges: I) failure to avoid information loss, II) semantic loss during the text encoding phase, and III) implicit augmentation constraints that lead to uncontrollable and incomprehensible results. In this paper, we propose a novel GCL framework named LATEX-GCL that utilizes Large Language Models (LLMs) to produce textual augmentations and leverages LLMs' powerful natural language processing (NLP) abilities to address the three limitations above, paving the way for applying GCL to TAG tasks. Extensive experiments on four high-quality TAG datasets illustrate the superiority of the proposed LATEX-GCL method. The source codes and datasets are released to ease reproducibility and can be accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.
Updated: 2024-09-02 10:30:55
Fields: cs.SI,cs.AI
Pump and Dumps in the Bitcoin Era: Real Time Detection of Cryptocurrency Market Manipulations
In recent years, cryptocurrencies have become increasingly popular. Even people who are not experts have started to invest in these securities, and nowadays cryptocurrency exchanges process transactions for over 100 billion US dollars per month. However, many cryptocurrencies have low liquidity and are therefore highly prone to market manipulation schemes. In this paper, we perform an in-depth analysis of pump and dump schemes organized by communities over the Internet. We observe how these communities are organized and how they carry out the fraud. Then, we report on two case studies related to pump and dump groups. Lastly, we introduce an approach to detect the fraud in real time that outperforms the current state of the art, helping investors stay out of the market while a pump and dump scheme is in action.
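For illustration only, a minimal real-time detector in this spirit could flag windows where price and volume jointly spike relative to a rolling baseline; the window length and z-score thresholds are assumptions, not the paper's tuned values.

import pandas as pd

def detect_pumps(df, window=360, z_price=4.0, z_volume=6.0):
    # df: minute-level OHLCV frame with 'close' and 'volume' columns.
    roll = df[['close', 'volume']].rolling(window)
    z = (df[['close', 'volume']] - roll.mean()) / (roll.std() + 1e-9)
    # A pump candidate: price and volume spike together vs. the rolling baseline.
    return df[(z['close'] > z_price) & (z['volume'] > z_volume)]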
Updated: 2024-09-02 10:25:39
Fields: cs.CY,cs.CR,cs.LG,q-fin.ST
Autonomous Payload Thermal Control
In small satellites there is less room for heat control equipment, scientific instruments, and electronic components. Furthermore, the close proximity of electronic components makes power dissipation difficult, with the risk of not being able to control the temperature appropriately, reducing component lifetime and mission performance. To address this challenge, and taking advantage of the increasing intelligence available on board satellites, we propose an autonomous thermal control tool that uses deep reinforcement learning to learn the thermal control policy onboard. The tool was evaluated on a real space edge processing computer that will be used in a demonstration payload hosted on the International Space Station (ISS). The experimental results show that the proposed framework is able to learn to control the payload processing power so as to maintain the temperature within operational ranges, complementing traditional thermal control systems.
Updated: 2024-09-02 10:23:41
Fields: cs.LG,cs.SY,eess.SY
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Large language models (LLMs) have emerged due to their capability to generate high-quality content across diverse contexts. To reduce their explosively increasing demands for computing resources, the mixture of experts (MoE) architecture has emerged. The MoE layer makes it possible to exploit a huge number of parameters with less computation. Applying state-of-the-art continuous batching increases throughput; however, it leads to frequent DRAM access in the MoE and attention layers. We observe that conventional computing devices have limitations when processing the MoE and attention layers, which dominate the total execution time and exhibit low arithmetic intensity (Op/B). Processing MoE layers only with devices targeting low Op/B, such as processing-in-memory (PIM) architectures, is challenging due to the fluctuating Op/B in the MoE layer caused by continuous batching. To address these challenges, we propose Duplex, which comprises an xPU tailored for high-Op/B operations and Logic-PIM to effectively perform low-Op/B operations within a single device. Duplex selects the most suitable processor based on the Op/B of each layer within LLMs. As the Op/B of the MoE layer is at least 1 and that of the attention layer has a value of 4-8 for grouped query attention, prior PIM architectures, which place processing units inside DRAM dies and only target extremely low-Op/B (under one) operations, are not efficient. Following recent trends, Logic-PIM adds more through-silicon vias (TSVs) to enable high-bandwidth communication between the DRAM die and the logic die, and places powerful processing units on the logic die, which is best suited for handling low-Op/B operations ranging from a few to a few dozen. To maximally utilize the xPU and Logic-PIM, we propose expert and attention co-processing.
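A back-of-the-envelope calculation makes the Op/B argument tangible. Assuming fp16 weights and that weight traffic dominates memory movement (both assumptions of this sketch), a single-token GEMV has an arithmetic intensity of exactly 1, while grouped-query attention reuses each fetched KV byte across the query heads of a group.

def gemv_op_per_byte(m, n, tokens=1, bytes_per_elem=2):
    flops = 2 * m * n * tokens            # one multiply-accumulate per weight
    bytes_moved = m * n * bytes_per_elem  # weight matrix dominates DRAM traffic
    return flops / bytes_moved

# One token through a 4096x4096 projection in fp16: Op/B == 1 (memory-bound),
# consistent with the MoE layer's Op/B of at least 1 quoted above.
print(gemv_op_per_byte(4096, 4096))

# Grouped-query attention: g query heads share one KV head, so every KV byte
# fetched from DRAM is reused g times, pushing Op/B toward g (the 4-8 range).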
Updated: 2024-09-02 10:21:21
Fields: cs.AR,cs.LG
LLM-PQA: LLM-enhanced Prediction Query Answering
The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML model has to be employed and inference has to be performed in order to provide an answer. This paper introduces LLM-PQA, a novel tool that addresses prediction queries formulated in natural language. LLM-PQA is the first to combine the capabilities of LLMs and a retrieval-augmented mechanism for the needs of prediction queries by integrating data lakes and model zoos. This integration provides users with access to a vast spectrum of heterogeneous data and diverse ML models, facilitating dynamic prediction query answering. In addition, LLM-PQA can dynamically train models on demand, based on specific query requirements, ensuring reliable and relevant results even when no pre-trained model suitable for the task is available in the model zoo.
Updated: 2024-09-02 10:20:35
Fields: cs.IR,cs.LG
Generating Synthetic Satellite Imagery for Rare Objects: An Empirical Comparison of Models and Metrics
Generative deep learning architectures can produce realistic, high-resolution fake imagery, with potentially drastic societal implications. A key question in this context is: how easy is it to generate realistic imagery, in particular for niche domains? The iterative process required to achieve specific image content is difficult to automate and control. Especially for rare classes, it remains difficult to assess fidelity, i.e., whether generative approaches produce realistic imagery, and alignment, i.e., how well the generation can be guided by human input. In this work, we present a large-scale empirical evaluation of generative architectures which we fine-tuned to generate synthetic satellite imagery. We focus on nuclear power plants as an example of a rare object category: as there are only around 400 facilities worldwide, this restriction is exemplary for many other scenarios in which training and test data are limited by the restricted number of occurrences of real-world examples. We generate synthetic imagery by conditioning on two kinds of modalities, textual input and image input obtained from a game engine that allows for detailed specification of the building layout. The generated images are assessed by commonly used metrics for automatic evaluation and then compared with human judgement from our user studies to assess their trustworthiness. Our results demonstrate that even for rare objects, the generation of authentic synthetic satellite imagery with textual or detailed building layouts is feasible. In line with previous work, we find that automated metrics are often not aligned with human perception; in fact, we find strong negative correlations between commonly used image quality metrics and human ratings.
Updated: 2024-09-02 10:19:39
Fields: cs.CV,cs.AI,cs.HC,cs.LG
Large Language Models Can Understand Depth from Monocular Images
Monocular depth estimation is a critical function in computer vision applications. This paper shows that large language models (LLMs) can effectively interpret depth with minimal supervision, using efficient resource utilization and a consistent neural network architecture. We introduce LLM-MDE, a multimodal framework that deciphers depth through language comprehension. Specifically, LLM-MDE employs two main strategies to enhance the pretrained LLM's capability for depth estimation: cross-modal reprogramming and an adaptive prompt estimation module. These strategies align vision representations with text prototypes and automatically generate prompts based on monocular images, respectively. Comprehensive experiments on real-world MDE datasets confirm the effectiveness and superiority of LLM-MDE, which excels in few-/zero-shot tasks while minimizing resource use. The source code is available.
Updated: 2024-09-02 10:11:52
Fields: cs.CV,cs.AI
Physics simulation capabilities of LLMs
[Abridged abstract] Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding. Combining these two capabilities could one day enable AI systems to simulate and predict the physical world. We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems. We condition LLM generation on the use of well-documented and widely-used packages to elicit coding capabilities in the physics and astrophysics domains. We contribute $\sim 50$ original and challenging problems in celestial mechanics (with REBOUND), stellar physics (with MESA), 1D fluid dynamics (with Dedalus) and non-linear dynamics (with SciPy). Since our problems do not admit unique solutions, we evaluate LLM performance on several soft metrics: counts of lines that contain different types of errors (coding, physics, necessity and sufficiency) as well as a more "educational" pass-fail metric focused on capturing the salient physical ingredients of the problem at hand. As expected, today's SOTA LLM (GPT4) fails most of our problems zero-shot, although about 40\% of the solutions could plausibly get a passing grade. About $70-90 \%$ of the code lines produced are necessary, sufficient and correct (coding and physics). Physics and coding errors are the most common, with some unnecessary or insufficient lines. We observe significant variations across problem class and difficulty. We identify several failure modes of GPT4 in the computational physics domain. Our reconnaissance work provides a snapshot of current computational capabilities in classical physics and points to obvious improvement targets if AI systems are ever to reach a basic level of autonomy in physics simulation capabilities.
Updated: 2024-09-02 10:02:51
Fields: cs.AI,astro-ph.EP,astro-ph.IM,astro-ph.SR,physics.data-an
Two-stage initial-value iterative physics-informed neural networks for simulating solitary waves of nonlinear wave equations
We propose a new two-stage initial-value iterative neural network (IINN) algorithm for solitary wave computations of nonlinear wave equations based on traditional numerical iterative methods and physics-informed neural networks (PINNs). Specifically, the IINN framework consists of two subnetworks, one of which is used to fit a given initial value, and the other incorporates physical information and continues training on the basis of the first subnetwork. Importantly, the IINN method does not require any additional data information including boundary conditions, apart from the given initial value. Corresponding theoretical guarantees are provided to demonstrate the effectiveness of our IINN method. The proposed IINN method is efficiently applied to learn some types of solutions in different nonlinear wave equations, including the one-dimensional (1D) nonlinear Schr\"odinger equations (NLS) equation (with and without potentials), the 1D saturable NLS equation with PT -symmetric optical lattices, the 1D focusing-defocusing coupled NLS equations, the KdV equation, the two-dimensional (2D) NLS equation with potentials, the 2D amended GP equation with a potential, the (2+1)-dimensional KP equation, and the 3D NLS equation with a potential. These applications serve as evidence for the efficacy of our method. Finally, by comparing with the traditional methods, we demonstrate the advantages of the proposed IINN method.
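In spirit, the two stages can be sketched as follows with PyTorch; the real-valued toy network, the optimizer settings, and the user-supplied PDE residual are assumptions for illustration (the NLS-type problems in the paper involve complex fields).

import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def stage1(x0, u0, steps=2000):
    # Stage 1: fit the given initial value u(x, 0) = u0(x); no physics yet.
    t0 = torch.zeros_like(x0)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(torch.cat([x0, t0], dim=1)) - u0) ** 2).mean()
        loss.backward()
        opt.step()

def stage2(x, t, pde_residual, steps=5000):
    # Stage 2: keep the stage-1 weights and continue with a PINN loss, where
    # pde_residual computes the equation residual via autograd on (u, xt).
    for _ in range(steps):
        opt.zero_grad()
        xt = torch.cat([x, t], dim=1).requires_grad_(True)
        loss = (pde_residual(net(xt), xt) ** 2).mean()
        loss.backward()
        opt.step()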
Updated: 2024-09-02 10:00:02
Fields: physics.comp-ph,cs.AI,cs.LG,math-ph,math.MP,nlin.PS,nlin.SI
Fast Robust Kernel Regression through Sign Gradient Descent with Early Stopping
Kernel ridge regression, KRR, is a generalization of linear ridge regression that is non-linear in the data but linear in the model parameters. Here, we introduce an equivalent formulation of the objective function of KRR, which opens the door both to replacing the ridge penalty with the $\ell_\infty$ and $\ell_1$ penalties and to studying kernel ridge regression from the perspective of gradient descent. Using the $\ell_\infty$ and $\ell_1$ penalties, we obtain robust and sparse kernel regression, respectively. We further study the similarities between explicitly regularized kernel regression and the solutions obtained by early stopping of iterative gradient-based methods: we connect $\ell_\infty$ regularization to sign gradient descent, $\ell_1$ regularization to forward stagewise regression (also known as coordinate descent), and $\ell_2$ regularization to gradient descent, and, in the last case, we theoretically bound the differences. We exploit the close relations between $\ell_\infty$ regularization and sign gradient descent, and between $\ell_1$ regularization and coordinate descent, to propose computationally efficient methods for robust and sparse kernel regression. Finally, we compare robust kernel regression through sign gradient descent to existing methods for robust kernel regression on five real data sets, demonstrating that our method is one to two orders of magnitude faster without compromising accuracy.
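A hedged sketch of the sign-gradient-descent-with-early-stopping recipe on the unpenalized kernel objective $\tfrac{1}{2}\lVert y - K\alpha \rVert^2$ is given below; the paper's exact equivalent formulation is not stated in the abstract, so this shows only the generic connection between sign updates and early stopping.

import numpy as np

def sign_gd_kernel_regression(K, y, lr=1e-3, max_iters=10_000, val_mask=None):
    # K: [n, n] kernel matrix, y: [n] targets; minimizes 0.5 * ||y - K a||^2
    # with sign updates, keeping the best iterate on a held-out mask.
    a = np.zeros(len(y))
    best_a, best_err = a.copy(), np.inf
    for _ in range(max_iters):
        grad = -K @ (y - K @ a)   # gradient of the squared-error objective
        a -= lr * np.sign(grad)   # sign update: equal-sized step per coordinate
        if val_mask is not None:
            err = np.mean((y[val_mask] - (K @ a)[val_mask]) ** 2)
            if err < best_err:
                best_err, best_a = err, a.copy()
    return best_a if val_mask is not None else a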
Updated: 2024-09-02 09:54:53
Fields: stat.ML,cs.LG,math.OC,stat.ME
Simplifying the Theory on Over-Smoothing
Graph convolutions have gained popularity due to their ability to efficiently operate on data with an irregular geometric structure. However, graph convolutions cause over-smoothing, which refers to representations becoming more similar with increased depth. Yet many different definitions and intuitions currently coexist, leading to research efforts that focus on incompatible directions. This paper attempts to align these directions by showing that over-smoothing is merely a special case of power iteration. This greatly simplifies the existing theory on over-smoothing, making it more accessible. Based on the theory, we provide a novel comprehensive definition of rank collapse as a generalized form of over-smoothing and introduce the rank-one distance as a corresponding metric. Our empirical evaluation of 14 commonly used methods shows that more models than were previously known suffer from this issue.
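The power-iteration view can be stated compactly. Treating a deep graph convolution (nonlinearities aside) as repeated multiplication by a normalized adjacency $\hat{A}$ with a unique dominant eigenpair $(\lambda_1, v_1)$,

\[
X^{(k)} = \hat{A}^{k} X^{(0)} \approx \lambda_1^{k}\, v_1 \big(v_1^{\top} X^{(0)}\big) \quad \text{for large } k,
\]

so the representations converge, up to scale, to a rank-one matrix, which is exactly over-smoothing. A plausible reading of the rank-one distance, though the paper's exact metric may differ, is then $d_1(X) = \min_{u, w} \lVert X - u w^{\top} \rVert_F$, the Frobenius distance to the nearest rank-one matrix.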
Updated: 2024-09-02 09:49:49
Fields: cs.LG
On Learning Action Costs from Input Plans
Most of the work on learning action models focuses on learning the actions' dynamics from input plans. This allows us to specify the valid plans of a planning task. However, very little work focuses on learning action costs, which in turn would allow us to rank the different plans. In this paper we introduce a new problem: learning the costs of a set of actions such that a set of input plans are optimal under the resulting planning model. To solve this problem we present $LACFIP^k$, an algorithm to learn actions' costs from unlabeled input plans. We provide theoretical and empirical results showing how $LACFIP^k$ can successfully solve this task.
Updated: 2024-09-02 09:48:43
Fields: cs.AI
ACE, a generic constraint solver
Constraint Programming (CP) is a useful technology for modeling and solving combinatorial constrained problems. On the one hand, one can use a library like PyCSP3 for easily modeling problems arising in various application fields (e.g., scheduling, planning, data mining, cryptography, bio-informatics, organic chemistry, etc.). Problem instances can then be directly generated from specific models and data. On the other hand, for solving instances (notably, those represented in XCSP3 format), one can use a constraint solver like ACE, which is presented in this paper. ACE is an open-source constraint solver, developed in Java, which focuses on integer variables (including 0/1 Boolean variables), state-of-the-art table constraints, popular global constraints, search heuristics, and (mono-criterion) optimization.
Updated: 2024-09-02 09:48:04
Fields: cs.AI
Time series classification with random convolution kernels based transforms: pooling operators and input representations matter
This article presents SelF-Rocket, a new approach based on MiniRocket for fast time series classification (TSC). Unlike existing approaches based on random convolution kernels, it dynamically selects the best pair of input representation and pooling operator during the training process. SelF-Rocket achieves state-of-the-art accuracy on the University of California Riverside (UCR) TSC benchmark datasets.
Updated: 2024-09-02 09:42:17
Fields: cs.LG
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
Out-of-Distribution (OOD) detection in computer vision is a crucial research area, with related benchmarks playing a vital role in assessing the generalizability of models and their applicability in real-world scenarios. However, existing OOD benchmarks in the literature suffer from two main limitations: (1) they often overlook semantic shift as a potential challenge, and (2) their scale is limited compared to the large datasets used to train modern models. To address these gaps, we introduce SOOD-ImageNet, a novel dataset comprising around 1.6M images across 56 classes, designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions, with a particular focus on the issue of semantic shift. We ensured the necessary scalability and quality by developing an innovative data engine that leverages the capabilities of modern vision-language models, complemented by accurate human checks. Through extensive training and evaluation of various models on SOOD-ImageNet, we showcase its potential to significantly advance OOD research in computer vision. The project page is available at https://github.com/bach05/SOODImageNet.git.
Updated: 2024-09-02 09:37:39
Fields: cs.CV,cs.LG
Poster: Developing an O-RAN Security Test Lab
Open Radio Access Networks (ORAN) is a new architectural approach, proposed only a few years ago as an expansion of the current Next Generation Radio Access Networks (NG-RAN) of 5G. ORAN aims to break the closed RAN market that is controlled by a handful of vendors by implementing open interfaces between the different Radio Access Network (RAN) components, and by introducing modern technologies to the RAN, such as machine learning, virtualization, and disaggregation. However, the architectural design of ORAN has recently raised concerns and debates about its security, which is considered one of its major drawbacks. Several theoretical risk analyses related to ORAN have been conducted, but to the best of our knowledge, not a single practical one has been performed yet. In this poster, we discuss and propose a way for a minimal, future-proof deployment of an ORAN 5G network that is able to accommodate various hands-on security analyses of its different elements.
Updated: 2024-09-02 09:36:38
Fields: cs.CR,cs.NI
AI Olympics challenge with Evolutionary Soft Actor Critic
In the following report, we describe the solution we propose for the AI Olympics competition held at IROS 2024. Our solution is based on a model-free deep reinforcement learning approach combined with an evolutionary strategy. We briefly describe the algorithms that have been used and then provide the details of our approach.
Updated: 2024-09-02 09:34:18
Fields: cs.RO,cs.AI,cs.LG,cs.NE
DS MYOLO: A Reliable Object Detector Based on SSMs for Driving Scenarios
Accurate real-time object detection enhances the safety of advanced driver-assistance systems, making it an essential component in driving scenarios. With the rapid development of deep learning technology, CNN-based YOLO real-time object detectors have gained significant attention. However, the local focus of CNNs results in performance bottlenecks. To further enhance detector performance, researchers have introduced Transformer-based self-attention mechanisms to leverage global receptive fields, but their quadratic complexity incurs substantial computational costs. Recently, Mamba, with its linear complexity, has made significant progress through global selective scanning. Inspired by Mamba's outstanding performance, we propose a novel object detector: DS MYOLO. This detector captures global feature information through a simplified selective scanning fusion block (SimVSS Block) and effectively integrates the network's deep features. Additionally, we introduce an efficient channel attention convolution (ECAConv) that enhances cross-channel feature interaction while maintaining low computational complexity. Extensive experiments on the CCTSDB 2021 and VLD-45 driving scenarios datasets demonstrate that DS MYOLO exhibits significant potential and competitive advantage among similarly scaled YOLO series real-time object detectors.
Updated: 2024-09-02 09:22:33
Fields: cs.CV,cs.AI
Two-Timescale Synchronization and Migration for Digital Twin Networks: A Multi-Agent Deep Reinforcement Learning Approach
Digital twins (DTs) have emerged as a promising enabler for representing the real-time states of physical worlds and realizing self-sustaining systems. In practice, DTs of physical devices, such as mobile users (MUs), are commonly deployed in multi-access edge computing (MEC) networks for the sake of reducing latency. To ensure the accuracy and fidelity of DTs, it is essential for MUs to regularly synchronize their status with their DTs. However, MU mobility introduces significant challenges to DT synchronization. Firstly, MU mobility triggers DT migration which could cause synchronization failures. Secondly, MUs require frequent synchronization with their DTs to ensure DT fidelity. Nonetheless, DT migration among MEC servers, caused by MU mobility, may occur infrequently. Accordingly, we propose a two-timescale DT synchronization and migration framework with reliability consideration by establishing a non-convex stochastic problem to minimize the long-term average energy consumption of MUs. We use Lyapunov theory to convert the reliability constraints and reformulate the new problem as a partially observable Markov decision-making process (POMDP). Furthermore, we develop a heterogeneous agent proximal policy optimization with Beta distribution (Beta-HAPPO) method to solve it. Numerical results show that our proposed Beta-HAPPO method achieves significant improvements in energy savings when compared with other benchmarks.
Updated: 2024-09-02 09:20:46
标题: 数字孪生网络的双时间尺度同步和迁移:一种多智能体深度强化学习方法
摘要: 数字孪生体(DTs)已经成为一种有前景的工具,用于表示实时物理世界的状态并实现自持续系统。在实践中,物理设备的DTs,如移动用户(MUs),通常部署在多接入边缘计算(MEC)网络中,以减少延迟。为了确保DTs的准确性和忠实度,MUs需要定期将他们的状态与其DTs同步。然而,MU的移动性给DT同步带来了重大挑战。首先,MU的移动性会触发DT迁移,可能导致同步失败。其次,MUs需要频繁与其DTs同步,以确保DT的忠实度。然而,由于MU的移动性导致的MEC服务器之间的DT迁移可能并不经常发生。因此,我们提出了一个具有可靠性考虑的两时间尺度DT同步和迁移框架,通过建立一个非凸随机问题来最小化MUs的长期平均能耗。我们使用Lyapunov理论将可靠性约束转化,并重新构造新问题,将其形式化为部分可观察马尔可夫决策过程(POMDP)。此外,我们开发了一种具有Beta分布的异质代理近端策略优化(Beta-HAPPO)方法来解决这个问题。数值结果显示,与其他基准相比,我们提出的Beta-HAPPO方法在节能方面取得了显著的改进。
更新时间: 2024-09-02 09:20:46
领域: cs.ET,cs.AI,cs.NI,C.2.3; C.2.4
CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads
The relentless expansion of deep learning applications in recent years has prompted a pivotal shift toward on-device execution, driven by the urgent need for real-time processing, heightened privacy concerns, and reduced latency across diverse domains. This article addresses the challenges inherent in optimising the execution of deep neural networks (DNNs) on mobile devices, with a focus on device heterogeneity, multi-DNN execution, and dynamic runtime adaptation. We introduce CARIn, a novel framework designed for the optimised deployment of both single- and multi-DNN applications under user-defined service-level objectives. Leveraging an expressive multi-objective optimisation framework and a runtime-aware sorting and search algorithm (RASS) as the MOO solver, CARIn facilitates efficient adaptation to dynamic conditions while addressing resource contention issues associated with multi-DNN execution. Notably, RASS generates a set of configurations, anticipating subsequent runtime adaptation, ensuring rapid, low-overhead adjustments in response to environmental fluctuations. Extensive evaluation across diverse tasks, including text classification, scene recognition, and face analysis, showcases the versatility of CARIn across various model architectures, such as Convolutional Neural Networks and Transformers, and realistic use cases. We observe a substantial enhancement in the fair treatment of the problem's objectives, reaching 1.92x when compared to single-model designs and up to 10.69x in contrast to the state-of-the-art OODIn framework. Additionally, we achieve a significant gain of up to 4.06x over hardware-unaware designs in multi-DNN applications. Finally, our framework sustains its performance while effectively eliminating the time overhead associated with identifying the optimal design in response to environmental challenges.
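RASS itself is not specified in the abstract, but its core step — keeping a Pareto set of deployment configurations sorted for rapid runtime switching — can be sketched in a few lines (all names and numbers below are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    latency_ms: float   # lower is better
    accuracy: float     # higher is better

def pareto_front(configs):
    """Keep configurations not dominated in (latency, accuracy), sorted
    so a runtime adaptation step can walk toward faster designs."""
    front = [c for c in configs
             if not any(o.latency_ms <= c.latency_ms and o.accuracy >= c.accuracy
                        and (o.latency_ms, o.accuracy) != (c.latency_ms, c.accuracy)
                        for o in configs)]
    return sorted(front, key=lambda c: c.latency_ms)

configs = [Config("cpu-int8", 18, 0.81), Config("gpu-fp16", 9, 0.84),
           Config("npu-int8", 7, 0.79), Config("gpu-fp32", 22, 0.85)]
print([c.name for c in pareto_front(configs)])  # ['npu-int8', 'gpu-fp16', 'gpu-fp32']
```

Precomputing this sorted set is what makes the later runtime adjustment low-overhead: adaptation becomes a lookup rather than a fresh search.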
Updated: 2024-09-02 09:18:11
标题: CARIn:用于单一和多个DNN工作负载的异构设备上的约束感知和响应推断
摘要: 近年来深度学习应用的不断扩展促使向设备执行的关键转变,这是由对实时处理、增强隐私关注和减少延迟的迫切需求推动的。本文探讨了在移动设备上优化深度神经网络(DNNs)执行所固有的挑战,重点关注设备异构性、多DNN执行和动态运行时适应。我们介绍了CARIn,一个专为在用户定义的服务级目标下优化部署单一和多DNN应用的新框架。利用富有表现力的多目标优化框架和作为MOO求解器的运行时感知排序和搜索算法(RASS),CARIn促进对动态条件的有效适应,同时解决与多DNN执行相关的资源争用问题。值得注意的是,RASS生成一组配置,预测后续运行时适应,确保对环境波动做出快速、低开销的调整。对包括文本分类、场景识别和面部分析在内的各种任务进行广泛评估,展示了CARIn在各种模型架构(如卷积神经网络和变压器)和现实用例中的多功能性。我们观察到在对问题目标的公平处理方面有大幅增强,与单一模型设计相比达到1.92倍,与最新的OODIn框架相比高达10.69倍。此外,在多DNN应用中,我们实现了高达4.06倍的显著增益,超过了硬件不可知设计。最后,我们的框架在有效消除响应环境挑战时识别最佳设计的时间开销的同时,保持了其性能。
更新时间: 2024-09-02 09:18:11
领域: cs.LG,cs.DC
Towards Split Learning-based Privacy-Preserving Record Linkage
Split Learning has been recently introduced to facilitate applications where user data privacy is a requirement. However, it has not been thoroughly studied in the context of Privacy-Preserving Record Linkage, a problem in which the same real-world entity should be identified among databases from different dataholders, but without disclosing any additional information. In this paper, we investigate the potential of Split Learning for Privacy-Preserving Record Matching by introducing a novel training method that utilizes Reference Sets, which are publicly available data corpora, and show that it incurs minimal loss in matching quality compared to a traditional centralized SVM-based technique.
Updated: 2024-09-02 09:17:05
标题: 朝向基于分割学习的隐私保护记录链接
摘要: 最近引入了Split Learning来促进需要用户数据隐私的应用。然而,在隐私保护记录链接的背景下,这一方法尚未得到深入研究。在这种问题中,应该在来自不同数据持有者的数据库中识别相同的实体,但不泄露任何额外信息。本文通过引入一种新颖的训练方法,利用公开可用的数据语料库Reference Sets,展示了Split Learning在隐私保护记录匹配方面的潜力,与传统的集中式SVM技术相比,具有最小的匹配影响。
更新时间: 2024-09-02 09:17:05
领域: cs.CR,cs.DB,cs.LG
Pre-Trained Language Models for Keyphrase Prediction: A Review
Keyphrase Prediction (KP) is essential for identifying keyphrases in a document that can summarize its content. Recent Natural Language Processing (NLP) advances have produced more efficient KP models using deep learning techniques. However, the lack of a comprehensive exploration that jointly covers keyphrase extraction and generation with pre-trained language models marks a critical gap in the literature, which this survey bridges by offering a unified and in-depth analysis that addresses the limitations of previous surveys. This paper extensively examines the topic of pre-trained language models for keyphrase prediction (PLM-KP), which are trained on large text corpora via different learning techniques (supervised, unsupervised, semi-supervised, and self-supervised), to provide insights into the two corresponding NLP tasks: Keyphrase Extraction (KPE) and Keyphrase Generation (KPG). We introduce appropriate taxonomies for PLM-KPE and KPG to highlight these two main tasks of NLP. Moreover, we point out some promising future directions for predicting keyphrases.
Updated: 2024-09-02 09:15:44
标题: 预训练语言模型用于关键词预测:一项综述
摘要: 关键词预测(KP)对于识别文档中能够概括其内容的关键词至关重要。然而,最近自然语言处理(NLP)的进展已经利用深度学习技术开发出更有效的KP模型。综合探索关键词提取和生成结合使用预训练语言模型的局限性突显了文献中的一个重要空白,促使我们的调查论文填补这一不足,并提供统一且深入的分析来解决先前调查中的局限性。本文广泛研究了用于关键词预测(PLM-KP)的预训练语言模型的主题,这些模型通过不同的学习技术(监督、无监督、半监督和自监督)在大型文本语料库上进行训练,以提供对NLP中这两种任务——关键词提取(KPE)和关键词生成(KPG)的洞察。我们为PLM-KPE和KPG引入了适当的分类法,以突出NLP中这两个主要任务。此外,我们指出了一些有前途的未来方向,用于预测关键词。
更新时间: 2024-09-02 09:15:44
领域: cs.CL,cs.AI
DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing
Fashion image editing is a crucial tool for designers to convey their creative ideas by visualizing design concepts interactively. Current fashion image editing techniques, though advanced with multimodal prompts and powerful diffusion models, often struggle to accurately identify editing regions and preserve the desired garment texture detail. To address these challenges, we introduce a new multimodal fashion image editing architecture based on latent diffusion models, called Detail-Preserved Diffusion Models (DPDEdit). DPDEdit guides the fashion image generation of diffusion models by integrating text prompts, region masks, human pose images, and garment texture images. To precisely locate the editing region, we first introduce Grounded-SAM to predict the editing region based on the user's textual description, and then combine it with other conditions to perform local editing. To transfer the detail of the given garment texture into the target fashion image, we propose a texture injection and refinement mechanism. Specifically, this mechanism employs a decoupled cross-attention layer to integrate textual descriptions and texture images, and incorporates an auxiliary U-Net to preserve the high-frequency details of generated garment texture. Additionally, we extend the VITON-HD dataset using a multimodal large language model to generate paired samples with texture images and textual descriptions. Extensive experiments show that our DPDEdit outperforms state-of-the-art methods in terms of image fidelity and coherence with the given multimodal inputs.
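As a rough sketch of the decoupled cross-attention idea (query tokens attend to the text and texture conditions through separate attention branches whose results are combined), under the assumption that both conditions are token sequences of the model width:

```python
import torch
import torch.nn as nn

class DecoupledCrossAttention(nn.Module):
    """Sketch: separate cross-attention branches for text and texture
    tokens, summed into the query stream so neither condition
    overwrites the other."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.tex_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.tex_scale = nn.Parameter(torch.tensor(1.0))  # learnable texture weight

    def forward(self, x, text_tokens, texture_tokens):
        t, _ = self.text_attn(x, text_tokens, text_tokens)
        s, _ = self.tex_attn(x, texture_tokens, texture_tokens)
        return x + t + self.tex_scale * s

layer = DecoupledCrossAttention(64)
out = layer(torch.randn(2, 100, 64), torch.randn(2, 8, 64), torch.randn(2, 16, 64))
```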
Updated: 2024-09-02 09:15:26
标题: DPDEdit:用于多模态时尚图像编辑的保留细节的扩散模型
摘要: 时尚图像编辑是设计师传达其创意观念的关键工具,通过交互式地可视化设计概念。当前的时尚图像编辑技术虽然具有多模态提示和强大的扩散模型,但往往难以准确识别编辑区域并保留所需的服装纹理细节。为了解决这些挑战,我们引入了一种基于潜在扩散模型的新型多模态时尚图像编辑架构,称为Detail-Preserved Diffusion Models(DPDEdit)。DPDEdit通过集成文本提示、区域蒙版、人体姿势图像和服装纹理图像来指导扩散模型的时尚图像生成。为了精确定位编辑区域,我们首先引入了Grounded-SAM,根据用户的文本描述预测编辑区域,然后将其与其他条件结合起来进行局部编辑。为了将给定服装纹理的细节转移到目标时尚图像中,我们提出了一种纹理注入和细化机制。具体地,这种机制采用了一个解耦的交叉注意力层,将文本描述和纹理图像整合在一起,并结合辅助U-Net来保留生成的服装纹理的高频细节。此外,我们使用多模态大型语言模型扩展了VITON-HD数据集,生成了具有纹理图像和文本描述的配对样本。大量实验表明,我们的DPDEdit在图像保真度和与给定多模态输入的一致性方面优于现有方法。
更新时间: 2024-09-02 09:15:26
领域: cs.CV,cs.AI
FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models
Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before inference by fine-tuning only the last Feed-Forward Network (FFN) module. This targeted approach ensures efficient optimization without overfitting, significantly improving the model's ability to comprehend and accurately follow the context. Our experiments demonstrate substantial gains in reading comprehension, text summarization and adherence to output structures. For instance, FastMem improves the accuracy of Llama 3-8B-Inst on the NQ-SWAP dataset from 59.1% to 71.6%, and reduces the output structure failure rate of Qwen 1.5-4B-Chat from 34.9% to 25.5%. Extensive experimental results highlight FastMem's potential to offer a robust solution to enhance the reliability and accuracy of LLMs in various applications. Our code is available at: https://github.com/IAAR-Shanghai/FastMem
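The stated mechanism — briefly maximizing the prompt's likelihood while updating only the last FFN block — can be sketched as follows, assuming a Hugging Face-style causal LM; the module path `model.transformer.layers[-1].mlp` is hypothetical and depends on the actual architecture:

```python
import torch

def fastmem_memorize(model, tokenizer, prompt, steps=5, lr=1e-4):
    """Sketch of the FastMem idea: before answering, take a few
    gradient steps on the prompt's own negative log-likelihood,
    with every parameter frozen except the last feed-forward block."""
    for p in model.parameters():
        p.requires_grad_(False)
    last_ffn = model.transformer.layers[-1].mlp  # hypothetical module path
    for p in last_ffn.parameters():
        p.requires_grad_(True)

    ids = tokenizer(prompt, return_tensors="pt").input_ids
    opt = torch.optim.AdamW(last_ffn.parameters(), lr=lr)
    for _ in range(steps):
        out = model(ids, labels=ids)  # causal-LM loss = prompt NLL
        opt.zero_grad()
        out.loss.backward()
        opt.step()
    return model
```

Restricting updates to one block is what keeps the memorization fast and guards against overfitting the rest of the network.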
Updated: 2024-09-02 09:13:51
标题: FastMem:快速记忆提示改善大型语言模型的上下文意识
摘要: 大型语言模型(LLMs)在生成连贯文本方面表现出色,但它们常常在上下文意识方面遇到困难,导致在需要忠实遵循提供信息的任务中出现不准确性。我们引入了FastMem,这是一种旨在通过快速记忆提示来增强指导微调的LLMs上下文意识的新方法。FastMem通过仅微调最后的前馈网络(FFN)模块,在推理前最大化提示的可能性。这种有针对性的方法确保了高效的优化而不会过度拟合,显著提高了模型理解和准确遵循上下文的能力。我们的实验表明,在阅读理解、文本摘要和遵循输出结构方面取得了实质性的收益。例如,FastMem将NQ-SWAP数据集上Llama 3-8B-Inst的准确率从59.1%提高到71.6%,将Qwen 1.5-4B-Chat的输出结构失败率从34.9%降低到25.5%。广泛的实验结果突显了FastMem在各种应用中提供强大解决方案的潜力。我们的代码可在以下链接找到:https://github.com/IAAR-Shanghai/FastMem
更新时间: 2024-09-02 09:13:51
领域: cs.CL,cs.AI
Affordance-based Robot Manipulation with Flow Matching
We present a framework for assistive robot manipulation, which focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge by employing a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. Then we propose to learn robot trajectories guided by affordances in a supervised Flow Matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our extensive evaluation highlights that the proposed prompt tuning method for learning manipulation affordance with language prompter achieves competitive performance and even outperforms other finetuning protocols across data scales, while satisfying parameter efficiency. Learning multi-task robot trajectories with a single flow matching policy also leads to consistently better performance than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
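The flow-matching objective described here has a standard form: sample a point on the straight path between random and expert waypoints and regress the velocity. A minimal sketch (the `policy(xt, t, obs)` signature is an assumption):

```python
import torch

def flow_matching_loss(policy, obs, expert_traj):
    """Conditional flow matching: learn a velocity field v(x_t, t | obs)
    that transports random waypoints x0 toward expert waypoints x1
    along straight-line paths (a standard formulation; the paper's
    exact parameterization may differ)."""
    x1 = expert_traj                   # (B, T, D) desired waypoints
    x0 = torch.randn_like(x1)          # random initial waypoints
    t = torch.rand(x1.shape[0], 1, 1)  # one time per trajectory
    xt = (1 - t) * x0 + t * x1         # point on the linear path
    target_v = x1 - x0                 # constant velocity target
    pred_v = policy(xt, t, obs)
    return ((pred_v - target_v) ** 2).mean()
```

At inference, integrating the learned velocity field from noise yields a trajectory; because the policy models a full conditional distribution, it handles the multimodal action distributions the abstract highlights.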
Updated: 2024-09-02 09:11:28
标题: 基于功能性的流匹配的机器人操作
摘要: 我们提出了一个辅助机器人操作的框架,重点解决了两个基本挑战:首先,有效地将大规模模型调整到下游场景可供性理解任务中,特别是在日常生活场景中,收集涉及人类的多任务数据需要大量努力;其次,通过基于视觉可供性模型的机器人轨迹学习来有效地学习机器人轨迹。我们通过采用一个参数高效的提示调整方法来解决第一个挑战,该方法将可学习的文本提示前置到冻结的视觉模型中,以预测多任务场景中的操作可供性。然后,我们提出通过受可供性指导的机器人轨迹在监督的流匹配方法中学习。流匹配将机器人视觉动作策略表示为将随机航路点流向期望的机器人轨迹的条件过程。最后,我们介绍了一个横跨日常生活活动的10个任务的真实世界数据集,用于测试我们的框架。我们的广泛评估突出显示,采用语言提示器学习操作可供性的提出方法实现了具有竞争力的性能,甚至在各种数据规模上优于其他微调协议,同时满足了参数效率。采用单一流匹配策略学习多任务机器人轨迹也比替代的行为克隆方法表现一致更好,特别是在给定多模态机器人动作分布的情况下。我们的框架无缝地统一了可供性模型学习和通过流匹配生成机器人操作轨迹。
更新时间: 2024-09-02 09:11:28
领域: cs.RO,cs.AI
Evidential Transformers for Improved Image Retrieval
We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval. In this paper, we make several contributions to content-based image retrieval (CBIR). We incorporate probabilistic methods into image retrieval, achieving robust and reliable results, with evidential classification surpassing traditional training based on multiclass classification as a baseline for deep metric learning. Furthermore, we improve the state-of-the-art retrieval results on several datasets by leveraging the Global Context Vision Transformer (GC ViT) architecture. Our experimental results consistently demonstrate the reliability of our approach, setting a new benchmark in CBIR in all test settings on the Stanford Online Products (SOP) and CUB-200-2011 datasets.
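The abstract does not define its evidential classifier, but the usual Sensoy-style construction (softplus evidence mapped to Dirichlet parameters) is a reasonable reference point; a sketch:

```python
import torch
import torch.nn.functional as F

def evidential_outputs(logits):
    """Map network outputs to Dirichlet parameters (assumed
    Sensoy-style evidential classification)."""
    evidence = F.softplus(logits)       # non-negative evidence per class
    alpha = evidence + 1.0              # Dirichlet concentration
    S = alpha.sum(-1, keepdim=True)
    prob = alpha / S                    # expected class probabilities
    uncertainty = logits.shape[-1] / S  # K / S: high when evidence is low
    return alpha, prob, uncertainty

def evidential_nll(alpha, y_onehot):
    """Type-II maximum-likelihood loss for the Dirichlet."""
    S = alpha.sum(-1, keepdim=True)
    return (y_onehot * (torch.log(S) - torch.log(alpha))).sum(-1).mean()

alpha, prob, u = evidential_outputs(torch.randn(8, 10))
loss = evidential_nll(alpha, F.one_hot(torch.randint(0, 10, (8,)), 10).float())
```

The per-query uncertainty is what makes retrieval "uncertainty-driven": low-evidence embeddings can be down-weighted or flagged.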
Updated: 2024-09-02 09:10:47
标题: 证据变换器用于改进图像检索
摘要: 我们介绍了证据变换器(Evidential Transformer),这是一种基于不确定性驱动的改进和稳健的图像检索变换器模型。在本文中,我们对基于内容的图像检索(CBIR)做出了几项贡献。我们将概率方法融入图像检索中,实现了稳健可靠的结果,证据分类超越了基于多类分类的传统训练,作为深度度量学习的基线。此外,我们通过利用全局上下文视觉变换器(GC ViT)架构,提高了几个数据集上的最新检索结果。我们的实验结果一致表明了我们方法的可靠性,在Stanford Online Products (SOP)和CUB-200-2011数据集的所有测试设置中,都设立了一个新的基准。
更新时间: 2024-09-02 09:10:47
领域: cs.CV,cs.IR,cs.LG
Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization
With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), as an oft-stated approach to saving training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-domain DP methods incompatible. Therefore, we propose a Molecular data Pruning framework for enhanced Generalization (MolPeg), which focuses on the source-free data pruning scenario, where data pruning is applied with pretrained models. By maintaining two models with different updating paces during training, we introduce a novel scoring function to measure the informativeness of samples based on the loss discrepancy. As a plug-and-play framework, MolPeg realizes the perception of both source and target domain and consistently outperforms existing DP methods across four downstream tasks. Remarkably, it can surpass the performance obtained from full-dataset training, even when pruning up to 60-70% of the data on HIV and PCBA dataset. Our work suggests that the discovery of effective data-pruning metrics could provide a viable path to both enhanced efficiency and superior generalization in transfer learning.
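A minimal sketch of the stated scoring idea — two models updated at different paces, with samples scored by their loss discrepancy; the exact score in MolPeg may differ (the slow model is assumed here to be, e.g., an EMA copy of the fast one):

```python
import torch

@torch.no_grad()
def loss_discrepancy_scores(fast_model, slow_model, loader, loss_fn):
    """Score each sample by how much the fast and slow models disagree
    on its loss; `loss_fn` must return per-sample losses
    (e.g. cross_entropy(..., reduction='none'))."""
    scores = []
    for x, y in loader:
        lf = loss_fn(fast_model(x), y)
        ls = loss_fn(slow_model(x), y)
        scores.append((lf - ls).abs())
    return torch.cat(scores)  # keep the top-k samples, prune the rest
```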
Updated: 2024-09-02 09:06:04
标题: 超越效率:分子数据修剪以增强泛化效果
摘要: 随着各种分子任务和海量数据集的出现,如何进行高效训练已成为该领域中一项紧迫但尚未被充分探讨的问题。数据修剪(DP)作为一种常见的节省训练负担的方法,会滤除较不重要的样本以形成一个用于训练的核心集。然而,对于分子任务越来越依赖预训练模型,传统领域内DP方法已不再兼容。因此,我们提出了一个用于增强泛化的分子数据修剪框架(MolPeg),专注于无源数据修剪场景,其中数据修剪是应用于预训练模型的。通过在训练过程中维护两个具有不同更新速度的模型,我们引入了一种基于损失差异度量样本信息量的新型评分函数。作为一个即插即用的框架,MolPeg实现了源领域和目标领域的感知,并在四个下游任务中持续优于现有的DP方法。值得注意的是,即使在HIV和PCBA数据集上修剪了60-70%的数据,它仍然能够超越从完整数据集训练中获得的性能。我们的工作表明,发现有效的数据修剪度量标准可能为转移学习中的增强效率和优越泛化提供一条可行的路径。
更新时间: 2024-09-02 09:06:04
领域: cs.LG,cs.AI,q-bio.BM
Bootstrap SGD: Algorithmic Stability and Robustness
In this paper, methods that use the empirical bootstrap approach for stochastic gradient descent (SGD) to minimize the empirical risk over a separable Hilbert space are investigated from the viewpoint of algorithmic stability and statistical robustness. The first two types of approaches are based on averages and are investigated from a theoretical point of view. A generalization analysis based on algorithmic stability is carried out for bootstrap SGD of Type 1 and Type 2. Another type of bootstrap SGD is proposed to demonstrate that purely distribution-free pointwise confidence intervals of the median curve can be constructed using bootstrap SGD.
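To make the resampling mechanics concrete, here is a toy bootstrap-SGD run on least squares with multinomial sample weights and percentile intervals over the replicates; the paper's third construction targets pointwise intervals for the median curve, which this simplifies to per-coordinate intervals:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_path(X, y, weights, lr=0.01, epochs=50):
    """Plain least-squares SGD where each sample carries a bootstrap weight."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w -= lr * weights[i] * (X[i] @ w - y[i]) * X[i]
    return w

# B bootstrap replicates; percentiles give distribution-free intervals
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=200)
W = np.array([sgd_path(X, y, rng.multinomial(200, [1 / 200] * 200))
              for _ in range(100)])
lo, hi = np.percentile(W, [2.5, 97.5], axis=0)  # per-coordinate 95% band
```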
Updated: 2024-09-02 08:56:39
标题: Bootstrap SGD:算法稳定性和鲁棒性
摘要: 本文从算法稳定性和统计鲁棒性的角度研究了利用经验bootstrap方法来最小化可分离希尔伯特空间上的经验风险的随机梯度下降(SGD)的一些方法。前两种方法基于平均值,并从理论角度进行了研究。基于算法稳定性的bootstrap SGD Type 1和Type 2的泛化分析已经完成。提出了另一种类型的bootstrap SGD,以证明可以使用bootstrap SGD构建纯粹无分布的点值置信区间来估计中位数曲线。
更新时间: 2024-09-02 08:56:39
领域: stat.ML,cs.AI,cs.LG
SCOPE: Sign Language Contextual Processing with Embedding from LLMs
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. To address these challenges, we introduce SCOPE (Sign language Contextual Processing with Embedding from LLMs), a novel context-aware vision-based SLR and SLT framework. For SLR, we utilize dialogue contexts through a multi-modal encoder to enhance gloss-level recognition. For subsequent SLT, we further fine-tune a Large Language Model (LLM) by incorporating prior conversational context. We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T, CSL-Daily, and our SCOPE dataset. Moreover, surveys conducted with participants from the Deaf community further validate the robustness and effectiveness of our approach in real-world applications. Both our dataset and code will be open-sourced to facilitate further research.
Updated: 2024-09-02 08:56:12
标题: SCOPE:基于LLM嵌入的手语语境处理
摘要: 手语是全球约7000万聋人使用的一种视觉语言,传达视觉和语境信息。目前基于视觉的手语识别(SLR)和翻译(SLT)方法在对话场景中存在困难,原因是数据集多样性有限,忽略了语境相关信息。为了解决这些挑战,我们引入了SCOPE(带有来自LLMs的嵌入的手语语境处理)这一新颖的基于视觉的SLR和SLT框架。对于SLR,我们利用多模态编码器通过对话语境来增强词汇水平的识别。对于随后的SLT,我们进一步通过整合先前的对话语境来微调大型语言模型(LLM)。我们还贡献了一个新的手语数据集,其中包含72小时的中文手语视频,在不同场景下进行语境对话。实验结果表明,我们的SCOPE框架在多个数据集上达到了最先进的性能,包括Phoenix-2014T、CSL-Daily和我们的SCOPE数据集。此外,我们与聋人社区的参与者进行的调查进一步验证了我们的方法在实际应用中的鲁棒性和有效性。我们的数据集和代码都将开源,以促进进一步研究。
更新时间: 2024-09-02 08:56:12
领域: cs.CV,cs.AI,cs.CL
Exploring and Learning Structure: Active Inference Approach in Navigational Agents
Drawing inspiration from animal navigation strategies, we introduce a novel computational model for navigation and mapping, rooted in biologically inspired principles. Animals exhibit remarkable navigation abilities by efficiently using memory, imagination, and strategic decision-making to navigate complex and aliased environments. Building on these insights, we integrate traditional cognitive mapping approaches with an Active Inference Framework (AIF) to learn an environment structure in a few steps. Through the incorporation of topological mapping for long-term memory and AIF for navigation planning and structure learning, our model can dynamically apprehend environmental structures and expand its internal map with predicted beliefs during exploration. Comparative experiments with the Clone-Structured Graph (CSCG) model highlight our model's ability to rapidly learn environmental structures in a single episode, with minimal navigation overlap. This is achieved without prior knowledge of the dimensions of the environment or the type of observations, showcasing its robustness and effectiveness in navigating ambiguous environments.
Updated: 2024-09-02 08:48:12
标题: 探索和学习结构:导航代理中的主动推理方法
摘要: 灵感来源于动物导航策略,我们引入了一种根植于生物启发原则的导航和地图计算模型。动物通过有效地利用记忆、想象力和战略决策来展示出卓越的导航能力,以在复杂和模糊的环境中导航。基于这些见解,我们将传统的认知地图方法与主动推断框架(AIF)相结合,以在几个步骤中学习环境结构。通过将拓扑地图用于长期记忆和AIF用于导航规划和结构学习的整合,我们的模型能够动态地理解环境结构,并在探索过程中通过预测信念扩展其内部地图。与克隆结构化图(CSCG)模型的比较实验突显了我们的模型在单次事件中迅速学习环境结构的能力,且导航重叠最小。这一成就是在没有先验环境尺寸或观测类型知识的情况下实现的,展示了其在导航模糊环境中的鲁棒性和有效性。
更新时间: 2024-09-02 08:48:12
领域: cs.AI,cs.NE,cs.RO
Learning in Hybrid Active Inference Models
An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS) which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space allowing us to exploit information-theoretic exploration bonuses and (3) `cache' the approximate solutions to low-level problems in the discrete planner. We apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and successful planning through the delineation of abstract sub-goals.
Updated: 2024-09-02 08:41:45
标题: 混合主动推理模型中的学习
摘要: 人工智能中的一个开放性问题是系统如何灵活地学习对解决本质上连续问题有用的离散抽象。在计算神经科学的先前工作中,已经考虑了在主动推理的形式主义下在决策过程中离散和连续变量的功能集成。然而,他们的重点在于分类决策的表达物理实现,且假定层次混合生成模型是已知的。因此,不清楚这个框架如何扩展到学习。因此,我们提出了一种新颖的层次混合主动推理代理,其中高级离散主动推理规划者位于低级连续主动推理控制器之上。我们利用最近在循环切换线性动态系统(rSLDS)中的工作,通过对复杂连续动态的分段线性分解来实现有意义的离散表示的端到端学习。rSLDS学习的表示形式指导了混合决策代理的结构,使我们能够(1)指定类似于选项框架的时间抽象子目标,(2)将探索提升到离散空间,从而利用信息论探索奖励,以及(3)在离散规划者中缓存低级问题的近似解决方案。我们将我们的模型应用于稀疏连续山车任务,通过增强的探索实现快速系统识别,并通过划分抽象子目标成功规划。
更新时间: 2024-09-02 08:41:45
领域: cs.AI,cs.SY,eess.SY
Defending against Model Inversion Attacks via Random Erasing
Model Inversion (MI) is a type of privacy violation that focuses on reconstructing private training data through abusive exploitation of machine learning models. To defend against MI attacks, state-of-the-art (SOTA) MI defense methods rely on regularizations that conflict with the training loss, creating explicit tension between privacy protection and model utility. In this paper, we present a new method to defend against MI attacks. Our method takes a new perspective and focuses on training data. Our idea is based on a novel insight on Random Erasing (RE), which has been applied in the past as a data augmentation technique to improve the model accuracy under occlusion. In our work, we instead focus on applying RE for degrading MI attack accuracy. Our key insight is that MI attacks require significant amount of private training data information encoded inside the model in order to reconstruct high-dimensional private images. Therefore, we propose to apply RE to reduce private information presented to the model during training. We show that this can lead to substantial degradation in MI reconstruction quality and attack accuracy. Meanwhile, natural accuracy of the model is only moderately affected. Our method is very simple to implement and complementary to existing defense methods. Our extensive experiments of 23 setups demonstrate that our method can achieve SOTA performance in balancing privacy and utility of the models. The results consistently demonstrate the superiority of our method over existing defenses across different MI attacks, network architectures, and attack configurations.
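Random Erasing itself is a simple transform; a minimal version (following the usual Zhong et al. recipe, with aspect-ratio sampling omitted) looks like this — torchvision's `transforms.RandomErasing` provides a full implementation:

```python
import torch

def random_erase(img, p=0.5, area=(0.02, 0.2)):
    """Blank a random rectangle of a (C, H, W) image with noise, so
    less private content is exposed to the model per training image."""
    if torch.rand(()) > p:
        return img
    _, h, w = img.shape
    frac = torch.empty(()).uniform_(*area).item()  # fraction of image area
    eh = max(1, int((frac * h * w) ** 0.5))
    ew = max(1, int(frac * h * w / eh))
    eh, ew = min(eh, h), min(ew, w)
    top = torch.randint(0, h - eh + 1, ()).item()
    left = torch.randint(0, w - ew + 1, ()).item()
    out = img.clone()
    out[:, top:top + eh, left:left + ew] = torch.rand(img.shape[0], eh, ew)
    return out
```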
Updated: 2024-09-02 08:37:17
标题: 通过随机擦除来防御模型反演攻击
摘要: 模型反演(MI)是一种侵犯隐私的类型,专注于通过滥用机器学习模型来重建私人训练数据。为了抵御MI攻击,最先进的MI防御方法依赖于与训练损失相冲突的正则化,从而在隐私保护和模型效用之间创建显性紧张关系。 在本文中,我们提出了一种新的方法来抵御MI攻击。我们的方法采取了一个新的视角,重点放在训练数据上。我们的想法基于对Random Erasing(RE)的新颖见解,过去已将其应用作为一种数据增强技术,以提高模型在遮挡下的准确性。在我们的工作中,我们转而专注于将RE应用于降低MI攻击的准确性。我们的关键见解是,MI攻击需要在模型中编码大量私人训练数据信息,以重建高维私人图像。因此,我们建议在训练过程中应用RE,以减少向模型呈现的私人信息。我们展示了这可能会导致MI重建质量和攻击准确性的显著降低。与此同时,模型的自然准确率只受到适度影响。 我们的方法非常简单易行,且与现有防御方法互补。我们进行了23个设置的广泛实验,结果表明我们的方法可以在平衡模型的隐私和效用方面取得最先进的性能。结果一致表明,我们的方法在不同的MI攻击、网络架构和攻击配置下均优于现有的防御措施。
更新时间: 2024-09-02 08:37:17
领域: cs.LG,cs.CR,cs.CV
No Peer, no Cry: Network Application Fuzzing via Fault Injection
Network-facing applications are commonly exposed to all kinds of attacks, especially when connected to the internet. As a result, web servers like Nginx or client applications such as curl make every effort to secure and harden their code to rule out memory safety violations. One would expect this to include regular fuzz testing, as fuzzing has proven to be one of the most successful approaches to uncovering bugs in software. Yet, surprisingly little research has focused on fuzzing network applications. When studying the underlying reasons, we find that the interactive nature of communication, its statefulness, and the protection of exchanged messages render typical fuzzers ineffective. Attempts to replay recorded messages or modify them on the fly only work for specific targets and often lead to early termination of communication. In this paper, we discuss these challenges in detail, highlighting how the focus of existing work on protocol state space promises little relief. We propose a fundamentally different approach that relies on fault injection rather than modifying messages. Effectively, we force one of the communication peers into a weird state where its output no longer matches the expectations of the target peer, potentially uncovering bugs. Importantly, this weird peer can still properly encrypt/sign the protocol message, overcoming a fundamental challenge of current fuzzers. In effect, we leave the communication system intact but introduce small corruptions. Since we can turn either the server or the client into the weird peer, our approach is the first that can effectively test client-side network applications. Evaluating 16 targets, we show that Fuzztruction-Net outperforms other fuzzers in terms of coverage and bugs found. Overall, Fuzztruction-Net uncovered 23 new bugs in well-tested software, such as the web servers Nginx and Apache HTTPd and the OpenSSH client.
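The key trick — corrupt the plaintext inside one peer before the unmodified crypto layer signs it, so the target accepts a structurally valid but semantically weird message — can be illustrated schematically (`sign` stands in for the peer's real signing routine; this is not Fuzztruction-Net's actual injection code):

```python
import random

def weird_peer_send(sock, payload: bytes, sign, flip_prob=0.01):
    """Flip random bits in the plaintext, then sign/encrypt with the
    peer's untouched crypto layer: the target verifies the message
    successfully and parses unexpected content."""
    corrupted = bytearray(payload)
    for i in range(len(corrupted)):
        if random.random() < flip_prob:
            corrupted[i] ^= 1 << random.randrange(8)  # single bit flip
    sock.sendall(sign(bytes(corrupted)))  # valid signature, weird content
```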
Updated: 2024-09-02 08:35:55
标题: 无同行,无哭泣:通过故障注入进行网络应用模糊测试
摘要: 网络应用程序通常暴露于各种攻击,特别是在连接到互联网时。因此,像Nginx这样的Web服务器或像curl这样的客户端应用程序会尽一切努力保护和加固其代码,以排除内存安全问题。人们期望这包括定期的模糊测试,因为模糊测试已被证明是发现软件中bug的最成功方法之一。然而,令人惊讶的是,很少有研究关注模糊化网络应用程序。在研究底层原因时,我们发现通信的互动性质、其状态性和交换消息的保护使得典型的模糊器无效。尝试重播记录的消息或实时修改它们只适用于特定目标,通常会导致通信的早期终止。在本文中,我们详细讨论了这些挑战,强调现有工作对协议状态空间的关注带来的希望有限。我们提出了一个基本不同的方法,依赖于故障注入而不是修改消息。有效地,我们让通信的一方处于异常状态,使其输出不再符合目标方的期望,从而可能发现bug。重要的是,这个异常的通信方仍然可以正确加密/签署协议消息,克服了当前模糊器的一个基本挑战。实际上,我们保持通信系统完整,但引入小的破坏。由于我们可以将服务器或客户端转变为异常的通信方,我们的方法是第一个可以有效测试客户端网络应用程序的方法。通过评估16个目标,我们展示了Fuzztruction-Net在覆盖范围和发现bug方面优于其他模糊器。总体而言,Fuzztruction-Net在经过充分测试的软件中发现了23个新的bug,例如Web服务器Nginx和Apache HTTPd以及OpenSSH客户端。
更新时间: 2024-09-02 08:35:55
领域: cs.CR
FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models
The substantial computational costs of diffusion models, especially due to the repeated denoising steps necessary for high-quality image generation, present a major obstacle to their widespread adoption. While several studies have attempted to address this issue by reducing the number of score function evaluations (NFE) using advanced ODE solvers without fine-tuning, the decreased number of denoising iterations misses the opportunity to update fine details, resulting in noticeable quality degradation. In our work, we introduce an advanced acceleration technique that leverages the temporal redundancy inherent in diffusion models. Reusing feature maps with high temporal similarity opens up a new opportunity to save computation resources without compromising output quality. To realize the practical benefits of this intuition, we conduct an extensive analysis and propose a novel method, FRDiff. FRDiff is designed to harness the advantages of both reduced NFE and feature reuse, achieving a Pareto frontier that balances fidelity and latency trade-offs in various generative tasks.
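A bare-bones sketch of the feature-reuse idea: wrap an expensive diffusion block so it recomputes only every few denoising steps and serves a cached output in between (FRDiff's actual similarity-based reuse criterion is not reproduced here):

```python
import torch

class FeatureReuseBlock(torch.nn.Module):
    """Recompute the wrapped block only every `refresh` steps,
    exploiting the high similarity of features at nearby timesteps.
    Reset `cache` and `step` before each new sampling run."""
    def __init__(self, block, refresh=3):
        super().__init__()
        self.block, self.refresh = block, refresh
        self.cache, self.step = None, 0

    def forward(self, x, t_emb):
        if self.cache is None or self.step % self.refresh == 0:
            self.cache = self.block(x, t_emb)  # full computation
        self.step += 1
        return self.cache
```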
Updated: 2024-09-02 08:30:37
标题: FRDiff:特征重用用于扩散模型的通用免训练加速
摘要: 扩散模型的大量计算成本,特别是由于为了生成高质量图像而必需的重复去噪步骤,成为它们广泛采用的主要障碍。虽然有几项研究尝试通过使用先进的ODE求解器来减少评分函数评估(NFE)的数量,而无需进行微调来解决这个问题,但减少去噪迭代次数会错过更新精细细节的机会,导致明显的质量下降。在我们的工作中,我们引入了一种利用扩散模型固有的时间冗余的先进加速技术。重复使用具有高时间相似性的特征图为节省计算资源提供了新机会,而不会损害输出质量。为了实现这种直觉的实际好处,我们进行了广泛分析并提出了一种新方法FRDiff。FRDiff旨在利用减少的NFE和特征重用的优势,实现在各种生成任务中平衡保真度和延迟权衡的帕累托前沿。
更新时间: 2024-09-02 08:30:37
领域: cs.CV,cs.AI
BadMerging: Backdoor Attacks Against Model Merging
Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these independently fine-tuned models. MM directly combines multiple fine-tuned task-specific models into a merged model without additional training, and the resulting model shows enhanced capabilities in multiple tasks. Although MM provides great utility, it may come with security risks because an adversary can exploit MM to affect multiple downstream tasks. However, the security risks of MM have barely been studied. In this paper, we first find that MM, as a new learning paradigm, introduces unique challenges for existing backdoor attacks due to the merging process. To address these challenges, we introduce BadMerging, the first backdoor attack specifically designed for MM. Notably, BadMerging allows an adversary to compromise the entire merged model by contributing as few as one backdoored task-specific model. BadMerging comprises a two-stage attack mechanism and a novel feature-interpolation-based loss to enhance the robustness of embedded backdoors against the changes of different merging parameters. Considering that a merged model may incorporate tasks from different domains, BadMerging can jointly compromise the tasks provided by the adversary (on-task attack) and other contributors (off-task attack) and solve the corresponding unique challenges with novel attack designs. Extensive experiments show that BadMerging achieves remarkable attacks against various MM algorithms. Our ablation study demonstrates that the proposed attack designs can progressively contribute to the attack performance. Finally, we show that prior defense mechanisms fail to defend against our attacks, highlighting the need for more advanced defense.
Updated: 2024-09-02 08:28:44
标题: BadMerging: 模型合并的后门攻击
摘要: 对预训练模型进行微调以用于下游任务已导致开源任务特定模型的激增。最近,模型合并(MM)已被证明是促进这些独立微调模型之间知识转移的有效方法。MM直接将多个微调的任务特定模型合并为一个合并模型,无需额外训练,而生成的模型在多个任务中显示出增强的能力。尽管MM提供了很大的实用性,但可能存在安全风险,因为对手可以利用MM来影响多个下游任务。然而,对MM的安全风险几乎没有进行研究。在本文中,我们首先发现,作为一种新的学习范式,MM由于合并过程而给现有后门攻击带来了独特挑战。为了解决这些挑战,我们引入了BadMerging,这是专门为MM设计的第一个后门攻击。值得注意的是,BadMerging允许对手仅贡献一个带有后门的任务特定模型即可破坏整个合并模型。BadMerging包括一个两阶段攻击机制和一种新颖的基于特征插值的损失函数,以增强嵌入后门对不同合并参数变化的鲁棒性。考虑到合并模型可能包含来自不同领域的任务,BadMerging可以同时破坏对手提供的任务(on-task攻击)和其他贡献者提供的任务(off-task攻击),并通过新颖的攻击设计解决相应的独特挑战。大量实验表明,BadMerging对各种MM算法实现了显著的攻击。我们的消融研究表明,提出的攻击设计可以逐步提升攻击性能。最后,我们展示先前的防御机制无法抵御我们的攻击,突显了对更先进防御的需求。
更新时间: 2024-09-02 08:28:44
领域: cs.CR,cs.LG
A Perspective on Literary Metaphor in the Context of Generative AI
At the intersection of creative text generation and literary theory, this study explores the role of literary metaphor and its capacity to generate a range of meanings. In this regard, literary metaphor is vital to the development of any particular language. To investigate whether the inclusion of original figurative language improves textual quality, we trained an LSTM-based language model in Afrikaans. The network produces phrases containing compellingly novel figures of speech. Specifically, the emphasis falls on how AI might be utilised as a defamiliarisation technique, which disrupts expected uses of language to augment poetic expression. Providing a literary perspective on text generation, the paper raises thought-provoking questions on aesthetic value, interpretation and evaluation.
Updated: 2024-09-02 08:27:29
标题: 一种关于生成式人工智能背景下文学隐喻的视角
摘要: 在创意文本生成和文学理论的交叉点上,这项研究探讨了文学隐喻的作用以及其生成各种含义的能力。在这方面,文学隐喻对任何特定语言的发展至关重要。为了调查原创比喻语言是否会提高文本质量,我们在南非荷兰语中训练了基于LSTM的语言模型。该网络生成包含引人入胜的新颖修辞的短语。具体而言,重点在于AI如何作为一种陌生化技术的利用,该技术打破了语言的预期用法以增强诗意表达。通过提供文学视角对文本生成进行探讨,本文引发了有关审美价值、解释和评价的发人深省的问题。
更新时间: 2024-09-02 08:27:29
领域: cs.CL,cs.AI
Accelerated Multi-objective Task Learning using Modified Q-learning Algorithm
Robots find extensive applications in industry. In recent years, the influence of robots has also increased rapidly in domestic scenarios. The Q-learning algorithm aims to maximise the reward for reaching the goal. This paper proposes a modified version of the Q-learning algorithm, known as Q-learning with scaled distance metric (Q-SD). This algorithm enhances task learning and makes task completion more meaningful. A robotic manipulator (agent) applies the Q-SD algorithm to the task of table cleaning. Using Q-SD, the agent acquires the sequence of steps necessary to accomplish the task while minimising the manipulator's movement distance. We partition the table into grids of two sizes: the first has a grid count of 3x3, and the second 4x4. Using the Q-SD algorithm, the maximum success rates obtained in these two environments were 86% and 59% respectively. Moreover, compared to the conventional Q-learning algorithm, the average distance moved by the agent in these two environments decreased by 8.61% and 6.7% respectively when using Q-SD.
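The abstract does not give the exact distance scaling, so the sketch below assumes a linear movement-distance penalty inside an otherwise standard tabular Q-learning update:

```python
import numpy as np

def q_sd_update(Q, s, a, r, s_next, dist, alpha=0.1, gamma=0.95, lam=0.5):
    """One tabular update where the reward is penalized by the
    manipulator's movement distance, so shorter paths score higher."""
    shaped_r = r - lam * dist  # assumed linear distance penalty
    td_target = shaped_r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

Q = np.zeros((9, 4))  # 3x3 grid flattened to 9 states, 4 actions
q_sd_update(Q, s=0, a=1, r=1.0, s_next=3, dist=0.7)
```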
Updated: 2024-09-02 08:20:41
标题: 使用修改后的Q学习算法加速多目标任务学习
摘要: 机器人在工业中有着广泛的应用。近年来,机器人在家庭场景中的影响也迅速增加。Q-learning算法旨在最大化到达目标的奖励。本文提出了Q-learning算法的修改版本,即带有缩放距离度量的Q-learning(Q-SD)。该算法增强了任务学习,并使任务完成更有意义。一个机器人操作器(代理)将Q-SD算法应用于桌面清洁任务。使用Q-SD,代理获取了完成任务所需的步骤序列,同时最小化操作器的移动距离。我们将桌子划分为不同尺寸的网格。第一个网格计数为3乘以3,第二个网格计数为4乘以4。使用Q-SD算法,在这两个环境中获得的最大成功率分别为86%和59%。此外,与传统的Q-learning算法相比,在这两个环境中使用Q-SD算法的代理移动的平均距离分别减少了8.61%和6.7%。
更新时间: 2024-09-02 08:20:41
领域: cs.RO,cs.AI,68T05, 93C85, 93B40, 90C29,I.2.6; I.2.9; I.2.8; F.1.1; F.2.1; H.1.2; G.1.6
Robust Vehicle Localization and Tracking in Rain using Street Maps
GPS-based vehicle localization and tracking suffers from unstable positional information commonly experienced in tunnel segments and in dense urban areas. Also, both Visual Odometry (VO) and Visual Inertial Odometry (VIO) are susceptible to adverse weather conditions that causes occlusions or blur on the visual input. In this paper, we propose a novel approach for vehicle localization that uses street network based map information to correct drifting odometry estimates and intermittent GPS measurements especially, in adversarial scenarios such as driving in rain and tunnels. Specifically, our approach is a flexible fusion algorithm that integrates intermittent GPS, drifting IMU and VO estimates together with 2D map information for robust vehicle localization and tracking. We refer to our approach as Map-Fusion. We robustly evaluate our proposed approach on four geographically diverse datasets from different countries ranging across clear and rain weather conditions. These datasets also include challenging visual segments in tunnels and underpasses. We show that with the integration of the map information, our Map-Fusion algorithm reduces the error of the state-of-the-art VO and VIO approaches across all datasets. We also validate our proposed algorithm in a real-world environment and in real-time on a hardware constrained mobile robot. Map-Fusion achieved 2.46m error in clear weather and 6.05m error in rain weather for a 150m route.
Updated: 2024-09-02 08:15:12
标题: 在雨天使用街道地图进行稳健的车辆定位和跟踪
摘要: 基于GPS的车辆定位和跟踪在隧道段和密集城区常常遇到不稳定的位置信息。此外,视觉里程计(VO)和视觉惯性里程计(VIO)都容易受到恶劣天气条件的影响,导致视觉输入出现遮挡或模糊。在本文中,我们提出了一种新颖的车辆定位方法,利用基于街道网络的地图信息来校正漂移的里程估计和间歇性的GPS测量,特别是在恶劣情况下,如雨天和隧道驾驶时。具体来说,我们的方法是一种灵活的融合算法,将间歇性GPS、漂移IMU和VO估计与2D地图信息结合起来,实现鲁棒的车辆定位和跟踪。我们将这种方法称为Map-Fusion。我们在来自不同国家的四个地理多样化数据集上对我们提出的方法进行了鲁棒性评估,这些数据集涵盖了晴天和雨天等各种天气条件。这些数据集还包括隧道和地下通道等具有挑战性的视觉段。我们展示了通过整合地图信息,我们的Map-Fusion算法在所有数据集上减少了现有VO和VIO方法的误差。我们还在硬件受限的移动机器人上在真实环境中实时验证了我们提出的算法。Map-Fusion在晴天时误差为2.46米,在雨天时误差为6.05米,针对150米路线。
更新时间: 2024-09-02 08:15:12
领域: cs.RO,cs.AI,cs.CV
Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning
Large language models demonstrate impressive performance on downstream tasks, yet requiring extensive resource consumption when fully fine-tuning all parameters. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions--critical for transitioning large models from pre-trained states to task-specific enhancements in PEFT. We propose a framework to clearly define these directions and explore their properties, and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of task-specific directions during the fine-tuning process, thereby enhancing model performance on targeted tasks. Extensive experiments have conclusively demonstrated the effectiveness of LoRA-Dash, and in-depth analyses further reveal the underlying mechanisms of LoRA-Dash. The code is available at https://github.com/Chongjie-Si/Subspace-Tuning.
Updated: 2024-09-02 08:10:51
标题: 释放参数高效微调中任务特定指令的力量
摘要: 大型语言模型在下游任务上表现出令人印象深刻的性能,但在完全微调所有参数时需要大量资源。为了缓解这一问题,已经开发了参数高效微调(PEFT)策略,如LoRA。在本文中,我们深入探讨了任务特定方向的概念--对于将大型模型从预训练状态过渡到PEFT中的任务特定增强至关重要。我们提出了一个框架,明确定义这些方向并探讨它们的属性和实际利用挑战。然后,我们介绍了一种新颖的方法,LoRA-Dash,旨在在微调过程中最大化任务特定方向的影响,从而提高模型在目标任务上的性能。大量实验证明了LoRA-Dash的有效性,并深入分析进一步揭示了LoRA-Dash的潜在机制。代码可在https://github.com/Chongjie-Si/Subspace-Tuning 上找到。
更新时间: 2024-09-02 08:10:51
领域: cs.CL,cs.CV,cs.LG
Variation in prediction accuracy due to randomness in data division and fair evaluation using interval estimation
This paper attempts to answer a "simple question" in building predictive models using machine learning algorithms. Although diagnostic and predictive models for various diseases have been proposed using data from large cohort studies and machine learning algorithms, challenges remain in their generalizability. Several causes for this challenge have been pointed out, and partitioning of the dataset with randomness is considered to be one of them. In this study, we constructed 33,600 diabetes diagnosis models with "initial state" dependent randomness using autoML (automatic machine learning framework) and open diabetes data, and evaluated their prediction accuracy. The results showed that the prediction accuracy had an initial state-dependent distribution. Since this distribution could follow a normal distribution, we estimated the expected interval of prediction accuracy using statistical interval estimation in order to fairly compare the accuracy of the prediction models.
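The proposed evaluation reduces to a textbook normal-theory interval over accuracies measured across many random splits; for example:

```python
import numpy as np

def accuracy_interval(accuracies, level=0.95):
    """Normal-theory confidence interval for the expected accuracy
    across random train/test divisions (assumes the across-split
    distribution is approximately normal, as the paper argues)."""
    a = np.asarray(accuracies, dtype=float)
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[level]
    half = z * a.std(ddof=1) / np.sqrt(len(a))
    return a.mean() - half, a.mean() + half

# accuracies of the same pipeline trained on many random splits
accs = np.random.default_rng(1).normal(0.86, 0.02, size=200)
print(accuracy_interval(accs))
```

This is an interval for the mean; an interval for the accuracy of a single future split would drop the sqrt(n) factor and be correspondingly wider.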
Updated: 2024-09-02 08:05:13
标题: 数据划分中的随机性和使用区间估计进行公平评估导致预测准确性的变化
摘要: 本文试图回答利用机器学习算法构建预测模型时的一个"简单问题"。尽管已经有人利用大型队列研究数据和机器学习算法提出了各种疾病的诊断和预测模型,但它们的泛化能力仍面临挑战。已经指出了造成这一挑战的几个原因,其中之一被认为是带有随机性的数据集划分。在这项研究中,我们使用autoML(自动机器学习框架)和公开的糖尿病数据构建了33,600个具有"初始状态"依赖随机性的糖尿病诊断模型,并评估了它们的预测准确性。结果显示,预测准确性呈现出依赖于初始状态的分布。由于这种分布可能遵循正态分布,我们使用统计区间估计来估计预测准确性的期望区间,以便公平比较预测模型的准确性。
更新时间: 2024-09-02 08:05:13
领域: cs.LG
Trustworthy and Responsible AI for Human-Centric Autonomous Decision-Making Systems
Artificial Intelligence (AI) has paved the way for revolutionary decision-making processes, which if harnessed appropriately, can contribute to advancements in various sectors, from healthcare to economics. However, its black box nature presents significant ethical challenges related to bias and transparency. AI applications are hugely impacted by biases, presenting inconsistent and unreliable findings, leading to significant costs and consequences, highlighting and perpetuating inequalities and unequal access to resources. Hence, developing safe, reliable, ethical, and Trustworthy AI systems is essential. Our team of researchers working with Trustworthy and Responsible AI, part of the Transdisciplinary Scholarship Initiative within the University of Calgary, conducts research on Trustworthy and Responsible AI, including fairness, bias mitigation, reproducibility, generalization, interpretability, and authenticity. In this paper, we review and discuss the intricacies of AI biases, definitions, methods of detection and mitigation, and metrics for evaluating bias. We also discuss open challenges with regard to the trustworthiness and widespread application of AI across diverse domains of human-centric decision making, as well as guidelines to foster Responsible and Trustworthy AI models.
Updated: 2024-09-02 07:55:45
标题: 可信赖和负责任的人类中心自主决策系统的人工智能
摘要: 人工智能(AI)为革命性决策过程铺平了道路,如果适当利用,可以促进从医疗保健到经济各个领域的进步。然而,其黑匣子性质存在与偏见和透明度相关的重大伦理挑战。AI应用受偏见影响巨大,呈现出不一致和不可靠的发现,导致重大成本和后果,突显和持续不平等和资源不均等的情况。因此,发展安全、可靠、符合伦理和值得信赖的AI系统是至关重要的。 我们的研究团队与卡尔加里大学跨学科奖学金计划中的值得信赖和负责任的AI合作,进行值得信赖和负责任的AI研究,包括公平性、偏见缓解、可重现性、泛化、可解释性和真实性。在本文中,我们审查和讨论AI偏见的复杂性、定义、检测和缓解方法,以及评估偏见的指标。我们还讨论了关于AI在人类中心决策各领域的值得信赖和广泛应用的开放挑战,以及促进负责任和值得信赖的AI模型的指南。
更新时间: 2024-09-02 07:55:45
领域: cs.AI
From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model
We explore Bird's-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving applications. Creating accurate street-view images from BEV maps is essential for portraying complex traffic scenarios and enhancing driving algorithms. Concurrently, diffusion-based conditional image generation models have demonstrated remarkable outcomes, adept at producing diverse, high-quality, and condition-aligned results. Nonetheless, the training of these models demands substantial data and computational resources. Hence, exploring methods to fine-tune these advanced models, like Stable Diffusion, for specific conditional generation tasks emerges as a promising avenue. In this paper, we introduce a practical framework for generating images from a BEV layout. Our approach comprises two main components: the Neural View Transformation and the Street Image Generation. The Neural View Transformation phase converts the BEV map into aligned multi-view semantic segmentation maps by learning the shape correspondence between the BEV and perspective views. Subsequently, the Street Image Generation phase utilizes these segmentations as a condition to guide a fine-tuned latent diffusion model. This finetuning process ensures both view and style consistency. Our model leverages the generative capacity of large pretrained diffusion models within traffic contexts, effectively yielding diverse and condition-coherent street view images.
Updated: 2024-09-02 07:47:16
标题: 从鸟瞰到街景:利用潜在扩散模型塑造多样化和与条件对齐的图像
摘要: 我们探讨了鸟瞰视图(BEV)生成,将BEV地图转换为其对应的多视角街道图像。由于其统一的空间表示有助于多传感器融合,BEV对于各种自动驾驶应用至关重要。从BEV地图中创建准确的街景图像对于描绘复杂的交通场景和增强驾驶算法至关重要。同时,基于扩散的条件图像生成模型已经展示出卓越的成果,擅长生成多样化、高质量和与条件一致的结果。然而,这些模型的训练需要大量的数据和计算资源。因此,探索方法来对这些先进模型进行微调,如稳定扩散,用于特定的条件生成任务,是一条有前景的途径。在本文中,我们介绍了一个从BEV布局生成图像的实用框架。我们的方法包括两个主要组件:神经视图转换和街景图像生成。神经视图转换阶段通过学习BEV和透视视图之间的形状对应关系,将BEV地图转换为对齐的多视角语义分割地图。随后,街景图像生成阶段利用这些分割作为条件来引导一个经过微调的潜在扩散模型。这个微调过程确保了视图和风格的一致性。我们的模型利用了大型预训练扩散模型在交通背景下的生成能力,有效地产生多样化和与条件一致的街景图像。
更新时间: 2024-09-02 07:47:16
领域: cs.CV,cs.AI
SeCo-INR: Semantically Conditioned Implicit Neural Representations for Improved Medical Image Super-Resolution
Implicit Neural Representations (INRs) have recently advanced the field of deep learning due to their ability to learn continuous representations of signals without the need for large training datasets. Although INR methods have been studied for medical image super-resolution, their adaptability to localized priors in medical images has not been extensively explored. Medical images contain rich anatomical divisions that could provide valuable local prior information to enhance the accuracy and robustness of INRs. In this work, we propose a novel framework, referred to as the Semantically Conditioned INR (SeCo-INR), that conditions an INR using local priors from a medical image, enabling accurate model fitting and interpolation capabilities to achieve super-resolution. Our framework learns a continuous representation of the semantic segmentation features of a medical image and utilizes it to derive the optimal INR for each semantic region of the image. We tested our framework using several medical imaging modalities and achieved higher quantitative scores and more realistic super-resolution outputs compared to state-of-the-art methods.
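One plausible reading of the conditioning — concatenating a semantic label to a Fourier-encoded coordinate before the MLP — can be sketched as follows (the paper's exact architecture may differ):

```python
import torch
import torch.nn as nn

class SeCoINR(nn.Module):
    """Sketch of a semantically conditioned INR: 2D coordinates get a
    Fourier feature encoding, and a one-hot semantic label is
    concatenated so each anatomical region shapes its own local fit."""
    def __init__(self, n_classes, n_freqs=16, hidden=128):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs) * torch.pi)
        in_dim = 2 * 2 * n_freqs + n_classes  # sin/cos for (x, y) + label
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))             # predicted intensity

    def forward(self, xy, seg_onehot):
        ang = xy[..., None] * self.freqs      # (N, 2, F)
        enc = torch.cat([ang.sin(), ang.cos()], -1).flatten(1)
        return self.net(torch.cat([enc, seg_onehot], -1))

model = SeCoINR(n_classes=4)
pred = model(torch.rand(1024, 2), torch.eye(4)[torch.randint(0, 4, (1024,))])
```

Fitting such a network to the low-resolution image and querying it at a denser coordinate grid is what yields the super-resolved output.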
Updated: 2024-09-02 07:45:06
标题: SeCo-INR: 用于改进医学图像超分辨率的语义条件隐式神经表示
摘要: Implicit Neural Representations (INRs)最近由于其无需大型训练数据集即可学习信号的连续表示能力而推动了深度学习领域的发展。尽管已经研究了INR方法用于医学图像超分辨率,但其在医学图像中本地先验的适应性尚未得到广泛探讨。医学图像包含丰富的解剖区分,这些区分可能提供有价值的本地先验信息,以增强INR的准确性和鲁棒性。在这项工作中,我们提出了一个新颖的框架,称为语义条件INR(SeCo-INR),它使用医学图像的本地先验来条件化INR,实现准确的模型拟合和插值能力,以实现超分辨率。我们的框架学习了医学图像的语义分割特征的连续表示,并利用它来为图像的每个语义区域推导出最佳的INR。我们使用了几种医学成像模式来测试我们的框架,并与最先进的方法相比,实现了更高的定量分数和更真实的超分辨率输出。
更新时间: 2024-09-02 07:45:06
领域: eess.IV,cs.AI,cs.CV
Improved Diversity-Promoting Collaborative Metric Learning for Recommendation
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, the unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML), with the hope of considering the commonly ignored minority interest of the user. The key idea behind DPCML is to introduce a set of multiple representations for each user in the system where users' preference toward an item is aggregated by taking the minimum item-user distance among their embedding set. Specifically, we instantiate two effective assignment strategies to explore a proper quantity of vectors for each user. Meanwhile, a Diversity Control Regularization Scheme (DCRS) is developed to accommodate the multi-vector representation strategy better. Theoretically, we show that DPCML could induce a smaller generalization error than traditional CML. Furthermore, we notice that CML-based approaches usually require negative sampling to reduce the heavy computational burden caused by the pairwise objective therein. In this paper, we reveal the fundamental limitation of the widely adopted hard-aware sampling from the One-Way Partial AUC (OPAUC) perspective and then develop an effective sampling alternative for the CML-based paradigm. Finally, comprehensive experiments over a range of benchmark datasets speak to the efficacy of DPCML. Code are available at https://github.com/statusrank/LibCML.
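The core scoring rule is easy to state in code: each user owns several embeddings and preference is the minimum user-item distance over that set (a sketch; C = 4 below is arbitrary):

```python
import torch

def dpcml_score(user_vecs, item_vec):
    """Multi-vector preference: the user's distance to an item is the
    minimum over their C embeddings, so any one interest vector can
    claim the item (smaller distance = stronger preference)."""
    # user_vecs: (C, D) embeddings for one user; item_vec: (D,)
    d = torch.cdist(user_vecs, item_vec.unsqueeze(0)).squeeze(-1)  # (C,)
    return d.min()

user = torch.randn(4, 32)  # C = 4 interest vectors of dimension 32
print(dpcml_score(user, torch.randn(32)))
```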
Updated: 2024-09-02 07:44:48
标题: 改进的促进多样性的协作度量学习用于推荐
摘要: 协同度量学习(CML)最近已成为推荐系统(RS)中一种流行的方法,弥合了度量学习和协同过滤之间的差距。遵循RS的惯例,现有的实践利用其模型设计中独特的用户表示。本文重点关注一个具有多个兴趣类别的用户的挑战性场景。在这种情况下,独特的用户表示可能会引起偏好偏向,特别是当物品类别分布不均衡时。为了解决这个问题,我们提出了一种名为"促进多样性的协同度量学习"(DPCML)的新方法,希望考虑用户通常被忽视的少数兴趣。DPCML的关键思想是在系统中为每个用户引入一组多重表示,其中用户对物品的偏好通过在其嵌入集中取最小的物品-用户距离来聚合。具体来说,我们实例化了两种有效的分配策略,以探索每个用户的适当数量的向量。同时,我们开发了一种"多样性控制正则化方案"(DCRS)以更好地适应多向量表示策略。从理论上讲,我们展示了DPCML可能引起比传统CML更小的泛化误差。此外,我们注意到基于CML的方法通常需要负采样来减少由其中的成对目标引起的沉重计算负担。在本文中,我们从一种One-Way Partial AUC(OPAUC)的角度揭示了广泛采用的硬注意采样的基本限制,然后为基于CML的范例开发了一种有效的采样替代方案。最后,对一系列基准数据集的全面实验证明了DPCML的有效性。代码可在 https://github.com/statusrank/LibCML 找到。
更新时间: 2024-09-02 07:44:48
领域: cs.IR,cs.LG
Fitting trees to $\ell_1$-hyperbolic distances
Building trees to represent or to fit distances is a critical component of phylogenetic analysis, metric embeddings, approximation algorithms, geometric graph neural nets, and the analysis of hierarchical data. Much of the previous algorithmic work, however, has focused on generic metric spaces (i.e., those with no a priori constraints). Leveraging several ideas from the mathematical analysis of hyperbolic geometry and geometric group theory, we study the tree fitting problem as finding the relation between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding. That is, we define a vector of hyperbolicity (ultrametric) values over all triples of points and compare the $\ell_p$ norms of this vector with the $\ell_q$ norm of the distortion of the best tree fit to the distances. This formulation allows us to define the average hyperbolicity (ultrametricity) in terms of a normalized $\ell_1$ norm of the hyperbolicity vector. Furthermore, we can interpret the classical tree fitting result of Gromov as a $p = q = \infty$ result. We present an algorithm HCCRootedTreeFit such that the $\ell_1$ error of the output embedding is analytically bounded in terms of the $\ell_1$ norm of the hyperbolicity vector (i.e., $p = q = 1$) and that this result is tight. Furthermore, this algorithm has significantly different theoretical and empirical performance as compared to Gromov's result and related algorithms. Finally, we show using HCCRootedTreeFit and related tree fitting algorithms, that supposedly standard data sets for hierarchical data analysis and geometric graph neural networks have radically different tree fits than those of synthetic, truly tree-like data sets, suggesting that a much more refined analysis of these standard data sets is called for.
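For concreteness, the rooted hyperbolicity vector can be computed directly from Gromov products: for each triple, the gap between the two smallest products at the root, which vanishes exactly on tree metrics. A small NumPy sketch:

```python
import numpy as np
from itertools import combinations

def gromov_product(D, r, x, y):
    return 0.5 * (D[r, x] + D[r, y] - D[x, y])

def hyperbolicity_vector(D, r=0):
    """Rooted hyperbolicity values over all triples (excluding the
    root): the gap between the two smallest Gromov products. The
    normalized l1 norm of this vector is the average hyperbolicity
    the paper relates to the l1 tree-fitting error."""
    n = D.shape[0]
    vals = []
    for x, y, z in combinations([i for i in range(n) if i != r], 3):
        p = sorted([gromov_product(D, r, x, y),
                    gromov_product(D, r, y, z),
                    gromov_product(D, r, x, z)])
        vals.append(p[1] - p[0])
    return np.array(vals)

# a metric is 0-hyperbolic at r iff every value is 0 (tree-like);
# this 4-point tree metric (two cherries) gives exactly that
D = np.array([[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]], float)
v = hyperbolicity_vector(D, r=0)
print(v.sum() / len(v))  # average (l1-normalized) hyperbolicity: 0.0
```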
Updated: 2024-09-02 07:38:32
标题: 将树适应于$\ell_1$-双曲距离
摘要: 构建树来表示或适应距离是系统发育分析、度量嵌入、近似算法、几何图神经网络和分层数据分析的关键组成部分。然而,先前的算法工作大多集中在通用度量空间(即没有先验约束的空间)上。利用超几何分析和几何群论的数学思想,我们将树拟合问题研究为在超度量(超度量)向量与树(超度量)嵌入误差之间寻找关系。也就是说,我们定义了一个超度量(超度量)值向量,涵盖所有三点,并将这个向量的$\ell_p$范数与最佳树拟合距离的失真的$\ell_q$范数进行比较。这个表述使我们能够以超度量向量的归一化$\ell_1$范数来定义平均超度量(超度量)。此外,我们可以将Gromov的经典树拟合结果解释为$p = q = \infty$的结果。我们提出了一种名为HCCRootedTreeFit的算法,使得输出嵌入的$\ell_1$误差在超度量向量的$\ell_1$范数方面被解析地限制(即$p = q = 1),并且这个结果是紧密的。此外,与Gromov的结果和相关算法相比,这个算法在理论和经验上有显著不同的表现。最后,我们通过使用HCCRootedTreeFit和相关的树拟合算法,展示了用于分层数据分析和几何图神经网络的标准数据集与合成的真正类似树状数据集之间树拟合有着根本不同,这表明需要对这些标准数据集进行更加精细的分析。
更新时间: 2024-09-02 07:38:32
领域: cs.DS,cs.LG,math.MG
Unlocking the Wisdom of Large Language Models: An Introduction to The Path to Artificial General Intelligence
This booklet, "Unlocking the Wisdom of Large Language Models," serves as an introduction to the comprehensive work "The Path to Artificial General Intelligence." Through a series of nine aphorisms, we distill key insights and principles that underpin the larger exploration of AI's future through adversarial LLM dialogue. We propose this approach as a potential path to realizing artificial general intelligence (AGI). This booklet also includes the titles, abstracts, and introductions of the chapters in the main book, and presents the first two chapters in their entirety.
Updated: 2024-09-02 07:29:37
标题: 解锁大型语言模型的智慧:通往人工通用智能之路的介绍
摘要: 这本小册子《解锁大型语言模型的智慧》是综合性著作《通往人工通用智能之路》的导读。通过一系列九条格言,我们提炼出支撑该书通过对抗性LLM对话探索人工智能未来的关键见解和原则。我们提出这种方法作为实现人工通用智能(AGI)的一条潜在途径。这本小册子还包括主书各章的标题、摘要和引言,并完整呈现了前两章。
更新时间: 2024-09-02 07:29:37
领域: cs.AI,I.2.7
OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection
Recent studies have demonstrated the significant potential of Large Language Models (LLMs) in generating Register Transfer Level (RTL) code, with notable advancements showcased by commercial models such as GPT-4 and Claude3-Opus. However, these proprietary LLMs often raise concerns regarding privacy and security. While open-source LLMs offer solutions to these concerns, they typically underperform commercial models in RTL code generation tasks, primarily due to the scarcity of high-quality open-source RTL datasets. To address this challenge, we introduce OriGen, a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology for generating high-quality, large-scale RTL code. Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets. Furthermore, OriGen can rectify syntactic errors through a self-reflection process that leverages compiler feedback. Experimental results demonstrate that OriGen significantly outperforms other open-source alternatives in RTL code generation. It surpasses the previous best-performing open-source LLM by 12.8% and even exceeds GPT-4 Turbo in the pass@1 metric on the VerilogEval-Human benchmark. Moreover, OriGen exhibits superior capabilities in self-reflection and error correction, outperforming GPT-4 by 19.9% on a benchmark designed to evaluate self-reflection capabilities.
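The self-reflection loop has a simple shape: compile the generated RTL, and on failure feed the error log back to the model. A schematic sketch using Icarus Verilog's `iverilog` as an example checker (`generate` is a hypothetical call into the code LLM, not OriGen's API):

```python
import os
import subprocess
import tempfile

def reflect_and_fix(generate, code: str, max_rounds=3):
    """Compiler-feedback self-reflection: retry generation with the
    compiler's stderr in the prompt until the RTL compiles cleanly."""
    for _ in range(max_rounds):
        with tempfile.NamedTemporaryFile("w", suffix=".v", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["iverilog", "-o", os.devnull, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # syntactically clean
        code = generate(
            f"Fix this Verilog given the compiler errors.\n"
            f"Errors:\n{result.stderr}\nCode:\n{code}")
    return code
```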
Updated: 2024-09-02 07:25:21
标题: OriGen:通过代码到代码增强和自我反思提升RTL代码生成
摘要: 最近的研究表明,大型语言模型(LLMs)在生成寄存器传输级(RTL)代码方面具有显著潜力,商业模型如GPT-4和Claude3-Opus展示了明显的进展。然而,这些专有的LLMs往往引发了关于隐私和安全性的担忧。虽然开源的LLMs提供了解决这些问题的方案,但它们在RTL代码生成任务中通常表现不佳,主要是因为高质量的开源RTL数据集稀缺。为了解决这一挑战,我们引入了OriGen,这是一个完全开源的框架,结合了自我反思能力和一种新颖的数据增强方法,用于生成高质量、大规模的RTL代码。我们的方法采用了一种代码到代码增强技术,以提高开源RTL代码数据集的质量。此外,OriGen可以通过利用编译器反馈进行自我反思过程来纠正语法错误。实验结果表明,OriGen在RTL代码生成方面明显优于其他开源替代方案。它比此前表现最佳的开源LLM高出12.8%,在VerilogEval-Human基准测试的pass@1指标上甚至超过了GPT-4 Turbo。此外,OriGen在自我反思和错误纠正方面表现出卓越的能力,在专门评估自我反思能力的基准测试中比GPT-4高出19.9%。
更新时间: 2024-09-02 07:25:21
领域: cs.AR,cs.AI,cs.LG
Physics-informed DeepONet with stiffness-based loss functions for structural response prediction
Finite element modeling is a well-established tool for structural analysis, yet modeling complex structures often requires extensive pre-processing, significant analysis effort, and considerable time. This study addresses this challenge by introducing an innovative method for real-time prediction of structural static responses using DeepONet, which relies on a novel approach to physics-informed networks driven by structural balance laws. This approach offers the flexibility to accurately predict responses under various load classes and magnitudes. The trained DeepONet can generate solutions for the entire domain within a fraction of a second. This capability effectively eliminates the need for the extensive remodeling and analysis typically required for each new case in FE modeling. We apply the proposed method to two structures: a simple 2D beam structure and a comprehensive 3D model of a real bridge. To predict multiple variables with DeepONet, we utilize two strategies: a split branch/trunk and multiple DeepONets combined into a single DeepONet. In addition to data-driven training, we introduce a novel physics-informed training approach. This method leverages structural stiffness matrices to enforce fundamental equilibrium and energy conservation principles, resulting in two novel physics-informed loss functions: energy conservation and static equilibrium using the Schur complement. We use various combinations of loss functions to achieve an error rate of less than 5% with significantly reduced training time. This study shows that DeepONet, enhanced with hybrid loss functions, can accurately and efficiently predict displacements and rotations at each mesh point, with reduced training time.
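For a linear system K u = f, the two stated loss families can be written down directly — an equilibrium residual and a total-potential-energy term, which the true solution minimizes (the paper's Schur-complement formulation of equilibrium is omitted in this sketch):

```python
import torch

def physics_losses(u_pred, K, f):
    """Stiffness-based losses for K u = f: the static-equilibrium
    residual, and the total potential energy 0.5 u^T K u - f^T u,
    which attains its minimum at the exact solution."""
    residual = K @ u_pred - f
    equilibrium_loss = (residual ** 2).mean()
    energy = 0.5 * u_pred @ (K @ u_pred) - f @ u_pred
    return equilibrium_loss, energy

K = torch.tensor([[2.0, -1.0], [-1.0, 2.0]])  # toy 2-DOF stiffness matrix
f = torch.tensor([1.0, 0.0])
u = torch.linalg.solve(K, f)
print(physics_losses(u, K, f))  # residual ~ 0 at the exact solution
```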
Updated: 2024-09-02 07:19:47
标题: 使用基于刚度的损失函数的物理知识引导的DeepONet用于结构响应预测
摘要: 有限元建模是结构分析的一种成熟工具,然而建模复杂结构通常需要大量的预处理、重要的分析工作和相当长的时间。本研究通过引入一种创新方法来解决这一挑战,实时预测结构静态响应,该方法依赖于DeepOnet,采用一种基于结构平衡定律驱动的物理信息网络的新方法。这种方法能够灵活地准确预测各种载荷类别和大小下的响应。经过训练的DeepONet能够在几秒内为整个域生成解决方案。这种能力有效地消除了有限元建模中通常需要为每个新案例进行的大量重建和分析的需求。我们将提出的方法应用于两种结构:一个简单的2D梁结构和一个真实桥梁的全面3D模型。为了使用DeepONet预测多个变量,我们采用两种策略:分支/主干和多个DeepONet组合成单个DeepONet。除了数据驱动的训练,我们还引入了一种新颖的基于物理信息的训练方法。该方法利用结构刚度矩阵来强制执行基本的平衡和能量守恒原则,从而产生两种新颖的基于物理信息的损失函数:能量守恒和使用舒尔补充的静态平衡。我们使用各种损失函数的组合来实现低于5%的错误率,同时大大减少训练时间。本研究表明,通过混合损失函数增强的DeepONet能够准确高效地预测每个网格点的位移和旋转,同时减少训练时间。
更新时间: 2024-09-02 07:19:47
领域: cs.LG,cs.CE
Biometrics and Behavior Analysis for Detecting Distractions in e-Learning
In this article, we explore computer vision approaches to detect abnormal head pose during e-learning sessions and we introduce a study on the effects of mobile phone usage during these sessions. We utilize behavioral data collected from 120 learners monitored while participating in a MOOC learning sessions. Our study focuses on the influence of phone-usage events on behavior and physiological responses, specifically attention, heart rate, and meditation, before, during, and after phone usage. Additionally, we propose an approach for estimating head pose events using images taken by the webcam during the MOOC learning sessions to detect phone-usage events. Our hypothesis suggests that head posture undergoes significant changes when learners interact with a mobile phone, contrasting with the typical behavior seen when learners face a computer during e-learning sessions. We propose an approach designed to detect deviations in head posture from the average observed during a learner's session, operating as a semi-supervised method. This system flags events indicating alterations in head posture for subsequent human review and selection of mobile phone usage occurrences with a sensitivity over 90%.
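A plausible minimal version of the semi-supervised flagging step — z-scoring head-pose angles against the learner's own session statistics and flagging sustained deviations for human review — is sketched below; thresholds are illustrative, not the paper's:

```python
import numpy as np

def flag_pose_events(yaw, pitch, z_thresh=2.5, min_len=5):
    """Flag frame ranges where either head-pose angle deviates strongly
    from this learner's session average, as candidate phone-usage
    events for subsequent human review."""
    yaw, pitch = np.asarray(yaw, float), np.asarray(pitch, float)
    z = np.maximum(np.abs((yaw - yaw.mean()) / yaw.std()),
                   np.abs((pitch - pitch.mean()) / pitch.std()))
    mask = z > z_thresh
    events, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            if i - start >= min_len:
                events.append((start, i))  # frame range for review
            start = None
    if start is not None and len(mask) - start >= min_len:
        events.append((start, len(mask)))
    return events
```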
Updated: 2024-09-02 07:18:16
标题: 生物识别和行为分析用于检测电子学习中的干扰
摘要: 在这篇文章中,我们探讨了计算机视觉方法来检测在线学习会话中异常头部姿势,并介绍了一项关于移动电话使用对这些会话影响的研究。我们利用从监测参与MOOC学习会话的120名学习者收集的行为数据。我们的研究侧重于电话使用事件对行为和生理反应(特别是注意力、心率和冥想)的影响,包括电话使用前、期间和后。此外,我们提出了一种方法,利用在MOOC学习会话期间由网络摄像头拍摄的图像来估计头部姿势事件,从而检测电话使用事件。我们的假设是,当学习者与移动电话互动时,头部姿势会发生显著变化,与学习者在面对电脑进行在线学习会话时所见到的典型行为形成对比。我们提出了一种半监督方法,旨在检测头部姿势与学习者会话期间观察到的平均值的偏差。该系统标记出表明头部姿势变化的事件以供人工审查,并以超过90%的灵敏度筛选出移动电话使用事件。
更新时间: 2024-09-02 07:18:16
领域: cs.CV,cs.HC,cs.LG
VAAD: Visual Attention Analysis Dashboard applied to e-Learning
In this paper, we present an approach in the Multimodal Learning Analytics field. Within this approach, we have developed a tool to visualize and analyze eye movement data collected during learning sessions in online courses. The tool is named VAAD, an acronym for Visual Attention Analysis Dashboard. These eye movement data have been gathered using an eye-tracker and subsequently processed and visualized for interpretation. The purpose of the tool is to conduct a descriptive analysis of the data by facilitating its visualization, enabling the identification of differences and learning patterns among various learner populations. Additionally, it integrates a predictive module capable of anticipating learner activities during a learning session. Consequently, VAAD holds the potential to offer valuable insights into online learning behaviors from both descriptive and predictive perspectives.
Updated: 2024-09-02 07:15:02
标题: VAAD:应用于电子学习的视觉注意力分析仪表板
摘要: 在这篇论文中,我们提出了一种在多模态学习分析领域的方法。在这种方法中,我们开发了一种工具,用于可视化和分析在线课程学习过程中收集的眼动数据。该工具名为VAAD,是Visual Attention Analysis Dashboard的缩写。这些眼动数据是使用眼动追踪器收集的,随后经过处理和可视化以进行解释。该工具的目的是通过促进数据的可视化,实现对数据的描述性分析,从而能够识别不同学习人群之间的差异和学习模式。此外,它还集成了一个能够预测学习者在学习过程中活动的模块。因此,VAAD有潜力从描述性和预测性的角度为在线学习行为提供有价值的见解。
更新时间: 2024-09-02 07:15:02
领域: cs.CV,cs.HC,cs.LG
3D Priors-Guided Diffusion for Blind Face Restoration
Blind face restoration endeavors to restore a clear face image from a degraded counterpart. Recent approaches employing Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods encounter challenges in achieving a balance between realism and fidelity, particularly in complex degradation scenarios. To inherit the exceptional generative realism of the diffusion model while remaining constrained by identity-aware fidelity, we propose a novel diffusion-based framework that embeds 3D facial priors as structure and identity constraints into a denoising diffusion process. Specifically, in order to obtain more accurate 3D prior representations, the 3D facial image is reconstructed by a 3D Morphable Model (3DMM) using an initial restored face image that has been processed by a pretrained restoration network. A customized multi-level feature extraction method is employed to exploit both structural and identity information of 3D facial images, which are then mapped into the noise estimation process. In order to enhance the fusion of identity information into the noise estimation, we propose a Time-Aware Fusion Block (TAFB). This module offers a more efficient and adaptive fusion of weights for denoising, accounting for the dynamic nature of the denoising process in the diffusion model, which involves initial structure refinement followed by texture detail enhancement. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration.
Updated: 2024-09-02 07:13:32
标题: 盲人脸修复的3D先验引导扩散
摘要: 盲人脸修复旨在从退化的图像中恢复清晰的人脸图像。最近采用生成对抗网络(GANs)作为先验的方法在这一领域取得了显著成功。然而,这些方法在实现逼真度和保真度之间的平衡方面遇到了挑战,特别是在复杂的退化场景中。为了继承扩散模型杰出的逼真生成能力,同时受到身份感知保真度的约束,我们提出了一种将3D面部先验作为结构和身份约束嵌入去噪扩散过程的新型扩散框架。具体地,为了获得更准确的3D先验表示,我们利用经过预训练修复网络处理得到的初始修复人脸图像,通过3D可变形模型(3DMM)重建3D面部图像。采用定制的多级特征提取方法来利用3D面部图像的结构和身份信息,然后将其映射到噪声估计过程中。为了增强身份信息融入噪声估计的效果,我们提出了一个时间感知融合块(TAFB)。该模块提供了更高效和自适应的去噪权重融合,考虑到扩散模型中去噪过程的动态性质,该过程涉及初始结构的细化,然后是纹理细节的增强。广泛的实验证明,我们的网络在合成和真实世界数据集上的盲人脸修复任务中优于最新算法。
更新时间: 2024-09-02 07:13:32
领域: cs.CV,cs.AI
From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning
Accurate prediction of protein-ligand binding affinities is an essential challenge in structure-based drug design. Despite recent advances in data-driven methods for affinity prediction, their accuracy is still limited, partially because they only take advantage of static crystal structures, while actual binding affinities are generally determined by the thermodynamic ensembles between proteins and ligands. One effective way to approximate such a thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, an MD dataset containing 3,218 different protein-ligand complexes is curated, and Dynaformer, a graph-based deep learning model, is developed to predict binding affinities by learning the geometric characteristics of protein-ligand interactions from the MD trajectories. In silico experiments demonstrated that the model exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset, outperforming the methods reported hitherto. Moreover, in a virtual screening on heat shock protein 90 (HSP90) using Dynaformer, 20 candidates were identified and their binding affinities further experimentally validated. Dynaformer displayed promising results in virtual drug screening, revealing 12 hit compounds (two in the submicromolar range), including several novel scaffolds. Overall, these results demonstrate that the approach offers a promising avenue for accelerating the early drug discovery process.
Updated: 2024-09-02 07:10:37
标题: 从静态到动态结构:利用基于图的深度学习提高结合亲和力预测
摘要: 准确预测蛋白质-配体结合亲和力是结构基药物设计中一个关键挑战。尽管数据驱动方法在亲和力预测方面取得了一些进展,但它们的准确性仍然有限,部分原因是它们仅利用静态晶体结构,而实际的结合亲和力通常由蛋白质和配体之间的热力学系综确定。近似这种热力学系综的一个有效方法是使用分子动力学(MD)模拟。在这里,我们整理了一个包含3,218个不同蛋白质-配体复合物的MD数据集,并进一步开发了一种基于图的深度学习模型Dynaformer,通过从MD轨迹中学习蛋白质-配体相互作用的几何特征来预测结合亲和力。计算机模拟(in silico)实验表明,该模型在CASF-2016基准数据集上展现出了最先进的评分和排名能力,超过了迄今报道的方法。此外,在使用Dynaformer对热休克蛋白90(HSP90)进行虚拟筛选时,鉴定了20个候选物,并进一步通过实验证实了它们的结合亲和力。Dynaformer在虚拟药物筛选中展示出了良好的结果,揭示了12个命中化合物(其中两个处于亚微摩尔范围),包括几种新颖的骨架。总的来说,这些结果表明该方法为加快早期药物发现过程提供了一个有前途的途径。
更新时间: 2024-09-02 07:10:37
领域: q-bio.BM,cs.LG,physics.chem-ph,q-bio.QM
Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces
Online question-and-answer (Q\&A) systems based on Large Language Models (LLMs) have progressively diverged from recreational to professional use. This paper proposes a Multi-Agent framework with environmental reinforcement learning (E-RL) for code correction, called the Code Learning (Co-Learning) community, which assists beginners in correcting code errors independently. It evaluates the performance of multiple LLMs on an original dataset of 702 error codes and uses the results as a reward or punishment criterion for E-RL; it analyzes input error codes via the current agent and selects the appropriate LLM-based agent to achieve optimal error-correction accuracy and reduce correction time. Experimental results showed a 3\% improvement in precision score and a 15\% improvement in time cost compared with the no-E-RL method. Our source code is available at: https://github.com/yuqian2003/Co_Learning
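The E-RL agent-selection loop described above can be pictured as a simple bandit over LLM agents. The toy sketch below assumes an epsilon-greedy rule and a binary correction-success reward, both illustrative stand-ins for the paper's actual reward design; agent names are hypothetical.

```python
# Toy sketch: treat each LLM-based agent as a bandit arm, reward successful
# corrections, and pick agents epsilon-greedily. Names and the reward signal
# are illustrative assumptions.
import random

class AgentSelector:
    def __init__(self, agents, eps=0.1):
        self.eps = eps
        self.stats = {a: [0.0, 0] for a in agents}   # total reward, count

    def pick(self):
        if random.random() < self.eps:
            return random.choice(list(self.stats))   # explore
        return max(self.stats,                       # exploit best mean reward
                   key=lambda a: self.stats[a][0] / (self.stats[a][1] or 1))

    def update(self, agent, reward):
        self.stats[agent][0] += reward
        self.stats[agent][1] += 1

sel = AgentSelector(["llm-small", "llm-large", "code-llm"])
agent = sel.pick()
sel.update(agent, reward=1.0)   # e.g., +1 if the corrected code passed tests
```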
Updated: 2024-09-02 07:03:22
标题: 合作学习:具有会话自然语言界面的多agent强化协作框架的代码学习
摘要: 基于大型语言模型(LLM)的在线问答系统从娱乐性用途逐渐转向专业化应用。本文提出了一种基于环境强化学习(E-RL)的多智能体框架,用于代码纠正,称为代码学习(Co-Learning)社区,帮助初学者独立纠正代码错误。它评估了多个LLM在原始数据集中702个错误代码上的表现,将其作为E-RL的奖励或惩罚标准;分析当前智能体的输入错误代码;选择适当的基于LLM的智能体以实现最佳错误校正准确性并减少校正时间。实验结果显示,与无E-RL方法相比,精度分数提高了3%,时间成本降低了15%。我们的源代码可在以下链接找到:https://github.com/yuqian2003/Co_Learning
更新时间: 2024-09-02 07:03:22
领域: cs.SE,cs.AI,cs.CL
Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability
As the adoption of Artificial Intelligence (AI) models expands into critical real-world applications, ensuring the explainability of these models becomes paramount, particularly in sensitive fields such as medicine and finance. Linear Discriminant Analysis (LDA) remains a popular choice for classification due to its interpretable nature, derived from its capacity to model class distributions and enhance class separation through linear combinations of features. However, real-world datasets often suffer from incomplete data, posing substantial challenges for both classification accuracy and model interpretability. In this paper, we introduce a novel and robust classification method, termed Weighted missing Linear Discriminant Analysis (WLDA), which extends LDA to handle datasets with missing values without the need for imputation. Our approach innovatively incorporates a weight matrix that penalizes missing entries, thereby refining parameter estimation directly on incomplete data. This methodology not only preserves the interpretability of LDA but also significantly enhances classification performance in scenarios plagued by missing data. We conduct an in-depth theoretical analysis to establish the properties of WLDA and thoroughly evaluate its explainability. Experimental results across various datasets demonstrate that WLDA consistently outperforms traditional methods, especially in challenging environments where missing values are prevalent in both training and test datasets. This advancement provides a critical tool for improving classification accuracy and maintaining model transparency in the face of incomplete data.
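A minimal sketch of the core WLDA idea described above: estimate class parameters on incomplete data by giving missing entries zero weight instead of imputing them. The exact weighting scheme and the downstream discriminant computation are assumptions, not taken from the paper.

```python
# Sketch: weighted class-mean estimation on incomplete data, in the spirit of
# WLDA. Missing entries get zero weight; no imputation is performed.
import numpy as np

def weighted_class_means(X, y):
    """X: (n, d) with np.nan for missing values; y: (n,) class labels."""
    W = (~np.isnan(X)).astype(float)          # weight matrix: 0 where missing
    Xz = np.nan_to_num(X)                     # zeros contribute nothing
    means = {}
    for c in np.unique(y):
        m = y == c
        counts = W[m].sum(axis=0)             # observed entries per feature
        means[c] = Xz[m].sum(axis=0) / np.maximum(counts, 1.0)
    return means

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 5.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])
print(weighted_class_means(X, y))   # per-class means over observed entries only
```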
Updated: 2024-09-02 07:01:31
标题: 直接处理线性判别分析中的缺失数据以提高分类准确性和解释性
摘要: 随着人工智能(AI)模型在关键实际应用中的采用不断扩大,确保这些模型的可解释性变得至关重要,特别是在医学和金融等敏感领域。线性判别分析(LDA)由于其可解释的特性仍然是分类的热门选择,这种可解释性源于其能够通过特征的线性组合来建模类分布并增强类别之间的分离。然而,现实世界中的数据集经常存在不完整数据,这给分类准确性和模型可解释性带来了重大挑战。在本文中,我们介绍了一种新颖而强大的分类方法,称为加权缺失线性判别分析(WLDA),它扩展了LDA以处理具有缺失值的数据集,而无需进行插补。我们的方法创新地将一个权重矩阵纳入其中,惩罚缺失的条目,从而直接在不完整数据上优化参数估计。这种方法不仅保留了LDA的可解释性,还显著提高了在存在缺失数据困扰的情况下的分类性能。我们进行了深入的理论分析,以建立WLDA的属性,并全面评估其可解释性。跨越各种数据集的实验结果表明,WLDA在各种环境中始终优于传统方法,特别是在训练和测试数据集中普遍存在缺失值的挑战环境中。这一进步为改善分类准确性并在面对不完整数据时保持模型透明度提供了一个关键工具。
更新时间: 2024-09-02 07:01:31
领域: cs.LG,stat.ML
Recognition of Schrodinger cat state based on CNN
We applied convolutional neural networks to the classification of cat states and coherent states. Initially, we generated datasets of Schrodinger cat states and coherent states from nonlinear processes and preprocessed these datasets. Subsequently, we constructed both LeNet and ResNet network architectures, adjusting parameters such as convolution kernels and strides to optimal values. We then trained both LeNet and ResNet on the training sets. The loss function values indicated that ResNet performs better in classifying cat states and coherent states. Finally, we evaluated the trained models on the test sets, achieving an accuracy of 97.5% for LeNet and 100% for ResNet. We evaluated cat states and coherent states with different $\alpha$, demonstrating a certain degree of generalization capability. The results show that LeNet may mistakenly recognize coherent states without coherent features as cat states, while ResNet provides a feasible solution to the problem of traditional neural networks confusing cat states and coherent states.
Updated: 2024-09-02 06:55:14
标题: 基于卷积神经网络的薛定谔猫态识别
摘要: 我们将卷积神经网络应用于猫态和相干态的分类。首先,我们从非线性过程中生成了薛定谔猫态和相干态的数据集,并对这些数据集进行了预处理。随后,我们构建了LeNet和ResNet网络架构,调整了卷积核和步幅等参数至最佳值。然后,我们对训练集上的LeNet和ResNet进行了训练。损失函数值表明,ResNet在分类猫态和相干态方面表现更好。最后,我们在测试集上评估了训练模型,LeNet的准确率为97.5%,ResNet为100%。我们评估了具有不同α的猫态和相干态,展示了一定程度的泛化能力。结果显示,LeNet可能会误将没有相干特征的相干态误认为猫态,而ResNet为传统神经网络误认识猫态和相干态问题提供了可行解决方案。
更新时间: 2024-09-02 06:55:14
领域: quant-ph,cs.LG
ERATTA: Extreme RAG for Table To Answers with Large Language Models
Large language models (LLMs) with retrieval-augmented generation (RAG) have been the optimal choice for scalable generative AI solutions in the recent past. Although RAG implemented with AI agents (agentic-RAG) has recently been popularized, it suffers from unstable costs and unreliable performance for enterprise-level data practices. Most existing use cases that incorporate RAG with LLMs have been either generic or extremely domain specific, thereby calling into question the scalability and generalizability of RAG-LLM approaches. In this work, we propose a unique LLM-based system in which multiple LLMs can be invoked to enable data authentication, user-query routing, data retrieval, and custom prompting for question-answering capabilities over enterprise data tables. The source tables here are highly fluctuating and large in size, and the proposed framework enables structured responses in under 10 seconds per query. Additionally, we propose a five-metric scoring module that detects and reports hallucinations in the LLM responses. Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health, and social media domains. Extensions to the proposed extreme RAG architectures can enable heterogeneous source querying using LLMs.
Updated: 2024-09-02 06:51:36
标题: ERATTA:利用大型语言模型实现表格到答案的极端RAG
摘要: 最近,具有检索增强生成(RAG)功能的大型语言模型(LLMs)已成为可扩展生成式人工智能解决方案的最佳选择。尽管具有人工智能代理的RAG实现(agentic-RAG)最近开始流行,但在企业级数据实践中,其成本不稳定且性能不可靠。大多数现有用例将RAG与LLMs结合使用,要么是通用的,要么是极度领域特定的,因此对RAG-LLM方法的可扩展性和普适性提出了质疑。在这项工作中,我们提出了一个独特的基于LLM的系统,可以调用多个LLM来实现数据认证、用户查询路由、数据检索以及针对企业数据表问答能力的定制提示。这里的源表格波动很大且规模庞大,所提出的框架可以在每个查询不到10秒的时间内提供结构化响应。此外,我们提出了一个五项指标评分模块,用于检测并报告LLM响应中的幻觉。我们提出的系统和评分指标在可持续性、财务健康和社交媒体领域的数百个用户查询中实现了超过90%的置信度分数。对所提出的极端RAG架构的扩展可以使用LLMs实现异构源查询。
更新时间: 2024-09-02 06:51:36
领域: cs.AI,cs.LG
Regret Analysis for Randomized Gaussian Process Upper Confidence Bound
Gaussian process upper confidence bound (GP-UCB) is a theoretically established algorithm for Bayesian optimization (BO), where we assume the objective function $f$ follows a GP. One notable drawback of GP-UCB is that the theoretical confidence parameter $\beta$, which increases with the number of iterations, is too large. To alleviate this drawback, this paper analyzes a randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB), which uses a confidence parameter generated from the shifted exponential distribution. We analyze the expected regret and the conditional expected regret, where the expectation and the probability are taken with respect to $f$ and the noise, and to the randomness of the BO algorithm, respectively. In both regret analyses, IRGP-UCB achieves a sub-linear regret upper bound without increasing the confidence parameter if the input domain is finite. Finally, we show numerical experiments using synthetic and benchmark functions and real-world emulators.
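A minimal sketch of the randomized confidence parameter at the heart of IRGP-UCB: $\beta$ is re-drawn each round from a shifted exponential distribution rather than grown with the iteration count. The shift/rate values and the toy posterior below are illustrative assumptions.

```python
# Sketch: IRGP-UCB-style acquisition with a confidence parameter drawn from a
# shifted exponential distribution. Shift/rate and the toy GP posterior are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample_beta(shift=1.0, rate=1.0):
    return shift + rng.exponential(1.0 / rate)    # shifted exponential draw

def irgp_ucb_pick(mu, sigma):
    """mu, sigma: GP posterior mean/std over a finite candidate set."""
    beta = sample_beta()                          # re-drawn every round
    return int(np.argmax(mu + np.sqrt(beta) * sigma))

mu = np.array([0.2, 0.5, 0.4])
sigma = np.array([0.3, 0.1, 0.4])
print(irgp_ucb_pick(mu, sigma))   # index of the next query point
```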
Updated: 2024-09-02 06:49:29
标题: 随机高斯过程上置信区间的后悔分析
摘要: 高斯过程上置信界(GP-UCB)是贝叶斯优化(BO)的一个理论上建立的算法,其中我们假设目标函数$f$遵循GP。GP-UCB的一个显著缺点是随着迭代次数增加,理论上的置信参数$\beta$过大。为了缓解这一缺点,本文分析了改进的随机化GP-UCB(IRGP-UCB)的随机变体,该算法使用来自移位指数分布的置信参数。我们分析了期望遗憾和条件期望遗憾,其中期望和概率分别考虑了$f$和噪声以及BO算法的随机性。在这两种遗憾分析中,如果输入域是有限的,IRGP-UCB可以实现次线性遗憾上界,而不会增加置信参数。最后,我们使用合成和基准函数以及真实世界仿真器展示了数值实验。
更新时间: 2024-09-02 06:49:29
领域: cs.LG,stat.ML
Enhancing Privacy in Federated Learning: Secure Aggregation for Real-World Healthcare Applications
Deploying federated learning (FL) in real-world scenarios, particularly in healthcare, poses challenges in communication and security. In particular, with respect to the federated aggregation procedure, researchers have been focusing on the study of secure aggregation (SA) schemes to provide privacy guarantees for the model parameters transmitted by the clients. Nevertheless, the practical availability of SA in currently available FL frameworks is limited, due to computational and communication bottlenecks. To fill this gap, this study explores the implementation of SA within the open-source Fed-BioMed framework. We implement and compare two SA protocols, Joye-Libert (JL) and Low Overhead Masking (LOM), providing extensive benchmarks across a panel of healthcare data analysis problems. Our theoretical and experimental evaluations on four datasets demonstrate that SA protocols effectively protect privacy while maintaining task accuracy. Computational overhead during training is less than 1% on a CPU and less than 50% on a GPU for large models, with protection phases taking less than 10 seconds. Incorporating SA into Fed-BioMed impacts task accuracy by no more than 2% compared to non-SA scenarios. Overall, this study demonstrates the feasibility of SA in real-world healthcare applications and contributes to reducing the gap towards the adoption of privacy-preserving technologies in sensitive applications.
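To picture how mask-based SA keeps individual updates hidden while the aggregate stays exact, here is a toy pairwise-masking sketch in the spirit of LOM; real protocols derive the masks from cryptographic key agreement, for which a shared per-pair seed stands in here.

```python
# Toy sketch of pairwise-mask secure aggregation: each client pair derives a
# shared mask that cancels when updates are summed, so the server only sees
# the aggregate. A shared seed per pair stands in for key agreement.
import numpy as np

def masked_update(client_id, update, peers, dim):
    masked = update.copy()
    for peer in peers:
        if peer == client_id:
            continue
        seed = hash((min(client_id, peer), max(client_id, peer))) % (2**32)
        mask = np.random.default_rng(seed).normal(size=dim)
        masked += mask if client_id < peer else -mask   # signs cancel pairwise
    return masked

dim, clients = 4, [0, 1, 2]
updates = {c: np.full(dim, float(c + 1)) for c in clients}
agg = sum(masked_update(c, updates[c], clients, dim) for c in clients)
print(agg)   # 1+2+3 = 6 per coordinate; individual updates stay hidden
```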
Updated: 2024-09-02 06:43:22
标题: 在联邦学习中增强隐私保护:用于现实世界医疗应用的安全聚合
摘要: 在实际场景中部署联邦学习(FL),尤其是在医疗保健领域,存在通信和安全方面的挑战。特别是在联邦聚合过程中,研究人员一直关注安全聚合(SA)方案的研究,以便为客户端传输的模型参数提供隐私保证。然而,由于计算和通信瓶颈,目前可用的FL框架中SA的实际可用性有限。为了填补这一空白,本研究探讨了在开源Fed-BioMed框架中实现SA。我们实现并比较了两种SA协议,Joye-Libert(JL)和Low Overhead Masking(LOM),并在一系列医疗数据分析问题中提供了广泛的基准测试。我们在四个数据集上的理论和实验评估表明,SA协议在保护隐私的同时保持任务准确性。对于大型模型,训练过程中的计算开销在CPU上不到1%,在GPU上不到50%,保护阶段耗时不到10秒。将SA整合到Fed-BioMed中,与非SA情景相比,对任务准确性的影响不超过2%。总体而言,本研究表明了在实际医疗应用中使用SA的可行性,并有助于缩小在敏感应用中采用隐私保护技术的差距。
更新时间: 2024-09-02 06:43:22
领域: cs.CR,cs.AI
Pedestrian Attribute Recognition via CLIP based Prompt Vision-Language Fusion
Existing pedestrian attribute recognition (PAR) algorithms adopt a pre-trained CNN (e.g., ResNet) as their backbone network for visual feature learning, which might yield sub-optimal results due to insufficient exploitation of the relations between pedestrian images and attribute labels. In this paper, we formulate PAR as a vision-language fusion problem and fully exploit the relations between pedestrian images and attribute labels. Specifically, the attribute phrases are first expanded into sentences, and then the pre-trained vision-language model CLIP is adopted as our backbone for feature embedding of visual images and attribute descriptions. The contrastive learning objective connects the vision and language modalities well in the CLIP-based feature space, and the Transformer layers used in CLIP can capture the long-range relations between pixels. Then, a multi-modal Transformer is adopted to fuse the dual features effectively, and a feed-forward network is used to predict attributes. To optimize our network efficiently, we propose a region-aware prompt tuning technique that adjusts very few parameters (i.e., only the prompt vectors and classification heads) while keeping both the pre-trained VL model and the multi-modal Transformer fixed. Our proposed PAR algorithm adjusts only 0.75% of the learnable parameters compared with the fine-tuning strategy. It also achieves new state-of-the-art performance on both standard and zero-shot settings for PAR, including the RAPv1, RAPv2, WIDER, PA100K, PETA-ZS, and RAP-ZS datasets. The source code and pre-trained models will be released at https://github.com/Event-AHU/OpenPAR.
Updated: 2024-09-02 06:37:04
标题: 通过基于CLIP的提示视觉-语言融合实现的行人属性识别
摘要: 现有的行人属性识别(PAR)算法采用预训练的CNN(例如ResNet)作为视觉特征学习的骨干网络,由于对行人图像和属性标签之间关系的不足利用,可能会获得次优结果。本文将PAR形式化为一个视觉-语言融合问题,并充分利用行人图像和属性标签之间的关系。具体而言,首先将属性短语扩展为句子,然后采用预训练的视觉-语言模型CLIP作为我们的特征嵌入的骨干,用于视觉图像和属性描述的特征嵌入。对比学习目标在基于CLIP的特征空间中很好地连接了视觉和语言模态,而CLIP中使用的Transformer层可以捕捉像素之间的远程关系。然后,采用多模态Transformer有效融合双重特征,并使用前馈网络来预测属性。为了有效优化我们的网络,我们提出了区域感知提示调整技术,调整极少量参数(即只调整提示向量和分类头),并固定预训练的VL模型和多模态Transformer。与微调策略相比,我们提出的PAR算法仅调整了0.75%的可学习参数。它还在PAR的标准和零样本设置中取得了新的最先进性能,包括RAPv1、RAPv2、WIDER、PA100K和PETA-ZS、RAP-ZS数据集。源代码和预训练模型将在https://github.com/Event-AHU/OpenPAR上发布。
更新时间: 2024-09-02 06:37:04
领域: cs.CV,cs.AI
Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities
Imaging techniques such as chest X-rays, whole slide images, and optical coherence tomography serve as the initial screening and detection for a wide variety of medical pulmonary and ophthalmic conditions, respectively. This paper investigates the intricacies of using pretrained deep convolutional neural networks with transfer learning across diverse medical imaging datasets with varying modalities for binary and multiclass classification. We conducted a comprehensive performance analysis of ten network architectures and model families, each with pretraining and random initialization. Our findings show that using pretrained models as fixed feature extractors yields poor performance irrespective of the dataset. In contrast, histopathology microscopy whole slide images achieve better performance. We also found that deeper and more complex architectures did not necessarily result in the best performance. This observation implies that improvements on ImageNet do not run parallel to medical imaging tasks. Within a medical domain, the performance of network architectures varies within model families as datasets shift. This indicates that the performance of models within a specific modality may not be conclusive for another modality within the same domain. This study provides a deeper understanding of the application of deep learning techniques in medical imaging and highlights the impact of pretrained networks across different medical imaging datasets under five different experimental settings.
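The two transfer regimes compared in the study can be set up in a few lines of torchvision; the choice of ResNet-18 and a two-class head below are illustrative assumptions.

```python
# Sketch of the two transfer regimes: a pretrained backbone used as a fixed
# feature extractor vs. full fine-tuning. Model and head size are assumptions.
import torch.nn as nn
from torchvision import models

def build_model(num_classes=2, fixed_features=True):
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    if fixed_features:
        for p in model.parameters():
            p.requires_grad = False          # freeze the pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # trainable head
    return model

feature_extractor = build_model(fixed_features=True)   # only the head trains
fine_tuned = build_model(fixed_features=False)         # all layers train
```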
Updated: 2024-09-02 06:31:48
标题: 疾病分类及预训练深度卷积神经网络对不同医学影像数据集跨影像模态的影响
摘要: 各种成像技术,如胸部X射线、全切片图像和光学相干断层扫描分别用作广泛医学肺部和眼科疾病的初步筛查和检测。本文研究了在不同医学成像数据集上使用预训练的深度卷积神经网络进行迁移学习以进行二元和多类分类的复杂性。我们对十种网络架构和模型家族分别进行了全面性能分析,每种都有预训练和随机初始化。我们的研究结果显示,使用预训练模型作为固定特征提取器会导致性能较差,无论数据集如何。相反,组织病理学显微镜全切片图像表现出更好的性能。同时,更深和更复杂的架构并不一定导致最佳性能。这一观察结果表明,ImageNet上的改进并不与医学成像任务平行。在医学领域内,网络架构的性能在模型家族中随着数据集的变化而变化。这表明,在特定模态下模型的性能可能对同一领域内另一个模态的结论并不具有决定性。本研究深入了解了深度学习技术在医学成像中的应用,并强调了在五种不同实验设置下预训练网络在不同医学成像数据集上的影响。
更新时间: 2024-09-02 06:31:48
领域: eess.IV,cs.AI,cs.CV,cs.LG
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in few-step generation. Moreover, through dedicated parameterizations, PeRFlow models inherit knowledge from the pretrained diffusion models. Thus, training converges fast and the obtained models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on pre-trained diffusion models. Code for training and inference is publicly released at https://github.com/magic-research/piecewise-rectified-flow
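A minimal sketch of a piecewise-reflow training target: sample a time inside one window, interpolate linearly between the window's endpoint states, and regress the student velocity toward the straight-line velocity. The student network and the source of the endpoint states (a teacher sampler) are placeholders, and the loss below is an illustrative reading of the reflow operation, not the paper's exact formulation.

```python
# Sketch: piecewise-reflow regression target inside one time window [t0, t1].
# student_v, z_start, and z_end are placeholders (teacher-provided endpoints).
import torch

def perflow_loss(student_v, z_start, z_end, t0, t1):
    """z_start/z_end: states at the window boundaries t0 < t1, shape (B, D)."""
    t = t0 + (t1 - t0) * torch.rand(z_start.shape[0], 1)  # sample inside window
    w = (t - t0) / (t1 - t0)
    z_t = (1 - w) * z_start + w * z_end                   # linear interpolation
    v_target = (z_end - z_start) / (t1 - t0)              # straight-line velocity
    return ((student_v(z_t, t) - v_target) ** 2).mean()
```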
Updated: 2024-09-02 06:27:05
标题: PeRFlow:分段修正流作为通用即插即用加速器
摘要: 我们提出了Piecewise Rectified Flow(PeRFlow),一种基于流的方法,用于加速扩散模型。PeRFlow将生成流的采样过程分成几个时间窗口,并通过回流操作在每个间隔中拉直轨迹,从而接近分段线性流。PeRFlow在几步生成中表现出卓越的性能。此外,通过专门的参数化,PeRFlow模型继承了预训练扩散模型的知识。因此,训练收敛快,获得的模型具有有利的转移能力,可作为通用即插即用加速器,与基于预训练扩散模型的各种工作流兼容。用于训练和推断的代码已公开发布。https://github.com/magic-research/piecewise-rectified-flow
更新时间: 2024-09-02 06:27:05
领域: cs.LG
Instant Adversarial Purification with Adversarial Consistency Distillation
Neural networks, despite their remarkable performance in widespread applications including image classification, are also known to be vulnerable to subtle adversarial noise. Although some diffusion-based purification methods have been proposed, such as DiffPure, those methods are time-consuming. In this paper, we propose One Step Control Purification (OSCP), a diffusion-based purification model that can purify an adversarial image in one Neural Function Evaluation (NFE) of the diffusion model. We use the Latent Consistency Model (LCM) and ControlNet for our one-step purification. OSCP is computationally friendly and time efficient compared to other diffusion-based purification methods; we achieve a defense success rate of 74.19\% on ImageNet, requiring only 0.1s per purification. Moreover, there is a fundamental incongruence between consistency distillation and adversarial perturbation. To address this ontological dissonance, we propose Gaussian Adversarial Noise Distillation (GAND), a novel consistency distillation framework that facilitates a more nuanced reconciliation of the latent space dynamics, effectively bridging the natural and adversarial manifolds. Our experiments show that GAND does not need a full fine-tune (FFT); PEFT, e.g., LoRA, is sufficient.
Updated: 2024-09-02 06:25:09
标题: 使用对抗一致性提炼进行即时对抗净化
摘要: 神经网络,尽管在广泛的应用中表现出卓越的性能,包括图像分类,但也被认为对微小的对抗性噪音具有脆弱性。尽管已经提出了一些基于扩散的净化方法,例如DiffPure,但这些方法非常耗时。在本文中,我们提出了一种基于扩散的净化模型One Step Control Purification (OSCP),它可以在扩散模型中的一个神经函数评估(NFE)中净化对抗性图像。我们使用Latent Consistency Model (LCM)和ControlNet来进行一步净化。与其他基于扩散的净化方法相比,OSCP在计算上更友好,时间效率更高;在ImageNet上我们实现了74.19\%的防御成功率,每次净化仅需0.1秒。此外,一致性提炼和对抗性扰动之间存在根本的不一致。为了解决这种本体上的不和谐,我们提出了一种新颖的一致性提炼框架——高斯对抗性噪音提炼(GAND),这有助于更细致地调和潜在空间动态,有效地连接了自然和对抗性流形。我们的实验表明,GAND并不需要全面微调(FFT);PEFT,例如LoRA已经足够。
更新时间: 2024-09-02 06:25:09
领域: cs.CV,cs.AI,cs.LG
Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning
The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms cannot balance solution quality and speed well when solving IPPS. In this paper, we propose a novel end-to-end Deep Reinforcement Learning (DRL) method. We model the IPPS problem as a Markov Decision Process (MDP) and employ a Heterogeneous Graph Neural Network (GNN) to capture the complex relationships among operations, machines, and jobs. To optimize the scheduling strategy, we use Proximal Policy Optimization (PPO). Experimental results show that, compared to traditional methods, our approach significantly improves solution efficiency and quality in large-scale IPPS instances, providing superior scheduling strategies for modern intelligent manufacturing systems.
Updated: 2024-09-02 06:18:30
标题: 通过基于图神经网络的深度强化学习解决集成工艺规划和调度问题
摘要: 集成工艺规划和调度(IPPS)问题结合了工艺路线规划和车间调度,以实现制造业的高效率和最大资源利用,这对于现代制造系统至关重要。传统方法使用混合整数线性规划(MILP)和启发式算法在解决IPPS时不能很好地平衡解决方案质量和速度。在本文中,我们提出了一种新颖的端到端深度强化学习(DRL)方法。我们将IPPS问题建模为马尔可夫决策过程(MDP),并采用异质图神经网络(GNN)来捕捉操作、机器和工作之间的复杂关系。为了优化调度策略,我们使用近端策略优化(PPO)。实验结果表明,与传统方法相比,我们的方法显著改善了大规模IPPS实例的解决方案效率和质量,为现代智能制造系统提供了卓越的调度策略。
更新时间: 2024-09-02 06:18:30
领域: math.OC,cs.AI,cs.LG
A computational transition for detecting correlated stochastic block models by low-degree polynomials
Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communities, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, and subsampling probability $s$. For the detection problem of distinguishing this model from a pair of independent Erd\H{o}s-R\'enyi graphs with the same edge density $\mathcal{G}(n,\tfrac{\lambda s}{n})$, we focus on tests based on \emph{low-degree polynomials} of the entries of the adjacency matrices, and we determine the threshold that separates the easy and hard regimes. More precisely, we show that this class of tests can distinguish these two models if and only if $s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is the Otter's constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum threshold. Our proof of low-degree hardness is based on a conditional variant of the low-degree likelihood calculation.
Updated: 2024-09-02 06:14:05
标题: 使用低次多项式检测相关随机块模型的计算性转变
摘要: 检测一对随机图中的相关性是一个基本的统计和计算问题,近年来得到了广泛研究。在这项工作中,我们考虑一对相关(稀疏)随机块模型$\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$,这些模型是从一个共同的父随机块模型$\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon)$中子采样得到的,其中$k=O(1)$表示对称社区数量,平均度$\lambda=O(1)$,散度参数$\epsilon$,子采样概率$s$。对于区分这个模型与具有相同边密度的独立Erd\H{o}s-R\'enyi图对$\mathcal{G}(n,\tfrac{\lambda s}{n})$的检测问题,我们专注于基于邻接矩阵条目的"低次多项式"的测试,并确定分隔简单和困难区域的阈值。更具体地,我们表明,当且仅当$s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$时,这类测试才能区分这两个模型,其中$\alpha\approx 0.338$是Otter常数,$\frac{1}{\lambda \epsilon^2}$是Kesten-Stigum阈值。我们对低次难度的证明基于低次似然计算的一个条件变体。
更新时间: 2024-09-02 06:14:05
领域: math.PR,cs.DS,cs.LG,math.ST,stat.TH,Primary 68Q87, Secondary 62M20
EagleEye: Attention to Unveil Malicious Event Sequences from Provenance Graphs
Securing endpoints is challenging due to the evolving nature of threats and attacks. With endpoint logging systems becoming mature, provenance-graph representations enable the creation of sophisticated behavior rules. However, adapting to the pace of emerging attacks is not scalable with rules. This led to the development of ML models capable of learning from endpoint logs. However, there are still open challenges: i) malicious patterns of malware are spread across long sequences of events, and ii) ML classification results are not interpretable. To address these issues, we develop and present EagleEye, a novel system that i) uses rich features from provenance graphs for behavior event representation, including command-line embeddings, ii) extracts long sequences of events and learns event embeddings, and iii) trains a lightweight Transformer model to classify behavior sequences as malicious or not. We evaluate and compare EagleEye against state-of-the-art baselines on two datasets, namely a new real-world dataset from a corporate environment, and the public DARPA dataset. On the DARPA dataset, at a false-positive rate of 1%, EagleEye detects $\approx$89% of all malicious behavior, outperforming two state-of-the-art solutions by an absolute margin of 38.5%. Furthermore, we show that the Transformer's attention mechanism can be leveraged to highlight the most suspicious events in a long sequence, thereby providing interpretation of malware alerts.
Updated: 2024-09-02 06:07:06
标题: 鹰眼:关注揭示源谱图中的恶意事件序列
摘要: 保护端点是有挑战的,因为威胁和攻击的不断发展。随着端点日志系统的成熟,溯源图表示使得创建复杂的行为规则成为可能。然而,随着新兴攻击的速度,通过规则来适应是不可扩展的。这导致了能够从端点日志中学习的ML模型的开发。然而,仍然存在一些挑战:i)恶意软件的恶意模式分布在长序列事件中,ii)ML分类结果不可解释。为了解决这些问题,我们开发并展示了EagleEye,这是一个新颖的系统,它:i)使用溯源图中的丰富特征来表示行为事件,包括命令行嵌入,ii)提取长序列事件并学习事件嵌入,iii)训练一个轻量级Transformer模型来将行为序列分类为恶意或非恶意。我们在两个数据集上评估和比较EagleEye与最先进的基线,即一个来自企业环境的新实际数据集和公共DARPA数据集。在DARPA数据集上,以1%的误报率,EagleEye检测到大约89%的恶意行为,超过了两种最先进的解决方案38.5%的绝对差距。此外,我们展示了Transformer的注意机制可以用来突出长序列中最可疑的事件,从而提供恶意软件警报的解释。
更新时间: 2024-09-02 06:07:06
领域: cs.CR
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents
Machine learning research, crucial for technological advancement and innovation, often faces significant challenges due to its inherent complexity, slow pace of experimentation, and the necessity for specialized expertise. Motivated by this, we present a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot), designed to enhance machine learning research productivity through the automatic generation and implementation of research ideas using Large Language Model (LLM) agents. The framework consists of three phases: research idea generation, experiment implementation, and implementation execution. First, existing research papers are used to generate hypotheses and experimental plans via IdeaAgent, powered by LLMs. Next, the implementation generation phase translates these plans into executables with ExperimentAgent. This phase leverages retrieved prototype code and optionally retrieves candidate models and data. Finally, the execution phase, also managed by ExperimentAgent, involves running experiments with mechanisms for human feedback and iterative debugging to enhance the likelihood of achieving executable research outcomes. We evaluate our framework on five machine learning research tasks, and the experimental results show the framework's potential to facilitate research progress and innovation.
Updated: 2024-09-02 05:55:06
标题: MLR-Copilot:基于大型语言模型代理的自主机器学习研究
摘要: 机器学习研究对于技术进步和创新至关重要,但常常面临着由于其固有的复杂性、实验的缓慢进展以及对专业知识的需求而带来的重大挑战。基于这一动机,我们提出了一个新的系统框架,即基于大型语言模型的自主机器学习研究(MLR-Copilot),旨在通过使用大型语言模型代理自动生成和实施研究思路,以增强机器学习研究的生产力。该框架包括三个阶段:研究思路生成、实验实施和实现执行。首先,利用现有研究论文通过LLM代理IdeaAgent生成假设和实验计划。接下来,实施生成阶段将这些计划转化为可执行的实验,通过ExperimentAgent实现。该阶段利用检索到的原型代码,并可选择检索候选模型和数据。最后,执行阶段,也由ExperimentAgent管理,涉及运行实验并通过人类反馈和迭代调试机制以增加实现可执行研究结果的可能性。我们在五个机器学习研究任务上评估了我们的框架,实验结果显示了框架促进研究进展和创新的潜力。
更新时间: 2024-09-02 05:55:06
领域: cs.AI,cs.CL,cs.LG
Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation
In this work, we study the task of "visually" translating scene text from a source language (e.g., Hindi) to a target language (e.g., English). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the source scene text, such as font, size, and background. There are several challenges associated with this task, such as translation with limited context, deciding between translation and transliteration, accommodating varying text lengths within fixed spatial boundaries, and preserving the font and background styles of the source scene text in the target language. To address this problem, we make the following contributions: (i) We study visual translation as a standalone problem for the first time in the literature. (ii) We present a cascaded framework for visual translation that combines state-of-the-art modules for scene text recognition, machine translation, and scene text synthesis as a baseline for the task. (iii) We propose a set of task-specific design enhancements to design a variant of the baseline that obtains performance improvements. (iv) Currently, the existing related literature lacks any comprehensive performance evaluation for this novel task. To fill this gap, we introduce several automatic and user-assisted evaluation metrics designed explicitly for evaluating visual translation. Further, we evaluate the presented baselines for translating scene text between Hindi and English. Our experiments demonstrate that although we can effectively perform visual translation over a large collection of scene text images, the presented baseline only partially addresses the challenges posed by visual translation tasks. We firmly believe that this new task and the limitations of existing models, as reported in this paper, should encourage further research in visual translation.
Updated: 2024-09-02 05:51:02
标题: 用我的语言展示给我这个世界:建立场景文本到场景文本翻译的第一个基准
摘要: 在这项工作中,我们研究了将场景文本从源语言(如印地语)翻译成目标语言(如英语)的“视觉”翻译任务。视觉翻译不仅涉及对场景文本的识别和翻译,还包括生成保留源场景文本视觉特征(如字体、大小和背景)的翻译图像。这个任务涉及一些挑战,比如有限上下文下的翻译、在固定空间边界内决定翻译和音译、适应不同长度的文本以及在目标语言中保留源场景文本的字体和背景风格。为了解决这个问题,我们提出了以下贡献:(i)我们在文献中第一次研究视觉翻译作为独立问题。(ii)我们提出了一个级联框架用于视觉翻译,该框架将场景文本识别、机器翻译和场景文本合成的最新模块结合起来作为任务的基线。(iii)我们提出了一系列任务特定的设计增强措施,设计了基线的一个变种以获得性能改进。(iv)目前,现有相关文献缺乏对这一新任务的全面性能评估。为了填补这一空白,我们引入了几种专门设计用于评估视觉翻译的自动和用户辅助评估指标。此外,我们评估了用于在印地语和英语之间翻译场景文本的基线。我们的实验表明,虽然我们可以有效地在大量场景文本图像上进行视觉翻译,但所提出的基线只部分解决了视觉翻译任务所面临的挑战。我们坚信这一新任务以及现有模型的局限性,正如本文报道的那样,应该促进对视觉翻译的进一步研究。
更新时间: 2024-09-02 05:51:02
领域: cs.CV,cs.AI,cs.CL,cs.MM
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
In this study, we identify an inefficient attention phenomenon in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs, suggesting a need for a sparser approach compared to textual data handling. To this end, we introduce FastV, a versatile plug-and-play method designed to optimize computational efficiency by learning adaptive attention patterns in early layers and pruning visual tokens in subsequent ones. Our evaluations demonstrate FastV's ability to dramatically reduce computational costs (e.g., a 45% reduction in FLOPs for LLaVA-1.5-13B) without sacrificing performance in a wide range of image and video understanding tasks. The computational efficiency and performance trade-off of FastV is highly customizable and Pareto-efficient. It can compress the FLOPs of a 13B-parameter model to achieve a lower budget than that of a 7B-parameter model while still maintaining superior performance. We believe FastV has practical value for the deployment of LVLMs in edge devices and commercial models. Code is released at https://github.com/pkunlp-icler/FastV.
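A minimal sketch of the pruning step FastV performs after an early layer: rank visual tokens by the attention they receive and keep only the top fraction for deeper layers. The keep ratio and tensor layout below are illustrative assumptions.

```python
# Sketch: FastV-style visual-token pruning by received attention.
# Keep ratio and tensor layout are illustrative assumptions.
import torch

def prune_visual_tokens(hidden, attn, vis_start, vis_end, keep_ratio=0.5):
    """hidden: (B, T, D); attn: (B, heads, T, T) from the pruning layer."""
    # Average attention each visual token receives, over heads and queries.
    recv = attn.mean(dim=1).mean(dim=1)[:, vis_start:vis_end]   # (B, n_vis)
    k = max(1, int(recv.shape[1] * keep_ratio))
    keep = recv.topk(k, dim=1).indices.sort(dim=1).values + vis_start
    batch = torch.arange(hidden.shape[0]).unsqueeze(1)
    kept_vis = hidden[batch, keep]                              # (B, k, D)
    # Reassemble: text tokens before/after stay, visual tokens are pruned.
    return torch.cat([hidden[:, :vis_start], kept_vis, hidden[:, vis_end:]], dim=1)

B, T, D = 2, 16, 8
hidden = torch.randn(B, T, D)
attn = torch.softmax(torch.randn(B, 4, T, T), dim=-1)
print(prune_visual_tokens(hidden, attn, vis_start=2, vis_end=12).shape)  # (2, 11, 8)
```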
Updated: 2024-09-02 05:48:54
标题: 一个图像在第2层后价值1/2代币:大规模视觉语言模型的即插即用推理加速
摘要: 在这项研究中,我们发现大型视觉-语言模型(LVLMs)中存在效率低下的注意力现象,尤其是在知名模型如LLaVA-1.5、QwenVL-Chat和Video-LLaVA中。我们发现,在流行的LVLMs的深层中,对视觉令牌的注意力计算极其低效,这表明与处理文本数据相比,需要一种更稀疏的方法。为此,我们引入了FastV,这是一种多功能即插即用方法,旨在通过学习早期层中的自适应注意力模式和在随后层中修剪视觉令牌来优化计算效率。我们的评估表明,FastV能够显著降低计算成本(例如,LLaVA-1.5-13B的FLOPs减少了45%),而不会在各种图像和视频理解任务中牺牲性能。FastV的计算效率和性能权衡是高度可定制和帕累托有效的。它可以通过将一个130亿参数模型的FLOPs压缩到低于一个70亿参数模型的预算,同时保持优越的性能。我们相信FastV在在边缘设备和商业模型中部署LVLMs具有实际价值。代码发布在https://github.com/pkunlp-icler/FastV。
更新时间: 2024-09-02 05:48:54
领域: cs.CV,cs.AI,cs.CL
Semantically Controllable Augmentations for Generalizable Robot Learning
Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training. However, collecting large real-world datasets is intractable due to high operational costs. For robot learning to generalize despite these challenges, it is essential to leverage sources of data or priors beyond the robot's direct experience. In this work, we posit that image-text generative models, which are pre-trained on large corpora of web-scraped data, can serve as such a data source. These generative models encompass a broad range of real-world scenarios beyond a robot's direct experience and can synthesize novel synthetic experiences that expose robotic agents to additional world priors aiding real-world generalization at no extra cost. In particular, our approach leverages pre-trained generative models as an effective tool for data augmentation. We propose a generative augmentation framework for semantically controllable augmentations and rapidly multiplying robot datasets while inducing rich variations that enable real-world generalization. Based on diverse augmentations of robot data, we show how scalable robot manipulation policies can be trained and deployed both in simulation and in unseen real-world environments such as kitchens and table-tops. By demonstrating the effectiveness of image-text generative models in diverse real-world robotic applications, our generative augmentation framework provides a scalable and efficient path for boosting generalization in robot learning at no extra human cost.
Updated: 2024-09-02 05:25:34
标题: 可控语义增强用于通用机器人学习
摘要: 机器人操作到未见过的真实场景的泛化需要在训练过程中接触各种数据集。然而,由于高昂的运营成本,收集大型真实世界数据集是不可行的。尽管面临这些挑战,机器人学习要实现泛化,就必须利用除机器人直接经验之外的数据来源或先验知识。在这项工作中,我们认为在大型网络抓取数据集上预训练的图像-文本生成模型可以作为这样一个数据源。这些生成模型涵盖了机器人直接经验之外的广泛真实世界场景,并能合成新颖的虚拟体验,使机器人代理能够接触到额外的世界先验知识,从而有助于实现无额外成本的真实世界泛化。 特别是,我们的方法利用预训练的生成模型作为数据增强的有效工具。我们提出了一个用于语义可控增强和快速扩增机器人数据集的生成增强框架,同时引入丰富的变化,从而实现真实世界泛化。基于对机器人数据的多样增强,我们展示了可在模拟和未见过的真实世界环境(如厨房和桌面)中训练和部署可扩展的机器人操作策略。通过展示图像-文本生成模型在各种真实世界机器人应用中的有效性,我们的生成增强框架为在机器人学习中提升泛化提供了一条可扩展和高效的路径,而无需额外的人力成本。
更新时间: 2024-09-02 05:25:34
领域: cs.RO,cs.AI,cs.CV,cs.LG
A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse
Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation. However, these generative techniques raise concerns about data misappropriation and intellectual property infringement. Adversarial attacks on machine learning models have been extensively studied, and a well-established body of research has extended these techniques as a benign metric to prevent the underlying misuse of generative AI. Current approaches to safeguarding images from manipulation by LDMs are limited by their reliance on model-specific knowledge and their inability to significantly degrade the semantic quality of generated images. In response to these shortcomings, we propose the Posterior Collapse Attack (PCA), based on the observation that VAEs suffer from posterior collapse during training. Our method minimizes dependence on white-box information about target models to get rid of the implicit reliance on model-specific knowledge. By accessing merely a small amount of LDM parameters, specifically only the VAE encoder of LDMs, our method causes a substantial semantic collapse in generation quality, particularly in perceptual consistency, and demonstrates strong transferability across various model architectures. Experimental results show that PCA achieves superior perturbation effects on the image generation of LDMs with lower runtime and VRAM. Our method outperforms existing techniques, offering a more robust and generalizable solution that helps alleviate the socio-technical challenges posed by the rapidly evolving landscape of generative AI.
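A grey-box sketch in the spirit of PCA, assuming access only to the VAE encoder: PGD drives the image's latent toward the prior mean so that downstream generation degrades. The loss and hyperparameters below are assumptions; the paper's exact objective may differ.

```python
# Grey-box sketch: perturb the image so its VAE latent collapses toward the
# prior mean. encode() is the only model access assumed; loss and PGD
# hyperparameters are illustrative, not the paper's exact objective.
import torch

def posterior_collapse_attack(encode, x, eps=8/255, alpha=1/255, steps=40):
    """encode: image -> latent mean; x: images in [0, 1]."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        z = encode(x_adv)
        loss = z.pow(2).mean()                 # push latents toward zero (prior mean)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()          # descend: shrink latents
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # stay in the eps-ball
            x_adv = x_adv.clamp(0, 1).detach()
    return x_adv
```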
Updated: 2024-09-02 05:25:06
标题: 一种灰盒攻击:针对基于潜在扩散模型的图像编辑的后验坍塌
摘要: 最近在生成式人工智能方面取得的进展,特别是潜在扩散模型(LDMs),已经彻底改变了图像合成和操作。然而,这些生成技术引发了关于数据挪用和知识产权侵权的担忧。对机器学习模型的敌对攻击已经得到广泛研究,而且已经建立了一个完善的研究体系,将这些技术扩展为一种良性度量,以防止生成式人工智能的潜在滥用。目前用于保护图像免受LDMs操作的方法受到其对模型特定知识的依赖以及无法明显降低生成图像的语义质量的限制。针对这些缺点,我们提出了基于观察到VAEs在训练期间出现后验坍缩的后验坍缩攻击(PCA)。我们的方法最大程度地减少对目标模型的白盒信息的依赖,从而摆脱对模型特定知识的隐式依赖。通过仅访问少量LDM参数,特别是LDMs的VAE编码器,我们的方法在生成质量上引起了显著的语义坍塌,特别是在感知一致性方面,并且在各种模型架构之间展示出强大的可转移性。实验结果表明,PCA在低运行时间和VRAM下实现了对LDMs图像生成的卓越扰动效果。我们的方法优于现有技术,提供了一个更为强大和具有普适性的解决方案,有助于缓解生成式人工智能快速发展的社会技术挑战。
更新时间: 2024-09-02 05:25:06
领域: cs.CV,cs.AI,cs.LG
XNet v2: Fewer Limitations, Better Results and Greater Universality
XNet introduces a wavelet-based X-shaped unified architecture for fully- and semi-supervised biomedical segmentation. So far, however, XNet still faces limitations, including performance degradation when images lack high-frequency (HF) information, underutilization of raw images, and insufficient fusion. To address these issues, we propose XNet v2, a low- and high-frequency complementary model. XNet v2 performs wavelet-based image-level complementary fusion, feeding the fusion results along with the raw images into three different sub-networks to construct a consistency loss. Furthermore, we introduce a feature-level fusion module to enhance the transfer of low-frequency (LF) and HF information. XNet v2 achieves state-of-the-art results in semi-supervised segmentation while maintaining competitive results in fully-supervised learning. More importantly, XNet v2 excels in scenarios where XNet fails. Compared to XNet, XNet v2 exhibits fewer limitations, better results, and greater universality. Extensive experiments on three 2D and two 3D datasets demonstrate the effectiveness of XNet v2. Code is available at https://github.com/Yanfeng-Zhou/XNetv2 .
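The wavelet-based LF/HF split that feeds the complementary sub-networks can be sketched with PyWavelets; the Haar wavelet here is an illustrative choice.

```python
# Sketch: wavelet split of an image into complementary low- and high-frequency
# streams, as in XNet-style models. The 'haar' wavelet is an example choice.
import numpy as np
import pywt

def lf_hf_split(img):
    """img: 2D array. Returns LF and HF reconstructions of the same size."""
    cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')
    zeros = np.zeros_like(cA)
    lf = pywt.idwt2((cA, (zeros, zeros, zeros)), 'haar')  # approximation only
    hf = pywt.idwt2((zeros, (cH, cV, cD)), 'haar')        # details only
    return lf, hf

img = np.random.rand(64, 64)
lf, hf = lf_hf_split(img)
print(np.allclose(lf + hf, img))   # True: the two streams are complementary
```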
Updated: 2024-09-02 05:20:18
标题: XNet v2: 更少的限制,更好的结果和更广泛的普适性
摘要: XNet引入了基于小波的X形统一架构,用于全监督和半监督生物医学分割。然而,到目前为止,XNet仍然面临一些限制,包括当图像缺乏高频信息时性能下降,原始图像的利用不足和融合不足。为了解决这些问题,我们提出了XNet v2,一个低频和高频互补模型。XNet v2执行基于小波的图像级互补融合,使用融合结果以及原始图像输入三个不同的子网络来构建一致性损失。此外,我们引入了一个特征级融合模块,以增强低频信息和高频信息的传输。XNet v2在半监督分割方面取得了最先进的成果,同时在全监督学习中保持了竞争力的结果。更重要的是,XNet v2在XNet失败的情况下表现出色。与XNet相比,XNet v2的限制更少,结果更好,具有更广泛的普适性。对三个2D和两个3D数据集进行的大量实验表明了XNet v2的有效性。代码可在https://github.com/Yanfeng-Zhou/XNetv2获取。
更新时间: 2024-09-02 05:20:18
领域: cs.CV,cs.AI
Pearl: A Production-ready Reinforcement Learning Agent
Reinforcement learning (RL) is a versatile framework for optimizing long-term goals. Although many real-world problems can be formalized with RL, learning and deploying a performant RL policy requires a system designed to address several important challenges, including the exploration-exploitation dilemma, partial observability, dynamic action spaces, and safety concerns. While the importance of these challenges has been well recognized, existing open-source RL libraries do not explicitly address them. This paper introduces Pearl, a Production-Ready RL software package designed to embrace these challenges in a modular way. In addition to presenting benchmarking results, we also highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io.
Updated: 2024-09-02 05:18:49
标题: 珍珠:一个可投入生产使用的强化学习代理
摘要: 强化学习(RL)是优化长期目标的多功能框架。尽管许多现实世界中的问题可以用RL形式化,但学习和部署一个高性能的RL策略需要一个系统来解决几个重要挑战,包括探索-利用困境、部分可观察性、动态行动空间和安全问题。尽管这些挑战的重要性已经得到了广泛认可,但现有的开源RL库并没有明确解决这些问题。本文介绍了Pearl,一个生产就绪的RL软件包,旨在以模块化方式解决这些挑战。除了展示基准测试结果外,我们还强调了Pearl正在进行的产业应用示例,以展示其在生产应用案例中的优势。Pearl已在GitHub上开源(github.com/facebookresearch/pearl),其官方网站是pearlagent.github.io。
更新时间: 2024-09-02 05:18:49
领域: cs.LG,cs.AI
A Framework for Synthetic Audio Conversations Generation using Large Language Models
In this paper, we introduce ConversaSynth, a framework designed to generate synthetic conversation audio using large language models (LLMs) with multiple persona settings. The framework first creates diverse and coherent text-based dialogues across various topics, which are then converted into audio using text-to-speech (TTS) systems. Our experiments demonstrate that ConversaSynth effectively generates highquality synthetic audio datasets, which can significantly enhance the training and evaluation of models for audio tagging, audio classification, and multi-speaker speech recognition. The results indicate that the synthetic datasets generated by ConversaSynth exhibit substantial diversity and realism, making them suitable for developing robust, adaptable audio-based AI systems.
Updated: 2024-09-02 05:09:46
标题: 使用大型语言模型生成合成音频对话的框架
摘要: 在本文中,我们介绍了ConversaSynth,这是一个旨在使用具有多个人设定的大型语言模型(LLMs)生成合成对话音频的框架。该框架首先通过各种话题创建多样且连贯的基于文本的对话,然后利用文本转语音(TTS)系统将其转换为音频。我们的实验表明,ConversaSynth能够有效生成高质量的合成音频数据集,这可以显著提升音频标记、音频分类和多说话者语音识别模型的训练和评估。结果表明,ConversaSynth生成的合成数据集具有相当大的多样性和真实感,使其适用于开发强大、适应性强的基于音频的人工智能系统。
更新时间: 2024-09-02 05:09:46
领域: cs.SD,cs.AI,eess.AS
Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
In this paper, we investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems, focusing on enhancing both efficiency and practicality. As distributed ML systems become integral for complex ML tasks, ensuring resilience against Byzantine failures, where workers may contribute incorrect updates due to malice or error, gains paramount importance. Our first contribution is the introduction of the Centered Trimmed Meta Aggregator (CTMA), an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels while requiring low computational demands. Additionally, we propose harnessing a recently developed gradient estimation technique based on a double-momentum strategy within the Byzantine context. Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process and reducing the reliance on numerous hyperparameters. The effectiveness of this technique is supported by theoretical insights within the stochastic convex optimization (SCO) framework and corroborated by empirical evidence.
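A toy sketch of a centered trimmed aggregation step: center the worker updates at a cheap baseline aggregate, drop those farthest from it, and average the rest. The baseline (coordinate-wise median) and trim fraction are illustrative assumptions, not the paper's exact CTMA construction.

```python
# Toy sketch of centered trimmed aggregation against Byzantine workers.
# Baseline aggregate and trim fraction are illustrative assumptions.
import numpy as np

def centered_trimmed_aggregate(updates, trim_frac=0.2):
    """updates: (n_workers, d) array of gradient updates."""
    center = np.median(updates, axis=0)                 # cheap baseline aggregate
    dists = np.linalg.norm(updates - center, axis=1)
    n_keep = max(1, int(len(updates) * (1 - trim_frac)))
    keep = np.argsort(dists)[:n_keep]                   # closest to the center
    return updates[keep].mean(axis=0)

honest = np.random.default_rng(1).normal(0.0, 0.1, size=(8, 3))
byzantine = np.full((2, 3), 50.0)                       # malicious outliers
agg = centered_trimmed_aggregate(np.vstack([honest, byzantine]))
print(agg)   # close to the honest mean; the outliers are trimmed away
```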
Updated: 2024-09-02 04:51:17
标题: 容错机器学习:高效的元聚合和同步训练
摘要: 在这篇论文中,我们研究了分布式机器学习系统中拜占庭容错训练的挑战性框架,重点是提高效率和实用性。随着分布式机器学习系统成为复杂机器学习任务的重要组成部分,确保对抗拜占庭故障(工作者可能因恶意或错误而贡献不正确的更新)的弹性变得至关重要。我们的第一个贡献是引入了居中修剪元聚合器(CTMA),一种高效的元聚合器,将基线聚合器升级到最佳性能水平,同时需要较低的计算需求。此外,我们提出在拜占庭环境中利用最近开发的基于双动量策略的梯度估计技术。我们的论文突出了这种技术在拜占庭容错训练中的理论和实际优势,特别是在简化调整过程和减少对大量超参数的依赖方面。这种技术的有效性得到了随机凸优化(SCO)框架内的理论见解的支持,并得到了实证证据的证实。
更新时间: 2024-09-02 04:51:17
领域: cs.LG
Large Language Models for Automatic Detection of Sensitive Topics
Sensitive information detection is crucial in content moderation to maintain safe online communities. Assisting in this traditionally manual process could relieve human moderators from overwhelming and tedious tasks, allowing them to focus solely on flagged content that may pose potential risks. Rapidly advancing large language models (LLMs) are known for their capability to understand and process natural language and so present a potential solution to support this process. This study explores the capabilities of five LLMs for detecting sensitive messages in the mental well-being domain within two online datasets and assesses their performance in terms of accuracy, precision, recall, F1 scores, and consistency. Our findings indicate that LLMs have the potential to be integrated into the moderation workflow as a convenient and precise detection tool. The best-performing model, GPT-4o, achieved an average accuracy of 99.5\% and an F1-score of 0.99. We discuss the advantages and potential challenges of using LLMs in the moderation workflow and suggest that future research should address the ethical considerations of utilising this technology.
Updated: 2024-09-02 04:50:42
标题: 大型语言模型用于自动检测敏感话题
摘要: 敏感信息检测在内容审核中至关重要,以维护安全的在线社区。协助传统手动过程可能减轻人工审核员的繁重和乏味任务,使他们能够专注于可能带来潜在风险的被标记内容。快速发展的大型语言模型(LLMs)以其理解和处理自然语言的能力而闻名,因此成为支持这一过程的潜在解决方案。本研究探讨了五种LLMs在心理健康领域内两个在线数据集中检测敏感信息的能力,并评估其准确性、精确度、召回率、F1分数和一致性的表现。我们的研究结果表明,LLMs有潜力作为方便且精确的检测工具整合到审核工作流程中。表现最佳的模型GPT-4o平均准确率达到99.5%,F1分数为0.99。我们讨论了在审核工作流程中使用LLMs的优势和潜在挑战,并建议未来研究应解决利用这项技术的伦理考虑。
更新时间: 2024-09-02 04:50:42
领域: cs.CL,cs.AI,J.6
Image Colorization: A Survey and Dataset
Image colorization estimates RGB colors for grayscale images or video frames to improve their aesthetic and perceptual quality. Over the last decade, deep learning techniques for image colorization have significantly progressed, necessitating a systematic survey and benchmarking of these techniques. This article presents a comprehensive survey of recent state-of-the-art deep learning-based image colorization techniques, describing their fundamental block architectures, inputs, optimizers, loss functions, training protocols, training data, etc. It categorizes the existing colorization techniques into seven classes and discusses important factors governing their performance, such as benchmark datasets and evaluation metrics. We highlight the limitations of existing datasets and introduce a new dataset specific to colorization. We perform an extensive experimental evaluation of existing image colorization methods using both existing datasets and our proposed one. Finally, we discuss the limitations of existing methods and recommend possible solutions and future research directions for this rapidly evolving topic of deep image colorization. The dataset and codes for evaluation are publicly available at https://github.com/saeed-anwar/ColorSurvey.
Updated: 2024-09-02 04:15:50
标题: 图像着色:调查与数据集
摘要: 图像着色是为灰度图像或视频帧估计RGB颜色,以提高其审美和感知质量。在过去的十年里,用于图像着色的深度学习技术取得了显著进展,需要对这些技术进行系统调查和基准测试。本文全面调查了最新的基于深度学习的图像着色技术,描述了它们的基本模块架构、输入、优化器、损失函数、训练协议、训练数据等。它将现有的着色技术分类为七类,并讨论了影响其性能的重要因素,如基准数据集和评估指标。我们强调现有数据集的局限性,并介绍了一个特定于着色的新数据集。我们对现有的图像着色方法进行了广泛的实验评估,使用了现有数据集和我们提出的数据集。最后,我们讨论了现有方法的局限性,并推荐可能的解决方案和未来的研究方向,以应对这一快速发展的深度图像着色主题。评估数据集和代码可在https://github.com/saeed-anwar/ColorSurvey 上公开获取。
更新时间: 2024-09-02 04:15:50
领域: cs.CV,cs.AI,cs.LG,eess.IV
An Idiosyncrasy of Time-discretization in Reinforcement Learning
Many reinforcement learning algorithms are built on an assumption that an agent interacts with an environment over fixed-duration, discrete time steps. However, physical systems are continuous in time, requiring a choice of time-discretization granularity when digitally controlling them. Furthermore, such systems do not wait for decisions to be made before advancing the environment state, necessitating the study of how the choice of discretization may affect a reinforcement learning algorithm. In this work, we consider the relationship between the definitions of the continuous-time and discrete-time returns. Specifically, we acknowledge an idiosyncrasy with naively applying a discrete-time algorithm to a discretized continuous-time environment, and note how a simple modification can better align the return definitions. This observation is of practical consideration when dealing with environments where time-discretization granularity is a choice, or situations where such granularity is inherently stochastic.
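The alignment alluded to above can be made concrete numerically: rescaling the discount as gamma**dt and weighting each reward by dt keeps the discrete return stable as the discretization is refined, whereas the naive unweighted sum grows roughly like 1/dt. The reward function below is an arbitrary example.

```python
# Numeric illustration: with discount gamma**dt and rewards weighted by dt,
# the discrete return approximates the continuous integral return across
# step sizes. The reward signal is an arbitrary example.
import numpy as np

def discrete_return(reward, gamma, dt, horizon):
    ts = np.arange(0.0, horizon, dt)
    return sum((gamma ** t) * reward(t) * dt for t in ts)

reward = lambda t: np.exp(-t)       # arbitrary continuous reward signal
for dt in (1.0, 0.1, 0.01):
    print(dt, discrete_return(reward, gamma=0.9, dt=dt, horizon=10.0))
# The naive sum without the dt factor would grow roughly like 1/dt instead.
```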
Updated: 2024-09-02 04:13:50
标题: 一种强化学习中时间离散化的特殊性
摘要: 许多强化学习算法都建立在一个假设之上,即一个代理和环境在固定时长的离散时间步内进行交互。然而,物理系统在时间上是连续的,在数字控制它们时需要选择时间离散化的粒度。此外,这样的系统不会等待决策再推进环境状态,因此需要研究离散化选择可能如何影响强化学习算法。在这项工作中,我们考虑了连续时间和离散时间回报定义之间的关系。具体而言,我们认识到了将离散时间算法天真地应用于离散化连续时间环境时的特殊性,并注意到一个简单的修改如何能更好地对齐回报的定义。当处理时间离散化粒度是一个选择的环境,或者这种粒度本质上是随机的情况时,这一观察是一个实际上的考虑因素。
更新时间: 2024-09-02 04:13:50
领域: cs.LG,cs.AI,I.2.6; I.2.9
Unicorn: U-Net for Sea Ice Forecasting with Convolutional Neural Ordinary Differential Equations
Sea ice at the North Pole is vital to global climate dynamics. However, accurately forecasting sea ice poses a significant challenge due to the intricate interactions among multiple variables. Leveraging their ability to seamlessly integrate multiple inputs and deliver powerful performance, neural networks have become the tool of choice in many sea ice forecasting studies. This paper introduces a novel deep architecture named Unicorn, designed to forecast weekly sea ice. Our model integrates multiple time-series images within its architecture to enhance its forecasting performance. Moreover, we incorporate a bottleneck layer within the U-Net architecture, serving as a neural ordinary differential equation with convolution operations, to capture the spatiotemporal dynamics of latent variables. Through analysis of real data spanning from 1998 to 2021, our proposed model demonstrates significant improvements over state-of-the-art models in the sea ice concentration forecasting task, achieving an average MAE improvement of 12% compared to benchmark models. Additionally, our method outperforms existing approaches in sea ice extent forecasting, achieving a classification performance improvement of approximately 18%. These experimental results show the superiority of our proposed model.
Updated: 2024-09-02 03:37:46
标题: 独角兽:用卷积神经普通微分方程进行海冰预测的U-Net
摘要: 北极海冰对全球气候动态至关重要。然而,由于多个变量之间复杂的相互作用,准确预测海冰面临着重大挑战。许多研究利用整合多个输入和强大性能的能力,已经转向神经网络进行海冰预测。本文介绍了一种名为Unicorn的新型深层架构,旨在预测每周海冰。我们的模型在其架构中整合了多个时间序列图像,以提高其预测性能。此外,我们在U-Net架构中引入了一个瓶颈层,作为具有卷积操作的神经常微分方程,用于捕获潜在变量的时空动态。通过涵盖从1998年到2021年的数据集的实际数据分析,我们提出的模型在海冰浓度预测任务中表现出显著的改进。与基准模型相比,它实现了平均MAE改进12%。此外,我们的方法在海冰范围预测方面优于现有方法,实现了约18%的分类性能改进。这些实验结果展示了我们提出的模型的优越性。
更新时间: 2024-09-02 03:37:46
领域: cs.AI,physics.ao-ph
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
With the profound development of large language models(LLMs), their safety concerns have garnered increasing attention. However, there is a scarcity of Chinese safety benchmarks for LLMs, and the existing safety taxonomies are inadequate, lacking comprehensive safety detection capabilities in authentic Chinese scenarios. In this work, we introduce CHiSafetyBench, a dedicated safety benchmark for evaluating LLMs' capabilities in identifying risky content and refusing answering risky questions in Chinese contexts. CHiSafetyBench incorporates a dataset that covers a hierarchical Chinese safety taxonomy consisting of 5 risk areas and 31 categories. This dataset comprises two types of tasks: multiple-choice questions and question-answering, evaluating LLMs from the perspectives of risk content identification and the ability to refuse answering risky questions respectively. Utilizing this benchmark, we validate the feasibility of automatic evaluation as a substitute for human evaluation and conduct comprehensive automatic safety assessments on mainstream Chinese LLMs. Our experiments reveal the varying performance of different models across various safety domains, indicating that all models possess considerable potential for improvement in Chinese safety capabilities. Our dataset is publicly available at https://github.com/UnicomAI/UnicomBenchmark/tree/main/CHiSafetyBench.
Updated: 2024-09-02 03:37:35
标题: CHiSafetyBench:大型语言模型的中国层次安全基准
摘要: 随着大型语言模型(LLMs)的深入发展,它们的安全性问题越来越受到关注。然而,目前缺乏针对LLMs的中文安全基准,并且现有的安全分类法不足,缺乏在真实的中文场景中具有全面安全检测能力。在这项工作中,我们介绍了CHiSafetyBench,一个专门用于评估LLMs在识别风险内容和拒绝回答风险问题能力的中文环境中的基准。CHiSafetyBench包括一个数据集,涵盖了一个包含5个风险领域和31个分类的层次化中文安全分类法。该数据集包括两种任务:多项选择题和问答题,分别从风险内容识别和拒绝回答风险问题的能力角度评估LLMs。利用这一基准,我们验证了自动评估作为人工评估替代的可行性,并对主流中文LLMs进行了全面的自动安全评估。我们的实验揭示了不同模型在各种安全领域的表现差异,表明所有模型在中文安全能力方面都具有相当大的改进潜力。我们的数据集可以在以下网址公开获取:https://github.com/UnicomAI/UnicomBenchmark/tree/main/CHiSafetyBench。
更新时间: 2024-09-02 03:37:35
领域: cs.CL,cs.AI
Development of Occupancy Prediction Algorithm for Underground Parking Lots
The core objective of this study is to address the perception challenges faced by autonomous driving in adverse environments like basements. Initially, this paper commences with data collection in an underground garage. A simulated underground garage model is established within the CARLA simulation environment, and SemanticKITTI format occupancy ground truth data is collected in this simulated setting. Subsequently, the study integrates a Transformer-based Occupancy Network model to complete the occupancy grid prediction task within this scenario. A comprehensive BEV perception framework is designed to enhance the accuracy of neural network models in dimly lit, challenging autonomous driving environments. Finally, experiments validate the accuracy of the proposed solution's perception performance in basement scenarios. The proposed solution is tested on our self-constructed underground garage dataset, SUSTech-COE-ParkingLot, yielding satisfactory results.
Updated: 2024-09-02 03:31:49
Categories: cs.RO,cs.AI
ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model
Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mutation or filtering techniques, which inefficiently treated all option combinations as having equal potential for vulnerabilities, wasting considerable time on non-vulnerable targets and resulting in low testing efficiency. In this paper, we use carefully designed prompt engineering to drive a large language model (LLM) to predict high-risk option combinations (i.e., those more likely to contain vulnerabilities) and perform fuzz testing automatically without human intervention. We developed a tool called ProphetFuzz and evaluated it on a dataset comprising 52 programs collected from three related studies. The entire experiment consumed 10.44 CPU years. ProphetFuzz successfully predicted 1748 high-risk option combinations at an average cost of only $8.69 per program. Results show that after 72 hours of fuzzing, ProphetFuzz discovered 364 unique vulnerabilities associated with 12.30% of the predicted high-risk option combinations, 32.85% more than the state of the art found in the same timeframe. Additionally, using ProphetFuzz, we conducted persistent fuzzing on the latest versions of these programs, uncovering 140 vulnerabilities, with 93 confirmed by developers and 21 awarded CVE numbers.
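The pipeline described above might be approximated as follows; the prompt wording, the JSON output format, and the use of AFL as the fuzzer are illustrative assumptions, not ProphetFuzz's actual implementation.

```python
import json
import subprocess

def predict_high_risk_combos(llm, manpage: str, top_k: int = 30):
    """Ask an LLM to rank option combinations by vulnerability risk.
    `llm` is any prompt -> text callable; the prompt wording is illustrative
    and assumes the model returns well-formed JSON."""
    prompt = (
        f"Given this program documentation, list the {top_k} option "
        "combinations most likely to trigger memory-safety bugs, as a JSON "
        "list of string lists.\n\n" + manpage
    )
    return json.loads(llm(prompt))

def fuzz_combos(binary: str, combos, budget_s: int = 3600):
    """Run one time-boxed AFL campaign per predicted combination."""
    for combo in combos:
        try:
            subprocess.run(["afl-fuzz", "-i", "seeds", "-o", "findings",
                            "--", binary, *combo, "@@"], timeout=budget_s)
        except subprocess.TimeoutExpired:
            pass  # budget exhausted for this combination; move on
```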
Updated: 2024-09-02 03:31:08
Categories: cs.CR
Statically Contextualizing Large Language Models with Typed Holes
Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate context, particularly when working with definitions that are neither in the training data nor near the cursor. This paper demonstrates that tight integration with the type and binding structure of a language, as exposed by its language server, can address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server identifies the type and typing context of the hole being filled, even in the presence of errors, ensuring that a meaningful program sketch is always available. This allows prompting with codebase-wide contextual information that is not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications. These applications serve as challenge problems due to their reliance on application-specific data structures. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel, we duplicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.
Updated: 2024-09-02 03:29:00
Categories: cs.PL,cs.AI,cs.SE,D.3.0
On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift
Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.
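A toy sketch of the setting the theory speaks to: a linear head trained with per-example gradient clipping and Gaussian noise (simplified DP-SGD) on top of frozen public features; the clipping norm, noise scale, and loop structure are arbitrary illustrative choices, not the paper's protocol.

```python
import torch

def dp_sgd_linear_probe(feats, labels, epochs=5, lr=0.1, clip=1.0, sigma=1.0):
    """Train a private linear classifier on frozen public features using
    per-example gradient clipping plus Gaussian noise (simplified DP-SGD).
    feats: (n, d) float tensor; labels: (n,) long tensor."""
    n, d = feats.shape
    k = int(labels.max()) + 1
    w = torch.zeros(d, k, requires_grad=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        grads = []
        for i in range(n):                       # per-example gradients
            loss = loss_fn(feats[i:i + 1] @ w, labels[i:i + 1])
            g, = torch.autograd.grad(loss, w)
            g = g * min(1.0, clip / (g.norm() + 1e-12))   # clip to norm `clip`
            grads.append(g)
        noisy = torch.stack(grads).sum(0) + sigma * clip * torch.randn(d, k)
        with torch.no_grad():
            w -= lr * noisy / n                  # noisy average gradient step
    return w.detach()
```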
Updated: 2024-09-02 03:26:58
Categories: cs.LG,cs.CR,stat.ML
ToolACE: Winning the Points of LLM Function Calling
Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.
Updated: 2024-09-02 03:19:56
Categories: cs.LG,cs.AI,cs.CL
MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT
We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation. Symbolic music generation primarily encompasses the preprocessing of music data and the implementation of a deep learning framework. Current techniques dedicated to symbolic music generation generally encounter two significant challenges: training data that lacks information about chords and scales, and the need for a specially designed model architecture adapted to the unique format of symbolic music representation. In this paper, we address these problems by introducing a new symbolic music representation built on the MusicLang chord analysis model, and we propose our MMT-BERT architecture adapted to this representation. To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator and incorporate the relativistic standard loss. This approach, supported by the in-depth understanding of symbolic music encoded within MusicBERT, strengthens the consonance and humanity of the music generated by our method. Experimental results demonstrate the effectiveness of our approach, which closely follows state-of-the-art methods.
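The relativistic standard loss referenced above compares real and fake discriminator logits directly (following Jolicoeur-Martineau's RSGAN); a compact sketch, assuming the fine-tuned MusicBERT discriminator emits one scalar logit per sequence:

```python
import torch
import torch.nn.functional as F

def rsgan_losses(d_real: torch.Tensor, d_fake: torch.Tensor):
    """Relativistic standard GAN losses from scalar discriminator logits.
    D learns P(real is more realistic than fake); G learns the reverse."""
    ones = torch.ones_like(d_real)
    d_loss = F.binary_cross_entropy_with_logits(d_real - d_fake, ones)
    g_loss = F.binary_cross_entropy_with_logits(d_fake - d_real, ones)
    return d_loss, g_loss
```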
Updated: 2024-09-02 03:18:56
Categories: cs.SD,cs.AI,eess.AS
Generalized Continuous-Time Models for Nesterov's Accelerated Gradient Methods
Recent research has indicated a substantial rise in interest in understanding Nesterov's accelerated gradient methods via their continuous-time models. However, most existing studies focus on specific classes of Nesterov's methods, which hinders the attainment of an in-depth understanding and a unified perspective. To address this deficit, we present generalized continuous-time models that cover a broad range of Nesterov's methods, including those previously studied under existing continuous-time frameworks. Our key contributions are as follows. First, we identify the convergence rates of the generalized models, eliminating the need to determine the convergence rate for any specific continuous-time model derived from them. Second, we show that six existing continuous-time models are special cases of our generalized models, thereby positioning our framework as a unifying tool for analyzing and understanding these models. Third, we design a restart scheme for Nesterov's methods based on our generalized models and show that it ensures a monotonic decrease in objective function values. Owing to the broad applicability of our models, this scheme can be applied to a broader class of Nesterov's methods than the original restart scheme. Fourth, we uncover a connection between our generalized models and gradient flow in continuous time, showing that the accelerated convergence rates of our generalized models can be attributed to a time reparametrization in gradient flow. Numerical experiment results are provided to support our theoretical analyses and results.
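For orientation, the canonical special case covered by such continuous-time frameworks is the Su-Boyd-Candès model of Nesterov's method for convex $f$:

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0, \qquad f\big(X(t)\big) - f(x^\ast) = \mathcal{O}(1/t^{2}).$$

The generalized models in this paper subsume ODEs of this type; the specific damping coefficient $3/t$ shown here comes from the prior literature, not the paper's general form.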
Updated: 2024-09-02 02:56:10
Categories: math.OC,cs.LG
ViRED: Prediction of Visual Relations in Engineering Drawings
To accurately understand engineering drawings, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly focus on text as the main modality, which is not suitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inherently limits its capacity to assess relationships among all entity pairs in the drawings. To address this issue, we propose a vision-based relation detection model, named ViRED, to identify the associations between tables and circuits in electrical engineering drawings. Our model mainly consists of three parts: a vision encoder, an object encoder, and a relation decoder. We implement ViRED using PyTorch to evaluate its performance. To validate the efficacy of ViRED, we conduct a series of experiments. The experimental results indicate that, on our engineering drawing dataset, the approach attains an accuracy of 96% in the relation prediction task, a substantial improvement over existing methodologies. The results also show that ViRED can run inference quickly even when there are numerous objects in a single engineering drawing.
Updated: 2024-09-02 02:42:34
Categories: cs.CV,cs.AI
Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction
Motion prediction plays an essential role in autonomous driving systems, enabling autonomous vehicles to achieve more accurate local path planning and driving decisions based on predictions of the surrounding vehicles. However, existing methods neglect the potential missing values caused by object occlusion, perception failures, etc., which inevitably degrades trajectory prediction performance in real traffic scenarios. To address this limitation, we propose a novel end-to-end framework for incomplete vehicle trajectory prediction, named Multi-scale Temporal Fusion Transformer (MTFT), which consists of a Multi-scale Attention Head (MAH) and a Continuity Representation-guided Multi-scale Fusion (CRMF) module. Specifically, the MAH leverages the multi-head attention mechanism to capture multi-scale motion representations of the trajectory from different temporal granularities in parallel, thus mitigating the adverse effect of missing values on prediction. The multi-scale motion representations are then input into the CRMF module and fused to obtain a robust temporal feature of the vehicle. During fusion, the continuity representation of vehicle motion is first extracted across time steps to guide the fusion, ensuring that the resulting temporal feature incorporates both detailed information and the overall trend of vehicle motion, which facilitates accurate decoding of a future trajectory consistent with the vehicle's motion trend. We evaluate the proposed model on four datasets derived from highway and urban traffic scenarios. The experimental results demonstrate its superior performance in the incomplete vehicle trajectory prediction task compared with state-of-the-art models, e.g., a comprehensive performance improvement of more than 39% on the HighD dataset.
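A rough sketch of the multi-scale idea: the same trajectory is summarized at several temporal granularities, and each granularity feeds its own attention head. Average pooling as the down-sampling operator and the particular scale set are our assumptions, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class MultiScaleAttentionHead(nn.Module):
    """One self-attention head per temporal granularity of the trajectory."""
    def __init__(self, dim: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.heads = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
            for _ in scales
        )

    def forward(self, traj):                 # traj: (B, T, dim); missing steps
        outs = []                            # may simply be zero-filled
        for s, head in zip(self.scales, self.heads):
            x = traj if s == 1 else nn.functional.avg_pool1d(
                traj.transpose(1, 2), kernel_size=s, stride=s).transpose(1, 2)
            outs.append(head(x, x, x)[0].mean(dim=1))    # (B, dim) per scale
        return torch.stack(outs, dim=1)      # (B, num_scales, dim) to fuse
```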
Updated: 2024-09-02 02:36:18
Categories: cs.CV,cs.AI
United We Stand: Decentralized Multi-Agent Planning With Attrition
Decentralized planning is a key element of cooperative multi-agent systems for information gathering tasks. However, despite the high frequency of agent failures in realistic large deployment scenarios, current approaches perform poorly in the presence of failures, by not converging at all, and/or by making very inefficient use of resources (e.g. energy). In this work, we propose Attritable MCTS (A-MCTS), a decentralized MCTS algorithm capable of timely and efficient adaptation to changes in the set of active agents. It is based on the use of a global reward function for the estimation of each agent's local contribution, and regret matching for coordination. We evaluate its effectiveness in realistic data-harvesting problems under different scenarios. We show both theoretically and experimentally that A-MCTS enables efficient adaptation even under high failure rates. Results suggest that, in the presence of frequent failures, our solution improves substantially over the best existing approaches in terms of global utility and scalability.
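Regret matching, the coordination mechanism named above, is a standard procedure; a minimal self-contained version follows (the global-reward estimation that would feed `rewards` in A-MCTS is not shown).

```python
import random

class RegretMatcher:
    """Chooses among actions in proportion to positive cumulative regret."""
    def __init__(self, n_actions: int):
        self.regret = [0.0] * n_actions

    def policy(self):
        pos = [max(r, 0.0) for r in self.regret]
        z, n = sum(pos), len(pos)
        return [p / z for p in pos] if z > 0 else [1.0 / n] * n

    def act(self):
        return random.choices(range(len(self.regret)),
                              weights=self.policy())[0]

    def update(self, chosen: int, rewards):
        """rewards[a] = counterfactual reward of action a this round."""
        for a, r in enumerate(rewards):
            self.regret[a] += r - rewards[chosen]
```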
Updated: 2024-09-02 02:35:50
Categories: cs.MA,cs.AI
Persuasion Games using Large Language Models
Large Language Models (LLMs) have emerged as formidable instruments capable of comprehending and producing human-like text. This paper explores the potential of LLMs to shape user perspectives and subsequently influence their decisions on particular tasks. This capability finds applications in diverse domains such as investment, credit cards, insurance, and retail, where LLMs assist users in selecting appropriate insurance policies, investment plans, and credit cards, as well as in behavioral change support systems (BCSS). We present a sophisticated multi-agent framework wherein a consortium of agents operates in a collaborative manner. The primary agent engages directly with user agents through persuasive dialogue, while the auxiliary agents perform tasks such as information retrieval, response analysis, development of persuasion strategies, and validation of facts. Empirical evidence from our experiments demonstrates that this collaborative methodology significantly enhances the persuasive efficacy of the LLM. We continuously analyze the user agent's resistance to persuasive efforts and counteract it by employing a combination of rule-based and LLM-based resistance-persuasion mapping techniques. We employ simulated personas and generate conversations in the insurance, banking, and retail domains to evaluate the proficiency of LLMs in recognizing, adjusting to, and influencing various personality types. Concurrently, we examine the resistance mechanisms employed by LLM-simulated personas. Persuasion is quantified via measurable surveys before and after interaction, LLM-generated scores on the conversation, and user decisions (purchase or non-purchase).
Updated: 2024-09-02 02:30:51
Categories: cs.AI,cs.CL
On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks
This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate $\mathcal{O}((WL)^{-2s/d})$ up to logarithmic factors when $p=q=\infty$, and the rate $\mathcal{O}(L^{-2s/d})$ for networks with fixed width when the Sobolev embedding condition $1/q - 1/p < s/d$ holds. We generalize these results by showing that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.
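Restated in the notation above (suppressing constants and logarithmic factors), the main result reads:

$$\inf_{\phi \in \mathcal{NN}(W,L)} \|f - \phi\|_{L^p([0,1]^d)} \lesssim (WL)^{-2s/d}\,\|f\|_{\mathcal{B}^s_{q,r}([0,1]^d)} \quad \text{whenever } \tfrac{1}{q} - \tfrac{1}{p} < \tfrac{s}{d},$$

where $\mathcal{NN}(W,L)$ denotes ReLU networks of width $W$ and depth $L$; this symbol for the network class is introduced here only for the restatement.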
Updated: 2024-09-02 02:26:01
Categories: stat.ML,cs.LG,cs.NA,math.NA
Beta-CoRM: A Bayesian Approach for $n$-gram Profiles Analysis
$n$-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. To address this, we design a novel class of Bayesian generative models for $n$-gram profiles used as binary attributes. The flexibility of the proposed modelling allows for a straightforward approach to feature selection in the generative model. Furthermore, a slice sampling algorithm is derived for a fast inferential procedure, which is applied to synthetic and real data scenarios and shows that feature selection can improve classification accuracy.
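The binary $n$-gram profile these models consume is straightforward to construct; a small sketch with character-level $n$-grams as an illustrative tokenization:

```python
def ngram_profile(sequences, n=3):
    """Map each sequence to a binary vector: does n-gram g occur in it?"""
    vocab = sorted({seq[i:i + n] for seq in sequences
                    for i in range(len(seq) - n + 1)})
    index = {g: j for j, g in enumerate(vocab)}
    profiles = []
    for seq in sequences:
        row = [0] * len(vocab)
        for i in range(len(seq) - n + 1):
            row[index[seq[i:i + n]]] = 1     # presence, not counts
        profiles.append(row)
    return profiles, vocab

profiles, vocab = ngram_profile(["GET /index", "GET /admin", "POST /admin"])
```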
Updated: 2024-09-02 02:24:21
Categories: stat.ME,cs.CR,stat.AP
Infiltrating the Sky: Data Delay and Overflow Attacks in Earth Observation Constellations
Low Earth Orbit (LEO) Earth Observation (EO) satellites have changed the way we monitor Earth. Acting like moving cameras, EO satellites are formed in constellations with different missions and priorities, and capture vast data that needs to be transmitted to the ground for processing. However, EO satellites have very limited downlink communication capability, limited by transmission bandwidth, number and location of ground stations, and small transmission windows due to high velocity satellite movement. To optimize resource utilization, EO constellations are expected to share communication spectrum and ground stations for maximum communication efficiency. In this paper, we investigate a new attack surface exposed by resource competition in EO constellations, targeting the delay or drop of Earth monitoring data using legitimate EO services. Specifically, an attacker can inject high-priority requests to temporarily preempt low-priority data transmission windows. Furthermore, we show that by utilizing predictable satellite dynamics, an attacker can intelligently target critical data from low-priority satellites, either delaying its delivery or irreversibly dropping the data. We formulate two attacks, the data delay attack and the data overflow attack, design algorithms to assist attackers in devising attack strategies, and analyze their feasibility or optimality in typical scenarios. We then conduct trace-driven simulations using real-world satellite images and orbit data to evaluate the success probability of launching these attacks under realistic satellite communication settings. We also discuss possible defenses against these attacks.
Updated: 2024-09-02 02:20:13
Categories: cs.NI,cs.CR,cs.ET
Improving Adaptivity via Over-Parameterization in Sequence Models
It is well known that the eigenfunctions of a kernel play a crucial role in kernel regression. Through several examples, we demonstrate that even with the same set of eigenfunctions, the order of these functions significantly impacts regression outcomes. Simplifying the model by diagonalizing the kernel, we introduce over-parameterized gradient descent in the realm of sequence models to capture the effects of various orderings of a fixed set of eigenfunctions. This method is designed to explore the impact of varying eigenfunction orders. Our theoretical results show that the over-parameterized gradient flow can adapt to the underlying structure of the signal and significantly outperform the vanilla gradient flow method. Moreover, we demonstrate that deeper over-parameterization can further enhance the generalization capability of the model. These results not only provide a new perspective on the benefits of over-parameterization but also offer insights into the adaptivity and generalization potential of neural networks beyond the kernel regime.
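In the diagonalized (sequence-model) setting, the over-parameterization studied here can be caricatured by replacing each coefficient with a product of factors; a toy comparison in which depth, initialization, and step size are arbitrary choices:

```python
import torch

def fit_diagonal(y, depth=3, lr=0.05, steps=2000):
    """Fit coefficients theta to targets y under the reparameterization
    theta_i = prod_k u_i^(k); depth=1 recovers vanilla gradient descent."""
    u = torch.full((depth, len(y)), 0.5, requires_grad=True)
    for _ in range(steps):
        theta = u.prod(dim=0)
        loss = ((theta - y) ** 2).sum()
        loss.backward()
        with torch.no_grad():
            u -= lr * u.grad
            u.grad.zero_()
    return u.prod(dim=0).detach()

y = torch.tensor([1.0, 0.0, 0.0, 0.5])     # a sparse-ish target signal
print(fit_diagonal(y, depth=1))            # vanilla gradient descent
print(fit_diagonal(y, depth=3))            # deeper over-parameterization
```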
Updated: 2024-09-02 02:11:52
Categories: cs.LG,stat.ML
NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches
Accurate dietary intake estimation is critical for informing policies and programs that support healthy eating, as malnutrition has been directly linked to decreased quality of life. However, self-reporting methods such as food diaries suffer from substantial bias. Other conventional dietary assessment techniques and emerging alternative approaches such as mobile applications incur high time costs and may necessitate trained personnel. Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images, but the lack of comprehensive datasets with diverse viewpoints, modalities and food annotations hinders the accuracy and realism of such methods. To address this limitation, we introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 photorealistic synthetic 2D food images with associated dietary information and multimodal annotations (including depth images, instance masks, and semantic masks). Additionally, we collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism. Leveraging these novel datasets, we develop and benchmark NutritionVerse, an empirical study of various dietary intake estimation approaches, including indirect segmentation-based approaches and direct prediction networks. We further fine-tune models pretrained on synthetic data with real images to provide insights into the fusion of synthetic and real data. Finally, we release both datasets (NutritionVerse-Synth, NutritionVerse-Real) at https://www.kaggle.com/nutritionverse/datasets as part of an open initiative to accelerate machine learning for dietary sensing.
Updated: 2024-09-02 02:08:22
Categories: cs.CV,cs.AI
From Wide to Deep: Dimension Lifting Network for Parameter-efficient Knowledge Graph Embedding
Knowledge graph embedding (KGE), which maps entities and relations into vector representations, is essential for downstream applications. Conventional KGE methods require high-dimensional representations to learn the complex structure of the knowledge graph, but lead to oversized model parameters. Recent advances reduce parameter counts by adopting low-dimensional entity representations, while developing techniques (e.g., knowledge distillation or reinvented representation forms) to compensate for the reduced dimension. However, such operations introduce complicated computations and model designs that may not benefit large knowledge graphs. To seek a simple strategy to improve the parameter efficiency of conventional KGE models, we take inspiration from the observation that, for compositional structures, deeper neural networks require exponentially fewer parameters than wider networks to achieve comparable expressiveness. We view all entity representations as a single-layer embedding network; conventional KGE methods that adopt high-dimensional entity representations are then equivalent to widening the embedding network to gain expressiveness. To achieve parameter efficiency, we instead propose a deeper embedding network for entity representations, i.e., a narrow entity embedding layer plus a multi-layer dimension lifting network (LiftNet). Experiments on three public datasets show that by integrating LiftNet, four conventional KGE methods with 16-dimensional representations achieve link prediction accuracy comparable to the original models with 512-dimensional representations, saving 68.4% to 96.9% of parameters.
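The deep-narrow alternative can be sketched directly: a 16-dimensional embedding table followed by a lifting MLP that outputs the dimension the score function expects. The hidden width, depth, and activation below are illustrative, not the paper's exact LiftNet.

```python
import torch
import torch.nn as nn

class LiftedEntityEmbedding(nn.Module):
    """Narrow entity embeddings lifted to the scoring dimension by an MLP."""
    def __init__(self, n_entities: int, narrow_dim=16, score_dim=512,
                 hidden=64, layers=3):
        super().__init__()
        self.table = nn.Embedding(n_entities, narrow_dim)
        dims = [narrow_dim] + [hidden] * (layers - 1) + [score_dim]
        self.lift = nn.Sequential(*[
            m for i in range(len(dims) - 1)
            for m in (nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
        ][:-1])                              # drop the trailing ReLU

    def forward(self, entity_ids):
        return self.lift(self.table(entity_ids))   # (..., score_dim)

emb = LiftedEntityEmbedding(n_entities=100_000)
print(emb(torch.tensor([3, 42])).shape)            # torch.Size([2, 512])
```

With 100k entities, the narrow table plus MLP holds roughly 1.6M embedding-side parameters instead of the 51.2M a 512-dimensional table would need, which is the source of the savings the abstract reports.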
Updated: 2024-09-02 01:48:34
Categories: cs.LG,cs.AI,cs.CL
TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents
Reinforcement learning (RL) trains an agent from experiences interacting with the environment. In scenarios where online interactions are impractical, offline RL, which trains the agent using pre-collected datasets, has become popular. While this new paradigm demonstrates remarkable effectiveness across various real-world domains, such as healthcare and energy management, there is a growing demand to enable agents to rapidly and completely eliminate the influence of specific trajectories from both the training dataset and the trained agents. To address this problem, this paper advocates TrajDeleter, the first practical approach to trajectory unlearning for offline RL agents. The key idea of TrajDeleter is to guide the agent to demonstrate deteriorating performance when it encounters states associated with unlearned trajectories, while ensuring that the agent maintains its original performance level when facing the other, remaining trajectories. Additionally, we introduce TrajAuditor, a simple yet efficient method to evaluate whether TrajDeleter successfully eliminates the influence of the specified trajectories on the offline RL agent. Extensive experiments conducted on six offline RL algorithms and three tasks demonstrate that TrajDeleter requires only about 1.5% of the time needed for retraining from scratch. It effectively unlearns an average of 94.8% of the targeted trajectories yet still performs well in actual environment interactions after unlearning. The replication package and agent parameters are available online.
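The two-sided objective described above, degrade on forgotten trajectories while staying close on retained ones, might look roughly like this for a deterministic actor; the distance functions and the single balancing weight are our assumptions, and the actual method is more careful (e.g., about bounding the forgetting term).

```python
import torch.nn.functional as F

def unlearning_loss(policy, frozen_policy, forget_batch, retain_batch,
                    lam: float = 1.0):
    """Sketch of a two-term unlearning objective: push actions away from the
    logged ones on forgotten states, clone the frozen original policy on
    retained states."""
    s_f, a_f = forget_batch      # states/actions from trajectories to forget
    s_r = retain_batch           # states from trajectories to keep
    forget_term = -F.mse_loss(policy(s_f), a_f)          # degrade here
    retain_term = F.mse_loss(policy(s_r), frozen_policy(s_r).detach())
    return forget_term + lam * retain_term               # minimize this
```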
Updated: 2024-09-02 01:39:58
Categories: cs.LG,cs.CR
User-Specific Dialogue Generation with User Profile-Aware Pre-Training Model and Parameter-Efficient Fine-Tuning
This paper addresses user-specific dialogue. In contrast to previous research on personalized dialogue, which focused on virtual-user dialogue as defined by persona descriptions, user-specific dialogue aims to reproduce the dialogue of real users beyond persona-based dialogue. Fine-tuning on the target user's dialogue history is an efficient learning method for a user-specific model; however, it is prone to overfitting and model destruction due to the small amount of data. Therefore, we propose a learning method for user-specific models that combines parameter-efficient fine-tuning with a pre-trained dialogue model that incorporates user profiles. Parameter-efficient fine-tuning adds only a small number of parameters to the entire model, so it can be trained efficiently even on small amounts of data and is robust to model destruction. In addition, the pre-trained model, learned by adding simple prompts for automatically inferred user profiles, can generate utterances with enhanced knowledge of the user's profile even when there is little training data during fine-tuning. In experiments, we compared the proposed model with large-language-model utterance generation using prompts containing users' personal information. Experiments reproducing real users' utterances revealed that the proposed model can generate utterances with higher reproducibility than the compared methods, even with a small model.
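One way to realize the parameter-efficient fine-tuning step is LoRA adapters via the Hugging Face `peft` library; the hyperparameters, target modules, and model name below are placeholders, and the paper does not necessarily use this exact recipe.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# "my-profile-aware-dialogue-model" is a placeholder for a pre-trained,
# user-profile-aware dialogue model checkpoint.
base = AutoModelForCausalLM.from_pretrained("my-profile-aware-dialogue-model")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()   # only the adapter weights will train
# fine-tune `model` on the target user's small dialogue history as usual
```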
Updated: 2024-09-02 01:30:40
Categories: cs.CL,cs.AI
Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via a Large Mixture-of-Modality-Experts Model
Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure, offering a noninvasive method for personalized breast cancer management. We have curated the largest multiparametric breast MRI dataset, involving 5,205 patients from three hospitals in the north, southeast, and southwest of China, for the development and extensive evaluation of our model. MOME demonstrated accurate and robust identification of breast cancer. It achieved comparable performance for malignancy recognition to that of four senior radiologists and significantly outperformed a junior radiologist, with 0.913 AUROC, 0.948 AUPRC, 0.905 F1 score, and 0.723 MCC. Our findings suggest that MOME could reduce the need for biopsies in BI-RADS 4 patients by a ratio of 7.3%, classify triple-negative breast cancer with an AUROC of 0.709, and predict pathological complete response to neoadjuvant chemotherapy with an AUROC of 0.694. The model further supports scalable and interpretable inference, adapting to missing modalities and providing decision explanations by highlighting lesions and measuring modality contributions. MOME exemplifies a discriminative, robust, scalable, and interpretable multimodal model, paving the way for noninvasive, personalized management of breast cancer patients based on multiparametric breast imaging data.
Updated: 2024-09-02 00:52:01
Categories: cs.CV,cs.AI
Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment
Out-of-distribution (OOD) detectors can act as safety monitors in embedded cyber-physical systems by identifying samples outside a machine learning model's training distribution to prevent potentially unsafe actions. However, OOD detectors are often implemented using deep neural networks, which makes it difficult to meet real-time deadlines on embedded systems with memory and power constraints. We consider the class of variational autoencoder (VAE) based OOD detectors where OOD detection is performed in latent space, and apply quantization, pruning, and knowledge distillation. These techniques have been explored for other deep models, but no work has considered their combined effect on latent space OOD detection. While these techniques increase the VAE's test loss, this does not correspond to a proportional decrease in OOD detection performance and we leverage this to develop lean OOD detectors capable of real-time inference on embedded CPUs and GPUs. We propose a design methodology that combines all three compression techniques and yields a significant decrease in memory and execution time while maintaining AUROC for a given OOD detector. We demonstrate this methodology with two existing OOD detectors on a Jetson Nano and reduce GPU and CPU inference time by 20% and 28% respectively while keeping AUROC within 5% of the baseline.
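A latent-space VAE OOD score and one of the three compression steps (post-training dynamic quantization, via PyTorch's stock API) can be sketched as follows; the encoder interface returning `(mu, logvar)` and the use of a KL-based score are our assumptions.

```python
import torch

def latent_ood_score(encoder, x):
    """KL( q(z|x) || N(0, I) ) from a VAE encoder returning (mu, logvar);
    larger values suggest x is out-of-distribution."""
    mu, logvar = encoder(x)
    return 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1)

def quantize(encoder):
    """Post-training dynamic quantization: nn.Linear weights stored in int8.
    (Pruning and distillation, the other two steps, are applied separately.)"""
    return torch.ao.quantization.quantize_dynamic(
        encoder, {torch.nn.Linear}, dtype=torch.qint8)
```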
Updated: 2024-09-02 00:39:29
Categories: cs.LG
Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts
The traditional viewpoint on Sparse Mixture of Experts (MoE) models is that instead of training a single large expert, which is computationally expensive, we can train many small experts. The hope is that if the total parameter count of the small experts equals that of the singular large expert, then we retain the representation power of the large expert while gaining computational tractability and promoting expert specialization. The recently introduced Soft MoE replaces the Sparse MoE's discrete routing mechanism with a differentiable gating function that smoothly mixes tokens. While this smooth gating function successfully mitigates the various training instabilities associated with Sparse MoE, it is unclear whether it induces implicit biases that affect Soft MoE's representation power or potential for expert specialization. We prove that Soft MoE with a single arbitrarily powerful expert cannot represent simple convex functions. This justifies that Soft MoE's success cannot be explained by the traditional viewpoint of many small experts collectively mimicking the representation power of a single large expert, and that multiple experts are actually necessary to achieve good representation power (even for a fixed total parameter count). Continuing along this line of investigation, we introduce a notion of expert specialization for Soft MoE, and while varying the number of experts yet fixing the total parameter count, we consider the following (computationally intractable) task. Given any input, how can we discover the expert subset that is specialized to predict this input's label? We empirically show that when there are many small experts, the architecture is implicitly biased in a fashion that allows us to efficiently approximate the specialized expert subset. Our method can be easily implemented to potentially reduce computation during inference.
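For concreteness, the Soft MoE routing under discussion (Puigcerver et al.) replaces discrete top-k dispatch with two softmaxes over a token-slot logit matrix; a single-slot-per-expert sketch, with the MLP expert shape as an arbitrary choice:

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Soft mixture of experts: tokens are softly mixed into one slot per
    expert, processed, then softly mixed back (no discrete routing)."""
    def __init__(self, dim: int, n_experts: int):
        super().__init__()
        self.slot_embed = nn.Parameter(torch.randn(n_experts, dim) / dim**0.5)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts))

    def forward(self, x):                      # x: (B, T, dim)
        logits = torch.einsum("btd,ed->bte", x, self.slot_embed)
        dispatch = logits.softmax(dim=1)       # normalize over tokens
        combine = logits.softmax(dim=-1)       # normalize over experts
        slots = torch.einsum("bte,btd->bed", dispatch, x)       # (B, E, dim)
        outs = torch.stack([f(slots[:, i])
                            for i, f in enumerate(self.experts)], dim=1)
        return torch.einsum("bte,bed->btd", combine, outs)      # (B, T, dim)
```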
Updated: 2024-09-02 00:39:00
Categories: cs.LG,cs.AI
Blending Neural Operators and Relaxation Methods in PDE Numerical Solvers
Neural networks suffer from spectral bias, having difficulty representing the high-frequency components of a function, while relaxation methods can resolve high frequencies efficiently but stall at moderate to low frequencies. We offset the weaknesses of the two approaches by combining them synergistically to develop a fast numerical solver of partial differential equations (PDEs) at scale. Specifically, we propose HINTS, a hybrid, iterative, numerical, and transferable solver that integrates a Deep Operator Network (DeepONet) with standard relaxation methods, yielding parallel efficiency and algorithmic scalability for a wide class of PDEs not tractable with existing monolithic solvers. HINTS balances convergence behavior across the spectrum of eigenmodes by utilizing the spectral bias of DeepONet, resulting in a uniform convergence rate and hence exceptional overall performance of the hybrid solver. Moreover, HINTS applies to large-scale, multidimensional systems and is flexible with regard to discretizations, computational domain, and boundary conditions.
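The hybrid iteration can be sketched for a generic linear system arising from a PDE discretization: mostly Jacobi relaxation sweeps, with every k-th update replaced by a learned correction computed from the current residual. Here `deeponet` is a placeholder callable for a trained operator network, and the 1-in-k schedule is an illustrative choice.

```python
import numpy as np

def hints_solve(A, b, deeponet, iters=1000, every=10):
    """Hybrid iteration: Jacobi smooths high-frequency error; every `every`
    steps a learned operator maps the residual to an error correction that
    targets the low-frequency modes where Jacobi stalls."""
    D = np.diag(A)                        # diagonal of the system matrix
    u = np.zeros_like(b)
    for k in range(iters):
        r = b - A @ u                     # current residual
        if (k + 1) % every == 0:
            u = u + deeponet(r)           # learned low-frequency correction
        else:
            u = u + r / D                 # Jacobi relaxation step
    return u
```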
Updated: 2024-09-02 00:22:45
Categories: math.NA,cs.LG,cs.NA