    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 
        


CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments

The introduction of genome engineering technology has transformed biomedical research, making it possible to make precise changes to genetic information. However, creating an efficient gene-editing system requires a deep understanding of both CRISPR technology and the complex experimental systems under investigation. While Large Language Models (LLMs) have shown promise in various tasks, they often lack domain-specific knowledge and struggle to accurately solve biological design problems. In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments. CRISPR-GPT leverages the reasoning ability of LLMs to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes. We showcase the potential of CRISPR-GPT for assisting non-expert researchers with gene-editing experiments from scratch and validate the agent's effectiveness in a real-world use case. Furthermore, we explore the ethical and regulatory considerations associated with automated gene-editing design, highlighting the need for responsible and transparent use of these tools. Our work aims to bridge the gap between beginner biological researchers and CRISPR genome engineering techniques, and to demonstrate the potential of LLM agents in facilitating complex biological discovery tasks.
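As a concrete illustration of one sub-task the agent automates, guide RNA design for the widely used SpCas9 system starts by enumerating 20-nt spacer candidates immediately 5' of an NGG PAM. The sketch below shows only this enumeration step, on the forward strand and without off-target scoring; the function name and simplifications are ours, not CRISPR-GPT's.

```python
import re

def candidate_guides(dna, guide_len=20):
    # Enumerate candidate SpCas9 spacers: guide_len bases immediately
    # followed by an NGG PAM. A lookahead is used so overlapping
    # candidates are all reported. Real tools also scan the reverse
    # strand and rank candidates by efficiency and off-target risk.
    dna = dna.upper()
    pattern = r'(?=([ACGT]{%d})[ACGT]GG)' % guide_len
    return [m.group(1) for m in re.finditer(pattern, dna)]
```

For example, a 20-base run followed by "TGG" yields one candidate spacer.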

Updated: 2024-04-27 22:59:17

Subjects: cs.AI,cs.CL,cs.HC,q-bio.QM

Download: http://arxiv.org/abs/2404.18021v1

CHAI: Clustered Head Attention for Efficient LLM Inference

Large Language Models (LLMs) with hundreds of billions of parameters have transformed the field of machine learning. However, serving these models at inference time is both compute and memory intensive, where a single request can require multiple GPUs and tens of gigabytes of memory. Multi-Head Attention is one of the key components of LLMs and can account for over 50% of an LLM's memory and compute requirements. We observe a high degree of redundancy across heads in terms of which tokens they attend to. Based on this insight, we propose Clustered Head Attention (CHAI). CHAI combines heads with highly correlated self-attention at runtime, thus reducing both memory and compute. In our experiments, we show that CHAI is able to reduce the memory requirements for storing the K,V cache by up to 21.4% and inference-time latency by up to 1.73x without any fine-tuning required. CHAI achieves this with at most a 3.2% deviation in accuracy across 3 different models (OPT-66B, LLaMA-7B, LLaMA-33B) and 5 different evaluation datasets.
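A toy sketch of the clustering idea as we read it: measure how correlated the heads' attention maps are, then group highly correlated heads so that one attention computation (and one K,V cache entry) could serve the whole group. The thresholded greedy clustering below is our illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

def attention_probs(q, k):
    # Softmax attention map for one head: softmax(q k^T / sqrt(d)).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def cluster_heads(head_probs, threshold=0.9):
    # Greedily group heads whose flattened attention maps are highly
    # correlated; each cluster could then share one attention computation.
    flat = [p.ravel() for p in head_probs]
    clusters, assigned = [], [False] * len(flat)
    for i in range(len(flat)):
        if assigned[i]:
            continue
        cluster, assigned[i] = [i], True
        for j in range(i + 1, len(flat)):
            if not assigned[j] and np.corrcoef(flat[i], flat[j])[0, 1] >= threshold:
                cluster.append(j)
                assigned[j] = True
        clusters.append(cluster)
    return clusters
```

Heads computed from identical projections cluster together, while independent heads remain in their own clusters.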

Updated: 2024-04-27 22:45:39

Subjects: cs.LG,cs.CL

Download: http://arxiv.org/abs/2403.08058v2

Enhanced DareFightingICE Competitions: Sound Design and AI Competitions

This paper presents a new and improved DareFightingICE platform, a fighting game platform with a focus on visually impaired players (VIPs), built in the Unity game engine. It also introduces the separation of the DareFightingICE Competition into two standalone competitions, the DareFightingICE Sound Design Competition and the DareFightingICE AI Competition--held at the 2024 IEEE Conference on Games (CoG)--in which the new platform will be used. This new platform is an enhanced version of the old DareFightingICE platform, with a better audio system to convey 3D sound and a better way to send audio data to AI agents. With this enhancement, and by utilizing Unity, the new DareFightingICE platform is more accessible in terms of adding new features for VIPs and supporting future audio research. This paper also improves the method for evaluating sound designs in the Sound Design Competition, which will ensure better sound designs for VIPs as this competition continues to run at future editions of CoG. To the best of our knowledge, both of our competitions are the first of their kind, and the connection between them, whereby each mutually improves the quality of the entries over time, makes these competitions an important way of representing an often overlooked segment of the broader gaming community: VIPs.

Updated: 2024-04-27 22:03:35

Subjects: cs.HC,cs.AI,cs.SD,eess.AS,I.2; H.5.2; H.5.5

Download: http://arxiv.org/abs/2403.02687v2

Application of Deep Learning for Factor Timing in Asset Management

The paper examines the performance of regression models (OLS linear regression, Ridge regression, Random Forest, and a Fully-Connected Neural Network) in predicting the CMA (Conservative Minus Aggressive) factor premium, and the performance of factor-timing investment strategies based on them. Out-of-sample R-squared shows that more flexible models better explain the variance in the factor premium over the unseen period, and backtesting confirms that factor timing based on more flexible models tends to outperform timing based on linear models. However, for flexible models like neural networks, the optimal weights based on their predictions tend to be unstable, which can lead to high transaction costs and market impact. We verify that reducing the rebalancing frequency according to the historical optimal rebalancing scheme can help reduce transaction costs.
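The model comparison above hinges on out-of-sample R-squared. A minimal sketch of the evaluation loop, with closed-form OLS/ridge standing in for the paper's model zoo; note that some asset-pricing papers benchmark out-of-sample R-squared against a zero forecast rather than the held-out mean used here.

```python
import numpy as np

def fit_linear(X, y, l2=0.0):
    # Closed-form (ridge) regression: w = (X^T X + l2*I)^{-1} X^T y.
    # l2 = 0 recovers ordinary least squares.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ y)

def oos_r2(y_true, y_pred):
    # Out-of-sample R^2: share of held-out variance the predictions
    # explain, relative to the held-out mean benchmark.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Fitting on an early window and scoring on a later one mimics the paper's train/backtest split; ridge's shrinkage trades a little in-sample fit for more stable weights.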

Updated: 2024-04-27 21:57:17

Subjects: q-fin.PM,cs.LG,q-fin.CP

Download: http://arxiv.org/abs/2404.18017v1

Adaptive Interventions with User-Defined Goals for Health Behavior Change

Promoting healthy lifestyle behaviors remains a major public health concern, particularly due to their crucial role in preventing chronic conditions such as cancer, heart disease, and type 2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable health behavior change promotion. Researchers are increasingly exploring adaptive algorithms that personalize interventions to each person's unique context. However, in empirical studies, mobile health applications often suffer from small effect sizes and low adherence rates, particularly in comparison to human coaching. Tailoring advice to a person's unique goals, preferences, and life circumstances is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions. To address this, we introduce a new Thompson sampling algorithm that can accommodate personalized reward functions (i.e., goals, preferences, and constraints) while also leveraging data sharing across individuals to provide effective recommendations more quickly. We prove that our modification incurs only a constant penalty on cumulative regret while preserving the sample-complexity benefits of data sharing. We present empirical results on synthetic and semi-synthetic physical activity simulators; for the latter, we conducted an online survey to solicit preference data relating to physical activity, which we use to construct realistic reward models that leverage historical data from another study. Our algorithm achieves substantial performance improvements compared to baselines that do not share data or do not optimize for individualized rewards.
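For readers unfamiliar with the base algorithm the paper modifies, here is vanilla Bernoulli Thompson sampling: sample a plausible reward rate per intervention from its Beta posterior and play the highest draw. The personalized reward functions and cross-user data sharing that the paper adds are not modeled in this sketch.

```python
import numpy as np

def thompson_step(successes, failures, rng):
    # Draw one plausible success rate per arm from its Beta posterior
    # and play the arm whose draw is highest (exploration via sampling).
    samples = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(samples))

def run_bandit(true_rates, steps, seed=0):
    # Simulate Bernoulli rewards and track per-arm pull counts.
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    successes, failures = np.zeros(k), np.zeros(k)
    for _ in range(steps):
        arm = thompson_step(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes + failures
```

Over time the sampler concentrates its pulls on the better intervention while still occasionally exploring.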

Updated: 2024-04-27 21:17:56

Subjects: cs.LG,cs.AI

Download: http://arxiv.org/abs/2311.09483v3

Implicit Generative Prior for Bayesian Neural Networks

Predictive uncertainty quantification is crucial for reliable decision-making in various applied domains. Bayesian neural networks offer a powerful framework for this task. However, defining meaningful priors and ensuring computational efficiency remain significant challenges, especially for complex real-world applications. This paper addresses these challenges by proposing a novel neural adaptive empirical Bayes (NA-EB) framework. NA-EB leverages a class of implicit generative priors derived from low-dimensional distributions. This allows for efficient handling of complex data structures and effective capture of underlying relationships in real-world datasets. The proposed NA-EB framework combines variational inference with a gradient ascent algorithm. This enables simultaneous hyperparameter selection and approximation of the posterior distribution, leading to improved computational efficiency. We establish the theoretical foundation of the framework through posterior and classification consistency. We demonstrate the practical applications of our framework through extensive evaluations on a variety of tasks, including the two-spiral problem, regression, 10 UCI datasets, and image classification tasks on both MNIST and CIFAR-10 datasets. The results of our experiments highlight the superiority of our proposed framework over existing methods, such as sparse variational Bayesian and generative models, in terms of prediction accuracy and uncertainty quantification.

Updated: 2024-04-27 21:00:38

Subjects: cs.LG,stat.AP

Download: http://arxiv.org/abs/2404.18008v1

LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing

Logs are important in modern software development because they capture runtime information. Log parsing, the first step in many log-based analyses, involves extracting structured information from unstructured log data. Traditional log parsers face challenges in accurately parsing logs due to the diversity of log formats, which directly impacts the performance of downstream log-analysis tasks. In this paper, we explore the potential of using Large Language Models (LLMs) for log parsing and propose LLMParser, a log parser based on generative LLMs and few-shot tuning. We leverage four LLMs, Flan-T5-small, Flan-T5-base, LLaMA-7B, and ChatGLM-6B, in LLMParser. Our evaluation on 16 open-source systems shows that LLMParser achieves statistically significantly higher parsing accuracy than state-of-the-art parsers (a 96% average parsing accuracy). We further conduct a comprehensive empirical analysis of the effect of training size, model size, and pre-training LLM on log parsing accuracy. We find that smaller LLMs may be more effective than more complex LLMs; for instance, Flan-T5-base achieves results comparable to LLaMA-7B with a shorter inference time. We also find that using LLMs pre-trained on logs from other systems does not always improve parsing accuracy. While using a pre-trained Flan-T5-base shows an improvement in accuracy, pre-trained LLaMA results in a decrease (of almost 55% in group accuracy). In short, our study provides empirical evidence for using LLMs for log parsing and highlights the limitations and future research directions of LLM-based log parsers.
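To make the parsing task concrete: log parsers split each raw line into a constant template and variable parameters. The regex-based masking below is a deliberately naive baseline of the kind LLM-based parsers aim to beat; the rules and placeholder token are our illustrative choices.

```python
import re

def naive_template(log_line):
    # Mask variable-looking tokens with <*>, approximating the
    # template/parameter split that log parsers recover.
    masked = re.sub(r'\d+\.\d+\.\d+\.\d+', '<*>', log_line)  # IPv4 addresses
    masked = re.sub(r'0x[0-9a-fA-F]+', '<*>', masked)        # hex identifiers
    masked = re.sub(r'\d+', '<*>', masked)                   # remaining integers
    return masked
```

Such hand-written rules break down as log formats diversify, which is exactly the gap few-shot-tuned LLMs are meant to fill.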

Updated: 2024-04-27 20:34:29

Subjects: cs.SE,cs.AI

Download: http://arxiv.org/abs/2404.18001v1

MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch

Accurate representation of medical information is crucial for patient safety, yet artificial intelligence (AI) systems, such as Large Language Models (LLMs), encounter challenges in error-free clinical text interpretation. This paper presents a novel approach submitted to the MEDIQA-CORR 2024 shared task (Ben Abacha et al., 2024a), focusing on the automatic correction of single-word errors in clinical notes. Unlike LLMs that rely on extensive generic data, our method emphasizes extracting contextually relevant information from available clinical text data. Leveraging an ensemble of extractive and abstractive question-answering approaches, we construct a supervised learning framework with domain-specific feature engineering. Our methodology incorporates domain expertise to enhance error correction accuracy. By integrating domain expertise and prioritizing meaningful information extraction, our approach underscores the significance of a human-centric strategy in adapting AI for healthcare.

Updated: 2024-04-27 20:28:38

Subjects: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.17999v1

Optimal Initialization of Batch Bayesian Optimization

Field experiments and computer simulations are effective but time-consuming methods of measuring the quality of engineered systems at different settings. To reduce the total time required, experimenters may employ Bayesian optimization, which is parsimonious with measurements, and take measurements of multiple settings simultaneously, in a batch. In practice, experimenters use very few batches; thus, it is imperative that each batch be as informative as possible. Typically, the initial batch in Batch Bayesian Optimization (BBO) is constructed from a quasi-random sample of setting values. We propose a batch-design acquisition function, Minimal Terminal Variance (MTV), that designs a batch by optimization rather than random sampling. MTV adapts a design criterion from Design of Experiments, called I-Optimality, which minimizes the variance of the post-evaluation estimates of quality, integrated over the entire space of settings. MTV weights the integral by the probability that a setting is optimal, making it able to design not only the initial batch but all subsequent batches as well. Applicability to both initialization and subsequent batches is novel among acquisition functions. Numerical experiments on test functions and simulators show that MTV compares favorably to other BBO methods.
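The weighting term MTV introduces, the probability that each setting is optimal, is commonly estimated by Monte Carlo over posterior draws. A minimal sketch of that estimate (our illustration, not the paper's implementation):

```python
import numpy as np

def prob_optimal(posterior_samples):
    # posterior_samples: array of shape (n_draws, n_settings), each row a
    # joint posterior draw of the quality at every candidate setting.
    # Returns the Monte Carlo probability that each setting is the best.
    winners = posterior_samples.argmax(axis=1)
    counts = np.bincount(winners, minlength=posterior_samples.shape[1])
    return counts / posterior_samples.shape[0]
```

These probabilities would then weight the integrated-variance (I-Optimality) criterion so the design effort concentrates where the optimum plausibly lies.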

Updated: 2024-04-27 20:16:58

Subjects: cs.LG,stat.ML

Download: http://arxiv.org/abs/2404.17997v1

CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention

In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods.

Updated: 2024-04-27 20:09:40

Subjects: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.18952v1

MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning

The MEDIQA-M3G 2024 challenge necessitates novel solutions for Multilingual & Multimodal Medical Answer Generation in dermatology (wai Yim et al., 2024a). This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question-answering (QA). Our system leverages readily available MEDIQA-M3G images via a VGG16-CNN-SVM model, enabling multilingual (English, Chinese, Spanish) learning of informative skin condition representations. Using pre-trained QA models, we further bridge the gap between visual and textual information through multimodal fusion. This approach tackles complex, open-ended questions even without predefined answer choices. We empower the generation of comprehensive answers by feeding the ViT-CLIP model with multiple responses alongside images. This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.

Updated: 2024-04-27 20:03:47

Subjects: cs.CL,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2405.01583v1

TabVFL: Improving Latent Representation in Vertical Federated Learning

Autoencoders are popular neural networks that compress high-dimensional data to extract relevant latent information. TabNet is a state-of-the-art neural network model designed for tabular data that utilizes an autoencoder architecture for training. Vertical Federated Learning (VFL) is an emerging distributed machine learning paradigm that allows multiple parties to train a model collaboratively on vertically partitioned data while maintaining data privacy. The existing design for training autoencoders in VFL is to train a separate autoencoder at each participant and aggregate the latent representations later. This design can break important correlations between the feature data of participating parties, as each autoencoder is trained on locally available features while disregarding those of others. In addition, traditional autoencoders are not specifically designed for tabular data, which is ubiquitous in VFL settings. Moreover, the impact of client failures during training on model robustness is under-researched in the VFL setting. In this paper, we propose TabVFL, a distributed framework designed to improve latent representation learning using the joint features of participants. The framework (i) preserves privacy by mitigating potential data leakage with the addition of a fully-connected layer, (ii) conserves feature correlations by learning one latent representation vector, and (iii) provides enhanced robustness against client failures during the training phase. Extensive experiments on five classification datasets show that TabVFL can outperform the prior design, with a 26.12% improvement in F1-score.

Updated: 2024-04-27 19:40:35

Subjects: cs.LG,cs.DC

Download: http://arxiv.org/abs/2404.17990v1

InfoSec.pptx: A Longitudinal Study of Speakers, Topics, and Sponsors at Security Conferences in Academia and Industry

Security conferences are important venues at which academics and practitioners share knowledge about new attacks and state-of-the-art defenses. Despite this, researchers have not studied who shares information or which security topics are covered. To address this, our study characterizes the speakers, sponsors, and topics presented at the most prestigious academic and industry conferences. We collect a longitudinal data set that contains 9,728 abstracts and 1,686 sponsors across 4 academic and 6 industry conferences. There is limited knowledge sharing between industry and academia. Conferences vary significantly in how equally talks/authorship are distributed across individuals. The topics of academic and industry abstracts display consistent coverage of techniques within the MITRE ATT&CK framework. Top-tier academic conferences, as well as DEFCON and Black Hat, inconsistently address the governance, response, and recovery functions of the NIST Cybersecurity Framework. Commercial InfoSec and insurance conferences (RSA, Gartner, Advisen and NetDiligence) cover the framework more consistently. Prevention and detection remain the most common topics of talks, with no clear temporal trend.

Updated: 2024-04-27 19:39:50

Subjects: cs.CR,H.3.1; H.3.3; I.2.7; J.4; K.4.0; E.0

Download: http://arxiv.org/abs/2404.17989v1

Matching Patients to Clinical Trials with Large Language Models

Clinical trials are often hindered by the challenge of patient recruitment. In this work, we introduce TrialGPT, a first-of-its-kind large language model (LLM) framework to assist patient-to-trial matching. Given a patient note, TrialGPT predicts the patient's eligibility on a criterion-by-criterion basis and then consolidates these predictions to assess the patient's eligibility for the target trial. We evaluate the trial-level prediction performance of TrialGPT on three publicly available cohorts of 184 patients with over 18,000 trial annotations. We also engaged three physicians to label over 1,000 patient-criterion pairs to assess its criterion-level prediction accuracy. Experimental results show that TrialGPT achieves a criterion-level accuracy of 87.3% with faithful explanations, close to the expert performance (88.7%-90.0%). The aggregated TrialGPT scores are highly correlated with human eligibility judgments, and they outperform the best-competing models by 32.6% to 57.2% in ranking and excluding clinical trials. Furthermore, our user study reveals that TrialGPT can significantly reduce the screening time (by 42.6%) in a real-life clinical trial matching task. These results and analyses have demonstrated promising opportunities for clinical trial matching with LLMs such as TrialGPT.
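TrialGPT predicts eligibility criterion by criterion and then consolidates those predictions into a trial-level assessment. A minimal sketch of one possible consolidation rule; the labels and decision logic here are our illustrative assumptions, not TrialGPT's actual scoring, which aggregates into a ranking score.

```python
def aggregate_eligibility(criterion_labels):
    # Consolidate per-criterion predictions into a trial-level call.
    # Illustrative rule: any unmet inclusion criterion excludes the
    # patient; unresolved ("unknown") criteria leave the call uncertain.
    if any(label == "not met" for label in criterion_labels):
        return "ineligible"
    if all(label == "met" for label in criterion_labels):
        return "eligible"
    return "uncertain"
```

Keeping a per-criterion trace like this is also what lets the system attach a faithful explanation to each eligibility decision.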

Updated: 2024-04-27 19:21:54

Subjects: cs.CL,cs.AI

Download: http://arxiv.org/abs/2307.15051v4

Detection of Conspiracy Theories Beyond Keyword Bias in German-Language Telegram Using Large Language Models

The automated detection of conspiracy theories online typically relies on supervised learning. However, creating respective training data requires expertise, time and mental resilience, given the often harmful content. Moreover, available datasets are predominantly in English and often keyword-based, introducing a token-level bias into the models. Our work addresses the task of detecting conspiracy theories in German Telegram messages. We compare the performance of supervised fine-tuning approaches using BERT-like models with prompt-based approaches using Llama2, GPT-3.5, and GPT-4 which require little or no additional training data. We use a dataset of $\sim\!\! 4,000$ messages collected during the COVID-19 pandemic, without the use of keyword filters. Our findings demonstrate that both approaches can be leveraged effectively: For supervised fine-tuning, we report an F1 score of $\sim\!\! 0.8$ for the positive class, making our model comparable to recent models trained on keyword-focused English corpora. We demonstrate our model's adaptability to intra-domain temporal shifts, achieving F1 scores of $\sim\!\! 0.7$. Among prompting variants, the best model is GPT-4, achieving an F1 score of $\sim\!\! 0.8$ for the positive class in a zero-shot setting and equipped with a custom conspiracy theory definition.

Updated: 2024-04-27 19:17:31

Subjects: cs.CL,cs.AI,I.2.7

Download: http://arxiv.org/abs/2404.17985v1

Privacy-Preserving, Dropout-Resilient Aggregation in Decentralized Learning

Decentralized learning (DL) offers a novel paradigm in machine learning by distributing training across clients without central aggregation, enhancing scalability and efficiency. However, DL's peer-to-peer model raises challenges in protecting against inference attacks and privacy leaks. By forgoing central bottlenecks, DL demands privacy-preserving aggregation methods to protect data from 'honest but curious' clients and adversaries, maintaining network-wide privacy. Privacy-preserving DL faces the additional hurdle of client dropout, in which clients fail to submit updates due to connectivity problems or unavailability, further complicating aggregation. This work proposes three secret sharing-based dropout-resilient approaches for privacy-preserving DL. Our study evaluates the efficiency, performance, and accuracy of these protocols through experiments on datasets such as MNIST, Fashion-MNIST, SVHN, and CIFAR-10. We compare our protocols with traditional secret-sharing solutions across scenarios, including those with up to 1000 clients. Evaluations show that our protocols significantly outperform conventional methods, especially in scenarios with up to 30% client dropout and model sizes of up to $10^6$ parameters. Our approaches demonstrate markedly high efficiency with larger models, higher dropout rates, and extensive client networks, highlighting their effectiveness in enhancing the privacy and dropout robustness of decentralized learning systems.
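For context on the building block: additive secret sharing lets clients reveal only the sum of their model updates, never individual values. The sketch below shows plain additive sharing over integers; it is not dropout-resilient (losing any share loses the secret), which is exactly why the paper's protocols need threshold-style constructions.

```python
import random

MODULUS = 2**31 - 1  # all arithmetic is done modulo a fixed prime

def make_shares(secret, n, rng=None):
    # Split `secret` into n additive shares: n-1 uniformly random values
    # plus a correction term, so the shares sum to the secret mod MODULUS.
    # Any subset of fewer than n shares is uniformly random on its own.
    rng = rng or random.Random()
    shares = [rng.randrange(MODULUS) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS
```

Aggregation works because sharing is linear: adding two clients' share vectors component-wise yields shares of the sum of their secrets.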

Updated: 2024-04-27 19:17:02

Subjects: cs.CR,cs.AI

Download: http://arxiv.org/abs/2404.17984v1

Softmax Attention with Constant Cost per Token

We propose a simple modification to the conventional attention mechanism applied by Transformers: Instead of quantifying pairwise query-key similarity with scaled dot-products, we quantify it with the logarithms of scaled dot-products of exponentials. Our modification linearizes attention with exponential kernel feature maps, whose corresponding feature function is infinite dimensional. We show that our modification is expressible as a composition of log-sums of exponentials, with a latent space of constant size, enabling application with constant time and space complexity per token. We implement our modification, verify that it works in practice, and conclude that it is a promising alternative to conventional attention.
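The key algebraic point is that softmax over sim(q, k) = log(exp(q) . exp(k)) collapses into linear attention with feature map phi(x) = exp(x), so a running summary replaces the growing key/value history. The sketch below is our reading of that composition, omitting the paper's scaling for clarity.

```python
import numpy as np

def constant_cost_attention_stream(queries, keys, values):
    # Causal attention where similarity is log(exp(q) . exp(k)) rather
    # than a scaled dot product. Softmax over this similarity reduces to
    #   out_t = phi(q_t) @ S_t / (phi(q_t) @ z_t),  phi(x) = exp(x),
    # with running state S_t = sum_{i<=t} phi(k_i) v_i^T and
    # z_t = sum_{i<=t} phi(k_i): constant time and space per token.
    d, dv = keys.shape[1], values.shape[1]
    S, z = np.zeros((d, dv)), np.zeros(d)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = np.exp(k)
        S += np.outer(fk, v)   # accumulate phi(k) v^T
        z += fk                # accumulate normalizer state
        fq = np.exp(q)
        outputs.append(fq @ S / (fq @ z))
    return np.array(outputs)
```

For small inputs this streaming form matches the explicit causal softmax over the modified similarity exactly, while its per-token cost stays constant in sequence length.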

Updated: 2024-04-27 19:03:14

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2404.05843v2

Advancing Healthcare Automation: Multi-Agent Systems for Medical Necessity Justification

This paper explores the application of Swarm-Structured Multi-Agent Systems (MAS) to establish medical necessity, a process that involves a systematic review of patient-specific medical structured and unstructured data against clinical guidelines. We addressed this complex task by decomposing it into smaller, more manageable sub-tasks. Each sub-task is handled by a specialized AI agent. We conduct a systematic study of the impact of various prompting strategies on these agents and benchmark different Large Language Models (LLMs) to determine their accuracy in completing these tasks. Additionally, we investigate how these agents can provide explainability, thereby enhancing trust and transparency within the system.

Updated: 2024-04-27 18:40:05

Categories: cs.AI

Download: http://arxiv.org/abs/2404.17977v1

Self-Supervised Learning for Large-Scale Preventive Security Constrained DC Optimal Power Flow

Security-Constrained Optimal Power Flow (SCOPF) plays a crucial role in power grid stability but becomes increasingly complex as systems grow. This paper introduces PDL-SCOPF, a self-supervised end-to-end primal-dual learning framework for producing near-optimal solutions to large-scale SCOPF problems in milliseconds. Indeed, PDL-SCOPF remedies the limitations of supervised counterparts that rely on training instances with their optimal solutions, which becomes impractical for large-scale SCOPF problems. PDL-SCOPF mimics an Augmented Lagrangian Method (ALM) for training primal and dual networks that learn the primal solutions and the Lagrangian multipliers, respectively, to the unconstrained optimizations. In addition, PDL-SCOPF incorporates a repair layer to ensure the feasibility of the power balance in the nominal case, and a binary search layer to compute, using the Automatic Primary Response (APR), the generator dispatches in the contingencies. The resulting differentiable program can then be trained end-to-end using the objective function of the SCOPF and the power balance constraints of the contingencies. Experimental results demonstrate that the PDL-SCOPF delivers accurate feasible solutions with minimal optimality gaps. The framework underlying PDL-SCOPF aims at bridging the gap between traditional optimization methods and machine learning, highlighting the potential of self-supervised end-to-end primal-dual learning for large-scale optimization tasks.

Updated: 2024-04-27 18:36:55

Categories: cs.LG,math.OC

Download: http://arxiv.org/abs/2311.18072v2

Automating Customer Needs Analysis: A Comparative Study of Large Language Models in the Travel Industry

In the rapidly evolving landscape of Natural Language Processing (NLP), Large Language Models (LLMs) have emerged as powerful tools for many tasks, such as extracting valuable insights from vast amounts of textual data. In this study, we conduct a comparative analysis of LLMs for the extraction of travel customer needs from TripAdvisor posts. Leveraging a diverse range of models, including both open-source and proprietary ones such as GPT-4 and Gemini, we aim to elucidate their strengths and weaknesses in this specialized domain. Through an evaluation process involving metrics such as BERTScore, ROUGE, and BLEU, we assess the performance of each model in accurately identifying and summarizing customer needs. Our findings highlight the efficacy of open-source LLMs, particularly Mistral 7B, in achieving comparable performance to larger closed models while offering affordability and customization benefits. Additionally, we underscore the importance of considering factors such as model size, resource requirements, and performance metrics when selecting the most suitable LLM for customer needs analysis tasks. Overall, this study contributes valuable insights for businesses seeking to leverage advanced NLP techniques to enhance customer experience and drive operational efficiency in the travel industry.

Updated: 2024-04-27 18:28:10

Categories: cs.CL,cs.AI,cs.HC

Download: http://arxiv.org/abs/2404.17975v1

Multi-Task Wavelength-Multiplexed Reservoir Computing Using a Silicon Microring Resonator

Among the promising advantages of photonic computing over conventional computing architectures is the potential to increase computing efficiency through massive parallelism by using the many degrees of freedom provided by photonics. Here, we numerically demonstrate the simultaneous use of time and frequency (equivalently wavelength) multiplexing to solve three independent tasks at the same time on the same photonic circuit. In particular, we consider a microring-based time-delay reservoir computing (TDRC) scheme that simultaneously solves three tasks: Time-series prediction, classification, and wireless channel equalization. The scheme relies on time-division multiplexing to avoid the necessity of multiple physical nonlinear nodes, while the tasks are parallelized using wavelength division multiplexing (WDM). The input data modulated on each optical channel is mapped to a higher dimensional space by the nonlinear dynamics of the silicon microring cavity. The carrier wavelength and input power assigned to each optical channel have a high influence on the performance of its respective task. When all tasks operate under the same wavelength/power conditions, our results show that the computing nature of each task is the deciding factor of the level of performance achievable. However, it is possible to achieve good performance for all tasks simultaneously by optimizing the parameters of each optical channel. The variety of applications covered by the tasks shows the versatility of the proposed photonic TDRC scheme. Overall, this work provides insight into the potential of WDM-based schemes for improving the computing capabilities of reservoir computing schemes.

Updated: 2024-04-27 18:25:51

Categories: cs.NE,cs.ET,cs.LG,physics.optics

Download: http://arxiv.org/abs/2310.16588v2

Privacy-Preserving Aggregation for Decentralized Learning with Byzantine-Robustness

Decentralized machine learning (DL) has been receiving increasing interest recently due to the elimination of the single point of failure present in the federated learning setting. Yet, it faces the looming threat of Byzantine clients who intentionally disrupt the learning process by broadcasting arbitrary model updates to other clients, seeking to degrade the performance of the global model. In response, robust aggregation schemes have emerged as promising solutions to defend against such Byzantine clients, thereby enhancing the robustness of Decentralized Learning. Defenses against Byzantine adversaries, however, typically require access to the updates of other clients, a counterproductive privacy trade-off that in turn increases the risk of inference attacks on those same model updates. In this paper, we introduce SecureDL, a novel DL protocol designed to enhance the security and privacy of DL against Byzantine threats. SecureDL facilitates a collaborative defense, while protecting the privacy of clients' model updates through secure multiparty computation. The protocol employs efficient computation of cosine similarity and normalization of updates to robustly detect and exclude model updates detrimental to model convergence. By using MNIST, Fashion-MNIST, SVHN and CIFAR-10 datasets, we evaluated SecureDL against various Byzantine attacks and compared its effectiveness with four existing defense mechanisms. Our experiments show that SecureDL is effective even in the case of attacks by the malicious majority (e.g., 80% Byzantine clients) while preserving high training accuracy.
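
In the clear (i.e., without the secure multiparty computation that SecureDL actually uses), the cosine-similarity filtering step might look like the following sketch; the choice of reference vector (the coordinate-wise mean update) and the threshold are assumptions for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two update vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def filter_updates(updates, threshold=0.0):
    # Keep updates whose cosine similarity to the mean update exceeds
    # the threshold; updates pointing away from the consensus direction
    # (e.g., sign-flipped Byzantine updates) are excluded.
    n = len(updates)
    mean = [sum(u[j] for u in updates) / n for j in range(len(updates[0]))]
    return [u for u in updates if cosine(u, mean) > threshold]
```

In the actual protocol this comparison, along with update normalization, is carried out under secure multiparty computation so that no client sees another client's raw update.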

Updated: 2024-04-27 18:17:36

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2404.17970v1

Relay Mining: Incentivizing Full Non-Validating Nodes Servicing All RPC Types

Relay Mining presents a scalable solution employing probabilistic mechanisms, crypto-economic incentives, and new cryptographic primitives to estimate and prove the volume of Remote Procedure Calls (RPCs) made from a client to a server. Distributed ledgers are designed to secure permissionless state transitions (writes), highlighting a gap for incentivizing full non-validating nodes to service non-transactional (read) RPCs. This leads applications to have a dependency on altruistic or centralized off-chain Node RPC Providers. We present a solution that enables multiple RPC providers to service requests from independent applications on a permissionless network. We leverage digital signatures, commit-and-reveal schemes, and Sparse Merkle Sum Tries (SMSTs) to prove the amount of work done. This is enabled through the introduction of a novel ClosestMerkleProof proof-of-inclusion scheme. A native cryptocurrency on a distributed ledger is used to rate limit applications and disincentivize over-usage. Building upon established research in token bucket algorithms and distributed rate-limiting penalty models, our approach harnesses a feedback loop control mechanism to adjust the difficulty of mining relay rewards, dynamically scaling with network usage growth. By leveraging crypto-economic incentives, we reduce coordination overhead costs and introduce a mechanism for providing RPC services that are both geopolitically and geographically distributed. We use common formulations from rate limiting research to demonstrate how this solution in the Web3 ecosystem translates to distributed verifiable multi-tenant rate limiting in Web2.
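
The rate-limiting building block the abstract cites, a token bucket, can be sketched as follows; tying bucket state to on-chain stake and crypto-economic penalties is the protocol's contribution and is not shown here.

```python
class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled at
    `rate` tokens per second; each relayed RPC consumes one token."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Passing timestamps explicitly (rather than reading a clock) keeps the sketch deterministic; a deployed limiter would use monotonic time.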

Updated: 2024-04-27 18:06:00

Categories: cs.DC,cs.CR,cs.SY,eess.SY

Download: http://arxiv.org/abs/2305.10672v2

Usefulness of Emotional Prosody in Neural Machine Translation

Neural Machine Translation (NMT) is the task of translating a text from one language to another with the use of a trained neural network. Several existing works aim at incorporating external information into NMT models to improve or control predicted translations (e.g. sentiment, politeness, gender). In this work, we propose to improve translation quality by adding another external source of information: the automatically recognized emotion in the voice. This work is motivated by the assumption that each emotion is associated with a specific lexicon that can overlap between emotions. Our proposed method follows a two-stage procedure. At first, we select a state-of-the-art Speech Emotion Recognition (SER) model to predict dimensional emotion values from all input audio in the dataset. Then, we use these predicted emotions as source tokens added at the beginning of input texts to train our NMT model. We show that integrating emotion information, especially arousal, into NMT systems leads to better translations.
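
A minimal sketch of the second stage, prepending a predicted emotion as a source token: the token names and the discretization into bins are illustrative assumptions; the paper predicts dimensional emotion values (such as arousal) with an SER model.

```python
def add_emotion_token(text, arousal, bins=(0.33, 0.66)):
    # Map a predicted arousal value in [0, 1] to a discrete source token
    # prepended to the input sentence before NMT training/inference.
    if arousal < bins[0]:
        tok = "<arousal_low>"
    elif arousal < bins[1]:
        tok = "<arousal_mid>"
    else:
        tok = "<arousal_high>"
    return f"{tok} {text}"
```

The NMT model then learns to condition its translation on these leading tokens, the same mechanism used for politeness or gender control in prior work.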

Updated: 2024-04-27 18:04:28

Categories: cs.CL,cs.LG,cs.SD,eess.AS

Download: http://arxiv.org/abs/2404.17968v1

Exploring Intrinsic Properties of Medical Images for Self-Supervised Binary Semantic Segmentation

Recent advancements in self-supervised learning have unlocked the potential to harness unlabeled data for auxiliary tasks, facilitating the learning of beneficial priors. This has been particularly advantageous in fields like medical image analysis, where labeled data are scarce. Although effective for classification tasks, this methodology has shown limitations in more complex applications, such as medical image segmentation. In this paper, we introduce Medical imaging Enhanced with Dynamic Self-Adaptive Semantic Segmentation (MedSASS), a dedicated self-supervised framework tailored for medical image segmentation. We evaluate MedSASS against existing state-of-the-art methods across four diverse medical datasets, showcasing its superiority. MedSASS outperforms existing CNN-based self-supervised methods by 3.83% and matches the performance of ViT-based methods. Furthermore, when MedSASS is trained end-to-end, covering both encoder and decoder, it demonstrates significant improvements of 14.4% for CNNs and 6% for ViT-based architectures compared to existing state-of-the-art self-supervised strategies.

Updated: 2024-04-27 18:04:11

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2402.02367v2

Deep Learning for Low-Latency, Quantum-Ready RF Sensing

Recent work has shown the promise of applying deep learning to enhance software processing of radio frequency (RF) signals. In parallel, hardware developments with quantum RF sensors based on Rydberg atoms are breaking longstanding barriers in frequency range, resolution, and sensitivity. In this paper, we describe our implementations of quantum-ready machine learning approaches for RF signal classification. Our primary objective is latency: while deep learning offers a more powerful computational paradigm, it also traditionally incurs latency overheads that hinder wider scale deployment. Our work spans three axes. (1) A novel continuous wavelet transform (CWT) based recurrent neural network (RNN) architecture that enables flexible online classification of RF signals on-the-fly with reduced sampling time. (2) Low-latency inference techniques for both GPU and CPU that span over 100x reductions in inference time, enabling real-time operation with sub-millisecond inference. (3) Quantum-readiness validated through application of our models to physics-based simulation of Rydberg atom QRF sensors. Altogether, our work bridges towards next-generation RF sensors that use quantum technology to surpass previous physical limits, paired with latency-optimized AI/ML software that is suitable for real-time deployment.

Updated: 2024-04-27 17:22:12

Categories: quant-ph,cs.AI,cs.LG,cs.PF,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.17962v1

PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis

Cybersecurity is one of the global issues because of the extensive dependence on cyber systems of individuals, industries, and organizations. Among the cyber attacks, phishing is increasing tremendously and affecting the global economy. Therefore, this phenomenon highlights the vital need for enhancing user awareness and robust support at both individual and organizational levels. Phishing URL identification is the best way to address the problem. Various machine learning and deep learning methods have been proposed to automate the detection of phishing URLs. However, these approaches often lack convincing accuracy and rely on datasets consisting of limited samples. Furthermore, these black-box models' decisions to detect suspicious URLs need proper explanation to understand the features affecting the output. To address these issues, we propose a 1D Convolutional Neural Network (CNN) and train the model with extensive features and a substantial amount of data. The proposed model outperforms existing works by attaining an accuracy of 99.85%. Additionally, our explainability analysis highlights certain features that significantly contribute to identifying the phishing URL.
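
The core operation of a character-level 1D CNN over URLs can be sketched as follows; the encoding scheme and kernel are illustrative, since the abstract does not specify the paper's exact features or architecture.

```python
def encode_url(url, max_len=20):
    # Simple character-level encoding: normalized code points,
    # truncated or zero-padded to a fixed length (scheme is illustrative).
    codes = [min(ord(c), 127) / 127.0 for c in url[:max_len]]
    return codes + [0.0] * (max_len - len(codes))

def conv1d(seq, kernel):
    # Valid-mode 1D convolution: slide the kernel over the sequence,
    # producing one activation per window, as in a single CNN filter.
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]
```

A full model would stack many such learned filters with nonlinearities and pooling before a final classification layer.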

Updated: 2024-04-27 17:13:49

Categories: cs.CR,cs.LG

Download: http://arxiv.org/abs/2404.17960v1

Cauchy-Schwarz Divergence Information Bottleneck for Regression

The information bottleneck (IB) approach is popular for improving the generalization, robustness and explainability of deep neural networks. Essentially, it aims to find a minimum sufficient representation $\mathbf{t}$ by striking a trade-off between a compression term $I(\mathbf{x};\mathbf{t})$ and a prediction term $I(y;\mathbf{t})$, where $I(\cdot;\cdot)$ refers to the mutual information (MI). For the IB, MI is for the most part expressed in terms of the Kullback-Leibler (KL) divergence, which in the regression case corresponds to prediction based on a mean squared error (MSE) loss under a Gaussian assumption, with compression approximated by variational inference. In this paper, we study the IB principle for the regression problem and develop a new way to parameterize the IB with deep neural networks by exploiting favorable properties of the Cauchy-Schwarz (CS) divergence. By doing so, we move away from MSE-based regression and ease estimation by avoiding variational approximations or distributional assumptions. We investigate the improved generalization ability of our proposed CS-IB and demonstrate strong adversarial robustness guarantees. We demonstrate its superior performance on six real-world regression tasks over other popular deep IB approaches. We additionally observe that the solutions discovered by CS-IB always achieve the best trade-off between prediction accuracy and compression ratio in the information plane. The code is available at \url{https://github.com/SJYuCNEL/Cauchy-Schwarz-Information-Bottleneck}.
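
For reference, the Cauchy-Schwarz divergence between densities $p$ and $q$ is commonly written as $D_{\mathrm{CS}}(p\|q) = -\log\frac{\left(\int p(\mathbf{x})\,q(\mathbf{x})\,d\mathbf{x}\right)^{2}}{\int p(\mathbf{x})^{2}\,d\mathbf{x}\,\int q(\mathbf{x})^{2}\,d\mathbf{x}}$. By the Cauchy-Schwarz inequality it is non-negative and vanishes exactly when $p = q$ almost everywhere, and it admits closed-form empirical estimators from samples, which is what allows the authors to avoid variational approximations and distributional assumptions.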

Updated: 2024-04-27 16:13:05

Categories: cs.LG,cs.IT,math.IT,stat.ML

Download: http://arxiv.org/abs/2404.17951v1

Bounding the Expected Robustness of Graph Neural Networks Subject to Node Feature Attacks

Graph Neural Networks (GNNs) have demonstrated state-of-the-art performance in various graph representation learning tasks. Recently, studies revealed their vulnerability to adversarial attacks. In this work, we theoretically define the concept of expected robustness in the context of attributed graphs and relate it to the classical definition of adversarial robustness in the graph representation learning literature. Our definition allows us to derive an upper bound of the expected robustness of Graph Convolutional Networks (GCNs) and Graph Isomorphism Networks subject to node feature attacks. Building on these findings, we connect the expected robustness of GNNs to the orthonormality of their weight matrices and consequently propose an attack-independent, more robust variant of the GCN, called the Graph Convolutional Orthonormal Robust Networks (GCORNs). We further introduce a probabilistic method to estimate the expected robustness, which allows us to evaluate the effectiveness of GCORN on several real-world datasets. Experiments show that GCORN outperforms available defense methods. Our code is publicly available at: \href{https://github.com/Sennadir/GCORN}{https://github.com/Sennadir/GCORN}.
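
As a sketch of the orthonormality connection: a standard way to quantify how far a weight matrix is from having orthonormal columns is the Frobenius norm of $W^\top W - I$. The exact construction or regularizer used by GCORN is not given in the abstract, so this measure is an assumption for illustration.

```python
def orthonormality_gap(W):
    # Frobenius norm of (W^T W - I); zero exactly when the columns of W
    # are orthonormal. Driving this quantity down is one common way to
    # encourage orthonormal weights in a neural network.
    rows, cols = len(W), len(W[0])
    gram = [[sum(W[r][i] * W[r][j] for r in range(rows))
             for j in range(cols)] for i in range(cols)]
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(cols) for j in range(cols)) ** 0.5
```
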

Updated: 2024-04-27 15:57:35

Categories: cs.LG,cs.AI,cs.CR

Download: http://arxiv.org/abs/2404.17947v1

Interaction Event Forecasting in Multi-Relational Recursive HyperGraphs: A Temporal Point Process Approach

Modeling the dynamics of interacting entities using an evolving graph is an essential problem in fields such as financial networks and e-commerce. Traditional approaches focus primarily on pairwise interactions, limiting their ability to capture the complexity of real-world interactions involving multiple entities and their intricate relationship structures. This work addresses the problem of forecasting higher-order interaction events in multi-relational recursive hypergraphs. This is done using a dynamic graph representation learning framework that can capture complex relationships involving multiple entities. The proposed model, \textit{Relational Recursive Hyperedge Temporal Point Process} (RRHyperTPP) uses an encoder that learns a dynamic node representation based on the historical interaction patterns and then a hyperedge link prediction based decoder to model the event's occurrence. These learned representations are then used for downstream tasks involving forecasting the type and time of interactions. The main challenge in learning from hyperedge events is that the number of possible hyperedges grows exponentially with the number of nodes in the network. This will make the computation of negative log-likelihood of the temporal point process expensive, as the calculation of survival function requires a summation over all possible hyperedges. In our work, we use noise contrastive estimation to learn the parameters of our model, and we have experimentally shown that our models perform better than previous state-of-the-art methods for interaction forecasting.

Updated: 2024-04-27 15:46:54

Categories: cs.LG,cs.AI,cs.SI

Download: http://arxiv.org/abs/2404.17943v1

CBMAP: Clustering-based manifold approximation and projection for dimensionality reduction

Dimensionality reduction methods are employed to decrease data dimensionality, either to enhance machine learning performance or to facilitate data visualization in two or three-dimensional spaces. These methods typically fall into two categories: feature selection and feature transformation. Feature selection retains significant features, while feature transformation projects data into a lower-dimensional space using linear or nonlinear methods. While nonlinear methods excel in preserving local structures and capturing nonlinear relationships, they may struggle with interpreting global structures and can be computationally intensive. Recent algorithms, such as t-SNE, UMAP, TriMap, and PaCMAP, prioritize preserving local structures, often at the expense of accurately representing global structures, leading to clusters being spread out more in lower-dimensional spaces. Moreover, these methods heavily rely on hyperparameters, making their results sensitive to parameter settings. To address these limitations, this study introduces a clustering-based approach, namely CBMAP (Clustering-Based Manifold Approximation and Projection), for dimensionality reduction. CBMAP aims to preserve both global and local structures, ensuring that clusters in lower-dimensional spaces closely resemble those in high-dimensional spaces. Experimental evaluations on benchmark datasets demonstrate CBMAP's efficacy, offering speed, scalability, and minimal reliance on hyperparameters. Importantly, CBMAP enables low-dimensional projection of test data, addressing a critical need in machine learning applications. CBMAP is made freely available at https://github.com/doganlab/cbmap and can be installed from the Python Package Index (PyPI) software repository with the command pip install cbmap.

Updated: 2024-04-27 15:44:21

Categories: cs.LG

Download: http://arxiv.org/abs/2404.17940v1

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
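
A sketch of the group-relative ingredient of GRPO as commonly presented: advantages for a group of completions sampled for the same prompt are computed from the group's own reward statistics rather than from a learned value (critic) network, which is where the memory savings over PPO come from. The clipping and KL terms of the PPO-style objective are omitted here.

```python
import math

def grpo_advantages(rewards):
    # Group-relative advantages: normalize each sampled completion's
    # reward by the mean and standard deviation of its group.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

With binary correctness rewards, completions that solve the problem receive positive advantages and the rest receive negative ones, in proportion to how rare success was within the group.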

Updated: 2024-04-27 15:25:53

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2402.03300v3

DTization: A New Method for Supervised Feature Scaling

Artificial intelligence is currently a dominant force in shaping various aspects of the world. Machine learning is a sub-field of artificial intelligence. Feature scaling is one of the data pre-processing techniques that improves the performance of machine learning algorithms. Traditional feature scaling techniques are unsupervised, meaning the dependent variable has no influence on the scaling process. In this paper, we present a novel feature scaling technique named DTization that employs a decision tree and a robust scaler for supervised feature scaling. The proposed method utilizes a decision tree to measure feature importance and, based on that importance, scales different features differently with the robust scaler algorithm. The proposed method has been extensively evaluated on ten classification and regression datasets using various evaluation metrics, and the results show a noteworthy performance improvement compared to traditional feature scaling methods.
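
A plain-Python sketch of one possible reading of the method (the exact coupling of importance and scaling is not specified in the abstract, so the importance-weighting below is an assumption): features are scaled with a median/IQR robust scaler and then weighted by their decision-tree importances.

```python
def robust_scale(column):
    # Median/IQR scaling in the spirit of a robust scaler; the quartile
    # indices here are approximate and for illustration only.
    s = sorted(column)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    iqr = (q3 - q1) or 1.0
    return [(x - median) / iqr for x in column]

def dtize(columns, importances):
    # Hypothetical coupling: robustly scale each feature, then weight it
    # by its decision-tree importance (importances assumed to sum to 1).
    return [[w * x for x in robust_scale(col)]
            for col, w in zip(columns, importances)]
```

In the paper, the importances would come from a decision tree fit against the dependent variable, which is what makes the scaling supervised.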

Updated: 2024-04-27 15:25:03

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.17937v1

Intrusion Tolerance for Networked Systems through Two-Level Feedback Control

We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level node controllers perform intrusion recovery, and on the global level a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.
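
The threshold structure of the optimal strategies can be illustrated with a toy sketch (the thresholds and function names are ours, purely for illustration):

```python
def local_recovery_action(intrusion_belief, threshold=0.7):
    """Local level (machine-replacement flavor): recover the node once
    the belief that it is compromised crosses a threshold."""
    return "RECOVER" if intrusion_belief >= threshold else "WAIT"

def global_replication_action(healthy_replicas, target_replicas=3):
    """Global level (inventory-replenishment flavor): add replicas
    whenever the healthy count falls below a target level; returns the
    number of replicas to spin up."""
    return max(0, target_replicas - healthy_replicas)
```

The paper's contribution is proving that policies of exactly this threshold form are optimal and computing the thresholds efficiently.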

Updated: 2024-04-27 15:19:24

Domains: cs.DC,cs.AI,cs.CR,cs.GT,cs.SY,eess.SY

Download: http://arxiv.org/abs/2404.01741v3

A Comparative Analysis of Large Language Models for Code Documentation Generation

This paper presents a comprehensive comparative analysis of Large Language Models (LLMs) for the generation of code documentation. Code documentation is an essential part of the software writing process. The paper evaluates models such as GPT-3.5, GPT-4, Bard, Llama2, and StarChat on various parameters like Accuracy, Completeness, Relevance, Understandability, Readability and Time Taken for different levels of code documentation. Our evaluation employs a checklist-based system to minimize subjectivity, providing a more objective assessment. We find that, barring StarChat, all LLMs consistently outperform the original documentation. Notably, the closed-source models GPT-3.5, GPT-4, and Bard exhibit superior performance across various parameters compared to the open-source/source-available LLMs, namely Llama2 and StarChat. Considering the time taken for generation, GPT-4 demonstrated the longest duration, followed by Llama2 and Bard, with ChatGPT and StarChat having comparable generation times. Additionally, file-level documentation had considerably worse performance across all parameters (except for time taken) as compared to inline and function-level documentation.

Updated: 2024-04-27 15:15:40

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2312.10349v2

Neural Temporal Point Process for Forecasting Higher Order and Directional Interactions

Real-world systems are made of interacting entities that evolve with time. Creating models that can forecast interactions by learning the dynamics of entities is an important problem in numerous fields. Earlier works used dynamic graph models to achieve this. However, real-world interactions are more complex than pairwise, as they involve more than two entities, and many of these higher-order interactions have directional components. Examples can be seen in communication networks, such as email exchanges that involve a sender and multiple recipients, and in citation networks, where authors draw upon the work of others. In this paper, we solve the problem of higher-order directed interaction forecasting by proposing a deep neural network-based model, the \textit{Directed HyperNode Temporal Point Process}, for directed hyperedge event forecasting, as hyperedges provide a native framework for modeling relationships among a variable number of nodes. Our proposed technique reduces the search space by initially forecasting the nodes at which events will be observed and then forecasting hyperedge sizes and adjacency vectors for the nodes observing events. Based on these, it generates candidate hyperedges, which are then used by a hyperedge predictor to identify the ground truth. To demonstrate the efficiency of our model, we curated five datasets and conducted an extensive empirical study. We believe that this is the first work that solves the problem of forecasting higher-order directional interactions.

Updated: 2024-04-27 15:12:34

Domains: cs.LG,cs.AI,cs.SI

Download: http://arxiv.org/abs/2301.12210v2

Critical Review for One-class Classification: recent advances and the reality behind them

This paper offers a comprehensive review of one-class classification (OCC), examining the technologies and methodologies employed in its implementation. It delves into various approaches utilized for OCC across diverse data types, such as feature data, image, video, time series, and others. Through a systematic review, this paper synthesizes prominent strategies used in OCC from its inception to its current advancements, with a particular emphasis on promising applications. Moreover, the article criticizes the state-of-the-art (SOTA) image anomaly detection (AD) algorithms dominating one-class experiments. These algorithms include outlier exposure (binary classification) and pretrained models (multi-class classification), conflicting with the fundamental concept of learning from one class. Our investigation reveals that the top nine algorithms for the one-class CIFAR10 benchmark are not OCC. We argue that binary/multi-class classification algorithms should not be compared with OCC.

Updated: 2024-04-27 15:04:30

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.17931v1

Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains

Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet, they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small amount of labeled target videos to achieve effective adaptation. This approach is appealing for applications because it only needs a few or even one labeled example per class in the target domain, ideal for recognizing rare but critical activities. However, the existing FSDA-AR works mostly focus on the domain adaptation on sports videos, where the domain diversity is limited. We propose a new FSDA-AR benchmark using five established datasets considering the adaptation on more diverse and challenging domains. Our results demonstrate that FSDA-AR performs comparably to unsupervised domain adaptation with significantly fewer labeled target domain samples. We further propose a novel approach, RelaMiX, to better leverage the few labeled target domain samples as knowledge guidance. RelaMiX encompasses a temporal relational attention network with relation dropout, alongside a cross-domain information alignment mechanism. Furthermore, it integrates a mechanism for mixing features within a latent space by using the few-shot target domain samples. The proposed RelaMiX solution achieves state-of-the-art performance on all datasets within the FSDA-AR benchmark. To encourage future research of few-shot domain adaptation for activity recognition, our code will be publicly available at https://github.com/KPeng9510/RelaMiX.

Updated: 2024-04-27 15:02:45

Domains: cs.CV,cs.AI,cs.RO,eess.IV

Download: http://arxiv.org/abs/2305.08420v3

Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments

In the era of the Internet of Things (IoT), objects connect through a dynamic network, empowered by technologies like 5G, enabling real-time data sharing. However, smart objects, notably autonomous vehicles, face challenges in critical local computations due to limited resources. Lightweight AI models offer a solution but struggle with diverse data distributions. To address this limitation, we propose a novel Multi-Stream Cellular Test-Time Adaptation (MSC-TTA) setup where models adapt on the fly to a dynamic environment divided into cells. Then, we propose a real-time adaptive student-teacher method that leverages the multiple streams available in each cell to quickly adapt to changing data distributions. We validate our methodology in the context of autonomous vehicles navigating across cells defined based on location and weather conditions. To facilitate future benchmarking, we release a new multi-stream large-scale synthetic semantic segmentation dataset, called DADE, and show that our multi-stream approach outperforms a single-stream baseline. We believe that our work will open research opportunities in the IoT and 5G eras, offering solutions for real-time model adaptation.

Updated: 2024-04-27 15:00:57

Domains: cs.CV,cs.AI,eess.IV

Download: http://arxiv.org/abs/2404.17930v1

Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on a static image, however, the performance is unreliable in challenging scenarios, such as heavy occlusion, motion blur, etc. In this work, we propose to understand human attributes using video frames that can fully use temporal information by fine-tuning a pre-trained multi-modal foundation model efficiently. Specifically, we formulate the video-based PAR as a vision-language fusion problem and adopt a pre-trained foundation model CLIP to extract the visual features. More importantly, we propose a novel spatiotemporal side-tuning strategy to achieve parameter-efficient optimization of the pre-trained vision foundation model. To better utilize the semantic information, we take the full attribute list that needs to be recognized as another input and transform the attribute words/phrases into the corresponding sentence via split, expand, and prompt operations. Then, the text encoder of CLIP is utilized for embedding processed attribute descriptions. The averaged visual tokens and text tokens are concatenated and fed into a fusion Transformer for multi-modal interactive learning. The enhanced tokens will be fed into a classification head for pedestrian attribute prediction. Extensive experiments on two large-scale video-based PAR datasets fully validated the effectiveness of our proposed framework. The source code of this paper is available at https://github.com/Event-AHU/OpenPAR.
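
The split/expand/prompt step that turns attribute labels into sentences for CLIP's text encoder could look like this (the template is a hypothetical stand-in, not the paper's exact prompt):

```python
def attributes_to_prompts(attributes):
    """Expand raw attribute labels (e.g. "Wearing_Hat") into short
    natural-language sentences before text embedding."""
    return [f"a photo of a pedestrian with the attribute: {a.replace('_', ' ').lower()}"
            for a in attributes]

prompts = attributes_to_prompts(["Wearing_Hat", "Long_Hair"])
```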

Updated: 2024-04-27 14:43:32

Domains: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2404.17929v1

Pre-training on High Definition X-ray Images: An Experimental Study

Existing X-ray based pre-trained vision models are usually trained on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and maintaining high resolution in the field of X-ray images is the guarantee of effective solutions to difficult miscellaneous diseases. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model, trained on our newly collected large-scale dataset which contains more than 1 million X-ray images. Our model follows the masked auto-encoder framework, which takes the tokens remaining after masking (at a high mask ratio) as input and reconstructs the masked image patches with a Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that utilizes the chest contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, including X-ray report generation and disease recognition. Extensive experiments demonstrate that our pre-trained medical foundation vision model achieves comparable or even new state-of-the-art performance on downstream benchmark datasets. The source code and pre-trained models of this paper will be released on https://github.com/Event-AHU/Medical_Image_Analysis.

Updated: 2024-04-27 14:29:53

Domains: eess.IV,cs.AI,cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.17926v1

Accurate and fast anomaly detection in industrial processes and IoT environments

We present a novel, simple and widely applicable semi-supervised procedure for anomaly detection in industrial and IoT environments, SAnD (Simple Anomaly Detection). SAnD comprises 5 steps, each leveraging well-known statistical tools, namely; smoothing filters, variance inflation factors, the Mahalanobis distance, threshold selection algorithms and feature importance techniques. To our knowledge, SAnD is the first procedure that integrates these tools to identify anomalies and help decipher their putative causes. We show how each step contributes to tackling technical challenges that practitioners face when detecting anomalies in industrial contexts, where signals can be highly multicollinear, have unknown distributions, and intertwine short-lived noise with the long(er)-lived actual anomalies. The development of SAnD was motivated by a concrete case study from our industrial partner, which we use here to show its effectiveness. We also evaluate the performance of SAnD by comparing it with a selection of semi-supervised methods on public datasets from the literature on anomaly detection. We conclude that SAnD is effective, broadly applicable, and outperforms existing approaches in both anomaly detection and runtime.
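
The Mahalanobis-distance core of such a pipeline (with the smoothing, VIF pruning, threshold selection, and feature-importance steps omitted) can be sketched in a few lines:

```python
import numpy as np

def mahalanobis_scores(X_train, X_test):
    """Score each test point by its Mahalanobis distance from the
    training distribution; points far beyond a chosen threshold are
    flagged as anomalies. pinv guards against multicollinear signals."""
    mu = X_train.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))
    d = X_test - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

rng = np.random.default_rng(1)
train = rng.normal(size=(500, 4))
test = np.vstack([rng.normal(size=(5, 4)), np.full((1, 4), 8.0)])  # last row is anomalous
scores = mahalanobis_scores(train, test)
```

Using the pseudo-inverse rather than a plain inverse is one simple way to keep the distance well-defined when, as the abstract notes, signals are highly multicollinear.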

Updated: 2024-04-27 14:29:42

Domains: cs.LG,stat.AP

Download: http://arxiv.org/abs/2404.17925v1

Results about sets of desirable gamble sets

Coherent sets of desirable gamble sets are used as a model for representing an agent's opinions and choice preferences under uncertainty. In this paper we provide some results about the axioms required for coherence and the natural extension of a given set of desirable gamble sets. We also show that coherent sets of desirable gamble sets can be represented by a proper filter of coherent sets of desirable gambles.

Updated: 2024-04-27 14:29:13

Domains: cs.AI

Download: http://arxiv.org/abs/2404.17924v1

FedCRL: Personalized Federated Learning with Contrastive Shared Representations for Label Heterogeneity in Non-IID Data

To deal with heterogeneity resulting from label distribution skew and data scarcity in distributed machine learning scenarios, this paper proposes a novel Personalized Federated Learning (PFL) algorithm, named Federated Contrastive Representation Learning (FedCRL). FedCRL introduces contrastive representation learning (CRL) on shared representations to facilitate knowledge acquisition of clients. Specifically, both local model parameters and averaged values of local representations are considered as shareable information to the server, both of which are then aggregated globally. CRL is applied between local representations and global representations to regularize personalized training by drawing similar representations closer and separating dissimilar ones, thereby enhancing local models with external knowledge and avoiding being harmed by label distribution skew. Additionally, FedCRL adopts local aggregation between each local model and the global model to tackle data scarcity. A loss-wise weighting mechanism is introduced to guide the local aggregation using each local model's contrastive loss to coordinate the global model involvement in each client, thus helping clients with scarce data. Our simulations demonstrate FedCRL's effectiveness in mitigating label heterogeneity by achieving accuracy improvements over existing methods on datasets with varying degrees of label heterogeneity.
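
The contrastive regularizer described here — pulling each local representation toward the global representation of the same class and away from the others — can be sketched in an InfoNCE-like form (a simplification of ours; the paper's exact loss may differ):

```python
import numpy as np

def contrastive_reg(local_reps, global_reps, labels, tau=0.5):
    """InfoNCE-style regularizer: for each local representation, the
    global representation of its class is the positive and the other
    classes' globals are negatives; lower loss means better alignment."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    L, G = normalize(local_reps), normalize(global_reps)
    sims = L @ G.T / tau  # temperature-scaled cosine similarities
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

G = np.eye(3)  # toy global per-class representations
aligned = contrastive_reg(np.eye(3), G, np.array([0, 1, 2]))
shuffled = contrastive_reg(np.roll(np.eye(3), 1, axis=0), G, np.array([0, 1, 2]))
```

As expected, representations aligned with their class's global representation incur a lower loss than shuffled ones.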

Updated: 2024-04-27 14:05:18

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.17916v1

TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation

In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server. Communication is usually slow and costly and forms the main bottleneck. This is particularly true in federated learning, where a large number of users collaborate toward a global training task. In addition, it is desirable for a robust algorithm to allow for partial participation, since it is often the case that some clients are not able to participate in the entire process and are idle at certain times. Two strategies are popular to reduce the communication burden: 1) local training, which consists in communicating less frequently, or equivalently performing more local computations between the communication rounds; and 2) compression, whereby compressed information instead of full-dimensional vectors is communicated. We propose TAMUNA, the first algorithm for distributed optimization that jointly leverages the two strategies of local training and compression and allows for partial participation. In the strongly convex setting, TAMUNA converges linearly to the exact solution and provably benefits from the two mechanisms: it exhibits a doubly-accelerated convergence rate, with respect to the condition number of the functions and the model dimension.
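
The compression half of the recipe is easy to picture with a top-k sparsifier, a common compressor of this kind (TAMUNA's actual compressor is a coordinated sparsification scheme; this stand-in only conveys the idea of communicating a compressed vector):

```python
import numpy as np

def topk_compress(v, k):
    """Keep only the k largest-magnitude coordinates of v and zero the
    rest, so each round communicates k values instead of len(v)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

v = np.array([3.0, -1.0, 0.5, -4.0])
c = topk_compress(v, 2)  # only the two largest-magnitude entries survive
```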

Updated: 2024-04-27 13:55:25

Domains: cs.LG,math.OC

Download: http://arxiv.org/abs/2302.09832v3

SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models

Radiology Report Generation (R2Gen) demonstrates how Multi-modal Large Language Models (MLLMs) can automate the creation of accurate and coherent radiological reports. Existing methods often hallucinate details in text-based reports that don't accurately reflect the image content. To mitigate this, we introduce a novel strategy, SERPENT-VLM (SElf Refining Radiology RePort GENeraTion using Vision Language Models), which improves the R2Gen task by integrating a self-refining mechanism into the MLLM framework. We employ a unique self-supervised loss that leverages similarity between pooled image representations and the contextual representations of the generated radiological text, alongside the standard Causal Language Modeling objective, to refine image-text representations. This allows the model to scrutinize and align the generated text through dynamic interaction between a given image and the generated text, therefore reducing hallucination and continuously enhancing nuanced report generation. SERPENT-VLM outperforms existing baselines such as LLaVA-Med, BiomedGPT, etc., achieving SoTA performance on the IU X-ray and Radiology Objects in COntext (ROCO) datasets, and also proves to be robust against noisy images. A qualitative case study emphasizes the significant advancements towards more sophisticated MLLM frameworks for R2Gen, opening paths for further research into self-supervised refinement in the medical imaging domain.

Updated: 2024-04-27 13:46:23

Domains: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.17912v1

Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection

Semi-supervised 3D object detection can benefit from the promising pseudo-labeling technique when labeled data is limited. However, recent approaches have overlooked the impact of noisy pseudo-labels during training, despite efforts to enhance pseudo-label quality through confidence-based filtering. In this paper, we examine the impact of noisy pseudo-labels on IoU-based target assignment and propose the Reliable Student framework, which incorporates two complementary approaches to mitigate errors. First, it involves a class-aware target assignment strategy that reduces false negative assignments in difficult classes. Second, it includes a reliability weighting strategy that suppresses false positive assignment errors while also addressing remaining false negatives from the first step. The reliability weights are determined by querying the teacher network for confidence scores of the student-generated proposals. Our work surpasses the previous state-of-the-art on KITTI 3D object detection benchmark on point clouds in the semi-supervised setting. On 1% labeled data, our approach achieves a 6.2% AP improvement for the pedestrian class, despite having only 37 labeled samples available. The improvements become significant for the 2% setting, achieving 6.0% AP and 5.7% AP improvements for the pedestrian and cyclist classes, respectively.

Updated: 2024-04-27 13:38:45

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2404.17910v1

Towards Cross Domain Generalization of Hamiltonian Representation via Meta Learning

Recent advances in deep learning for physics have focused on discovering shared representations of target systems by incorporating physics priors or inductive biases into neural networks. While effective, these methods are limited to the system domain, where the type of system remains consistent and thus cannot ensure the adaptation to new, or unseen physical systems governed by different laws. For instance, a neural network trained on a mass-spring system cannot guarantee accurate predictions for the behavior of a two-body system or any other system with different physical laws. In this work, we take a significant leap forward by targeting cross domain generalization within the field of Hamiltonian dynamics. We model our system with a graph neural network (GNN) and employ a meta learning algorithm to enable the model to gain experience over a distribution of systems and make it adapt to new physics. Our approach aims to learn a unified Hamiltonian representation that is generalizable across multiple system domains, thereby overcoming the limitations of system-specific models. We demonstrate that the meta-trained model captures the generalized Hamiltonian representation that is consistent across different physical domains. Overall, through the use of meta learning, we offer a framework that achieves cross domain generalization, providing a step towards a unified model for understanding a wide array of dynamical systems via deep learning.

Updated: 2024-04-27 13:32:28

Domains: cs.LG,cs.AI,physics.comp-ph

Download: http://arxiv.org/abs/2212.01168v4

AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy

We propose Adjustable Molecular Representation (AdaMR), a novel large-scale unified pre-training strategy for small-molecule drugs. AdaMR utilizes a granularity-adjustable molecular encoding strategy, accomplished through a pre-training task termed molecular canonicalization, setting it apart from recent large-scale molecular models. This adaptability in granularity enriches the model's learning capability at multiple levels and improves its performance in multi-task scenarios. Specifically, the substructure-level molecular representation preserves information about specific atom groups or arrangements, influencing chemical properties and functionalities. This proves advantageous for tasks such as property prediction. Simultaneously, the atomic-level representation, combined with generative molecular canonicalization pre-training tasks, enhances validity, novelty, and uniqueness in generative tasks. All of these features work together to give AdaMR outstanding performance on a range of downstream tasks. We fine-tuned our proposed pre-trained model on six molecular property prediction tasks (MoleculeNet datasets) and two generative tasks (ZINC250K datasets), achieving state-of-the-art (SOTA) results on five out of eight tasks.

Updated: 2024-04-27 13:28:02

Domains: q-bio.BM,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.06166v2

Heart Disease Detection using Quantum Computing and Partitioned Random Forest Methods

Heart disease morbidity and mortality rates are increasing, which has a negative impact on public health and the global economy. Early detection of heart disease reduces the incidence of heart mortality and morbidity. Recent research has utilized quantum computing methods to predict heart disease with more than 5 qubits and is computationally intensive. Despite the higher number of qubits, earlier work reports a lower accuracy in predicting heart disease, does not consider outlier effects, and requires more computation time and memory for heart disease prediction. To overcome these limitations, we propose a hybrid random forest quantum neural network (HQRF) using a few qubits (two to four) that considers the effects of outliers in the dataset. Two open-source datasets, Cleveland and Statlog, are used in this study to apply quantum networks. The proposed algorithm has been applied on the two open-source datasets using two different testing strategies, namely 10-fold cross validation and a 70-30 train/test split. We compared the performance of our proposed methodology with our earlier algorithm, the hybrid quantum neural network (HQNN) proposed in the literature for heart disease prediction. HQNN and HQRF perform best under 10-fold cross validation and the 70/30 train/test split, respectively. The results show that HQNN requires a large training dataset while HQRF is appropriate for both large and small training datasets. According to the experimental results, the proposed HQRF is not sensitive to outlier data, unlike HQNN. Compared to earlier works, the proposed HQRF, together with HQNN, achieved a maximum area under the curve (AUC) of 96.43% and 97.78% in predicting heart disease using the Cleveland and Statlog datasets, respectively. The proposed HQRF is highly efficient in detecting heart disease at an early stage and will speed up clinical diagnosis.

Updated: 2024-04-27 13:07:23

Categories: quant-ph,cs.IT,cs.LG,math.IT,math.OC,92C50, 68P30, 68Q87, 68T20, 68Q12

Download: http://arxiv.org/abs/2208.08882v3

Shared learning of powertrain control policies for vehicle fleets

Emerging data-driven approaches, such as deep reinforcement learning (DRL), aim at on-the-field learning of powertrain control policies that optimize fuel economy and other performance metrics. Indeed, they have shown great potential in this regard for individual vehicles on specific routes or drive cycles. However, for fleets of vehicles that must service a distribution of routes, DRL approaches struggle with learning stability issues that result in high variances and challenge their practical deployment. In this paper, we present a novel framework for shared learning among a fleet of vehicles through the use of a distilled group policy as the knowledge sharing mechanism for the policy learning computations at each vehicle. We detail the mathematical formulation that makes this possible. Several scenarios are considered to analyze the functionality, performance, and computational scalability of the framework with fleet size. Comparisons of the cumulative performance of fleets using our proposed shared learning approach with a baseline of individual learning agents and another state-of-the-art approach with a centralized learner show clear advantages to our approach. For example, we find a fleet average asymptotic improvement of 8.5 percent in fuel economy compared to the baseline while also improving on the metrics of acceleration error and shifting frequency for fleets serving a distribution of suburban routes. Furthermore, we include demonstrative results that show how the framework reduces variance within a fleet and also how it helps individual agents adapt better to new routes.

Updated: 2024-04-27 13:01:05

Categories: eess.SY,cs.AI,cs.LG,cs.SY

Download: http://arxiv.org/abs/2404.17892v1

DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging reconstruction tasks. However, the unsupervised nature of INR architecture imposes limited constraints on the solution space, particularly for the highly ill-posed reconstruction task posed by LACT and ultra-SVCT. In this study, we introduce the Diffusion Prior Driven Neural Representation (DPER), an advanced unsupervised framework designed to address the exceptionally ill-posed CT reconstruction inverse problems. DPER adopts the Half Quadratic Splitting (HQS) algorithm to decompose the inverse problem into data fidelity and distribution prior sub-problems. The two sub-problems are respectively addressed by INR reconstruction scheme and pre-trained score-based diffusion model. This combination initially preserves the implicit image local consistency prior from INR. Additionally, it effectively augments the feasibility of the solution space for the inverse problem through the generative diffusion model, resulting in increased stability and precision in the solutions. We conduct comprehensive experiments to evaluate the performance of DPER on LACT and ultra-SVCT reconstruction with two public datasets (AAPM and LIDC). The results show that our method outperforms the state-of-the-art reconstruction methods on in-domain datasets, while achieving significant performance improvements on out-of-domain datasets.
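
The Half Quadratic Splitting step the abstract describes can be sketched on a hypothetical scalar problem where both sub-problems have closed forms (in DPER the data-fidelity sub-problem is handled by an INR and the prior sub-problem by a score-based diffusion model; the quadratic prior below is purely a stand-in):

```python
def hqs_toy(y, lam=1.0, mu=10.0, n_iter=200):
    """HQS for min_x 0.5*(x - y)**2 + 0.5*lam*x**2, split via an auxiliary z:
    min_{x,z} 0.5*(x - y)**2 + 0.5*lam*z**2 + 0.5*mu*(x - z)**2."""
    x = z = y
    for _ in range(n_iter):
        x = (y + mu * z) / (1.0 + mu)   # data-fidelity sub-problem (closed form)
        z = mu * x / (lam + mu)         # prior sub-problem (closed form)
    return x

# The alternation converges to the fixed point of the mu-penalized problem,
# y*(lam + mu)/(lam + mu + lam*mu), which tends to y/(1 + lam) as mu grows.
x = hqs_toy(2.0, lam=1.0, mu=10.0)
assert abs(x - 2.0 * 11.0 / 21.0) < 1e-6
```

The point of the split is exactly what the abstract says: each sub-problem sees only one of the two terms, so different solvers can be plugged into each.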

Updated: 2024-04-27 12:55:13

Categories: eess.IV,cs.AI,cs.CV,I.2.10; I.4.5

Download: http://arxiv.org/abs/2404.17890v1

Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping

Interpretable machine learning has emerged as central in leveraging artificial intelligence within high-stakes domains such as healthcare, where understanding the rationale behind model predictions is as critical as achieving high predictive accuracy. In this context, feature selection assumes a pivotal role in enhancing model interpretability by identifying the most important input features in black-box models. While random forests are frequently used in biomedicine for their remarkable performance on tabular datasets, the accuracy gained from aggregating decision trees comes at the expense of interpretability. Consequently, feature selection for enhancing interpretability in random forests has been extensively explored in supervised settings. However, its investigation in the unsupervised regime remains notably limited. To address this gap, the study introduces novel methods to construct feature graphs from unsupervised random forests and feature selection strategies to derive effective feature combinations from these graphs. Feature graphs are constructed for the entire dataset as well as individual clusters leveraging the parent-child node splits within the trees, such that feature centrality captures their relevance to the clustering task, while edge weights reflect the discriminating power of feature pairs. Graph-based feature selection methods are extensively evaluated on synthetic and benchmark datasets both in terms of their ability to reduce dimensionality while improving clustering performance, as well as to enhance model interpretability. An application on omics data for disease subtyping identifies the top features for each cluster, showcasing the potential of the proposed approach to enhance interpretability in clustering analyses and its utility in a real-world biomedical application.
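
The abstract does not pin down which centrality measure is used; as one simple illustration, feature relevance can be scored by weighted degree centrality over the pairwise edge weights (the feature names and weights below are hypothetical):

```python
def weighted_degree_centrality(edges):
    """Sum of incident edge weights per feature; edges maps
    (feature_a, feature_b) -> weight (the pair's discriminating power)."""
    centrality = {}
    for (a, b), w in edges.items():
        centrality[a] = centrality.get(a, 0.0) + w
        centrality[b] = centrality.get(b, 0.0) + w
    return centrality

# Hypothetical feature graph for one cluster:
edges = {("gene1", "gene2"): 3.0, ("gene1", "gene3"): 1.5, ("gene2", "gene3"): 0.5}
c = weighted_degree_centrality(edges)
assert max(c, key=c.get) == "gene1"  # 3.0 + 1.5 = 4.5, the most central feature
```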

Updated: 2024-04-27 12:47:37

Categories: cs.LG,cs.AI,I.2.1; I.5.3; J.3

Download: http://arxiv.org/abs/2404.17886v1

Exploring the efficacy of a hybrid approach with modal decomposition over fully deep learning models for flow dynamics forecasting

Fluid dynamics problems are characterized by being multidimensional and nonlinear, which makes experiments and numerical simulations complex, time-consuming and monetarily expensive. There is therefore a need to find new ways to obtain data in a more economical manner. Thus, in this work we study the application of time series forecasting to fluid dynamics problems, where the aim is to predict the flow dynamics using only past information. We focus our study on models based on deep learning that do not require a large amount of data for training, as this is the problem we are trying to address. Specifically, in this work we have tested three autoregressive models, two of which are fully based on deep learning while the third is a hybrid model that combines modal decomposition with deep learning. We ask these models to generate $200$ time-ahead predictions on two datasets, one coming from a numerical simulation and the other from experimental measurements, where the latter is characterized by being turbulent. We show that the hybrid model generates more reliable predictions in the experimental case, as it is physics-informed in the sense that the modal decomposition extracts the physics in a way that allows us to predict it.

Updated: 2024-04-27 12:43:02

Categories: physics.flu-dyn,cs.LG

Download: http://arxiv.org/abs/2404.17884v1

Privacy-Enhanced Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

On-device intelligence (ODI) enables artificial intelligence (AI) applications to run on end devices, providing real-time and customized AI inference without relying on remote servers. However, training models for on-device deployment faces significant challenges due to the decentralized and privacy-sensitive nature of users' data, along with end-side constraints related to network connectivity, computation efficiency, etc. Existing training paradigms, such as cloud-based training, federated learning, and transfer learning, fail to sufficiently address these practical constraints that are prevalent for devices. To overcome these challenges, we propose Privacy-Enhanced Training-as-a-Service (PTaaS), a novel service computing paradigm that provides privacy-friendly, customized AI model training for end devices. PTaaS outsources the core training process to remote and powerful cloud or edge servers, efficiently developing customized on-device models based on uploaded anonymous queries, enhancing data privacy while reducing the computation load on individual devices. We explore the definition, goals, and design principles of PTaaS, alongside emerging technologies that support the PTaaS paradigm. An architectural scheme for PTaaS is also presented, followed by a series of open problems that set the stage for future research directions in the field of PTaaS.

Updated: 2024-04-27 12:39:28

Categories: cs.LG,cs.CR,cs.DC

Download: http://arxiv.org/abs/2404.10255v2

Is Mamba Effective for Time Series Forecasting?

In the realm of time series forecasting (TSF), it is imperative for models to adeptly discern and distill hidden patterns within historical time series data to forecast future states. Transformer-based models exhibit formidable efficacy in TSF, primarily attributed to their advantage in apprehending these patterns. However, the quadratic complexity of the Transformer leads to low computational efficiency and high costs, which somewhat hinders the deployment of the TSF model in real-world scenarios. Recently, Mamba, a selective state space model, has gained traction due to its ability to process dependencies in sequences while maintaining near-linear complexity. For TSF tasks, these characteristics enable Mamba to comprehend hidden patterns as the Transformer does while reducing computational overhead. Therefore, we propose a Mamba-based model named Simple-Mamba (S-Mamba) for TSF. Specifically, we tokenize the time points of each variate autonomously via a linear layer. A bidirectional Mamba layer is utilized to extract inter-variate correlations, and a Feed-Forward Network is set to learn temporal dependencies. Finally, forecast outcomes are generated through a linear mapping layer. Experiments on thirteen public datasets prove that S-Mamba maintains low computational overhead and achieves leading performance. Furthermore, we conduct extensive experiments to explore Mamba's potential in TSF tasks. Our code is available at https://github.com/wzhwzhwzh0921/S-D-Mamba.
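
Mamba's selective state-space recurrence is far richer than any toy, but the bidirectional scan pattern the abstract mentions can be sketched with a fixed linear recurrence run in both directions (a schematic stand-in only, not the actual Mamba layer):

```python
def scan(tokens, decay=0.5):
    """Toy linear state-space recurrence: h_t = decay * h_{t-1} + x_t."""
    h, out = 0.0, []
    for x in tokens:
        h = decay * h + x
        out.append(h)
    return out

def bidirectional_scan(tokens, decay=0.5):
    """Run the recurrence forwards and backwards and sum the two passes,
    so every position receives context from both directions."""
    fwd = scan(tokens, decay)
    bwd = scan(tokens[::-1], decay)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

# A unit impulse at position 0 influences later positions through the forward
# pass and is seen again (undecayed) by the backward pass at its own position.
out = bidirectional_scan([1.0, 0.0, 0.0])
assert out == [2.0, 0.5, 0.25]
```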

Updated: 2024-04-27 12:39:09

Categories: cs.LG

Download: http://arxiv.org/abs/2403.11144v3

Noisy Node Classification by Bi-level Optimization based Multi-teacher Distillation

Previous graph neural networks (GNNs) usually assume that the graph data is with clean labels for representation learning, but it is not true in real applications. In this paper, we propose a new multi-teacher distillation method based on bi-level optimization (namely BO-NNC), to conduct noisy node classification on the graph data. Specifically, we first employ multiple self-supervised learning methods to train diverse teacher models, and then aggregate their predictions through a teacher weight matrix. Furthermore, we design a new bi-level optimization strategy to dynamically adjust the teacher weight matrix based on the training progress of the student model. Finally, we design a label improvement module to improve the label quality. Extensive experimental results on real datasets show that our method achieves the best results compared to state-of-the-art methods.
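
The teacher aggregation step can be sketched as a weighted average of per-teacher class probabilities (conceptually one row of the paper's teacher weight matrix applied to a single node; all numbers below are hypothetical):

```python
def aggregate_predictions(teacher_probs, weights):
    """Weighted average of per-teacher class-probability vectors,
    normalized by the total teacher weight."""
    total = sum(weights)
    n_classes = len(teacher_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, teacher_probs)) / total
            for c in range(n_classes)]

# Three hypothetical teachers voting on a 2-class node:
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
agg = aggregate_predictions(probs, [0.5, 0.3, 0.2])
assert abs(agg[0] - 0.67) < 1e-9     # 0.5*0.9 + 0.3*0.6 + 0.2*0.2
assert abs(sum(agg) - 1.0) < 1e-9    # still a valid distribution
```

In the paper the weights themselves are adjusted dynamically via the bi-level optimization; here they are fixed for illustration.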

Updated: 2024-04-27 12:19:08

Categories: cs.LG

Download: http://arxiv.org/abs/2404.17875v1

Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks

This effort is focused on examining the behavior of reinforcement learning systems in personalization environments and detailing the differences in policy entropy associated with the type of learning algorithm utilized. We demonstrate that Policy Optimization agents often possess low-entropy policies during training, which in practice results in agents prioritizing certain actions and avoiding others. Conversely, we also show that Q-Learning agents are far less susceptible to such behavior and generally maintain high-entropy policies throughout training, which is often preferable in real-world applications. We provide a wide range of numerical experiments as well as theoretical justification to show that these differences in entropy are due to the type of learning being employed.
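
Policy entropy here is the ordinary Shannon entropy of the action distribution, which makes the low-entropy vs. high-entropy contrast easy to see numerically:

```python
import math

def policy_entropy(probs):
    """Shannon entropy (in nats) of an action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # high-entropy policy: explores all actions
peaked = [0.97, 0.01, 0.01, 0.01]    # low-entropy policy: near-deterministic
assert policy_entropy(uniform) > policy_entropy(peaked)
assert abs(policy_entropy(uniform) - math.log(4)) < 1e-12  # maximum for 4 actions
```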

Updated: 2024-04-27 12:14:48

Categories: cs.LG,cs.AI,cs.NA,math.NA,math.OC

Download: http://arxiv.org/abs/2211.11869v4

A Survey of Deep Learning Library Testing Methods

In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Studying the characteristics of DL libraries, their associated bugs, and the corresponding testing methods is crucial for enhancing the security of DL systems and advancing the widespread application of DL technology. This paper provides an overview of the testing research related to various DL libraries, discusses the strengths and weaknesses of existing methods, and provides guidance and reference for the application of the DL library. This paper first introduces the workflow of DL underlying libraries and the characteristics of three kinds of DL libraries involved, namely DL framework, DL compiler, and DL hardware library. It then provides definitions for DL underlying library bugs and testing. Additionally, this paper summarizes the existing testing methods and tools tailored to these DL libraries separately and analyzes their effectiveness and limitations. It also discusses the existing challenges of DL library testing and outlines potential directions for future research.

Updated: 2024-04-27 11:42:13

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2404.17871v1

Error analysis for finite element operator learning methods for solving parametric second-order elliptic PDEs

In this paper, we provide a theoretical analysis of a type of operator learning method without data reliance based on the classical finite element approximation, which is called the finite element operator network (FEONet). We first establish the convergence of this method for general second-order linear elliptic PDEs with respect to the parameters for neural network approximation. In this regard, we address the role of the condition number of the finite element matrix in the convergence of the method. Secondly, we derive an explicit error estimate for the self-adjoint case. For this, we investigate some regularity properties of the solution in certain function classes for a neural network approximation, verifying the sufficient condition for the solution to have the desired regularity. Finally, we also conduct numerical experiments that support the theoretical findings, confirming the role of the condition number of the finite element matrix in the overall convergence.

Updated: 2024-04-27 11:25:58

Categories: math.NA,cs.LG,cs.NA

Download: http://arxiv.org/abs/2404.17868v1

Vision-based Discovery of Nonlinear Dynamics for 3D Moving Target

Data-driven discovery of governing equations has kindled significant interests in many science and engineering areas. Existing studies primarily focus on uncovering equations that govern nonlinear dynamics based on direct measurement of the system states (e.g., trajectories). Limited efforts have been placed on distilling governing laws of dynamics directly from videos for moving targets in a 3D space. To this end, we propose a vision-based approach to automatically uncover governing equations of nonlinear dynamics for 3D moving targets via raw videos recorded by a set of cameras. The approach is composed of three key blocks: (1) a target tracking module that extracts plane pixel motions of the moving target in each video, (2) a Rodrigues' rotation formula-based coordinate transformation learning module that reconstructs the 3D coordinates with respect to a predefined reference point, and (3) a spline-enhanced library-based sparse regressor that uncovers the underlying governing law of dynamics. This framework is capable of effectively handling the challenges associated with measurement data, e.g., noise in the video, imprecise tracking of the target that causes data missing, etc. The efficacy of our method has been demonstrated through multiple sets of synthetic videos considering different nonlinear dynamics.
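
The Rodrigues' rotation formula used by the coordinate-transformation module is standard and compact: v' = v cos θ + (k × v) sin θ + k (k·v)(1 − cos θ) for a unit axis k. A direct implementation:

```python
import math

def rodrigues_rotate(v, k, theta):
    """Rotate vector v about unit axis k by angle theta (radians):
    v' = v*cos(t) + (k x v)*sin(t) + k*(k . v)*(1 - cos(t))."""
    cross = (k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0])
    dot = sum(ki * vi for ki, vi in zip(k, v))
    c, s = math.cos(theta), math.sin(theta)
    return tuple(vi * c + ci * s + ki * dot * (1.0 - c)
                 for vi, ci, ki in zip(v, cross, k))

# Rotating the x-axis 90 degrees about z yields the y-axis.
vx = rodrigues_rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
assert all(abs(a - b) < 1e-12 for a, b in zip(vx, (0.0, 1.0, 0.0)))
```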

Updated: 2024-04-27 11:13:55

Categories: cs.CV,cs.AI,nlin.CD

Download: http://arxiv.org/abs/2404.17865v1

Solvent: liquidity verification of smart contracts

Smart contracts are programs executed by blockchain networks to regulate the exchange of crypto-assets between untrusted users. Due to their immutability, public accessibility and high value at stake, smart contracts are an attractive target for attackers, as evidenced by a long history of security incidents. This has been a driving factor for the application of formal methods to Ethereum, the leading smart contract platform, and Solidity, its main smart contract language, which have become the target of dozens of verification tools with varying objectives. A current limitation of these tools is that they are not really effective in expressing and verifying liquidity properties regarding the exchange of crypto-assets: for example, is it true that in every reachable state a user can fire a sequence of transactions to withdraw a given amount of crypto-assets? We propose Solvent, a tool aimed at verifying these kinds of properties, which are beyond the reach of existing verification tools for Solidity. We evaluate the effectiveness and performance of Solvent through a common benchmark of smart contracts.

Updated: 2024-04-27 10:54:50

Categories: cs.CR,cs.PL

Download: http://arxiv.org/abs/2404.17864v1

Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy

The concept of differential privacy (DP) can quantitatively measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset. DP, which is generally used as a constraint, has been prominent in safeguarding datasets in machine learning at industry giants like Apple and Google. A common methodology for guaranteeing DP is incorporating appropriate noise into query outputs, thereby establishing statistical defense systems against privacy attacks such as membership inference and linkage attacks. However, especially for small datasets, existing DP mechanisms occasionally add an excessive amount of noise to query outputs, thereby discarding data utility. This is because traditional DP computes privacy loss based on the worst-case scenario, i.e., statistical outliers. In this work, to tackle this challenge, we utilize per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances. In a nutshell, we propose a per-instance noise variance optimization (NVO) game, framed as a common interest sequential game, and show that the Nash equilibrium (NE) points of it inherently guarantee pDP for all data instances. Through extensive experiments, our proposed pDP algorithm demonstrated an average performance improvement of up to 99.53% compared to the conventional DP algorithm in terms of KL divergence.
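
For context, the worst-case baseline the paper contrasts pDP against is typically the Laplace mechanism, whose noise scale is sensitivity/epsilon. A minimal sketch of that classic mechanism follows (the paper's per-instance noise optimization is a game-theoretic procedure and is not reproduced here):

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Classic Laplace mechanism: add Laplace(0, sensitivity/epsilon) noise,
    sampled via the inverse CDF (u = -0.5 has probability ~0)."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A counting query (sensitivity 1) under epsilon = 0.5: noise scale is 2.0.
rng = random.Random(0)
noisy_count = laplace_mechanism(100.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Smaller epsilon (stronger privacy) means a larger scale, which is exactly why small datasets can lose most of their utility under the worst-case calibration.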

Updated: 2024-04-27 10:36:12

Categories: cs.CR

Download: http://arxiv.org/abs/2404.15686v2

ResBit: Residual Bit Vector for Categorical Values

One-hot vectors, a method for representing discrete/categorical data, are commonly used in machine learning due to their simplicity and intuitiveness. However, one-hot vectors suffer from a linear increase in dimensionality, posing computational and memory challenges, especially when dealing with datasets containing numerous categories. To address this issue, we propose Residual Bit Vectors (ResBit), a technique for densely representing categorical data. While Analog Bits presents a similar approach, it faces challenges in categorical data generation tasks. ResBit overcomes these limitations, offering a more versatile solution. In our experiments, we focus on tabular data generation, examining performance across scenarios with varying amounts of categorical data. We verify the resulting acceleration and confirm that performance is maintained or improved.
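
The dimensionality argument can be made concrete by comparing one-hot encoding with a plain binary bit encoding (this illustrates the Analog-Bits-style dense idea only; ResBit's residual construction differs in detail):

```python
import math

def one_hot(index, n_categories):
    """n-dimensional indicator vector for one category."""
    return [1 if i == index else 0 for i in range(n_categories)]

def bit_vector(index, n_categories):
    """Dense binary encoding in ceil(log2(n)) dimensions instead of n."""
    width = max(1, math.ceil(math.log2(n_categories)))
    return [(index >> i) & 1 for i in reversed(range(width))]

assert len(one_hot(5, 1000)) == 1000   # dimensionality grows linearly...
assert len(bit_vector(5, 1000)) == 10  # ...versus logarithmically
assert bit_vector(5, 1000) == [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
```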

Updated: 2024-04-27 10:23:27

Categories: cs.LG

Download: http://arxiv.org/abs/2309.17196v3

Uncertainty quantification for iterative algorithms in linear models with application to early stopping

This paper investigates the iterates $\hat{b}^1,\dots,\hat{b}^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA). The paper proposes novel estimators for the generalization error of the iterate $\hat{b}^t$ for any fixed iteration $t$ along the trajectory. These estimators are proved to be $\sqrt n$-consistent under Gaussian designs. Applications to early-stopping are provided: when the generalization error of the iterates is a U-shape function of the iteration $t$, the estimates allow one to select from the data an iteration $\hat t$ that achieves the smallest generalization error along the trajectory. Additionally, we provide a technique for developing debiasing corrections and valid confidence intervals for the components of the true coefficient vector from the iterate $\hat{b}^t$ at any finite iteration $t$. Extensive simulations on synthetic data illustrate the theoretical results.
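
Given $\sqrt n$-consistent estimates of the generalization error along the trajectory, the early-stopping rule reduces to an argmin over iterations (the risk values below are hypothetical):

```python
def select_early_stop(risk_estimates):
    """Return the iteration index minimizing the estimated generalization error."""
    return min(range(len(risk_estimates)), key=risk_estimates.__getitem__)

# Hypothetical U-shaped risk estimates along a gradient-descent trajectory:
risks = [1.9, 1.2, 0.8, 0.6, 0.55, 0.6, 0.75, 1.0]
assert select_early_stop(risks) == 4  # stop where the estimated risk bottoms out
```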

Updated: 2024-04-27 10:20:41

Categories: stat.ML,cs.LG,math.ST,stat.CO,stat.ME,stat.TH

Download: http://arxiv.org/abs/2404.17856v1

GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation

Convolutional Neural Networks (CNNs) have become widely adopted for medical image segmentation tasks, demonstrating promising performance. However, the inherent inductive biases in convolutional architectures limit their ability to model long-range dependencies and spatial correlations. While recent transformer-based architectures address these limitations by leveraging self-attention mechanisms to encode long-range dependencies and learn expressive representations, they often struggle to extract low-level features and are highly dependent on data availability. This motivated us to develop GLIMS, a data-efficient attention-guided hybrid volumetric segmentation network. GLIMS utilizes Dilated Feature Aggregator Convolutional Blocks (DACB) to capture local-global feature correlations efficiently. Furthermore, the incorporated Swin Transformer-based bottleneck bridges the local and global features to improve the robustness of the model. Additionally, GLIMS employs an attention-guided segmentation approach through Channel and Spatial-Wise Attention Blocks (CSAB) to localize expressive features for fine-grained border segmentation. Quantitative and qualitative results on glioblastoma and multi-organ CT segmentation tasks demonstrate GLIMS' effectiveness in terms of complexity and accuracy. GLIMS demonstrated outstanding performance on BraTS2021 and BTCV datasets, surpassing the performance of Swin UNETR. Notably, GLIMS achieved this high performance with a significantly reduced number of trainable parameters. Specifically, GLIMS has 47.16M trainable parameters and 72.30G FLOPs, while Swin UNETR has 61.98M trainable parameters and 394.84G FLOPs. The code is publicly available on https://github.com/yaziciz/GLIMS.

Updated: 2024-04-27 10:18:55

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.17854v1

EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender Systems

Reinforcement Learning (RL)-Based Recommender Systems (RSs) have gained rising attention for their potential to enhance long-term user engagement. However, research in this field faces challenges, including the lack of user-friendly frameworks, inconsistent evaluation metrics, and difficulties in reproducing existing studies. To tackle these issues, we introduce EasyRL4Rec, an easy-to-use code library designed specifically for RL-based RSs. This library provides lightweight and diverse RL environments based on five public datasets and includes core modules with rich options, simplifying model development. It provides unified evaluation standards focusing on long-term outcomes and offers tailored designs for state modeling and action representation for recommendation scenarios. Furthermore, we share our findings from insightful experiments with current methods. EasyRL4Rec seeks to facilitate the model development and experimental process in the domain of RL-based RSs. The library is available for public use.

Updated: 2024-04-27 10:11:31

Domains: cs.IR,cs.LG

Download: http://arxiv.org/abs/2402.15164v2

pFedAFM: Adaptive Feature Mixture for Batch-Level Personalization in Heterogeneous Federated Learning

Model-heterogeneous personalized federated learning (MHPFL) enables FL clients to train structurally different personalized models on non-independent and identically distributed (non-IID) local data. Existing MHPFL methods focus on achieving client-level personalization, but cannot address batch-level data heterogeneity. To bridge this important gap, we propose a model-heterogeneous personalized Federated learning approach with Adaptive Feature Mixture (pFedAFM) for supervised learning tasks. It consists of three novel designs: 1) A shared global homogeneous small feature extractor is assigned alongside each client's local heterogeneous model (consisting of a heterogeneous feature extractor and a prediction header) to facilitate cross-client knowledge fusion. The two feature extractors share the local heterogeneous model's prediction header containing rich personalized prediction knowledge to retain personalized prediction capabilities. 2) An iterative training strategy is designed to alternately train the global homogeneous small feature extractor and the local heterogeneous large model for effective global-local knowledge exchange. 3) A trainable weight vector is designed to dynamically mix the features extracted by both feature extractors to adapt to batch-level data heterogeneity. Theoretical analysis proves that pFedAFM can converge over time. Extensive experiments on 2 benchmark datasets demonstrate that it significantly outperforms 7 state-of-the-art MHPFL methods, achieving up to 7.93% accuracy improvement while incurring low communication and computation costs.
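
The adaptive feature mixture in design 3) can be sketched minimally as follows (names, shapes, and the clamping rule are illustrative, not the paper's implementation):

```python
# A trainable weight vector alpha mixes, per feature dimension, the output
# of the shared global extractor with the output of the local heterogeneous
# extractor; clamping keeps each mixing weight a valid convex coefficient.

def clamp01(x):
    return max(0.0, min(1.0, x))

def mix_features(global_feats, local_feats, alpha):
    """Element-wise convex combination controlled by the weight vector alpha."""
    assert len(global_feats) == len(local_feats) == len(alpha)
    return [clamp01(a) * g + (1.0 - clamp01(a)) * l
            for a, g, l in zip(alpha, global_feats, local_feats)]

g = [1.0, 0.0, 2.0]        # features from the shared global extractor
l = [0.0, 4.0, 2.0]        # features from the local heterogeneous extractor
alpha = [1.0, 0.25, 0.5]   # would be learned per batch; fixed here
mixed = mix_features(g, l, alpha)
```

Because alpha is learned, the mixture can lean on the global extractor for some feature dimensions and on the personalized local extractor for others, which is how batch-level heterogeneity is absorbed.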

Updated: 2024-04-27 09:52:59

Domains: cs.LG

Download: http://arxiv.org/abs/2404.17847v1

Using LLMs in Software Requirements Specifications: An Empirical Evaluation

The creation of a Software Requirements Specification (SRS) document is important for any software development project. Given the recent prowess of Large Language Models (LLMs) in answering natural language queries and generating sophisticated textual outputs, our study explores their capability to produce accurate, coherent, and structured drafts of these documents to accelerate the software development lifecycle. We assess the performance of GPT-4 and CodeLlama in drafting an SRS for a university club management system and compare it against human benchmarks using eight distinct criteria. Our results suggest that LLMs can match the output quality of an entry-level software engineer to generate an SRS, delivering complete and consistent drafts. We also evaluate the capabilities of LLMs to identify and rectify problems in a given requirements document. Our experiments indicate that GPT-4 is capable of identifying issues and giving constructive feedback for rectifying them, while CodeLlama's results for validation were not as encouraging. We repeated the generation exercise for four distinct use cases to study the time saved by employing LLMs for SRS generation. The experiment demonstrates that LLMs may facilitate a significant reduction in development time for entry-level software engineers. Hence, we conclude that the LLMs can be gainfully used by software engineers to increase productivity by saving time and effort in generating, validating and rectifying software requirements.

Updated: 2024-04-27 09:37:00

Domains: cs.SE,cs.AI

Download: http://arxiv.org/abs/2404.17842v1

Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection

Currently, smart contract vulnerabilities (SCVs) have emerged as a major factor threatening the transaction security of blockchain. Existing state-of-the-art methods rely on deep learning to mitigate this threat. They treat each input contract as an independent entity and feed it into a deep learning model to learn vulnerability patterns by fitting vulnerability labels. Unfortunately, they disregard the correlation between contracts, failing to consider the commonalities between contracts of the same type and the differences among contracts of different types. As a result, the performance of these methods falls short of the desired level. To tackle this problem, we propose a novel Contrastive Learning Enhanced Automated Recognition Approach for Smart Contract Vulnerabilities, named Clear. In particular, Clear employs a contrastive learning (CL) model to capture the fine-grained correlation information among contracts and generates correlation labels based on the relationships between contracts to guide the training process of the CL model. Finally, it combines the correlation and the semantic information of the contract to detect SCVs. Through an empirical evaluation on a large-scale real-world dataset of over 40K smart contracts against 13 state-of-the-art baseline methods, we show that Clear achieves (1) optimal performance over all baseline methods; (2) 9.73%-39.99% higher F1-score than existing deep learning methods.
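
The correlation-label construction can be sketched as follows (the pairing scheme is a plausible reading of the abstract, with invented type names; the paper's exact construction may differ):

```python
# Contracts sharing a vulnerability type form positive pairs for the
# contrastive learning model; contracts of different types form negatives.

def correlation_labels(vuln_types):
    """Map each contract index pair (i, j), i < j, to 1 when the two
    contracts share a vulnerability type, else 0."""
    labels = {}
    n = len(vuln_types)
    for i in range(n):
        for j in range(i + 1, n):
            labels[(i, j)] = 1 if vuln_types[i] == vuln_types[j] else 0
    return labels

types = ["reentrancy", "overflow", "reentrancy", "timestamp"]  # hypothetical
labels = correlation_labels(types)
```

These pairwise labels are what would supervise the contrastive objective, pulling same-type contract embeddings together and pushing different-type ones apart.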

Updated: 2024-04-27 09:13:25

Domains: cs.CR,cs.SE

Download: http://arxiv.org/abs/2404.17839v1

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Large language models (LLMs) based on transformers have made significant strides in recent years, the success of which is driven by scaling up their model size. Despite their high algorithmic performance, the computational and memory requirements of LLMs present unprecedented challenges. To tackle the high compute requirements of LLMs, the Mixture-of-Experts (MoE) architecture was introduced which is able to scale its model size without proportionally scaling up its computational requirements. Unfortunately, MoE's high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE's memory-hungry expert parameters to CPU memory fall short because the latency to migrate activated experts from CPU to GPU incurs high performance overhead. Our proposed Pre-gated MoE system effectively tackles the compute and memory challenges of conventional MoE architectures using our algorithm-system co-design. Pre-gated MoE employs our novel pre-gating function which alleviates the dynamic nature of sparse expert activation, allowing our proposed system to address the large memory footprint of MoEs while also achieving high performance. We demonstrate that Pre-gated MoE is able to improve performance, reduce GPU memory consumption, while also maintaining the same level of model quality. These features allow our Pre-gated MoE system to cost-effectively deploy large-scale LLMs using just a single GPU with high performance.
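
The pre-gating intuition can be simulated in a few lines (the scoring, top-k rule, and prefetch bookkeeping below are an illustrative simplification, not the paper's system):

```python
# Pre-gating computes the expert selection for layer k+1 while layer k is
# still running, so the chosen experts can be migrated from CPU to GPU
# ahead of time, hiding the transfer latency.

def top_k_experts(scores, k=2):
    """Indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def run_with_pregating(per_layer_scores, k=2):
    """Simulate layers; record which experts each layer's pre-gate
    schedules for prefetching into the next layer."""
    prefetch_schedule = []
    for layer in range(len(per_layer_scores) - 1):
        nxt = top_k_experts(per_layer_scores[layer + 1], k)
        prefetch_schedule.append((layer, nxt))
    return prefetch_schedule

scores = [
    [0.1, 0.9, 0.3, 0.2],  # layer 0 gate scores
    [0.8, 0.1, 0.7, 0.0],  # layer 1
    [0.2, 0.2, 0.1, 0.9],  # layer 2
]
schedule = run_with_pregating(scores, k=2)
```

The key property is that by the time layer k+1 executes, its experts are already resident, which is what removes the CPU-to-GPU migration from the critical path.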

Updated: 2024-04-27 09:11:44

Domains: cs.LG,cs.AI,cs.AR

Download: http://arxiv.org/abs/2308.12066v3

KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities

Linux kernel vulnerability reproduction is a critical task in system security. To reproduce a kernel vulnerability, the vulnerable environment and the Proof of Concept (PoC) program are needed. Most existing research focuses on the generation of PoC, while the construction of environment is overlooked. However, establishing an effective vulnerable environment to trigger a vulnerability is challenging. Firstly, it is hard to guarantee that the selected kernel version for reproduction is vulnerable, as the vulnerability version claims in online databases can occasionally be spurious. Secondly, many vulnerabilities can not be reproduced in kernels built with default configurations. Intricate non-default kernel configurations must be set to include and trigger a kernel vulnerability, but less information is available on how to recognize these configurations. To solve these challenges, we propose a patch-based approach to identify real vulnerable kernel versions and a graph-based approach to identify necessary configs for activating a specific vulnerability. We implement these approaches in a tool, KernJC, automating the generation of vulnerable environments for kernel vulnerabilities. To evaluate the efficacy of KernJC, we build a dataset containing 66 representative real-world vulnerabilities with PoCs from kernel vulnerability research in the past five years. The evaluation shows that KernJC builds vulnerable environments for all these vulnerabilities, 48.5% of which require non-default configs, and 4 have incorrect version claims in the National Vulnerability Database (NVD). Furthermore, we conduct large-scale spurious version detection on kernel vulnerabilities and identify 128 vulnerabilities which have spurious version claims in NVD. To foster future research, we release KernJC with the dataset in the community.
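
The graph-based config identification can be sketched as a walk over a dependency graph (the config names and edges below are made up; KernJC derives the real graph from kernel Kconfig metadata):

```python
# Starting from the config option guarding the vulnerable code, follow
# "depends on" edges transitively to collect every option that must be
# enabled for the vulnerability to be reachable.

def required_configs(dep_graph, root):
    """BFS over config dependencies; returns the set of configs to enable."""
    needed, queue = {root}, [root]
    while queue:
        cfg = queue.pop(0)
        for dep in dep_graph.get(cfg, []):
            if dep not in needed:
                needed.add(dep)
                queue.append(dep)
    return needed

deps = {  # hypothetical dependency edges
    "CONFIG_FOO_VULN": ["CONFIG_FOO", "CONFIG_NET"],
    "CONFIG_FOO": ["CONFIG_EXPERIMENTAL"],
}
configs = required_configs(deps, "CONFIG_FOO_VULN")
```

This transitive closure is why non-default options can be required even when the vulnerable file itself only mentions one config symbol.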

Updated: 2024-04-27 08:59:52

Domains: cs.CR,cs.SE

Download: http://arxiv.org/abs/2404.11107v2

Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs

Agents based on large language models (LLMs) have demonstrated effectiveness in solving a wide range of tasks by integrating LLMs with key modules such as planning, memory, and tool usage. Increasingly, customers are adopting LLM agents across a variety of commercial applications critical to reliability, including support for mental well-being, chemical synthesis, and software development. Nevertheless, our observations and daily use of LLM agents indicate that they are prone to making erroneous plans, especially when the tasks are complex and require long-term planning. In this paper, we propose PDoctor, a novel and automated approach to testing LLM agents and understanding their erroneous planning. As the first work in this direction, we formulate the detection of erroneous planning as a constraint satisfiability problem: an LLM agent's plan is considered erroneous if its execution violates the constraints derived from the user inputs. To this end, PDoctor first defines a domain-specific language (DSL) for user queries and synthesizes varying inputs with the assistance of the Z3 constraint solver. These synthesized inputs are natural language paragraphs that specify the requirements for completing a series of tasks. Then, PDoctor derives constraints from these requirements to form a testing oracle. We evaluate PDoctor with three mainstream agent frameworks and two powerful LLMs (GPT-3.5 and GPT-4). The results show that PDoctor can effectively detect diverse errors in agent planning and provide insights and error characteristics that are valuable to both agent developers and users. We conclude by discussing potential alternative designs and directions to extend PDoctor.
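
The testing-oracle idea can be sketched without the DSL or Z3 machinery (plain Python predicates stand in for solver-derived constraints here; the step names are invented):

```python
# A plan is erroneous if its execution trace violates any constraint
# derived from the user input; the oracle simply evaluates each constraint
# against the trace and reports the violated ones.

def check_plan(executed_steps, constraints):
    """Return the names of the constraints the execution violates."""
    return [name for name, holds in constraints.items()
            if not holds(executed_steps)]

steps = ["search_flights", "book_hotel", "book_flight"]  # hypothetical trace
constraints = {
    "flight_before_hotel":
        lambda s: s.index("book_flight") < s.index("book_hotel"),
    "must_search_first":
        lambda s: s[0] == "search_flights",
}
violations = check_plan(steps, constraints)
```

In PDoctor the constraints come from requirements synthesized with Z3, so inputs can be generated systematically rather than written by hand as above.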

Updated: 2024-04-27 08:56:45

Domains: cs.AI,cs.PL

Download: http://arxiv.org/abs/2404.17833v1

Dynamic Against Dynamic: An Open-set Self-learning Framework

In open-set recognition, existing methods generally learn statically fixed decision boundaries using known classes to reject unknown classes. Though they have achieved promising results, such decision boundaries are evidently insufficient for universal unknown classes in dynamic and open scenarios as they can potentially appear at any position in the feature space. Moreover, these methods simply reject unknown class samples during testing without any effective utilization of them. In fact, such samples can constitute the true instantiated representation of the unknown classes to further enhance the model's performance. To address these issues, this paper proposes a novel dynamic against dynamic idea, i.e., a dynamic method against a dynamically changing open-set world, for which an open-set self-learning (OSSL) framework is correspondingly developed. OSSL starts with a good closed-set classifier trained by known classes and utilizes available test samples for model adaptation during testing, thus gaining adaptability to changing data distributions. In particular, a novel self-matching module is designed for OSSL, which achieves this adaptation by automatically identifying known class samples while rejecting unknown class samples, which are further utilized to enhance the discriminability of the model as the instantiated representation of unknown classes. Our method establishes new performance milestones in almost all standard and cross-data benchmarks.
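
A toy version of the self-matching step might look as follows (the confidence threshold and probabilities are illustrative; the paper's module is more involved):

```python
# Test samples whose top softmax score clears a threshold are accepted as
# their predicted known class; the rest are rejected and kept as
# instantiated representatives of the unknown classes.

def self_match(prob_rows, threshold=0.7):
    known, unknown = [], []
    for i, probs in enumerate(prob_rows):
        best = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best] >= threshold:
            known.append((i, best))   # pseudo-labeled known-class sample
        else:
            unknown.append(i)         # reused as an unknown-class instance
    return known, unknown

probs = [
    [0.90, 0.05, 0.05],  # confident -> matched to known class 0
    [0.40, 0.35, 0.25],  # ambiguous -> rejected as unknown
]
known, unknown = self_match(probs)
```

Both outputs feed adaptation: the matched samples refine the known-class boundaries while the rejected ones give the model concrete unknown-class evidence.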

Updated: 2024-04-27 08:40:33

Domains: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.17830v1

The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth

While deep neural networks are highly effective at solving complex tasks, large pre-trained models are commonly employed even to solve consistently simpler downstream tasks, which do not necessarily require a large model's complexity. Motivated by the awareness of the ever-growing AI environmental impact, we propose an efficiency strategy that leverages prior knowledge transferred by large models. Simple but effective, we propose a method relying on an Entropy-bASed Importance mEtRic (EASIER) to reduce the depth of over-parametrized deep neural networks, which alleviates their computational burden. We assess the effectiveness of our method on traditional image classification setups. The source code will be publicly released upon acceptance of the article.
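
An entropy-based layer-importance score in the spirit of EASIER could be sketched like this (the exact statistic the paper uses may differ; treat this as one plausible instantiation):

```python
# For each layer, measure the entropy of the on/off state of its ReLU
# units over a batch: a layer whose units are almost always on or always
# off behaves nearly linearly, carries little information, and is a
# candidate for removal to reduce depth.

import math

def state_entropy(on_fractions):
    """Mean binary entropy (bits) of per-unit activation frequencies."""
    total = 0.0
    for p in on_fractions:
        for q in (p, 1.0 - p):
            if q > 0.0:
                total -= q * math.log2(q)
    return total / len(on_fractions)

layers = {  # hypothetical per-unit "fraction of inputs with unit active"
    "layer1": [0.5, 0.4],   # units switch often -> high entropy
    "layer2": [1.0, 0.99],  # units nearly always on -> low entropy
}
ranking = sorted(layers, key=lambda name: state_entropy(layers[name]))
```

Pruning would then proceed from the front of `ranking`, removing the lowest-entropy layers first and fine-tuning afterwards.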

Updated: 2024-04-27 08:28:25

Domains: cs.LG

Download: http://arxiv.org/abs/2404.18949v1

Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI

Deep learning (DL) models for diagnosing breast cancer from mammographic images often operate as "black boxes", making it difficult for healthcare professionals to trust and understand their decision-making processes. The study presents an integrated framework combining Convolutional Neural Networks (CNNs) and Explainable Artificial Intelligence (XAI) for the enhanced diagnosis of breast cancer using the CBIS-DDSM dataset. The methodology encompasses an elaborate data preprocessing pipeline and advanced data augmentation techniques to counteract dataset limitations; transfer learning using pre-trained networks such as VGG-16, Inception-V3, and ResNet was also employed. A focal point of our study is the evaluation of XAI's effectiveness in interpreting model predictions, highlighted by utilizing the Hausdorff measure to quantitatively assess the alignment between AI-generated explanations and expert annotations. This approach is critical for XAI in promoting trustworthiness and ethical fairness in AI-assisted diagnostics. The findings from our research illustrate the effective collaboration between CNNs and XAI in advancing diagnostic methods for breast cancer, thereby facilitating a more seamless integration of advanced AI technologies within clinical settings. By enhancing the interpretability of AI-driven decisions, this work lays the groundwork for improved collaboration between AI systems and medical practitioners, ultimately enriching patient care. Furthermore, the implications of our research extend well beyond the current methodologies. It encourages further research into how to combine multimodal data and improve AI explanations to meet the needs of clinical practice.
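
The Hausdorff measure mentioned above can be computed as follows for two point sets (here in 2-D, with made-up coordinates; in the study the sets would be the XAI saliency region and the expert annotation):

```python
# The Hausdorff distance is the largest distance from any point in one set
# to its nearest point in the other, symmetrized over both directions.
# Small values mean the explanation and the annotation overlap closely.

import math

def directed(a, b):
    """Largest nearest-neighbor distance from points in a to set b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    return max(directed(a, b), directed(b, a))

explanation = [(0, 0), (1, 0), (1, 1)]  # hypothetical saliency pixels
annotation = [(0, 0), (1, 1)]           # hypothetical expert region
d = hausdorff(explanation, annotation)
```

Using the symmetrized maximum (rather than an average) makes the score sensitive to any part of the explanation that strays far from the annotated lesion.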

Updated: 2024-04-27 08:24:37

Domains: cs.CV,cs.AI,cs.LG,eess.IV

Download: http://arxiv.org/abs/2404.03892v3

Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods

In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our method restricts the attention to regions not immediately adjacent to the target points, termed sub-adjacent neighborhoods. Our key observation is that owing to the rarity of anomalies, they typically exhibit more pronounced differences from their sub-adjacent neighborhoods than from their immediate vicinities. By focusing the attention on the sub-adjacent areas, we make the reconstruction of anomalies more challenging, thereby enhancing their detectability. Technically, our approach concentrates attention on the non-diagonal areas of the attention matrix by enlarging the corresponding elements in the training stage. To facilitate the implementation of the desired attention matrix pattern, we adopt linear attention because of its flexibility and adaptability. Moreover, a learnable mapping function is proposed to improve the performance of linear attention. Empirically, the Sub-Adjacent Transformer achieves state-of-the-art performance across six real-world anomaly detection benchmarks, covering diverse fields such as server monitoring, space exploration, and water treatment.
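
The sub-adjacent attention pattern can be made concrete with a small mask (the window sizes are invented; the paper realizes this with linear attention rather than an explicit mask):

```python
# For each query position i, attention is allowed only in a band that
# excludes the immediate neighborhood: keys j with skip < |i - j| <=
# skip + width. Reconstruction must then rely on points some distance
# away, which makes anomalies harder to reconstruct and easier to detect.

def sub_adjacent_mask(n, skip=1, width=2):
    """mask[i][j] = 1 iff skip < |i - j| <= skip + width."""
    return [[1 if skip < abs(i - j) <= skip + width else 0
             for j in range(n)]
            for i in range(n)]

mask = sub_adjacent_mask(6, skip=1, width=2)
```

Note the zeroed diagonal band: each point can attend neither to itself nor to its immediate neighbors, which is the mechanism the abstract describes.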

Updated: 2024-04-27 08:08:17

Domains: cs.LG

Download: http://arxiv.org/abs/2404.18948v1

Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. One effective solution involves watermarking the generated images. However, existing methods often compromise the model performance or require additional training, which is undesirable for operators and users. To address this issue, we propose Gaussian Shading, a diffusion model watermarking technique that is both performance-lossless and training-free, while serving the dual purpose of copyright protection and tracing of offending content. Our watermark embedding is free of model parameter modifications and thus is plug-and-play. We map the watermark to latent representations following a standard Gaussian distribution, which is indistinguishable from latent representations obtained from the non-watermarked diffusion model. Therefore we can achieve watermark embedding with lossless performance, for which we also provide theoretical proof. Furthermore, since the watermark is intricately linked with image semantics, it exhibits resilience to lossy processing and erasure attempts. The watermark can be extracted by Denoising Diffusion Implicit Models (DDIM) inversion and inverse sampling. We evaluate Gaussian Shading on multiple versions of Stable Diffusion, and the results demonstrate that Gaussian Shading not only is performance-lossless but also outperforms existing methods in terms of robustness.
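
The core intuition can be shown with a heavily simplified sign-based sketch (the actual scheme uses distribution-preserving sampling over full latent tensors plus DDIM inversion; this toy keeps only the idea that each bit selects one half of a Gaussian):

```python
# Each watermark bit picks the sign of a latent drawn from a half-Gaussian.
# The marginal of the latent remains standard Gaussian, so generation
# quality is untouched, and the bit is recovered from the sign after
# inverting the diffusion process.

import random

def embed(bits, rng):
    """One latent value per bit: 1 -> positive half, 0 -> negative half."""
    latents = []
    for b in bits:
        z = abs(rng.gauss(0.0, 1.0))
        latents.append(z if b == 1 else -z)
    return latents

def extract(latents):
    return [1 if z > 0 else 0 for z in latents]

rng = random.Random(0)  # fixed seed for a reproducible illustration
bits = [1, 0, 1, 1, 0]
latents = embed(bits, rng)
recovered = extract(latents)
```

Because |z| is distributed exactly as the magnitude of a standard Gaussian, a watermarked latent is statistically indistinguishable from an unwatermarked one, which is the "performance-lossless" property.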

Updated: 2024-04-27 08:05:29

Domains: cs.CV,cs.CR

Download: http://arxiv.org/abs/2404.04956v2

Motion planning for off-road autonomous driving based on human-like cognition and weight adaptation

Driving in an off-road environment is challenging for autonomous vehicles due to the complex and varied terrain. To ensure stable and efficient travel, the vehicle requires consideration and balancing of environmental factors, such as undulations, roughness, and obstacles, to generate optimal trajectories that can adapt to changing scenarios. However, traditional motion planners often utilize a fixed cost function for trajectory optimization, making it difficult to adapt to different driving strategies in challenging irregular terrains and uncommon scenarios. To address these issues, we propose an adaptive motion planner based on human-like cognition and cost evaluation for off-road driving. First, we construct a multi-layer map describing different features of off-road terrains, including terrain elevation, roughness, obstacle, and artificial potential field map. Subsequently, we employ a CNN-LSTM network to learn the trajectories planned by human drivers in various off-road scenarios. Then, based on human-like generated trajectories in different environments, we design a primitive-based trajectory planner that aims to mimic human trajectories and cost weight selection, generating trajectories that are consistent with the dynamics of off-road vehicles. Finally, we compute optimal cost weights and select and extend behavioral primitives to generate highly adaptive, stable, and efficient trajectories. We validate the effectiveness of the proposed method through experiments in a desert off-road environment with complex terrain and varying road conditions. The experimental results show that the proposed human-like motion planner has excellent adaptability to different off-road conditions. It shows real-time operation, greater stability, and more human-like planning ability in diverse and challenging scenarios.
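
The cost-weighted primitive selection step can be sketched minimally (the cost terms, weights, and primitive names are invented; in the paper the weights are adapted from human-like trajectories):

```python
# Each candidate trajectory primitive is scored by a weighted sum of
# terrain cost terms; the lowest-cost primitive is selected. Adapting the
# weight vector per scenario is what changes the driving strategy.

def pick_primitive(primitives, weights):
    """primitives: name -> dict of cost terms; weights: term -> weight."""
    def cost(terms):
        return sum(weights[k] * v for k, v in terms.items())
    return min(primitives, key=lambda name: cost(primitives[name]))

primitives = {
    "straight": {"roughness": 0.2, "elevation": 0.5, "obstacle": 0.0},
    "detour":   {"roughness": 0.4, "elevation": 0.1, "obstacle": 0.0},
}
weights = {"roughness": 1.0, "elevation": 2.0, "obstacle": 5.0}
best = pick_primitive(primitives, weights)
```

With these weights, elevation is penalized heavily, so the detour wins even though it is rougher; re-weighting roughness higher would flip the choice, which is the adaptation the abstract describes.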

Updated: 2024-04-27 08:00:35

Domains: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.17820v1

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and planning but also a complex goal to achieve. The primary challenges include the complexity of synchronization and load balancing between CPUs and GPUs, the variance in input data distribution, and the use of different communication devices and topologies (e.g., NVLink, PCIe, network cards) that connect multiple compute devices, coupled with the desire for flexible training configurations. Built on top of our prior work for single-GPU platforms, we address these challenges and enable multi-GPU performance modeling by incorporating (1) data-distribution-aware performance models for embedding table lookup, and (2) data movement prediction of communication collectives, into our upgraded performance modeling pipeline equipped with inter- and intra-rank synchronization for ML workloads trained on multi-GPU platforms. Beyond accurately predicting the per-iteration training time of DLRM models with random configurations with a geomean error of 5.21% on two multi-GPU platforms, our prediction pipeline generalizes well to other types of ML workloads, such as Transformer-based NLP models with a geomean error of 3.00%. Moreover, even without actually running ML workloads like DLRMs on the hardware, it is capable of generating insights such as quickly selecting the fastest embedding table sharding configuration (with a success rate of 85%).

Updated: 2024-04-27 07:59:21

Domains: cs.DC,cs.LG,cs.PF

Download: http://arxiv.org/abs/2404.12674v2

Data Selection: A General Principle for Building Small Interpretable Models

We present convincing empirical evidence for an effective and general strategy for building accurate small models. Such models are attractive for interpretability and also find use in resource-constrained environments. The strategy is to learn the training distribution and sample accordingly from the provided training data. The distribution learning algorithm is not a contribution of this work; our contribution is a rigorous demonstration of the broad utility of this strategy in various practical settings. We apply it to the tasks of (1) building cluster explanation trees, (2) prototype-based classification, and (3) classification using Random Forests, and show that it improves the accuracy of decades-old weak traditional baselines to be competitive with specialized modern techniques. This strategy is also versatile with respect to the notion of model size. In the first two tasks, model size is considered to be the number of leaves in the tree and the number of prototypes respectively. In the final task involving Random Forests, the strategy is shown to be effective even when model size comprises more than one factor: number of trees and their maximum depth. Positive results using multiple datasets are presented that are shown to be statistically significant.
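
The "learn the distribution, then sample accordingly" principle can be illustrated with a deliberately tiny 1-D version (a single Gaussian fit stands in for the real distribution learner, which the paper treats as a pluggable component):

```python
# Fit a density to the training data, then keep the samples the density
# ranks highest, yielding a small but representative training subset for
# the downstream small model.

import math

def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

def select(xs, k):
    mu, sigma = fit_gaussian(xs)
    # Higher Gaussian density <=> smaller |x - mu|; pre-sorting makes
    # tie-breaking deterministic by value.
    return sorted(sorted(xs), key=lambda x: abs(x - mu))[:k]

data = [1.0, 2.0, 2.5, 3.0, 10.0]  # 10.0 is an outlier
subset = select(data, k=3)
```

The outlier is dropped because the learned density assigns it low mass; the small model is then trained only on the retained subset.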

Updated: 2024-04-27 07:42:45

Domains: cs.LG

Download: http://arxiv.org/abs/2210.03921v3

Co-learning-aided Multi-modal-deep-learning Framework of Passive DOA Estimators for a Heterogeneous Hybrid Massive MIMO Receiver

Due to its excellent performance in rate and resolution, fully-digital (FD) massive multiple-input multiple-output (MIMO) antenna arrays have been widely applied in data transmission and direction of arrival (DOA) measurements, etc. But they confront two main challenges: high computational complexity and circuit cost. The two problems may be addressed well by a hybrid analog-digital (HAD) structure. But HAD suffers from phase ambiguity, which leads to low efficiency or high latency. Does there exist a MIMO structure that offers low cost, low complexity, and high time efficiency at the same time? To satisfy the three properties, a novel heterogeneous hybrid MIMO receiver structure integrating FD and heterogeneous HAD ($\rm{H}^2$AD-FD) is proposed and a corresponding multi-modal (MD)-learning framework is developed. The framework includes three major stages: 1) generate the candidate sets via root multiple signal classification (Root-MUSIC) or deep learning (DL); 2) infer the class of true solutions from candidate sets using machine learning (ML) methods; 3) fuse the two-part true solutions to achieve a better DOA estimation. The above process forms two methods named MD-Root-MUSIC and MDDL. To improve DOA estimation accuracy and reduce the clustering complexity, a co-learning-aided MD framework is proposed to form two enhanced methods named CoMDDL and CoMD-RootMUSIC. Moreover, the Cramer-Rao lower bound (CRLB) for the proposed $\rm{H}^2$AD-FD structure is also derived. Experimental results demonstrate that our proposed four methods could approach the CRLB for signal-to-noise ratio (SNR) > 0 dB and the proposed CoMDDL and MDDL perform better than CoMD-RootMUSIC and MD-RootMUSIC, particularly in the extremely low SNR region.

Updated: 2024-04-27 07:34:36

Categories: eess.SP,cs.AI,cs.IT,math.IT

Download: http://arxiv.org/abs/2405.09556v1

Multimodal Fusion on Low-quality Data: A Comprehensive Survey

Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, and it has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored, especially under low-quality data settings. This paper surveys the common challenges and recent advances of multimodal fusion in the wild and presents them in a comprehensive taxonomy. From a data-centric view, we identify four main challenges faced by multimodal fusion on low-quality data, namely (1) noisy multimodal data contaminated with heterogeneous noise, (2) incomplete multimodal data in which some modalities are missing, (3) imbalanced multimodal data in which the qualities or properties of different modalities differ significantly, and (4) quality-varying multimodal data in which the quality of each modality changes dynamically across samples. This new taxonomy will enable researchers to understand the state of the field and identify several potential directions. We also discuss the open problems in this field together with interesting future research directions.

Updated: 2024-04-27 07:22:28

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.18947v1

Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction

Relation extraction (RE) aims to identify relations between entities mentioned in texts. Although large language models (LLMs) have demonstrated impressive in-context learning (ICL) abilities in various tasks, they still perform poorly compared to most supervised fine-tuned RE methods. Utilizing ICL for RE with LLMs encounters two challenges: (1) retrieving good demonstrations from training examples, and (2) enabling LLMs to exhibit strong ICL abilities in RE. On the one hand, retrieving good demonstrations is a non-trivial process in RE and easily results in low relevance with respect to entities and relations. On the other hand, ICL with an LLM performs poorly in RE because RE differs from language modeling in nature or because the LLM is not large enough. In this work, we propose a novel recall-retrieve-reason RE framework that synergizes LLMs with retrieval corpora (training examples) to enable relevant retrieval and reliable in-context reasoning. Specifically, we distill consistent ontological knowledge from training datasets to let LLMs generate relevant entity pairs, grounded in the retrieval corpora, as valid queries. These entity pairs are then used to retrieve relevant training examples from the retrieval corpora as demonstrations, enabling LLMs to conduct better ICL via instruction tuning. Extensive experiments on different LLMs and RE datasets demonstrate that our method generates relevant and valid entity pairs and boosts the ICL abilities of LLMs, achieving competitive or new state-of-the-art performance on sentence-level RE compared to previous supervised fine-tuning methods and ICL-based methods.
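The retrieve step can be caricatured as ranking training examples by overlap with a generated entity-pair query. The data, field names, and scoring rule below are our own toy choices, not the paper's retriever:

```python
# Toy sketch of demonstration retrieval for in-context RE: rank training
# examples by how many entities they share with the query pair.

train_examples = [
    {"text": "Marie Curie was born in Warsaw.", "pair": ("Marie Curie", "Warsaw")},
    {"text": "Apple was founded by Steve Jobs.", "pair": ("Steve Jobs", "Apple")},
    {"text": "Einstein worked in Bern.", "pair": ("Einstein", "Bern")},
]

def retrieve(query_pair, examples, k=1):
    """Return the k examples whose entity pairs best overlap the query."""
    def score(ex):
        return len(set(query_pair) & set(ex["pair"]))
    return sorted(examples, key=score, reverse=True)[:k]

demos = retrieve(("Steve Jobs", "Apple"), train_examples)
print(demos[0]["text"])  # the Steve Jobs / Apple example ranks first
```

The retrieved examples would then be concatenated into the prompt as demonstrations for the reasoning step.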

Updated: 2024-04-27 07:12:52

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.17809v1

Towards Measuring and Modeling "Culture" in LLMs: A Survey

We present a survey of 39 recent papers that aim to study cultural representation and inclusion in large language models. We observe that none of the studies define "culture," which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of "culture." We call these aspects the proxies of cultures, and organize them across three dimensions of demographic, semantic and linguistic-cultural interaction proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of "culture," such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness and situatedness of the current methods. Based on these observations, we provide several recommendations for a holistic and practically useful research agenda for furthering cultural inclusion in LLMs and LLM-based applications.

Updated: 2024-04-27 07:08:24

Categories: cs.CY,cs.AI,cs.CL

Download: http://arxiv.org/abs/2403.15412v3

HierCas: Hierarchical Temporal Graph Attention Networks for Popularity Prediction in Information Cascades

Information cascade popularity prediction is critical for many applications, including but not limited to identifying fake news and accurate recommendations. Traditional feature-based methods heavily rely on handcrafted features, which are domain-specific and lack generalizability to new domains. To address this problem, researchers have turned to neural network-based approaches. However, most existing methods follow a sampling-based modeling approach, potentially losing continuous dynamic information that emerges during the information diffusion process. In this paper, we propose Hierarchical Temporal Graph Attention Networks for cascade popularity prediction (HierCas), which operates on the entire cascade graph by a dynamic graph modeling approach. By leveraging time-aware node embedding, graph attention mechanisms, and hierarchical pooling structures, HierCas effectively captures the popularity trend implicit in the complex cascade. Extensive experiments conducted on two real-world datasets in different scenarios demonstrate that our HierCas significantly outperforms the state-of-the-art approaches. We have released our code at https://github.com/Daisy-zzz/HierCas.

Updated: 2024-04-27 07:07:20

Categories: cs.SI,cs.AI

Download: http://arxiv.org/abs/2310.13219v2

Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors

Relation extraction (RE) is an important task that aims to identify the relationships between entities in texts. While large language models (LLMs) have revealed remarkable in-context learning (ICL) capability for general zero- and few-shot learning, recent studies indicate that current LLMs still struggle with zero- and few-shot RE. Previous studies are mainly dedicated to designing prompt formats and selecting good examples to improve ICL-based RE. Although both factors are vital for ICL, fundamentally boosting the ICL capability of LLMs in RE would significantly improve zero- and few-shot RE performance via ICL. To this end, we introduce \textsc{Micre} (\textbf{M}eta \textbf{I}n-\textbf{C}ontext learning of LLMs for \textbf{R}elation \textbf{E}xtraction), a new meta-training framework for zero- and few-shot RE in which an LLM is tuned to do ICL on a diverse collection of RE datasets (i.e., learning to learn in context for RE). Through meta-training, the model becomes more effective at learning a new RE task in context by conditioning on a few training examples, with no parameter updates or task-specific templates at inference time, enabling better zero- and few-shot task generalization. We evaluate \textsc{Micre} on various LLMs of different model scales and 12 public RE datasets, and then test it on unseen RE benchmarks under zero- and few-shot settings. \textsc{Micre} delivers comparable or superior performance compared to a range of baselines, including supervised fine-tuning and typical in-context learning methods. We find that the gains are particularly significant for larger model scales, and that using a diverse set of meta-training RE datasets is key to the improvements. Empirically, we show that \textsc{Micre} can transfer relation semantic knowledge via relation label names during inference on target RE datasets.

Updated: 2024-04-27 07:06:39

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.17807v1

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Contrastive language-audio pretraining~(CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introduce T-CLAP, a temporal-enhanced CLAP model. We use Large Language Models~(LLMs) and mixed-up strategies to generate temporal-contrastive captions for audio clips from extensive audio-text datasets. Subsequently, a new temporal-focused contrastive loss is designed to fine-tune the CLAP model by incorporating these synthetic data. We conduct comprehensive experiments and analysis in multiple downstream tasks. T-CLAP shows improved capability in capturing the temporal relationship of sound events and outperforms state-of-the-art models by a significant margin.

Updated: 2024-04-27 07:05:48

Categories: cs.SD,cs.LG,eess.AS

Download: http://arxiv.org/abs/2404.17806v1

From Optimization to Generalization: Fair Federated Learning against Quality Shift via Inter-Client Sharpness Matching

Due to escalating privacy concerns, federated learning has been recognized as a vital approach for training deep neural networks with decentralized medical data. In practice, it is challenging to ensure consistent imaging quality across various institutions, often attributed to equipment malfunctions affecting a minority of clients. This imbalance in image quality can cause the federated model to develop an inherent bias towards higher-quality images, thus posing a severe fairness issue. In this study, we pioneer the identification and formulation of this new fairness challenge within the context of the imaging quality shift. Traditional methods for promoting fairness in federated learning predominantly focus on balancing empirical risks across diverse client distributions. This strategy primarily facilitates fair optimization across different training data distributions, yet neglects the crucial aspect of generalization. To address this, we introduce a solution termed Federated learning with Inter-client Sharpness Matching (FedISM). FedISM enhances both local training and global aggregation by incorporating sharpness-awareness, aiming to harmonize the sharpness levels across clients for fair generalization. Our empirical evaluations, conducted using the widely-used ICH and ISIC 2019 datasets, establish FedISM's superiority over current state-of-the-art federated learning methods in promoting fairness. Code is available at https://github.com/wnn2000/FFL4MIA.
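FedISM builds on sharpness-aware training. The core sharpness-aware update — perturb the weights toward the locally worst-case direction, then descend using the gradient taken there — can be sketched on a one-dimensional toy loss (a generic SAM-style step of our own construction, not the authors' FedISM algorithm or its inter-client matching):

```python
# One-dimensional sketch of a sharpness-aware minimization (SAM) step:
# perturb the weight toward higher loss, take the gradient there, and
# apply it at the original weight. Toy quadratic loss, not FedISM's objective.

def loss(w):
    return (w - 1.0) ** 2

def grad(w):
    return 2.0 * (w - 1.0)

def sam_step(w, lr=0.1, rho=0.05):
    g = grad(w)
    # Worst-case perturbation within radius rho (the sign of the gradient in 1-D).
    e = rho * (1.0 if g >= 0 else -1.0)
    return w - lr * grad(w + e)

w = 3.0
for _ in range(100):
    w = sam_step(w)
print(w)  # settles near the minimum at w = 1
```

FedISM's contribution is to measure such sharpness per client and harmonize it across clients during aggregation; the step above only shows the sharpness-aware ingredient.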

Updated: 2024-04-27 07:05:41

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.17805v1

Empirical Analysis of Dialogue Relation Extraction with Large Language Models

Dialogue relation extraction (DRE) aims to extract relations between two arguments within a dialogue, which is more challenging than standard RE due to the higher frequency of personal pronouns and lower information density in dialogues. However, existing DRE methods still suffer from two serious issues: (1) they are hard-pressed to capture long and sparse multi-turn information, and (2) they struggle to extract golden relations from partial dialogues; this motivates us to discover more effective methods that can alleviate these issues. We notice that the rise of large language models (LLMs) has sparked considerable interest in evaluating their performance across diverse tasks. To this end, we first investigate the capabilities of different LLMs in DRE, considering both proprietary and open-source models. Interestingly, we discover that LLMs significantly alleviate both issues in existing DRE methods. Generally, we have the following findings: (1) scaling up model size substantially boosts overall DRE performance and achieves exceptional results, tackling the difficulty of capturing long and sparse multi-turn information; (2) LLMs suffer a much smaller performance drop from the entire-dialogue setting to the partial-dialogue setting than existing methods; (3) LLMs deliver competitive or superior performance under both full-shot and few-shot settings compared to the current state of the art; (4) LLMs show modest performance on inverse relations but much stronger improvements on general relations, and they can handle dialogues of various lengths, especially longer sequences.

Updated: 2024-04-27 06:55:41

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.17802v1

Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning Approaches

Combustion instability in gas turbines and rocket engines, one of the most challenging problems in combustion research, arises from the complex interactions among flames, which are also influenced by chemical reactions, heat and mass transfer, and acoustics. Identifying and understanding combustion instability is essential to ensure the safe and reliable operation of many combustion systems, and exploring and classifying the dynamical behaviors of complex flame systems is a core task. To facilitate fundamental studies, the present work concerns dynamical mode recognition of coupled flame oscillators made of flickering buoyant diffusion flames, which have gained increasing attention in recent years but are not yet sufficiently understood. The time series data of the flame oscillators are generated by fully validated reacting flow simulations. Due to the limitations of expertise-based models, a data-driven approach is adopted. In this study, a nonlinear dimensionality reduction model based on a variational autoencoder (VAE) is used to project the simulation data onto a 2-dimensional latent space. Based on the phase trajectories in latent space, supervised and unsupervised classifiers are proposed for datasets with and without known labels, respectively. For labeled datasets, we establish the Wasserstein-distance-based classifier (WDC) for mode recognition; for unlabeled datasets, we develop a novel unsupervised classifier (GMM-DTWC) combining dynamic time warping (DTW) and a Gaussian mixture model (GMM). Compared with conventional approaches to dimensionality reduction and classification, the proposed supervised and unsupervised VAE-based approaches exhibit prominent performance in distinguishing dynamical modes, implying their potential extension to dynamical mode recognition in complex combustion problems.
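For labeled data, a Wasserstein-distance-based classifier amounts to assigning a new trajectory to the reference mode whose point distribution it is closest to. A minimal one-dimensional sketch (hand-made samples and a naive empirical Wasserstein-1 distance, not the paper's 2-D VAE latent trajectories):

```python
# Minimal sketch of Wasserstein-distance-based mode recognition: a trajectory
# is assigned to the reference mode whose sample distribution is nearest in
# empirical 1-D Wasserstein-1 distance. Toy samples, not VAE latent data.

def w1_distance(xs, ys):
    """Empirical 1-D Wasserstein-1 distance for equal-size samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def classify(trajectory, reference_modes):
    """Return the label of the reference mode closest to the trajectory."""
    return min(reference_modes,
               key=lambda m: w1_distance(trajectory, reference_modes[m]))

# Hypothetical reference modes: samples clustered near 0 vs. near 5.
modes = {
    "mode_A": [0.1, -0.2, 0.05, 0.3],
    "mode_B": [5.1, 4.9, 5.2, 4.8],
}
label = classify([4.7, 5.3, 5.0, 4.95], modes)
print(label)  # the trajectory's samples sit near 5, matching mode_B
```

In the paper the comparison is between phase trajectories in a 2-D latent space, which requires a multidimensional Wasserstein distance; the 1-D version above only conveys the matching principle.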

Updated: 2024-04-27 06:44:39

Categories: cs.LG

Download: http://arxiv.org/abs/2404.17801v1

Personalized Federated Learning via Sequential Layer Expansion in Representation Learning

Federated learning ensures the privacy of clients by conducting distributed training on individual client devices and sharing only the model weights with a central server. However, in real-world scenarios, the heterogeneity of data among clients necessitates appropriate personalization methods. In this paper, we aim to address this heterogeneity using a form of parameter decoupling known as representation learning. Representation learning divides deep learning models into 'base' and 'head' components. The base component, capturing common features across all clients, is shared with the server, while the head component, capturing unique features specific to individual clients, remains local. We propose a new representation learning-based approach that suggests decoupling the entire deep learning model into more densely divided parts with the application of suitable scheduling methods, which can benefit not only data heterogeneity but also class heterogeneity. In this paper, we compare and analyze two layer scheduling approaches, namely forward (\textit{Vanilla}) and backward (\textit{Anti}), in the context of data and class heterogeneity among clients. Our experimental results show that the proposed algorithm, when compared to existing personalized federated learning algorithms, achieves increased accuracy, especially under challenging conditions, while reducing computation costs.
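The base/head decoupling can be sketched with plain lists of layer weights: only the shared base layers are averaged on the server, while each client's head stays local. This is a schematic of representation-learning-style personalization in general, not the paper's sequential layer-expansion scheduling:

```python
# Sketch of base/head parameter decoupling in personalized federated
# learning: the shared "base" layers are averaged across clients, while
# each client keeps its personalized "head" layers local.

def federated_round(clients, n_base):
    """Average the first n_base layers across clients; heads stay local."""
    n_clients = len(clients)
    avg_base = [sum(c[i] for c in clients) / n_clients for i in range(n_base)]
    return [avg_base + c[n_base:] for c in clients]

# Hypothetical 3-layer models (scalars standing in for weight tensors):
# layers 0-1 form the base, layer 2 is the personalized head.
client_models = [[1.0, 2.0, 10.0], [3.0, 4.0, 20.0]]
updated = federated_round(client_models, n_base=2)
print(updated)  # bases averaged to [2.0, 3.0]; heads 10.0 and 20.0 kept local
```

The paper's forward (Vanilla) and backward (Anti) schedules would vary which layers count as "base" over the course of training; here the split is fixed for clarity.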

Updated: 2024-04-27 06:37:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.17799v1

GPT for Games: A Scoping Review (2020-2023)

This paper introduces a scoping review of 55 articles to explore GPT's potential for games, offering researchers a comprehensive understanding of the current applications and identifying both emerging trends and unexplored areas. We identify five key applications of GPT in current game research: procedural content generation, mixed-initiative game design, mixed-initiative gameplay, playing games, and game user research. Drawing from insights in each of these application areas, we propose directions for future research in each one. This review aims to lay the groundwork by illustrating the state of the art for innovative GPT applications in games, promising to enrich game development and enhance player experiences with cutting-edge AI innovations.

Updated: 2024-04-27 06:26:18

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2404.17794v1

Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data

Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is crucial. This paper investigates the capacity of LLMs to make inferences about health based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of 12 state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets (PMData, LifeSnaps, GLOBEM and AW_FB). Our experiments cover 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. Our fine-tuned model, HealthAlpaca exhibits comparable performance to much larger models (GPT-3.5, GPT-4 and Gemini-Pro), achieving the best performance in 8 out of 10 tasks. Ablation studies highlight the effectiveness of context enhancement strategies. Notably, we observe that our context enhancement can yield up to 23.8% improvement in performance. While constructing contextually rich prompts (combining user context, health knowledge and temporal information) exhibits synergistic improvement, the inclusion of health knowledge context in prompts significantly enhances overall performance.
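The contextually rich prompts the ablations point to — combining user context, health knowledge, and temporal physiological data — can be sketched as simple template assembly. The field names and wording below are our own illustration, not the paper's actual prompt format:

```python
# Sketch of a context-enhanced health prompt built from the three
# ingredient types the paper combines: user context, health knowledge,
# and temporal physiological data. Wording is illustrative only.

def build_prompt(user_context, health_knowledge, temporal_data, question):
    return (
        f"User context: {user_context}\n"
        f"Health knowledge: {health_knowledge}\n"
        f"Recent data: {temporal_data}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    user_context="34-year-old female, moderately active",
    health_knowledge="A typical adult resting heart rate is 60-100 bpm.",
    temporal_data="resting heart rate over last 7 days: 58, 57, 59, 60, 58, 57, 59",
    question="Is this resting heart rate within a healthy range?",
)
print(prompt)
```

The abstract's finding is that including the health-knowledge line yields the largest gains, with the three ingredients combining synergistically.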

Updated: 2024-04-27 06:20:26

Categories: cs.CL,cs.AI,cs.LG

Download: http://arxiv.org/abs/2401.06866v2

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

Cross-lingual continual pre-training of large language models (LLMs) initially trained on English corpus allows us to leverage the vast amount of English language resources and reduce the pre-training cost. In this study, we constructed Swallow, an LLM with enhanced Japanese capability, by extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a large Japanese web corpus. Experimental results confirmed that the performance on Japanese tasks drastically improved through continual pre-training, and the performance monotonically increased with the amount of training data up to 100B tokens. Consequently, Swallow achieved superior performance compared to other LLMs that were trained from scratch in English and Japanese. An analysis of the effects of continual pre-training revealed that it was particularly effective for Japanese question answering tasks. Furthermore, to elucidate effective methodologies for cross-lingual continual pre-training from English to Japanese, we investigated the impact of vocabulary expansion and the effectiveness of incorporating parallel corpora. The results showed that the efficiency gained through vocabulary expansion had no negative impact on performance, except for the summarization task, and that the combined use of parallel corpora enhanced translation ability.

Updated: 2024-04-27 06:07:55

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.17790v1

BiLO: Bilevel Local Operator Learning for PDE inverse problems

We propose a new neural network based method for solving inverse problems for partial differential equations (PDEs) by formulating the PDE inverse problem as a bilevel optimization problem. At the upper level, we minimize the data loss with respect to the PDE parameters. At the lower level, we train a neural network to locally approximate the PDE solution operator in the neighborhood of a given set of PDE parameters, which enables an accurate approximation of the descent direction for the upper level optimization problem. The lower level loss function includes the L2 norms of both the residual and its derivative with respect to the PDE parameters. We apply gradient descent simultaneously on both the upper and lower level optimization problems, leading to an effective and fast algorithm. The method, which we refer to as BiLO (Bilevel Local Operator learning), is also able to efficiently infer unknown functions in the PDEs through the introduction of an auxiliary variable. We demonstrate that our method enforces strong PDE constraints, is robust to sparse and noisy data, and eliminates the need to balance the residual and the data loss, which is inherent to soft PDE constraints.
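The simultaneous upper/lower gradient descent can be caricatured on a scalar toy problem: the lower variable tracks the "solution operator" at the current parameter, while the upper parameter descends the data loss evaluated through it. This is a schematic of the bilevel structure only, with a trivial stand-in for the PDE, not BiLO's network-based local operator:

```python
# Scalar caricature of the bilevel structure: the lower-level variable u
# tracks the solution of a trivial "PDE" u = theta, and the upper level
# fits theta so that u matches an observation d. Illustrative only.

d = 2.5            # observed data
theta, u = 0.0, 0.0
lr = 0.1

for _ in range(500):
    # Lower level: descend the residual loss (u - theta)^2, so u locally
    # approximates the solution operator at the current theta.
    u -= lr * 2.0 * (u - theta)
    # Upper level: descend the data loss (u - d)^2 with respect to theta,
    # using u as the local surrogate for the PDE solution.
    theta -= lr * 2.0 * (u - d)

print(theta, u)  # both approach the observation d = 2.5
```

In BiLO the lower level is a neural network trained on the residual and its parameter derivative, but the interleaved two-level descent has the same shape as this loop.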

Updated: 2024-04-27 06:06:41

Categories: cs.LG,cs.NA,math.NA,math.OC

Download: http://arxiv.org/abs/2404.17789v1

Quantum resistant multi-signature scheme with optimal communication round: A Blockchain-based approach

Blockchain is a decentralized network designed to increase the trust, integrity, and transparency of transactions. With the exponential growth of transactions in Blockchain systems, especially Bitcoin, Blockchain size increases because all transactions must be stored and verified. In Bitcoin, validating an M-of-N transaction requires M authentic signatures out of N possible signers. This procedure is time-consuming and needs significant storage capacity. To address these issues, several multi-signature schemes have been proposed, enabling users to interactively generate a common signature on a single message. Recently, some lattice-based multi-signature schemes have been presented to counter the threat of quantum computers. However, none of them offers all the desirable features of multi-signature schemes, such as an aggregate public key, a low number of communication rounds, and resistance to quantum computers. In this paper, we present a new lattice-based multi-signature scheme, called Razhims, that has an aggregate public key, requires only a single round of communication, and is resistant to quantum computers. In Razhims, the aggregate public key size and the final signature size equal the public key size and signature size of a standard signature, respectively, independent of the number of signers.

Updated: 2024-04-27 06:05:44

Categories: cs.CR,cs.IT,math.IT

Download: http://arxiv.org/abs/2404.17787v1

Intrinsic Voltage Offsets in Memcapacitive Bio-Membranes Enable High-Performance Physical Reservoir Computing

Reservoir computing is a brain-inspired machine learning framework for processing temporal data by mapping inputs into high-dimensional spaces. Physical reservoir computers (PRCs) leverage native fading memory and nonlinearity in physical substrates, including atomic switches, photonics, volatile memristors, and, recently, memcapacitors, to achieve efficient high-dimensional mapping. Traditional PRCs often consist of homogeneous device arrays, which rely on input encoding methods and large stochastic device-to-device variations for increased nonlinearity and high-dimensional mapping. These approaches incur high pre-processing costs and restrict real-time deployment. Here, we introduce a novel heterogeneous memcapacitor-based PRC that exploits internal voltage offsets to enable both monotonic and non-monotonic input-state correlations crucial for efficient high-dimensional transformations. We demonstrate our approach's efficacy by predicting a second-order nonlinear dynamical system with an extremely low prediction error (0.00018). Additionally, we predict a chaotic H\'enon map, achieving a low normalized root mean square error (0.080). Unlike previous PRCs, such errors are achieved without input encoding methods, underscoring the power of distinct input-state correlations. Most importantly, we generalize our approach to other neuromorphic devices that lack inherent voltage offsets using externally applied offsets to realize various input-state correlations. Our approach and unprecedented performance are a major milestone towards high-performance full in-materia PRCs.
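The reservoir-computing recipe — a fixed nonlinear dynamical system with fading memory, plus a trained linear readout — can be sketched in software with a small echo state network. This illustrates reservoir computing in general, not the memcapacitive hardware reservoir in the paper; the reservoir sizes, scalings, and task are our own toy choices:

```python
# Minimal echo state network sketch: a fixed random reservoir provides
# fading memory and nonlinearity; only a linear readout is trained (here
# with plain stochastic gradient descent). Toy task: next-step prediction
# of a sine wave.
import math
import random

random.seed(0)
N = 30                                    # reservoir size
W_in = [random.uniform(-0.5, 0.5) for _ in range(N)]
W = [[random.uniform(-0.1, 0.1) for _ in range(N)] for _ in range(N)]

def step(state, u):
    """One reservoir update: tanh of input drive plus recurrent feedback."""
    return [
        math.tanh(W_in[i] * u + sum(W[i][j] * state[j] for j in range(N)))
        for i in range(N)
    ]

# Drive the reservoir with a sine wave and collect its states.
inputs = [math.sin(0.2 * t) for t in range(300)]
state, states = [0.0] * N, []
for u in inputs:
    state = step(state, u)
    states.append(state)

# Train a linear readout by SGD to predict the next input value.
w_out = [0.0] * N
lr = 0.05
for _ in range(50):
    for s, target in zip(states[:-1], inputs[1:]):
        err = sum(wi * si for wi, si in zip(w_out, s)) - target
        w_out = [wi - lr * err * si for wi, si in zip(w_out, s)]

# Mean squared error of the trained readout on the training sequence.
mse = sum(
    (sum(wi * si for wi, si in zip(w_out, s)) - t) ** 2
    for s, t in zip(states[:-1], inputs[1:])
) / (len(states) - 1)
print(mse)
```

A physical reservoir computer replaces the simulated `step` with the native dynamics of a device (here, memcapacitive bio-membranes), so that only the readout needs training.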

Updated: 2024-04-27 05:47:38

Fields: cs.ET,cs.AI,cs.LG

Download: http://arxiv.org/abs/2405.09545v1
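
The reservoir-computing pipeline described above (drive a fixed nonlinear dynamical system with the input, then train only a linear readout on its states) can be imitated in software. The sketch below is a toy echo-state network predicting the Hénon map one step ahead; it is a hypothetical software analogue with assumed hyperparameters, not the paper's physical memcapacitor reservoir.

```python
import numpy as np

rng = np.random.default_rng(0)

def henon_map(n, a=1.4, b=0.3):
    """Generate n points of the chaotic Henon map x-coordinate."""
    x, y = 0.1, 0.1
    out = np.empty(n)
    for i in range(n):
        x, y = 1 - a * x * x + b * y, x
        out[i] = x
    return out

def run_reservoir(u, size=100, rho=0.9, leak=0.5):
    """Drive a fixed random recurrent reservoir with input u; return states."""
    W_in = rng.uniform(-1, 1, size)
    bias = rng.uniform(-0.5, 0.5, size)       # breaks the odd symmetry of tanh
    W = rng.normal(0, 1, (size, size))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius
    states = np.zeros((len(u), size))
    x = np.zeros(size)
    for t, ut in enumerate(u):
        x = (1 - leak) * x + leak * np.tanh(W_in * ut + W @ x + bias)
        states[t] = x
    return states

series = henon_map(1200)
u, target = series[:-1], series[1:]           # one-step-ahead prediction task
S = np.column_stack([np.ones(len(u)), run_reservoir(u)])  # readout intercept
S_tr, y_tr = S[200:1000], target[200:1000]    # discard washout, then train
S_te, y_te = S[1000:], target[1000:]
w = np.linalg.lstsq(S_tr, y_tr, rcond=None)[0]  # only the readout is trained
nrmse = np.sqrt(np.mean((S_te @ w - y_te) ** 2)) / np.std(y_te)
```

Only `w` is trained; the random reservoir stays fixed, which is what makes physical substrates viable as reservoirs in the first place.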

GMValuator: Similarity-based Data Valuation for Generative Models

Data valuation plays a crucial role in machine learning. Existing data valuation methods have primarily focused on discriminative models, neglecting generative models that have recently gained considerable attention. The few existing data valuation methods designed for deep generative models either concentrate on specific models or lack robustness in their outcomes; moreover, their efficiency remains a notable shortcoming. To bridge these gaps, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first training-free and model-agnostic approach to provide data valuation for generation tasks. It enables efficient data valuation through an innovative similarity-matching module, calibrates biased contributions by incorporating image quality assessment, and attributes credit to all training samples based on their contributions to the generated samples. Additionally, we introduce four evaluation criteria for assessing data valuation methods in generative models, aligned with the principles of plausibility and truthfulness. GMValuator is extensively evaluated on various datasets and generative architectures to demonstrate its effectiveness.

Updated: 2024-04-27 05:45:34

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2304.10701v7
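
The similarity-matching view of data valuation can be sketched as follows: credit each training sample according to how close it lies to the generated samples in feature space. This is a hypothetical simplification (the actual GMValuator also calibrates credits with image quality assessment, omitted here):

```python
import numpy as np

def value_training_data(train_feats, gen_feats, k=3):
    """Credit each training sample by how often (and how closely) it appears
    among the k nearest neighbours of a generated sample; returns weights
    normalized to sum to one."""
    credits = np.zeros(len(train_feats))
    for g in gen_feats:
        d = np.linalg.norm(train_feats - g, axis=1)
        for j in np.argsort(d)[:k]:
            credits[j] += 1.0 / (1.0 + d[j])   # closer neighbours earn more
    return credits / credits.sum()

# Toy demo: the generator only reproduces the first cluster of training data,
# so nearly all the credit should flow to the first ten training samples.
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0, 1, (10, 8)), rng.normal(5, 1, (10, 8))])
gen = rng.normal(0, 1, (30, 8))
values = value_training_data(train, gen)
```

Being training-free, the whole valuation reduces to nearest-neighbour queries over precomputed features, which is where the efficiency claim comes from.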

Explainable machine learning to enable high-throughput electrical conductivity optimization and discovery of doped conjugated polymers

The combination of high-throughput experimentation techniques and machine learning (ML) has recently ushered in a new era of accelerated material discovery, enabling the identification of materials with cutting-edge properties. However, the measurement of certain physical quantities remains challenging to automate. Specifically, meticulous process control, experimentation, and laborious measurements are required to achieve optimal electrical conductivity in doped polymer materials. We propose an ML approach, which relies on readily measured absorbance spectra, to accelerate the workflow associated with measuring electrical conductivity. The classification model accurately classifies samples with a conductivity > 25 to 100 S/cm, achieving a maximum accuracy of 100%. For the subset of highly conductive samples, we employed a regression model to predict their conductivities, yielding an impressive test R2 value of 0.984. We tested the models with samples of the two highest conductivities (498 and 506 S/cm) and showed that they were able to correctly classify and predict the two extrapolative conductivities at satisfactory levels of error. The proposed ML-assisted workflow improves the efficiency of the conductivity measurements by 89% of the maximum achievable using our experimental techniques. Furthermore, our approach addresses the common challenge of the lack of explainability in ML models by exploiting bespoke mathematical properties of the descriptors and the ML model, allowing us to gain corroborated insights into the spectral influences on conductivity. Through this study, we offer an accelerated pathway for optimizing the properties of doped polymer materials while showcasing the valuable insights that can be derived from purposeful utilization of ML in experimental science.

Updated: 2024-04-27 05:13:59

Fields: physics.app-ph,cond-mat.mtrl-sci,cs.LG

Download: http://arxiv.org/abs/2308.04103v2
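
A minimal two-stage sketch of the classify-then-regress workflow above, using synthetic data with a single stand-in spectral feature (the paper uses full absorbance spectra and tuned ML models, so everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
polaron_peak = rng.uniform(0, 1, n)     # stand-in for a spectral descriptor
conductivity = 10 + 500 * polaron_peak ** 2 + rng.normal(0, 5, n)

# Stage 1: flag "highly conductive" samples (> 100 S/cm in this toy setup)
# from the spectral feature alone, using the lowest feature value observed
# among the truly-high samples as a decision threshold.
is_high = conductivity > 100
threshold = polaron_peak[is_high].min()
pred_high = polaron_peak >= threshold   # flags every truly-high sample

# Stage 2: regress conductivity for the flagged subset (quadratic features)
X = np.column_stack([np.ones(pred_high.sum()), polaron_peak[pred_high],
                     polaron_peak[pred_high] ** 2])
coef, *_ = np.linalg.lstsq(X, conductivity[pred_high], rcond=None)
resid = conductivity[pred_high] - X @ coef
ss_tot = np.sum((conductivity[pred_high] - conductivity[pred_high].mean()) ** 2)
r2 = 1 - np.sum(resid ** 2) / ss_tot
```

The two-stage split mirrors the abstract: a cheap classifier screens candidates, and the regressor only has to be accurate on the high-conductivity subset.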

Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning

In recent years, multi-agent reinforcement learning algorithms have made significant advancements in diverse gaming environments, leading to increased interest in the broader application of such techniques. To address the prevalent challenge of partial observability, communication-based algorithms have improved cooperative performance through the sharing of numerical embedding between agents. However, the understanding of the formation of collaborative mechanisms is still very limited, making designing a human-understandable communication mechanism a valuable problem to address. In this paper, we propose a novel multi-agent reinforcement learning algorithm that embeds large language models into agents, endowing them with the ability to generate human-understandable verbal communication. The entire framework has a message module and an action module. The message module is responsible for generating and sending verbal messages to other agents, effectively enhancing information sharing among agents. To further enhance the message module, we employ a teacher model to generate message labels from the global view and update the student model through Supervised Fine-Tuning (SFT). The action module receives messages from other agents and selects actions based on current local observations and received messages. Experiments conducted on the Overcooked game demonstrate our method significantly enhances the learning efficiency and performance of existing methods, while also providing an interpretable tool for humans to understand the process of multi-agent cooperation.

Updated: 2024-04-27 05:10:33

Fields: cs.MA,cs.AI

Download: http://arxiv.org/abs/2404.17780v1

Primal Dual Alternating Proximal Gradient Algorithms for Nonsmooth Nonconvex Minimax Problems with Coupled Linear Constraints

Nonconvex minimax problems have attracted wide attention in machine learning, signal processing, and many other fields in recent years. In this paper, we propose a primal-dual alternating proximal gradient (PDAPG) algorithm for solving nonsmooth nonconvex-(strongly) concave minimax problems with coupled linear constraints. The iteration complexity of the algorithm is proven to be $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under the nonconvex-strongly concave (resp. nonconvex-concave) setting to reach an $\varepsilon$-stationary point. To our knowledge, this is the first algorithm with iteration complexity guarantees for solving nonconvex minimax problems with coupled linear constraints.

Updated: 2024-04-27 05:00:33

Fields: math.OC,cs.LG,stat.ML

Download: http://arxiv.org/abs/2212.04672v4
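
For reference, the coupled linearly constrained minimax problem class can be written as follows (the notation, and the inequality form of the constraint, are assumed for illustration rather than copied from the paper):

```latex
\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} \; f(x, y)
\qquad \text{s.t.} \quad A x + B y \le c ,
```

where $f(\cdot, y)$ is nonsmooth and nonconvex, $f(x, \cdot)$ is (strongly) concave, and the linear constraint couples $x$ and $y$; an $\varepsilon$-stationary point is one at which a suitable stationarity measure for this constrained problem is at most $\varepsilon$.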

MRScore: Evaluating Radiology Report Generation with LLM-based Reward System

In recent years, automated radiology report generation has experienced significant growth. This paper introduces MRScore, an automatic evaluation metric tailored for radiology report generation by leveraging Large Language Models (LLMs). Conventional NLG (natural language generation) metrics like BLEU are inadequate for accurately assessing the generated radiology reports, as systematically demonstrated by our observations within this paper. To address this challenge, we collaborated with radiologists to develop a framework that guides LLMs for radiology report evaluation, ensuring alignment with human analysis. Our framework includes two key components: i) utilizing GPT to generate large amounts of training data, i.e., reports with different qualities, and ii) pairing GPT-generated reports as accepted and rejected samples and training LLMs to produce MRScore as the model reward. Our experiments demonstrate MRScore's higher correlation with human judgments and superior performance in model selection compared to traditional metrics. Our code and datasets will be available on GitHub.

Updated: 2024-04-27 04:42:45

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.17778v1

Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study

The integration of Artificial Intelligence (AI) in healthcare presents transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making. Embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address the triage problem in emergency departments has been examined, while few studies have explored its application in outpatient departments. With a focus on streamlining workflows and enhancing efficiency for outpatient triage, this study specifically aims to evaluate the consistency of responses provided by ChatGPT in outpatient guidance, including both within-version response analysis and between-version comparisons. For within-version analysis, the results indicate that the internal response consistency of ChatGPT-4.0 is significantly higher than that of ChatGPT-3.5 (p=0.03), and both have moderate consistency (71.2% for 4.0 and 59.6% for 3.5) in their top recommendation. However, the between-version consistency is relatively low (mean consistency score=1.43/3, median=1), indicating that few recommendations match between the two versions. Also, only 50% of top recommendations match perfectly in the comparisons. Interestingly, ChatGPT-3.5 responses are more likely to be complete than those from ChatGPT-4.0 (p=0.02), suggesting possible differences in information processing and response generation between the two versions. The findings offer insights into AI-assisted outpatient operations, while also facilitating the exploration of the potential and limitations of LLMs in healthcare utilization. Future research may focus on carefully optimizing LLMs and AI integration in healthcare systems based on ergonomic and human factors principles, precisely aligning with the specific needs of effective outpatient triage.

Updated: 2024-04-27 04:12:02

Fields: cs.CL,cs.AI,cs.HC

Download: http://arxiv.org/abs/2405.00728v1
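
The within-version consistency figures above suggest a simple score: the share of repeated runs that agree with the modal top recommendation. The definition below is illustrative, not the study's exact rubric:

```python
from collections import Counter

def top_consistency(responses):
    """Share of repeated runs agreeing with the modal top recommendation.
    responses: list of top-recommendation strings from repeated queries."""
    counts = Counter(responses)
    return counts.most_common(1)[0][1] / len(responses)
```

For example, three identical recommendations out of four repeated queries would score 0.75 under this definition.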

Compressing Latent Space via Least Volume

This paper introduces Least Volume-a simple yet effective regularization inspired by geometric intuition-that can reduce the necessary number of latent dimensions needed by an autoencoder without requiring any prior knowledge of the intrinsic dimensionality of the dataset. We show that the Lipschitz continuity of the decoder is the key to making it work, provide a proof that PCA is just a linear special case of it, and reveal that it has a similar PCA-like importance ordering effect when applied to nonlinear models. We demonstrate the intuition behind the regularization on some pedagogical toy problems, and its effectiveness on several benchmark problems, including MNIST, CIFAR-10 and CelebA.

Updated: 2024-04-27 04:09:49

Fields: cs.LG,cs.CV

Download: http://arxiv.org/abs/2404.17773v1
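
One way to read "least volume" is as a penalty that shrinks the geometric mean of the per-dimension latent standard deviations, so unneeded dimensions collapse toward constants. The sketch below shows only that assumed penalty term; the paper couples it with a Lipschitz-constrained decoder, which is omitted here:

```python
import numpy as np

def least_volume_penalty(z, eps=1e-2):
    """z: (batch, d) latent codes. Small when most latent dimensions have
    collapsed to near-constant values, i.e. the occupied volume is small."""
    std = z.std(axis=0)
    return float(np.exp(np.mean(np.log(std + eps))))  # stable geometric mean

rng = np.random.default_rng(0)
spread = rng.normal(0, 1, (256, 8))   # codes spread across all 8 dimensions
flat = spread.copy()
flat[:, 2:] *= 1e-3                   # codes with only 2 informative dims
```

The geometric mean (rather than a sum) is what gives the PCA-like ordering effect: driving any single dimension's spread toward zero shrinks the whole product.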

Make the Most of Your Data: Changing the Training Data Distribution to Improve In-distribution Generalization Performance

Can we modify the training data distribution to encourage the underlying optimization method toward finding solutions with superior generalization performance on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we prove that SAM learns easy and difficult features more uniformly, particularly in early epochs. That is, SAM is less susceptible to simplicity bias compared to GD. Based on this observation, we propose USEFUL, an algorithm that clusters examples based on the network output early in training and upsamples examples with no easy features to alleviate the pitfalls of the simplicity bias. We show empirically that modifying the training data distribution in this way effectively improves the generalization performance on the original data distribution when training with (S)GD by mimicking the training dynamics of SAM. Notably, we demonstrate that our method can be combined with SAM and existing data augmentation strategies to achieve, to the best of our knowledge, state-of-the-art performance for training ResNet18 on CIFAR10, STL10, CINIC10, Tiny-ImageNet; ResNet34 on CIFAR100; and VGG19 and DenseNet121 on CIFAR10.

Updated: 2024-04-27 03:30:50

Fields: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2404.17768v1
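
The core recipe of USEFUL (cluster examples by early-training model output, then upsample the cluster lacking easy features) can be sketched with a tiny one-dimensional stand-in, where a scalar confidence score plays the role of the network output:

```python
import numpy as np

def upsample_hard_examples(confidences, factor=2):
    """Cluster examples by an early-training confidence score (tiny 1-D
    k-means with k=2) and repeat the low-confidence cluster `factor` times.
    Returns an index array over the original dataset."""
    c = np.asarray(confidences, dtype=float)
    lo, hi = c.min(), c.max()                   # k-means init at the extremes
    for _ in range(20):
        assign = np.abs(c - lo) < np.abs(c - hi)   # True -> "hard" cluster
        lo, hi = c[assign].mean(), c[~assign].mean()
    hard_idx = np.flatnonzero(assign)
    easy_idx = np.flatnonzero(~assign)
    return np.concatenate([easy_idx, np.tile(hard_idx, factor)])
```

In the paper the clustering runs on actual network outputs early in training; here a scalar confidence is a hypothetical stand-in to show the resampling mechanics.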

Counterfactual Explanations of Black-box Machine Learning Models using Causal Discovery with Applications to Credit Rating

Explainable artificial intelligence (XAI) has helped elucidate the internal mechanisms of machine learning algorithms, bolstering their reliability by demonstrating the basis of their predictions. Several XAI models consider causal relationships to explain models by examining the input-output relationships of prediction models and the dependencies between features. The majority of these models have based their explanations on counterfactual probabilities, assuming that the causal graph is known. However, this assumption complicates the application of such models to real data, given that the causal relationships between features are unknown in most cases. Thus, this study proposes a novel XAI framework that relaxes the constraint that the causal graph be known. This framework leverages counterfactual probabilities and additional prior information on causal structure, facilitating the integration of a causal graph estimated through causal discovery methods with a black-box classification model. Furthermore, explanatory scores are estimated based on counterfactual probabilities. Numerical experiments conducted on artificial data confirm that the explanatory score can be estimated more accurately than in the absence of a causal graph. Finally, as an application to real data, we construct a classification model of credit ratings assigned by Shiga Bank, Shiga prefecture, Japan, and demonstrate the effectiveness of the proposed method in cases where the causal graph is unknown.

Updated: 2024-04-27 03:11:26

Fields: cs.LG

Download: http://arxiv.org/abs/2402.02678v2

PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models

Clinical trial matching is the task of identifying trials for which patients may be potentially eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. This process is manual, time-intensive, and challenging to scale up, resulting in many patients missing out on potential therapeutic options. Recent advancements in Large Language Models (LLMs) have made automating patient-trial matching possible, as shown in multiple concurrent research studies. However, the current approaches are confined to constrained, often synthetic datasets that do not adequately mirror the complexities encountered in real-world medical data. In this study, we present the first, end-to-end large-scale empirical evaluation of clinical trial matching using real-world EHRs. Our study showcases the capability of LLMs to accurately match patients with appropriate clinical trials. We perform experiments with proprietary LLMs, including GPT-4 and GPT-3.5, as well as our custom fine-tuned model called OncoLLM and show that OncoLLM, despite its significantly smaller size, not only outperforms GPT-3.5 but also matches the performance of qualified medical doctors. All experiments were carried out on real-world EHRs that include clinical notes and available clinical trials from a single cancer center in the United States.

Updated: 2024-04-27 03:10:21

Fields: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.15549v2

Implementation of Big AI Models for Wireless Networks with Collaborative Edge Computing

Big Artificial Intelligence (AI) models have emerged as a crucial element in various intelligent applications at the edge, such as voice assistants in smart homes and autonomous robotics in smart factories. Training big AI models, e.g., for personalized fine-tuning and continual model refinement, poses significant challenges to edge devices due to the inherent conflict between limited computing resources and intensive workload associated with training. Despite the constraints of on-device training, traditional approaches usually resort to aggregating training data and sending it to a remote cloud for centralized training. Nevertheless, this approach is neither sustainable, which strains long-range backhaul transmission and energy-consuming datacenters, nor safely private, which shares users' raw data with remote infrastructures. To address these challenges, we alternatively observe that prevalent edge environments usually contain a diverse collection of trusted edge devices with untapped idle resources, which can be leveraged for edge training acceleration. Motivated by this, in this article, we propose collaborative edge training, a novel training mechanism that orchestrates a group of trusted edge devices as a resource pool for expedited, sustainable big AI model training at the edge. As an initial step, we present a comprehensive framework for building collaborative edge training systems and analyze in-depth its merits and sustainable scheduling choices following its workflow. To further investigate the impact of its parallelism design, we empirically study a case of four typical parallelisms from the perspective of energy demand with realistic testbeds. Finally, we discuss open challenges for sustainable collaborative edge training to point to future directions of edge-centric big AI model training.

Updated: 2024-04-27 03:09:39

Fields: cs.LG,cs.AI,cs.DC,cs.NI

Download: http://arxiv.org/abs/2404.17766v1

Applying Unsupervised Semantic Segmentation to High-Resolution UAV Imagery for Enhanced Road Scene Parsing

There are two challenges presented in parsing road scenes from UAV images: the complexity of processing high-resolution images and the dependency on extensive manual annotations required by traditional supervised deep learning methods to train robust and accurate models. In this paper, a novel unsupervised road parsing framework that leverages advancements in vision language models with fundamental computer vision techniques is introduced to address these critical challenges. Our approach initiates with a vision language model that efficiently processes ultra-high resolution images to rapidly identify road regions of interest. Subsequent application of the vision foundation model, SAM, generates masks for these regions without requiring category information. A self-supervised learning network then processes these masked regions to extract feature representations, which are clustered using an unsupervised algorithm that assigns unique IDs to each feature cluster. The masked regions are combined with the corresponding IDs to generate initial pseudo-labels, which initiate an iterative self-training process for regular semantic segmentation. Remarkably, the proposed method achieves a mean Intersection over Union (mIoU) of 89.96% on the development dataset without any manual annotation, demonstrating extraordinary flexibility by surpassing the limitations of human-defined categories, and autonomously acquiring knowledge of new categories from the dataset itself.

Updated: 2024-04-27 02:38:40

Fields: cs.CV,cs.LG

Download: http://arxiv.org/abs/2402.02985v2

Adversarial Examples: Generation Proposal in the Context of Facial Recognition Systems

In this paper we investigate the vulnerability that facial recognition systems present to adversarial examples by introducing a new methodology from the attacker perspective. The technique is based on the use of the autoencoder latent space, organized with principal component analysis. We intend to analyze the potential to craft adversarial examples suitable for both dodging and impersonation attacks, against state-of-the-art systems. Our initial hypothesis, which was not strongly favoured by the results, stated that it would be possible to separate between the "identity" and "facial expression" features to produce high-quality examples. Despite the findings not supporting it, the results sparked insights into adversarial examples generation and opened new research avenues in the area.

Updated: 2024-04-27 02:35:15

Fields: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2404.17760v1
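
The attack methodology above can be sketched in two steps: organize the autoencoder latent space with PCA, then step a latent code along a principal direction to produce a candidate adversarial example. The encoder and decoder are stand-ins and omitted; only the latent-space machinery is shown:

```python
import numpy as np

def pca_directions(latents):
    """Rows: orthonormal principal directions of a batch of latent codes,
    obtained via SVD of the centered code matrix."""
    centered = latents - latents.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt

def perturb_along_pc(z, directions, component=0, eps=0.5):
    """Step a latent code along one principal component; decoding the
    result would yield the candidate adversarial image."""
    return z + eps * directions[component]
```

The (unconfirmed) hope expressed in the abstract was that some principal directions would carry "facial expression" while others carry "identity", so that a step along the right direction changes one without the other.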

The Common Core Ontologies

The Common Core Ontologies (CCO) are designed as a mid-level ontology suite that extends the Basic Formal Ontology. CCO has since been increasingly adopted by a broad group of users and applications and is proposed as the first standard mid-level ontology. Despite these successes, documentation of the contents and design patterns of the CCO has been comparatively minimal. This paper is a step toward providing enhanced documentation for the mid-level ontology suite through a discussion of the contents of the eleven ontologies that collectively comprise the Common Core Ontology suite.

Updated: 2024-04-27 02:23:02

Fields: cs.AI,cs.DB,cs.LO

Download: http://arxiv.org/abs/2404.17758v1

Middle Architecture Criteria

Mid-level ontologies are used to integrate terminologies and data across disparate domains. There are, however, no clear, defensible criteria for determining whether a given ontology should count as mid-level, because we lack a rigorous characterization of what the middle level of generality is supposed to contain. Attempts to provide such a characterization have failed, we believe, because they have focused on the goal of specifying what is characteristic of those single ontologies that have been advanced as mid-level ontologies. Unfortunately, single ontologies of this sort are generally a mixture of top- and mid-level, and sometimes even of domain-level terms. To gain clarity, we aim to specify the necessary and sufficient conditions for a collection of one or more ontologies to inhabit what we call a mid-level architecture.

Updated: 2024-04-27 02:16:26

Fields: cs.AI,cs.DB,cs.LO

Download: http://arxiv.org/abs/2404.17757v1

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

CLIP showcases exceptional cross-modal matching capabilities due to its training on image-text contrastive learning tasks. However, without specific optimization for unimodal scenarios, its performance in single-modality feature extraction might be suboptimal. Despite this, some studies have directly used CLIP's image encoder for tasks like few-shot classification, introducing a misalignment between its pre-training objectives and feature extraction methods. This inconsistency can diminish the quality of the image's feature representation, adversely affecting CLIP's effectiveness in target tasks. In this paper, we view text features as precise neighbors of image features in CLIP's space and present a novel CrOss-moDal nEighbor Representation (CODER) based on the distance structure between images and their neighbor texts. This feature extraction method aligns better with CLIP's pre-training objectives, thereby fully leveraging CLIP's robust cross-modal capabilities. The key to constructing a high-quality CODER lies in creating a vast amount of high-quality and diverse texts to match with images. We introduce the Auto Text Generator (ATG) to automatically generate the required texts in a data-free and training-free manner. We apply CODER to CLIP's zero-shot and few-shot image classification tasks. Experiment results across various datasets and models confirm CODER's effectiveness. Code is available at: https://github.com/YCaigogogo/CVPR24-CODER.

Updated: 2024-04-27 02:04:36

Fields: cs.CV,cs.AI

Download: http://arxiv.org/abs/2404.17753v1
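
The cross-modal neighbor idea (describe an image by its similarities to a bank of texts rather than by its raw embedding) reduces to a similarity map; the sketch below uses plain vectors in place of CLIP's encoders, so all names and shapes are illustrative:

```python
import numpy as np

def coder_features(img_feats, text_feats):
    """img_feats: (n, d); text_feats: (m, d) -> (n, m) cosine-similarity map.
    Each image is represented by its similarities to the text bank, which
    is exactly the kind of matching CLIP was pre-trained to do well."""
    a = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    b = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return a @ b.T

def zero_shot_classify(img_feats, class_text_feats):
    """Assign each image to the class with the most similar text embedding."""
    return coder_features(img_feats, class_text_feats).argmax(axis=1)
```

With one text per class this collapses to ordinary CLIP zero-shot classification; the representation becomes interesting when the text bank is large and diverse, which is what the ATG component supplies.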

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual prompts. This allows users to intuitively mark images and interact with the model using natural cues like a "red bounding box" or "pointed arrow". Our simple design directly overlays visual markers onto the RGB image, eliminating the need for complex region encodings, yet achieves state-of-the-art performance on region-understanding tasks like Visual7W, PointQA, and Visual Commonsense Reasoning benchmark. Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain. Code, data, and model are publicly available.

Updated: 2024-04-27 01:53:39

Categories: cs.CV,cs.AI,cs.CL,cs.LG

Download: http://arxiv.org/abs/2312.00784v2

Application of a Dense Fusion Attention Network in Fault Diagnosis of Centrifugal Fan

Although deep learning recognition models have been widely used in the condition monitoring of rotating machinery, it remains a challenge to understand the correspondence between the structure and function of a model and the diagnosis process. Therefore, this paper discusses embedding distributed attention modules into dense connections instead of traditional dense cascading operations. This not only decouples the influence of space and channel on adaptively recalibrating fault-feature weights, but also forms a fused attention function. The proposed dense fusion focuses on visualizing the network's diagnosis process, which increases the interpretability of model diagnosis. It also answers how to continuously and effectively integrate different functions to enhance the ability to extract fault features and to resist noise. Centrifugal fan fault data are used to verify this network. Experimental results show that the network has stronger diagnostic performance than other advanced fault diagnosis models.

Updated: 2024-04-27 01:49:46

Categories: cs.LG

Download: http://arxiv.org/abs/2311.07614v2

Generative Diffusion-based Downscaling for Climate

Downscaling, or super-resolution, provides decision-makers with detailed, high-resolution information about the potential risks and impacts of climate change, based on climate model output. Machine learning algorithms are proving themselves to be efficient and accurate approaches to downscaling. Here, we show how a generative, diffusion-based approach to downscaling gives accurate downscaled results. We focus on an idealised setting where we recover ERA5 at $0.25\degree$ resolution from a coarse-grained version at $2\degree$ resolution. The diffusion-based method provides superior accuracy compared to a standard U-Net, particularly at the fine scales, as highlighted by a spectral decomposition. Additionally, the generative approach provides users with a probability distribution which can be used for risk assessment. This research highlights the potential of diffusion-based downscaling techniques in providing reliable and detailed climate predictions.

Updated: 2024-04-27 01:49:14

Categories: physics.ao-ph,cs.LG

Download: http://arxiv.org/abs/2404.17752v1

UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis

This paper presents our team's participation in the MEDIQA-ClinicalNLP2024 shared task B. We present a novel approach to diagnosing clinical dermatology cases by integrating large multimodal models, specifically leveraging the capabilities of GPT-4V under a retriever and a re-ranker framework. Our investigation reveals that GPT-4V, when used as a retrieval agent, can accurately retrieve the correct skin condition 85% of the time using dermatological images and brief patient histories. Additionally, we empirically show that Naive Chain-of-Thought (CoT) works well for retrieval while Medical Guidelines Grounded CoT is required for accurate dermatological diagnosis. Further, we introduce a Multi-Agent Conversation (MAC) framework and show its superior performance and potential over the best CoT strategy. The experiments suggest that using naive CoT for retrieval and multi-agent conversation for critique-based diagnosis, GPT-4V can lead to an early and accurate diagnosis of dermatological conditions. The implications of this work extend to improving diagnostic workflows, supporting dermatological education, and enhancing patient care by providing a scalable, accessible, and accurate diagnostic tool.

Updated: 2024-04-27 01:39:05

Categories: cs.AI

Download: http://arxiv.org/abs/2404.17749v1

Dynamic Backtracking in GFlowNets: Enhancing Decision Steps with Reward-Dependent Adjustment Mechanisms

Generative Flow Networks (GFlowNets or GFNs) are probabilistic models predicated on Markov flows, and they employ specific amortization algorithms to learn stochastic policies that generate compositional substances including biomolecules, chemical materials, etc. With a strong ability to generate high-performance biochemical molecules, GFNs accelerate the discovery of scientific substances, effectively overcoming the time-consuming, labor-intensive, and costly shortcomings of conventional material discovery methods. However, previous studies rarely focus on accumulating exploratory experience by adjusting generative structures, which leads to disorientation in complex sampling spaces. Efforts to address this issue, such as LS-GFN, are limited to local greedy searches and lack broader global adjustments. This paper introduces a novel variant of GFNs, the Dynamic Backtracking GFN (DB-GFN), which improves the adaptability of decision-making steps through a reward-based dynamic backtracking mechanism. DB-GFN allows backtracking during the network construction process according to the current state's reward value, thereby correcting disadvantageous decisions and exploring alternative pathways during the exploration process. When applied to generative tasks involving biochemical molecules and genetic material sequences, DB-GFN outperforms GFN models such as LS-GFN and GTB, as well as traditional reinforcement learning methods, in sample quality, sample exploration quantity, and training convergence speed. Additionally, owing to its orthogonal nature, DB-GFN shows great potential in future improvements of GFNs, and it can be integrated with other strategies to achieve higher search performance.
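A loose sketch of reward-based backtracking on a toy sequence-building task (the real DB-GFN adjusts a learned GFlowNet sampling policy; `db_sample` and the toy reward below are assumptions for illustration only):

```python
import random

def db_sample(actions, reward, horizon, rng):
    """Greedily grow a trajectory, backtracking whenever a step lowers the
    running reward (an illustrative sketch of reward-dependent backtracking,
    not the trained-policy mechanism of DB-GFN)."""
    traj = []
    while len(traj) < horizon:
        a = rng.choice(actions)
        traj.append(a)
        if reward(traj) < reward(traj[:-1]):  # disadvantageous step: undo it
            traj.pop()
    return traj

# toy task: rebuild a target sequence; reward favors matches, penalizes length
target = [1, 0, 1]
def reward(traj):
    return sum(int(a == t) for a, t in zip(traj, target)) - 0.1 * len(traj)

traj = db_sample([0, 1], reward, horizon=3, rng=random.Random(0))
assert traj == target  # only reward-improving steps survive the backtracking
```

Because every kept step must raise the running reward, wrong actions are popped and alternative pathways get explored, which is the intuition the abstract describes.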

Updated: 2024-04-27 01:37:10

Categories: cs.LG

Download: http://arxiv.org/abs/2404.05576v3

On the Rashomon ratio of infinite hypothesis sets

Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio of the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family and we provide guarantees that such an estimation is close to the true value of the Rashomon ratio.
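The sampling-based estimator mentioned at the end of the abstract can be sketched directly: draw classifiers at random from the family and count the fraction whose empirical loss stays below the cutoff. The one-parameter threshold family and function names below are illustrative, not from the paper.

```python
import numpy as np

def rashomon_ratio_estimate(sample_classifier, loss, theta, n_draws=1000, rng=None):
    """Monte Carlo estimate of the Rashomon ratio: the fraction of classifiers
    drawn from the family with empirical loss at most theta (a sketch of the
    sampling-based estimation the abstract describes)."""
    rng = rng or np.random.default_rng(0)
    hits = sum(loss(sample_classifier(rng)) <= theta for _ in range(n_draws))
    return hits / n_draws

# toy family: 1-D threshold classifiers on linearly separable data
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
def sample_classifier(rng):
    return rng.uniform(-3, 3)                  # a random threshold
def loss(t):
    return np.mean((X > t).astype(int) != y)   # 0-1 empirical loss

ratio = rashomon_ratio_estimate(sample_classifier, loss, theta=0.0)
# thresholds in [-1, 1) achieve zero loss, so the ratio is near 1/3
```

The paper's contribution includes guarantees that such an estimate is close to the true ratio; this sketch only shows the mechanics.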

Updated: 2024-04-27 01:34:51

Categories: cs.LG,math.PR,stat.ML,68T07, 68T10

Download: http://arxiv.org/abs/2404.17746v1

Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.
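The flavor of a "relatively unsophisticated attack" can be shown with one-step FGSM against a plain logistic model; this is a minimal stand-in, the paper attacks full BNN inference methods and uncertainty-aware prediction pipelines.

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One-step fast gradient sign attack on a logistic 'network': perturb the
    input in the direction that increases the cross-entropy loss."""
    p = 1 / (1 + np.exp(-(w @ x + b)))  # predicted probability of class 1
    grad_x = (p - y) * w                # d(cross-entropy)/dx in closed form
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0]); b = 0.0
x = np.array([1.0, 0.5]); y = 1        # correctly classified: w @ x + b = 1.5 > 0
x_adv = fgsm(x, y, w, b, eps=1.0)
assert (w @ x + b) > 0 and (w @ x_adv + b) < 0  # the label flips
```

For a BNN one would attack the posterior predictive mean instead of a single model's logit, but the gradient-sign step is the same idea.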

Updated: 2024-04-27 01:34:46

Categories: cs.LG,cs.AI,cs.CV,stat.ME,stat.ML

Download: http://arxiv.org/abs/2404.19640v1

Block-Diagonal Guided DBSCAN Clustering

Cluster analysis plays a crucial role in database mining, and one of the most widely used algorithms in this field is DBSCAN. However, DBSCAN has several limitations, such as difficulty in handling high-dimensional large-scale data, sensitivity to input parameters, and lack of robustness in producing clustering results. This paper introduces an improved version of DBSCAN that leverages the block-diagonal property of the similarity graph to guide the clustering procedure of DBSCAN. The key idea is to construct a graph that measures the similarity between high-dimensional large-scale data points and has the potential to be transformed into a block-diagonal form through an unknown permutation, followed by a cluster-ordering procedure to generate the desired permutation. The clustering structure can be easily determined by identifying the diagonal blocks in the permuted graph. We propose a gradient descent-based method to solve the proposed problem. Additionally, we develop a DBSCAN-based points traversal algorithm that identifies clusters with high densities in the graph and generates an augmented ordering of clusters. The block-diagonal structure of the graph is then achieved through permutation based on the traversal order, providing a flexible foundation for both automatic and interactive cluster analysis. We introduce a split-and-refine algorithm to automatically search for all diagonal blocks in the permuted graph with theoretically optimal guarantees under specific cases. We extensively evaluate our proposed approach on twelve challenging real-world benchmark clustering datasets and demonstrate its superior performance compared to the state-of-the-art clustering method on every dataset.
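The core intuition, that the right permutation makes the similarity matrix block-diagonal, can be demonstrated on a toy matrix. The paper learns the permutation via gradient descent and a DBSCAN-style traversal; the sketch below simply reorders by given labels to expose the blocks.

```python
import numpy as np

def permute_to_blocks(S, labels):
    """Reorder a similarity matrix so points with the same cluster label are
    contiguous, exposing its (approximate) block-diagonal structure
    (illustrative; the paper's permutation is learned, not given)."""
    order = np.argsort(labels, kind="stable")
    return S[np.ix_(order, order)], order

# toy similarity: two interleaved clusters {0, 2} and {1, 3}
S = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
labels = np.array([0, 1, 0, 1])
P, order = permute_to_blocks(S, labels)
assert np.allclose(P, np.block([[np.ones((2, 2)), np.zeros((2, 2))],
                                [np.zeros((2, 2)), np.ones((2, 2))]]))
```

Once the matrix is in this form, reading off the diagonal blocks yields the clustering, which is what the split-and-refine step automates.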

Updated: 2024-04-27 01:34:41

Categories: cs.LG,cs.AI,cs.DS

Download: http://arxiv.org/abs/2404.01341v2

An Attention-Based Deep Learning Architecture for Real-Time Monocular Visual Odometry: Applications to GPS-free Drone Navigation

Drones are increasingly used in fields like industry, medicine, research, disaster relief, defense, and security. Technical challenges, such as navigation in GPS-denied environments, hinder further adoption. Research in visual odometry is advancing, potentially solving GPS-free navigation issues. Traditional visual odometry methods use geometry-based pipelines which, while popular, often suffer from error accumulation and high computational demands. Recent studies utilizing deep neural networks (DNNs) have shown improved performance, addressing these drawbacks. Deep visual odometry typically employs convolutional neural networks (CNNs) and sequence modeling networks like recurrent neural networks (RNNs) to interpret scenes and deduce visual odometry from video sequences. This paper presents a novel real-time monocular visual odometry model for drones, using a deep neural architecture with a self-attention module. It estimates the ego-motion of a camera on a drone, using consecutive video frames. An inference utility processes the live video feed, employing deep learning to estimate the drone's trajectory. The architecture combines a CNN for image feature extraction and a long short-term memory (LSTM) network with a multi-head attention module for video sequence modeling. Tested on two visual odometry datasets, this model converged 48% faster than a previous RNN model and showed a 22% reduction in mean translational drift and a 12% improvement in mean translational absolute trajectory error, demonstrating enhanced robustness to noise.

Updated: 2024-04-27 01:22:45

Categories: cs.RO,cs.CV,cs.LG,eess.IV

Download: http://arxiv.org/abs/2404.17745v1

Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models

Diffusion probabilistic models (DPMs) have become the state-of-the-art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant research effort to improve image sample quality, there is little work on representation-controlled generation using diffusion models. Specifically, causal modeling and controllable counterfactual generation using DPMs is an underexplored area. In this work, we propose CausalDiffAE, a diffusion-based causal representation learning framework to enable counterfactual generation according to a specified causal model. Our key idea is to use an encoder to extract high-level semantically meaningful causal variables from high-dimensional data and model stochastic variation using reverse diffusion. We propose a causal encoding mechanism that maps high-dimensional data to causally related latent factors and parameterize the causal mechanisms among latent factors using neural networks. To enforce the disentanglement of causal variables, we formulate a variational objective and leverage auxiliary label information in a prior to regularize the latent space. We propose a DDIM-based counterfactual generation procedure subject to do-interventions. Finally, to address the limited label supervision scenario, we also study the application of CausalDiffAE when a part of the training data is unlabeled, which also enables granular control over the strength of interventions in generating counterfactuals during inference. We empirically show that CausalDiffAE learns a disentangled latent space and is capable of generating high-quality counterfactual images.

Updated: 2024-04-27 00:09:26

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2404.17735v1

Building a Large Japanese Web Corpus for Large Language Models

Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This corpus consists of approximately 312.1 billion characters (approximately 173 million pages), which is the largest of all available training corpora for Japanese LLMs, surpassing CC-100 (approximately 25.8 billion characters), mC4 (approximately 239.7 billion characters) and OSCAR 23.10 (approximately 74 billion characters). To confirm the quality of the corpus, we performed continual pre-training on Llama 2 7B, 13B, 70B, Mistral 7B v0.1, and Mixtral 8x7B Instruct as base LLMs and gained consistent (6.6-8.1 points) improvements on Japanese benchmark datasets. We also demonstrate that the improvement on Llama 2 13B brought from the presented corpus was the largest among those from other existing corpora.

Updated: 2024-04-27 00:02:45

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2404.17733v1

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

Large Language Models (LLMs) have dramatically advanced AI applications, yet their deployment remains challenging due to their immense inference costs. Recent studies ameliorate the computational costs of LLMs by increasing their activation sparsity but suffer from significant performance degradation on downstream tasks. In this work, we introduce a new framework for sparsifying the activations of base LLMs and reducing inference costs, dubbed Contextually Aware Thresholding for Sparsity (CATS). CATS is relatively simple, easy to implement, and highly effective. At the heart of our framework is a new non-linear activation function. We demonstrate that CATS can be applied to various base models, including Mistral-7B and Llama2-7B, and outperforms existing sparsification techniques in downstream task performance. More precisely, CATS-based models often achieve downstream task performance within 1-2% of their base models without any fine-tuning and even at activation sparsity levels of 50%. Furthermore, CATS-based models converge faster and display better task performance than competing techniques when fine-tuning is applied. Finally, we develop a custom GPU kernel for efficient implementation of CATS that translates the activation sparsity of CATS into real wall-clock time speedups. Our custom kernel implementation of CATS results in a ~15% improvement in wall-clock inference latency of token generation on both Llama-7B and Mistral-7B.
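A minimal sketch of magnitude thresholding at a target sparsity level, illustrative only: CATS calibrates a contextual cutoff from the model's activation statistics and pairs the resulting activation function with a custom GPU kernel, neither of which is reproduced here.

```python
import numpy as np

def thresholded_activation(x, sparsity=0.5):
    """Zero out activations whose magnitude falls below the sparsity-quantile
    cutoff (a toy stand-in for CATS's thresholded non-linearity; the real
    cutoff is calibrated per model, not per batch)."""
    cutoff = np.quantile(np.abs(x), sparsity)
    return np.where(np.abs(x) >= cutoff, x, 0.0)

x = np.array([0.1, -2.0, 0.05, 1.5, -0.2, 3.0])
y = thresholded_activation(x, sparsity=0.5)
# roughly half the entries are zeroed; the largest magnitudes survive
```

Skipping the zeroed activations' matrix-multiply work is what the kernel exploits to turn sparsity into wall-clock speedup.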

Updated: 2024-04-27 00:01:02

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2404.08763v2

By Xinhai (Sean) Zou.