Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots
We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with permutations of the order of play. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning, a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P in coordinating air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines and computes the socially optimal equilibrium.
Updated: 2024-04-28 23:35:05
Categories: cs.RO,cs.AI,cs.SY,eess.SY,math.OC
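A minimal sketch of the branch-and-bound idea behind B&P, under a strong simplifying assumption: the social cost decomposes into nonnegative pairwise "commit-before" costs (the COST table below is hypothetical), so the cost already locked in by a partial order is a valid lower bound. In the paper, bounds would instead come from Stackelberg subgames solved via sequential trajectory planning.

```python
import itertools

# Hypothetical pairwise costs: COST[(a, b)] is the social cost incurred
# when agent a commits to its trajectory before agent b. In B&P proper,
# evaluating an order means solving a Stackelberg subgame; a lookup
# table stands in here.
COST = {
    (0, 1): 3.0, (1, 0): 1.0,
    (0, 2): 2.0, (2, 0): 4.0,
    (1, 2): 0.5, (2, 1): 5.0,
}

def social_cost(order):
    # Total cost of a complete order of play (sum over ordered pairs).
    return sum(COST[(a, b)] for a, b in itertools.combinations(order, 2))

def branch_and_play(agents):
    """Branch over partial orders of play; prune when the cost locked in
    by a prefix cannot beat the incumbent (valid because all pairwise
    costs are nonnegative)."""
    best_cost, best_order = float("inf"), None
    stack = [()]
    while stack:
        prefix = stack.pop()
        bound = sum(COST[(a, b)] for a, b in itertools.combinations(prefix, 2))
        if bound >= best_cost:
            continue                          # prune this branch
        if len(prefix) == len(agents):
            best_cost, best_order = bound, prefix
            continue
        for a in agents:
            if a not in prefix:
                stack.append(prefix + (a,))
    return best_order, best_cost
```

With subgame-based bounds in place of the table, the same search skeleton applies.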
Public-private funding models in open source software development: A case study on scikit-learn
Governments are increasingly funding open source software (OSS) development to support software security, digital sovereignty, and national competitiveness in science and innovation, among other goals. However, little is known about how OSS developers evaluate the relative benefits and drawbacks of emergent governmental funding for OSS. This paper explores this question through a case study of scikit-learn, a popular Python library for machine learning, which has been funded by public research grants, commercial sponsorship, micro-donations, and a 32 million euro grant announced in France's artificial intelligence strategy. Through 25 interviews with scikit-learn's maintainers and funders, this study makes two key contributions to research and practice. First, it contributes novel empirical findings on the effective design and implementation of a public-private funding model in an OSS project, as well as on how the maintainers of scikit-learn have designed and employed governance protocols to balance the diverse interests of their funders and to safeguard their community ethos. Second, it offers practical lessons on funding in community-led OSS projects and makes recommendations to practitioners. The paper concludes with a discussion of the key recommendations.
Updated: 2024-04-28 22:26:22
Categories: cs.SE,cs.AI,cs.CY,cs.LG,K.4.1
RTA-Former: Reverse Transformer Attention for Polyp Segmentation
Polyp segmentation is a key aspect of colorectal cancer prevention, enabling early detection and guiding subsequent treatments. Intelligent diagnostic tools, including deep learning solutions, are widely explored to streamline and potentially automate this process. However, even with many powerful network architectures, producing accurate edge segmentation remains a problem. In this paper, we introduce a novel network, RTA-Former, that employs a transformer model as the encoder backbone and innovatively adapts Reverse Attention (RA) with a transformer stage in the decoder for enhanced edge segmentation. Experimental results show that RTA-Former achieves state-of-the-art (SOTA) performance on five polyp segmentation datasets. The strong capability of RTA-Former holds promise for improving the accuracy of transformer-based polyp segmentation, potentially leading to better clinical decisions and patient outcomes. Our code is publicly available on GitHub.
Updated: 2024-04-28 22:21:56
Categories: eess.IV,cs.CV,cs.LG
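The abstract does not spell out the Reverse Attention mechanism; a common formulation (assumed here) weights decoder features by one minus the sigmoid of a coarse prediction, pushing later stages toward uncertain boundary pixels:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(features, coarse_logits):
    """Weight decoder features by (1 - sigmoid(coarse prediction)).

    Regions the coarse map is already confident about are suppressed,
    steering the next decoding stage toward uncertain edge pixels.
    features:      (C, H, W) feature map
    coarse_logits: (H, W) logits of the coarse segmentation
    """
    weight = 1.0 - sigmoid(coarse_logits)   # high where the map is unsure
    return features * weight[None, :, :]    # broadcast over channels
```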
Multi-stage Attack Detection and Prediction Using Graph Neural Networks: An IoT Feasibility Study
With the ever-increasing reliance on digital networks for various aspects of modern life, ensuring their security has become a critical challenge. Intrusion Detection Systems play a crucial role in ensuring network security, actively identifying and mitigating malicious behaviours. However, the relentless advancement of cyber-threats has rendered traditional/classical approaches insufficient in addressing the sophistication and complexity of attacks. This paper proposes a novel 3-stage intrusion detection system, inspired by a simplified version of the Lockheed Martin cyber kill chain, to detect advanced multi-step attacks. The proposed approach consists of three models, each responsible for detecting a group of attacks with common characteristics. The detection outcome of the first two stages is used to conduct a feasibility study on the possibility of predicting attacks in the third stage. Using the ToN IoT dataset, we achieved an average F1-score of 94% across the different stages, outperforming benchmark approaches based on a Random Forest model. Finally, we comment on the feasibility of integrating this approach into a real-world system and outline directions for future work.
Updated: 2024-04-28 22:11:24
Categories: cs.CR,cs.AI
A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation
In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the limited size and potential bias present in real-world datasets. This study presents and explores a generative augmentation framework for social network advertising data. Our framework evaluates three generative models for data augmentation - Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs) - to enrich data availability and diversity in the context of social network advertising analytics. By synthetically extending the feature space, we find that data augmentation quantitatively improves the performance of various classifiers. Furthermore, we compare the relative performance gains brought by each data augmentation technique, providing insights for practitioners to select appropriate techniques to enhance model performance. This paper contributes to the literature by showing that synthetic data augmentation alleviates the limitations imposed by small or imbalanced datasets in the field of social network advertising, and it offers a comparative perspective on the practicality of the different augmentation methods, thereby guiding practitioners in choosing appropriate techniques.
Updated: 2024-04-28 22:00:53
Categories: cs.SI,cs.AI
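As a rough illustration of density-based augmentation, here is a per-class single-component Gaussian sampler, a deliberately simplified stand-in for the framework's GMM branch (the GAN and VAE branches require trained networks and are not sketched):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_augment(X, y, n_new_per_class):
    """Fit one Gaussian per class and draw synthetic rows from it.

    A one-component stand-in for the GMM augmentation idea: estimate a
    density per class, then sample extra labelled rows from it.
    """
    X_parts, y_parts = [X], [y]
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularize
        X_parts.append(rng.multivariate_normal(mean, cov, size=n_new_per_class))
        y_parts.append(np.full(n_new_per_class, c))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

The augmented arrays can then be fed to any downstream classifier exactly like the original data.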
SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
While Deep Reinforcement Learning (DRL) has emerged as a promising solution for intricate control tasks, the lack of explainability of the learned policies impedes its uptake in safety-critical applications, such as automated driving systems (ADS). Counterfactual (CF) explanations have recently gained prominence for their ability to interpret black-box Deep Learning (DL) models. CF examples involve minimal changes to the input that lead the DL model to a contrasting output. Finding such alterations, particularly for high-dimensional visual inputs, poses significant challenges. Moreover, the temporal dependency introduced by the reliance of the DRL agent's action on a history of past state observations further complicates the generation of CF examples. To address these challenges, we propose using a saliency map to identify the most influential input pixels across the sequence of states the agent has observed. We then feed this map to a deep generative model, enabling the generation of plausible CFs with constrained modifications centred on the salient regions. We evaluate the effectiveness of our framework in diverse domains, including ADS and the Atari Pong, Pacman and Space Invaders games, using traditional performance metrics such as validity, proximity and sparsity. Experimental results demonstrate that this framework generates more informative and plausible CFs than the state of the art across a wide range of environments and DRL agents. In order to foster research in this area, we have made our datasets and code publicly available at https://github.com/Amir-Samadi/SAFE-RL.
Updated: 2024-04-28 21:47:34
Categories: cs.LG,cs.AI
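The saliency-constrained editing step can be sketched as follows; `candidate` stands in for the output of the deep generative model (not reproduced here), and the top-fraction threshold is an assumed design knob:

```python
import numpy as np

def saliency_masked_edit(x, candidate, saliency, top_frac=0.1):
    """Restrict a counterfactual edit to the most salient pixels.

    x, candidate, saliency: arrays of shape (H, W). Only the top
    `top_frac` of pixels by saliency may change, keeping the CF
    modification centred on the regions that drove the agent's action.
    """
    k = max(1, int(top_frac * saliency.size))
    thresh = np.partition(saliency.ravel(), -k)[-k]   # k-th largest value
    mask = saliency >= thresh
    return np.where(mask, candidate, x), mask
```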
Aligned Diffusion Schrödinger Bridges
Diffusion Schrödinger bridges (DSB) have recently emerged as a powerful framework for recovering stochastic dynamics via their marginal observations at different time points. Despite numerous successful applications, existing algorithms for solving DSBs have so far failed to utilize the structure of aligned data, which naturally arises in many biological phenomena. In this paper, we propose a novel algorithmic framework that, for the first time, solves DSBs while respecting the data alignment. Our approach hinges on a combination of two decades-old ideas: the classical Schrödinger bridge theory and Doob's $h$-transform. Compared to prior methods, our approach leads to a simpler training procedure with lower variance, which we further augment with principled regularization schemes. This ultimately leads to sizeable improvements across experiments on synthetic and real data, including the tasks of predicting conformational changes in proteins and temporal evolution of cellular differentiation processes.
Updated: 2024-04-28 21:25:49
Categories: cs.LG,q-bio.QM
Federated Multilinear Principal Component Analysis with Applications in Prognostics
Multilinear Principal Component Analysis (MPCA) is a widely utilized method for the dimension reduction of tensor data. However, the integration of MPCA into federated learning remains unexplored in existing research. To tackle this gap, this article proposes a Federated Multilinear Principal Component Analysis (FMPCA) method, which enables multiple users to collaboratively reduce the dimension of their tensor data while keeping each user's data local and confidential. The proposed FMPCA method is guaranteed to have the same performance as traditional MPCA. An application of the proposed FMPCA in industrial prognostics is also demonstrated. Simulated data and a real-world data set are used to validate the performance of the proposed method.
Updated: 2024-04-28 21:21:45
Categories: cs.LG,eess.IV,stat.ML
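One way FMPCA-style privacy can work (a single-pass sketch, not the paper's full iterative algorithm): each mode-wise scatter matrix is a sum over samples, so users can share only their locally aggregated scatter matrices, never raw tensors, and the server recovers exactly the centralized eigenbasis:

```python
import numpy as np

def mode_unfold(X, mode):
    # Unfold a tensor sample (e.g. shape (I0, I1, I2)) along `mode`.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def local_scatter(samples, mode):
    # Each user shares only this aggregated matrix, not the tensors.
    return sum(mode_unfold(X, mode) @ mode_unfold(X, mode).T for X in samples)

def federated_mode_basis(users_samples, mode, k):
    # Server side: summing local scatters equals the centralized scatter,
    # so the eigenbasis matches (one pass of) traditional MPCA exactly.
    S = sum(local_scatter(s, mode) for s in users_samples)
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, ::-1][:, :k]   # top-k eigenvectors
```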
Position paper: Do not explain (vision models) without context
Does the stethoscope in the picture make the adjacent person a doctor or a patient? This, of course, depends on the contextual relationship of the two objects. If this is obvious, why do explanation methods for vision models not use contextual information? In this paper, we (1) review the most popular methods of explaining computer vision models, pointing out that they do not take context information into account, (2) provide examples of real-world use cases where spatial context plays a significant role, (3) propose new research directions that may lead to better use of context information in explaining computer vision models, and (4) argue that a change in approach to explanations is needed, from 'where' to 'how'.
Updated: 2024-04-28 20:57:55
Categories: cs.CV,cs.LG
DIRESA, a distance-preserving nonlinear dimension reduction technique based on regularized autoencoders
In meteorology, finding similar weather patterns or analogs in historical datasets can be useful for data assimilation, forecasting, and postprocessing. In climate science, analogs in historical and climate projection data are used for attribution and impact studies. However, most of the time, those large weather and climate datasets are nearline. They must be downloaded, which takes a lot of bandwidth and disk space, before the computationally expensive search can be executed. We propose a dimension reduction technique based on autoencoder (AE) neural networks to compress those datasets and perform the search in an interpretable, compressed latent space. A distance-regularized Siamese twin autoencoder (DIRESA) architecture is designed to preserve distance in latent space while capturing the nonlinearities in the datasets. Using conceptual climate models of different complexities, we show that the latent components thus obtained provide physical insight into the dominant modes of variability in the system. Compressing datasets with DIRESA reduces the online storage and keeps the latent components uncorrelated, while the distance (ordering) preservation and reconstruction fidelity robustly outperform Principal Component Analysis (PCA) and other dimension reduction techniques such as UMAP or variational autoencoders.
Updated: 2024-04-28 20:54:57
Categories: cs.LG,nlin.CD,physics.ao-ph
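The distance-regularization idea can be illustrated with a simple squared-error term between pairwise distances in data space and latent space (one plausible form; the exact DIRESA regularizer may differ):

```python
import numpy as np

def pairwise_dists(Z):
    # Upper-triangular vector of Euclidean distances between all rows.
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(Z), k=1)
    return D[iu]

def distance_loss(X, Z):
    """Penalize mismatch between data-space and latent-space pairwise
    distances; adding this term to an autoencoder's reconstruction loss
    encourages a distance-preserving latent space."""
    dx, dz = pairwise_dists(X), pairwise_dists(Z)
    return float(((dx - dz) ** 2).mean())
```

A distance-preserving encoding (e.g. a rotation) drives this term to zero, while a distorting one does not.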
Trends and Challenges of Real-time Learning in Large Language Models: A Critical Review
Real-time learning concerns the ability of learning systems to acquire knowledge over time, enabling their adaptation and generalization to novel tasks. It is a critical ability for intelligent, real-world systems, especially when data may be insufficient or difficult to obtain. This review provides a comprehensive analysis of real-time learning in Large Language Models. It synthesizes the state-of-the-art real-time learning paradigms, including continual learning, meta-learning, parameter-efficient learning, and mixture-of-experts learning. We demonstrate their utility for real-time learning by describing specific achievements from these related topics and their critical factors. Finally, the paper highlights current problems and challenges for future research in the field. By consolidating the latest relevant research developments, this review offers a comprehensive understanding of real-time learning and its implications for designing and developing LLM-based learning systems addressing real-world problems.
Updated: 2024-04-28 20:44:53
Categories: cs.LG,cs.AI
Near-Term Enforcement of AI Chip Export Controls Using A Minimal Firmware-Based Design for Offline Licensing
Offline licensing is a technical mechanism for compute governance that could be used to prevent unregulated training of potentially dangerous frontier AI models. The mechanism works by disabling AI chips unless they have an up-to-date license from a regulator. In this report, we present a technical design for a minimal version of offline licensing that could be delivered via a firmware update. Existing AI chips could potentially support offline licensing within a year if they have the following (relatively common) hardware security features: firmware verification, firmware rollback protection, and secure non-volatile memory. Public documentation suggests that NVIDIA's H100 AI chip already has these security features. Without additional hardware modifications, the system is susceptible to physical hardware attacks. However, these attacks might require expensive equipment and could be difficult to reliably apply to thousands of AI chips. A firmware-based offline licensing design shares the same legal requirements and license approval mechanism as a hardware-based solution. Implementing a firmware-based solution now could accelerate the eventual deployment of a more secure hardware-based solution in the future. For AI chip manufacturers, implementing this security mechanism might allow chips to be sold to customers that would otherwise be prohibited by export restrictions. For governments, it may be important to be able to prevent unsafe or malicious actors from training frontier AI models in the next few years. Based on this initial analysis, firmware-based offline licensing could partially solve urgent security and trade problems and is technically feasible for AI chips that have common hardware security features.
Updated: 2024-04-28 20:38:20
Categories: cs.CR,cs.CY
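The license-gating logic can be sketched in a few lines. This toy uses a symmetric HMAC so the example is self-contained; the report's design would rely on public-key signatures verified by boot firmware in secure non-volatile memory, and all identifiers below are hypothetical:

```python
import hmac
import hashlib
import struct

# Illustrative shared secret; a real deployment would use asymmetric
# signatures so the chip never holds the regulator's signing key.
REGULATOR_KEY = b"regulator-demo-key"

def issue_license(chip_id: bytes, expiry_unix: int) -> bytes:
    # Regulator side: bind the license to one chip and an expiry time.
    msg = chip_id + struct.pack(">Q", expiry_unix)
    return msg + hmac.new(REGULATOR_KEY, msg, hashlib.sha256).digest()

def chip_allows_boot(chip_id: bytes, license_blob: bytes, now: int) -> bool:
    # Firmware side: refuse to run AI workloads without a fresh license.
    msg, tag = license_blob[:-32], license_blob[-32:]
    expected = hmac.new(REGULATOR_KEY, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False                               # forged or corrupted
    lic_id, expiry = msg[:-8], struct.unpack(">Q", msg[-8:])[0]
    return lic_id == chip_id and now < expiry      # right chip, not expired
```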
Retrieval-Oriented Knowledge for Click-Through Rate Prediction
Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-and-play Retrieval-Oriented Knowledge (ROK) framework. Specifically, a knowledge base, consisting of a retrieval-oriented embedding layer and a knowledge encoder, is designed to preserve and imitate the retrieved & aggregated representations in a decomposition-reconstruction paradigm. Knowledge distillation and contrastive learning methods are utilized to optimize the knowledge base, and the learned retrieval-enhanced representations can be integrated with arbitrary CTR models in both instance-wise and feature-wise manners. Extensive experiments on three large-scale datasets show that ROK achieves competitive performance with the retrieval-based CTR models while reserving superior inference efficiency and model compatibility.
Updated: 2024-04-28 20:21:03
Categories: cs.IR,cs.AI
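The knowledge-distillation component can be illustrated with the generic temperature-scaled KL objective between teacher (retrieval-enhanced) and student outputs; ROK's actual loss operates on representations and also includes a contrastive term not sketched here:

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, t=2.0):
    # Temperature-scaled KL(teacher || student), averaged over the batch:
    # the student learns to imitate the retrieved-and-aggregated signal
    # without paying the retrieval cost at inference time.
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())
```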
Deciphering Heartbeat Signatures: A Vision Transformer Approach to Explainable Atrial Fibrillation Detection from ECG Signals
Remote patient monitoring based on wearable single-lead electrocardiogram (ECG) devices has significant potential for enabling the early detection of heart disease, especially in combination with artificial intelligence (AI) approaches for automated heart disease detection. There have been prior studies applying AI approaches based on deep learning for heart disease detection. However, these models are yet to be widely accepted as a reliable aid for clinical diagnostics, in part due to the current black-box perception surrounding many AI algorithms. In particular, there is a need to identify the key features of the ECG signal that contribute toward making an accurate diagnosis, thereby enhancing the interpretability of the model. In the present study, we develop a vision transformer approach to identify atrial fibrillation based on single-lead ECG data. A residual network (ResNet) approach is also developed for comparison with the vision transformer approach. These models are applied to the Chapman-Shaoxing dataset to classify atrial fibrillation, as well as another common arrhythmia, sinus bradycardia, and normal sinus rhythm heartbeats. The models enable the identification of the key regions of the heartbeat that determine the resulting classification, and highlight the importance of P-waves and T-waves, as well as heartbeat duration and signal amplitude, in distinguishing normal sinus rhythm from atrial fibrillation and sinus bradycardia.
Updated: 2024-04-28 20:05:45
Categories: eess.SP,cs.AI,cs.CV,cs.LG
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
Updated: 2024-04-28 20:05:02
Categories: cs.LG,cs.AI,cs.CL,cs.CV
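The butterfly idea can be sketched directly: compose log2(d) stages of 2x2 rotations whose pairings follow the FFT access pattern, yielding an orthogonal matrix from O(d log d) parameters instead of the O(d^2) of a dense orthogonal matrix. This is an illustrative construction, not BOFT's exact parameterization:

```python
import numpy as np

def butterfly_orthogonal(angles):
    """Build a d x d orthogonal matrix from butterfly-structured rotations.

    angles: shape (log2(d), d // 2). Each stage pairs indices at stride
    2**s (the Cooley-Tukey pattern) and applies an independent 2x2
    rotation to every pair; the product of the stages is orthogonal.
    """
    m, half = angles.shape
    d = 2 * half
    Q = np.eye(d)
    for s in range(m):
        stride = 1 << s
        B = np.zeros((d, d))
        k = 0
        for block in range(0, d, 2 * stride):
            for off in range(stride):
                i, j = block + off, block + off + stride
                c, sn = np.cos(angles[s, k]), np.sin(angles[s, k])
                B[i, i], B[i, j] = c, -sn
                B[j, i], B[j, j] = sn, c
                k += 1
        Q = B @ Q
    return Q
```

For d = 8 this uses 3 * 4 = 12 angles, versus 28 free parameters for a dense 8 x 8 orthogonal matrix.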
Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
This paper investigates the potential of quantum acceleration in addressing infinite horizon Markov Decision Processes (MDPs) to enhance average reward outcomes. We introduce an innovative quantum framework for the agent's engagement with an unknown MDP, extending the conventional interaction paradigm. Our approach involves the design of an optimism-driven tabular Reinforcement Learning algorithm that harnesses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through thorough theoretical analysis, we demonstrate that the quantum advantage in mean estimation leads to exponential advancements in regret guarantees for infinite horizon Reinforcement Learning. Specifically, the proposed Quantum algorithm achieves a regret bound of $\tilde{\mathcal{O}}(1)$, a significant improvement over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound exhibited by classical counterparts.
Updated: 2024-04-28 20:04:52
Categories: cs.LG,cs.AI,quant-ph
Using Deep Q-Learning to Dynamically Toggle between Push/Pull Actions in Computational Trust Mechanisms
Recent work on decentralized computational trust models for open Multi Agent Systems has resulted in the development of CA, a biologically inspired model which focuses on the trustee's perspective. This new model addresses a serious unresolved problem in existing trust and reputation models, namely the inability to handle constantly changing behaviors and agents' continuous entry and exit from the system. In previous work, we compared CA to FIRE, a well-known trust and reputation model, and found that CA is superior when the trustor population changes, whereas FIRE is more resilient to changes in the trustee population. Thus, in this paper, we investigate how the trustors can detect the presence of several dynamic factors in their environment and then decide which trust model to employ in order to maximize utility. We frame this problem as a machine learning problem in a partially observable environment, where the presence of several dynamic factors is not known to the trustor, and we describe how an adaptable trustor can rely on a few measurable features so as to assess the current state of the environment and then use Deep Q-Learning (DQN), in a single-agent Reinforcement Learning setting, to learn how to adapt to a changing environment. We ran a series of simulation experiments to compare the performance of the adaptable trustor with the performance of trustors using only one model (FIRE or CA), and we show that an adaptable agent is indeed capable of learning when to use each model and, thus, perform consistently in dynamic environments.
Updated: 2024-04-28 19:44:56
Categories: cs.AI,cs.LG
Panoptic Segmentation and Labelling of Lumbar Spine Vertebrae using Modified Attention Unet
Segmentation and labeling of vertebrae in MRI images of the spine are critical for the diagnosis of illnesses and abnormalities. These steps are indispensable, as MRI technology provides detailed information about the tissue structure of the spine. Both supervised and unsupervised segmentation methods exist, yet acquiring sufficient data remains challenging for achieving high accuracy. In this study, we propose an enhanced approach based on a modified attention U-Net architecture for panoptic segmentation of 3D sliced MRI data of the lumbar spine. Our method achieves an impressive accuracy of 99.5% by incorporating novel masking logic, thus significantly advancing the state of the art in vertebral segmentation and labeling. This contributes to more precise and reliable diagnosis and treatment planning.
Updated: 2024-04-28 19:35:00
Categories: cs.CV,cs.AI
Polynomial Semantics of Tractable Probabilistic Circuits
Probabilistic circuits compute multilinear polynomials that represent multivariate probability distributions. They are tractable models that support efficient marginal inference. However, various polynomial semantics have been considered in the literature (e.g., network polynomials, likelihood polynomials, generating functions, and Fourier transforms). The relationships between circuit representations of these polynomial encodings of distributions is largely unknown. In this paper, we prove that for distributions over binary variables, each of these probabilistic circuit models is equivalent in the sense that any circuit for one of them can be transformed into a circuit for any of the others with only a polynomial increase in size. They are therefore all tractable for marginal inference on the same class of distributions. Finally, we explore the natural extension of one such polynomial semantics, called probabilistic generating circuits, to categorical random variables, and establish that inference becomes #P-hard.
Updated: 2024-04-28 19:34:38
标题: 可处理概率电路的多项式语义
摘要: 概率电路计算代表多变量概率分布的多线性多项式。它们是可处理的模型,支持高效的边缘推断。然而,在文献中已经考虑了各种多项式语义(例如,网络多项式,似然多项式,生成函数和傅立叶变换)。这些多项式编码分布的电路表示之间的关系在很大程度上是未知的。在本文中,我们证明对于二进制变量的分布,这些概率电路模型中的每一个在某种意义上是等价的,即其中任何一个的电路都可以转换成另一个的电路,其大小仅仅是多项式级别的增加。因此,它们对于相同类别的分布的边缘推断都是可处理的。最后,我们探讨了一种多项式语义的自然扩展,称为概率生成电路,用于分类随机变量,并确定推断变得#P难。
更新时间: 2024-04-28 19:34:38
领域: cs.AI
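The network-polynomial semantics discussed above can be illustrated on a toy two-variable distribution: the polynomial is evaluated at indicator values, and marginal inference amounts to setting both indicators of a marginalized variable to 1. The numbers below are illustrative, not from the paper.

```python
# Joint distribution over two binary variables (X1, X2); hypothetical numbers.
P = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def network_poly(lam):
    """Evaluate the network polynomial at indicator values lam[(var, value)]."""
    return sum(p * lam[(0, x1)] * lam[(1, x2)] for (x1, x2), p in P.items())

# Full evidence X1=1, X2=0: the indicators pick out a single term.
lam = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(network_poly(lam))  # P(X1=1, X2=0) = 0.3

# Marginal P(X1=1): set both indicators of X2 to 1 (marginalizing it out).
lam = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
print(network_poly(lam))  # P(X1=1) = 0.7
```

The same evaluations are tractable when the polynomial is represented compactly as a circuit, which is the setting of the equivalence results above.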
Joint Energy and Latency Optimization in Federated Learning over Cell-Free Massive MIMO Networks
Federated learning (FL) is a distributed learning paradigm wherein users exchange FL models with a server instead of raw datasets, thereby preserving data privacy and reducing communication overhead. However, an increased number of FL users may hinder completing large-scale FL over wireless networks due to the high imposed latency. Cell-free massive multiple-input multiple-output (CFmMIMO) is a promising architecture for implementing FL because it serves many users on the same time/frequency resources. While CFmMIMO enhances energy efficiency through spatial multiplexing and collaborative beamforming, it remains crucial to meticulously allocate uplink transmission powers to the FL users. In this paper, we propose an uplink power allocation scheme for FL over CFmMIMO that considers the effect of each user's power on the energy and latency of the other users, jointly minimizing the users' uplink energy and the latency of FL training. The proposed solution algorithm is based on the coordinate gradient descent method. Numerical results show that, under a limited uplink energy and latency budget for FL over CFmMIMO, our proposed method outperforms the well-known max-sum rate approach by up to 27% and the max-min energy efficiency of the Dinkelbach method by up to 21% in terms of test accuracy.
Updated: 2024-04-28 19:24:58
标题: 在无蜂窝大规模MIMO网络上联合能量和延迟优化的联邦学习
摘要: 联邦学习(FL)是一种分布式学习范式,用户通过与服务器交换FL模型而不是原始数据集,从而保护数据隐私并减少通信开销。然而,增加的FL用户数量可能会由于高延迟而阻碍在无线网络上完成大规模FL。无蜂窝大规模多输入多输出(CFmMIMO)是一种有前途的架构,用于实现FL,因为它可以在相同的时/频资源上为多个用户提供服务。虽然CFmMIMO通过空间复用和协作波束成形提高了能量效率,但仍然至关重要的是要精心分配上行传输功率给FL用户。本文提出了一种在CFmMIMO上的FL上行功率分配方案,考虑了每个用户的功率对其他用户的能量和延迟的影响,共同最小化用户的上行能量和FL训练的延迟。所提出的解决方案算法基于坐标梯度下降法。数值结果表明,我们提出的方法在测试准确率方面优于众所周知的最大和速率,最高提高了27%,并且在FL在CFmMIMO上具有有限的上行能量和延迟预算时,Dinkelbach方法的最大最小能量效率提高了21%。
更新时间: 2024-04-28 19:24:58
领域: cs.LG
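The coordinate gradient descent scheme named in the abstract above can be sketched on a toy objective: one power variable is updated at a time while the others are held fixed. The surrogate cost (a convex "energy plus latency" stand-in) and the step size are illustrative, not the paper's actual system model.

```python
# Minimal coordinate-descent sketch: each term p^2 + 1/(p + 0.5) has its
# minimum at p = 0.5 (solve 2p = 1/(p + 0.5)^2), so every coordinate should
# converge there. The objective is a toy surrogate, not the paper's model.
def objective(p):
    # energy grows with power; a latency-like term shrinks with it
    return sum(pi**2 + 1.0 / (pi + 0.5) for pi in p)

def coordinate_descent(p, lr=0.05, iters=200, eps=1e-6):
    p = list(p)
    for _ in range(iters):
        for i in range(len(p)):  # cycle through one coordinate at a time
            g = (objective(p[:i] + [p[i] + eps] + p[i + 1:]) - objective(p)) / eps
            p[i] = max(0.0, p[i] - lr * g)  # transmit powers stay non-negative
    return p

p_star = coordinate_descent([1.0, 2.0, 0.1])
print([round(pi, 3) for pi in p_star])  # each coordinate near 0.5
```

Each coordinate update only needs the partial derivative with respect to one user's power, which matches the per-user structure of the allocation problem.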
Smishing Dataset I: Phishing SMS Dataset from Smishtank.com
While smishing (SMS phishing) attacks have risen to become one of the most common types of social engineering attacks, relevant smishing datasets remain scarce. One of the biggest challenges in the domain of smishing prevention is the availability of fresh smishing datasets. Additionally, as time passes, smishing campaigns are shut down and the crucial information related to the attacks is lost. With the changing nature of smishing attacks, a consistent flow of new smishing examples is needed by both researchers and engineers to create effective defenses. In this paper, we present a community-sourced smishing dataset from smishtank.com. It provides a wealth of information relevant to combating smishing attacks through the breakdown and analysis of smishing samples at the point of submission. As the contribution of our work, we provide a corpus of 1090 smishing samples that have been publicly submitted through the site. Each message includes information relating to the sender, the message body, and any brands referenced in the message. Additionally, when a URL is found, we provide further information on the domain, VirusTotal results, and a characterization of the URL. Through open access to fresh smishing data, we empower academia and industry to create robust defenses against this evolving threat.
Updated: 2024-04-28 19:12:53
标题: Smishing数据集I:来自Smishtank.com的钓鱼短信数据集
摘要: 尽管短信钓鱼(SMS钓鱼)攻击已经成为最常见的社会工程攻击之一,但相关的短信钓鱼数据集仍然缺乏。在短信钓鱼预防领域面临的最大挑战之一是缺乏新鲜的短信钓鱼数据集。此外,随着时间的推移,短信钓鱼活动被关闭,与攻击相关的关键信息也会丢失。随着短信钓鱼攻击性质的变化,研究人员和工程师需要持续获取新的短信钓鱼样本,以创建有效的防御措施。在本文中,我们介绍了来自smishtank.com的社区源短信钓鱼数据集。该数据集通过对提交时的短信样本进行拆分和分析,提供了大量与打击短信钓鱼攻击相关的信息。在我们的工作贡献中,我们提供了一个包含1090个短信钓鱼样本的语料库,这些样本是通过该网站公开提交的。每条消息包括有关发件人、消息正文和消息中提及的任何品牌的信息。此外,当发现URL时,我们还提供有关域名、VirusTotal结果和URL特征的额外信息。通过开放获取新鲜的短信钓鱼数据,我们赋予学术界和工业界创造针对这种不断演变的威胁的强大防御措施的能力。
更新时间: 2024-04-28 19:12:53
领域: cs.CR
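The per-message fields described above (sender, body, referenced brands, and URL analysis) can be pictured as a simple record. The field names and values below are hypothetical illustrations; consult the released corpus for the actual schema.

```python
# Hypothetical record mirroring the fields the abstract describes for each
# submission. All names and values here are illustrative, not the real schema.
record = {
    "sender": "+1-555-0100",
    "body": "Your package is held. Pay the fee at http://example-delivery.test",
    "brands": ["ExampleDelivery"],
    "url": {
        "domain": "example-delivery.test",
        "virustotal_detections": 7,
        "characterization": "newly registered, non-brand domain",
    },
}

def has_suspicious_url(rec):
    # a message is flagged when a URL is present and any VT engine detected it
    return rec["url"] is not None and rec["url"]["virustotal_detections"] > 0

print(has_suspicious_url(record))  # True
```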
Improve Academic Query Resolution through BERT-based Question Extraction from Images
Providing fast and accurate resolution to student queries is an essential service offered by Edtech organizations. It is generally provided through a chat-bot-like interface that enables students to ask their doubts easily. One preferred format for student queries is images, as it allows students to capture and post questions without typing complex equations and information. However, this format also presents difficulties, as images may contain multiple questions or textual noise that lowers the accuracy of existing single-query answering solutions. In this paper, we propose a method for extracting questions from text or images using a BERT-based deep learning model and compare it to rule-based and layout-based methods. Our method aims to improve the accuracy and efficiency of student query resolution in Edtech organizations.
Updated: 2024-04-28 19:11:08
标题: 通过基于BERT的图像问题提取改进学术查询解决方案
摘要: 提供快速准确地解决学生问题是教育科技机构提供的一种重要解决方案。通常通过类似于聊天机器人的界面来实现,以使学生能够轻松提出疑问。学生查询的首选格式之一是图像,因为它允许学生在不输入复杂方程式和信息的情况下捕捉并发布问题。然而,这种格式也存在困难,因为图像可能包含多个问题或文本噪音,降低了现有单一查询回答解决方案的准确性。在本文中,我们提出了一种使用基于BERT的深度学习模型从文本或图像中提取问题的方法,并将其与其他基于规则和布局的方法进行比较。我们的方法旨在提高教育科技机构中学生问题解决的准确性和效率。
更新时间: 2024-04-28 19:11:08
领域: cs.CL,cs.AI,cs.CV,cs.LG
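The rule-based baseline that the abstract above compares against can be pictured as splitting OCR'd text on question-numbering cues. The rules and the sample page below are hypothetical; they also show the brittleness (stray OCR noise survives) that motivates a learned extractor.

```python
import re

# A sketch of a rule-based question extractor over OCR'd text. The splitting
# rules are illustrative stand-ins, not the baselines evaluated in the paper.
def extract_questions(ocr_text):
    # split on numbering like "Q1.", "1)", "2." at line starts
    chunks = re.split(r"(?m)^\s*(?:Q?\d+[\.\)])\s*", ocr_text)
    return [c.strip() for c in chunks if "?" in c]

page = """1) What is the derivative of x^2?
some stray OCR noise
2) Solve for x: 2x + 3 = 7?"""
print(extract_questions(page))  # note the noise line clinging to question 1
```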
Commercial Anti-Smishing Tools and Their Comparative Effectiveness Against Modern Threats
Smishing, also known as SMS phishing, is a type of fraudulent communication in which an attacker disguises SMS communications to deceive a target into providing their sensitive data. Smishing attacks use a variety of tactics; however, they share the goal of stealing money or personally identifiable information (PII) from a victim. In response to these attacks, a wide variety of anti-smishing tools have been developed to block or filter these communications. Despite this, the number of phishing attacks continues to rise. In this paper, we developed a test bed for measuring the effectiveness of popular anti-smishing tools against fresh smishing attacks. To collect fresh smishing data, we introduce Smishtank.com, a collaborative online resource for reporting and collecting smishing data sets. The SMS messages were validated by a security expert, and an in-depth qualitative analysis was performed on the collected messages to provide further insights. To compare tool effectiveness, we experimented with 20 smishing and benign messages across 3 key segments of the SMS messaging delivery ecosystem. Our results revealed significant room for improvement in all 3 areas against our smishing set. Most anti-phishing apps and bulk messaging services did not filter smishing messages beyond the carrier blocking. The 2 apps that blocked the most smishing messages also blocked 85-100% of benign messages. Finally, while carriers did not block any benign messages, they only reached a 25-35% blocking rate for smishing messages. Our work provides insights into the performance of anti-smishing tools and the roles they play in the message blocking process. This paper will enable the research community and industry to be better informed on the current state of anti-smishing technology on the SMS platform.
Updated: 2024-04-28 19:06:50
标题: 商业反短信欺诈工具及其对现代威胁的比较有效性
摘要: Smishing,也称为短信钓鱼,是一种欺诈性通信,攻击者伪装短信通信,欺骗目标提供其敏感数据。Smishing攻击使用各种策略,但它们有一个相似的目标,即从受害者那里窃取钱财或个人身份信息(PII)。针对这些攻击,已经开发了各种反smishing工具,用于阻止或过滤这些通信。尽管如此,钓鱼攻击数量仍在上升。在本文中,我们开发了一个测试平台,用于测量流行的反smishing工具对新的smishing攻击的有效性。为了收集新的smishing数据,我们引入了Smishtank.com,这是一个协作在线资源,用于报告和收集smishing数据集。短信消息经过安全专家验证,并对收集的消息进行了深入的定性分析,以提供进一步的见解。为了比较工具的有效性,我们在短信消息传递生态系统的3个关键部分中实验了20条smishing和良性消息。我们的结果显示,在我们的smishing集合中,所有3个领域都有显着的改进空间。大多数反钓鱼应用程序和批量消息服务在运营商阻止之外并没有过滤smishing消息。最有效地阻止smish的2个应用程序也阻止了85-100%的良性消息。最后,虽然运营商没有阻止任何良性消息,但他们只能达到25-35%的smishing消息阻止率。我们的工作提供了有关反smishing工具性能及其在消息阻止过程中发挥的作用的见解。本文将使研究界和产业界更好地了解短信平台上反smishing技术的当前状态。
更新时间: 2024-04-28 19:06:50
领域: cs.CR
Bias Neutralization Framework: Measuring Fairness in Large Language Models with Bias Intelligence Quotient (BiQ)
The burgeoning influence of Large Language Models (LLMs) in shaping public discourse and decision-making underscores the imperative to address inherent biases within these AI systems. In the wake of AI's expansive integration across sectors, addressing racial bias in LLMs has never been more critical. This paper introduces a novel framework called the Comprehensive Bias Neutralization Framework (CBNF), which embodies an innovative approach to quantifying and mitigating biases within LLMs. Our framework combines the Large Language Model Bias Index (LLMBI) [Oketunji, A., Anas, M., Saina, D., (2023)] and Bias removaL with No Demographics (BLIND) [Orgad, H., Belinkov, Y. (2023)] methodologies to create a new metric called the Bias Intelligence Quotient (BiQ), which detects, measures, and mitigates racial bias in LLMs without reliance on demographic annotations. By enhancing LLMBI with additional fairness metrics, BiQ gives CBNF a multi-dimensional metric for bias assessment, underscoring the necessity of a nuanced approach to fairness in AI [Mehrabi et al., 2021]. This paper presents a detailed analysis of Latimer AI (a language model incrementally trained on black history and culture) in comparison to ChatGPT 3.5, illustrating Latimer AI's efficacy in detecting racial, cultural, and gender biases through targeted training and refined bias mitigation strategies [Latimer & Bender, 2023].
Updated: 2024-04-28 18:47:14
标题: 偏见中和框架:使用偏见智商(BiQ)在大型语言模型中衡量公平性
摘要: 随着大型语言模型(LLMs)在塑造公共话语和决策制定中的日益增长的影响,迫切需要解决这些人工智能系统内在偏见的问题。在人工智能跨行业广泛整合的背景下,解决LLMs中的种族偏见变得更加关键。本文介绍了一个名为全面偏见中和框架(CBNF)的新框架,它采用创新方法来量化和减少LLMs内部的偏见。我们的框架结合了大型语言模型偏见指数(LLMBI)[Oketunji, A., Anas, M., Saina, D.,(2023年)]和不带人口统计信息的偏见去除(BLIND)[Orgad, H., Belinkov, Y.(2023)]方法,创建了一个称为偏见智商(BiQ)的新指标,该指标可以在LLMs中检测、衡量和减轻种族偏见,无需依赖人口统计注释。 通过引入一个名为BiQ的新指标,该指标通过增加公平度指标来增强LLMBI,CBNF提供了一个多维度度量标准来评估偏见,强调了在人工智能中对公平性采取细致方法的必要性[Mehrabi等人,2021]。本文通过对比Latimer AI(一个在黑人历史和文化上逐步训练的语言模型)和ChatGPT 3.5,详细分析了Latimer AI在通过有针对性的训练和精细化的偏见减轻策略来检测种族、文化和性别偏见方面的有效性[Latimer&Bender,2023]。
更新时间: 2024-04-28 18:47:14
领域: cs.CL,cs.AI,D.1; I.2
Kernel Corrector LSTM
Forecasting methods are affected by data quality issues in two ways: 1. the data are harder to predict, and 2. the issues may negatively affect the model when it is updated with new data. The latter problem is usually addressed by pre-processing the data to remove those issues. An alternative approach has recently been proposed, Corrector LSTM (cLSTM), a Read & Write Machine Learning (RW-ML) algorithm that changes the data while learning in order to improve its predictions. Despite the promising reported results, cLSTM is computationally expensive, as it uses a meta-learner to monitor the hidden states of the LSTM. We propose a new RW-ML algorithm, Kernel Corrector LSTM (KcLSTM), that replaces the meta-learner of cLSTM with a simpler method: kernel smoothing. We empirically evaluate the forecasting accuracy and the training time of the new algorithm and compare it with cLSTM and LSTM. The results indicate that it decreases training time while maintaining competitive forecasting accuracy.
Updated: 2024-04-28 18:44:10
标题: 核心矫正器LSTM
摘要: 预测方法受到数据质量问题的影响,主要体现在两个方面:1. 预测困难,2. 当模型用新数据更新时可能会对模型产生负面影响。后者通常通过对数据进行预处理来解决这些问题。最近提出了一种替代方法,即修正器LSTM(cLSTM),它是一种读取和写入机器学习(RW-ML)算法,通过改变数据来改善预测能力。尽管报道了有希望的结果,但cLSTM在计算方面较为昂贵,因为它使用元学习器来监视LSTM的隐藏状态。我们提出了一种新的RW-ML算法,核心修正器LSTM(KcLSTM),它用核平滑替换了cLSTM的元学习器。我们对新算法的预测准确性和训练时间进行了经验评估,并与cLSTM和LSTM进行了比较。结果表明,KcLSTM能够减少训练时间,同时保持竞争力的预测准确性。
更新时间: 2024-04-28 18:44:10
领域: cs.LG
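The kernel smoothing step that KcLSTM substitutes for cLSTM's meta-learner can be sketched as a Nadaraya-Watson smoother: each point is replaced by a Gaussian-weighted average of its neighbours, damping suspicious values. The toy series and bandwidth below are illustrative, not the paper's configuration.

```python
import math

# Minimal Nadaraya-Watson kernel smoother over a 1-D series (toy example).
def kernel_smooth(ys, bandwidth=1.0):
    """Smooth a series by averaging neighbours with Gaussian weights."""
    smoothed = []
    for t in range(len(ys)):
        w = [math.exp(-((t - s) ** 2) / (2 * bandwidth**2)) for s in range(len(ys))]
        total = sum(w)
        smoothed.append(sum(wi * yi for wi, yi in zip(w, ys)) / total)
    return smoothed

noisy = [0.0, 1.0, 0.0, 1.0, 10.0, 1.0, 0.0, 1.0, 0.0]  # 10.0 is an outlier
print(kernel_smooth(noisy))  # the spike is pulled toward its neighbours
```

The attraction over a meta-learner is exactly what the abstract claims: the correction is a closed-form weighted average with no parameters to train.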
Graph Anomaly Detection in Time Series: A Survey
With the recent advances in technology, a wide range of systems continue to collect a large amount of data over time and thus generate time series. Time-Series Anomaly Detection (TSAD) is an important task in various time-series applications such as e-commerce, cybersecurity, vehicle maintenance, and healthcare monitoring. However, this task is very challenging as it requires considering both the intra-variable dependency and the inter-variable dependency, where a variable can be defined as an observation in time-series data. Recent graph-based approaches have made impressive progress in tackling the challenges of this field. In this survey, we conduct a comprehensive and up-to-date review of TSAD using graphs, referred to as G-TSAD. First, we explore the significant potential of graph representation learning for time-series data. Then, we review state-of-the-art graph anomaly detection techniques in the context of time series and discuss their strengths and drawbacks. Finally, we discuss the technical challenges and potential future directions for possible improvements in this research field.
Updated: 2024-04-28 18:43:33
标题: 时间序列中的图异常检测:一项调查
摘要: 随着技术的不断进步,各种系统继续收集大量数据,并因此生成时间序列。时间序列异常检测(TSAD)是各种时间序列应用中的重要任务,如电子商务、网络安全、车辆维护和医疗监测。然而,这个任务非常具有挑战性,因为它需要考虑变量内部依赖性和变量间依赖性,其中变量可以定义为时间序列数据中的观测值。最近基于图的方法在解决这一领域的挑战方面取得了显著进展。在这项调查中,我们通过使用图形进行TSAD的全面和最新审查,即G-TSAD。首先,我们探讨了图形表示学习在时间序列数据中的重要潜力。然后,我们审查了在时间序列环境中的最新图形异常检测技术,并讨论了它们的优势和缺点。最后,我们讨论了这一研究领域可能改进的技术挑战和未来发展方向。
更新时间: 2024-04-28 18:43:33
领域: cs.LG
Parameter-Efficient Tuning Large Language Models for Graph Representation Learning
Text-rich graphs, which exhibit rich textual information on nodes and edges, are prevalent across a wide range of real-world business applications. Large Language Models (LLMs) have demonstrated remarkable abilities in understanding text, which also introduces the potential for more expressive modeling of text-rich graphs. Despite these capabilities, efficiently applying LLMs to representation learning on graphs presents significant challenges. Recently, parameter-efficient fine-tuning methods for LLMs have enabled efficient new-task generalization with minimal time and memory consumption. Inspired by this, we introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning with LLMs on text-rich graphs. Specifically, we utilize a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt. This prompt is then inserted at the beginning of the text sequence. To improve the quality of graph prompts, we pre-train the GNN to assist the frozen LLM in predicting the next token in the node text. Compared with existing joint GNN and LM approaches, our method directly generates the node embeddings from large language models at an affordable fine-tuning cost. We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations. Our results demonstrate the efficacy and efficiency of our model, showing that it can be smoothly integrated with various large language models, including OPT, LLaMA and Falcon.
Updated: 2024-04-28 18:36:59
标题: 高效调参大型语言模型用于图表示学习
摘要: Text-rich graphs在现实世界的商业应用中广泛存在,节点和边上展示了丰富的文本信息。大型语言模型(LLMs)展示了在理解文本方面的显著能力,也引入了在text-rich graphs中更具表现力建模的潜力。尽管具备这些能力,有效地将LLMs应用于图表示学习仍然面临着重大挑战。最近,适用于LLMs的参数高效调整方法使得在最小时间和内存消耗下实现了新任务泛化的高效性。受此启发,我们引入了一种新颖的方法——Graph-aware Parameter-Efficient Fine-Tuning (GPEFT),用于在text-rich graphs上使用LLMs进行高效的图表示学习。具体来说,我们利用图神经网络(GNN)将邻居节点的结构信息编码成一个图提示。然后将该提示插入到文本序列的开头。为了提高图提示的质量,我们对GNN进行了预训练,以帮助冻结的LLM在预测节点文本中的下一个标记。与现有的联合GNN和LMs相比,我们的方法直接从大型语言模型中生成节点嵌入,成本可承受。我们通过在8个不同的text-rich graphs上进行的综合实验验证了我们的方法,在链接预测评估中观察到hit@1和Mean Reciprocal Rank (MRR)的平均改进为2%。我们的结果证明了我们模型的有效性和高效性,显示出它可以顺利地与各种大型语言模型集成,包括OPT、LLaMA和Falcon。
更新时间: 2024-04-28 18:36:59
领域: cs.CL,cs.LG
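The graph-prompt idea above (encode the neighbourhood with a GNN, then prepend the result to the token embeddings) can be sketched with one tiny message-passing step. The dimensions, pooling, and mixing weights below are illustrative stand-ins for the paper's actual GNN.

```python
# Sketch of GPEFT's core mechanism: pool neighbour embeddings, combine with
# the node's own embedding, and place the result at the start of the text
# sequence. All numbers and the 0.5/0.5 mixing are hypothetical.
def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def graph_prompt(node_emb, neighbour_embs):
    # one message-passing step: average the node with its pooled neighbours
    pooled = mean_pool(neighbour_embs)
    return [0.5 * a + 0.5 * b for a, b in zip(node_emb, pooled)]

node = [1.0, 0.0]
neighbours = [[0.0, 2.0], [2.0, 0.0]]
token_embs = [[0.1, 0.2], [0.3, 0.4]]  # the node's text, already embedded

prompt = graph_prompt(node, neighbours)
sequence = [prompt] + token_embs  # the graph prompt leads the sequence
print(sequence[0])  # [1.0, 0.5]
```

In the paper this prompt vector is produced by a trained GNN and consumed by a frozen LLM; only the GNN and the parameter-efficient adapters are updated.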
Pragmatic Formal Verification of Sequential Error Detection and Correction Codes (ECCs) used in Safety-Critical Design
Error Detection and Correction Codes (ECCs) are often used in digital designs to protect data integrity. Especially in safety-critical systems such as automotive electronics, ECCs are widely used, and the verification of such complex logic becomes more critical considering the ISO 26262 safety standards. Exhaustive verification of ECCs using formal methods has been a challenge given the high number of data bits to protect. As an example, for an ECC of 128 data bits with the ability to detect up to four-bit errors, the number of bit-error combinations is C(128,1) + C(128,2) + C(128,3) + C(128,4), approximately 1.1 * 10^7. This vast analysis space often leads to bounded proof results. Moreover, the complexity and state space increase further if the ECC has sequential encoding and decoding stages. To overcome such problems and sign off the design with confidence within reasonable proof time, we present a pragmatic formal verification approach for complex ECC cores with several complexity reduction techniques and know-how learnt during the course of verification. We discuss using the linearity of the syndrome generator as a helper assertion, using an abstract model as glue logic to compare the RTL with the sequential version of the circuit, k-induction-based model checking, and using mathematical relations captured as properties to simplify the verification in order to obtain an unbounded proof result within 24 hours of proof runtime.
Updated: 2024-04-28 18:31:09
标题: 安全关键设计中使用的顺序错误检测和纠错码(ECCs)的实用形式验证
摘要: 错误检测和校正码(ECCs)经常被用于数字设计中以保护数据完整性。特别是在诸如汽车电子等安全关键系统中,ECCs被广泛使用,考虑到ISO 26262安全标准,对这种复杂逻辑的验证变得更加关键。使用形式方法对ECC进行详尽验证一直是一个挑战,因为需要保护的数据位数较多。举例来说,对于一个包含128个数据位和可能检测最多四位错误的ECC,组合位错误的数量为128C1 + 128C2 + 128C3 + 128C4 = 1.1 * 10^7。这个庞大的分析空间通常导致有界的证明结果。此外,如果ECC具有顺序编码和解码阶段,则复杂性和状态空间进一步增加。为了克服这些问题,并能够在合理的证明时间内确保设计通过,我们提出了一种实用的形式验证方法,应用了多种复杂性降低技术和在验证过程中学到的知识。我们讨论了利用综合生成器的线性性作为辅助断言,使用抽象模型作为胶水逻辑来比较RTL与电路的顺序版本,基于k-归纳的模型检查以及使用数学关系捕获的属性来简化验证,以便在24小时内获得无界证明结果。
更新时间: 2024-04-28 18:31:09
领域: cs.AI
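The error-pattern count quoted in the abstract above is a direct binomial sum and can be checked in a couple of lines:

```python
import math

# Distinct error patterns for a 128-bit word with one to four flipped bits:
# C(128,1) + C(128,2) + C(128,3) + C(128,4).
patterns = sum(math.comb(128, k) for k in range(1, 5))
print(patterns)  # 11017632, i.e. about 1.1 * 10^7
```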
Differentially-Private Hierarchical Federated Learning
While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. In this work, we propose Hierarchical Federated Learning with Hierarchical Differential Privacy (H2FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance in hierarchical networks. Building upon recent proposals for Hierarchical Differential Privacy (HDP), one of the key concepts of H2FDP is adapting DP noise injection at different layers of an established FL hierarchy -- edge devices, edge servers, and cloud servers -- according to the trust models within particular subnetworks. We conduct a comprehensive analysis of the convergence behavior of H2FDP, revealing conditions on parameter tuning under which the training process converges sublinearly to a finite stationarity gap that depends on the network hierarchy, trust model, and target privacy level. Leveraging these relationships, we develop an adaptive control algorithm for H2FDP that tunes properties of local model training to minimize communication energy, latency, and the stationarity gap while striving to maintain a sub-linear convergence rate and meet desired privacy criteria. Subsequent numerical evaluations demonstrate that H2FDP obtains substantial improvements in these metrics over baselines for different privacy budgets, and validate the impact of different system configurations.
Updated: 2024-04-28 18:27:04
标题: 差分隐私层次化联邦学习
摘要: 在联邦学习(FL)中,消除了原始数据在网络上传输的需求,但仍然容易受到通过通信的模型参数造成的隐私泄露的影响。本文提出了具有分层差分隐私的分层联邦学习(H2FDP),这是一种在分层网络中共同优化隐私和性能的DP增强型FL方法。基于最近提出的分层差分隐私(HDP)的构想,H2FDP的一个关键概念是根据特定子网络内的信任模型,在建立的FL层次结构的不同层次上对DP噪声注入进行调整 - 边缘设备、边缘服务器和云服务器。我们对H2FDP的收敛行为进行了全面分析,揭示了在参数调整下训练过程如何亚线性地收敛到一个取决于网络层次结构、信任模型和目标隐私水平的有限稳定性差值。利用这些关系,我们为H2FDP开发了一个自适应控制算法,调整本地模型训练的属性,以最小化通信能耗、延迟和稳定性差值,同时努力维持亚线性收敛速率并满足所需的隐私标准。随后的数值评估表明,H2FDP在不同隐私预算下在这些指标上获得了实质性的改进,并验证了不同系统配置的影响。
更新时间: 2024-04-28 18:27:04
领域: cs.LG,cs.CR,cs.DC
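The layer-adapted noise injection described above can be pictured as a model update accumulating Gaussian noise whose scale depends on each layer's trust level. The sigma values below are illustrative and not calibrated to any formal (epsilon, delta) budget.

```python
import random

# Trust-dependent noise scales for each layer of the FL hierarchy
# (hypothetical values: less-trusted layers inject more noise).
random.seed(0)
LAYER_SIGMA = {"device": 1.0, "edge_server": 0.3, "cloud": 0.1}

def add_dp_noise(update, layer):
    sigma = LAYER_SIGMA[layer]
    return [u + random.gauss(0.0, sigma) for u in update]

update = [0.5, -0.2, 0.8]
for layer in ("device", "edge_server", "cloud"):  # travel up the hierarchy
    update = add_dp_noise(update, layer)
print(update)  # the original update, perturbed at every layer
```

Calibrating each sigma against the subnetwork's trust model, and analyzing the resulting accuracy-privacy trade-off, is exactly what the paper's convergence analysis addresses.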
CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval
Users often make ambiguous requests that require clarification. We study the problem of asking clarification questions in an information retrieval setting, where systems often face ambiguous search queries and it is challenging to turn the uncertainty in the retrieval model into a natural language question. We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate. Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn. When evaluated on a real-world retrieval dataset of users searching for books, our system outperforms traditional heuristics such as information gain on retrieval success by 17% and vanilla-prompted LLMs by 39% relative.
Updated: 2024-04-28 18:21:31
标题: CLARINET: 为检索提出澄清问题增强语言模型
摘要: 用户经常提出需要澄清的模糊请求。我们研究了在信息检索环境中询问澄清问题的问题,系统经常面临模糊的搜索查询,将检索模型中的不确定性转化为自然语言问题是具有挑战性的。我们提出了CLARINET,一个通过选择能够最大化正确候选项中的确定性的问题来提出信息性澄清问题的系统。我们的方法通过增加一个大型语言模型(LLM)来对检索分布进行条件化,通过端到端微调生成在每个轮次最大化真实候选项排名的问题。在一个真实的用户搜索图书的检索数据集上进行评估时,我们的系统相对于传统的启发式方法,如检索成功的信息增益,性能提高了17%,相对于基本提示的LLM提高了39%。
更新时间: 2024-04-28 18:21:31
领域: cs.IR,cs.AI,cs.CL
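The selection criterion above (ask the question whose answer would maximize certainty in the correct candidate) can be sketched by scoring each candidate question with its expected maximum posterior over a toy retrieval distribution. The candidates, questions, and likelihoods below are invented for illustration.

```python
# Toy retrieval distribution over three book candidates (hypothetical).
prior = {"book_a": 0.4, "book_b": 0.4, "book_c": 0.2}

# p(answer = yes | candidate) for two hypothetical yes/no clarification questions.
likelihood = {
    "is it fiction?":  {"book_a": 0.9, "book_b": 0.1, "book_c": 0.5},
    "is it in print?": {"book_a": 0.8, "book_b": 0.8, "book_c": 0.8},
}

def expected_max_posterior(question):
    score = 0.0
    for ans in (True, False):
        joint = {c: prior[c] * (likelihood[question][c] if ans
                                else 1 - likelihood[question][c])
                 for c in prior}
        z = sum(joint.values())
        score += z * max(v / z for v in joint.values())  # certainty after answer
    return score

best = max(likelihood, key=expected_max_posterior)
print(best)  # the discriminative question wins over the uninformative one
```

The uninformative question leaves the posterior unchanged, so its score equals the prior's maximum, while the discriminative one concentrates mass on a single candidate, mirroring the heuristic CLARINET learns to approximate end-to-end.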
LINOCS: Lookahead Inference of Networked Operators for Continuous Stability
Identifying latent interactions within complex systems is key to unlocking deeper insights into their operational dynamics, including how their elements affect each other and contribute to the overall system behavior. For instance, in neuroscience, discovering neuron-to-neuron interactions is essential for understanding brain function; in ecology, recognizing the interactions among populations is key for understanding complex ecosystems. Such systems, often modeled as dynamical systems, typically exhibit noisy high-dimensional and non-stationary temporal behavior that renders their identification challenging. Existing dynamical system identification methods often yield operators that accurately capture short-term behavior but fail to predict long-term trends, suggesting an incomplete capture of the underlying process. Methods that consider extended forecasts (e.g., recurrent neural networks) lack explicit representations of element-wise interactions and require substantial training data, thereby failing to capture interpretable network operators. Here we introduce Lookahead-driven Inference of Networked Operators for Continuous Stability (LINOCS), a robust learning procedure for identifying hidden dynamical interactions in noisy time-series data. LINOCS integrates several multi-step predictions with adaptive weights during training to recover dynamical operators that can yield accurate long-term predictions. We demonstrate LINOCS' ability to recover the ground truth dynamical operators underlying synthetic time-series data for multiple dynamical systems models (including linear, piece-wise linear, time-changing linear systems' decomposition, and regularized linear time-varying systems) as well as its capability to produce meaningful operators with robust reconstructions through various real-world examples.
Updated: 2024-04-28 18:16:58
标题: LINOCS:面向连续稳定性的网络算子前瞻推断(Lookahead Inference of Networked Operators for Continuous Stability)
摘要: 在复杂系统内部识别潜在的相互作用是解锁其运行动态更深层洞察力的关键,包括了解其元素如何相互影响并对整个系统行为做出贡献。例如,在神经科学中,发现神经元之间的相互作用对于理解大脑功能至关重要;在生态学中,识别种群之间的相互作用对于理解复杂生态系统至关重要。这些系统通常被建模为动力系统,通常表现出嘈杂的高维度和非平稳的时间行为,这使得它们的识别具有挑战性。现有的动力系统识别方法通常产生能够准确捕捉短期行为但无法预测长期趋势的算子,表明对底层过程的捕捉不完整。考虑到扩展预测的方法(例如,递归神经网络)缺乏明确的元素间相互作用表示,并且需要大量的训练数据,因此无法捕捉可解释的网络算子。在这里,我们介绍了一种名为Lookahead-driven Inference of Networked Operators for Continuous Stability(LINOCS)的强大学习程序,用于识别嘈杂时间序列数据中隐藏的动态相互作用。LINOCS在训练过程中集成了几个多步预测,并使用自适应权重来恢复动态算子,这些算子可以产生准确的长期预测。我们展示了LINOCS在多个动力系统模型(包括线性、分段线性、时变线性系统分解和正则化线性时变系统)的合成时间序列数据中恢复地面真实动态算子的能力,以及通过各种真实世界示例产生有意义的算子和具有稳健重构的能力。
更新时间: 2024-04-28 18:16:58
领域: eess.SY,cs.LG,cs.SY,q-bio.QM
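The lookahead objective described above (score an operator by weighted multi-step, not just one-step, prediction error) can be sketched on a scalar linear system. The system, weights, and grid search below are illustrative stand-ins for the paper's full training procedure.

```python
# Scalar sketch of a lookahead loss: roll a candidate operator forward several
# steps and accumulate weighted errors. The true system is x_{t+1} = 0.9 x_t.
TRUE_A = 0.9
xs = [1.0]
for _ in range(30):
    xs.append(TRUE_A * xs[-1])

def lookahead_loss(a, horizon=5, weights=(1.0, 0.8, 0.6, 0.4, 0.2)):
    loss = 0.0
    for t in range(len(xs) - horizon):
        pred = xs[t]
        for k in range(horizon):
            pred = a * pred  # roll the candidate operator k+1 steps forward
            loss += weights[k] * (pred - xs[t + k + 1]) ** 2
    return loss

candidates = [0.5 + 0.01 * i for i in range(51)]  # grid over [0.5, 1.0]
best_a = min(candidates, key=lookahead_loss)
print(best_a)  # recovers the true operator, 0.9
```

A one-step loss would also recover the operator on this noiseless toy, but under noise the multi-step terms penalize operators whose small one-step errors compound, which is the failure mode the abstract attributes to existing methods.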
Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin
Nigerian Pidgin is an English-derived contact language and is traditionally an oral language, spoken by approximately 100 million people. No orthographic standard has yet been adopted, and thus the few available Pidgin datasets that exist are characterised by noise in the form of orthographic variations. This contributes to under-performance of models in critical NLP tasks. The current work is the first to describe various types of orthographic variations commonly found in Nigerian Pidgin texts, and model this orthographic variation. The variations identified in the dataset form the basis of a phonetic-theoretic framework for word editing, which is used to generate orthographic variations to augment training data. We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis. The proposed variation generation framework augments the training data with new orthographic variants which are relevant for the test set but did not occur in the training set originally. Our results demonstrate the positive effect of augmenting the training data with a combination of real texts from other corpora as well as synthesized orthographic variation, resulting in performance improvements of 2.1 points in sentiment analysis and 1.4 BLEU points in translation to English.
Updated: 2024-04-28 18:07:13
标题: 模拟正字变化改善尼日利亚皮钦语的自然语言处理性能
摘要: 尼日利亚皮钦语是一种源自英语的接触语言,传统上是口头语言,大约有1亿人口使用。尚未制定正字法标准,因此现有的皮钦语数据集少且存在正字法变体的噪音。这导致模型在关键的自然语言处理任务中表现不佳。本研究首次描述了尼日利亚皮钦语文本中常见的各种正字法变体,并对这些正字法变体进行建模。数据集中识别的变体构成了一个基于语音理论的单词编辑框架,用于生成正字法变体以增加训练数据。我们测试了这种数据增强对两个关键的自然语言处理任务的影响:机器翻译和情感分析。提出的变体生成框架通过使用新的正字法变体增加训练数据,这些变体在测试集中是相关的,但在原始训练集中并未出现。我们的结果表明,通过将训练数据与其他语料库中的真实文本以及合成的正字法变体相结合,可以实现情感分析性能提升2.1点,英语翻译BLEU分数提升1.4点。
更新时间: 2024-04-28 18:07:13
领域: cs.CL,cs.AI
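The variation-generation idea above (apply spelling-shift rules to produce orthographic variants for augmentation) can be sketched with simple substitution rules. The rules below are hypothetical simplifications, not the paper's phonetic-theoretic framework.

```python
import re

# Illustrative orthographic-variation rules (hypothetical, for demonstration
# only); the paper derives its rules from attested Nigerian Pidgin variation.
RULES = [
    (r"gh", "g"),
    (r"ei", "ey"),
    (r"er\b", "a"),  # word-final "-er" often realised as "-a"
]

def variants(word):
    out = {word}
    for pattern, repl in RULES:
        out |= {re.sub(pattern, repl, w) for w in list(out)}
    return sorted(out)

print(variants("brother"))  # the original plus the "-a" spelling
```

Generated variants that are plausible for the test distribution but absent from the training set are exactly what the augmentation adds.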
Safe Deep Policy Adaptation
A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes dynamics models with few-shot real-world data. A safety filter based on the Control Barrier Function (CBF) on top of the RL policy is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate great superiority of SafeDPA in both safety and task performance, over state-of-the-art baselines. Particularly, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines, under unseen disturbances in real-world experiments.
Updated: 2024-04-28 18:04:30
标题: 安全深度政策适应
摘要: 自主和人工智能的一个关键目标是使自主机器人能够在动态和不确定的环境中快速适应。经典的自适应控制和安全控制提供稳定性和安全性保证,但局限于特定系统类别。相反,基于强化学习(RL)的策略适应提供了灵活性和泛化能力,但也带来了安全性和稳健性挑战。我们提出了SafeDPA,这是一个新颖的RL和控制框架,同时解决了策略适应和安全强化学习的问题。SafeDPA在模拟中联合学习自适应策略和动力学模型,预测环境配置,并利用少量真实世界数据对动力学模型进行微调。在RL策略之上引入基于控制屏障函数(CBF)的安全过滤器,以确保在真实世界部署过程中的安全性。我们提供了SafeDPA的理论安全保证,并展示了SafeDPA对学习错误和额外干扰的稳健性。在经典控制问题(倒立摆)、模拟基准测试(Safety Gym)和真实世界敏捷机器人平台(RC Car)上的全面实验表明,SafeDPA在安全性和任务性能方面优于最先进的基线。特别是,在真实世界实验中出现未知干扰时,SafeDPA表现出显著的泛化能力,安全率比基线提高了300%。
更新时间: 2024-04-28 18:04:30
领域: cs.RO,cs.AI,cs.LG
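The CBF safety filter described above can be sketched in one dimension: the filter minimally adjusts the RL policy's action so that a barrier function stays non-negative. The dynamics, barrier, and rate constant below are illustrative, not SafeDPA's actual formulation.

```python
# 1-D sketch of a discrete-time CBF filter over an RL action. Safe set:
# x <= X_MAX, barrier h(x) = X_MAX - x. Condition enforced each step:
# h(x + u) >= (1 - ALPHA) * h(x). All constants are illustrative.
X_MAX = 1.0
ALPHA = 0.5  # fraction of the barrier allowed to shrink per step

def h(x):
    return X_MAX - x

def safety_filter(x, u_rl):
    u_max = ALPHA * h(x)     # largest step toward the boundary still allowed
    return min(u_rl, u_max)  # smallest change to the RL action that is safe

x = 0.0
for _ in range(20):
    u = safety_filter(x, u_rl=0.8)  # an aggressive policy always pushes right
    x = x + u
print(x, h(x))  # x approaches X_MAX but h(x) stays positive
```

The filtered trajectory follows x_{n+1} = x_n + 0.5 (1 - x_n), so x converges geometrically to the boundary without crossing it, which is the forward-invariance guarantee a CBF provides on top of a learned policy.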
Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning
An advantage of Large Language Models (LLMs) is their contextualization capability - providing different responses based on student inputs like solution strategy or prior discussion, to potentially better engage students than standard feedback. We present a design and evaluation of a proof-of-concept LLM application to offer students dynamic and contextualized feedback. Specifically, we augment an Online Programming Exercise bot for a college-level Cloud Computing course with ChatGPT, which offers students contextualized reflection triggers during a collaborative query optimization task in database design. We demonstrate that LLMs can be used to generate highly situated reflection triggers that incorporate details of the collaborative discussion happening in context. We discuss in depth the exploration of the design space of the triggers and their correspondence with the learning objectives as well as the impact on student learning in a pilot study with 34 students.
Updated: 2024-04-28 17:56:14
标题: 生成关于替代解决方案路径的情境反思触发器:支持计算机辅助协作学习的生成式AI案例研究
摘要: 大型语言模型(LLMs)的一个优势是它们的情境化能力-根据学生输入(如解决方案策略或先前讨论)提供不同的响应,以更好地吸引学生而不是标准反馈。我们提出了一个概念验证LLM应用程序的设计和评估,以向学生提供动态和情境化的反馈。具体来说,我们在一门大学级云计算课程的在线编程练习机器人上增加了ChatGPT,该机器人在数据库设计中的协作查询优化任务中为学生提供情境化的反思触发器。我们展示了LLMs可以用来生成高度情境化的反思触发器,这些触发器包含正在上下文中发生的协作讨论的细节。我们深入讨论了触发器设计空间的探索以及它们与学习目标的对应以及在34名学生的试点研究中对学生学习的影响。
更新时间: 2024-04-28 17:56:14
领域: cs.AI
Riemannian Laplace Approximation with the Fisher Metric
Laplace's method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace Approximation transforms the Gaussian approximation according to a chosen Riemannian geometry providing a richer approximation family, while still retaining computational efficiency. However, as shown here, its properties depend heavily on the chosen metric, indeed the metric adopted in previous work results in approximations that are overly narrow as well as being biased even at the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact at the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.
Updated: 2024-04-28 17:47:04
标题: 用费舍尔度量的黎曼Laplace近似
摘要: 拉普拉斯方法通过在其模态处用高斯分布来近似目标密度。由于伯恩斯坦-冯米斯定理,它在贝叶斯推断中具有计算效率和渐近精确性,但对于复杂目标和有限数据后验来说,它通常是一个过于粗糙的近似。拉普拉斯近似的最近一般化将高斯近似根据选择的黎曼几何变换,提供了一个更丰富的近似族,同时仍保持计算效率。然而,正如本文所示,其性质严重依赖于选择的度量,事实上,在先前的工作中采用的度量导致了近似过于狭窄,甚至在有限数据的极限情况下也具有偏差。我们通过进一步发展近似族,推导出两种在无限数据极限下精确的替代变体,扩展了该方法的理论分析,并在一系列实验中展示了实际改进。
更新时间: 2024-04-28 17:47:04
领域: cs.LG,stat.ME,stat.ML
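The (Euclidean) Laplace approximation that the abstract generalizes can be sketched in one dimension: find the mode of the log-density by gradient ascent, then take the negative inverse Hessian there as the Gaussian variance. The target below, an unnormalised Gamma(3,1) density, is chosen because its mode (2) and Laplace variance (2) are known in closed form.

```python
import math

def log_p(x):
    return 2.0 * math.log(x) - x  # log of x^2 e^{-x}, up to a constant

def laplace(x0=1.0, steps=200, lr=0.1, eps=1e-5):
    x = x0
    for _ in range(steps):  # gradient ascent to the mode
        g = (log_p(x + eps) - log_p(x - eps)) / (2 * eps)
        x += lr * g
    # second-order central difference for the Hessian at the mode
    hess = (log_p(x + eps) - 2 * log_p(x) + log_p(x - eps)) / eps**2
    return x, -1.0 / hess  # mode and variance of the Gaussian approximation

mode, var = laplace()
print(mode, var)  # analytically: mode = 2, variance = 2
```

The Riemannian variants discussed above replace this flat Gaussian with one transported along a chosen metric; the paper's point is that the choice of metric (e.g., the Fisher metric) materially changes the quality of the resulting approximation.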
Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing
Developers often dedicate significant time to maintaining and refactoring existing code. However, most prior work on generative models for code focuses solely on creating new code, overlooking the distinctive needs of editing existing code. In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same codebase. Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks. We represent code changes using a line diff format and employ static analysis to form large customized model contexts, ensuring the availability of appropriate information for prediction. We collect a code editing dataset from the commit histories of 1650 open-source Python projects for training and evaluation. In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models (bringing exact-match accuracy from 34.7 up to 60.4), demonstrating the benefits of incorporating editing history for code completion. In a multi-round, multi-edit setting, we observe substantial gains by iteratively conditioning on additional user edits. We have open-sourced our code, data, and model weights to encourage future research and have released a VSCode extension powered by our model for interactive IDE usage.
Updated: 2024-04-28 17:45:56
Subjects: cs.SE,cs.LG,cs.PL
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai
Updated: 2024-04-28 17:42:13
Subjects: cs.LG,cs.AI,cs.CL,cs.CY
PatentGPT: A Large Language Model for Intellectual Property
In recent years, large language models have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, applying large language models in the Intellectual Property (IP) space is challenging due to this field's strong need for specialized knowledge, privacy protection, and the processing of extremely long text. In this technical report, we present for the first time a low-cost, standardized procedure for training IP-oriented LLMs, meeting the unique requirements of the IP domain. Using this standard process, we have trained the PatentGPT series models based on open-source pretrained models. By evaluating them on the open-source IP-oriented benchmark MOZIP, our domain-specific LLMs outperform GPT-4, indicating the effectiveness of the proposed training procedure and the expertise of the PatentGPT models in the IP domain. Impressively, our model significantly outperformed GPT-4 on the 2019 China Patent Agent Qualification Examination, achieving a score of 65 and reaching the level of human experts. Additionally, the PatentGPT model, which utilizes the SMoE architecture, achieves performance comparable to that of GPT-4 in the IP domain and demonstrates a better cost-performance ratio on long-text tasks, potentially serving as an alternative to GPT-4 within the IP domain.
Updated: 2024-04-28 17:36:43
Subjects: cs.CL,cs.AI,I.2.7
Lifted Inference beyond First-Order Logic
Weighted First Order Model Counting (WFOMC) is fundamental to probabilistic inference in statistical relational learning models. As WFOMC is known to be intractable in general ($\#$P-complete), logical fragments that admit polynomial-time WFOMC are of significant interest. Such fragments are called domain-liftable. Recent works have shown that the two-variable fragment of first order logic extended with counting quantifiers ($\mathrm{C^2}$) is domain-liftable. However, many properties of real-world data, like acyclicity in citation networks and connectivity in social networks, cannot be modeled in $\mathrm{C^2}$, or first order logic in general. In this work, we expand the domain-liftability of $\mathrm{C^2}$ with multiple such properties. We show that any $\mathrm{C^2}$ sentence remains domain-liftable when one of its relations is restricted to represent a directed acyclic graph, a connected graph, a tree (resp. a directed tree) or a forest (resp. a directed forest). All our results rely on a novel and general methodology of "counting by splitting". Besides their application to probabilistic inference, our results provide a general framework for counting combinatorial structures, extending a vast array of previous results in the discrete mathematics literature on directed acyclic graphs, phylogenetic networks, etc.
Updated: 2024-04-28 17:35:00
Subjects: cs.AI,cs.LO,math.CO
Learning to Detect Slip through Tactile Estimation of the Contact Force Field and its Entropy
Detection of slip during object grasping and manipulation plays a vital role in object handling. Existing solutions primarily rely on visual information to devise a strategy for grasping. However, for robotic systems to attain a level of proficiency comparable to humans, especially in consistently handling and manipulating unfamiliar objects, integrating artificial tactile sensing is increasingly essential. We introduce a novel physics-informed, data-driven approach to detect slip continuously in real time. We employ the GelSight Mini, an optical tactile sensor, attached to custom-designed grippers to gather tactile data. Our work leverages the inhomogeneity of tactile sensor readings during slip events to develop distinctive features and formulates slip detection as a classification problem. To evaluate our approach, we test multiple data-driven models on 10 common objects under different loading conditions, textures, and materials. Our results show that the best classification algorithm achieves a high average accuracy of 95.61%. We further illustrate the practical application of our research in dynamic robotic manipulation tasks, where our real-time slip detection and prevention algorithm is implemented.
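As a rough sketch of the entropy idea in the title, one can histogram the contact-force magnitudes read off the tactile sensor and compute their Shannon entropy; during slip the readings become more inhomogeneous, spread across more bins, and the entropy rises. The exact feature construction in the paper may differ:

```python
import math

def shannon_entropy(values, bins=8):
    """Shannon entropy of a histogram of sensor readings (an
    illustrative slip feature, not the paper's exact pipeline)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a perfectly uniform field carries no surprise
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts if c)

stuck = shannon_entropy([1.0] * 16)          # homogeneous contact
slipping = shannon_entropy(list(range(16)))  # spread-out readings
```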
Updated: 2024-04-28 17:29:03
Subjects: cs.RO,cs.LG
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
With the rise of Visual and Language Pretraining (VLP), an increasing number of downstream tasks are adopting the paradigm of pretraining followed by fine-tuning. Although this paradigm has demonstrated potential in various multimodal downstream tasks, its implementation in the remote sensing domain encounters some obstacles. Specifically, the tendency for same-modality embeddings to cluster together impedes efficient transfer learning. To tackle this issue, we review the aim of multimodal transfer learning for downstream tasks from a unified perspective, and rethink the optimization process based on three distinct objectives. We propose "Harmonized Transfer Learning and Modality Alignment (HarMA)", a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment, while minimizing training overhead through parameter-efficient fine-tuning. Remarkably, without the need for external data for training, HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing. Our experiments reveal that HarMA achieves competitive and even superior performance to fully fine-tuned models with only minimal adjustable parameters. Due to its simplicity, HarMA can be integrated into almost all existing multimodal pretraining models. We hope this method can facilitate the efficient application of large models to a wide range of downstream tasks while significantly reducing the resource consumption. Code is available at https://github.com/seekerhuang/HarMA.
Updated: 2024-04-28 17:20:08
Subjects: cs.CV,cs.LG
Machine Learning for Blockchain Data Analysis: Progress and Opportunities
Blockchain technology has rapidly emerged to mainstream attention, while its publicly accessible, heterogeneous, massive-volume, and temporal data are reminiscent of the complex dynamics encountered during the last decade of big data. Unlike any prior data source, blockchain datasets encompass multiple layers of interactions across real-world entities, e.g., human users, autonomous programs, and smart contracts. Furthermore, blockchain's integration with cryptocurrencies has introduced financial aspects of unprecedented scale and complexity such as decentralized finance, stablecoins, non-fungible tokens, and central bank digital currencies. These unique characteristics present both opportunities and challenges for machine learning on blockchain data. On one hand, we examine the state-of-the-art solutions, applications, and future directions associated with leveraging machine learning for blockchain data analysis critical for the improvement of blockchain technology such as e-crime detection and trends prediction. On the other hand, we shed light on the pivotal role of blockchain by providing vast datasets and tools that can catalyze the growth of the evolving machine learning ecosystem. This paper serves as a comprehensive resource for researchers, practitioners, and policymakers, offering a roadmap for navigating this dynamic and transformative field.
Updated: 2024-04-28 17:18:08
Subjects: cs.CR,cs.LG
Classical integrability in the presence of a cosmological constant: analytic and machine learning results
We study the integrability of two-dimensional theories that are obtained by a dimensional reduction of certain four-dimensional gravitational theories describing the coupling of Maxwell fields and neutral scalar fields to gravity in the presence of a potential for the neutral scalar fields. By focusing on a certain solution subspace, we show that a subset of the equations of motion in two dimensions are the compatibility conditions for a modified version of the Breitenlohner-Maison linear system. Subsequently, we study the Liouville integrability of the 2D models encoding the chosen 4D solution subspace from a one-dimensional point of view by constructing Lax pair matrices. In this endeavour, we successfully employ a linear neural network to search for Lax pair matrices for these models, thereby illustrating how machine learning approaches can be effectively implemented to augment the identification of integrable structures in classical systems.
Updated: 2024-04-28 17:02:24
Subjects: hep-th,cs.LG,math-ph,math.MP
AdaFSNet: Time Series Classification Based on Convolutional Network with an Adaptive and Effective Kernel Size Configuration
Time series classification is one of the most critical and challenging problems in data mining; it arises widely across fields and holds significant research importance. Despite extensive research, notable achievements, and successful real-world applications, capturing the appropriate receptive field (RF) size from one-dimensional or multi-dimensional time series of varying lengths remains a persistent issue, one that greatly impacts performance and varies considerably across datasets. In this paper, we propose an Adaptive and Effective Full-Scope Convolutional Neural Network (AdaFSNet) to enhance the accuracy of time series classification. This network includes two Dense Blocks. In particular, it can dynamically choose a range of kernel sizes that effectively encompass the optimal RF size for various datasets by incorporating multiple prime numbers corresponding to the time series length. We also design a TargetDrop block, which reduces redundancy while extracting a more effective RF. To assess the effectiveness of the AdaFSNet network, comprehensive experiments were conducted on the UCR and UEA datasets, which include one-dimensional and multi-dimensional time series data, respectively. Our model surpassed baseline models in classification accuracy, underscoring AdaFSNet's efficiency and effectiveness in handling time series classification tasks.
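The prime-number idea from the abstract can be sketched as follows; the concrete selection rule here is a guess for illustration, since the abstract does not spell it out:

```python
def prime_kernel_sizes(series_length, max_kernels=8):
    """Collect prime candidate kernel sizes up to the series length,
    spread evenly, so receptive fields cover many incommensurate
    scales (illustrative; AdaFSNet's actual rule may differ)."""
    def is_prime(n):
        return n >= 2 and all(n % k for k in range(2, int(n ** 0.5) + 1))
    primes = [n for n in range(2, series_length + 1) if is_prime(n)]
    step = max(1, len(primes) // max_kernels)
    return primes[::step][:max_kernels]

sizes = prime_kernel_sizes(20)
# all primes up to 20: [2, 3, 5, 7, 11, 13, 17, 19]
```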
Updated: 2024-04-28 16:58:53
Subjects: cs.LG,cs.CV
A Cross Attention Approach to Diagnostic Explainability using Clinical Practice Guidelines for Depression
The lack of explainability using relevant clinical knowledge hinders the adoption of Artificial Intelligence-powered analysis of unstructured clinical dialogue. A wealth of relevant, untapped Mental Health (MH) data is available in online communities, providing the opportunity to address the explainability problem with substantial potential impact as a screening tool for both online and offline applications. We develop a method to enhance attention in popular transformer models and generate clinician-understandable explanations for classification by incorporating external clinical knowledge. Inspired by how clinicians rely on their expertise when interacting with patients, we leverage relevant clinical knowledge to model patient inputs, providing meaningful explanations for classification. This will save manual review time and engender trust. We develop such a system in the context of MH using clinical practice guidelines (CPG) for diagnosing depression, a mental health disorder of global concern. We propose an application-specific language model called ProcesS knowledge-infused cross ATtention (PSAT), which incorporates CPGs when computing attention. Through rigorous evaluation on three expert-curated datasets related to depression, we demonstrate application-relevant explainability of PSAT. PSAT also surpasses the performance of nine baseline models and can provide explanations where other baselines fall short. We transform a CPG resource focused on depression, such as the Patient Health Questionnaire (e.g. PHQ-9) and related questions, into a machine-readable ontology using SNOMED-CT. With this resource, PSAT enhances the ability of models like GPT-3.5 to generate application-relevant explanations.
Updated: 2024-04-28 16:41:37
Subjects: cs.AI
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Large Language Models (LLMs) have highlighted the necessity of effective unlearning mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims at removing undesired data influences and associated model capabilities without compromising utility outside the scope of unlearning. While interest in studying LLM unlearning is growing, the impact of the optimizer choice for LLM unlearning remains under-explored. In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between second-order optimization and influence unlearning (a classical approach that uses influence functions to update the model for data influence removal). This insight propels us to develop a second-order unlearning framework, termed SOUL, built upon the second-order clipped stochastic optimization (Sophia)-based LLM training method. SOUL extends the static, one-shot model update using influence unlearning to a dynamic, iterative unlearning process. Our extensive experiments show that SOUL consistently outperforms conventional first-order methods across various unlearning tasks, models, and metrics, suggesting the promise of second-order optimization in providing a scalable and easily implementable solution for LLM unlearning.
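For orientation, the Sophia-style step that SOUL builds on preconditions each gradient coordinate by a positive diagonal Hessian estimate and clips the resulting step elementwise. The toy update below is schematic (the real Sophia/SOUL recipes add momentum and EMA Hessian estimates, omitted here):

```python
def sophia_style_step(params, grads, hess_diag, lr=0.1, clip=1.0, eps=1e-12):
    """One schematic clipped second-order step: divide each gradient
    coordinate by its Hessian estimate, clip, and descend."""
    out = []
    for p, g, h in zip(params, grads, hess_diag):
        step = g / max(h, eps)
        step = max(-clip, min(clip, step))  # elementwise clipping
        out.append(p - lr * step)
    return out

new = sophia_style_step([1.0, 1.0], [4.0, 0.5], [2.0, 1.0])
# first coordinate's raw step 4/2 = 2 is clipped to 1; second is 0.5
```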
Updated: 2024-04-28 16:31:32
Subjects: cs.LG,cs.CL
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc. Our overall results demonstrate that the evaluated LLMs begin to approach performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, and particularly well for the QA task, even though they have never seen examples from these tasks before. However, we observed that the classification and RE tasks perform below what can be achieved with a specifically trained model for the medical field, such as PubMedBERT. Finally, we noted that no LLM outperforms all the others on all the studied tasks, with some models being better suited for certain tasks than others.
Updated: 2024-04-28 16:17:43
Subjects: cs.CL,cs.AI,cs.LG
A Note on Asynchronous Challenges: Unveiling Formulaic Bias and Data Loss in the Hayashi-Yoshida Estimator
The Hayashi-Yoshida (HY) estimator exhibits an intrinsic telescoping property that leads to an often overlooked computational bias, which we denote formulaic or intrinsic bias. This formulaic bias results in data loss by cancelling out potentially relevant data points, the nonextant data points. This paper attempts to formalize and quantify the data loss arising from this bias. In particular, we highlight the existence of nonextant data points via a concrete example, and prove necessary and sufficient conditions for the telescoping property to induce this type of formulaic bias. Since this type of bias is nonexistent when the inputs, i.e., observation times $\Pi^{(1)} :=(t_i^{(1)})_{i=0,1,\ldots}$ and $\Pi^{(2)} :=(t_j^{(2)})_{j=0,1,\ldots}$, are synchronous, we introduce the (a,b)-asynchronous adversary. This adversary generates inputs $\Pi^{(1)}$ and $\Pi^{(2)}$ according to two independent homogeneous Poisson processes with rates $a>0$ and $b>0$, respectively. We address the foundational questions regarding cumulative minimal (or least) average data point loss, and determine the values for $a$ and $b$. We prove that for equal rates $a=b$, the minimal average cumulative data loss over both inputs is attained and amounts to 25%. We present an algorithm, based on our theorem, for computing the exact number of nonextant data points given inputs $\Pi^{(1)}$ and $\Pi^{(2)}$, and suggest alternative methods. Finally, we use simulated data to empirically compare the (cumulative) average data loss of the HY estimator.
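For readers meeting the estimator for the first time, the standard HY covariance estimator sums products of increments whose observation intervals overlap; the telescoping analysis above concerns exactly which of those products cancel. A minimal sketch:

```python
def hayashi_yoshida(t1, x, t2, y):
    """Hayashi-Yoshida covariance estimator for two asynchronously
    observed processes: sum products of increments over overlapping
    observation intervals (textbook form, no bias analysis)."""
    hy = 0.0
    for i in range(len(t1) - 1):
        dx = x[i + 1] - x[i]
        for j in range(len(t2) - 1):
            # do (t1[i], t1[i+1]] and (t2[j], t2[j+1]] overlap?
            if min(t1[i + 1], t2[j + 1]) > max(t1[i], t2[j]):
                hy += dx * (y[j + 1] - y[j])
    return hy

# With synchronous times the estimator reduces to the realized covariance.
sync = hayashi_yoshida([0, 1, 2, 3], [0, 1, 3, 2], [0, 1, 2, 3], [0, 2, 1, 3])
```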
Updated: 2024-04-28 16:14:31
Subjects: stat.ML,cs.LG,math.CO,math.PR
End-to-End Mesh Optimization of a Hybrid Deep Learning Black-Box PDE Solver
Deep learning has been widely applied to solve partial differential equations (PDEs) in computational fluid dynamics. Recent research proposed a PDE correction framework that leverages deep learning to correct the solution obtained by a PDE solver on a coarse mesh. However, end-to-end training of such a PDE correction model over both solver-dependent parameters such as mesh parameters and neural network parameters requires the PDE solver to support automatic differentiation through the iterative numerical process. Such a feature is not readily available in many existing solvers. In this study, we explore the feasibility of end-to-end training of a hybrid model with a black-box PDE solver and a deep learning model for fluid flow prediction. Specifically, we investigate a hybrid model that integrates a black-box PDE solver into a differentiable deep graph neural network. To train this model, we use a zeroth-order gradient estimator to differentiate the PDE solver via forward propagation. Although experiments show that the proposed approach based on zeroth-order gradient estimation underperforms the baseline that computes exact derivatives using automatic differentiation, our proposed method outperforms the baseline trained with a frozen input mesh to the solver. Moreover, with a simple warm-start on the neural network parameters, we show that models trained by these zeroth-order algorithms achieve an accelerated convergence and improved generalization performance.
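The zeroth-order trick is to probe the black-box solver with random perturbations and average forward differences of its outputs; a generic Gaussian-smoothing estimator (a textbook sketch, not the paper's exact implementation) looks like this:

```python
import random

def zeroth_order_grad(f, x, mu=1e-3, n_samples=5000):
    """Estimate grad f(x) from function values only, by averaging
    (f(x + mu*u) - f(x)) / mu * u over Gaussian directions u; this
    is how gradients can 'flow through' a non-differentiable solver."""
    d = len(x)
    g = [0.0] * d
    fx = f(x)
    for _ in range(n_samples):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        fd = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for k in range(d):
            g[k] += fd * u[k] / n_samples
    return g

random.seed(0)
g = zeroth_order_grad(lambda v: v[0] ** 2 + v[1] ** 2, [1.0, 2.0])
# true gradient is [2.0, 4.0]; the estimate is noisy but close
```

The trade-off reported in the experiments, slower convergence than exact automatic differentiation but better results than freezing the input mesh, is typical of such estimators.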
Updated: 2024-04-28 16:01:21
Subjects: cs.LG,cs.NA,math.NA,math.OC
Decentralised, Collaborative, and Privacy-preserving Machine Learning for Multi-Hospital Data
Machine Learning (ML) has demonstrated its great potential on medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model leveraging the private datasets available at each party without the need for direct sharing of those datasets or compromising the privacy of the datasets through collaboration. In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). It offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets; (2) it safeguards patient privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates the ML model training without relying on a centralized server. We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. We demonstrate that the ML models trained with DeCaPH framework have an improved utility-privacy trade-off, showing it enables the models to have good performance while preserving the privacy of the training data points. In addition, the ML models trained with DeCaPH framework in general outperform those trained solely with the private datasets from individual parties, showing that DeCaPH enhances the model generalizability.
Updated: 2024-04-28 16:00:01
Subjects: cs.LG,cs.CR
Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers
Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models with a downstream application and thus error quantification plays a key role. However, by ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of the Gaussian process inference theorem to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models by blurring the boundaries between numerical analysis and Bayesian inference.
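The generalization invoked above is the conditioning of a GP prior on observations made through a bounded linear operator. Stated schematically for orientation (standard GP algebra, not quoted from the paper): with prior $u \sim \mathcal{GP}(m, k)$ and data $y = L[u] + \epsilon$, $\epsilon \sim \mathcal{N}(0, \Lambda)$, the posterior is again a GP with

```latex
u \mid y \sim \mathcal{GP}(m_\star, k_\star), \qquad
m_\star(x) = m(x) + \bigl(L k(x, \cdot)\bigr)^{\top}
             \bigl(L k L^{*} + \Lambda\bigr)^{-1} \bigl(y - L[m]\bigr),
\\
k_\star(x, x') = k(x, x') - \bigl(L k(x, \cdot)\bigr)^{\top}
             \bigl(L k L^{*} + \Lambda\bigr)^{-1} \, L k(\cdot, x')
```

Taking $L$ to be pointwise evaluation recovers ordinary GP regression; taking $L$ to be a discretized differential operator together with boundary conditions yields probabilistic PDE solvers of the kind discussed above.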
Updated: 2024-04-28 15:57:36
Subjects: cs.LG,cs.NA,math.NA,stat.ML
From Persona to Personalization: A Survey on Role-Playing Language Agents
Recent advancements in large language models (LLMs) have significantly boosted the rise of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate assigned personas. By harnessing multiple advanced abilities of LLMs, including in-context learning, instruction following, and social intelligence, RPLAs achieve a remarkable sense of human likeness and vivid role-playing performance. RPLAs can mimic a wide range of personas, ranging from historical figures and fictional characters to real-life individuals. Consequently, they have catalyzed numerous AI applications, such as emotional companions, interactive video games, personalized assistants and copilots, and digital clones. In this paper, we conduct a comprehensive survey of this field, illustrating the evolution and recent progress in RPLAs integrating with cutting-edge LLM technologies. We categorize personas into three types: 1) Demographic Persona, which leverages statistical stereotypes; 2) Character Persona, focused on well-established figures; and 3) Individualized Persona, customized through ongoing user interactions for personalized services. We begin by presenting a comprehensive overview of current methodologies for RPLAs, followed by the details for each persona type, covering corresponding data sourcing, agent construction, and evaluation. Afterward, we discuss the fundamental risks, existing limitations, and future prospects of RPLAs. Additionally, we provide a brief review of RPLAs in AI applications, which reflects practical user demands that shape and drive RPLA research. Through this work, we aim to establish a clear taxonomy of RPLA research and applications, and facilitate future research in this critical and ever-evolving field, and pave the way for a future where humans and RPLAs coexist in harmony.
Updated: 2024-04-28 15:56:41
Categories: cs.CL,cs.AI
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Reasoners
The Chain-of-Thought prompting strategy has enhanced the performance of Large Language Models (LLMs) across various NLP tasks. However, it still has shortcomings when dealing with complex reasoning tasks, including understanding errors, calculation errors and process errors (e.g., missing-step and hallucinations). Our in-depth analyses across these error types show that deeply understanding the whole problem is critical in addressing complicated reasoning tasks. Motivated by this, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to enhance the LLMs' reasoning abilities. The core of our method is to encourage the LLMs to deeply understand the problems and leverage the key problem-solving information for better reasoning. Extensive experiments on 10 diverse reasoning benchmarks show that our DUP method consistently outperforms its counterparts by a large margin. More encouragingly, DUP achieves a new SOTA result on the GSM8K benchmark, with an accuracy of 97.1% in a zero-shot setting.
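The DUP idea described above can be sketched as a three-stage prompting pipeline (the stage wording below is an assumption for illustration; the paper's exact prompts may differ):

```python
# Sketch of a DUP-style pipeline: (1) extract the core question,
# (2) extract the key problem-solving information, (3) answer using both.
# Prompt phrasing here is illustrative, not the paper's exact wording.

def dup_prompts(problem: str):
    """Build the three DUP stages as separate prompts for an LLM."""
    p1 = f"Please extract the core question from: {problem}"
    p2 = f"Note the key problem-solving information in: {problem}"
    p3_template = (
        "Problem: {problem}\n"
        "Core question: {core}\n"
        "Key information: {info}\n"
        "Using the core question and key information, solve step by step."
    )
    return p1, p2, p3_template

def solve(problem: str, llm):
    """Run the pipeline with any callable `llm(prompt) -> str`."""
    p1, p2, p3 = dup_prompts(problem)
    core = llm(p1)                       # stage 1: core question
    info = llm(p2)                       # stage 2: key information
    return llm(p3.format(problem=problem, core=core, info=info))
```

Any chat-completion API can be dropped in as `llm`; the point is that the final prompt is conditioned on the model's own problem understanding rather than the raw problem alone.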
Updated: 2024-04-28 15:55:52
Categories: cs.CL,cs.AI
TextGram: Towards a better domain-adaptive pretraining
For green AI, it is crucial to measure and reduce the carbon footprint emitted during the training of large language models. In NLP, performing pre-training on Transformer models requires significant computational resources. This pre-training involves using a large amount of text data to gain prior knowledge for performing downstream tasks. Thus, it is important to select the right domain-specific data from this vast corpus to achieve optimal results on our domain-specific tasks. While training on large unsupervised data is expensive, it can be optimized by performing a data selection step before pretraining. Selecting important data reduces the space overhead and the substantial amount of time required to pre-train the model while maintaining accuracy. We investigate the existing selection strategies and propose our own domain-adaptive data selection method, TextGram, which effectively selects essential data from large corpora. We compare and evaluate the results of finetuned models for the text classification task with and without data selection. We show that the proposed strategy works better compared to other selection methods.
Updated: 2024-04-28 15:44:57
Categories: cs.CL,cs.LG
A new method of modeling the multi-stage decision-making process of CRT using machine learning with uncertainty quantification
Aims. The purpose of this study is to create a multi-stage machine learning model to predict cardiac resynchronization therapy (CRT) response for heart failure (HF) patients. This model exploits uncertainty quantification to recommend additional collection of single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI) variables if baseline clinical variables and features from electrocardiogram (ECG) are not sufficient. Methods. 218 patients who underwent rest-gated SPECT MPI were enrolled in this study. CRT response was defined as an increase in left ventricular ejection fraction (LVEF) > 5% at a 6±1 month follow-up. A multi-stage ML model was created by combining two ensemble models: Ensemble 1 was trained with clinical variables and ECG; Ensemble 2 included Ensemble 1 plus SPECT MPI features. Uncertainty quantification from Ensemble 1 allowed for multi-stage decision-making to determine if the acquisition of SPECT data for a patient is necessary. The performance of the multi-stage model was compared with that of Ensemble models 1 and 2. Results. The response rate for CRT was 55.5% (n = 121); patients were 61.0% male (n = 133), with an average age of 62.0±11.8 and LVEF of 27.7±11.0. The multi-stage model performed similarly to Ensemble 2 (which utilized the additional SPECT data) with AUC of 0.75 vs. 0.77, accuracy of 0.71 vs. 0.69, sensitivity of 0.70 vs. 0.72, and specificity of 0.72 vs. 0.65, respectively. However, the multi-stage model only required SPECT MPI data for 52.7% of the patients across all folds. Conclusions. By using rule-based logic stemming from uncertainty quantification, the multi-stage model was able to reduce the need for additional SPECT MPI data acquisition without sacrificing performance.
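The rule-based, uncertainty-gated logic described above can be sketched as follows (the disagreement-based uncertainty proxy and the threshold value are illustrative assumptions, not the study's exact criterion):

```python
import statistics

def ensemble_predict(models, x):
    """Mean probability and member spread (a simple uncertainty proxy)."""
    probs = [m(x) for m in models]
    return statistics.mean(probs), statistics.pstdev(probs)

def multi_stage_predict(stage1, stage2, x_clinical, get_spect, threshold=0.15):
    """Use stage 1 alone when its members agree; otherwise acquire SPECT
    features and defer to stage 2. The threshold value is illustrative."""
    p, spread = ensemble_predict(stage1, x_clinical)
    if spread <= threshold:
        return p, False          # confident: no SPECT acquisition needed
    x_full = x_clinical + get_spect()    # acquire the extra modality
    p2, _ = ensemble_predict(stage2, x_full)
    return p2, True              # deferred to the SPECT-augmented model
```

The returned flag mirrors the study's key metric: the fraction of patients for whom SPECT MPI data actually had to be acquired.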
Updated: 2024-04-28 15:33:12
Categories: cs.LG,eess.SP,physics.med-ph
BUFF: Boosted Decision Tree based Ultra-Fast Flow matching
Tabular data stands out as one of the most frequently encountered types in high energy physics. Unlike commonly homogeneous data such as pixelated images, simulating high-dimensional tabular data and accurately capturing their correlations are often quite challenging, even with the most advanced architectures. Based on the findings that tree-based models surpass the performance of deep learning models for tasks specific to tabular data, we adopt the very recent generative modeling class named conditional flow matching and employ different techniques to integrate the usage of Gradient Boosted Trees. Performance is evaluated for various tasks at different analysis levels with several public datasets. We demonstrate that the training and inference time of most high-level simulation tasks can achieve speedup by orders of magnitude. The application can be extended to low-level feature simulation and conditioned generation with competitive performance.
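The core of conditional flow matching with tree regressors can be sketched by how the regression pairs are built (a minimal 1-D sketch under a linear interpolation path; BUFF's actual conditioning scheme and tree library are not reproduced here):

```python
import random

def cfm_training_pairs(x0_samples, x1_samples):
    """Build flow matching regression pairs: for a random time t, the input
    is (t, x_t) with x_t on the line between noise x0 and data x1, and the
    regression target is the constant velocity x1 - x0. A gradient boosted
    tree regressor can then be fit on these pairs (sketch only)."""
    pairs = []
    for x0, x1 in zip(x0_samples, x1_samples):
        t = random.random()
        x_t = (1.0 - t) * x0 + t * x1    # linear interpolation path
        target = x1 - x0                 # velocity field target
        pairs.append(((t, x_t), target))
    return pairs
```

Sampling then amounts to integrating the learned velocity field from noise at t = 0 to data at t = 1; the speedup reported above comes from replacing neural function evaluations with tree ensemble predictions.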
Updated: 2024-04-28 15:31:20
Categories: physics.ins-det,cs.LG,hep-ex,hep-ph,physics.data-an
L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi
The availability of text or topic classification datasets in the low-resource Marathi language is limited, typically consisting of fewer than 4 target labels, with some achieving nearly perfect accuracy. In this work, we introduce L3Cube-MahaNews, a Marathi text classification corpus that focuses on News headlines and articles. This corpus stands out as the largest supervised Marathi Corpus, containing over 1.05 lakh (105,000) records classified into a diverse range of 12 categories. To accommodate different document lengths, MahaNews comprises three supervised datasets specifically designed for short text, long documents, and medium paragraphs. The consistent labeling across these datasets facilitates document length-based analysis. We provide detailed data statistics and baseline results on these datasets using state-of-the-art pre-trained BERT models. We conduct a comparative analysis between monolingual and multilingual BERT models, including MahaBERT, IndicBERT, and MuRIL. The monolingual MahaBERT model outperforms all others on every dataset. These resources also serve as Marathi topic classification datasets or models and are publicly available at https://github.com/l3cube-pune/MarathiNLP .
Updated: 2024-04-28 15:20:45
Categories: cs.CL,cs.LG
Data Upcycling Knowledge Distillation for Image Super-Resolution
Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models. However, current KD methods for super-resolution (SR) networks overlook the nature of the SR task: the outputs of the teacher model are noisy approximations to the ground-truth distribution of high-quality images (GT), which obscures the teacher model's knowledge and limits the effect of KD. To utilize the teacher model beyond the GT upper-bound, we present Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher model's knowledge to the student model through upcycled in-domain data derived from the training data. Besides, we impose label-consistency regularization on KD for SR via paired invertible augmentations to improve the student model's performance and robustness. Comprehensive experiments demonstrate that the DUKD method significantly outperforms previous arts on several SR tasks.
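The label-consistency regularization via paired invertible augmentations can be sketched with a horizontal flip (a minimal sketch; DUKD's actual augmentations and loss weighting are not specified here):

```python
# Sketch: for an invertible augmentation that commutes with SR (e.g. a flip),
# the student's output on the augmented input should match the augmented
# teacher output. Images are modeled as 2-D lists of numbers for brevity.

def hflip(img):
    """Invertible augmentation: horizontal flip of each row."""
    return [row[::-1] for row in img]

def l1(a, b):
    """Elementwise L1 distance between two same-shaped grids."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def consistency_loss(student, teacher, img):
    """Label-consistency KD term: compare student(flip(x)) against
    flip(teacher(x)), exploiting the augmentation's invertibility."""
    return l1(student(hflip(img)), hflip(teacher(img)))
```

In practice this term is added to the usual distillation loss; a student that truly mimics the teacher drives it to zero.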
Updated: 2024-04-28 15:19:15
Categories: cs.CV,cs.AI
Contrastive Learning Method for Sequential Recommendation based on Multi-Intention Disentanglement
Sequential recommendation is one of the important branches of recommender systems, aiming to provide personalized future item recommendations by analyzing and predicting users' ordered historical interaction behaviors. However, along with the growth of the user volume and the increasingly rich behavioral information, how to understand and disentangle users' interactive multi-intentions effectively also poses challenges to behavior prediction and sequential recommendation. In light of these challenges, we propose a Contrastive Learning sequential recommendation method based on Multi-Intention Disentanglement (MIDCL). In our work, intentions are recognized as dynamic and diverse, and user behaviors are often driven by current multi-intentions, which means that the model needs to not only mine the most relevant implicit intention for each user, but also suppress the influence of irrelevant intentions. Therefore, we choose the Variational Auto-Encoder (VAE) to realize the disentanglement of users' multi-intentions, and propose two types of contrastive learning paradigms: one for finding the most relevant interactive intention for each user, and the other for maximizing the mutual information of positive sample pairs. Experimental results show that MIDCL not only has significant superiority over most existing baseline methods, but also brings a more interpretable case to research on intention-based prediction and recommendation.
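The second contrastive paradigm, maximizing the mutual information of positive sample pairs, is typically realized with an InfoNCE-style loss; a minimal sketch (the temperature and dot-product similarity are illustrative choices):

```python
import math

def infonce(anchor, positive, negatives, tau=0.5):
    """InfoNCE loss for one anchor: -log softmax score of the positive pair.
    Minimizing it maximizes a lower bound on the mutual information between
    the anchor and its positive embedding."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    logits = [dot(anchor, positive) / tau]
    logits += [dot(anchor, neg) / tau for neg in negatives]
    m = max(logits)                                   # stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

Here the anchor and positive would be two views of the same user sequence (e.g., embeddings under different augmentations), with other users in the batch serving as negatives.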
Updated: 2024-04-28 15:13:36
Categories: cs.IR,cs.AI,cs.HC
S$^2$Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification
Land cover analysis using hyperspectral images (HSI) remains an open problem due to their low spatial resolution and complex spectral information. Recent studies are primarily dedicated to designing Transformer-based architectures for spatial-spectral long-range dependencies modeling, which is computationally expensive with quadratic complexity. Selective structured state space model (Mamba), which is efficient for modeling long-range dependencies with linear complexity, has recently shown promising progress. However, its potential in hyperspectral image processing that requires handling numerous spectral bands has not yet been explored. In this paper, we innovatively propose S$^2$Mamba, a spatial-spectral state space model for hyperspectral image classification, to excavate spatial-spectral contextual features, resulting in more efficient and accurate land cover analysis. In S$^2$Mamba, two selective structured state space models through different dimensions are designed for feature extraction, one for spatial, and the other for spectral, along with a spatial-spectral mixture gate for optimal fusion. More specifically, S$^2$Mamba first captures spatial contextual relations by interacting each pixel with its adjacent through a Patch Cross Scanning module and then explores semantic information from continuous spectral bands through a Bi-directional Spectral Scanning module. Considering the distinct expertise of the two attributes in homogenous and complicated texture scenes, we realize the Spatial-spectral Mixture Gate by a group of learnable matrices, allowing for the adaptive incorporation of representations learned across different dimensions. Extensive experiments conducted on HSI classification benchmarks demonstrate the superiority and prospect of S$^2$Mamba. The code will be available at: https://github.com/PURE-melo/S2Mamba.
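The spatial-spectral mixture gate can be sketched as a learnable gate blending the two branch outputs (a scalar-gate simplification for illustration; the paper uses a group of learnable matrices, effectively a per-dimension gate):

```python
import math

def mixture_gate(f_spatial, f_spectral, w, b=0.0):
    """Blend spatial and spectral branch features with a learned sigmoid
    gate. `w` and `b` stand in for the learnable parameters; a gate near 1
    favors the spatial branch, near 0 the spectral branch."""
    score = sum(wi * (s + c) for wi, s, c in zip(w, f_spatial, f_spectral)) + b
    g = 1.0 / (1.0 + math.exp(-score))        # sigmoid gate in (0, 1)
    return [g * s + (1.0 - g) * c for s, c in zip(f_spatial, f_spectral)]
```

This is the adaptive-fusion idea: in homogeneous regions the gate can lean on one branch, and in complicated texture regions on the other, with `w` and `b` learned end to end.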
Updated: 2024-04-28 15:12:56
Categories: cs.CV,cs.AI
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to images based on textual instructions without requiring user-provided input masks remains a challenge. We address this by leveraging the insight that removing objects (Inpaint) is significantly simpler than its inverse process of adding them (Paint), attributed to the utilization of segmentation mask datasets alongside inpainting models that inpaint within these masks. Capitalizing on this realization, by implementing an automated and extensive pipeline, we curate a filtered large-scale image dataset containing pairs of images and their corresponding object-removed versions. Using these pairs, we train a diffusion model to inverse the inpainting process, effectively adding objects into images. Unlike other editing datasets, ours features natural target images instead of synthetic ones; moreover, it maintains consistency between source and target by construction. Additionally, we utilize a large Vision-Language Model to provide detailed descriptions of the removed objects and a Large Language Model to convert these descriptions into diverse, natural-language instructions. We show that the trained model surpasses existing ones both qualitatively and quantitatively, and release the large-scale dataset alongside the trained models for the community.
Updated: 2024-04-28 15:07:53
Categories: cs.CV,cs.AI
A survey of dynamic graph neural networks
Graph neural networks (GNNs) have emerged as a powerful tool for effectively mining and learning from graph-structured data, with applications spanning numerous domains. However, most research focuses on static graphs, neglecting the dynamic nature of real-world networks where topologies and attributes evolve over time. By integrating sequence modeling modules into traditional GNN architectures, dynamic GNNs aim to bridge this gap, capturing the inherent temporal dependencies of dynamic graphs for a more authentic depiction of complex networks. This paper provides a comprehensive review of the fundamental concepts, key techniques, and state-of-the-art dynamic GNN models. We present the mainstream dynamic GNN models in detail and categorize models based on how temporal information is incorporated. We also discuss large-scale dynamic GNNs and pre-training techniques. Although dynamic GNNs have shown superior performance, challenges remain in scalability, handling heterogeneous information, and lack of diverse graph datasets. The paper also discusses possible future directions, such as adaptive and memory-enhanced models, inductive learning, and theoretical analysis.
Updated: 2024-04-28 15:07:48
Categories: cs.LG
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs
Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and evaluation purposes. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks, or on the relational side, graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks. From a delivery standpoint, we operationalize the above four dimensions (4D) of exploration within a unified, scalable open-source toolbox called 4DBInfer. We conclude by presenting evaluations using 4DBInfer, the results of which highlight the importance of considering each such dimension in the design of RDB predictive models, as well as the limitations of more naive approaches such as simply joining adjacent tables. Our source code is released at https://github.com/awslabs/multi-table-benchmark .
Updated: 2024-04-28 15:04:54
Categories: cs.LG,cs.DB
Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications
In recent times, Large Language Models have exhibited tremendous capabilities, especially in the areas of mathematics, code generation and general-purpose reasoning. However, for specialized domains, especially applications that require parsing and analyzing large chunks of numeric or tabular data, even state-of-the-art (SOTA) models struggle. In this paper, we introduce a new approach to solving domain-specific tabular data analysis tasks by presenting a unique RAG workflow that mitigates the scalability issues of existing tabular LLM solutions. Specifically, we present the Tabular Embedding Model (TEM), a novel approach to fine-tune embedding models for tabular Retrieval-Augmented Generation (RAG) applications. Embedding models form a crucial component in the RAG workflow, and even current SOTA embedding models struggle, as they are predominantly trained on textual datasets and thus underperform in scenarios involving complex tabular data. The evaluation results showcase that our approach not only outperforms current SOTA embedding models in this domain but also does so with a notably smaller and more efficient model structure.
Updated: 2024-04-28 14:58:55
Categories: cs.AI,cs.CL,cs.IR
DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when they share characteristics with the environments they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the implicit regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which have more control over the data generation mechanism. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.
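Value-loss-prioritised level sampling, the adaptive strategy whose implicit regularisation the paper analyses, can be sketched as follows (weighting levels in proportion to their value loss is an illustrative choice; rank-based prioritisation is also common in this line of work):

```python
import random

def sample_level(levels, value_losses, temperature=1.0):
    """Sample a training level with probability proportional to its
    (temperature-scaled) value-prediction error, so the agent replays
    levels where its value estimates are worst."""
    weights = [max(l, 0.0) ** (1.0 / temperature) for l in value_losses]
    total = sum(weights)
    if total == 0.0:
        return random.choice(levels)      # fall back to uniform sampling
    r = random.random() * total
    acc = 0.0
    for level, w in zip(levels, weights):
        acc += w
        if r <= acc:
            return level
    return levels[-1]
```

In a full curriculum loop, `value_losses` would be refreshed from the critic after each rollout; DRED additionally constrains where new levels come from via a generative model over level parameters.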
Updated: 2024-04-28 14:56:45
Categories: cs.LG,cs.AI
LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM
Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs through text supervision. To achieve this, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the loss of perception in the 3D domain, structural features are extracted as well. These quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA that enhances model understanding and assessment accuracy. We hope our contributions can inspire subsequent investigations into the fusion of LMMs with PCQA, fostering advancements in 3D visual quality analysis and beyond.
Updated: 2024-04-28 14:47:09
Categories: cs.CV,cs.AI
Learnable Linguistic Watermarks for Tracing Model Extraction Attacks on Large Language Models
In the rapidly evolving domain of artificial intelligence, safeguarding the intellectual property of Large Language Models (LLMs) is increasingly crucial. Current watermarking techniques against model extraction attacks, which rely on signal insertion in model logits or post-processing of generated text, remain largely heuristic. We propose a novel method for embedding learnable linguistic watermarks in LLMs, aimed at tracing and preventing model extraction attacks. Our approach subtly modifies the LLM's output distribution by introducing controlled noise into token frequency distributions, embedding a statistically identifiable, controllable watermark. We leverage statistical hypothesis testing and information theory, particularly the Kullback-Leibler Divergence, to differentiate between original and modified distributions effectively. Our watermarking method strikes a careful balance between robustness and output quality, maintaining low false positive/negative rates and preserving the LLM's original performance.
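The detection side built on Kullback-Leibler divergence can be sketched as follows (the fixed threshold is an illustrative stand-in for a proper hypothesis test calibrated to a target false-positive rate):

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two token frequency distributions over the
    same vocabulary; eps guards against zero frequencies."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def watermark_detected(observed, unwatermarked, threshold=0.01):
    """Flag a suspect model when its token distribution diverges from the
    unwatermarked baseline by more than the threshold (illustrative)."""
    return kl_divergence(observed, unwatermarked) > threshold
```

An extracted model trained on watermarked outputs inherits the shifted token frequencies, which is what makes this divergence statistically identifiable.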
Updated: 2024-04-28 14:45:53
Categories: cs.CR,cs.AI,cs.CL
Simple Policy Optimization
The PPO (Proximal Policy Optimization) algorithm has demonstrated excellent performance in many fields, and it is considered a simplified version of the TRPO (Trust Region Policy Optimization) algorithm. However, the ratio clipping operation in PPO may not always effectively enforce the trust region constraints; this can be a potential factor affecting the stability of the algorithm. In this paper, we propose the Simple Policy Optimization (SPO) algorithm, which introduces a novel clipping method based on the KL divergence between the old and current policies. Extensive experimental results in Atari 2600 environments indicate that, compared to the mainstream variants of PPO, SPO achieves better sample efficiency, extremely low KL divergence, and higher policy entropy, and is robust to increases in network depth or complexity. More importantly, SPO maintains the simplicity of an unconstrained first-order algorithm. Our code is available at https://github.com/MyRepositories-hub/Simple-Policy-Optimization.
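The contrast between PPO's ratio clipping and a KL-based alternative can be sketched per sample (the KL-gated form below is illustrative only; SPO's exact clipping method may differ, and the real objectives are averaged over a batch):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample:
    min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)

def kl_gated_objective(ratio, advantage, kl, kl_max=0.02):
    """Illustrative KL-gated surrogate: suppress the update once the
    per-sample KL estimate exceeds a budget, rather than clipping the
    probability ratio itself (not SPO's exact formulation)."""
    return ratio * advantage if kl <= kl_max else 0.0
```

The point of the contrast: ratio clipping bounds each coordinate of the update but does not directly bound the policies' KL divergence, which is the quantity a trust region is defined over.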
Updated: 2024-04-28 14:45:49
Categories: cs.LG
WorldGPT: Empowering LLM as Multimodal World Model
World models are progressively being employed across diverse fields, extending from basic environment simulation to complex scenario construction. However, existing models are mainly trained on domain-specific states and actions, and confined to single-modality state representations. In this paper, we introduce WorldGPT, a generalist world model built upon Multimodal Large Language Model (MLLM). WorldGPT acquires an understanding of world dynamics through analyzing millions of videos across various domains. To further enhance WorldGPT's capability in specialized scenarios and long-term tasks, we have integrated it with a novel cognitive architecture that combines memory offloading, knowledge retrieval, and context reflection. As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios. Conducting evaluations on WorldNet directly demonstrates WorldGPT's capability to accurately model state transition patterns, affirming its effectiveness in understanding and predicting the dynamics of complex scenarios. We further explore WorldGPT's emerging potential in serving as a world simulator, helping multimodal agents generalize to unfamiliar domains through efficiently synthesising multimodal instruction instances which are proved to be as reliable as authentic data for fine-tuning purposes. The project is available on \url{https://github.com/DCDmllm/WorldGPT}.
Updated: 2024-04-28 14:42:02
标题: WorldGPT:将LLM作为多模态世界模型进行赋能
摘要: 世界模型正在逐渐被应用于各个领域,从基本环境模拟到复杂场景构建。然而,现有模型主要是在特定领域的状态和动作上进行训练,并且局限于单模态状态表示。在本文中,我们介绍了WorldGPT,这是一个基于多模态大型语言模型(MLLM)构建的通用世界模型。WorldGPT通过分析跨多个领域的数百万个视频来获得对世界动态的理解。为了进一步增强WorldGPT在专门场景和长期任务中的能力,我们将其与一种结合了记忆卸载、知识检索和上下文反思的新型认知架构进行了整合。至于评估,我们构建了WorldNet,一个涵盖各种实际场景的多模态状态转移预测基准。在WorldNet上进行评估直接展示了WorldGPT准确建模状态转移模式的能力,确认了其在理解和预测复杂场景动态方面的有效性。我们进一步探讨了WorldGPT在作为世界模拟器方面的新兴潜力,通过高效合成多模态指令实例来帮助多模态代理在陌生领域中泛化,这些实例被证明与真实数据一样可靠,可用于微调目的。该项目可在\url{https://github.com/DCDmllm/WorldGPT}上找到。
更新时间: 2024-04-28 14:42:02
领域: cs.AI,cs.MM
Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting
Spatiotemporal forecasting techniques are significant for various domains such as transportation, energy, and weather. Accurate prediction of spatiotemporal series remains challenging due to complex spatiotemporal heterogeneity. In particular, current end-to-end models are limited by input length and thus often fall prey to the spatiotemporal mirage, i.e., similar input time series followed by dissimilar future values, and vice versa. To address these problems, we propose a novel self-supervised pre-training framework, Spatial-Temporal-Decoupled Masked Pre-training (STD-MAE), that employs two decoupled masked autoencoders to reconstruct spatiotemporal series along the spatial and temporal dimensions. Rich-context representations learned through such reconstruction can be seamlessly integrated by downstream predictors with arbitrary architectures to augment their performance. A series of quantitative and qualitative evaluations on six widely used benchmarks (PEMS03, PEMS04, PEMS07, PEMS08, METR-LA, and PEMS-BAY) validates the state-of-the-art performance of STD-MAE. Codes are available at https://github.com/Jimmy-7664/STD-MAE.
Updated: 2024-04-28 14:40:48
标题: 时空解耦蒙版预训练用于时空预测
摘要: 时空预测技术在交通、能源和天气等各个领域都具有重要意义。由于复杂的时空异质性,对时空序列的准确预测仍然具有挑战性。特别是,当前的端到端模型受输入长度限制,因此往往会陷入时空幻觉,即类似的输入时间序列后跟不同的未来值,反之亦然。为了解决这些问题,我们提出了一种新颖的自监督预训练框架Spatial-Temporal-Decoupled Masked Pre-training (STD-MAE),该框架采用两个解耦的掩码自动编码器来重建沿空间和时间维度的时空序列。通过这种重建学习到的丰富上下文表示可以无缝集成到下游预测器中,以增强它们的性能。我们对六个广泛使用的基准数据集(PEMS03、PEMS04、PEMS07、PEMS08、METR-LA和PEMS-BAY)进行了一系列定量和定性评估,以验证STD-MAE的最先进性能。代码可在https://github.com/Jimmy-7664/STD-MAE获取。
更新时间: 2024-04-28 14:40:48
领域: cs.LG
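The "decoupled" masking along spatial and temporal dimensions described above can be pictured on a small (sensors × time steps) array. This is an illustrative sketch of the masking idea only, not STD-MAE's actual code; the masking ratio and zero-fill are assumptions.

```python
import numpy as np

def decoupled_masks(n_nodes, n_steps, ratio=0.25, seed=0):
    """Draw one mask over whole sensors (spatial) and one over whole time
    steps (temporal); each masked autoencoder reconstructs the series
    under its own mask."""
    rng = np.random.default_rng(seed)
    spatial = rng.random(n_nodes) < ratio    # True = sensor hidden everywhere
    temporal = rng.random(n_steps) < ratio   # True = time step hidden everywhere
    return spatial, temporal

x = np.ones((4, 8))                              # 4 sensors, 8 time steps
s_mask, t_mask = decoupled_masks(4, 8)
x_spatial = np.where(s_mask[:, None], 0.0, x)    # input to the spatial MAE
x_temporal = np.where(t_mask[None, :], 0.0, x)   # input to the temporal MAE
```

Keeping the two masks independent is what lets each autoencoder specialize in one axis of the heterogeneity.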
Permutation-equivariant quantum convolutional neural networks
The symmetric group $S_{n}$ manifests itself in large classes of quantum systems as the invariance of certain characteristics of a quantum state with respect to permuting the qubits. The subgroups of $S_{n}$ arise, among many other contexts, in describing the label symmetry of classical images with respect to spatial transformations, e.g. reflection or rotation. Equipped with the formalism of geometric quantum machine learning, in this work we propose architectures of equivariant quantum convolutional neural networks (EQCNNs) adherent to $S_{n}$ and its subgroups. We demonstrate that a careful choice of pixel-to-qubit embedding order can facilitate easy construction of EQCNNs for small subgroups of $S_{n}$. Our novel EQCNN architecture corresponding to the full permutation group $S_{n}$ is built by applying all possible QCNNs with equal probability, which can also be conceptualized as a dropout strategy in quantum neural networks. For subgroups of $S_{n}$, our numerical results on the MNIST dataset show better classification accuracy than non-equivariant QCNNs. The $S_{n}$-equivariant QCNN architecture shows significantly improved training and test performance over the non-equivariant QCNN for classification of connected and non-connected graphs. When trained with a sufficiently large amount of data, the $S_{n}$-equivariant QCNN shows better average performance than the $S_{n}$-equivariant QNN. These results contribute towards building powerful quantum machine learning architectures in permutation-symmetric systems.
Updated: 2024-04-28 14:34:28
标题: 置换等变量子卷积神经网络
摘要: 对称群$S_{n}$在大类量子系统中表现出来,即量子状态的某些特征对于对量子比特进行置换具有不变性。$S_{n}$的子群在许多其他背景下出现,用来描述经空间转换(如反射或旋转)的经典图像的标签对称性。在几何量子机器学习的形式主义下,本文提出了与$S_{n}$及其子群一致的等变量子卷积神经网络(EQCNNs)的架构。我们证明了对于$S_{n}$的小子群,精心选择的像素到量子比特嵌入顺序可以便于构建EQCNNs。我们提出的新颖的对应于全排列群$S_{n}$的EQCNN架构是通过应用所有可能的QCNNs并且具有相等概率构建的,这也可以被概念化为量子神经网络中的一种丢弃策略。对于$S_{n}$的子群,我们的数值结果使用MNIST数据集显示出比非等变QCNNs更好的分类准确性。$S_{n}$-等变QCNN架构表现出比非等变QCNN更好的训练和测试性能,用于连接和非连接图的分类。当用足够多的数据进行训练时,$S_{n}$-等变QCNN表现出比$S_{n}$-等变QNN更好的平均性能。这些结果有助于在置换对称系统中建立强大的量子机器学习架构。
更新时间: 2024-04-28 14:34:28
领域: quant-ph,cs.AI,cs.CV,cs.LG
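The full-$S_n$ construction described above, applying all possible QCNNs with equal probability, has a simple classical analogue: averaging a function over every permutation of its input yields an $S_n$-invariant output. A toy sketch of that symmetrization (illustrative only; no quantum circuits involved):

```python
import itertools
import numpy as np

def symmetrize(f, x):
    """Average f over all permutations of x, making the result invariant
    to reordering x's entries -- the classical analogue of weighting every
    qubit ordering equally."""
    x = np.asarray(x)
    perms = itertools.permutations(range(len(x)))
    vals = [f(x[list(p)]) for p in perms]
    return float(np.mean(vals))

first = lambda v: v[0]                 # not permutation-invariant on its own
print(symmetrize(first, [1, 2, 3]))    # → 2.0
print(symmetrize(first, [3, 1, 2]))    # → 2.0 (same: order no longer matters)
```

Averaging over all $n!$ group elements is exact but exponential in cost, which is why equivariant architectures that build the symmetry in structurally are attractive.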
Multimodality Invariant Learning for Multimedia-Based New Item Recommendation
Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and apps, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at inference time is challenging. Worse still, real-world items exhibit varying degrees of modality missing (e.g., many short videos are uploaded without text descriptions). Though many efforts have been devoted to multimedia-based recommendations, they either could not deal with new multimedia items or assumed modality completeness in the modeling process. In this paper, we highlight the necessity of tackling the modality missing issue for new item recommendation. We argue that users' inherent content preference is stable and better kept invariant to arbitrary modality missing environments. Therefore, we approach this problem from a novel perspective of invariant learning. However, how to construct environments from finite user behavior training data to generalize to any modality missing is challenging. To tackle this issue, we propose a novel Multimodality Invariant Learning reCommendation (a.k.a. MILK) framework. Specifically, MILK first designs a cross-modality alignment module to keep semantic consistency from pretrained multimedia item features. After that, MILK designs multi-modal heterogeneous environments with cyclic mixup to augment training data, in order to mimic any modality missing for invariant user preference learning. Extensive experiments on three real datasets verify the superiority of our proposed framework. The code is available at https://github.com/HaoyueBai98/MILK.
Updated: 2024-04-28 14:29:09
标题: 多模态不变学习用于基于多媒体的新项目推荐
摘要: 多媒体推荐通过学习用户的内容偏好提供个性化的物品建议。随着数字设备和应用程序的不断增加,随着时间的推移,大量新项目迅速创建。如何在推理时间快速提供新项目的建议是具有挑战性的。更糟糕的是,真实世界的项目展示不同程度的模态缺失(例如,许多短视频上传时没有文本描述)。尽管许多努力已经投入到基于多媒体的推荐中,但它们要么无法处理新的多媒体项目,要么在建模过程中假定模态完整性。 在本文中,我们强调了解决新项目推荐中模态缺失问题的必要性。我们认为用户固有的内容偏好是稳定的,并且最好保持不变以适应任意模态缺失环境。因此,我们从不变学习的新颖角度解决了这个问题。然而,如何从有限的用户行为训练数据中构建环境以概括任何模态缺失是具有挑战性的。为了解决这个问题,我们提出了一个新颖的多模态不变学习推荐框架(即MILK)。具体而言,MILK首先设计了一个跨模态对齐模块,以保持来自预训练多媒体项目特征的语义一致性。之后,MILK设计了多模态异构环境,并通过循环混合增强训练数据,以模仿任何模态缺失以进行不变用户偏好学习。在三个真实数据集上进行的大量实验证实了我们提出的框架的优越性。代码可在https://github.com/HaoyueBai98/MILK上找到。
更新时间: 2024-04-28 14:29:09
领域: cs.IR,cs.AI
A General Causal Inference Framework for Cross-Sectional Observational Data
Causal inference methods for observational data are highly regarded due to their wide applicability. While numerous methods are already available for de-confounding bias, they generally assume that covariates consist solely of confounders or make naive assumptions about the covariates. Such assumptions face challenges in both theory and practice, particularly when dealing with high-dimensional covariates. Relaxing these naive assumptions and identifying the confounding covariates that truly require correction can effectively enhance the practical significance of these methods. Therefore, this paper proposes a General Causal Inference (GCI) framework specifically designed for cross-sectional observational data, which precisely identifies the key confounding covariates and provides a corresponding identification algorithm. Specifically, based on progressive derivations of the Markov property on Directed Acyclic Graphs, we conclude that the key confounding covariates are equivalent to the common root ancestors of the treatment and the outcome variable. Building upon this conclusion, the GCI framework is composed of a novel Ancestor Set Identification (ASI) algorithm and de-confounding inference methods. Firstly, the ASI algorithm is theoretically supported by the conditional independence properties and causal asymmetry between variables, enabling the identification of key confounding covariates. Subsequently, the identified confounding covariates are used in the de-confounding inference methods to obtain unbiased causal effect estimation, which can support informed decision-making. Extensive experiments on synthetic datasets demonstrate that the GCI framework can effectively identify the critical confounding covariates and significantly improve the precision, stability, and interpretability of causal inference in observational studies.
Updated: 2024-04-28 14:26:27
标题: 一种适用于横断面观测数据的普遍因果推断框架
摘要: 对于观测数据的因果推断方法备受推崇,因为它们具有广泛的适用性。尽管已经有许多可用于去混杂偏差的方法,但这些方法通常假定协变量仅包含混杂因素,或者对协变量做出天真的假设。这些假设在理论和实践中都面临挑战,特别是在处理高维协变量时。放宽这些天真的假设,确定真正需要修正的混杂协变量,可以有效地提升这些方法的实际意义。因此,本文提出了一种专门针对横断面观测数据设计的通用因果推断(GCI)框架,该框架能够精确识别关键的混杂协变量,并提供相应的识别算法。具体而言,基于对有向无环图上马尔可夫性质的逐步推导,我们得出结论,关键的混杂协变量等同于治疗和结果变量的公共根祖先。基于这一结论,GCI框架由一种新颖的祖先集识别(ASI)算法和去混杂推断方法组成。首先,ASI算法在理论上受到条件独立性属性和变量之间因果不对称性的支持,使其能够识别关键的混杂协变量。随后,识别出的混杂协变量被用于去混杂推断方法中,以获得无偏的因果效应估计,从而支持知情决策。对合成数据集进行的广泛实验表明,GCI框架能够有效地识别关键的混杂协变量,并显著提高观测研究中因果推断的精确性、稳定性和可解释性。
更新时间: 2024-04-28 14:26:27
领域: stat.ME,cs.AI,cs.LG
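The GCI abstract characterizes the key confounders as the common root ancestors of treatment and outcome in a DAG. The sketch below shows that graph operation on a toy hand-built DAG; the variable names and the `{child: [parents]}` encoding are made up for illustration, and the paper's ASI algorithm identifies these ancestors from data, not from a known graph.

```python
def ancestors(parents, node):
    """All ancestors of `node` in a DAG encoded as {child: [parents]}."""
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen

def common_root_ancestors(parents, treatment, outcome):
    """Common ancestors of both variables, filtered to roots (no parents),
    following the abstract's characterization of key confounders."""
    common = ancestors(parents, treatment) & ancestors(parents, outcome)
    return {n for n in common if not parents.get(n)}

# Toy graph: U -> Z, Z -> T, W -> T, T -> Y, Z -> Y
g = {"T": ["Z", "W"], "Y": ["T", "Z"], "Z": ["U"]}
print(sorted(common_root_ancestors(g, "T", "Y")))  # → ['U', 'W']
```

Here `Z` is a common ancestor but not a root (it has parent `U`), so only the roots `U` and `W` are returned.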
Exploring the Robustness of In-Context Learning with Noisy Labels
Recently, the mysterious In-Context Learning (ICL) ability exhibited by Transformer architectures, especially in large language models (LLMs), has sparked significant research interest. However, the resilience of Transformers' in-context learning capabilities in the presence of noisy samples, prevalent in both training corpora and prompt demonstrations, remains underexplored. In this paper, inspired by prior research that studies ICL ability using simple function classes, we take a closer look at this problem by investigating the robustness of Transformers against noisy labels. Specifically, we first conduct a thorough evaluation and analysis of the robustness of Transformers against noisy labels during in-context learning and show that they exhibit notable resilience against diverse types of noise in demonstration labels. Furthermore, we delve deeper into this problem by exploring whether introducing noise into the training set, akin to a form of data augmentation, enhances such robustness during inference, and find that such noise can indeed improve the robustness of ICL. Overall, our fruitful analysis and findings provide a comprehensive understanding of the resilience of Transformer models against label noises during ICL and provide valuable insights into the research on Transformers in natural language processing. Our code is available at https://github.com/InezYu0928/in-context-learning.
Updated: 2024-04-28 14:05:23
标题: 探索具有噪声标签的上下文学习的鲁棒性
摘要: 最近,Transformer架构展示的神秘的上下文学习(ICL)能力,特别是在大型语言模型(LLMs)中,引起了重要的研究兴趣。然而,在训练语料库和提示演示中普遍存在的嘈杂样本的情况下,Transformers的上下文学习能力的韧性仍未得到充分探讨。在本文中,受先前研究使用简单函数类研究ICL能力的启发,我们通过调查Transformer对嘈杂标签的韧性来更深入地研究这个问题。具体来说,我们首先对Transformer在上下文学习过程中对嘈杂标签的韧性进行了彻底评估和分析,并展示它们对演示标签中不同类型的噪声表现出显著的韧性。此外,我们深入探讨了这个问题,探讨是否将噪声引入训练集,类似于一种数据增强的形式,能够增强在推理过程中的这种韧性,并发现这种噪声确实可以提高ICL的韧性。总的来说,我们丰富的分析和发现为理解Transformer模型在ICL过程中抵抗标签噪声的韧性提供了全面的理解,并为自然语言处理中的Transformer研究提供了有价值的见解。我们的代码可以在https://github.com/InezYu0928/in-context-learning找到。
更新时间: 2024-04-28 14:05:23
领域: cs.CL,cs.AI,cs.CR,cs.LG,math.OC
Naive Bayes Classifiers and One-hot Encoding of Categorical Variables
This paper investigates the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding, when using a Na\"{\i}ve Bayes classifier. This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Na\"{\i}ve Bayes classifier. The differences between the two classifiers are analysed mathematically and experimentally. In our experiments using probability vectors drawn from a Dirichlet distribution, the two classifiers are found to agree on the maximum a posteriori class label for most cases, although the posterior probabilities are usually greater for the PoB case.
Updated: 2024-04-28 14:04:58
标题: 朴素贝叶斯分类器和分类变量的独热编码
摘要: 本文研究了在使用朴素贝叶斯分类器时,将$K$值分类变量错误地编码为$K$位的独热编码的后果。这导致了一个伯努利乘积(PoB)假设,而不是正确的分类朴素贝叶斯分类器。对这两种分类器之间的差异进行了数学和实验分析。在我们使用从狄利克雷分布中抽取的概率向量进行实验时,大多数情况下发现这两种分类器在最大后验类别标签上达成一致,尽管后验概率通常对于PoB情况更大。
更新时间: 2024-04-28 14:04:58
领域: cs.LG,stat.ML
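The Bernoulli-versus-categorical contrast in the abstract above is easy to reproduce numerically. The sketch below compares the two posteriors for a single $K$-valued feature; the particular `theta` and `priors` values are an illustrative construction, not the paper's Dirichlet-based experimental setup.

```python
import numpy as np

def categorical_posterior(theta, priors, x):
    """Correct categorical NB: P(c | x) ∝ theta[c, x] * priors[c],
    where theta[c, k] = P(feature = k | class c)."""
    post = theta[:, x] * priors
    return post / post.sum()

def pob_posterior(theta, priors, x):
    """Product-of-Bernoullis NB on the one-hot encoding: the hot bit
    contributes theta[c, x], every cold bit contributes (1 - theta[c, k])."""
    cold = np.prod(np.delete(1.0 - theta, x, axis=1), axis=1)
    post = theta[:, x] * cold * priors
    return post / post.sum()

theta = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])       # K = 3 categories, 2 classes
priors = np.array([0.5, 0.5])
print(categorical_posterior(theta, priors, 0))  # MAP class 0, posterior 0.875
print(pob_posterior(theta, priors, 0))          # same MAP class, posterior ≈ 0.947
```

As the abstract reports, both classifiers pick the same MAP label here, but the PoB posterior for that label is larger because the cold bits add extra (spurious) evidence.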
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of the differences between architectures and that pretraining with standard denoising objectives, using $\textit{only the downstream task data}$, leads to dramatic gains across multiple architectures and to very small gaps between Transformers and state space models (SSMs). In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained, and we improve the best reported results of SSMs on the PathX-256 task by 20 absolute points. Subsequently, we analyze the utility of previously-proposed structured parameterizations for SSMs and show they become mostly redundant in the presence of data-driven initialization obtained through pretraining. Our work shows that, when evaluating different architectures on supervised tasks, incorporation of data-driven priors via pretraining is essential for reliable performance estimation, and can be done efficiently.
Updated: 2024-04-28 13:52:55
标题: 永远不要从头开始训练:长序列模型的公平比较需要数据驱动的先验知识
摘要: 建模跨序列的长程依赖关系是机器学习中长期以来的目标,已经导致了一些架构,如状态空间模型,这些架构在长序列上明显优于Transformers。然而,这些令人印象深刻的实证收益大多在基准测试中(例如Long Range Arena)得到了证明,其中模型是随机初始化并训练以从输入序列中预测目标标签。在这项工作中,我们展示了随机初始化导致对架构之间差异的严重高估,并且使用仅下游任务数据的标准去噪目标进行预训练,导致多个架构之间的显著收益,并且在Transformers和状态空间模型(SSMs)之间的差距非常小。与以往的研究形成鲜明对比的是,我们发现当适当进行预训练时,普通的Transformers可以与Long Range Arena上的S4的表现匹敌,并且我们将PathX-256任务的SSM的最佳报告结果提高了20个绝对分数。随后,我们分析了先前提出的用于SSM的结构化参数化的实用性,并展示了在通过预训练获得的数据驱动初始化的情况下,它们在很大程度上变得多余。我们的工作表明,在评估不同架构在监督任务上的表现时,通过预训练加入数据驱动的先验是可靠性能估计的重要因素,而且可以高效地完成。
更新时间: 2024-04-28 13:52:55
领域: cs.LG,cs.CL
Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery
Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach generally involves clustering across all data and learning conceptions by prototypical contrastive learning. However, existing methods largely hinge on the performance of clustering algorithms and are thus subject to their inherent limitations. Firstly, the estimated cluster number is often smaller than the ground truth, making the existing methods suffer from a lack of prototypes for comprehensive conception learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand the cluster prototypes (centers). As there is no ground truth for the potential prototypes, we develop a self-supervised prototype learning framework to optimize the potential prototypes in an end-to-end fashion. Secondly, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this issue. To counteract this inefficiency, we opt to cluster only the unlabelled instances and subsequently expand the cluster prototypes with our introduced potential prototypes to quickly explore novel classes. Despite the simplicity of our proposed method, extensive empirical analysis on a wide range of datasets confirms that our method consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by a significant margin of \textbf{9.7}$\%$ on the Stanford Cars dataset and achieves \textbf{12$\times$} higher clustering efficiency on the Herbarium 19 dataset. We will make the code and checkpoints publicly available at \url{https://github.com/xjtuYW/PNP.git}.
Updated: 2024-04-28 13:49:54
标题: 超越已知聚类:探索高效泛化类发现的新原型
摘要: 广义类别发现(GCD)旨在动态地为无标签数据分配标签,部分基于从已标记数据中学到的知识,其中无标签数据可能来自已知或新类别。目前的方法通常涉及跨所有数据的聚类,并通过原型对比学习来学习概念。然而,现有方法很大程度上依赖于聚类算法的性能,因此受到其固有限制。首先,估计的聚类数量通常小于基本事实,使现有方法缺乏全面概念学习的原型。为解决这个问题,我们提出了一种自适应探测机制,引入可学习的潜在原型来扩展聚类原型(中心)。由于潜在原型没有基本事实,我们开发了一个自监督原型学习框架,以端到端方式优化潜在原型。其次,聚类计算密集,传统的对标记和无标签实例进行聚类的策略加剧了这个问题。为了对抗这种低效率,我们选择仅对无标签实例进行聚类,然后通过引入的潜在原型扩展聚类原型,快速探索新类别。尽管我们提出的方法简单,但对各种数据集进行了广泛的实证分析,证实我们的方法始终提供最先进的结果。具体而言,在斯坦福汽车数据集中,我们的方法超过最近的竞争对手\textbf{9.7}$\%$,在植物标本馆19数据集中,聚类效率提高了\textbf{12$\times$}。我们将在\url{https://github.com/xjtuYW/PNP.git}公开发布代码和检查点。
更新时间: 2024-04-28 13:49:54
领域: cs.LG,cs.AI,cs.CV
Ranked List Truncation for Large Language Model-based Re-Ranking
We study ranked list truncation (RLT) from a novel "retrieve-then-re-rank" perspective, where we optimize re-ranking by truncating the retrieved list (i.e., trim re-ranking candidates). RLT is crucial for re-ranking as it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis. It also has the potential to improve re-ranking effectiveness. Despite its importance, there is limited research into applying RLT methods to this new perspective. To address this research gap, we reproduce existing RLT methods in the context of re-ranking, especially newly emerged large language model (LLM)-based re-ranking. In particular, we examine to what extent established findings on RLT for retrieval are generalizable to the "retrieve-then-re-rank" setup from three perspectives: (i) assessing RLT methods in the context of LLM-based re-ranking with lexical first-stage retrieval, (ii) investigating the impact of different types of first-stage retrievers on RLT methods, and (iii) investigating the impact of different types of re-rankers on RLT methods. We perform experiments on the TREC 2019 and 2020 deep learning tracks, investigating 8 RLT methods for pipelines involving 3 retrievers and 2 re-rankers. We reach new insights into RLT methods in the context of re-ranking.
Updated: 2024-04-28 13:39:33
标题: 基于大型语言模型的重新排序的排名列表截断
摘要: 我们从一个新颖的“检索-然后重新排名”的角度研究了排名列表截断(RLT),在这种角度下,我们通过截断检索列表(即修剪重新排名候选人)来优化重新排名。RLT对重新排名至关重要,因为它可以通过在每个查询基础上向重新排名器发送可变长度的候选人列表来提高重新排名效率。它还有可能提高重新排名效果。尽管其重要性,但将RLT方法应用于这种新视角的研究有限。为了填补这一研究空白,我们在重新排名的背景下重现现有的RLT方法,特别是新兴的基于大语言模型(LLM)的重新排名。具体来说,我们研究了关于检索的RLT建议在哪些程度上可以推广到“检索-然后重新排名”设置的三个角度:(i)评估在具有词汇第一阶段检索的LLM基础重新排名背景下的RLT方法,(ii)调查不同类型第一阶段检索器对RLT方法的影响,以及(iii)调查不同类型重新排名器对RLT方法的影响。我们在TREC 2019和2020深度学习赛道上进行实验,研究了涉及3个检索器和2个重新排名器的管道中的8个RLT方法。我们在重新排名的背景下对RLT方法达到了新的见解。
更新时间: 2024-04-28 13:39:33
领域: cs.IR,cs.AI,cs.CL,cs.LG,H.3.3
Innovative Application of Artificial Intelligence Technology in Bank Credit Risk Management
With the rapid growth of technology, especially the widespread application of artificial intelligence (AI) technology, the risk management level of commercial banks is constantly reaching new heights. In the current wave of digitalization, AI has become a key driving force for the strategic transformation of financial institutions, especially the banking industry. For commercial banks, the stability and safety of asset quality are crucial, which directly relates to the long-term stable growth of the bank. Among them, credit risk management is particularly central because it involves the flow of a large amount of funds and the accuracy of credit decisions. Therefore, establishing a scientific and effective credit risk decision-making mechanism is of great strategic significance for commercial banks. In this context, the innovative application of AI technology has brought revolutionary changes to bank credit risk management. Through deep learning and big data analysis, AI can accurately evaluate the credit status of borrowers, timely identify potential risks, and provide banks with more accurate and comprehensive credit decision support. At the same time, AI can also achieve real-time monitoring and early warning, helping banks intervene before risks occur and reduce losses.
Updated: 2024-04-28 13:29:35
标题: 银行信用风险管理中人工智能技术的创新应用
摘要: 随着技术的快速增长,特别是人工智能(AI)技术的广泛应用,商业银行的风险管理水平不断达到新的高度。在当前数字化浪潮中,AI已成为金融机构特别是银行业战略转型的关键驱动力。对于商业银行来说,资产质量的稳定和安全至关重要,这直接关系到银行的长期稳定增长。其中,信贷风险管理尤为核心,因为它涉及大量资金流动和信贷决策的准确性。因此,建立科学有效的信贷风险决策机制对商业银行具有重大战略意义。在这种背景下,AI技术的创新应用为银行信贷风险管理带来了革命性变化。通过深度学习和大数据分析,AI能够准确评估借款人的信用状况,及时识别潜在风险,并为银行提供更准确全面的信贷决策支持。同时,AI还能实现实时监控和提前预警,帮助银行在风险发生之前进行干预,减少损失。
更新时间: 2024-04-28 13:29:35
领域: q-fin.RM,cs.AI
Exploring Weight Balancing on Long-Tailed Recognition Problem
Recognition problems in long-tailed data, in which the sample size per class is heavily skewed, have gained importance because the distribution of the sample size per class in a dataset is generally exponential unless the sample size is intentionally adjusted. Various methods have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization techniques with two-stage training, has been proposed. Despite its simplicity, it achieves high performance compared with existing methods devised in various ways. However, there is a lack of understanding as to why this method is effective for long-tailed data. In this study, we analyze weight balancing by focusing on neural collapse and the cone effect at each training stage, and find that it can be decomposed into an increase in Fisher's discriminant ratio of the feature extractor caused by weight decay and cross entropy loss, and implicit logit adjustment caused by weight decay and class-balanced loss. Our analysis enables the training method to be further simplified by reducing the number of training stages to one while increasing accuracy. Code is available at https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem.
Updated: 2024-04-28 13:28:08
标题: 探索长尾识别问题中的权重平衡
摘要: 长尾数据中的识别问题越来越重要,其中每个类别的样本量严重倾斜,因为数据集中每个类别的样本量分布通常是指数分布,除非有意进行样本量调整。已经设计了各种方法来解决这些问题。最近,提出了权重平衡方法,它将众所周知的经典正则化技术与两阶段训练相结合。尽管这种方法简单,但与现有各种方法相比,其性能较高。然而,对于这种方法为何对长尾数据有效缺乏理解。在这项研究中,我们通过关注每个训练阶段的神经坍塌和锥效应来分析权重平衡,并发现它可以分解为由于权重衰减和交叉熵损失引起的特征提取器的Fisher判别比增加,以及由于权重衰减和类平衡损失引起的隐式对数调整。我们的分析使得训练方法可以通过减少训练阶段的数量而增加准确性而进一步简化。源代码可在https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem找到。
更新时间: 2024-04-28 13:28:08
领域: cs.LG
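The abstract above attributes part of weight balancing's effect to a class-balanced loss. One common instantiation of such a loss is effective-number re-weighting (Cui et al., 2019), sketched here for intuition; the paper's exact loss and hyperparameters may differ.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights w_c ∝ (1 - beta) / (1 - beta**n_c), normalized so
    they sum to the number of classes; rare classes get larger weights."""
    counts = np.asarray(counts, dtype=float)
    w = (1.0 - beta) / (1.0 - beta ** counts)
    return w * len(counts) / w.sum()

# long-tailed class counts: head, medium, tail
w = class_balanced_weights([1000, 100, 10])
# the tail class ends up weighted far more heavily than the head class
```

Scaling per-class losses this way shifts effective decision thresholds toward rare classes, which connects to the "implicit logit adjustment" the analysis identifies.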
Assessing Image Quality Using a Simple Generative Representation
Perceptual image quality assessment (IQA) is the task of predicting the visual quality of an image as perceived by a human observer. Current state-of-the-art techniques are based on deep representations trained in a discriminative manner. Such representations may ignore visually important features if they are not predictive of class labels. Recent generative models successfully learn low-dimensional representations using auto-encoding and have been argued to preserve better visual features. Here we leverage existing auto-encoders and propose VAE-QA, a simple and efficient method for predicting image quality in the full-reference setting. We evaluate our approach on four standard benchmarks and find that it significantly improves generalization across datasets, has fewer trainable parameters, a smaller memory footprint and faster run time.
Updated: 2024-04-28 13:18:47
标题: 使用简单生成表示评估图像质量
摘要: 知觉图像质量评估(IQA)是预测图像在人类观察者眼中的视觉质量的任务。当前最先进的技术基于以区分方式训练的深度表示。如果这些表示不预测类标签,则可能会忽略视觉重要特征。最近的生成模型成功地使用自动编码学习低维表示,并据称能够更好地保留视觉特征。在这里,我们利用现有的自动编码器,提出了VAE-QA,这是一种简单高效的方法,用于在完全参考的情况下预测图像质量。我们在四个标准基准上评估了我们的方法,并发现它显著提高了跨数据集的泛化能力,具有更少的可训练参数,更小的内存占用和更快的运行时间。
更新时间: 2024-04-28 13:18:47
领域: eess.IV,cs.AI,cs.CV,cs.GR,cs.LG
Collaborative Pareto Set Learning in Multiple Multi-Objective Optimization Problems
Pareto Set Learning (PSL) is an emerging research area in multi-objective optimization, focusing on training neural networks to learn the mapping from preference vectors to Pareto optimal solutions. However, existing PSL methods are limited to addressing a single Multi-objective Optimization Problem (MOP) at a time. When faced with multiple MOPs, this limitation results in significant inefficiencies and hinders the ability to exploit potential synergies across varying MOPs. In this paper, we propose a Collaborative Pareto Set Learning (CoPSL) framework, which learns the Pareto sets of multiple MOPs simultaneously in a collaborative manner. CoPSL particularly employs an architecture consisting of shared and MOP-specific layers. The shared layers are designed to capture commonalities among MOPs collaboratively, while the MOP-specific layers tailor these general insights to generate solution sets for individual MOPs. This collaborative approach enables CoPSL to efficiently learn the Pareto sets of multiple MOPs in a single execution while leveraging the potential relationships among various MOPs. To further understand these relationships, we experimentally demonstrate that shareable representations exist among MOPs. Leveraging these shared representations effectively improves the capability to approximate Pareto sets. Extensive experiments underscore the superior efficiency and robustness of CoPSL in approximating Pareto sets compared to state-of-the-art approaches on a variety of synthetic and real-world MOPs. Code is available at https://github.com/ckshang/CoPSL.
Updated: 2024-04-28 13:14:57
标题: 多目标优化问题中的协同帕累托集学习
摘要: 帕累托集学习(PSL)是多目标优化中一个新兴的研究领域,侧重于训练神经网络学习从偏好向量到帕累托最优解的映射。然而,现有的PSL方法只能处理一个多目标优化问题(MOP)。当面对多个MOP时,这种限制导致显著的低效性,并妨碍了利用不同MOP之间潜在协同关系的能力。在本文中,我们提出了一个协作式帕累托集学习(CoPSL)框架,以协作方式同时学习多个MOP的帕累托集。CoPSL特别采用了一个由共享层和MOP特定层组成的架构。共享层旨在协同捕捉MOP之间的共同特点,而MOP特定层则根据这些通用见解生成各个MOP的解集。这种协作方法使CoPSL能够在单次执行中高效学习多个MOP的帕累托集,同时利用各种MOP之间的潜在关系。为了进一步理解这些关系,我们通过实验证明MOP之间存在可共享的表示。有效利用这些共享表示可以提高逼近帕累托集的能力。大量实验强调了CoPSL在逼近帕累托集方面相对于最先进方法的优越效率和稳健性,涵盖了各种合成和真实世界的MOP。代码可在https://github.com/ckshang/CoPSL获得。
更新时间: 2024-04-28 13:14:57
领域: cs.LG,math.OC
Mamba-FETrack: Frame-Event Tracking via State Space Model
RGB-Event based tracking is an emerging research topic, focusing on how to effectively integrate heterogeneous multi-modal data (synchronized exposure video frames and asynchronous pulse Event streams). Existing works typically employ Transformer-based networks to handle these modalities and achieve decent accuracy through input-level or feature-level fusion on multiple datasets. However, these trackers incur significant memory consumption and computational complexity due to the use of the self-attention mechanism. This paper proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the State Space Model (SSM) to achieve high-performance tracking while effectively reducing computational costs and realizing more efficient tracking. Specifically, we adopt two modality-specific Mamba backbone networks to extract the features of RGB frames and Event streams. We further propose to boost the interactive learning between the RGB and Event features using the Mamba network. The fused features are fed into the tracking head for target object localization. Extensive experiments on the FELT and FE108 datasets fully validate the efficiency and effectiveness of our proposed tracker. Specifically, our Mamba-based tracker achieves 43.5/55.6 on the SR/PR metrics, while the ViT-S based tracker (OSTrack) obtains 40.0/50.9. The GPU memory costs of our tracker and the ViT-S based tracker are 13.98GB and 15.44GB, respectively, a reduction of about $9.5\%$. The FLOPs and parameters of our tracker versus the ViT-S based OSTrack are 59GB/1076GB and 7MB/60MB, reductions of about $94.5\%$ and $88.3\%$, respectively. We hope this work can bring some new insights to the tracking field and greatly promote the application of the Mamba architecture in tracking. The source code of this work will be released at \url{https://github.com/Event-AHU/Mamba_FETrack}.
Updated: 2024-04-28 13:12:49
标题: Mamba-FETrack:基于状态空间模型的帧事件跟踪
摘要: RGB-Event跟踪是一个新兴的研究课题,重点是如何有效地整合异构多模态数据(同步曝光视频帧和异步脉冲事件流)。现有作品通常采用基于Transformer的网络来处理这些模态,并通过多个数据集上的输入级或特征级融合实现相当准确度。然而,由于使用了自注意机制,这些跟踪器需要消耗大量内存和计算复杂度。本文提出了一种基于状态空间模型(SSM)的新型RGB-Event跟踪框架Mamba-FETrack,以实现高性能跟踪,同时有效降低计算成本并实现更高效的跟踪。具体而言,我们采用两种模态特定的Mamba骨干网络来提取RGB帧和事件流的特征。然后,我们还提出使用Mamba网络增强RGB和事件特征之间的交互学习。融合的特征将被送入跟踪头部进行目标物体定位。在FELT和FE108数据集上进行的大量实验证实了我们提出的跟踪器的效率和有效性。具体来说,我们基于Mamba的跟踪器在SR/PR度量上分别达到了43.5/55.6,而基于ViT-S的跟踪器(OSTrack)则分别获得了40.0/50.9。我们的跟踪器和基于ViT-S的跟踪器的GPU内存成本分别为13.98GB和15.44GB,分别减少了约9.5%。我们的/ViT-S的OSTrack的FLOPs和参数分别为59GB/1076GB和7MB/60MB,分别减少了约94.5%和88.3%。我们希望这项工作能为跟踪领域带来一些新的见解,并大大促进Mamba架构在跟踪中的应用。本文的源代码将在\url{https://github.com/Event-AHU/Mamba_FETrack}上发布。
更新时间: 2024-04-28 13:12:49
领域: cs.CV,cs.AI
Latent Space Bayesian Optimization with Latent Data Augmentation for Enhanced Exploration
Latent Space Bayesian Optimization (LSBO) combines generative models, typically Variational Autoencoders (VAE), with Bayesian Optimization (BO) to generate de-novo objects of interest. However, LSBO faces challenges due to the mismatch between the objectives of BO and VAE, resulting in poor exploration capabilities. In this paper, we propose novel contributions to enhance LSBO efficiency and overcome this challenge. We first introduce the concept of latent consistency/inconsistency as a crucial problem in LSBO, arising from the VAE-BO mismatch. To address this, we propose the Latent Consistent Aware-Acquisition Function (LCA-AF) that leverages consistent points in LSBO. Additionally, we present LCA-VAE, a novel VAE method that creates a latent space with increased consistent points through data augmentation in latent space and penalization of latent inconsistencies. Combining LCA-VAE and LCA-AF, we develop LCA-LSBO. Our approach achieves high sample-efficiency and effective exploration, emphasizing the significance of addressing latent consistency through the novel incorporation of data augmentation in latent space within LCA-VAE in LSBO. We showcase the performance of our proposal via de-novo image generation and de-novo chemical design tasks.
Updated: 2024-04-28 12:29:06
标题: 潜在空间贝叶斯优化与潜在数据增强的增强探索
摘要: 潜在空间贝叶斯优化(LSBO)结合生成模型,通常是变分自动编码器(VAE),与贝叶斯优化(BO)一起生成感兴趣的全新对象。然而,LSBO面临挑战,因为BO和VAE的目标不匹配,导致探索能力不足。在本文中,我们提出了增强LSBO效率并克服这一挑战的新颖贡献。我们首先引入了潜在一致性/不一致性的概念,作为LSBO中的一个关键问题,由VAE-BO不匹配引起。为了解决这个问题,我们提出了潜在一致性感知采集函数(LCA-AF),利用LSBO中的一致点。此外,我们提出了LCA-VAE,一种通过在潜在空间中进行数据增强和惩罚潜在不一致性来创建具有增加一致点的潜在空间的新型VAE方法。结合LCA-VAE和LCA-AF,我们开发了LCA-LSBO。我们的方法实现了高样本效率和有效的探索,强调通过在LSBO中的LCA-VAE中潜在空间中的数据增强的新颖整合解决潜在一致性的重要性。我们通过全新图像生成和全新化学设计任务展示了我们提案的性能。
更新时间: 2024-04-28 12:29:06
领域: cs.LG
IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning
Continual learning (CL) remains one of the long-standing challenges for deep neural networks due to catastrophic forgetting of previously acquired knowledge. Although rehearsal-based approaches have been fairly successful in mitigating catastrophic forgetting, they suffer from overfitting on buffered samples and prior information loss, hindering generalization under low-buffer regimes. Inspired by how humans learn using strong inductive biases, we propose IMEX-Reg to improve the generalization performance of experience rehearsal in CL under low buffer regimes. Specifically, we employ a two-pronged implicit-explicit regularization approach using contrastive representation learning (CRL) and consistency regularization. To further leverage the global relationship between representations learned using CRL, we propose a regularization strategy to guide the classifier toward the activation correlations in the unit hypersphere of the CRL. Our results show that IMEX-Reg significantly improves generalization performance and outperforms rehearsal-based approaches in several CL scenarios. It is also robust to natural and adversarial corruptions with less task-recency bias. Additionally, we provide theoretical insights to support our design decisions further.
Updated: 2024-04-28 12:25:09
Categories: cs.LG, cs.AI, cs.CV
PePNet: A Periodicity-Perceived Workload Prediction Network Supporting Rare Occurrence of Heavy Workload
Cloud providers can greatly benefit from accurate workload prediction. However, the workload of cloud servers is highly variable, with occasional heavy workload bursts. This makes workload prediction challenging. There are mainly two categories of workload prediction methods: statistical methods and neural-network-based ones. The former rely on strong mathematical assumptions and have reported low accuracy when predicting highly variable workloads. The latter offer higher overall accuracy, yet they are vulnerable to data imbalance between heavy and common workloads. This impairs the prediction accuracy of neural-network-based models on heavy workloads. Either the overall inaccuracy of statistical methods or the heavy-workload inaccuracy of neural-network-based models can cause service level agreement violations. Thus, we propose PePNet to improve overall and especially heavy-workload prediction accuracy. It has two distinctive characteristics: (i) A Periodicity-Perceived Mechanism to detect the existence of periodicity and the length of one period automatically, without any a priori knowledge. Furthermore, it fuses periodic information adaptively, which is suitable for periodic, lax periodic and aperiodic time series. (ii) An Achilles' Heel Loss Function that iteratively optimizes the most under-fitting part of the predicted sequence at each step, which significantly improves the prediction accuracy for heavy load. Extensive experiments conducted on the Alibaba2018, SMD and Dinda's datasets demonstrate that PePNet improves MAPE for overall workload by 20.0% on average, compared with state-of-the-art methods. In particular, PePNet improves MAPE for heavy workload by 23.9% on average.
Updated: 2024-04-28 12:23:38
Categories: cs.DC, cs.LG
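PePNet's Achilles' Heel idea, iteratively up-weighting the most under-fitting part of the predicted sequence, can be sketched roughly as follows (the weighting scheme and the `boost` hyperparameter are illustrative assumptions, not the paper's exact loss):

```python
import numpy as np

def achilles_heel_loss(pred, target, boost=4.0):
    """Squared error that extra-weights the single most under-fitting step.
    `boost` is a hypothetical hyperparameter; the paper iteratively
    re-targets the worst-fit part of the predicted sequence at each step."""
    err = (pred - target) ** 2
    weights = np.ones_like(err)
    weights[np.argmax(err)] = boost        # focus on the weakest prediction
    return float(np.mean(weights * err))

pred = np.array([1.0, 2.0, 9.0])
target = np.array([1.0, 2.0, 3.0])         # the burst at step 3 is under-fit
plain = float(np.mean((pred - target) ** 2))
print(achilles_heel_loss(pred, target) > plain)  # True: bursts weigh more
```

Because heavy-workload bursts are exactly where the error peaks, such a loss pushes the optimizer to spend capacity on them rather than on the abundant common-workload steps.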
Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models
Monitoring calf behaviour continuously would be beneficial to identify routine practices (e.g., weaning, dehorning, etc.) that impact calf welfare in dairy farms. In that regard, accelerometer data collected from neck collars can be used along with Machine Learning models to classify calf behaviour automatically. Hand-crafted features are commonly used in Machine Learning models, while ROCKET and Catch22 features are specifically designed for time-series classification problems in related fields. This study aims to compare the performance of ROCKET and Catch22 features to Hand-Crafted features. 30 Irish Holstein Friesian and Jersey pre-weaned calves were monitored using accelerometer sensors allowing for 27.4 hours of annotated behaviors. Additional time-series were computed from the raw X, Y and Z-axis and split into 3-second time windows. ROCKET, Catch22 and Hand-Crafted features were calculated for each time window, and the dataset was then split into the train, validation and test sets. Each set of features was used to train three Machine Learning models (Random Forest, eXtreme Gradient Boosting, and RidgeClassifierCV) to classify six behaviours indicative of pre-weaned calf welfare (drinking milk, grooming, lying, running, walking and other). Models were tuned with the validation set, and the performance of each feature-model combination was evaluated with the test set. The best performance across the three models was obtained with ROCKET [average balanced accuracy +/- standard deviation] (0.70 +/- 0.07), followed by Catch22 (0.69 +/- 0.05), surpassing Hand-Crafted (0.65 +/- 0.034). The best balanced accuracy (0.77) was obtained with ROCKET and RidgeClassifierCV, followed by Catch22 and Random Forest (0.73). Thus, tailoring these approaches for specific behaviours and contexts will be crucial in advancing precision livestock farming and enhancing animal welfare on a larger scale.
Updated: 2024-04-28 12:23:01
Categories: cs.LG
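The preprocessing described in this abstract, splitting each accelerometer axis into 3-second windows and computing per-window features, can be sketched in plain numpy (the 25 Hz sampling rate, the three toy features, and the nearest-centroid stand-in classifier are all assumptions for illustration; the study uses far richer feature sets and Random Forest/XGBoost/RidgeClassifierCV):

```python
import numpy as np

rng = np.random.default_rng(0)

def windows(signal, fs=25, seconds=3):
    """Split one accelerometer axis into non-overlapping 3-second windows
    (fs = 25 Hz is an assumed sampling rate, not taken from the study)."""
    size = fs * seconds
    n = len(signal) // size
    return signal[: n * size].reshape(n, size)

def hand_crafted(win):
    """Toy 'hand-crafted' features (mean, std, range) per window; the
    study's real feature set is far richer."""
    return np.stack([win.mean(1), win.std(1), np.ptp(win, 1)], axis=1)

# Synthetic stand-ins for two behaviours with different signal statistics.
lying = windows(rng.normal(0.0, 0.1, 7500))
running = windows(rng.normal(0.0, 1.0, 7500))
X = np.vstack([hand_crafted(lying), hand_crafted(running)])
y = np.array([0] * len(lying) + [1] * len(running))

# Nearest-centroid stand-in for the classifiers benchmarked in the study.
centroids = np.stack([X[y == c].mean(0) for c in (0, 1)])
pred = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
print((pred == y).mean() > 0.9)  # the toy classes separate easily
```

ROCKET and Catch22 replace the `hand_crafted` step with random-convolution and canonical time-series features, respectively, leaving the rest of the pipeline unchanged.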
Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influences of various components within the VRL framework on the agent's plasticity are still poorly understood. In this work, we conduct a systematic empirical exploration focusing on three primary underexplored facets and derive the following insightful conclusions: (1) data augmentation is essential in maintaining plasticity; (2) the critic's plasticity loss serves as the principal bottleneck impeding efficient training; and (3) without timely intervention to recover critic's plasticity in the early stages, its loss becomes catastrophic. These insights suggest a novel strategy to address the high replay ratio (RR) dilemma, where exacerbated plasticity loss hinders the potential improvements of sample efficiency brought by increased reuse frequency. Rather than setting a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic's plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.
Updated: 2024-04-28 12:11:43
Categories: cs.LG, cs.AI
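The Adaptive RR idea, keeping the replay ratio low while the critic's plasticity is fragile and raising it once plasticity recovers, can be sketched as a simple schedule (all thresholds and bounds are made-up hyperparameters, and the plasticity measure is assumed to be normalised to [0, 1]; the paper's actual metric may differ):

```python
def adaptive_replay_ratio(plasticity, rr_min=0.5, rr_max=4.0, threshold=0.6):
    """Illustrative schedule in the spirit of Adaptive RR: keep the replay
    ratio (RR) low while the critic's plasticity is fragile, then scale
    reuse up as plasticity recovers. All numbers are made-up hyperparameters;
    `plasticity` is assumed normalised to [0, 1]."""
    if plasticity < threshold:
        return rr_min                            # early stage: protect plasticity
    frac = (plasticity - threshold) / (1.0 - threshold)
    return rr_min + frac * (rr_max - rr_min)     # later: benefit from more reuse

print(adaptive_replay_ratio(0.3))  # 0.5 (low reuse early in training)
print(adaptive_replay_ratio(1.0))  # 4.0 (full reuse once plasticity recovers)
```

The design choice mirrors the paper's two findings: a low RR avoids catastrophic plasticity loss early on, while a high RR exploits sample reuse later.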
Revisiting Neural Networks for Continual Learning: An Architectural Perspective
Efforts to overcome catastrophic forgetting have primarily centered around developing more effective Continual Learning (CL) methods. In contrast, less attention was devoted to analyzing the role of network architecture design (e.g., network depth, width, and components) in contributing to CL. This paper seeks to bridge this gap between network architecture design and CL, and to present a holistic study on the impact of network architectures on CL. This work considers architecture design at the network scaling level, i.e., width and depth, and also at the network components, i.e., skip connections, global pooling layers, and down-sampling. In both cases, we first derive insights through systematically exploring how architectural designs affect CL. Then, grounded in these insights, we craft a specialized search space for CL and further propose a simple yet effective ArchCraft method to steer a CL-friendly architecture, namely, this method recrafts AlexNet/ResNet into AlexAC/ResAC. Experimental validation across various CL settings and scenarios demonstrates that improved architectures are parameter-efficient, achieving state-of-the-art performance of CL while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Task IL and Class IL. Code is available at https://github.com/byyx666/ArchCraft.
Updated: 2024-04-28 12:08:26
Categories: cs.LG, cs.CV
InstructEdit: Instruction-based Knowledge Editing for Large Language Models
Knowledge editing for large language models can offer an efficient solution to alter a model's behavior without negatively impacting its overall performance. However, current approaches encounter issues with limited generalizability across tasks, necessitating one distinct editor for each task and significantly hindering broader applications. To address this, we take the first step to analyze the multi-task generalization issue in knowledge editing. Specifically, we develop an instruction-based editing technique, termed InstructEdit, which facilitates the editor's adaptation to various task performances simultaneously using simple instructions. With only one unified editor for each LLM, we empirically demonstrate that InstructEdit can improve the editor's control, leading to an average 14.86% increase in Reliability in the multi-task editing setting. Furthermore, experiments on held-out unseen tasks illustrate that InstructEdit consistently surpasses previous strong baselines. To further investigate the underlying mechanisms of instruction-based knowledge editing, we analyze the principal components of the editing gradient directions, which reveals that instructions help control the optimization direction and yield stronger OOD generalization. Code and datasets are available at https://github.com/zjunlp/EasyEdit.
Updated: 2024-04-28 12:03:38
Categories: cs.CL, cs.AI, cs.CV, cs.HC, cs.LG
Origami Single-end Capacitive Sensing for Continuous Shape Estimation of Morphing Structures
In this work, we propose a novel single-end morphing capacitive sensing method for shape tracking, FxC, which combines Folding origami structures and Capacitive sensing to detect morphing structural motions using state-of-the-art sensing circuits and deep learning. By embedding areas of origami structures with conductive materials as single-end capacitive sensing patches, we observed that the sensor signals change coherently with the motion of the structure. Different from other origami capacitors, where the origami structure is used to adjust the thickness of the dielectric layer of a double-plate capacitor, FxC uses only a single conductive plate per channel, and the origami structure directly changes the geometry of the conductive plate. We examined the operating principle of morphing single-end capacitors through 3D geometry simulation combined with physics-based theoretical deduction, which predicted behaviour similar to that observed in experimentation. A software pipeline was then developed that uses the sensor signals to reconstruct the dynamic structural geometry through data-driven deep neural network regression against geometric primitives extracted from vision tracking. We created multiple folding patterns to validate our approach, including the Accordion, Chevron, Sunray and V-Fold patterns, with different layouts of capacitive sensors and both paper-based and textile-based materials. Experimental results show that the geometric primitives predicted from the capacitive signals correlate strongly with the visual ground truth, with an R-squared value of up to 95% and a tracking error of 6.5 mm for patches. The simulation and machine learning constitute a two-way information exchange between the sensing signals and the structural geometry.
Updated: 2024-04-28 11:52:29
Categories: cs.HC, cs.LG, eess.IV, eess.SP
Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories
The misuse of deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals. However, existing methods for detecting deepfakes primarily focus on uncompressed videos, relying on cues such as noise characteristics, local textures, or frequency statistics. When applied to compressed videos, these methods experience a decrease in detection performance and are less suitable for real-world scenarios. In this paper, we propose a deepfake video detection method based on 3D spatiotemporal trajectories. Specifically, we utilize a robust 3D model to construct spatiotemporal motion features, integrating feature details from both 2D and 3D frames to mitigate the influence of large head rotation angles or insufficient lighting within frames. Furthermore, we separate facial expressions from head movements and design a sequential analysis method based on phase space motion trajectories to explore the feature differences between genuine and fake faces in deepfake videos. We conduct extensive experiments to validate the performance of our proposed method on several compressed deepfake benchmarks. The robustness of the well-designed features is verified by calculating the consistent distribution of facial landmarks before and after video compression. Our method yields satisfactory results and showcases its potential for practical applications.
Updated: 2024-04-28 11:48:13
Categories: cs.CV, cs.AI, cs.MM
Multi-Agent Reinforcement Learning for Energy Networks: Computational Challenges, Progress and Open Problems
The rapidly changing architecture and functionality of electrical networks and the increasing penetration of renewable and distributed energy resources have resulted in various technological and managerial challenges. These have rendered traditional centralized energy-market paradigms insufficient due to their inability to support the dynamic and evolving nature of the network. This survey explores how multi-agent reinforcement learning (MARL) can support the decentralization and decarbonization of energy networks and mitigate the associated challenges. This is achieved by specifying key computational challenges in managing energy networks, reviewing recent research progress on addressing them, and highlighting open challenges that may be addressed using MARL.
Updated: 2024-04-28 11:39:54
Categories: cs.AI
Generative AI for Visualization: State of the Art and Future Directions
Generative AI (GenAI) has witnessed remarkable progress in recent years and demonstrated impressive performance in various generation tasks in different domains such as computer vision and computational design. Many researchers have attempted to integrate GenAI into visualization frameworks, leveraging its superior generative capacity for different operations. Concurrently, recent major breakthroughs in GenAI, such as diffusion models and large language models, have also drastically increased the potential of GenAI4VIS. From a technical perspective, this paper looks back on previous visualization studies leveraging GenAI and discusses the challenges and opportunities for future research. Specifically, we cover the applications of different types of GenAI methods, including sequence, tabular, spatial and graph generation techniques, for different visualization tasks, which we summarize into four major stages: data enhancement, visual mapping generation, stylization and interaction. For each specific visualization sub-task, we illustrate the typical data and concrete GenAI algorithms, aiming to provide an in-depth understanding of state-of-the-art GenAI4VIS techniques and their limitations. Furthermore, based on the survey, we discuss three major aspects of challenges and research opportunities: evaluation, datasets, and the gap between end-to-end GenAI and generative algorithms. By summarizing different generation algorithms, their current applications and their limitations, this paper endeavors to provide useful insights for future GenAI4VIS research.
Updated: 2024-04-28 11:27:30
Categories: cs.LG, cs.AI, cs.HC
MinePlanner: A Benchmark for Long-Horizon Planning in Large Minecraft Worlds
We propose a new benchmark for planning tasks based on the Minecraft game. Our benchmark contains 45 tasks overall, but also provides support for creating both propositional and numeric instances of new Minecraft tasks automatically. We benchmark numeric and propositional planning systems on these tasks, with results demonstrating that state-of-the-art planners are currently incapable of dealing with many of the challenges advanced by our new benchmark, such as scaling to instances with thousands of objects. Based on these results, we identify areas of improvement for future planners. Our framework is made available at https://github.com/IretonLiu/mine-pddl/.
Updated: 2024-04-28 11:22:36
Categories: cs.AI
Conversational Disease Diagnosis via External Planner-Controlled Large Language Models
The development of large language models (LLMs) has brought unprecedented possibilities for artificial intelligence (AI) based medical diagnosis. However, the application of LLMs in real diagnosis scenarios is still unclear because they are not adept at collecting patient data proactively. This study presents a novel approach that implements AI systems to emulate the two-phase process used by physicians during medical consultations. Our methodology involves two specialized planners: the first employs a data-driven, reinforcement learning approach to formulate disease screening questions; the second uses LLMs to parse medical guidelines and conduct differential diagnosis. By utilizing real patient electronic medical record (EMR) data, we constructed simulated dialogues between virtual patients and doctors and evaluated the diagnostic abilities of our system. We demonstrate that our system surpasses existing models, including GPT-4 Turbo, in both disease screening and differential diagnosis. This research represents a step towards integrating AI more seamlessly into clinical settings, potentially improving the accuracy and accessibility of medical diagnostics.
Updated: 2024-04-28 11:19:53
Categories: cs.CL, cs.AI
FedPFT: Federated Proxy Fine-Tuning of Foundation Models
Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) emerges as a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune the FM by allocating a sub-FM to each client in FL; however, this leads to suboptimal performance due to insufficient tuning and the inevitable accumulation of gradient errors. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method enhancing FM adaptation in downstream tasks through FL via two key modules. First, the sub-FM construction module employs a layer-wise compression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing crucial neurons. Second, the sub-FM alignment module conducts a two-step distillation (layer-level and neuron-level) before and during FL fine-tuning, respectively, to reduce gradient error by accurately aligning the sub-FM with the FM under theoretical guarantees. Experimental results on seven commonly used datasets (i.e., four text and three vision) demonstrate the superiority of FedPFT.
Updated: 2024-04-28 11:11:16
Categories: cs.LG, cs.AI
Inverse-Free Fast Natural Gradient Descent Method for Deep Learning
Second-order optimization techniques have the potential to achieve faster convergence rates than first-order methods through the incorporation of second-order derivatives or statistics. However, their utilization in deep learning is limited due to their computational inefficiency. Various approaches have been proposed to address this issue, primarily centered on minimizing the size of the matrix to be inverted. Nevertheless, the necessity of performing the inverse operation iteratively persists. In this work, we present a fast natural gradient descent (FNGD) method that only requires inversion during the first epoch. Specifically, we reveal that natural gradient descent (NGD) is essentially a weighted sum of per-sample gradients. Our novel approach further proposes to share these weighted coefficients across epochs without affecting empirical performance. Consequently, FNGD resembles the average sum in first-order methods, making the computational complexity of FNGD comparable to that of first-order methods. Extensive experiments on image classification and machine translation tasks demonstrate the efficiency of the proposed FNGD. For training ResNet-18 on CIFAR-100, FNGD can achieve a speedup of 2.07$\times$ compared with KFAC. For training a Transformer on Multi30K, FNGD outperforms AdamW by 24 BLEU points while requiring almost the same training time.
Updated: 2024-04-28 10:52:32
Categories: cs.LG, cs.CV
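The observation that NGD is a weighted sum of per-sample gradients, with FNGD freezing those coefficients after the first epoch, can be sketched on a toy linear least-squares problem (the uniform coefficients below are illustrative stand-ins; FNGD derives its coefficients from curvature statistics, not assumed here):

```python
import numpy as np

def per_sample_grads(w, X, y):
    """Per-sample gradients g_i of 0.5 * (x_i @ w - y_i)^2 for a linear model."""
    residual = X @ w - y            # shape (n,)
    return residual[:, None] * X    # row i is g_i = (x_i @ w - y_i) * x_i

def weighted_update(w, X, y, coeffs, lr=0.3):
    """One step using a fixed weighted sum of per-sample gradients -- the
    FNGD idea of reusing first-epoch coefficients. The uniform coefficients
    used below are illustrative, not derived from the true Fisher matrix."""
    return w - lr * (coeffs @ per_sample_grads(w, X, y))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
coeffs = np.full(8, 1 / 8)          # frozen weights (uniform here = plain GD)
for _ in range(5000):
    w = weighted_update(w, X, y, coeffs)
print(np.linalg.norm(w - w_true) < 1e-2)  # converges to the solution
```

With the coefficients frozen, each step costs the same as a first-order update, which is the source of FNGD's claimed efficiency.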
Is Complexity an Illusion?
Simplicity is held by many to be the key to general intelligence. Simpler models tend to "generalise", identifying the cause or generator of data with greater sample efficiency. The implications of the correlation between simplicity and generalisation extend far beyond computer science, addressing questions of physics and even biology. Yet simplicity is a property of form, while generalisation is a property of function. In interactive settings, any correlation between the two depends on interpretation. In theory there could be no correlation, and yet in practice there is. Previous theoretical work showed generalisation to be a consequence of "weak" constraints implied by function, not form. Experiments demonstrated that choosing weak constraints over simple forms yielded a 110-500% improvement in generalisation rate. Here we show that all constraints can take equally simple forms, regardless of weakness. However, if forms are spatially extended, then function is represented using a finite subset of forms. If function is represented using a finite subset of forms, then we can force a correlation between simplicity and generalisation by making weak constraints take simple forms. If function is determined by a goal-directed process (e.g. natural selection), then efficiency demands that weak constraints take simple forms. Complexity has no causal influence on generalisation, but appears to because of confounding.
Updated: 2024-04-28 10:44:36
Categories: cs.AI
AnyPattern: Towards In-context Image Copy Detection
This paper explores in-context learning for image copy detection (ICD), i.e., prompting an ICD model to identify replicated images with new tampering patterns without the need for additional training. The prompts (or the contexts) are from a small set of image-replica pairs that reflect the new patterns and are used at inference time. Such in-context ICD has good realistic value, because it requires no fine-tuning and thus facilitates fast reaction against the emergence of unseen patterns. To accommodate the "seen $\rightarrow$ unseen" generalization scenario, we construct the first large-scale pattern dataset named AnyPattern, which has the largest number of tamper patterns ($90$ for training and $10$ for testing) among all the existing ones. We benchmark AnyPattern with popular ICD methods and reveal that existing methods barely generalize to novel patterns. We further propose a simple in-context ICD method named ImageStacker. ImageStacker learns to select the most representative image-replica pairs and employs them as the pattern prompts in a stacking manner (rather than the popular concatenation manner). Experimental results show (1) training with our large-scale dataset substantially benefits pattern generalization ($+26.66 \%$ $\mu AP$), (2) the proposed ImageStacker facilitates effective in-context ICD (another round of $+16.75 \%$ $\mu AP$), and (3) AnyPattern enables in-context ICD, i.e., without such a large-scale dataset, in-context learning does not emerge even with our ImageStacker. Beyond the ICD task, we also demonstrate how AnyPattern can benefit artists, i.e., the pattern retrieval method trained on AnyPattern can be generalized to identify style mimicry by text-to-image models. The project is publicly available at https://anypattern.github.io.
Updated: 2024-04-28 10:15:37
Categories: cs.CV, cs.AI
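ImageStacker's use of image-replica pairs as prompts in a stacking manner, rather than the concatenation manner, can be sketched shape-wise in numpy (channel-axis stacking and all shapes here are assumptions for illustration, not the paper's exact architecture):

```python
import numpy as np

def stack_prompts(query, pairs):
    """Stack in-context image-replica pairs with the query along the channel
    axis (rather than concatenating them spatially). The channel-stacking
    choice and the shapes are assumptions for illustration, not the paper's
    exact architecture.

    query: (H, W, C) array; pairs: list of (image, replica) array tuples."""
    channels = [query]
    for img, replica in pairs:
        channels.extend([img, replica])
    return np.concatenate(channels, axis=-1)  # (H, W, C * (1 + 2 * len(pairs)))

q = np.zeros((4, 4, 3))
pairs = [(np.ones((4, 4, 3)), np.ones((4, 4, 3)))]
print(stack_prompts(q, pairs).shape)  # (4, 4, 9)
```

Stacking keeps the spatial resolution fixed as more prompt pairs are added, whereas spatial concatenation would grow the input's height or width.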
Convergence Analysis of Flow Matching in Latent Space with Transformers
We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution. Our error analysis demonstrates the effectiveness of this approach, showing that the distribution of samples generated via estimated ODE flow converges to the target distribution in the Wasserstein-2 distance under mild and practical assumptions. Furthermore, we show that arbitrary smooth functions can be effectively approximated by transformer networks with Lipschitz continuity, which may be of independent interest.
Updated: 2024-04-28 10:10:33
标题: 潜空间中使用Transformer进行流匹配的收敛性分析
摘要: 我们提出了基于ODE的生成模型(具体为流匹配)的理论收敛性保证。我们使用预先训练的自动编码器网络将高维原始输入映射到低维潜在空间,其中一个变换器网络被训练用于预测从标准正态分布到目标潜在分布的变换的速度场。我们的误差分析证明了这种方法的有效性,表明在温和且实用的假设下,通过估计的ODE流生成的样本分布在Wasserstein-2距离下收敛到目标分布。此外,我们还展示了任意平滑函数可以通过具有Lipschitz连续性的变换器网络有效逼近,这可能具有独立的研究价值。
更新时间: 2024-04-28 10:10:33
领域: stat.ML,cs.LG
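The flow-matching objective the analysis above builds on is standard and can be sketched numerically. Below is a toy version using the straight-line probability path, whose target velocity is z1 - z0; the paper's latent pairs, transformer velocity network, and assumptions are of course richer than this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def flow_matching_loss(predict_v, z0, z1, t):
    """Conditional flow-matching loss on the straight-line path
    z_t = (1 - t) * z0 + t * z1, whose target velocity is z1 - z0."""
    zt = (1 - t)[:, None] * z0 + t[:, None] * z1
    target = z1 - z0
    pred = predict_v(zt, t)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

# Toy check: latent noise z0 ~ N(0, I) and "data" z1 = z0 + c, so the
# optimal velocity field is the constant c and drives the loss to zero.
d, n = 4, 256
c = np.ones(d)
z0 = rng.standard_normal((n, d))
z1 = z0 + c
t = rng.uniform(size=n)

loss_optimal = flow_matching_loss(lambda zt, tt: np.tile(c, (len(zt), 1)), z0, z1, t)
loss_zero = flow_matching_loss(lambda zt, tt: np.zeros_like(zt), z0, z1, t)
```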
Enhancing Fairness in Neural Networks Using FairVIC
Mitigating bias in automated decision-making systems, specifically deep learning models, is a critical challenge in achieving fairness. This complexity stems from factors such as nuanced definitions of fairness, unique biases in each dataset, and the trade-off between fairness and model accuracy. To address such issues, we introduce FairVIC, an innovative approach designed to enhance fairness in neural networks by addressing inherent biases at the training stage. FairVIC differs from traditional approaches that typically address biases at the data preprocessing stage. Instead, it integrates variance, invariance and covariance into the loss function to minimise the model's dependency on protected characteristics for making predictions, thus promoting fairness. Our experimentation and evaluation consist of training neural networks on three datasets known for their biases, comparing our results to state-of-the-art algorithms, evaluating on model architectures of different sizes, and carrying out sensitivity analysis to examine the fairness-accuracy trade-off. Through our implementation of FairVIC, we observed a significant improvement in fairness across all metrics tested, without compromising the model's accuracy to a detrimental extent. Our findings suggest that FairVIC presents a straightforward, out-of-the-box solution for the development of fairer deep learning models, thereby offering a generalisable solution applicable across many tasks and datasets.
Updated: 2024-04-28 10:10:21
标题: 使用FairVIC增强神经网络中的公平性
摘要: 在实现公平性方面,缓解自动决策系统(特别是深度学习模型)中的偏见是一个关键挑战。这种复杂性源于公平性的微妙定义、每个数据集中独特的偏见以及公平性和模型准确性之间的权衡。为了解决这些问题,我们引入了FairVIC,这是一种创新方法,旨在通过在训练阶段解决固有偏见来增强神经网络的公平性。FairVIC与通常在数据预处理阶段解决偏见的传统方法不同。相反,它将方差、不变性和协方差集成到损失函数中,以最小化模型对受保护特征的依赖,从而促进公平性。我们的实验和评估包括在三个以其偏见而闻名的数据集上训练神经网络,将结果与最先进的算法进行比较,在不同大小的模型架构上进行评估,并进行敏感性分析以检查公平性和准确性之间的权衡。通过我们对FairVIC的实施,我们观察到在所有测试指标上公平性显著提高,而不会对模型的准确性产生有害影响。我们的研究结果表明,FairVIC为开发更公平的深度学习模型提供了一个简单易用的解决方案,从而提供了一个适用于许多任务和数据集的通用解决方案。
更新时间: 2024-04-28 10:10:21
领域: cs.LG,cs.AI,cs.CY,stat.ML
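The abstract above does not spell out FairVIC's exact loss, so the following is only an illustrative guess at what variance, invariance, and covariance terms over predictions and a protected attribute might look like. All function names and formulas here are assumptions, not the paper's definitions.

```python
import numpy as np

def fairvic_terms(preds, protected, eps=1e-4):
    """Illustrative penalties (all formulas are assumptions):
    - variance: hinge that keeps the spread of predictions from collapsing
    - invariance: squared gap between group-wise mean predictions
    - covariance: squared covariance between predictions and the attribute"""
    g0, g1 = preds[protected == 0], preds[protected == 1]
    variance = max(0.0, 1.0 - float(np.sqrt(preds.var() + eps)))
    invariance = float((g0.mean() - g1.mean()) ** 2)
    covariance = float(np.cov(preds, protected)[0, 1] ** 2)
    return variance, invariance, covariance

rng = np.random.default_rng(2)
protected = rng.integers(0, 2, size=1000)
biased = protected + 0.1 * rng.standard_normal(1000)  # predictions track the attribute
fair = rng.standard_normal(1000)                      # predictions independent of it

_, inv_biased, cov_biased = fairvic_terms(biased, protected)
_, inv_fair, cov_fair = fairvic_terms(fair, protected)
```

Predictions that track the protected attribute incur much larger invariance and covariance penalties than predictions independent of it, which is the behaviour such terms are meant to discourage.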
Logic Agent: Enhancing Validity with Logic Rule Invocation
Chain-of-Thought (CoT) prompting has emerged as a pivotal technique for augmenting the inferential capabilities of language models during reasoning tasks. Despite its advancements, CoT often struggles to validate the reasoning process and to ensure its informativeness. Addressing these limitations, this paper introduces the Logic Agent (LA), an agent-based framework aimed at enhancing the validity of reasoning processes in Large Language Models (LLMs) through strategic logic rule invocation. Unlike conventional approaches, LA transforms LLMs into logic agents that dynamically apply propositional logic rules, initiating the reasoning process by converting natural language inputs into structured logic forms. The logic agent leverages a comprehensive set of predefined functions to systematically navigate the reasoning process. This methodology not only promotes the structured and coherent generation of reasoning constructs but also significantly improves their interpretability and logical coherence. Through extensive experimentation, we demonstrate LA's capacity to scale effectively across various model sizes, markedly improving the precision of complex reasoning across diverse tasks.
Updated: 2024-04-28 10:02:28
标题: 逻辑代理:通过逻辑规则调用增强有效性
摘要: Chain-of-Thought (CoT)提示已经成为一种关键技术,用于增强语言模型在推理任务中的推理能力。尽管它取得了进展,但CoT经常面临验证推理有效性和确保信息性的挑战。为了解决这些局限性,本文介绍了逻辑Agent(LA),这是一个基于代理的框架,旨在通过策略性逻辑规则调用来增强大型语言模型(LLMs)中推理过程的有效性。与传统方法不同,LA将LLMs转化为逻辑代理,动态应用命题逻辑规则,通过将自然语言输入转换为结构化逻辑形式来启动推理过程。逻辑代理利用一套全面预定义的功能系统地导航推理过程。这种方法不仅促进了推理构造的结构化和连贯生成,还显著提高了它们的可解释性和逻辑连贯性。通过广泛的实验,我们展示了LA在各种模型尺寸上有效扩展的能力,显著改善了跨不同任务的复杂推理的精确性。
更新时间: 2024-04-28 10:02:28
领域: cs.AI,cs.CL
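Rule invocation over structured logic forms can be sketched minimally. The toy engine below assumes the natural-language input has already been parsed into atoms and implications (a step the framework above delegates to the LLM) and applies modus ponens to a fixpoint; it is a sketch, not the paper's predefined function set.

```python
def apply_modus_ponens(facts, rules):
    """facts: set of atomic propositions; rules: set of
    (antecedent, consequent) pairs. Repeatedly apply modus ponens
    until no new conclusion can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived

# "It rains. If it rains, the ground gets wet. If the ground is wet,
# it is slippery." -- parsed into structured logic forms:
facts = {"rains"}
rules = {("rains", "wet_ground"), ("wet_ground", "slippery")}
conclusions = apply_modus_ponens(facts, rules)
```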
Stochastic Gradient Descent for Gaussian Processes Done Right
As is well known, both sampling from the posterior and computing the mean of the posterior in Gaussian process regression reduces to solving a large linear system of equations. We study the use of stochastic gradient descent for solving this linear system, and show that when \emph{done right} -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective. To that end, we introduce a particularly simple \emph{stochastic dual descent} algorithm, explain its design in an intuitive manner and illustrate the design choices through a series of ablation studies. Further experiments demonstrate that our new method is highly competitive. In particular, our evaluations on the UCI regression tasks and on Bayesian optimisation set our approach apart from preconditioned conjugate gradients and variational Gaussian process approximations. Moreover, our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
Updated: 2024-04-28 09:48:23
标题: 正确使用随机梯度下降的高斯过程
摘要: 众所周知,从后验中抽样和计算后验均值在高斯过程回归中都归结为解决一个大型线性方程组。我们研究了使用随机梯度下降来解决这个线性系统,并展示了当“正确地”使用特定来自优化和核社区的见解时,随机梯度下降非常有效。为此,我们引入了一种特别简单的“随机对偶下降”算法,以直观的方式解释其设计,并通过一系列消融研究展示设计选择。进一步的实验表明,我们的新方法非常有竞争力。特别是,我们在UCI回归任务和贝叶斯优化上的评估将我们的方法与预处理共轭梯度和变分高斯过程逼近方法区分开。此外,我们的方法使高斯过程回归在分子结合亲和力预测方面与最先进的图神经网络不相上下。
更新时间: 2024-04-28 09:48:23
领域: cs.LG,stat.ML
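The linear system in question is (K + sigma^2 I) alpha = y. A bare-bones version of solving it stochastically looks as follows: descend the dual quadratic 0.5 a^T A a - y^T a by updating a random block of coordinates with its exact partial gradient each step. The paper's stochastic dual descent adds ingredients such as momentum and iterate averaging that this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy GP regression setup: RBF kernel over points on a line, plus noise.
n = 50
x = np.linspace(0, 1, n)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2)
y = np.sin(6 * x) + 0.05 * rng.standard_normal(n)
A = K + 1.0 * np.eye(n)          # kernel matrix plus noise variance

# Stochastic block-coordinate descent on 0.5 * a^T A a - y^T a.
alpha = np.zeros(n)
lr = 1.0 / np.linalg.norm(A, 2)  # safe step for each coordinate block
for _ in range(2000):
    idx = rng.choice(n, size=10, replace=False)
    alpha[idx] -= lr * (A[idx] @ alpha - y[idx])   # exact partial gradient

rel_residual = np.linalg.norm(A @ alpha - y) / np.linalg.norm(y)
```

The posterior mean at the training inputs is then `K @ alpha`, so solving this one system serves both prediction and sampling as the abstract notes.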
Scenario-Adaptive Fine-Grained Personalization Network: Tailoring User Behavior Representation to the Scenario Context
Existing methods often adjust representations adaptively only after aggregating user behavior sequences. This coarse-grained approach to re-weighting the entire user sequence hampers the model's ability to accurately model the user interest migration across different scenarios. To enhance the model's capacity to capture user interests from historical behavior sequences in each scenario, we develop a ranking framework named the Scenario-Adaptive Fine-Grained Personalization Network (SFPNet), which introduces a fine-grained method for multi-scenario personalized recommendation. Specifically, SFPNet comprises a series of blocks, termed Scenario-Tailoring Blocks, stacked sequentially. Each block first deploys a parameter personalization unit to integrate scenario information at a coarse-grained level by redefining fundamental features. Subsequently, we consolidate the scenario-adaptively adjusted feature representations to serve as context information. By employing residual connections, we incorporate this context into the representation of each historical behavior, allowing for context-aware fine-grained customization of the behavior representations at the scenario level, which in turn supports scenario-aware user interest modeling.
Updated: 2024-04-28 09:29:12
标题: 情景自适应细粒度个性化网络:根据情景背景调整用户行为表征
摘要: 现有方法通常只在聚合用户行为序列后才自适应地调整表示。这种对整个用户序列重新加权的粗粒度方法阻碍了模型准确建模用户兴趣在不同场景中迁移的能力。为了增强模型在每个场景中从历史行为序列中捕捉用户兴趣的能力,我们开发了一个名为场景自适应细粒度个性化网络(SFPNet)的排序框架,它为多场景个性化推荐设计了一种细粒度方法。具体来说,SFPNet由一系列称为场景定制块的模块按顺序堆叠而成。每个块首先部署一个参数个性化单元,通过重新定义基本特征在粗粒度水平上集成场景信息。随后,我们整合经场景自适应调整的特征表示,作为上下文信息。通过使用残差连接,我们将此上下文融入每个历史行为的表示中,实现了场景级别上行为表示的上下文感知细粒度定制,从而支持场景感知的用户兴趣建模。
更新时间: 2024-04-28 09:29:12
领域: cs.IR,cs.LG
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics
Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations in state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254x less power.
Updated: 2024-04-28 09:25:44
标题: DaCapo:加速自主系统中视频分析的持续学习
摘要: 深度神经网络(DNN)视频分析对于自动系统(例如自动驾驶车辆、无人机和安全机器人)至关重要。然而,由于计算资源和电池电力有限,真实世界的部署面临挑战。为了应对这些挑战,连续学习利用部署中的轻量级“学生”模型(推理),利用更大的“教师”模型对采样数据进行标记(标记),并持续对学生模型进行重新训练以适应不断变化的场景(重新训练)。本文强调了现有连续学习系统的局限性:(1)它们关注重新训练的计算,却忽视了推理和标记的计算需求,(2)它们依赖耗电量高的GPU,不适用于电池操作的自动系统,(3)它们位于远程集中服务器上,面向多租户场景,同样不适用于自动系统,因为存在隐私、网络可用性和延迟问题。我们提出了一个硬件-算法共同设计的连续学习解决方案DaCapo,使自动系统能够以高性能和高能效的方式执行推理、标记和训练的并发执行。DaCapo包括(1)一个可在各自精度上并行执行内核的空间可分割和精度灵活的加速器,和(2)一个时空资源分配算法,策略性地在资源-精度权衡空间中导航,为资源分配做出最佳决策以实现最大准确性。我们的评估表明,DaCapo比基于GPU的最先进连续学习系统Ekya和EOMU分别实现了6.5%和5.5%更高的准确性,同时消耗的功耗减少了254倍。
更新时间: 2024-04-28 09:25:44
领域: cs.AR,cs.LG,cs.RO
What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning
In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics, distinguishing it from those induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimations from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning. We publicly release code at https://github.com/familyld/InsightFair.
Updated: 2024-04-28 08:49:45
标题: 不公平背后隐藏着什么?探索强化学习中的动态公平性
摘要: 在涉及种族和性别等敏感属性的顺序决策问题中,强化学习(RL)代理必须在最大化回报的同时仔细考虑长期公平性。最近的研究提出了许多不同类型的公平概念,但RL问题中不公平是如何产生的仍不清楚。本文通过因果透镜探究了不平等的根源,以填补文献中的这一空白。我们首先分析了支配数据生成过程的因果关系,并将敏感属性对长期幸福感的影响分解为不同的组成部分。然后,我们引入了一个称为动态公平性的新概念,明确捕捉了源自环境动态的不平等,区分出那些由决策造成的和那些从过去继承而来的。这一概念需要评估在保持其他条件不变的情况下改变敏感属性值所产生的下一个状态和奖励的预期变化。为了定量评估这一反事实概念,我们推导了识别公式,允许我们从数据中获得可靠的估计。大量实验证明了所提出技术在解释、检测和减少强化学习中的不平等方面的有效性。我们在 https://github.com/familyld/InsightFair 上公开发布了代码。
更新时间: 2024-04-28 08:49:45
领域: cs.LG,cs.AI,cs.CY,stat.ME
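The counterfactual quantity behind dynamics fairness above — the expected change in the next state when the sensitive attribute is flipped while everything else is held fixed — can be approximated in a tabular setting with a naive plug-in estimator. This is a sketch only; the paper derives proper identification formulas, and the variable names here are placeholders.

```python
import numpy as np

def dynamics_fairness_gap(states, actions, sensitive, next_states):
    """Plug-in estimate of E[s' | s, a, z=1] - E[s' | s, a, z=0],
    averaged over observed (s, a) cells: the part of the inequality
    attributable to the environment dynamics alone (sketch)."""
    gaps = []
    for s in np.unique(states):
        for a in np.unique(actions):
            m1 = (states == s) & (actions == a) & (sensitive == 1)
            m0 = (states == s) & (actions == a) & (sensitive == 0)
            if m1.any() and m0.any():
                gaps.append(next_states[m1].mean() - next_states[m0].mean())
    return float(np.mean(gaps))

rng = np.random.default_rng(4)
n = 20000
states = rng.integers(0, 3, n)
actions = rng.integers(0, 2, n)
sensitive = rng.integers(0, 2, n)
# Unfair dynamics: the sensitive attribute shifts the next state by +0.5.
next_states = states + actions + 0.5 * sensitive + 0.1 * rng.standard_normal(n)

gap = dynamics_fairness_gap(states, actions, sensitive, next_states)
```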
Self-Supervised Temporal Graph learning with Temporal and Structural Intensity Alignment
Temporal graph learning aims to generate high-quality representations for graph-based tasks with dynamic information, which has recently garnered increasing attention. In contrast to static graphs, temporal graphs are typically organized as node interaction sequences over continuous time rather than an adjacency matrix. Most temporal graph learning methods model current interactions by incorporating historical neighborhood. However, such methods only consider first-order temporal information while disregarding crucial high-order structural information, resulting in suboptimal performance. To address this issue, we propose a self-supervised method called S2T for temporal graph learning, which extracts both temporal and structural information to learn more informative node representations. Notably, the initial node representations combine first-order temporal and high-order structural information differently to calculate two conditional intensities. An alignment loss is then introduced to optimize the node representations, narrowing the gap between the two intensities and making them more informative. Concretely, in addition to modeling temporal information using historical neighbor sequences, we further consider structural knowledge at both local and global levels. At the local level, we generate structural intensity by aggregating features from high-order neighbor sequences. At the global level, a global representation is generated based on all nodes to adjust the structural intensity according to the active statuses on different nodes. Extensive experiments demonstrate that the proposed model S2T achieves at most 10.13% performance improvement compared with the state-of-the-art competitors on several datasets.
Updated: 2024-04-28 08:49:34
标题: 自监督的时间图学习:时间和结构强度对齐
摘要: 时间图学习旨在为具有动态信息的基于图的任务生成高质量的表示,最近引起了越来越多的关注。与静态图相比,时间图通常以节点交互序列的形式组织,而不是邻接矩阵。大多数时间图学习方法通过整合历史邻域来建模当前交互。然而,这些方法只考虑了一阶时间信息,而忽略了关键的高阶结构信息,导致性能不佳。为了解决这个问题,我们提出了一种自监督方法称为S2T用于时间图学习,它提取了时间和结构信息以学习更具信息量的节点表示。值得注意的是,初始节点表示以不同方式结合一阶时间和高阶结构信息来计算两个条件强度。然后引入一种对齐损失来优化节点表示,缩小两个强度之间的差距,使它们更具信息量。具体而言,除了使用历史邻居序列建模时间信息外,我们还在本地和全局层面考虑结构知识。在本地层面,我们通过聚合来自高阶邻居序列的特征来生成结构强度。在全局层面,基于所有节点生成一个全局表示,以根据不同节点上的活动状态调整结构强度。大量实验证明,所提出的模型S2T在几个数据集上相比最先进的竞争对手最多提高了10.13%的性能。
更新时间: 2024-04-28 08:49:34
领域: cs.LG,cs.AI,cs.SI
GARA: A novel approach to Improve Genetic Algorithms' Accuracy and Efficiency by Utilizing Relationships among Genes
Genetic algorithms (GAs) have played an important role in engineering optimization. Traditional GAs treat each gene separately. However, biophysical studies of gene regulatory networks have revealed direct associations between different genes. This inspires us to propose an improvement to the GA in this paper, the Gene Regulatory Genetic Algorithm (GRGA), which, to the best of our knowledge, is the first to utilize relationships among genes to improve the GA's accuracy and efficiency. We design a directed multipartite graph encapsulating the solution space, called RGGR, where each node corresponds to a gene in the solution and each edge represents the relationship between adjacent nodes. An edge's weight reflects the degree of the relationship and is updated based on the idea that the weights of the edges along a complete chain, i.e., a candidate solution, should be strengthened when the solution's performance is acceptable and reduced when it is not. The obtained RGGR is then employed to determine appropriate loci for the crossover and mutation operators, thereby directing the evolutionary process toward faster and better convergence. We analyze and validate our proposed GRGA approach on a single-objective multimodal optimization problem, and further test it on three types of applications, including feature selection, text summarization, and dimensionality reduction. Results illustrate that our GRGA is effective and promising.
Updated: 2024-04-28 08:33:39
标题: GARA:利用基因之间的关系来提高遗传算法的准确性和效率的新方法
摘要: 遗传算法在工程优化中发挥着重要作用。传统的遗传算法将每个基因单独处理。然而,基因调控网络的生物物理研究揭示了不同基因之间的直接关联。这启发我们在本文中提出一种改进遗传算法的方法,称为基因调控遗传算法(GRGA),据我们所知,这是首次利用基因之间的关系来提高遗传算法的准确性和效率。我们设计了一个封装解空间的有向多部图,称为RGGR,其中每个节点对应于解中的一个基因,边代表相邻节点之间的关系。边的权重反映了关系程度,并根据以下思想更新:对于作为候选解的完整链,若其性能可接受,则其各边的权重应加强,否则应减少。然后使用获得的RGGR来确定交叉和突变算子的适当位点,从而引导进化过程朝着更快和更好的收敛方向发展。我们在一个单目标多模态优化问题上分析和验证了我们提出的GRGA方法,并进一步在三类应用中进行了测试,包括特征选择、文本摘要和降维。结果表明我们的GRGA是有效且有前景的。
更新时间: 2024-04-28 08:33:39
领域: cs.NE,cs.AI
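A toy rendering of the RGGR idea above: keep a weight for each edge between value choices at adjacent loci, strengthen the edges of acceptable candidate chains, weaken those of unacceptable ones, and cut crossovers at the weakest edge so strongly related genes travel together. The data structures and update rule here are simplifications of the paper's multipartite graph, not its actual design.

```python
import numpy as np

n_genes, n_values = 6, 4
# One weight per edge between value choices at adjacent loci (RGGR sketch).
w = np.ones((n_genes - 1, n_values, n_values))

def update_chain(solution, acceptable, rate=0.3):
    """Strengthen (or weaken) every edge along a candidate solution's chain."""
    for i in range(n_genes - 1):
        w[i, solution[i], solution[i + 1]] *= (1 + rate) if acceptable else (1 - rate)

def crossover_locus(solution):
    """Cut the chain at its weakest edge, so strongly related adjacent
    genes tend to stay together through crossover."""
    strengths = [w[i, solution[i], solution[i + 1]] for i in range(n_genes - 1)]
    return int(np.argmin(strengths)) + 1

good = [0, 1, 2, 3, 0, 1]
for _ in range(5):
    update_chain(good, acceptable=True)    # reinforce a well-performing chain
bad = [0, 1, 3, 3, 0, 1]
update_chain(bad, acceptable=False)        # weaken a poorly performing one

locus = crossover_locus(bad)               # cut at the weakest relationship
```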
Tackling Noisy Labels with Network Parameter Additive Decomposition
Given data with noisy labels, over-parameterized deep networks suffer from overfitting the mislabeled data, resulting in poor generalization. The memorization effect of deep networks shows that although the networks have the ability to memorize all noisy data, they would first memorize clean training data, and then gradually memorize mislabeled training data. A simple and effective method that exploits the memorization effect to combat noisy labels is early stopping. However, early stopping cannot distinguish the memorization of clean data from that of mislabeled data, so the network still inevitably overfits mislabeled data in the early training stage. In this paper, to decouple the memorization of clean data and mislabeled data, and further reduce the side effect of mislabeled data, we perform additive decomposition on network parameters. Namely, all parameters are additively decomposed into two groups, i.e., parameters $\mathbf{w}$ are decomposed as $\mathbf{w}=\bm{\sigma}+\bm{\gamma}$. Afterward, the parameters $\bm{\sigma}$ are considered to memorize clean data, while the parameters $\bm{\gamma}$ are considered to memorize mislabeled data. Benefiting from the memorization effect, the updates of the parameters $\bm{\sigma}$ are encouraged to fully memorize clean data in early training, and then discouraged as training epochs increase to reduce interference from mislabeled data. The updates of the parameters $\bm{\gamma}$ are the opposite. In testing, only the parameters $\bm{\sigma}$ are employed to enhance generalization. Extensive experiments on both simulated and real-world benchmarks confirm the superior performance of our method.
Updated: 2024-04-28 08:29:21
标题: 使用网络参数附加分解解决嘈杂标签问题
摘要: 鉴于带有嘈杂标签的数据,过度参数化的深度网络容易过拟合错误标记的数据,导致泛化能力不佳。深度网络的记忆效应表明,尽管网络有能力记忆所有的嘈杂数据,但它们首先会记忆干净的训练数据,然后逐渐记忆错误标记的训练数据。一种简单有效的方法利用记忆效应来对抗嘈杂标签是早停止。然而,早停止无法区分干净数据和错误标记数据的记忆,导致网络在早期训练阶段仍然不可避免地过拟合错误标记的数据。本文针对这一问题,为了解耦干净数据和错误标记数据的记忆,并进一步减少错误标记数据的副作用,我们对网络参数进行了加性分解。即,所有参数被加性地分解为两组,即参数$\mathbf{w}$被分解为$\mathbf{w}=\bm{\sigma}+\bm{\gamma}$。随后,参数$\bm{\sigma}$被视为记忆干净数据,而参数$\bm{\gamma}$被视为记忆错误标记数据。受益于记忆效应,参数$\bm{\sigma}$的更新在早期训练中被鼓励充分记忆干净数据,然后随着训练轮数的增加而受到抑制,以减少错误标记数据的干扰。参数$\bm{\gamma}$的更新则相反。在测试中,只有参数$\bm{\sigma}$被用来增强泛化能力。对模拟和真实世界基准数据的大量实验证实了我们方法的卓越性能。
更新时间: 2024-04-28 08:29:21
领域: cs.LG
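The scheduling mechanics described above — sigma absorbing early updates, gamma absorbing late ones, and only sigma being used at test time — can be shown on a toy 1-D regression with flipped labels. Note this sketch illustrates only the opposite learning-rate schedules, not the memorization effect itself, which requires over-parameterized networks; all constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy 1-D linear model y = w * x with 20% flipped labels. Decompose
# w = sigma + gamma and schedule the two learning rates in opposite
# directions: sigma absorbs the early updates, gamma the late ones.
n = 500
x = rng.standard_normal(n)
clean = rng.uniform(size=n) < 0.8
y = np.where(clean, 2.0 * x, -5.0 * x)

sigma, gamma = 0.0, 0.0
epochs = 100
for epoch in range(epochs):
    t = epoch / epochs
    lr_sigma = 0.05 * (1 - t)   # sigma: learns early, frozen late
    lr_gamma = 0.05 * t         # gamma: frozen early, learns late
    grad = 2 * np.mean(((sigma + gamma) * x - y) * x)
    sigma -= lr_sigma * grad
    gamma -= lr_gamma * grad

# At test time only sigma would be used as the model weight.
```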
Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models
Massively multilingual machine translation models allow for the translation of a large number of languages with a single model, but have limited performance on low- and very-low-resource translation directions. Pivoting via high-resource languages remains a strong strategy for low-resource directions, and in this paper we revisit ways of pivoting through multiple languages. Previous work has used a simple averaging of probability distributions from multiple paths, but we find that this performs worse than using a single pivot, and exacerbates the hallucination problem because the same hallucinations can be probable across different paths. We also propose MaxEns, a novel combination strategy that makes the output biased towards the most confident predictions, hypothesising that confident predictions are less prone to be hallucinations. We evaluate different strategies on the FLORES benchmark for 20 low-resource language directions, demonstrating that MaxEns improves translation quality for low-resource languages while reducing hallucination in translations, compared to both direct translation and an averaging approach. On average, multi-pivot strategies still lag behind using English as a single pivot language, raising the question of how to identify the best pivoting strategy for a given translation direction.
Updated: 2024-04-28 08:26:11
标题: 基于大规模多语言机器翻译模型的多枢纽集成研究
摘要: 大规模多语言机器翻译模型允许使用单个模型翻译大量语言,但在低资源和极低资源翻译方向上性能有限。通过高资源语言进行枢纽翻译仍然是低资源方向的强大策略,在本文中,我们重新审视了通过多种语言进行枢纽翻译的方式。先前的工作对来自多条路径的概率分布进行简单平均,但我们发现这种方法效果不如使用单个枢纽,并且加剧了幻觉问题,因为相同的幻觉可能在不同的路径上都具有较高概率。我们还提出了MaxEns,一种新颖的组合策略,使输出偏向于最有信心的预测,其假设是有信心的预测不太容易产生幻觉。我们在FLORES基准测试的20个低资源语言方向上评估了不同策略,表明与直接翻译和平均方法相比,MaxEns提高了低资源语言的翻译质量,同时减少了翻译中的幻觉。平均而言,多枢纽策略仍然落后于使用英语作为单一枢纽语言,这引发了如何为给定翻译方向确定最佳枢纽策略的问题。
更新时间: 2024-04-28 08:26:11
领域: cs.CL,cs.AI,cs.LG
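One plausible reading of the MaxEns idea above is sketched below: per decoding step, prefer the pivot distribution with the highest maximum probability instead of averaging, since averaging can let the same hallucination accumulate mass across paths. The exact combination rule in the paper may differ; the selection rule here is an assumption.

```python
import numpy as np

def max_ens(distributions):
    """Sketch of a confidence-biased combination: select, per token, the
    pivot distribution whose maximum probability is highest."""
    dists = np.asarray(distributions)          # (n_pivots, vocab)
    best = int(np.argmax(dists.max(axis=1)))
    return dists[best]

def mean_ens(distributions):
    """The averaging baseline the paper finds worse than a single pivot."""
    return np.asarray(distributions).mean(axis=0)

# Two pivots weakly agree on a hallucinated token 0; a third pivot is
# very confident about the correct token 2.
p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.5, 0.3, 0.2])
p3 = np.array([0.2, 0.1, 0.7])

avg_pick = int(np.argmax(mean_ens([p1, p2, p3])))   # shared hallucination wins
max_pick = int(np.argmax(max_ens([p1, p2, p3])))    # confident pivot wins
```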
When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour
Large Language Models have demonstrated the ability to solve complex tasks by delivering answers that are positively evaluated by humans, due in part to the intensive use of human feedback to refine responses. However, the suggestibility transmitted through human feedback increases the inclination to produce responses that conform to the user's beliefs or misleading prompts rather than to true facts, a behaviour known as sycophancy. This phenomenon increases bias and reduces robustness and, consequently, reliability. In this paper, we shed light on the suggestibility of Large Language Models (LLMs) to sycophantic behaviour, demonstrating these tendencies via human-influenced prompts over different tasks. Our investigation reveals that LLMs show sycophantic tendencies when responding to queries involving subjective opinions and statements that should elicit a contrary response based on facts. In contrast, when confronted with mathematical tasks or queries that have an objective answer, these models at various scales seem not to follow the users' hints and demonstrate confidence in delivering the correct answers.
Updated: 2024-04-28 08:06:06
标题: 当大型语言模型与人类相矛盾时?大型语言模型的谄媚行为
摘要: 大型语言模型已经展示出通过提供得到人类积极评价的答案来解决复杂任务的能力,部分原因是大量使用人类反馈来改进响应。然而,通过人类反馈传递的易受暗示性增加了产生与用户信念或误导性提示相符、而非符合真实事实的响应的倾向,这种行为被称为谄媚(sycophancy)。这种现象加剧了偏见,降低了鲁棒性,进而降低了可靠性。在本文中,我们揭示了大型语言模型(LLMs)对谄媚行为的易受暗示性,通过受人类影响的提示在不同任务上展示这些倾向。我们的调查揭示,当回答涉及主观观点以及按事实本应引起相反回应的陈述的查询时,LLMs显示出谄媚倾向。相反,当面对数学任务或具有客观答案的查询时,这些不同规模的模型似乎不会遵循用户的暗示,而是表现出对给出正确答案的信心。
更新时间: 2024-04-28 08:06:06
领域: cs.CL,cs.AI
Attack on Scene Flow using Point Clouds
Deep neural networks have made significant advancements in accurately estimating scene flow using point clouds, which is vital for many applications like video analysis, action recognition, and navigation. Robustness of these techniques, however, remains a concern, particularly in the face of adversarial attacks that have been proven to deceive state-of-the-art deep neural networks in many domains. Surprisingly, the robustness of scene flow networks against such attacks has not been thoroughly investigated. To address this problem, the proposed approach aims to bridge this gap by introducing adversarial white-box attacks specifically tailored for scene flow networks. Experimental results show that the generated adversarial examples achieve up to a 33.7% relative degradation in average end-point error on the KITTI and FlyingThings3D datasets. The study also reveals the significant impact that attacks targeting point clouds in only one dimension or color channel have on the average end-point error. Analyzing the successes and failures of these attacks on the scene flow networks and their 2D optical flow network variants shows a higher vulnerability for the optical flow networks.
Updated: 2024-04-28 08:05:55
标题: 使用点云攻击场景流
摘要: 深度神经网络在准确估计场景流动方面取得了重大进展,使用点云对此进行估计对于许多应用非常重要,如视频分析、动作识别和导航。然而,这些技术的鲁棒性仍然是一个问题,特别是面对已被证明能够欺骗最先进的深度神经网络的对抗性攻击时。令人惊讶的是,场景流动网络对这种攻击的鲁棒性尚未得到彻底调查。为了解决这个问题,提出的方法旨在通过引入专门针对场景流动网络的白盒对抗攻击来弥合这一差距。实验结果表明,生成的对抗性示例在KITTI和FlyingThings3D数据集上的平均端点误差相对降低高达33.7%。该研究还揭示了针对点云仅在一个维度或颜色通道的攻击对平均端点误差的显著影响。分析这些攻击在场景流动网络和它们的2D光流网络变体上的成功与失败显示了光流网络更容易受到攻击。
更新时间: 2024-04-28 08:05:55
领域: cs.CV,cs.LG,cs.MM
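A one-step white-box perturbation restricted to a single coordinate axis, in the spirit of the single-dimension attacks above, can be sketched as follows. The gradient here is a random stand-in; in practice it comes from backpropagating the scene flow loss through the network, and the FGSM-style update is an illustrative choice, not necessarily the paper's exact attack.

```python
import numpy as np

def fgsm_one_axis(points, grad, eps, axis=2):
    """One-step sign-gradient perturbation restricted to one coordinate
    axis of an (N, 3) point cloud (a sketch of a single-dimension attack)."""
    delta = np.zeros_like(points)
    delta[:, axis] = eps * np.sign(grad[:, axis])
    return points + delta

rng = np.random.default_rng(7)
pts = rng.standard_normal((1024, 3))
grad = rng.standard_normal((1024, 3))   # stand-in for dLoss/dPoints
adv = fgsm_one_axis(pts, grad, eps=0.05)

max_shift = np.abs(adv - pts).max(axis=0)   # per-axis maximum displacement
```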
Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning
Food classification is the foundation for developing food vision tasks and plays a key role in the burgeoning field of computational nutrition. Due to the complexity of food requiring fine-grained classification, recent academic research mainly modifies Convolutional Neural Networks (CNNs) and/or Vision Transformers (ViTs) to perform food category classification. However, to learn fine-grained features, the CNN backbone needs additional structural design, whereas ViT, containing the self-attention module, has increased computational complexity. In recent months, a new state space model that extends the Structured State Space sequence model (S4) with a selection mechanism and scan-based computation (S6), colloquially termed Mamba, has demonstrated superior performance and computation efficiency compared to the Transformer architecture. The VMamba model, which incorporates the Mamba mechanism into image tasks (such as classification), currently establishes the state-of-the-art (SOTA) on the ImageNet dataset. In this research, we introduce an academically underestimated food dataset CNFOOD-241, and pioneer the integration of a residual learning framework within the VMamba model to concurrently harness both global and local state features inherent in the original VMamba architectural design. The research results show that VMamba surpasses current SOTA models in fine-grained and food classification. The proposed Res-VMamba further improves the classification accuracy to 79.54\% without pretrained weights. Our findings elucidate that our proposed methodology establishes a new benchmark for SOTA performance in food recognition on the CNFOOD-241 dataset. The code can be obtained on GitHub: https://github.com/ChiShengChen/ResVMamba.
Updated: 2024-04-28 08:04:53
标题: Res-VMamba:使用具有深度残差学习的选择性状态空间模型进行细粒度食品类别视觉分类
摘要: 食物分类是开发食物视觉任务的基础,并在新兴的计算营养学领域发挥关键作用。由于食物的复杂性需要细粒度分类,最近的学术研究主要修改卷积神经网络(CNNs)和/或视觉变换器(ViTs)来进行食物类别分类。然而,为了学习细粒度特征,CNN骨干需要额外的结构设计,而包含自注意力模块的ViT则计算复杂度更高。最近几个月,一种在结构化状态空间(S4)模型基础上加入选择机制和扫描计算的新模型(S6),俗称Mamba,相比Transformer架构表现出更优的性能和计算效率。将Mamba机制整合到图像任务(如分类)中的VMamba模型目前在ImageNet数据集上达到了最新技术水平(SOTA)。在这项研究中,我们介绍了一个在学术上被低估的食物数据集CNFOOD-241,并开创性地在VMamba模型中引入了残差学习框架,以同时利用原始VMamba架构设计中固有的全局和局部状态特征。研究结果表明,VMamba在细粒度和食物分类方面超越了当前的SOTA模型。提出的Res-VMamba在不使用预训练权重的情况下将分类准确率进一步提高到79.54%。我们的研究结果阐明,我们提出的方法在CNFOOD-241数据集上为食物识别的SOTA性能建立了新基准。代码可以在GitHub上获取:https://github.com/ChiShengChen/ResVMamba。
更新时间: 2024-04-28 08:04:53
领域: cs.CV,cs.AI
Multi-stream Transmission for Directional Modulation Network via Distributed Multi-UAV-aided Multi-active-IRS
Active intelligent reflecting surfaces (IRSs) are a revolutionary technique for future 6G networks. Conventional far-field single-IRS-aided directional modulation (DM) networks have only one (no direct path) or two (with a direct path) degrees of freedom (DoFs). This means that only one or two streams can be transmitted simultaneously from the base station to the user, which severely limits the rate gain achieved by the IRS. How can more than two DoFs be created for DM? In this paper, a single large-scale IRS is divided into multiple small IRSs, and a novel multi-IRS-aided multi-stream DM network is proposed to achieve point-to-point multi-stream transmission by creating $K$ ($\geq3$) DoFs, where the multiple small IRSs are placed distributively via multiple unmanned aerial vehicles (UAVs). Null-space projection, zero-forcing (ZF), and phase alignment are adopted to design the transmit beamforming vectors, receive beamforming vectors, and phase shift matrices (PSMs), respectively, in a scheme called NSP-ZF-PA. Here, the $K$ PSMs and their corresponding beamforming vectors are independently optimized. A weighted minimum mean-square error (WMMSE) algorithm, named WMMSE-PC, alternately iterates over the optimization variables while enforcing the power constraint on the IRS, where the majorization-minimization (MM) algorithm is used to solve for the overall PSM. To achieve lower computational complexity, a maximum-trace method, called Max-TR-SVD, is proposed to optimize the PSMs of all IRSs. Numerical simulation results have shown that the proposed NSP-ZF-PA performs much better than Max-TR-SVD in terms of rate. In particular, the rate of NSP-ZF-PA with sixteen small IRSs is about five times that of NSP-ZF-PA with all small IRSs combined into a single large IRS. Thus, a dramatic rate enhancement may be achieved by multiple distributed IRSs.
Updated: 2024-04-28 07:58:27
标题: 通过分布式多无人机辅助多有源IRS实现定向调制网络的多流传输
摘要: 主动智能反射表面(IRS)是未来6G网络的一项革命性技术。传统的远场单IRS辅助定向调制(DM)网络只有一个(无直接路径)或两个(现有直接路径)自由度(DoFs)。这意味着基站到用户只能同时传输一个或两个数据流,严重限制了通过IRS实现的速率增益。如何为DM创建多于两个的多个DoFs?本文将单一大规模IRS分割为多个小IRS,并提出了一种新颖的多IRS辅助多流DM网络,通过通过多个无人机(UAV)分布放置多个小IRS,实现点对点的多流传输,从而创建K(≥3)个DoFs。采用零空间投影、零强制(ZF)和相位对齐来设计发射波束形成矢量、接收波束形成矢量和相移矩阵(PSM),称为NSP-ZF-PA。在此过程中,K个PSM及其对应的波束形成矢量是独立优化的。加权最小均方误差(WMMSE)算法通过引入IRS功率约束,即WMMSE-PC,参与交替迭代以优化变量,其中使用主导极小化(MM)算法来解决总PSM。为了实现较低的计算复杂性,提出了一种最大迹方法,称为Max-TR-SVD,通过优化所有IRS的PSM。数值模拟结果表明,所提出的NSP-ZF-PA在速率方面表现比Max-TR-SVD好得多。特别是,具有十六个小IRS的NSP-ZF-PA的速率约为将所有小IRS组合为单个大IRS的NSP-ZF-PA的五倍。因此,通过多个分布式IRS可以实现显著的速率增强。
更新时间: 2024-04-28 07:58:27
领域: eess.SP,cs.IT,cs.LG,math.IT
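One ingredient of the NSP-ZF-PA design above, phase alignment, is easy to illustrate: choosing each IRS element's phase shift to cancel the phase of its cascaded channel makes the reflected paths add coherently at the receiver. The sketch below uses a random scalar channel per element and ignores the beamforming vectors and multi-IRS structure of the full scheme.

```python
import numpy as np

rng = np.random.default_rng(8)

# 64 IRS elements with random cascaded channel coefficients h. Setting
# each unit-modulus phase shift to exp(-j * angle(h)) cancels the channel
# phase, so the reflected contributions add coherently.
h = rng.standard_normal(64) + 1j * rng.standard_normal(64)
theta = np.exp(-1j * np.angle(h))           # unit-modulus phase shifts

coherent_gain = np.abs(np.sum(h * theta))   # equals sum of |h| elementwise
random_phases = np.exp(1j * rng.uniform(0, 2 * np.pi, 64))
random_gain = np.abs(np.sum(h * random_phases))
```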
Advancing Supervised Learning with the Wave Loss Function: A Robust and Smooth Approach
Loss function plays a vital role in supervised learning frameworks. The selection of the appropriate loss function holds the potential to have a substantial impact on the proficiency attained by the acquired model. The training of supervised learning algorithms inherently adheres to predetermined loss functions during the optimization process. In this paper, we present a novel contribution to the realm of supervised machine learning: an asymmetric loss function named wave loss. It exhibits robustness against outliers, insensitivity to noise, boundedness, and a crucial smoothness property. Theoretically, we establish that the proposed wave loss function manifests the essential characteristic of being classification-calibrated. Leveraging this breakthrough, we incorporate the proposed wave loss function into the least squares setting of support vector machines (SVM) and twin support vector machines (TSVM), resulting in two robust and smooth models termed Wave-SVM and Wave-TSVM, respectively. To address the optimization problem inherent in Wave-SVM, we utilize the adaptive moment estimation (Adam) algorithm. It is noteworthy that this paper marks the first instance of the Adam algorithm application to solve an SVM model. Further, we devise an iterative algorithm to solve the optimization problems of Wave-TSVM. To empirically showcase the effectiveness of the proposed Wave-SVM and Wave-TSVM, we evaluate them on benchmark UCI and KEEL datasets (with and without feature noise) from diverse domains. Moreover, to exemplify the applicability of Wave-SVM in the biomedical domain, we evaluate it on the Alzheimer Disease Neuroimaging Initiative (ADNI) dataset. The experimental outcomes unequivocally reveal the prowess of Wave-SVM and Wave-TSVM in achieving superior prediction accuracy against the baseline models.
Updated: 2024-04-28 07:32:00
标题: 推进监督学习与Wave损失函数:一种稳健且平滑的方法
摘要: 损失函数在监督学习框架中起着至关重要的作用。选择适当的损失函数可能对所获得模型的效率产生重大影响。在优化过程中,监督学习算法的训练固有地遵循预先确定的损失函数。本文介绍了一个对监督机器学习领域的创新贡献:一种名为波损失的非对称损失函数。它对异常值具有鲁棒性,对噪音不敏感,有界性,并且具有重要的平滑性质。从理论上讲,我们证明了所提出的波损失函数体现了分类校准的基本特征。利用这一突破,我们将所提出的波损失函数纳入支持向量机(SVM)和双支持向量机(TSVM)的最小二乘设置中,分别得到两个稳健且平滑的模型,分别称为Wave-SVM和Wave-TSVM。为了解决Wave-SVM中固有的优化问题,我们利用了自适应矩估计(Adam)算法。值得注意的是,本文标志着Adam算法首次应用于解决SVM模型的实例。此外,我们设计了一个迭代算法来解决Wave-TSVM的优化问题。为了实证展示所提出的Wave-SVM和Wave-TSVM的有效性,我们在来自不同领域的基准UCI和KEEL数据集上评估它们(包括有和无特征噪声)。此外,为了展示Wave-SVM在生物医学领域的适用性,我们在阿尔茨海默病神经影像计划(ADNI)数据集上对其进行评估。实验结果明确展示了Wave-SVM和Wave-TSVM在预测准确性上优于基准模型的实力。
更新时间: 2024-04-28 07:32:00
领域: cs.LG
FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection
Existing benchmarks for fake news detection have significantly contributed to the advancement of models in assessing the authenticity of news content. However, these benchmarks typically focus solely on news pertaining to a single semantic topic or originating from a single platform, thereby failing to capture the diversity of multi-domain news in real scenarios. In order to understand fake news across various domains, the external knowledge and fine-grained annotations are indispensable to provide precise evidence and uncover the diverse underlying strategies for fabrication, which are also ignored by existing benchmarks. To address this gap, we introduce a novel multi-domain knowledge-enhanced benchmark with fine-grained annotations, named \textbf{FineFake}. FineFake encompasses 16,909 data samples spanning six semantic topics and eight platforms. Each news item is enriched with multi-modal content, potential social context, semi-manually verified common knowledge, and fine-grained annotations that surpass conventional binary labels. Furthermore, we formulate three challenging tasks based on FineFake and propose a knowledge-enhanced domain adaptation network. Extensive experiments are conducted on FineFake under various scenarios, providing accurate and reliable benchmarks for future endeavors. The entire FineFake project is publicly accessible as an open-source repository at \url{https://github.com/Accuser907/FineFake}.
Updated: 2024-04-28 07:26:08
Domains: cs.CL,cs.AI,cs.MM
Multi-Task Learning in Natural Language Processing: An Overview
Deep learning approaches have achieved great success in the field of Natural Language Processing (NLP). However, directly training deep neural models often suffers from overfitting and data scarcity problems that are pervasive in NLP tasks. In recent years, Multi-Task Learning (MTL), which can leverage useful information from related tasks to achieve simultaneous performance improvements on those tasks, has been used to handle these problems. In this paper, we give an overview of the use of MTL in NLP tasks. We first review MTL architectures used in NLP tasks and categorize them into four classes: parallel architecture, hierarchical architecture, modular architecture, and generative adversarial architecture. Then we present optimization techniques for loss construction, gradient regularization, data sampling, and task scheduling to properly train a multi-task model. After presenting applications of MTL in a variety of NLP tasks, we introduce some benchmark datasets. Finally, we conclude and discuss several possible research directions in this field.
Updated: 2024-04-28 07:25:45
Domains: cs.AI
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Conventional text-to-speech (TTS) research has predominantly focused on enhancing the quality of synthesized speech for speakers in the training dataset. The challenge of synthesizing lifelike speech for unseen, out-of-dataset speakers, especially those with limited reference data, remains a significant and unresolved problem. While zero-shot or few-shot speaker-adaptive TTS approaches have been explored, they have many limitations. Zero-shot approaches tend to suffer from insufficient generalization performance to reproduce the voice of speakers with heavy accents. While few-shot methods can reproduce highly varying accents, they bring a significant storage burden and the risk of overfitting and catastrophic forgetting. In addition, prior approaches only provide either zero-shot or few-shot adaptation, constraining their utility across varied real-world scenarios with different demands. Besides, most current evaluations of speaker-adaptive TTS are conducted only on datasets of native speakers, inadvertently neglecting a vast portion of non-native speakers with diverse accents. Our proposed framework unifies both zero-shot and few-shot speaker adaptation strategies, which we term as "instant" and "fine-grained" adaptations based on their merits. To alleviate the insufficient generalization performance observed in zero-shot speaker adaptation, we designed two innovative discriminators and introduced a memory mechanism for the speech decoder. To prevent catastrophic forgetting and reduce storage implications for few-shot speaker adaptation, we designed two adapters and a unique adaptation procedure.
Updated: 2024-04-28 06:50:55
Domains: cs.SD,cs.AI,cs.CL
Synthetic Lagrangian Turbulence by Generative Diffusion Models
Lagrangian turbulence lies at the core of numerous applied and fundamental problems related to the physics of dispersion and mixing in engineering, bio-fluids, atmosphere, oceans, and astrophysics. Despite exceptional theoretical, numerical, and experimental efforts conducted over the past thirty years, no existing models are capable of faithfully reproducing statistical and topological properties exhibited by particle trajectories in turbulence. We propose a machine learning approach, based on a state-of-the-art diffusion model, to generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers, thereby bypassing the need for direct numerical simulations or experiments to obtain reliable Lagrangian data. Our model demonstrates the ability to reproduce most statistical benchmarks across time scales, including the fat-tail distribution for velocity increments, the anomalous power law, and the increased intermittency around the dissipative scale. Slight deviations are observed below the dissipative scale, particularly in the acceleration and flatness statistics. Surprisingly, the model exhibits strong generalizability for extreme events, producing events of higher intensity and rarity that still match the realistic statistics. This paves the way for producing synthetic high-quality datasets for pre-training various downstream applications of Lagrangian turbulence.
Updated: 2024-04-28 06:44:33
Domains: physics.flu-dyn,cond-mat.stat-mech,cs.CE,cs.LG,nlin.CD
A Novel Classification of Attacks on Blockchain Layers: Vulnerabilities, Attacks, Mitigations, and Research Directions
The widespread adoption of blockchain technology has amplified the spectrum of potential threats to its integrity and security. The ongoing quest to exploit vulnerabilities emphasizes how critical it is to expand on current research initiatives. Thus, using a methodology based on discrete blockchain layers, our survey study aims to broaden the existing body of knowledge by thoroughly discussing both new and known attack vectors inside the blockchain ecosystem. This survey proposes a novel classification of blockchain attacks and an in-depth investigation of blockchain data security. In particular, the paper provides a thorough discussion of the attack techniques and vulnerabilities that are specific to each tier, along with a detailed look at mitigating techniques. We reveal the deep dynamics of these security concerns by closely investigating the fundamental causes of attacks at various blockchain tiers. We clarify mitigation methods for known vulnerabilities and offer new information on recently developed attack vectors. We also discuss the implications of quantum computing in blockchain and the weaknesses in the current technology that can be exploited in the future. Our study advances the field of blockchain security and privacy research while also contributing to our understanding of blockchain vulnerabilities and attacks. This survey paper is a useful tool for readers who want to learn more about the intricacies of blockchain security. It also invites researchers to help strengthen blockchain privacy and security, paving the way for further developments in this dynamic and ever-evolving field.
Updated: 2024-04-28 06:40:50
Domains: cs.CR
Online, Target-Free LiDAR-Camera Extrinsic Calibration via Cross-Modal Mask Matching
LiDAR-camera extrinsic calibration (LCEC) is crucial for data fusion in intelligent vehicles. Offline, target-based approaches have long been the preferred choice in this field. However, they often demonstrate poor adaptability to real-world environments. This is largely because extrinsic parameters may change significantly due to moderate shocks or during extended operations in environments with vibrations. In contrast, online, target-free approaches provide greater adaptability yet typically lack robustness, primarily due to the challenges in cross-modal feature matching. Therefore, in this article, we unleash the full potential of large vision models (LVMs), which are emerging as a significant trend in the fields of computer vision and robotics, especially for embodied artificial intelligence, to achieve robust and accurate online, target-free LCEC across a variety of challenging scenarios. Our main contributions are threefold: we introduce a novel framework known as MIAS-LCEC, provide an open-source versatile calibration toolbox with an interactive visualization interface, and publish three real-world datasets captured from various indoor and outdoor environments. The cornerstone of our framework and toolbox is the cross-modal mask matching (C3M) algorithm, developed based on a state-of-the-art (SoTA) LVM and capable of generating sufficient and reliable matches. Extensive experiments conducted on these real-world datasets demonstrate the robustness of our approach and its superior performance compared to SoTA methods, particularly for the solid-state LiDARs with super-wide fields of view.
Updated: 2024-04-28 06:25:56
Domains: cs.RO,cs.AI,cs.CV
Cyber Security in Containerization Platforms: A Comparative Study of Security Challenges, Measures and Best Practices
The paper presents a comparative study of security measures, challenges, and best practices with a view to enhancing cyber safety on containerized platforms. This review is intended to give insight into the security posture of containerized environments by examining safety vulnerabilities in containerization platforms, exploring strategies for increasing container isolation, and assessing the important role encryption techniques play in securing applications. The paper also provides practical guidance for organizations seeking to strengthen their cyber security defenses on containerization platforms.
Updated: 2024-04-28 06:22:25
Domains: cs.CR
ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze
MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost MCTS-based algorithms. Specifically, we propose a new scheme that simplifies data collecting and reanalyzing, which significantly reduces the search cost while guaranteeing performance as well. Furthermore, to accelerate each search process, we conceive a method to reuse the subsequent information in the trajectory. The corresponding analysis conducted on the bandit model also provides auxiliary theoretical substantiation for our design. Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.
Updated: 2024-04-28 06:21:04
Domains: cs.AI
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts. To further explore and enhance LLMs' potential in music composition by leveraging their reasoning ability and the large knowledge base in music history and theory, we propose ComposerX, an agent-based symbolic music generation framework. We find that applying a multi-agent approach significantly improves the music composition quality of GPT-4. The results demonstrate that ComposerX is capable of producing coherent polyphonic music compositions with captivating melodies, while adhering to user instructions.
Updated: 2024-04-28 06:17:42
Domains: cs.SD,cs.AI,cs.CL,cs.LG,cs.MM,eess.AS
Research and application of artificial intelligence based webshell detection model: A literature review
Webshell, as the "culprit" behind numerous network attacks, is one of the research hotspots in the field of cybersecurity. However, the complexity, stealthiness, and confusing nature of webshells pose significant challenges to the corresponding detection schemes. With the rise of Artificial Intelligence (AI) technology, researchers have started to apply different intelligent algorithms and neural network architectures to the task of webshell detection. However, the related research still lacks a systematic and standardized methodological process, which is confusing and redundant. Therefore, following the development timeline, we carefully summarize the progress of relevant research in this field, dividing it into three stages: Start Stage, Initial Development Stage, and In-depth Development Stage. We further elaborate on the main characteristics and core algorithms of each stage. In addition, we analyze the pain points and challenges that still exist in this field and predict the future development trend of this field from our point of view. To the best of our knowledge, this is the first review that details the research related to AI-based webshell detection. It is also hoped that this paper can provide detailed technical information for more researchers interested in AI-based webshell detection tasks.
Updated: 2024-04-28 06:14:27
Domains: cs.CR,cs.AI
Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks
The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Outbound Load Planning Problem (OLPP) that considers flow and load planning challenges jointly in order to adjust loads and flows as the demand forecast changes over time before the day of operations in a terminal. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the OLPP as a mixed-integer programming model and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate terminals. As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, this paper proposes a lexicographical optimization approach that eliminates those symmetries by generating optimal solutions staying close to a reference plan. Moreover, this paper designs an optimization proxy that addresses the computational challenges of the optimization model. The optimization proxy combines a machine-learning model and a repair procedure to find near-optimal solutions that satisfy real-time constraints imposed by planners in the loop. An extensive computational study on industrial instances shows that the optimization proxy is orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the OLPP for load consolidation and the significant savings obtained from combining machine learning and optimization.
Updated: 2024-04-28 06:00:23
Domains: cs.AI,cs.LG,cs.SY,eess.SY
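To illustrate the symmetry-breaking idea behind the lexicographical approach described in the abstract above, here is a minimal, hypothetical pure-Python sketch: first optimize the primary objective, then break ties among symmetric optima by distance to a reference plan. The plans, cost, and distance metric are invented toy data, not the paper's mixed-integer formulation:

```python
def lexicographic_select(plans, cost, reference, distance):
    """Two-stage lexicographic selection: minimize cost, then, among the
    cost-optimal (symmetric) plans, pick the one closest to a reference."""
    best_cost = min(cost(p) for p in plans)                  # primary objective
    optimal = [p for p in plans if cost(p) == best_cost]     # symmetric optima
    return min(optimal, key=lambda p: distance(p, reference))  # tie-break

# Hypothetical plans: trailers dispatched on two lanes; all tie at cost 5.
plans = [(2, 3), (3, 2), (4, 1)]
cost = lambda p: p[0] + p[1]
reference = (3, 2)  # e.g. yesterday's plan
distance = lambda p, r: abs(p[0] - r[0]) + abs(p[1] - r[1])

print(lexicographic_select(plans, cost, reference, distance))  # -> (3, 2)
```

Without the second stage, a solver could return any of the three cost-equivalent plans; the tie-break keeps successive solutions consistent with the reference.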
Explainable, Interpretable & Trustworthy AI for Intelligent Digital Twin: Case Study on Remaining Useful Life
Artificial intelligence (AI) and machine learning (ML) are increasingly used in energy and engineering systems, but these models must be fair, unbiased, and explainable. It is critical to have confidence in AI's trustworthiness. ML techniques have been useful in predicting important parameters and in improving model performance. However, for these AI techniques to be useful for making decisions, they need to be audited, accounted for, and easy to understand. Therefore, the use of explainable AI (XAI) and interpretable machine learning (IML) is crucial for the accurate prediction of prognostics, such as remaining useful life (RUL), in a digital twin system, to make it intelligent while ensuring that the AI model is transparent in its decision-making processes and that the predictions it generates can be understood and trusted by users. By using AI that is explainable, interpretable, and trustworthy, intelligent digital twin systems can make more accurate predictions of RUL, leading to better maintenance and repair planning, and ultimately, improved system performance. The objective of this paper is to explain the ideas of XAI and IML and to justify the important role of AI/ML in the digital twin framework and components, which requires XAI to understand the prediction better. This paper explains the importance of XAI and IML in both local and global aspects to ensure the use of trustworthy AI/ML applications for RUL prediction. We used the RUL prediction for the XAI and IML studies and leveraged the integrated Python toolbox for interpretable machine learning (PiML).
Updated: 2024-04-28 05:53:46
Domains: cs.LG,stat.AP,stat.CO
Generative AI for Low-Carbon Artificial Intelligence of Things
By integrating Artificial Intelligence (AI) with the Internet of Things (IoT), Artificial Intelligence of Things (AIoT) has revolutionized many fields. However, AIoT is facing the challenges of energy consumption and carbon emissions due to the continuous advancement of mobile technology. Fortunately, Generative AI (GAI) holds immense potential to reduce carbon emissions of AIoT due to its excellent reasoning and generation capabilities. In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT. Specifically, we first study the main impacts that cause carbon emissions in AIoT, and then introduce GAI techniques and their relations to carbon emissions. We then explore the application prospects of GAI in low-carbon AIoT, focusing on how GAI can reduce carbon emissions of network components. Subsequently, we propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules to generate more accurate and reliable optimization problems. Furthermore, we utilize Generative Diffusion Models (GDMs) to identify optimal strategies for carbon emission reduction. Simulation results demonstrate the effectiveness of the proposed framework. Finally, we insightfully provide open research directions for low-carbon AIoT.
Updated: 2024-04-28 05:46:28
Domains: cs.NI,cs.LG
Heterogeneous Acceleration Pipeline for Recommendation System Training
Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines the GPU's neural network acceleration with the CPUs' memory storage and supply for embedding tables but may incur significant CPU-to-GPU transfer time. In contrast, the GPU-only mode utilizes High Bandwidth Memory (HBM) across multiple GPUs for storing embedding tables. However, this approach is expensive and presents scaling concerns. This paper introduces Hotline, a heterogeneous acceleration pipeline that addresses these concerns. Hotline develops a data-aware and model-aware scheduling pipeline by leveraging the insight that only a few embedding entries are frequently accessed (popular). This approach utilizes CPU main memory for non-popular embeddings and GPUs' HBM for popular embeddings. To achieve this, Hotline accelerator fragments a mini-batch into popular and non-popular micro-batches. It gathers the necessary working parameters for non-popular micro-batches from the CPU, while GPUs execute popular micro-batches. The hardware accelerator dynamically coordinates the execution of popular embeddings on GPUs and non-popular embeddings from the CPU's main memory. Real-world datasets and models confirm Hotline's effectiveness, reducing average end-to-end training time by 2.2x compared to Intel-optimized CPU-GPU DLRM baseline.
Updated: 2024-04-28 05:44:15
Domains: cs.AR,cs.AI,cs.LG
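The popular/non-popular mini-batch fragmentation that Hotline performs in hardware can be sketched in a few lines of Python. This is an illustrative simplification: the ID trace, the top-k popularity cutoff, and the list-based batch format are all assumptions here, and the real accelerator tracks popularity dynamically rather than from a static trace:

```python
from collections import Counter

def split_minibatch(batch, popular_ids):
    """Fragment a mini-batch of embedding-ID lookups into a popular
    micro-batch (servable from GPU HBM) and a non-popular micro-batch
    (whose parameters must be gathered from CPU main memory)."""
    popular = [s for s in batch if all(i in popular_ids for i in s)]
    non_popular = [s for s in batch if s not in popular]
    return popular, non_popular

# Hypothetical access trace: the top-2 most frequent IDs count as popular.
trace = [7, 7, 7, 3, 3, 9, 1, 7, 3]
popular_ids = {i for i, _ in Counter(trace).most_common(2)}  # {7, 3}

batch = [[7, 3], [7, 9], [3, 3], [1, 9]]
pop, nonpop = split_minibatch(batch, popular_ids)
```

In the paper's pipeline, the GPU executes `pop` immediately while the working parameters for `nonpop` are fetched from the CPU, overlapping transfer with compute.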
MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot
Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, restricting their versatility. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), a framework utilizes the collective expertise of diverse agents to enhance interaction ability with operating systems. The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on their specific domain knowledge, effectively reducing the hallucination associated with knowledge domain gaps. To evaluate the performance of MMAC-Copilot, we conducted experiments using both the GAIA benchmark and our newly introduced Visual Interaction Benchmark (VIBench). VIBench focuses on non-API-interactable applications across various domains, including 3D gaming, recreation, and office scenarios. MMAC-Copilot achieved exceptional performance on GAIA, with an average improvement of 6.8\% over existing leading systems. Furthermore, it demonstrated remarkable capability on VIBench, particularly in managing various methods of interaction within systems and applications. These results underscore MMAC-Copilot's potential in advancing the field of autonomous virtual agents through its innovative approach to agent collaboration.
Updated: 2024-04-28 05:33:15
Domains: cs.AI,cs.HC
Can Perplexity Predict Fine-Tuning Performance? An Investigation of Tokenization Effects on Sequential Language Models for Nepali
Recent language models use subwording mechanisms to handle Out-of-Vocabulary (OOV) words seen during test time, and their generation capacity is generally measured using perplexity, an intrinsic metric. It is known that increasing the subword granularity results in a decrease of the perplexity value. However, studies of how subwording affects the understanding capacity of language models have been very few and limited to a handful of languages. To reduce this gap, we used 6 different tokenization schemes to pretrain relatively small language models in Nepali and used the learned representations to finetune on several downstream tasks. Although the byte-level BPE algorithm has been used in recent models like GPT and RoBERTa, we show that it is, on average, sub-optimal in comparison to algorithms such as SentencePiece in finetuning performance for Nepali. Additionally, similar recent studies have focused on BERT-based language models. We, however, pretrain and finetune sequential transformer-based language models.
Updated: 2024-04-28 05:26:12
Domains: cs.CL,cs.LG
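Since the abstract above hinges on the relationship between subword granularity and perplexity, a small self-contained example of how perplexity is computed from per-token log-probabilities may help (the log-probability values are invented):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean per-token log-likelihood.
    Finer subword segmentations spread probability over more, easier
    tokens, which tends to lower per-token perplexity -- one reason
    perplexity alone may not predict fine-tuning performance."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical: a sentence split into 4 coarse tokens vs. 8 fine ones.
coarse = [math.log(0.1)] * 4
fine = [math.log(0.4)] * 8
print(perplexity(coarse))  # 10.0
print(perplexity(fine))    # 2.5
```

The fine segmentation looks "better" by perplexity even though both encode the same sentence, which is exactly the confound the paper investigates.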
Which images to label for few-shot medical landmark detection?
The success of deep learning methods relies on the availability of well-labeled large-scale datasets. However, for medical images, annotating such abundant training data often requires experienced radiologists and consumes their limited time. Few-shot learning is developed to alleviate this burden, which achieves competitive performances with only several labeled data. However, a crucial yet previously overlooked problem in few-shot learning is about the selection of template images for annotation before learning, which affects the final performance. We herein propose a novel Sample Choosing Policy (SCP) to select "the most worthy" images for annotation, in the context of few-shot medical landmark detection. SCP consists of three parts: 1) Self-supervised training for building a pre-trained deep model to extract features from radiological images, 2) Key Point Proposal for localizing informative patches, and 3) Representative Score Estimation for searching the most representative samples or templates. The advantage of SCP is demonstrated by various experiments on three widely-used public datasets. For one-shot medical landmark detection, its use reduces the mean radial errors on Cephalometric and HandXray datasets by 14.2% (from 3.595mm to 3.083mm) and 35.5% (4.114mm to 2.653mm), respectively.
Updated: 2024-04-28 05:13:35
Domains: eess.IV,cs.CV,cs.LG
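One plausible, simplified reading of SCP's Representative Score Estimation is to score each candidate by its mean similarity to all other samples and annotate the highest-scoring one. The sketch below uses cosine similarity over toy 2-D features; the paper's actual scoring function and feature extractor may differ:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_representative(features):
    """Index of the sample with the highest mean similarity to the set --
    a hypothetical stand-in for the paper's representative score."""
    scores = [sum(cosine(f, g) for g in features) / len(features)
              for f in features]
    return max(range(len(features)), key=scores.__getitem__)

# Toy features: two similar samples and one outlier; the central one wins.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(most_representative(feats))  # -> 1
```

Annotating the most central sample first is the intuition behind choosing "the most worthy" template under a tight labeling budget.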
OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering
State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also considers semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState
Updated: 2024-04-28 05:04:45
Categories: cs.RO,cs.LG,cs.SY,eess.SY
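The filtering backbone that OptiState builds on can be sketched as one predict/update cycle of a linear Kalman filter. This is a minimal sketch only: the paper's single-rigid-body model, MPC force inputs, and GRU/ViT refinement are omitted, and the 1-D example values are invented.

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict step: propagate state and covariance through dynamics F.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update step: fuse the measurement z via the Kalman gain.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# 1-D example: static state, equal prior and measurement uncertainty.
x, P = kf_step(x=np.array([0.0]), P=np.array([[1.0]]),
               z=np.array([1.0]), F=np.eye(1), Q=np.zeros((1, 1)),
               H=np.eye(1), R=np.eye(1))
```

With equal prior and measurement variance, the posterior lands halfway between the prediction (0) and the measurement (1), and the covariance halves.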
Quantized Context Based LIF Neurons for Recurrent Spiking Neural Networks in 45nm
In this study, we propose the first hardware implementation of a context-based recurrent spiking neural network (RSNN), emphasizing the integration of the dual information streams of neocortical pyramidal neurons into the Context-Dependent Leaky Integrate-and-Fire (CLIF) neuron model, an essential element of RSNNs. We present a quantized version of the CLIF neuron (qCLIF), developed through a hardware-software codesign approach that exploits the sparse activity of RSNNs. Implemented in a 45nm technology node, the qCLIF is compact (900um^2) and achieves a high accuracy of 90% on the DVS gesture classification dataset despite 8-bit quantization. Our analysis spans network configurations from 10 to 200 qCLIF neurons, supporting up to 82k synapses within a 1.86 mm^2 footprint, demonstrating scalability and efficiency.
Updated: 2024-04-28 04:32:44
Categories: cs.NE,cs.AI,cs.AR,cs.CV,q-bio.NC
Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions
This paper focuses on the feasibility of the Deep Neural Operator (DeepONet) as a robust surrogate modeling method within the context of digital twins (DTs) for nuclear energy systems. Through benchmarking and evaluation, this study showcases the generalizability and computational efficiency of DeepONet in solving a challenging particle transport problem. DeepONet also exhibits remarkable prediction accuracy and speed, outperforming traditional ML methods, making it a suitable algorithm for real-time DT inference. However, the application of DeepONet also reveals challenges related to optimal sensor placement and model evaluation, critical aspects of real-world implementation. Addressing these challenges will further enhance the method's practicality and reliability. Overall, DeepONet presents a promising and transformative tool for nuclear engineering research and applications. Its prediction accuracy and computational efficiency can revolutionize DT systems, advancing nuclear engineering research. This study marks an important step towards harnessing the power of surrogate modeling techniques in critical engineering domains.
Updated: 2024-04-28 04:31:36
Categories: stat.ML,cs.LG,stat.CO
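The DeepONet architecture the paper relies on pairs a branch network (encoding the input function sampled at fixed sensors) with a trunk network (encoding the query coordinate), and predicts their dot product. The sketch below uses random, untrained MLP weights and arbitrary layer sizes purely to show the data flow; it is not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random (untrained) MLP weights -- placeholders for illustration."""
    return [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

# Branch net sees the input function at m sensors; trunk net sees the
# query location; both map into a shared p-dimensional latent space.
m, p = 16, 32
branch = mlp([m, 64, p])
trunk = mlp([1, 64, p])

u_sensors = np.sin(np.linspace(0, np.pi, m))     # sampled input function
y = np.array([[0.3]])                            # query coordinate
pred = forward(branch, u_sensors[None, :]) @ forward(trunk, y).T
```

Once trained, evaluating the operator at a new query is a single forward pass, which is what makes DeepONet attractive for real-time digital-twin inference.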
BetterV: Controlled Verilog Generation with Discriminative Guidance
Due to the growing complexity of modern Integrated Circuits (ICs), there is a need for automated circuit design methods. Recent years have seen rising research in hardware design language generation to facilitate the design process. In this work, we propose a Verilog generation framework, BetterV, which fine-tunes large language models (LLMs) on processed domain-specific datasets and incorporates generative discriminators for guidance on particular design demands. Verilog modules are collected, filtered, and processed from the Internet to form a clean and abundant dataset. Instruct-tuning methods are specially designed to fine-tune the LLMs to understand knowledge about Verilog. Furthermore, the data are augmented to enrich the training set and are also used to train a generative discriminator on a particular downstream task, which provides guidance for the LLMs to optimize the Verilog implementation. BetterV is able to generate syntactically and functionally correct Verilog, outperforming GPT-4 on the VerilogEval benchmark. With the help of the task-specific generative discriminator, BetterV achieves remarkable improvements on various electronic design automation (EDA) downstream tasks, including netlist node reduction for synthesis and verification runtime reduction with Boolean Satisfiability (SAT) solving.
Updated: 2024-04-28 04:20:31
Categories: cs.AI,cs.PL
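One simple way discriminator guidance can steer generation is candidate re-ranking: sample several completions and keep the one the discriminator scores highest. The sketch below is a hypothetical stand-in, not BetterV's actual mechanism (which guides generation rather than just re-ranking); the `discriminator` callable and the toy scoring rule are invented.

```python
def rerank(candidates, discriminator):
    """Return the candidate the discriminator scores highest -- a
    minimal stand-in for using a task-specific discriminator to
    steer an LLM toward a design objective (e.g. fewer estimated
    netlist nodes)."""
    return max(candidates, key=discriminator)

# Toy example: a hypothetical 'discriminator' that prefers the
# shorter of two functionally identical Verilog modules.
mods = ["module a(output y); assign y = 1'b0 | 1'b0; endmodule",
        "module a(output y); assign y = 1'b0; endmodule"]
best = rerank(mods, lambda v: -len(v))
```

In practice the discriminator would be a trained model scoring a downstream EDA metric, and its signal can also be folded into decoding itself rather than applied after the fact.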
RLRF: Reinforcement Learning from Reflection through Debates as Feedback for Bias Mitigation in LLMs
Biases and stereotypes in Large Language Models (LLMs) can have negative implications for user experience and societal outcomes. Current approaches to bias mitigation, such as Reinforcement Learning from Human Feedback (RLHF), rely on costly manual feedback. While LLMs have the capability to understand logic and identify biases in text, they often struggle to effectively acknowledge and address their own biases due to factors such as prompt influences, internal mechanisms, and policies. We found that informing LLMs that the content they generate is not their own, and questioning them about potential biases in the text, can significantly enhance their ability to recognize and correct biases. Based on this finding, we propose RLRF (Reinforcement Learning from Reflection through Debates as Feedback), which replaces human feedback with AI feedback for bias mitigation. RLRF engages LLMs in multi-role debates to expose biases and gradually reduces bias in each iteration using a ranking scoring mechanism. The dialogues are then used to create a dataset of high-bias and low-bias instances to train the reward model in reinforcement learning. This dataset can be generated by the same LLM for self-reflection, or by a superior LLM guiding the former in a student-teacher mode to enhance its logical reasoning abilities. Experimental results demonstrate the significant effectiveness of our approach in bias reduction.
Updated: 2024-04-28 04:08:39
Categories: cs.AI
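The multi-role debate loop can be sketched as a skeleton: each role in turn responds to the running transcript, and a judge then scores the result. The `llm` and `judge` callables below are toy stand-ins, not the paper's models, and the round structure is an assumption for illustration.

```python
def debate_and_score(statement, roles, llm, judge, rounds=2):
    """Skeleton of a multi-role debate used to surface bias: each
    role critiques the running transcript, then a judge assigns a
    score. `llm` and `judge` are stand-in callables."""
    transcript = [statement]
    for _ in range(rounds):
        for role in roles:
            transcript.append(llm(role, transcript))
    return transcript, judge(transcript)

# Toy stand-ins: the 'LLM' emits a fixed critique, and the 'judge'
# simply counts lines containing a flagged word.
llm = lambda role, t: f"{role}: I question the phrasing above."
judge = lambda t: sum("question" in line for line in t)
transcript, score = debate_and_score("All X are Y.", ["critic", "advocate"], llm, judge)
```

In RLRF the judge's ranking over debate outputs is what labels high-bias versus low-bias instances for reward-model training.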
Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model
In this paper, we propose an effective two-stage approach named Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts while achieving high fidelity by using a pre-trained multi-view diffusion model. Multi-view diffusion models, such as MVDream, have been shown to generate high-fidelity 3D assets using score distillation sampling (SDS). However, applied naively, these methods often fail to comprehend compositional text prompts, and may entirely omit certain subjects or parts. To address this issue, we first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline. We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation, without the necessity to re-train the multi-view diffusion model or craft a high-quality compositional 3D dataset. We further propose a hybrid optimization strategy to encourage synergy between the SDS loss and the sparse RGB reference images. Our method consistently outperforms previous state-of-the-art (SOTA) methods in generating compositional 3D assets, excelling in both quality and accuracy, and enabling diverse 3D generations from the same text prompt.
Updated: 2024-04-28 04:05:10
Categories: cs.CV,cs.AI
Machine Learning Techniques for Data Reduction of CFD Applications
We present an approach called guaranteed block autoencoder that leverages Tensor Correlations (GBATC) for reducing the spatiotemporal data generated by computational fluid dynamics (CFD) and other scientific applications. It uses a multidimensional block of tensors (spanning space and time) for both input and output, capturing the spatiotemporal and interspecies relationships within a tensor. The tensor consists of species that represent different elements in a CFD simulation. To guarantee the error bound of the reconstructed data, principal component analysis (PCA) is applied to the residual between the original and reconstructed data. This yields a basis matrix, which is then used to project the residual of each instance. The resulting coefficients are retained to enable accurate reconstruction. Experimental results demonstrate that our approach can deliver two orders of magnitude of reduction while still keeping the errors of the primary data within scientifically acceptable bounds. Compared to reduction-based approaches based on SZ, our method achieves a substantially higher compression ratio for a given error bound, or a lower error for a given compression ratio.
Updated: 2024-04-28 04:01:09
Categories: cs.LG
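The error-guarantee idea described above — PCA the residual between original and reconstructed data, retain each instance's leading coefficients, and add the projected residual back — can be sketched directly. This is a minimal sketch assuming the autoencoder reconstruction is given (here mocked with noise); keeping all components recovers the data exactly, and fewer components trade accuracy for compression.

```python
import numpy as np

def residual_correction(original, reconstructed, n_components):
    """Project the reconstruction residual onto its leading PCA
    components and add it back, tightening the error bound as
    more coefficients are retained per instance."""
    resid = original - reconstructed
    resid_c = resid - resid.mean(axis=0)
    # PCA basis of the residual (rows of Vt = principal directions).
    _, _, Vt = np.linalg.svd(resid_c, full_matrices=False)
    basis = Vt[:n_components]                    # (k, features)
    coeffs = resid_c @ basis.T                   # retained per instance
    return reconstructed + resid.mean(axis=0) + coeffs @ basis

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 8))
x_hat = x + 0.1 * rng.normal(size=x.shape)       # mock reconstruction
better = residual_correction(x, x_hat, n_components=8)
```

With `n_components` equal to the feature dimension the correction is exact; smaller values still shrink the residual energy, which is how a hard error bound can be met with few retained coefficients.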
Towards the New XAI: A Hypothesis-Driven Approach to Decision Support Using Evidence
Prior research on AI-assisted human decision-making has explored several different explainable AI (XAI) approaches. A recent paper has proposed a paradigm shift calling for hypothesis-driven XAI through a conceptual framework called evaluative AI that gives people evidence that supports or refutes hypotheses without necessarily giving a decision-aid recommendation. In this paper, we describe and evaluate an approach for hypothesis-driven XAI based on the Weight of Evidence (WoE) framework, which generates both positive and negative evidence for a given hypothesis. Through human behavioural experiments, we show that our hypothesis-driven approach increases decision accuracy and reduces reliance compared to a recommendation-driven approach and an AI-explanation-only baseline, but with a small increase in under-reliance compared to the recommendation-driven approach. Further, we show that participants used our hypothesis-driven approach in a materially different way to the two baselines.
Updated: 2024-04-28 03:29:54
Categories: cs.AI,cs.HC
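The Weight of Evidence quantity the approach builds on is conventionally the log-likelihood ratio of the evidence under the hypothesis versus its negation: positive values support the hypothesis, negative values refute it. A minimal sketch (the probability values below are invented):

```python
import math

def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """Weight of evidence for hypothesis H given evidence E, in the
    standard log-likelihood-ratio form: log P(E|H) / P(E|not H).
    Positive => E supports H; negative => E refutes H."""
    return math.log(p_e_given_h / p_e_given_not_h)

# Evidence four times as likely under H as under not-H.
supporting = weight_of_evidence(0.8, 0.2)
refuting = weight_of_evidence(0.2, 0.8)
```

Presenting both positive and negative WoE terms per hypothesis, rather than a single recommendation, is what lets the user weigh evidence for themselves.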
Prompt Customization for Continual Learning
Contemporary continual learning approaches typically select prompts from a pool, which function as supplementary inputs to a pre-trained model. However, this strategy is hindered by the inherent noise of its selection approach when handling increasing tasks. In response to these challenges, we reformulate the prompting approach for continual learning and propose the prompt customization (PC) method. PC mainly comprises a prompt generation module (PGM) and a prompt modulation module (PMM). In contrast to conventional methods that employ hard prompt selection, PGM assigns different coefficients to prompts from a fixed-sized pool of prompts and generates tailored prompts. Moreover, PMM further modulates the prompts by adaptively assigning weights according to the correlations between input data and corresponding prompts. We evaluate our method on four benchmark datasets for three diverse settings, including the class, domain, and task-agnostic incremental learning tasks. Experimental results demonstrate consistent improvement (by up to 16.2%), yielded by the proposed method, over the state-of-the-art (SOTA) techniques.
Updated: 2024-04-28 03:28:27
Categories: cs.CV,cs.LG
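The contrast between hard selection and soft customization can be sketched in a few lines: instead of picking one prompt, weight every prompt in a fixed pool by a generated coefficient (the PGM idea) and scale the result by an input-dependent gate (the PMM idea). In the sketch the coefficients and gate are supplied directly; in the method they come from learned modules, and the shapes are invented for illustration.

```python
import numpy as np

def customize_prompt(pool, coeffs, gate):
    """Soft prompt customisation: softmax the per-prompt coefficients,
    form the weighted sum over the fixed prompt pool, then modulate
    by an input-dependent gate."""
    weights = np.exp(coeffs) / np.exp(coeffs).sum()   # softmax
    return gate * (weights[:, None] * pool).sum(axis=0)

# Toy pool: 3 one-hot 'prompts' of dimension 3. A large coefficient
# on the third prompt makes the output lean almost entirely on it.
pool = np.eye(3)
prompt = customize_prompt(pool, np.array([0.0, 0.0, 10.0]), gate=0.5)
```

Because the output is a smooth function of the coefficients, the whole selection step stays differentiable, avoiding the noisy discrete choice the abstract criticizes.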
Heterogeneous Graph Neural Networks for End-to-End Traffic Assignment and Traffic Flow Learning
The traffic assignment problem is one of the significant components of traffic flow analysis, for which various solution approaches have been proposed. However, deploying these approaches for large-scale networks poses significant challenges. In this paper, we leverage the power of heterogeneous graph neural networks to propose a novel data-driven approach for end-to-end traffic assignment and traffic flow learning. Our model integrates an adaptive graph attention mechanism with auxiliary "virtual" links connecting origin-destination node pairs. This integration enables the model to capture spatial traffic patterns across different links. By incorporating the node-based flow conservation law into the overall loss function, the model ensures that the predictions comply with flow conservation principles, resulting in highly accurate predictions for both link flow and flow-capacity ratios. We present numerical experiments on urban transportation networks and show that the proposed heterogeneous graph neural network model outperforms other conventional neural network models in terms of convergence rate and prediction accuracy. Notably, by introducing two different training strategies, the proposed heterogeneous graph neural network model can also be generalized to different network topologies. This approach offers a promising solution for complex traffic flow analysis and prediction, enhancing our understanding and management of a wide range of transportation systems.
Updated: 2024-04-28 02:16:05
Categories: cs.LG
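The flow-conservation term folded into the loss can be sketched with a node-link incidence matrix: for each node, inflow minus outflow should match net demand, and the squared violation is penalized. This is a minimal sketch of the conservation penalty only, with an invented one-link toy network; the GNN producing the flows is omitted.

```python
import numpy as np

def conservation_loss(incidence, flows, demand):
    """Node-based flow-conservation penalty: mean squared violation
    of (inflow - outflow) == net demand at every node. `incidence`
    is the node-link incidence matrix (+1 entering, -1 leaving)."""
    violation = incidence @ flows - demand
    return float((violation ** 2).mean())

# Toy network: one link from node 0 to node 1 carrying 2 units.
incidence = np.array([[-1.0],    # node 0: link leaves
                      [+1.0]])   # node 1: link enters
flows = np.array([2.0])
demand = np.array([-2.0, 2.0])   # node 0 supplies 2, node 1 absorbs 2
loss = conservation_loss(incidence, flows, demand)
```

Adding this term to the task loss is what pushes the network's predictions toward physically consistent flows even before training converges.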
Few-shot Name Entity Recognition on StackOverflow
StackOverflow, with its vast question repository and limited labeled examples, raises an annotation challenge. We address this gap by proposing RoBERTa+MAML, a few-shot named entity recognition (NER) method leveraging meta-learning. Our approach, evaluated on the StackOverflow NER corpus (27 entity types), achieves a 5% F1 score improvement over the baseline. We further improved the results with domain-specific phrase processing.
Updated: 2024-04-28 01:58:10
Categories: cs.CL,cs.AI
Clustered Policy Decision Ranking
Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward and how significant their contribution is. Given a trained policy, we propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states. We compare our measure against a previous statistical fault localization based ranking procedure.
Updated: 2024-04-28 01:48:57
Categories: cs.LG,cs.AI
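One way to realize the black-box covariance idea is to randomize decisions per cluster across many rollouts and rank clusters by how negatively randomization covaries with episode return. The sketch below is an illustrative interpretation under that assumption, not the paper's exact estimator; the rollout data are synthetic.

```python
import numpy as np

def rank_clusters(cluster_ids, randomized, returns):
    """Rank state clusters by importance: a cluster whose decisions,
    when randomised (indicator 1) rather than kept (0), covary most
    negatively with episode return is ranked as most important."""
    scores = {}
    for c in cluster_ids:
        mask = randomized[:, c]
        scores[c] = -np.cov(mask, returns)[0, 1]
    return sorted(cluster_ids, key=lambda c: scores[c], reverse=True)

# Synthetic rollouts over 2 clusters: randomising cluster 0's
# decisions costs 0.9 return, cluster 1's only 0.1.
rng = np.random.default_rng(2)
rand = rng.integers(0, 2, size=(200, 2)).astype(float)
ret = 1.0 - 0.9 * rand[:, 0] - 0.1 * rand[:, 1]
order = rank_clusters([0, 1], rand, ret)
```

Because only rollout outcomes are used, the procedure needs no access to the policy's internals, matching the black-box setting in the abstract.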
Enhancing Group Fairness in Online Settings Using Oblique Decision Forests
Fairness, especially group fairness, is an important consideration in the context of machine learning systems. The most commonly adopted group fairness-enhancing techniques are in-processing methods that rely on a mixture of a fairness objective (e.g., demographic parity) and a task-specific objective (e.g., cross-entropy) during the training process. However, when data arrives in an online fashion -- one instance at a time -- optimizing such fairness objectives poses several challenges. In particular, group fairness objectives are defined using expectations of predictions across different demographic groups. In the online setting, where the algorithm has access to a single instance at a time, estimating the group fairness objective requires additional storage and significantly more computation (e.g., forward/backward passes) than the task-specific objective at every time step. In this paper, we propose Aranyani, an ensemble of oblique decision trees, to make fair decisions in online settings. The hierarchical tree structure of Aranyani enables parameter isolation and allows us to efficiently compute the fairness gradients using aggregate statistics of previous decisions, eliminating the need for additional storage and forward/backward passes. We also present an efficient framework to train Aranyani and theoretically analyze several of its properties. We conduct empirical evaluations on 5 publicly available benchmarks (including vision and language datasets) to show that Aranyani achieves a better accuracy-fairness trade-off compared to baseline approaches.
Updated: 2024-04-28 01:40:10
Categories: cs.LG
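The building block of an oblique decision tree is a split on a learned hyperplane rather than a single-feature threshold. A minimal sketch of that routing step (training, the fairness-gradient aggregation, and the ensemble are omitted; the weights below are invented):

```python
import numpy as np

def oblique_node(x, w, b):
    """An oblique decision-tree split: route on the sign of the
    hyperplane w.x + b instead of a single-feature threshold.
    Returns 1 for the right child, 0 for the left."""
    return int(x @ w + b > 0)

# Example routing decision with illustrative weights.
x = np.array([1.0, 2.0])
branch = oblique_node(x, w=np.array([0.5, -0.25]), b=0.1)
```

Because each node's decision depends on a small parameter vector, the tree's parameters stay isolated per node, which is what lets fairness gradients be computed from aggregate statistics of past decisions rather than replayed data.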
Utilizing Large Language Models for Information Extraction from Real Estate Transactions
Real estate sales contracts contain crucial information for property transactions, but manual extraction of data can be time-consuming and error-prone. This paper explores the application of large language models, specifically transformer-based architectures, for automated information extraction from real estate contracts. We discuss challenges, techniques, and future directions in leveraging these models to improve efficiency and accuracy in real estate contract analysis.
Updated: 2024-04-28 01:38:38
Categories: cs.CL,cs.LG
Variational Optimization for Quantum Problems using Deep Generative Networks
Optimization is one of the keystones of modern science and engineering. Its applications in quantum technology and machine learning helped nurture variational quantum algorithms and generative AI respectively. We propose a general approach to design variational optimization algorithms based on generative models: the Variational Generative Optimization Network (VGON). To demonstrate its broad applicability, we apply VGON to three quantum tasks: finding the best state in an entanglement-detection protocol, finding the ground state of a 1D quantum spin model with variational quantum circuits, and generating degenerate ground states of many-body quantum Hamiltonians. For the first task, VGON greatly reduces the optimization time compared to stochastic gradient descent while generating nearly optimal quantum states. For the second task, VGON alleviates the barren plateau problem in variational quantum circuits. For the final task, VGON can identify the degenerate ground state spaces after a single stage of training and generate a variety of states therein.
Updated: 2024-04-28 00:58:28
Categories: quant-ph,cs.LG,math.OC
Observable Perfect Equilibrium
While Nash equilibrium has emerged as the central game-theoretic solution concept, many important games contain several Nash equilibria and we must determine how to select between them in order to create real strategic agents. Several Nash equilibrium refinement concepts have been proposed and studied for sequential imperfect-information games, the most prominent being trembling-hand perfect equilibrium, quasi-perfect equilibrium, and recently one-sided quasi-perfect equilibrium. These concepts are robust to certain arbitrarily small mistakes, and are guaranteed to always exist; however, we argue that none of these is the correct concept for developing strong agents in sequential games of imperfect information. We define a new equilibrium refinement concept for extensive-form games called observable perfect equilibrium in which the solution is robust over trembles in publicly-observable action probabilities (not necessarily over all action probabilities that may not be observable by opposing players). Observable perfect equilibrium correctly captures the assumption that the opponent is playing as rationally as possible given the mistakes that have been observed (while previous solution concepts do not). We prove that observable perfect equilibrium is always guaranteed to exist, and demonstrate that it leads to a different solution than the prior extensive-form refinements in no-limit poker. We expect observable perfect equilibrium to be a useful equilibrium refinement concept for modeling many important imperfect-information games of interest in artificial intelligence.
Updated: 2024-04-28 00:35:34
Categories: cs.GT,cs.AI,cs.MA,econ.TH