Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
As natural language becomes the default interface for human-AI interaction, there is a need for LMs to appropriately communicate uncertainty in downstream applications. In this work, we investigate how LMs incorporate confidence into their responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainty when answering questions, even when they produce incorrect responses. LMs can be explicitly prompted to express confidence, but they tend to be overconfident, resulting in high error rates (an average of 47%) among confident responses. We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations, whether or not these are marked by certainty. Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts that express uncertainty. Our work highlights new safety harms facing human-LM interactions and proposes design recommendations and mitigation strategies moving forward.
Updated: 2024-07-09 23:53:06
Subjects: cs.CL,cs.AI,cs.HC
Lifestyle-Informed Personalized Blood Biomarker Prediction via Novel Representation Learning
Blood biomarkers are an essential tool for healthcare providers to diagnose, monitor, and treat a wide range of medical conditions. Current reference values and recommended ranges often rely on population-level statistics, which may not adequately account for inter-individual variability driven by factors such as lifestyle and genetics. In this work, we introduce a novel framework for predicting future blood biomarker values and defining personalized references through representations learned from lifestyle data (physical activity and sleep) and blood biomarkers. Our proposed method learns a similarity-based embedding space that captures the complex relationship between biomarkers and lifestyle factors. Using the UK Biobank (257K participants), our results show that our deep-learned embeddings outperform traditional and current state-of-the-art representation learning techniques in predicting clinical diagnoses. Using a subset of the UK Biobank comprising 6,440 participants with follow-up visits, we validate that including these embeddings and lifestyle factors directly in blood biomarker models improves the prediction of future lab values from a single lab visit. This personalized modeling approach provides a foundation for developing more accurate risk stratification tools and tailoring preventative care strategies. In clinical settings, this translates to the potential for earlier disease detection, more timely interventions, and ultimately a shift towards personalized healthcare.
Updated: 2024-07-09 23:52:53
Subjects: cs.LG,cs.AI
A cyclical route linking fundamental mechanism and AI algorithm: An example from tuning Poisson's ratio in amorphous networks
"AI for science" is widely recognized as a future trend in the development of scientific research. Currently, although machine learning algorithms have played a crucial role in scientific research with numerous successful cases, relatively few instances exist where AI assists researchers in uncovering the underlying physical mechanisms behind a certain phenomenon and subsequently using that mechanism to improve machine learning algorithms' efficiency. This article uses the investigation into the relationship between extreme Poisson's ratio values and the structure of amorphous networks as a case study to illustrate how machine learning methods can assist in revealing underlying physical mechanisms. Upon recognizing that the Poisson's ratio relies on the low-frequency vibrational modes of dynamical matrix, we can then employ a convolutional neural network, trained on the dynamical matrix instead of traditional image recognition, to predict the Poisson's ratio of amorphous networks with a much higher efficiency. Through this example, we aim to showcase the role that artificial intelligence can play in revealing fundamental physical mechanisms, which subsequently improves the machine learning algorithms significantly.
Updated: 2024-07-09 23:45:34
Subjects: cond-mat.soft,cs.LG
PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games
This research introduces Procedural Artificial Narrative using Generative AI (PANGeA), a structured approach for leveraging large language models (LLMs), guided by a game designer's high-level criteria, to generate narrative content for turn-based role-playing video games (RPGs). Distinct from prior applications of LLMs in video game design, PANGeA innovates by not only generating game-level data (including, but not limited to, setting, key items, and non-playable characters (NPCs)), but also by fostering dynamic, free-form interactions between the player and the environment that align with the procedural game narrative. The NPCs generated by PANGeA are personality-biased and express traits from the Big 5 Personality Model in their generated responses. PANGeA addresses the challenges of ingesting free-form text input, which can prompt LLM responses beyond the scope of the game narrative. A novel validation system that uses the LLM's intelligence evaluates text input and aligns generated responses with the unfolding narrative. To make these interactions possible, PANGeA is supported by a server that hosts a custom memory system, which supplies context for augmenting generated responses and thus aligns them with the procedural narrative. For broad applicability, the server has a REST interface enabling any game engine to integrate directly with PANGeA, as well as an LLM interface adaptable to local or private LLMs. PANGeA's ability to foster dynamic narrative generation by aligning responses with the procedural narrative is demonstrated through an empirical study and an ablation test of two versions of a demo game: a custom browser-based GPT and a Unity demo. As the results show, PANGeA holds potential to assist game designers in using LLMs to generate narrative-consistent content, even when provided varied, unpredictable, free-form text input.
Updated: 2024-07-09 23:45:27
Subjects: cs.AI,cs.CL
Exploring Camera Encoder Designs for Autonomous Driving Perception
The cornerstone of autonomous vehicles (AV) is a solid perception system, in which camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D object detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design parameters, including the width and depth of the model, the stage compute ratio, attention mechanisms, and input resolution, supporting each modification with systematic analysis. This customization yields an architecture optimized for the AV camera encoder, achieving an 8.79% mAP improvement over the baseline. We believe our effort could become a useful cookbook of image encoders for AV and pave the way to next-level driving systems.
Updated: 2024-07-09 23:44:58
Subjects: cs.CV,cs.AI
Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support
Cinematic audio source separation (CASS) is a relatively new subtask of audio source separation, concerned with separating a mixture into dialogue, music, and effects stems. To date, only one publicly available dataset exists for CASS: the Divide and Remaster (DnR) dataset, currently at version 2. While DnR v2 has been an incredibly useful resource for CASS, several areas for improvement have been identified, particularly through its use in the 2023 Sound Demixing Challenge. In this work, we develop version 3 of the DnR dataset, addressing issues relating to vocal content in non-dialogue stems, loudness distributions, the mastering process, and linguistic diversity. In particular, the dialogue stem of DnR v3 includes speech content from more than 30 languages from multiple families, including but not limited to the Germanic, Romance, Indo-Aryan, Dravidian, Malayo-Polynesian, and Bantu families. Benchmark results using the Bandit model indicate that training on multilingual data yields significant generalizability, even for languages with low data availability. Even for languages with high data availability, the multilingual model often performs on par with or better than dedicated models trained on monolingual CASS datasets.
Updated: 2024-07-09 23:39:37
Subjects: eess.AS,cs.CL,cs.LG,cs.SD
Scalable Neural Symbolic Regression using Control Variables
Symbolic regression (SR) is a powerful technique for discovering analytical mathematical expressions from data, with various applications in the natural sciences owing to the interpretability of its results. However, existing methods face scalability issues when dealing with complex equations involving multiple variables. To address this challenge, we propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability. The core idea is to decompose multi-variable symbolic regression into a set of single-variable SR problems, which are then combined in a bottom-up manner. The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs). Second, the data generator is used to generate samples for a certain variable by controlling the input variables. Third, single-variable symbolic regression is applied to estimate the corresponding mathematical expression. Lastly, we repeat steps 2 and 3, gradually adding variables one by one until completion. We evaluate the performance of our method on multiple benchmark datasets. Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables. Moreover, it can substantially reduce the search space for symbolic regression. The source code will be made publicly available upon publication.
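The control-variable decomposition lends itself to a compact illustration. The sketch below is not the paper's implementation (ScaleSR learns the data generator with DNNs and runs a full symbolic-regression search); here an analytic target function stands in for the learned generator and a polynomial fit stands in for the single-variable SR step, purely to show how fixing all but one variable isolates that variable's contribution:

```python
import numpy as np

def target(x1, x2):
    # Hidden ground-truth expression the sketch tries to recover: 3*x1**2 + 2*x2
    return 3 * x1**2 + 2 * x2

def fit_single_variable(f, var_index, fixed_values, grid):
    """Steps 2-3 of the control-variable idea: hold every variable except
    `var_index` at a fixed value, sample f along `grid`, and fit a
    single-variable expression (a degree-2 polynomial stands in for a
    full symbolic-regression search)."""
    args = list(fixed_values)
    ys = []
    for v in grid:
        args[var_index] = v
        ys.append(f(*args))
    return np.polyfit(grid, ys, 2)  # coefficients, highest degree first

grid = np.linspace(-2, 2, 50)
# Control x2 = 0 and recover the x1-dependence: coefficients ≈ [3, 0, 0]
coeffs_x1 = fit_single_variable(target, 0, [0.0, 0.0], grid)
# Control x1 = 0 and recover the x2-dependence: coefficients ≈ [0, 2, 0]
coeffs_x2 = fit_single_variable(target, 1, [0.0, 0.0], grid)
print(np.round(coeffs_x1, 3), np.round(coeffs_x2, 3))
```

Each recovered single-variable expression would then be combined bottom-up into the multi-variable candidate, which is where the real method's search-space reduction comes from.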
Updated: 2024-07-09 23:24:53
Subjects: cs.LG
Benign overfitting in leaky ReLU networks with moderate input dimension
The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data that can be decomposed into the sum of a common signal and a random noise component, lying on subspaces orthogonal to one another. We characterize conditions on the signal-to-noise ratio (SNR) of the model parameters that give rise to benign versus non-benign (or harmful) overfitting: in particular, if the SNR is high then benign overfitting occurs; conversely, if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on the hinge loss with gradient descent (GD) satisfy this property. In contrast to prior work, we do not require the training data to be nearly orthogonal. Notably, for input dimension $d$ and training sample size $n$, while results in prior work require $d = \Omega(n^2 \log n)$, here we require only $d = \Omega\left(n\right)$.
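The signal-plus-noise data model described in the abstract can be written out explicitly. The notation below is a paraphrase of that setup, not necessarily the paper's own:

```latex
x_i = y_i \mu + \xi_i, \qquad y_i \in \{\pm 1\}, \qquad \langle \mu, \xi_i \rangle = 0,
```

where $\mu$ is the common signal, $\xi_i$ is the random noise component lying in a subspace orthogonal to $\mu$, and the stated improvement is that benign overfitting is characterized under $d = \Omega(n)$ rather than the earlier requirement $d = \Omega(n^2 \log n)$.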
Updated: 2024-07-09 23:20:12
Subjects: cs.LG,stat.ML
Differential privacy and Sublinear time are incompatible sometimes
Differential privacy and sublinear algorithms are both rapidly emerging algorithmic themes in the era of big data analysis. Although recent works have shown the existence of differentially private sublinear algorithms for many problems, including graph parameter estimation and clustering, little is known regarding hardness results for these algorithms. In this paper, we initiate the study of lower bounds for problems that aim for both differentially private and sublinear-time algorithms. Our main result is the incompatibility of the two desiderata in the general case. In particular, we prove that a simple problem based on one-way marginals admits both a differentially private algorithm and a sublinear-time algorithm, but does not admit a "strictly" sublinear-time algorithm that is also differentially private.
Updated: 2024-07-09 22:33:57
Subjects: cs.DS,cs.CR
Identification of emotions on Twitter during the 2022 electoral process in Colombia
The study of Twitter as a means for analyzing social phenomena has gained interest in recent years due to the availability of large amounts of data in a relatively spontaneous environment. Within opinion-mining tasks, emotion detection is especially relevant, as it allows for identifying people's subjective responses to different social events in a more granular way than traditional polarity-based sentiment analysis. In the particular case of political events, the analysis of emotions in social networks can provide valuable information on the perception of candidates, proposals, and other important aspects of the public debate. In spite of this importance, there are few studies on emotion detection in Spanish and, to the best of our knowledge, few public resources exist for opinion mining in Colombian Spanish, highlighting the need to generate resources that address the specific cultural characteristics of this variety. In this work, we present a small corpus of tweets in Spanish related to the 2022 Colombian presidential elections, manually labeled with emotions using a fine-grained taxonomy. We perform classification experiments using supervised state-of-the-art models (BERT models) and compare them with GPT-3.5 in few-shot learning settings. We make our dataset and code publicly available for research purposes.
Updated: 2024-07-09 22:26:42
Subjects: cs.CL,cs.LG
Neural Embedding Compression For Efficient Multi-Task Earth Observation Modelling
As repositories of large-scale data in earth observation (EO) have grown, so have transfer and storage costs for model training and inference, expending significant resources. We introduce Neural Embedding Compression (NEC), based on transferring compressed embeddings to data consumers instead of raw data. We adapt foundation models (FM) through learned neural compression to generate multi-task embeddings while navigating the tradeoff between compression rate and embedding utility. We update only a small fraction of the FM parameters (10%) for a short training period (1% of the iterations of pre-training). We evaluate NEC on two EO tasks: scene classification and semantic segmentation. Compared with applying traditional compression to the raw data, NEC achieves similar accuracy with a 75% to 90% reduction in data. Even at 99.7% compression, performance drops by only 5% on the scene classification task. Overall, NEC is a data-efficient yet performant approach for multi-task EO modelling.
Updated: 2024-07-09 22:15:31
Subjects: cs.LG
RotRNN: Modelling Long Sequences with Rotations
Linear recurrent models, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple model with fewer theoretical assumptions than prior works, with a practical implementation that remains faithful to its theoretical derivation, achieving comparable scores to the LRU and SSMs on several long sequence modelling datasets.
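The appeal of rotation matrices here is that they are orthogonal, so repeatedly applying one neither explodes nor vanishes the hidden state, sidestepping much of the delicate initialisation that SSMs and LRUs require. A minimal sketch of that property (illustrative notation only; RotRNN's actual parameterisation and normalisation are derived in the paper):

```python
import numpy as np

def rotation_matrix(theta):
    # 2D rotation matrix: orthogonal with determinant 1
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rot_recurrence(inputs, theta, gamma=1.0):
    """Linear recurrence h_t = gamma * R(theta) @ h_{t-1} + x_t.
    With gamma = 1 the rotation exactly preserves the state norm;
    gamma < 1 adds a controlled, uniform decay."""
    R = rotation_matrix(theta)
    h = np.zeros(2)
    for x in inputs:
        h = gamma * (R @ h) + np.asarray(x, dtype=float)
    return h

# Feed a unit impulse, then run 100 input-free steps: the norm is unchanged.
h = rot_recurrence([[1.0, 0.0]] + [[0.0, 0.0]] * 100, theta=0.3)
print(np.linalg.norm(h))  # stays at 1.0 (up to floating point)
```

Because the recurrence is linear, the same computation can also be unrolled in parallel over the sequence, which is what makes this family competitive on long-sequence benchmarks.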
Updated: 2024-07-09 21:37:36
Subjects: cs.LG,stat.ML
The Quantum Imitation Game: Reverse Engineering of Quantum Machine Learning Models
Quantum Machine Learning (QML) amalgamates quantum computing paradigms with machine learning models, providing significant prospects for solving complex problems. However, with the expansion of numerous third-party vendors in the Noisy Intermediate-Scale Quantum (NISQ) era of quantum computing, the security of QML models is of prime importance, particularly against reverse engineering, which could expose trained parameters and algorithms of the models. We assume the untrusted quantum cloud provider is an adversary having white-box access to the transpiled user-designed trained QML model during inference. Reverse engineering (RE) to extract the pre-transpiled QML circuit will enable re-transpilation and usage of the model for various hardware with completely different native gate sets and even different qubit technology. Such flexibility may not be obtained from the transpiled circuit which is tied to a particular hardware and qubit technology. The information about the number of parameters, and optimized values can allow further training of the QML model to alter the QML model, tamper with the watermark, and/or embed their own watermark or refine the model for other purposes. In this first effort to investigate the RE of QML circuits, we perform RE and compare the training accuracy of original and reverse-engineered Quantum Neural Networks (QNNs) of various sizes. We note that multi-qubit classifiers can be reverse-engineered under specific conditions with a mean error of order 1e-2 in a reasonable time. We also propose adding dummy fixed parametric gates in the QML models to increase the RE overhead for defense. For instance, adding 2 dummy qubits and 2 layers increases the overhead by ~1.76 times for a classifier with 2 qubits and 3 layers with a performance overhead of less than 9%. We note that RE is a very powerful attack model which warrants further efforts on defenses.
Updated: 2024-07-09 21:35:19
Subjects: quant-ph,cs.CR,cs.ET,cs.LG
Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology
As experts in voice modification, trans-feminine gender-affirming voice teachers have unique perspectives on voice that confound current understandings of speaker identity. To demonstrate this, we present the Versatile Voice Dataset (VVD), a collection of three speakers modifying their voices along gendered axes. The VVD illustrates that current approaches in speaker modeling, based on categorical notions of gender and a static understanding of vocal texture, fail to account for the flexibility of the vocal tract. Utilizing publicly-available speaker embeddings, we demonstrate that gender classification systems are highly sensitive to voice modification, and speaker verification systems fail to identify voices as coming from the same speaker as voice modification becomes more drastic. As one path towards moving beyond categorical and static notions of speaker identity, we propose modeling individual qualities of vocal texture such as pitch, resonance, and weight.
Updated: 2024-07-09 21:19:49
Subjects: cs.SD,cs.LG,eess.AS
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex scientific flowcharts and simulated flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature and the simulated subset contains 10,000 flowcharts created using a customizable script. The dataset is enriched with annotations for visual components, OCR, Mermaid code representation, and VQA question-answer pairs. Despite the proven capabilities of Large Vision-Language Models (LVLMs) in various visual understanding tasks, their effectiveness in decoding flowcharts - a crucial element of scientific communication - has yet to be thoroughly investigated. The FlowLearn test set is crafted to assess the performance of LVLMs in flowchart comprehension. Our study thoroughly evaluates state-of-the-art LVLMs, identifying existing limitations and establishing a foundation for future enhancements in this relatively underexplored domain. For instance, in tasks involving simulated flowcharts, GPT-4V achieved the highest accuracy (58%) in counting the number of nodes, while Claude recorded the highest accuracy (83%) in OCR tasks. Notably, no single model excels in all tasks within the FlowLearn framework, highlighting significant opportunities for further development.
Updated: 2024-07-09 21:16:00
Subjects: cs.CV,cs.AI
Gated Ensemble of Spatio-temporal Mixture of Experts for Multi-task Learning in Ride-hailing System
Designing spatio-temporal forecasting models separately, in a task-wise and city-wise manner, poses a burden on expanding transportation network companies. Therefore, this study proposes a multi-task learning architecture by developing a gated ensemble of spatio-temporal mixture-of-experts network (GESME-Net) with a convolutional recurrent neural network (CRNN), a convolutional neural network (CNN), and a recurrent neural network (RNN) for simultaneously forecasting spatio-temporal tasks within a city as well as across different cities. Furthermore, a task adaptation layer is integrated with the architecture for learning joint representations in multi-task learning and revealing the contribution of the input features used in prediction. The proposed architecture is tested with data from Didi Chuxing for: (i) simultaneously forecasting demand and the supply-demand gap in Beijing, and (ii) simultaneously forecasting demand across Chengdu and Xian. In both scenarios, models from our proposed architecture outperformed single-task and multi-task deep learning benchmarks and ensemble-based machine learning algorithms.
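The gating idea at the core of a mixture-of-experts ensemble can be sketched in a few lines. This is a generic illustration of gated expert combination, not GESME-Net's actual CRNN/CNN/RNN experts or its task adaptation layer:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def gated_ensemble(x, experts, gate_weights):
    """A gating network scores each expert on the input x; the ensemble
    output is the softmax-weighted combination of the expert outputs."""
    gates = softmax(gate_weights @ x)              # one weight per expert, sums to 1
    outputs = np.array([expert(x) for expert in experts])
    return gates @ outputs, gates

# Two toy experts; these gate weights strongly prefer expert 0 for this input.
experts = [lambda x: x.sum(), lambda x: x.prod()]
x = np.array([2.0, 3.0])
y, gates = gated_ensemble(x, experts, gate_weights=np.array([[5.0, 5.0], [0.0, 0.0]]))
print(gates)  # heavily skewed toward expert 0
```

In a multi-task setting, the gate learns per-task (and per-city) weightings over shared experts, which is what lets one architecture serve several forecasting problems at once.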
Updated: 2024-07-09 21:14:08
Subjects: cs.LG
(Security) Assertions by Large Language Models
The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications on a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular verification technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security, where primarily natural language prompts, such as those one would see as code comments in assertion files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts, and we create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.
Updated: 2024-07-09 21:08:06
Subjects: cs.CR,cs.AI
Using Galaxy Evolution as Source of Physics-Based Ground Truth for Generative Models
Generative models producing images have enormous potential to advance discoveries across scientific fields and require metrics capable of quantifying their high-dimensional output. We propose that astrophysics data, such as galaxy images, can test generative models against physics-motivated ground truths in addition to human judgment. For example, galaxies in the Universe form and change over billions of years, following physical laws and relationships that are both easy to characterize and difficult to encode in generative models. We build a conditional denoising diffusion probabilistic model (DDPM) and a conditional variational autoencoder (CVAE) and test their ability to generate realistic galaxies conditioned on their redshifts (galaxy ages). This is one of the first studies to probe these generative models using physically motivated metrics. We find that both models produce comparably realistic galaxies based on human evaluation, but our physics-based metrics are better able to discern the strengths and weaknesses of the generative models. Overall, the DDPM model performs better than the CVAE on the majority of the physics-based metrics. Ultimately, if we can show that generative models can learn the physics of galaxy evolution, they have the potential to unlock new astrophysical discoveries.
Updated: 2024-07-09 21:01:08
Categories: astro-ph.IM,cs.AI
ConvNLP: Image-based AI Text Detection
The potential of generative-AI technologies like large language models (LLMs) to revolutionize education is undermined by ethical concerns around their misuse, which worsens the problem of academic dishonesty. LLMs like GPT-4 and Llama 2 are becoming increasingly powerful in generating sophisticated content and answering questions, from writing academic essays to solving complex math problems. Students are relying on these LLMs to complete their assignments, thus compromising academic integrity. Solutions to detect LLM-generated text are compute-intensive and often lack generalization. This paper presents a novel approach for detecting LLM-generated AI-text using a visual representation of word embeddings. We have formulated a novel Convolutional Neural Network called ZigZag ResNet, as well as a scheduler for improving generalization, named ZigZag Scheduler. Through extensive evaluation using datasets of text generated by six different state-of-the-art LLMs, our model demonstrates strong intra-domain and inter-domain generalization capabilities. Our best model detects AI-generated text with an impressive average detection rate (over inter- and intra-domain test data) of 88.35%. Through an exhaustive ablation study, our ZigZag ResNet and ZigZag Scheduler provide a performance improvement of nearly 4% over the vanilla ResNet. The end-to-end inference latency of our model is below 2.5ms per sentence. Our solution offers a lightweight, computationally efficient, and faster alternative to existing tools for AI-generated text detection, with better generalization performance. It can help academic institutions in their fight against the misuse of LLMs in academic settings. Through this work, we aim to contribute to safeguarding the principles of academic integrity and ensuring the trustworthiness of student work in the era of advanced LLMs.
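A minimal sketch of the core idea of rendering word embeddings as an image a CNN can consume (the padding, resolution, and normalization below are our assumptions, not the paper's exact rendering):

```python
import numpy as np

def embeddings_to_image(emb: np.ndarray, size: int = 32) -> np.ndarray:
    """Map a (tokens x dims) embedding matrix to a fixed-size uint8 image.
    Illustrative only; the paper's rendering may differ."""
    canvas = np.zeros((size, size), dtype=np.float32)
    t, d = emb.shape
    # Crop (or implicitly zero-pad) both axes to the target resolution.
    canvas[: min(t, size), : min(d, size)] = emb[:size, :size]
    # Min-max normalise to 0..255 so it reads as a grayscale image.
    lo, hi = canvas.min(), canvas.max()
    if hi > lo:
        canvas = (canvas - lo) / (hi - lo)
    return (canvas * 255).astype(np.uint8)

rng = np.random.default_rng(0)
fake_sentence_emb = rng.normal(size=(12, 50))  # 12 tokens, 50-dim embeddings
img = embeddings_to_image(fake_sentence_emb)
```

The resulting array can be fed to any image classifier; the paper's contribution is the specific ZigZag ResNet architecture and scheduler trained on such representations.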
Updated: 2024-07-09 20:44:40
Categories: cs.LG,cs.CL
SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems
This paper presents a novel clustering algorithm from the SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration) algorithmic family. The newly proposed clustering variant leverages the concept of similarity and higher-order interactions across multiple subspaces to group data into clusters. To showcase the merit of SPINEX, a thorough set of benchmarking experiments was carried out against 13 algorithms, namely, Affinity Propagation, Agglomerative, Birch, DBSCAN, Gaussian Mixture, HDBSCAN, K-Means, KMedoids, Mean Shift, MiniBatch K-Means, OPTICS, Spectral Clustering, and Ward Hierarchical. Then, the performance of all algorithms was examined across 51 synthetic and real datasets from various domains, dimensions, and complexities. Furthermore, we present a companion complexity analysis to compare the complexity of SPINEX to that of the aforementioned algorithms. Our results demonstrate that SPINEX can outperform commonly adopted clustering algorithms by ranking within the top-5 best performing algorithms and has moderate complexity. Finally, a demonstration of the explainability capabilities of SPINEX, along with future research needs, is presented.
Updated: 2024-07-09 20:38:01
Categories: cs.LG,stat.ML
Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL such that the learned global model is poison free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non-iid or the number of malicious clients is large, as confirmed in our experiments. In this work, we propose FLForensics, the first poison-forensics method for FL. FLForensics complements existing training-phase defenses. In particular, when training-phase defenses fail and a poisoned global model is deployed, FLForensics aims to trace back the malicious clients that performed the poisoning attack after a misclassified target input is identified. We theoretically show that FLForensics can accurately distinguish between benign and malicious clients under a formal definition of poisoning attack. Moreover, we empirically show the effectiveness of FLForensics at tracing back both existing and adaptive poisoning attacks on five benchmark datasets.
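FLForensics' actual algorithm is not reproduced here; the following is a deliberately simplified sketch of the general trace-back idea (score each client's stored update by its alignment with an update direction derived from the misclassified target input), with made-up numbers:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy server-side logs: one stored model update per client (invented values;
# client_b carries a planted backdoor direction).
client_updates = {
    "client_a": [0.10, -0.20, 0.05],
    "client_b": [0.90, 0.80, -0.70],
    "client_c": [-0.05, 0.10, 0.00],
}
# Toy direction that pushes the model toward the wrong label on the
# identified target input (in practice, derived from its gradient).
target_direction = [1.0, 1.0, -1.0]

scores = {c: cosine(u, target_direction) for c, u in client_updates.items()}
suspect = max(scores, key=scores.get)
```

The real method comes with theoretical guarantees under a formal attack definition; this sketch only conveys why per-client update logs make post-hoc attribution possible at all.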
Updated: 2024-07-09 20:35:36
Categories: cs.CV,cs.CR
Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations
One of the most promising applications of machine learning (ML) in computational physics is to accelerate the solution of partial differential equations (PDEs). The key objective of ML-based PDE solvers is to output a sufficiently accurate solution faster than standard numerical methods, which are used as a baseline comparison. We first perform a systematic review of the ML-for-PDE solving literature. Of articles that use ML to solve a fluid-related PDE and claim to outperform a standard numerical method, we determine that 79% (60/76) compare to a weak baseline. Second, we find evidence that reporting biases, especially outcome reporting bias and publication bias, are widespread. We conclude that ML-for-PDE solving research is overoptimistic: weak baselines lead to overly positive results, while reporting biases lead to underreporting of negative results. To a large extent, these issues appear to be caused by factors similar to those of past reproducibility crises: researcher degrees of freedom and a bias towards positive results. We call for bottom-up cultural changes to minimize biased reporting as well as top-down structural reforms intended to reduce perverse incentives for doing so.
Updated: 2024-07-09 20:28:03
Categories: math.NA,cs.LG,cs.NA,physics.flu-dyn
The Emperor is Now Clothed: A Secure Governance Framework for Web User Authentication through Password Managers
Existing approaches to facilitate the interaction between password managers and web applications fall short of providing adequate functionality and mitigation strategies against prominent attacks. HTML Autofill is not sufficiently expressive, Credential Management API does not support browser extension password managers, and other proposed solutions do not conform to established user mental models. In this paper, we propose Berytus, a browser-based governance framework that mediates the interaction between password managers and web applications. Two APIs are designed to support Berytus acting as an orchestrator between password managers and web applications. An implementation of the framework in Firefox is developed that fully supports registration and authentication processes. As an orchestrator, Berytus is able to authenticate web applications and facilitate authenticated key exchange between web applications and password managers, which as we show, can provide effective mitigation strategies against phishing, cross-site scripting, inline code injection (e.g., by a malicious browser extension), and TLS proxy in the middle attacks, whereas existing mitigation strategies such as Content Security Policy and credential tokenisation are only partially effective. The framework design also provides desirable functional properties such as support for multi-step, multi-factor, and custom authentication schemes. We provide a comprehensive security and functionality evaluation and discuss possible future directions.
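The mediated key agreement can be pictured with a textbook Diffie-Hellman exchange: the browser (the orchestrator) relays only public values between the web application and the password manager, and each side derives the same shared key without its secret ever leaving it. This is a toy sketch: real deployments would use authenticated ECDH with standard curves, and the parameters below are illustrative and offer no security.

```python
# Toy Diffie-Hellman: modulus is the largest prime below 2**64 (far too small
# for real use), generator 5. Secrets are hard-coded for determinism.
P, G = 0xFFFFFFFFFFFFFFC5, 5

def dh_public(secret: int) -> int:
    """Public value G^secret mod P."""
    return pow(G, secret, P)

webapp_secret = 0x12345678
manager_secret = 0x9ABCDEF0

# The orchestrator relays these two public values between the parties.
webapp_public = dh_public(webapp_secret)
manager_public = dh_public(manager_secret)

# Each party combines the peer's public value with its own secret.
webapp_shared = pow(manager_public, webapp_secret, P)
manager_shared = pow(webapp_public, manager_secret, P)
```

Berytus' role in this picture is authenticating the endpoints and mediating the message flow; the exchange itself is standard cryptography.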
Updated: 2024-07-09 19:49:49
Categories: cs.CR
Commute-Time-Optimised Graphs for GNNs
We explore graph rewiring methods that optimise commute time. Recent graph rewiring approaches facilitate long-range interactions in sparse graphs, making such rewirings commute-time-optimal $\textit{on average}$. However, when an expert prior exists on which node pairs should or should not interact, a superior rewiring would favour short commute times between these privileged node pairs. We construct two synthetic datasets with known priors reflecting realistic settings, and use these to motivate two bespoke rewiring methods that incorporate the known prior. We investigate the regimes where our rewiring improves test performance on the synthetic datasets. Finally, we perform a case study on a real-world citation graph to investigate the practical implications of our work.
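Commute time itself has a closed form via the pseudoinverse of the graph Laplacian: $C_{uv} = \mathrm{vol}(G)\,(L^+_{uu} + L^+_{vv} - 2L^+_{uv})$, with $\mathrm{vol}(G) = 2|E|$. A minimal sketch for scoring candidate rewirings with this quantity (illustrative, not the paper's code):

```python
import numpy as np

def commute_times(adj: np.ndarray) -> np.ndarray:
    """Pairwise commute times C_uv = vol(G) * (L+_uu + L+_vv - 2 L+_uv)."""
    deg = adj.sum(axis=1)
    laplacian = np.diag(deg) - adj
    lp = np.linalg.pinv(laplacian)
    vol = deg.sum()  # equals 2|E| for an unweighted graph
    d = np.diag(lp)
    return vol * (d[:, None] + d[None, :] - 2 * lp)

# Path graph 0 - 1 - 2; rewiring in the edge (0, 2) should shrink the
# commute time between that privileged node pair.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rewired = path.copy()
rewired[0, 2] = rewired[2, 0] = 1.0
```

On the path graph, the adjacent-pair commute time is 4 and the end-to-end one is 8; adding the edge drops the latter to 4, which is the kind of targeted reduction a prior-aware rewiring would seek.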
Updated: 2024-07-09 19:31:49
Categories: cs.SI,cs.LG
A New Approach Towards Autoformalization
Verifying mathematical proofs is difficult, but can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural language mathematics into a formal language that can be verified by a program. This is a challenging task, and especially for higher-level mathematics found in research papers. Research paper mathematics requires large amounts of background and context. In this paper, we propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking to the proper theorems and definitions), and finally adjusting types so it passes the type checker. In addition, we present arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems formalized for the Lean theorem prover sampled from papers on arXiv.org. We welcome any contributions from the community to future versions of this dataset.
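As a toy illustration (not from the arXiv2Formal dataset) of what an "unlinked" formalization target looks like in Lean 4, here is a statement whose library links (e.g., to `Even` and its lemmas) have been replaced by local hypotheses; a sketch only, and the syntax may need adjustment against a specific Lean version:

```lean
-- Unlinked formalization of "the sum of two even numbers is even":
-- evenness is stated with local existentials rather than library definitions.
theorem even_add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k :=
  match hm, hn with
  | ⟨a, ha⟩, ⟨b, hb⟩ => ⟨a + b, by rw [ha, hb, Nat.left_distrib]⟩
```

Entity linking would then replace the ad-hoc hypotheses with the proper library notions, and type adjustment would reconcile any remaining mismatches.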
Updated: 2024-07-09 19:28:30
Categories: cs.CL,cs.AI
Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO
With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO -- a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques. The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations. Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it. Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.
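HO derives its binnings via hyperparameter optimization; as a simpler stand-in, equal-frequency (quantile) edges already illustrate why non-uniform bins spend a fixed representation budget better on skewed traffic features than fixed-width bins do (synthetic packet sizes, our choice of distribution):

```python
import numpy as np

rng = np.random.default_rng(7)
# Heavy-tailed toy packet sizes: most packets are small, a few are huge.
packet_sizes = rng.exponential(scale=300.0, size=10_000)

n_bins = 8
# Fixed-width edges: most mass lands in the first bin or two.
uniform_edges = np.linspace(packet_sizes.min(), packet_sizes.max(), n_bins + 1)
# Non-uniform, equal-frequency edges: each bin holds roughly the same mass.
quantile_edges = np.quantile(packet_sizes, np.linspace(0, 1, n_bins + 1))

def occupancy(edges: np.ndarray) -> np.ndarray:
    """Fraction of samples falling in each bin."""
    counts, _ = np.histogram(packet_sizes, bins=edges)
    return counts / counts.sum()

uni, nonuni = occupancy(uniform_edges), occupancy(quantile_edges)
```

The quantile bins give a near-flat occupancy, while the uniform ones waste most of their slots on nearly empty tail bins; HO goes further by tuning the edges directly against classification accuracy.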
Updated: 2024-07-09 19:27:34
Categories: cs.NI,cs.CR,cs.LG
Agnostic Active Learning of Single Index Models with Linear Sample Complexity
We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x,\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning like surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., methods that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $O(d^{2})$ bound of Gajjar et al. (2023). Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting when $f$ is unknown. Our results leverage tools from high dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
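Statistical leverage scores are $\tau_i = {\mathbf x}_i^\top (X^\top X)^{+} {\mathbf x}_i$; they sum to $\mathrm{rank}(X)$ and induce the sampling distribution $p_i = \tau_i / \sum_j \tau_j$. A minimal numerical sketch on random data (our setup, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))

# Leverage scores tau_i = x_i^T (X^T X)^+ x_i for every row of X.
gram_inv = np.linalg.pinv(X.T @ X)
leverage = np.einsum("ij,jk,ik->i", X, gram_inv, X)
probs = leverage / leverage.sum()

# Draw a small active-learning batch proportional to leverage; the paper's
# guarantees concern on the order of d such samples for single index models.
batch = rng.choice(n, size=4 * d, replace=True, p=probs)
```

Because the scores depend only on the inputs $X$, the sampling step needs no labels, which is exactly what makes it usable for active learning.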
Updated: 2024-07-09 19:20:57
Categories: cs.LG
Principal Component Analysis in Space Forms
Principal Component Analysis (PCA) is a workhorse of modern data science. While PCA assumes the data conforms to Euclidean geometry, for specific data types, such as hierarchical and cyclic data structures, other spaces are more appropriate. We study PCA in space forms; that is, those with constant curvatures. At a point on a Riemannian manifold, we can define a Riemannian affine subspace based on a set of tangent vectors. Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction. Our Space Form PCA (SFPCA) seeks the affine subspace that best represents a set of manifold-valued points with the minimum projection cost. We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set. These properties provide advances over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. We evaluate the proposed SFPCA on real and simulated data in spherical and hyperbolic spaces. We show that it outperforms alternative methods in estimating true subspaces (in simulated data) with respect to convergence speed or accuracy, often both.
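In the Euclidean (zero-curvature) special case, both properties SFPCA targets are familiar from classical PCA: the optimal subspace solves an eigenequation of the covariance, and optimal subspaces of increasing dimension nest. A small sketch on synthetic data (our construction):

```python
import numpy as np

rng = np.random.default_rng(1)
# Variance concentrated along the first coordinate axis.
data = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
centered = data - data.mean(axis=0)
cov = centered.T @ centered / len(centered)

# Eigenequation: principal directions are eigenvectors of the covariance.
eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

top1 = eigvecs[:, :1]   # optimal 1-D subspace
top2 = eigvecs[:, :2]   # optimal 2-D subspace; contains top1 (nesting)
```

SFPCA's contribution is choosing Riemannian cost functions so that the same two properties survive in spherical and hyperbolic geometry, where naive generalizations lose them.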
Updated: 2024-07-09 19:09:28
Categories: stat.ML,cs.LG,eess.SP,math.DG
Joint Composite Latent Space Bayesian Optimization
Bayesian Optimization (BO) is a technique for sample-efficient black-box optimization that employs probabilistic models to identify promising input locations for evaluation. When dealing with composite-structured functions, such as $f = g \circ h$, evaluating a specific location $x$ yields observations of both the final outcome $f(x) = g(h(x))$ as well as the intermediate output(s) $h(x)$. Previous research has shown that integrating information from these intermediate outputs can enhance BO performance substantially. However, existing methods struggle if the outputs $h(x)$ are high-dimensional. Many relevant problems fall into this setting, including in the context of generative AI, molecular design, or robotics. To effectively tackle these challenges, we introduce Joint Composite Latent Space Bayesian Optimization (JoCo), a novel framework that jointly trains neural network encoders and probabilistic models to adaptively compress high-dimensional input and output spaces into manageable latent representations. This enables viable BO on these compressed representations, allowing JoCo to outperform other state-of-the-art methods in high-dimensional BO on a wide variety of simulated and real-world problems.
Updated: 2024-07-09 19:02:16
Categories: cs.LG
Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity
While data-driven predictive models are a strictly technological construct, they may operate within a social context in which benign engineering choices entail implicit, indirect and unexpected real-life consequences. Fairness of such systems -- pertaining both to individuals and groups -- is one relevant consideration in this space; algorithms can discriminate people across various protected characteristics regardless of whether these properties are included in the data or discernible through proxy variables. To date, this notion has predominantly been studied for a fixed model, often under different classification thresholds, striving to identify and eradicate undesirable, discriminative and possibly unlawful aspects of its operation. Here, we backtrack on this fixed model assumption to propose and explore a novel definition of cross-model fairness where individuals can be harmed when one predictor is chosen ad hoc from a group of equally well performing models, i.e., in view of utility-based model multiplicity. Since a person may be classified differently across models that are otherwise considered equivalent, this individual could argue for a predictor granting them the most favourable outcome, employing which may have adverse effects on other people. We introduce this scenario with a two-dimensional example and linear classification; then, we present a comprehensive empirical study based on real-life predictive models and data sets that are popular with the algorithmic fairness community; finally, we investigate analytical properties of cross-model fairness and its ramifications in a broader context. Our findings suggest that such unfairness can be readily found in real life and it may be difficult to mitigate by technical means alone as doing so is likely to degrade predictive performance.
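The core phenomenon can be shown with two one-dimensional threshold classifiers that have identical training accuracy ("model multiplicity") yet disagree on a new individual; the data and thresholds below are made up for illustration:

```python
# Four training points: class 0 on the left, class 1 on the right.
train = [(-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1)]

def make_classifier(threshold: float):
    """1-D decision stump: predict 1 iff x exceeds the threshold."""
    return lambda x: int(x > threshold)

model_a = make_classifier(0.0)   # both thresholds separate the
model_b = make_classifier(1.0)   # training data perfectly

def accuracy(model) -> float:
    return sum(model(x) == y for x, y in train) / len(train)

individual = 0.5   # classified favourably by model_a but not by model_b
```

Since either model is an equally defensible choice by the utility criterion, the individual at $x = 0.5$ can argue for whichever predictor grants the favourable outcome, which is exactly the cross-model fairness tension the abstract describes.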
Updated: 2024-07-09 19:00:08
Categories: cs.LG,cs.AI
TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era
High-Energy Physics experiments are facing a multi-fold data increase with every new iteration. This is certainly the case for the upcoming High-Luminosity LHC upgrade. Such increased data processing requirements force revisions to almost every step of the data processing pipeline. One such step in need of an overhaul is the task of particle track reconstruction, a.k.a., tracking. A Machine Learning-assisted solution is expected to provide significant improvements, since the most time-consuming step in tracking is the assignment of hits to particles or track candidates. This is the topic of this paper. We take inspiration from large language models. As such, we consider two approaches: the prediction of the next word in a sentence (next hit point in a track), as well as the one-shot prediction of all hits within an event. In an extensive design effort, we have experimented with three models based on the Transformer architecture and one model based on the U-Net architecture, performing track association predictions for collision event hit points. In our evaluation, we consider a spectrum of simple to complex representations of the problem, eliminating designs with lower metrics early on. We report extensive results, covering both prediction accuracy (score) and computational performance. We have made use of the REDVID simulation framework, as well as reductions applied to the TrackML data set, to compose five data sets from simple to complex, for our experiments. The results highlight distinct advantages among different designs in terms of prediction accuracy and computational performance, demonstrating the efficiency of our methodology. Most importantly, the results show the viability of a one-shot encoder-classifier based Transformer solution as a practical approach for the task of tracking.
Updated: 2024-07-09 18:47:25
Categories: hep-ex,cs.LG
Training Guarantees of Neural Network Classification Two-Sample Tests by Kernel Analysis
We construct and analyze a neural network two-sample test to determine whether two datasets came from the same distribution (null hypothesis) or not (alternative hypothesis). We perform time-analysis on a neural tangent kernel (NTK) two-sample test. In particular, we derive the theoretical minimum training time needed to ensure the NTK two-sample test detects a deviation-level between the datasets. Similarly, we derive the theoretical maximum training time before the NTK two-sample test detects a deviation-level. By approximating the neural network dynamics with the NTK dynamics, we extend this time-analysis to the realistic neural network two-sample test generated from time-varying training dynamics and finite training samples. A similar extension is done for the neural network two-sample test generated from time-varying training dynamics but trained on the population. To give statistical guarantees, we show that the statistical power associated with the neural network two-sample test goes to 1 as the neural network training samples and test evaluation samples go to infinity. Additionally, we prove that the training times needed to detect the same deviation-level in the null and alternative hypothesis scenarios are well-separated. Finally, we run some experiments showcasing a two-layer neural network two-sample test on a hard two-sample test problem and plot a heatmap of the statistical power of the two-sample test in relation to training time and network complexity.
Updated: 2024-07-09 18:45:58
Categories: stat.ML,cs.LG
Non-Coherent Over-the-Air Decentralized Gradient Descent
Implementing Decentralized Gradient Descent (DGD) in wireless systems is challenging due to noise, fading, and limited bandwidth, necessitating topology awareness, transmission scheduling, and the acquisition of channel state information (CSI) to mitigate interference and maintain reliable communications. These operations may result in substantial signaling overhead and scalability challenges in large networks lacking central coordination. This paper introduces a scalable DGD algorithm that eliminates the need for scheduling, topology information, or CSI (both average and instantaneous). At its core is a Non-Coherent Over-The-Air (NCOTA) consensus scheme that exploits a noisy energy superposition property of wireless channels. Nodes encode their local optimization signals into energy levels within an OFDM frame and transmit simultaneously, without coordination. The key insight is that the received energy equals, on average, the sum of the energies of the transmitted signals, scaled by their respective average channel gains, akin to a consensus step. This property enables unbiased consensus estimation, utilizing average channel gains as mixing weights, thereby removing the need for their explicit design or for CSI. Introducing a consensus stepsize mitigates consensus estimation errors due to energy fluctuations around their expected values. For strongly-convex problems, it is shown that the expected squared distance between the local and globally optimum models vanishes at a rate of $\mathcal O(1/\sqrt{k})$ after $k$ iterations, with suitable decreasing learning and consensus stepsizes. Extensions accommodate a broad class of fading models and frequency-selective channels. Numerical experiments on image classification demonstrate faster convergence in terms of running time compared to state-of-the-art schemes, especially in dense network scenarios.
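The noisy energy superposition property at the heart of NCOTA can be checked with a small Monte Carlo sketch (our simplified setup: Rayleigh fading, noise omitted): with simultaneous non-coherent transmissions, the received energy averages to the gain-weighted sum of the transmitted energies, which is the consensus-style mixing the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(3)
K, trials = 4, 200_000
tx_energy = np.array([1.0, 2.0, 0.5, 1.5])   # per-node transmit energies
avg_gain = np.array([0.8, 0.5, 1.2, 1.0])    # average channel gains

# Rayleigh fading: |h_k|^2 is exponential with mean avg_gain[k].
fading_power = rng.exponential(scale=avg_gain, size=(trials, K))
received = (fading_power * tx_energy).sum(axis=1)

expected = float(avg_gain @ tx_energy)   # sum_k g_k * e_k
empirical = float(received.mean())
```

No node needs CSI for this to hold: the average gains act as implicit mixing weights, while the fluctuation around the mean is what the consensus stepsize in the algorithm is there to absorb.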
Updated: 2024-07-09 18:32:09
Categories: eess.SP,cs.IT,cs.LG,math.IT
A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis
Acoustic features play an important role in improving the quality of the synthesised speech. Currently, the Mel spectrogram is a widely employed acoustic feature in most acoustic models. However, due to the fine-grained loss caused by its Fourier transform process, the clarity of speech synthesised by Mel spectrogram is compromised in mutant signals. In order to obtain a more detailed Mel spectrogram, we propose a Mel spectrogram enhancement paradigm based on the continuous wavelet transform (CWT). This paradigm introduces an additional task: a more detailed wavelet spectrogram, which, like the post-processing network, takes as input the Mel spectrogram output by the decoder. We choose Tacotron2 and Fastspeech2 for experimental validation in order to test autoregressive (AR) and non-autoregressive (NAR) speech systems, respectively. The experimental results demonstrate that the speech synthesised using the model with the Mel spectrogram enhancement paradigm exhibits higher MOS, with an improvement of 0.14 and 0.09 compared to the baseline model, respectively. These findings provide some validation for the universality of the enhancement paradigm, as they demonstrate the success of the paradigm in different architectures.
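A naive CWT can be computed as one filtered signal per scale; the sketch below uses an unnormalized Ricker (Mexican-hat) mother wavelet as a stand-in, since the abstract does not specify which wavelet the paper uses:

```python
import numpy as np

def ricker(points: int, a: float) -> np.ndarray:
    """Unnormalized Ricker (Mexican-hat) wavelet with width parameter a."""
    t = np.arange(points) - (points - 1) / 2
    return (1 - (t / a) ** 2) * np.exp(-(t ** 2) / (2 * a ** 2))

def cwt(signal: np.ndarray, widths) -> np.ndarray:
    """Naive continuous wavelet transform: one convolution row per scale."""
    return np.stack([
        np.convolve(signal, ricker(min(10 * int(w), len(signal)), w), mode="same")
        for w in widths
    ])

fs = 1000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 50 * t)          # toy 50 Hz tone, 1 s at 1 kHz
scalogram = cwt(signal, widths=[1, 2, 4, 8, 16])
```

Unlike the fixed window of the short-time Fourier transform underlying the Mel spectrogram, the multi-scale wavelets trade time and frequency resolution per scale, which is the extra detail the auxiliary wavelet-spectrogram task is meant to supply.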
Updated: 2024-07-09 18:21:48
Domain: cs.SD,cs.AI,eess.AS
Guessing human intentions to avoid dangerous situations in caregiving robots
For robots to interact socially, they must interpret human intentions and accurately anticipate their potential outcomes. This is particularly important for social robots designed for human care, which may face situations that are potentially dangerous for people, such as unseen obstacles in their way, and that should be avoided. This paper explores the Artificial Theory of Mind (ATM) approach to inferring and interpreting human intentions. We propose an algorithm that detects risky situations for humans and selects, in real time, a robot action that removes the danger. We use a simulation-based approach to ATM and adopt the 'like-me' policy to assign intentions and actions to people. Using this strategy, the robot can detect and act with a high rate of success under time-constrained situations. The algorithm has been implemented as part of an existing robotics cognitive architecture and tested in simulation scenarios. Three experiments have been conducted to test the implementation's robustness, precision and real-time response: a simulated scenario, a human-in-the-loop hybrid configuration and a real-world scenario.
Updated: 2024-07-09 18:20:06
Domain: cs.RO,cs.AI
scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data
We propose a novel method, scTree, for single-cell Tree Variational Autoencoders, extending a hierarchical clustering approach to single-cell RNA sequencing data. scTree corrects for batch effects while simultaneously learning a tree-structured data representation. This VAE-based method allows for a more in-depth understanding of complex cellular landscapes independently of the biasing effects of batches. We show empirically on seven datasets that scTree discovers the underlying clusters of the data and the hierarchical relations between them, as well as outperforms established baseline methods across these datasets. Additionally, we analyze the learned hierarchy to understand its biological relevance, thus underpinning the importance of integrating batch correction directly into the clustering procedure.
Updated: 2024-07-09 18:17:26
Domain: cs.LG
Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback
Existing committee-based Byzantine state machine replication (SMR) protocols, typically deployed in production blockchains, face a clear trade-off: (1) they either achieve linear communication cost in the happy path, but sacrifice liveness during periods of asynchrony, or (2) they are robust (progress with probability one) but pay quadratic communication cost. We believe this trade-off is unwarranted since existing linear protocols still have asymptotic quadratic cost in the worst case. We design Ditto, a Byzantine SMR protocol that enjoys the best of both worlds: optimal communication on and off the happy path (linear and quadratic, respectively) and progress guarantee under asynchrony and DDoS attacks. We achieve this by replacing the view-synchronization of partially synchronous protocols with an asynchronous fallback mechanism at no extra asymptotic cost. Specifically, we start from HotStuff, a state-of-the-art linear protocol, and gradually build Ditto. As a separate contribution and an intermediate step, we design a 2-chain version of HotStuff, Jolteon, which leverages a quadratic view-change mechanism to reduce the latency of the standard 3-chain HotStuff. We implement and experimentally evaluate all our systems. Notably, Jolteon's commit latency outperforms HotStuff by 200-300ms with varying system size. Additionally, Ditto adapts to the network and provides better performance than Jolteon under faulty conditions and better performance than VABA (a state-of-the-art asynchronous protocol) under faultless conditions. This proves our case that breaking the robustness-efficiency trade-off is in the realm of practicality.
Updated: 2024-07-09 18:10:49
Domain: cs.DC,cs.CR
UEFI Vulnerability Signature Generation using Static and Symbolic Analysis
Since its major release in 2006, the Unified Extensible Firmware Interface (UEFI) has become the industry standard for interfacing a computer's hardware and operating system, replacing BIOS. UEFI has higher privileged security access to system resources than any other software component, including the system kernel. Hence, identifying and characterizing vulnerabilities in UEFI is extremely important for computer security. However, automated detection and characterization of UEFI vulnerabilities is a challenging problem. Static vulnerability analysis techniques are scalable but lack precision (reporting many false positives), whereas symbolic analysis techniques are precise but are hampered by scalability issues due to path explosion and the cost of constraint solving. In this paper, we introduce a technique called STatic Analysis guided Symbolic Execution (STASE), which integrates both analysis approaches to leverage their strengths and minimize their weaknesses. We begin with a rule-based static vulnerability analysis on LLVM bitcode to identify potential vulnerability targets for symbolic execution. We then focus symbolic execution on each target to achieve precise vulnerability detection and signature generation. STASE relies on the manual specification of reusable vulnerability rules and attacker-controlled inputs. However, it automates the generation of harnesses that guide the symbolic execution process, addressing the usability and scalability of symbolic execution, which typically requires manual harness generation to reduce the state space. We implemented STASE and applied it to the UEFI code base. STASE detects and generates vulnerability signatures for 5 out of 9 recently reported PixieFail vulnerabilities and for 13 new vulnerabilities in Tianocore's EDKII codebase.
Updated: 2024-07-09 18:08:49
Domain: cs.CR,cs.SE
Short-Long Policy Evaluation with Novel Actions
From incorporating LLMs in education, to identifying new drugs and improving ways to charge batteries, innovators constantly try new strategies in search of better long-term outcomes for students, patients and consumers. One major bottleneck in this innovation cycle is the amount of time it takes to observe the downstream effects of a decision policy that incorporates new interventions. The key question is whether we can quickly evaluate long-term outcomes of a new decision policy without making long-term observations. Organizations often have access to prior data about past decision policies and their outcomes, evaluated over the full horizon of interest. Motivated by this, we introduce a new setting for short-long policy evaluation for sequential decision making tasks. Our proposed methods significantly outperform prior results on simulators of HIV treatment, kidney dialysis and battery charging. We also demonstrate that our methods can be useful for applications in AI safety by quickly identifying when a new decision policy is likely to have substantially lower performance than past policies.
Updated: 2024-07-09 18:05:10
Domain: cs.LG
RASP: A Drone-based Reconfigurable Actuation and Sensing Platform for Engaging Physical Environments with Foundation Models
Foundation models and large language models have shown immense human-like understanding and capabilities for generating text and digital media. However, foundation models that can freely sense, interact with, and actuate the physical world as they do the digital domain are far from being realized. This is due to a number of challenges including: 1) being constrained to the types of static devices and sensors deployed, 2) events often being localized to one part of a large space, and 3) requiring dense deployments of devices to achieve full coverage. As a critical step towards enabling foundation models to successfully and freely interact with the physical environment, we propose RASP, a modular and reconfigurable sensing and actuation platform that allows drones to autonomously swap onboard sensors and actuators in only $25$ seconds, allowing a single drone to quickly adapt to a diverse range of tasks. We demonstrate through real smart home deployments that RASP enables FMs and LLMs to complete diverse tasks up to $85\%$ more successfully by allowing them to target specific areas with specific sensors and actuators on-the-fly.
Updated: 2024-07-09 18:03:46
Domain: cs.RO,cs.AI,cs.HC
AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning
The pervasive deployment of Large Language Models (LLMs) in various sectors often neglects the nuanced requirements of individuals and small organizations, who benefit more from models precisely tailored to their specific business contexts than from those with broadly superior general capabilities. This work introduces \textbf{AnyTaskTune}, a novel fine-tuning methodology coined \textbf{Task-Fine-Tune}, specifically developed to elevate model performance on a diverse array of domain-specific tasks. This method involves a meticulous process to identify and define targeted sub-tasks within a domain, followed by the creation of specialized enhancement datasets for fine-tuning, thereby optimizing task-specific model performance. We conducted comprehensive fine-tuning experiments not only in the legal domain, for tasks such as keyword extraction and sentence prediction, but across over twenty different sub-tasks derived from the domains of finance, healthcare, law, psychology, consumer services, and human resources. To substantiate our approach and facilitate community engagement, we will open-source these bilingual task datasets. Our findings demonstrate that models fine-tuned using the \textbf{Task-Fine-Tune} methodology not only achieve superior performance on these specific tasks but also significantly outperform models with higher general capabilities in their respective domains. Our work is publicly available at \url{https://github.com/PandaVT/DataTager}.
Updated: 2024-07-09 17:59:56
Domain: cs.CL,cs.AI
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not the partial binary or ternary LLM like BitNet b1.58) to match the performance of its full-precision counterparts (e.g., FP16 or BF16) in transformer-based LLMs. It achieves this by employing an autoregressive distillation (AD) loss with maintaining equivalent model dimensions (130M, 1.3B, 7B) and training data volume as regular LLM pretraining, while delivering competitive results in terms of perplexity and task-specific effectiveness. Intriguingly, by analyzing the training trajectory, we find that the pretrained weight is not necessary for training binarized LLMs from scratch. This research encourages a new computational framework and may facilitate the future design of specialized hardware tailored for fully 1-bit LLMs. We make all models, code, and training dataset fully accessible and transparent to support further research (Code: https://github.com/LiqunMa/FBI-LLM. Model: https://huggingface.co/LiqunMa/).
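The two ingredients named in the abstract, fully binarized weights and an autoregressive distillation loss, can be sketched concretely. The details below (a per-tensor scaling factor equal to the mean absolute weight, and a KL divergence between teacher and student next-token distributions) are common choices for binarized networks and are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def binarize(W):
    # 1-bit weights: keep only sign(W), with one scalar alpha preserving magnitude
    alpha = float(np.abs(W).mean())
    return alpha * np.sign(W), alpha

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ad_loss(student_logits, teacher_logits):
    # autoregressive distillation: KL(teacher || student), averaged over positions
    p, q = softmax(teacher_logits), softmax(student_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
Wb, alpha = binarize(W)  # every entry of Wb is +alpha or -alpha
```

During training, the full-precision teacher's token distributions supply the target for `ad_loss`, so no pretrained weights are needed to initialize the binarized student.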
Updated: 2024-07-09 17:59:48
Domain: cs.CL,cs.AI,cs.LG
V-VIPE: Variational View Invariant Pose Embedding
Learning to represent three dimensional (3D) human pose given a two dimensional (2D) image of a person, is a challenging problem. In order to make the problem less ambiguous it has become common practice to estimate 3D pose in the camera coordinate space. However, this makes the task of comparing two 3D poses difficult. In this paper, we address this challenge by separating the problem of estimating 3D pose from 2D images into two steps. We use a variational autoencoder (VAE) to find an embedding that represents 3D poses in canonical coordinate space. We refer to this embedding as variational view-invariant pose embedding V-VIPE. Using V-VIPE we can encode 2D and 3D poses and use the embedding for downstream tasks, like retrieval and classification. We can estimate 3D poses from these embeddings using the decoder as well as generate unseen 3D poses. The variability of our encoding allows it to generalize well to unseen camera views when mapping from 2D space. To the best of our knowledge, V-VIPE is the only representation to offer this diversity of applications. Code and more information can be found at https://v-vipe.github.io/.
Updated: 2024-07-09 17:59:47
Domain: cs.CV,cs.AI
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic
Task arithmetic has recently emerged as a cost-effective and scalable approach to editing pre-trained models directly in weight space by adding the fine-tuned weights of different tasks. Its performance has been further improved by a linear property, which is illustrated by weight disentanglement. Yet conventional linearization methods (e.g., NTK linearization) not only double the time and training cost but also put single-task performance at a disadvantage. We propose a simple yet effective and efficient method that fine-tunes only the linear layers, improving weight disentanglement and efficiency simultaneously. Specifically, our study reveals that fine-tuning only the linear layers in the attention modules places the whole model in a linear regime, significantly improving weight disentanglement. To further understand how our method improves the disentanglement of task arithmetic, we present a comprehensive study of task arithmetic that differentiates the roles of the representation model and the task-specific models. In particular, we find that the representation model plays an important role in improving weight disentanglement, whereas task-specific models such as classification heads can degrade weight disentanglement performance. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to editing pre-trained models.
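The task-arithmetic operation itself is easy to state concretely. The sketch below forms a task vector (fine-tuned weights minus pre-trained weights) and applies it only to parameters selected by a name filter, here a hypothetical "attn" naming convention standing in for the attention modules' linear layers; the parameter names and the filter are illustrative assumptions.

```python
import numpy as np

def task_vector(finetuned, base):
    # the weight-space edit contributed by one fine-tuned task
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vectors(base, vectors, scale=1.0, select=lambda name: "attn" in name):
    # edit the model directly in weight space, touching only selected layers
    edited = {k: v.copy() for k, v in base.items()}
    for vec in vectors:
        for name in edited:
            if select(name):
                edited[name] = edited[name] + scale * vec[name]
    return edited

base = {"attn.q_proj": np.zeros((2, 2)), "mlp.fc1": np.zeros((2, 2))}
ft   = {"attn.q_proj": np.ones((2, 2)),  "mlp.fc1": np.ones((2, 2))}
edited = apply_task_vectors(base, [task_vector(ft, base)])
```

Restricting the edit (and the fine-tuning) to the attention linear layers is what the paper argues keeps the model in a linear regime, improving weight disentanglement.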
Updated: 2024-07-09 17:59:17
Domain: cs.LG
Safe and Reliable Training of Learning-Based Aerospace Controllers
In recent years, deep reinforcement learning (DRL) approaches have generated highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of DRL controllers, which can help ensure their safe behavior. We showcase a design-for-verification approach utilizing k-induction and demonstrate its use in verifying liveness properties. In addition, we also give a brief overview of neural Lyapunov Barrier certificates and summarize their capabilities on a case study. Finally, we describe several other novel reachability-based approaches which, despite failing to provide guarantees of interest, could be effective for verification of other DRL systems, and could be of further interest to the community.
Updated: 2024-07-09 17:58:50
Domain: cs.AI,cs.LO,cs.SY,eess.SY
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations. Using copyrighted fiction books as text sources, we provide automatic evaluation protocols to assess literal and non-literal copying, balanced against the model utility in terms of the ability to recall facts from the copyrighted works and generate fluent completions. We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters. Larger models demonstrate significantly more copying, with literal copying rates increasing from 0.2% to 10.5% and non-literal copying from 2.3% to 6.9% when comparing Llama3-8B and 70B models, respectively. We further evaluate the effectiveness of current strategies for mitigating copying and show that (1) training-time alignment can reduce literal copying but may increase non-literal copying, and (2) current inference-time mitigation methods primarily reduce literal but not non-literal copying.
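A crude proxy for literal copying, not CopyBench's actual evaluation protocol, is the longest verbatim character span shared between a generation and the copyrighted source; the function names and the ratio metric below are illustrative.

```python
def longest_common_substring(a: str, b: str) -> str:
    # classic dynamic programming over character positions, rolling one row
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

def literal_overlap_ratio(generation: str, source: str) -> float:
    # fraction of the generation covered by its longest verbatim span
    if not generation:
        return 0.0
    return len(longest_common_substring(generation, source)) / len(generation)
```

Non-literal copying (reproduced events or characters under different surface wording) is exactly what such string matching misses, which is the gap CopyBench's benchmark is designed to measure.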
Updated: 2024-07-09 17:58:18
Domain: cs.CL,cs.LG
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
Multi-agent reinforcement learning (MARL) methods struggle with the non-stationarity of multi-agent systems and fail to adaptively learn online when tested with novel agents. Here, we leverage large language models (LLMs) to create an autonomous agent that can handle these challenges. Our agent, Hypothetical Minds, consists of a cognitively-inspired architecture, featuring modular components for perception, memory, and hierarchical planning over two levels of abstraction. We introduce the Theory of Mind module that scaffolds the high-level planning process by generating hypotheses about other agents' strategies in natural language. It then evaluates and iteratively refines these hypotheses by reinforcing hypotheses that make correct predictions about the other agents' behavior. Hypothetical Minds significantly improves performance over previous LLM-agent and RL baselines on a range of competitive, mixed motive, and collaborative domains in the Melting Pot benchmark, including both dyadic and population-based environments. Additionally, comparisons against LLM-agent baselines and ablations reveal the importance of hypothesis evaluation and refinement for succeeding on complex scenarios.
Updated: 2024-07-09 17:57:15
Domain: cs.AI
Cardinality-Aware Set Prediction and Top-$k$ Classification
We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the set predicted. To optimize this loss function, we propose two families of surrogate losses: cost-sensitive comp-sum losses and cost-sensitive constrained losses. Minimizing these loss functions leads to new cardinality-aware algorithms that we describe in detail in the case of both top-$k$ and threshold-based classifiers. We establish $H$-consistency bounds for our cardinality-aware surrogate loss functions, thereby providing a strong theoretical foundation for our algorithms. We report the results of extensive experiments on CIFAR-10, CIFAR-100, ImageNet, and SVHN datasets demonstrating the effectiveness and benefits of our cardinality-aware algorithms.
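A minimal version of the setting can be written down directly: a target loss that charges both for missing the true label and for the cardinality of the predicted set, and a simple threshold-based set predictor. The cost constant and the probability-mass threshold below are illustrative stand-ins, not the paper's surrogate losses.

```python
import numpy as np

def cardinality_aware_loss(pred_set, y, cost=0.05):
    # classification error of the set, plus a price per predicted label
    miss = 0.0 if y in pred_set else 1.0
    return miss + cost * len(pred_set)

def threshold_set(probs, tau=0.9):
    # grow the set along sorted probabilities until mass tau is covered
    order = np.argsort(probs)[::-1]
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(int(i))
        cum += float(probs[i])
        if cum >= tau:
            break
    return chosen

probs = np.array([0.6, 0.3, 0.1])
s = threshold_set(probs, tau=0.8)
```

The paper's cost-sensitive surrogate losses make this trade-off between accuracy and set size differentiable so it can be learned, rather than fixed by a hand-picked threshold.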
Updated: 2024-07-09 17:57:07
Domain: cs.LG
Stabilized Proximal-Point Methods for Federated Optimization
In developing efficient optimization algorithms, it is crucial to account for communication constraints -- a significant challenge in modern federated learning settings. The best-known communication complexity among non-accelerated algorithms is achieved by DANE, a distributed proximal-point algorithm that solves local subproblems in each iteration and that can exploit second-order similarity among individual functions. However, to achieve such communication efficiency, the accuracy requirement for solving the local subproblems is slightly sub-optimal. Inspired by the hybrid projection-proximal point method, in this work, we i) propose a novel distributed algorithm S-DANE. This method adopts a more stabilized prox-center in the proximal step compared with DANE, and matches its deterministic communication complexity. Moreover, the accuracy condition of the subproblem is milder, leading to enhanced local computation efficiency. Furthermore, it supports partial client participation and arbitrary stochastic local solvers, making it more attractive in practice. We further ii) accelerate S-DANE, and show that the resulting algorithm achieves the best-known communication complexity among all existing methods for distributed convex optimization, with the same improved local computation efficiency as S-DANE.
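For intuition, the local subproblem of a DANE-style proximal-point step has a closed form when the local loss is least squares. The sketch below omits the gradient-correction term of the actual algorithms and simply shows the role of the proximal parameter `mu` around a prox-center `c`, which is the quantity S-DANE stabilizes.

```python
import numpy as np

def prox_step(A, b, center, mu):
    # argmin_x 0.5*||A x - b||^2 + (mu/2)*||x - center||^2
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + mu * np.eye(d), A.T @ b + mu * center)

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
b = rng.normal(size=10)
c = np.zeros(3)

x_strong = prox_step(A, b, c, mu=1e6)   # huge mu: solution stays at the prox-center
x_weak   = prox_step(A, b, c, mu=1e-8)  # tiny mu: plain local least squares
ls, *_ = np.linalg.lstsq(A, b, rcond=None)
```

In a federated run, each client would solve its own `prox_step` (possibly inexactly, with a stochastic solver) and only the resulting models are communicated, which is where the communication savings come from.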
Updated: 2024-07-09 17:56:29
Domain: cs.LG,math.OC
Can Learned Optimization Make Reinforcement Learning Less Difficult?
While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whether learned optimization can help overcome these problems. Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties. We show that our parameterization is flexible enough to enable meta-learning in diverse learning contexts, including the ability to use stochasticity for exploration. Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization across a distribution of environments and a range of agent architectures.
Updated: 2024-07-09 17:55:23
Domain: cs.LG,cs.AI
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
While personalized text-to-image generation has enabled the learning of a single concept from multiple images, a more practical yet challenging scenario involves learning multiple concepts within a single image. However, existing works tackling this scenario heavily rely on extensive human annotations. In this paper, we introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts. Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models. To achieve this, we present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects. Specifically, a concept localization approach automatically locates and disentangles salient concepts by leveraging spatial correspondence from diffusion self-attention; and based on the lookup association between a concept and a conceptual token, a concept-wise optimization process learns discriminative tokens that represent each individual concept. Finally, we establish an evaluation protocol tailored for the UCE task. Extensive experiments demonstrate that ConceptExpress is a promising solution to the UCE task. Our code and data are available at: https://github.com/haoosz/ConceptExpress
Updated: 2024-07-09 17:50:28
Domain: cs.CV,cs.AI
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such contextual hallucinations. We hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. Based on this intuition, we propose a simple hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. The lookback ratio-based detector -- Lookback Lens -- is found to transfer across tasks and even models, allowing a detector that is trained on a 7B model to be applied (without retraining) to a larger 13B model. We further apply this detector to mitigate contextual hallucinations, and find that a simple classifier-guided decoding approach is able to reduce the amount of hallucination, for example by 9.6% in the XSum summarization task.
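The lookback ratio itself is simple to compute from an attention map. The array layout below (heads first, then newly generated query positions, then key positions, with the first `n_ctx` keys belonging to the provided context) is an assumption about how the weights are extracted, not a fixed API.

```python
import numpy as np

def lookback_ratio(attn, n_ctx):
    """attn: (heads, n_new, key_len) attention weights for newly generated tokens,
    where the first n_ctx key positions belong to the provided context."""
    on_context = attn[:, :, :n_ctx].sum(axis=-1)   # mass spent on the context
    on_new     = attn[:, :, n_ctx:].sum(axis=-1)   # mass spent on own generations
    return (on_context / (on_context + on_new)).mean(axis=-1)  # one ratio per head

attn = np.array([[[0.5, 0.3, 0.2],
                  [0.1, 0.1, 0.8]]])   # 1 head, 2 new tokens, 3 key positions
ratios = lookback_ratio(attn, n_ctx=2)
```

A linear classifier over these per-head, per-layer features is the whole Lookback Lens detector, which is why it transfers so cheaply across tasks and model sizes.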
Updated: 2024-07-09 17:44:34
Categories: cs.CL,cs.AI,cs.LG
Stealing Part of a Production Language Model
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.
Updated: 2024-07-09 17:44:00
Categories: cs.CR
Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)
This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. The workshop was held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.
Updated: 2024-07-09 17:43:06
Categories: cs.AI,cs.HC,cs.MM,cs.SD,eess.AS
Explainable Hyperdimensional Computing for Balancing Privacy and Transparency in Additive Manufacturing Monitoring
In-situ sensing, in conjunction with learning models, presents a unique opportunity to address persistent defect issues in Additive Manufacturing (AM) processes. However, this integration introduces significant data privacy concerns, such as data leakage, sensor data compromise, and model inversion attacks, revealing critical details about part design, material composition, and machine parameters. Differential Privacy (DP) models, which inject noise into data under mathematical guarantees, offer a nuanced balance between data utility and privacy by obscuring traces of sensing data. However, the introduction of noise into learning models, often functioning as black boxes, complicates the prediction of how specific noise levels impact model accuracy. This study introduces the Differential Privacy-HyperDimensional computing (DP-HD) framework, leveraging the explainability of the vector symbolic paradigm to predict the noise impact on the accuracy of in-situ monitoring, safeguarding sensitive data while maintaining operational efficiency. Experimental results on real-world high-speed melt pool data of AM for detecting overhang anomalies demonstrate that DP-HD achieves superior operational efficiency, prediction accuracy, and robust privacy protection, outperforming state-of-the-art Machine Learning (ML) models. For example, when implementing the same level of privacy protection (with a privacy budget set at 1), our model achieved an accuracy of 94.43%, surpassing the performance of traditional models such as ResNet50 (52.30%), GoogLeNet (23.85%), AlexNet (55.78%), DenseNet201 (69.13%), and EfficientNet B2 (40.81%). Notably, DP-HD maintains high performance under substantial noise additions designed to enhance privacy, unlike current models that suffer significant accuracy declines under high privacy constraints.
Updated: 2024-07-09 17:42:26
Categories: cs.LG,cs.CR,cs.CV
Non-Asymptotic Performance of Social Machine Learning Under Limited Data
This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the problem of classifying a stream of unlabeled data in a distributed manner. In this work, we examine the classification task with limited observations during the decision-making phase, which requires a non-asymptotic performance analysis. We establish a condition for consistent training and derive an upper bound on the probability of error for classification. The results clarify the dependence on the statistical properties of the data and the combination policy used over the graph. They also establish the exponential decay of the probability of error with respect to the number of unlabeled samples.
Updated: 2024-07-09 17:39:58
Categories: cs.LG,cs.MA,eess.SP
VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity
Binary similarity involves determining whether two binary programs exhibit similar functionality, often originating from the same source code. In this work, we propose VexIR2Vec, an approach for binary similarity using VEX-IR, an architecture-neutral Intermediate Representation (IR). We extract the embeddings from sequences of basic blocks, termed peepholes, derived by random walks on the control-flow graph. The peepholes are normalized using transformations inspired by compiler optimizations. The VEX-IR Normalization Engine mitigates, with these transformations, the architectural and compiler-induced variations in binaries while exposing semantic similarities. We then learn the vocabulary of representations at the entity level of the IR using the knowledge graph embedding techniques in an unsupervised manner. This vocabulary is used to derive function embeddings for similarity assessment using VexNet, a feed-forward Siamese network designed to position similar functions closely and separate dissimilar ones in an n-dimensional space. This approach is amenable for both diffing and searching tasks, ensuring robustness against Out-Of-Vocabulary (OOV) issues. We evaluate VexIR2Vec on a dataset comprising 2.7M functions and 15.5K binaries from 7 projects compiled across 12 compilers targeting x86 and ARM architectures. In diffing experiments, VexIR2Vec outperforms the nearest baselines by 40%, 18%, 21%, and 60% in cross-optimization, cross-compilation, cross-architecture, and obfuscation settings, respectively. In the searching experiment, VexIR2Vec achieves a mean average precision of 0.76, outperforming the nearest baseline by 46%. Our framework is highly scalable and is built as a lightweight, multi-threaded, parallel library using only open-source tools. VexIR2Vec is 3.1-3.5x faster than the closest baselines and orders-of-magnitude faster than other tools.
Updated: 2024-07-09 17:38:42
Categories: cs.PL,cs.CR,cs.LG
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. Alongside, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigations. OBJECTIVE: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. METHOD: First we perform a systematic literature review to identify the existing prompting techniques that can be used for code generation tasks. A subset of these techniques are evaluated on GPT-3, GPT-3.5, and GPT-4 models for secure code generation. For this, we used an existing dataset consisting of 150 NL security-relevant code-generation prompts. RESULTS: Our work (i) classifies potential prompting techniques for code generation (ii) adapts and evaluates a subset of the identified techniques for secure code generation tasks and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on LLM-generated code security.
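Of the techniques evaluated, Recursive Criticism and Improvement (RCI) is easy to sketch: generate, ask the model to critique its own output, then ask it to revise. In the sketch below, `generate` is a hypothetical stand-in for an LLM API call, and the critique/improve prompt wording is illustrative, not the paper's:

```python
def rci(task_prompt, generate, rounds=2):
    """Recursive Criticism and Improvement (RCI), sketched.

    `generate` is a hypothetical callable standing in for an LLM API:
    it takes a prompt string and returns a completion string.
    """
    answer = generate(task_prompt)
    for _ in range(rounds):
        critique = generate(
            "Review the following code for security weaknesses:\n" + answer)
        answer = generate(
            "Improve the code based on this critique:\n" + critique
            + "\nCode:\n" + answer)
    return answer
```

Each round costs two extra model calls (one critique, one revision) on top of the initial generation.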
Updated: 2024-07-09 17:38:03
Categories: cs.SE,cs.AI,cs.CR,cs.LG
Differentiable Optimization of Similarity Scores Between Models and Brains
What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures we analyze neural activity recorded in five experiments on nonhuman primates, and optimize synthetic datasets to become more similar to these neural recordings. How similar can these synthetic datasets be to neural activity while failing to encode task relevant variables? We find that some measures like linear regression and CKA, differ from angular Procrustes, and yield high similarity scores even when task relevant variables cannot be linearly decoded from the synthetic datasets. Synthetic datasets optimized to maximize similarity scores initially learn the first principal component of the target dataset, but angular Procrustes captures higher variance dimensions much earlier than methods like linear regression and CKA. We show in both theory and simulations how these scores change when different principal components are perturbed. And finally, we jointly optimize multiple similarity scores to find their allowed ranges, and show that a high angular Procrustes similarity, for example, implies a high CKA score, but not the converse.
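Linear CKA, one of the similarity measures compared here, can be written in a few lines (a standard formulation over samples-by-features response matrices; not the paper's code):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two response matrices
    (rows = stimuli/samples, columns = units/features)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, one reason it can behave quite differently from angular Procrustes distance on the same pair of datasets.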
Updated: 2024-07-09 17:31:47
Categories: q-bio.NC,cs.LG
Gaussian Interpolation Flows
Gaussian denoising has emerged as a powerful method for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free continuous normalizing flows built on Gaussian denoising. Through a unified framework termed Gaussian interpolation flow, we establish the Lipschitz regularity of the flow velocity field, the existence and uniqueness of the flow, and the Lipschitz continuity of the flow map and the time-reversed flow map for several rich classes of target distributions. This analysis also sheds light on the auto-encoding and cycle consistency properties of Gaussian interpolation flows. Additionally, we study the stability of these flows in source distributions and perturbations of the velocity field, using the quadratic Wasserstein distance as a metric. Our findings offer valuable insights into the learning techniques employed in Gaussian interpolation flows for generative modeling, providing a solid theoretical foundation for end-to-end error analyses of learning Gaussian interpolation flows with empirical observations.
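For readers unfamiliar with the construction: a Gaussian interpolation (written here in one common notation; the paper's exact parametrization may differ) connects a standard Gaussian to the target $X_1 \sim \mu$ via

```latex
X_t = \alpha_t X_1 + \sigma_t Z, \qquad Z \sim \mathcal{N}(0, I_d),\quad t \in [0, 1],
```

with $\alpha_0 = 0$, $\alpha_1 = 1$, $\sigma_0 = 1$, $\sigma_1 = 0$, and the flow velocity field given by the conditional expectation $v(t, x) = \mathbb{E}[\dot{\alpha}_t X_1 + \dot{\sigma}_t Z \mid X_t = x]$; the Lipschitz regularity of this $v$ is what the well-posedness analysis establishes.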
Updated: 2024-07-09 17:30:34
Categories: stat.ML,cs.LG,62G05, 68T07
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
Human motion sensing plays a crucial role in smart systems for decision-making, user interaction, and personalized services. Most existing research is based on cameras, whose intrusive nature limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning approach that estimates scene flow as complementary motion information for mmWave point clouds, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method compared with competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition and human parsing, and support human body part tracking.
Updated: 2024-07-09 17:29:35
Categories: cs.CV,cs.AI,cs.LG
Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process
Diffusion models have demonstrated their effectiveness across various generative tasks. However, when applied to medical image segmentation, these models encounter several challenges, including significant resource and time requirements. They also necessitate a multi-step reverse process and multiple samples to produce reliable predictions. To address these challenges, we introduce the first latent diffusion segmentation model, named SDSeg, built upon stable diffusion (SD). SDSeg incorporates a straightforward latent estimation strategy to facilitate a single-step reverse process and utilizes latent fusion concatenation to remove the necessity for multiple samples. Extensive experiments indicate that SDSeg surpasses existing state-of-the-art methods on five benchmark datasets featuring diverse imaging modalities. Remarkably, SDSeg is capable of generating stable predictions with a solitary reverse step and sample, epitomizing the model's stability as implied by its name. The code is available at https://github.com/lin-tianyu/Stable-Diffusion-Seg
Updated: 2024-07-09 17:25:27
Categories: cs.CV,cs.AI,eess.IV
Multicell-Fold: geometric learning in folding multicellular life
During developmental processes such as embryogenesis, how a group of cells folds into specific structures is a central question in biology that defines how living organisms form. Establishing tissue-level morphology critically relies on how every single cell decides to position itself relative to its neighboring cells. Despite its importance, it remains a major challenge to understand and predict the behavior of every cell within the living tissue over time during such intricate processes. To tackle this question, we propose a geometric deep learning model that can predict multicellular folding and embryogenesis, accurately capturing the highly convoluted spatial interactions among cells. We demonstrate that multicellular data can be represented with both granular and foam-like physical pictures through a unified graph data structure, considering both cellular interactions and cell junction networks. We successfully use our model to achieve two important tasks: interpretable 4-D morphological sequence alignment, and predicting local cell rearrangements before they occur at single-cell resolution. Furthermore, using an activation map and ablation studies, we demonstrate that cell geometries and cell junction networks together regulate local cell rearrangement, which is critical for embryo morphogenesis. This approach provides a novel paradigm to study morphogenesis, highlighting a unified data structure and harnessing the power of geometric deep learning to accurately model the mechanisms and behaviors of cells during development. It offers a pathway toward creating a unified dynamic morphological atlas for a variety of developmental processes such as embryogenesis.
Updated: 2024-07-09 17:21:49
Categories: cond-mat.soft,cs.LG,physics.bio-ph
A Differentially Private Blockchain-Based Approach for Vertical Federated Learning
We present the Differentially Private Blockchain-Based Vertical Federated Learning (DP-BBVFL) algorithm that provides verifiability and privacy guarantees for decentralized applications. DP-BBVFL uses a smart contract to aggregate the feature representations, i.e., the embeddings, from clients transparently. We apply local differential privacy to provide privacy for embeddings stored on a blockchain, hence protecting the original data. We provide the first prototype application of differential privacy with blockchain for vertical federated learning. Our experiments with medical data show that DP-BBVFL achieves high accuracy with a tradeoff in training time due to on-chain aggregation. This innovative fusion of differential privacy and blockchain technology in DP-BBVFL could herald a new era of collaborative and trustworthy machine learning applications across several decentralized application domains.
Updated: 2024-07-09 17:20:49
Categories: cs.CR,cs.ET,cs.LG
CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis
Multimodal sentiment analysis is an active research area that combines multiple data modalities, e.g., text, image and audio, to analyze human emotions and benefits a variety of applications. Existing multimodal sentiment analysis methods can be classified as modality interaction-based methods, modality transformation-based methods and modality similarity-based methods. However, most of these methods highly rely on the strong correlations between modalities, and cannot fully uncover and utilize the correlations between modalities to enhance sentiment analysis. Therefore, these methods usually achieve bad performance for identifying the sentiment of multimodal data with weak correlations. To address this issue, we proposed a two-stage semi-supervised model termed Correlation-aware Multimodal Transformer (CorMulT) which consists pre-training stage and prediction stage. At the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities. At the prediction stage, the learned correlation coefficients are fused with modality representations to make the sentiment prediction. According to the experiments on the popular multimodal dataset CMU-MOSEI, CorMulT obviously surpasses state-of-the-art multimodal sentiment analysis methods.
Updated: 2024-07-09 17:07:29
Categories: cs.AI,cs.CV
Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs
Tackling the problem of learning probabilistic classifiers from incomplete data in the context of Knowledge Graphs expressed in Description Logics, we describe an inductive approach based on learning simple belief networks. Specifically, we consider a basic probabilistic model, a Naive Bayes classifier, based on multivariate Bernoullis and its extension to a two-tier network in which this classification model is connected to a lower layer consisting of a mixture of Bernoullis. We show how such models can be converted into (probabilistic) axioms (or rules) thus ensuring more interpretability. Moreover they may be also initialized exploiting expert knowledge. We present and discuss the outcomes of an empirical evaluation which aimed at testing the effectiveness of the models on a number of random classification problems with different ontologies.
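The base model here is a textbook construction. A minimal multivariate-Bernoulli Naive Bayes with Laplace smoothing (binary features standing in for concept-membership indicators; the two-tier mixture extension and the axiom conversion are omitted):

```python
import numpy as np

class BernoulliNaiveBayes:
    """Naive Bayes over multivariate Bernoulli features, e.g. binary
    indicators of class-membership assertions in a knowledge graph."""

    def fit(self, X, y, alpha=1.0):
        self.classes = np.unique(y)
        self.log_prior = np.log(
            np.array([(y == c).mean() for c in self.classes]))
        # Laplace-smoothed P(feature = 1 | class), one row per class.
        self.theta = np.array([
            (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
            for c in self.classes])
        return self

    def predict(self, X):
        # Bernoulli log-likelihood: sum of x*log(theta) + (1-x)*log(1-theta).
        ll = X @ np.log(self.theta).T + (1 - X) @ np.log(1 - self.theta).T
        return self.classes[np.argmax(ll + self.log_prior, axis=1)]
```

The per-class `theta` row can be read off directly as a set of probabilistic rules, which is the interpretability angle the abstract emphasizes.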
Updated: 2024-07-09 17:05:52
Categories: cs.AI
ProtoSAM - One Shot Medical Image Segmentation With Foundational Models
This work introduces a new framework, ProtoSAM, for one-shot medical image segmentation. It combines the use of prototypical networks, known for few-shot segmentation, with SAM - a natural image foundation model. The method proposed creates an initial coarse segmentation mask using the ALPnet prototypical network, augmented with a DINOv2 encoder. Following the extraction of an initial mask, prompts are extracted, such as points and bounding boxes, which are then input into the Segment Anything Model (SAM). State-of-the-art results are shown on several medical image datasets and demonstrate automated segmentation capabilities using a single image example (one shot) with no need for fine-tuning of the foundation model.
Updated: 2024-07-09 17:04:08
Categories: cs.CV,cs.AI
Hiding Local Manipulations on SAR Images: a Counter-Forensic Attack
The vast accessibility of Synthetic Aperture Radar (SAR) images through online portals has propelled the research across various fields. This widespread use and easy availability have unfortunately made SAR data susceptible to malicious alterations, such as local editing applied to the images for inserting or covering the presence of sensitive targets. Vulnerability is further emphasized by the fact that most SAR products, despite their original complex nature, are often released as amplitude-only information, allowing even inexperienced attackers to edit and easily alter the pixel content. To counter malicious manipulations, in recent years the forensic community has begun to dig into the SAR manipulation issue, proposing detectors that effectively localize the tampering traces in amplitude images. Nonetheless, in this paper we demonstrate that an expert practitioner can exploit the complex nature of SAR data to obscure any signs of manipulation within a locally altered amplitude image. We refer to this approach as a counter-forensic attack. To achieve the concealment of manipulation traces, the attacker can simulate a re-acquisition of the manipulated scene by the SAR system that initially generated the pristine image. In doing so, the attacker can obscure any evidence of manipulation, making it appear as if the image was legitimately produced by the system. We assess the effectiveness of the proposed counter-forensic approach across diverse scenarios, examining various manipulation operations. The obtained results indicate that our devised attack successfully eliminates traces of manipulation, deceiving even the most advanced forensic detectors.
Updated: 2024-07-09 17:03:57
Categories: cs.CV,cs.AI,cs.MM
Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs
We consider (stochastic) softmax policy gradient (PG) methods for bandits and tabular Markov decision processes (MDPs). While the PG objective is non-concave, recent research has used the objective's smoothness and gradient domination properties to achieve convergence to an optimal policy. However, these theoretical results require setting the algorithm parameters according to unknown problem-dependent quantities (e.g. the optimal action or the true reward vector in a bandit problem). To address this issue, we borrow ideas from the optimization literature to design practical, principled PG methods in both the exact and stochastic settings. In the exact setting, we employ an Armijo line-search to set the step-size for softmax PG and demonstrate a linear convergence rate. In the stochastic setting, we utilize exponentially decreasing step-sizes, and characterize the convergence rate of the resulting algorithm. We show that the proposed algorithm offers similar theoretical guarantees as the state-of-the-art results, but does not require the knowledge of oracle-like quantities. For the multi-armed bandit setting, our techniques result in a theoretically-principled PG algorithm that does not require explicit exploration, the knowledge of the reward gap, the reward distributions, or the noise. Finally, we empirically compare the proposed methods to PG approaches that require oracle knowledge, and demonstrate competitive performance.
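In the exact bandit setting, the recipe is compact. A sketch with deterministic rewards and the exact gradient (hyperparameters are illustrative; the paper's stochastic variant with decreasing step-sizes differs):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def armijo_softmax_pg(r, steps=200, eta_max=10.0, c=0.5, beta=0.8):
    """Exact softmax policy gradient on a bandit with reward vector r,
    with the step-size set by Armijo backtracking line-search."""
    theta = np.zeros_like(r, dtype=float)
    for _ in range(steps):
        pi = softmax(theta)
        value = pi @ r
        grad = pi * (r - value)          # gradient of softmax(theta) @ r
        eta = eta_max
        # Backtrack until the Armijo sufficient-increase condition holds.
        while softmax(theta + eta * grad) @ r < value + c * eta * grad @ grad:
            eta *= beta
        theta = theta + eta * grad
    return softmax(theta)

pi = armijo_softmax_pg(np.array([1.0, 0.5, 0.2]))
```

The backtracking loop is what removes the need to know problem-dependent quantities: the accepted step-size adapts to the local smoothness instead of being set from the reward gap.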
Updated: 2024-07-09 16:59:42
Categories: cs.LG
Trajectory Data Mining and Trip Travel Time Prediction on Specific Roads
Predicting a trip's travel time is essential for route planning and navigation applications. The majority of research is based on international data that does not apply to Pakistan's road conditions. We designed a complete pipeline for mining trajectories from sensors data. On this data, we employed state-of-the-art approaches, including a shallow artificial neural network, a deep multi-layered perceptron, and a long-short-term memory, to explore the issue of travel time prediction on frequent routes. The experimental results demonstrate an average prediction error ranging from 30 seconds to 1.2 minutes on trips lasting 10 minutes to 60 minutes on six most frequent routes in regions of Islamabad, Pakistan.
Updated: 2024-07-09 16:50:15
Categories: cs.AI
Towards Energy-Aware Federated Learning via MARL: A Dual-Selection Approach for Model and Client
Although Federated Learning (FL) is promising in knowledge sharing for heterogeneous Artificial Intelligence of Things (AIoT) devices, their training performance and energy efficiency are severely restricted in practical battery-driven scenarios due to the "wooden barrel effect" caused by the mismatch between homogeneous model paradigms and heterogeneous device capabilities. As a result, due to various kinds of differences among devices, it is hard for existing FL methods to conduct training effectively in energy-constrained scenarios, such as under device battery constraints. To tackle the above issues, we propose an energy-aware FL framework named DR-FL, which considers the energy constraints in both clients and heterogeneous deep learning models to enable energy-efficient FL. Unlike vanilla FL, DR-FL adopts our proposed Multi-Agent Reinforcement Learning (MARL)-based dual-selection method, which allows participating devices to make contributions to the global model effectively and adaptively based on their computing capabilities and energy capacities. Experiments conducted with various widely recognized datasets demonstrate that DR-FL has the capability to optimize the exchange of knowledge among diverse models in large-scale AIoT systems while adhering to energy limitations. Additionally, it improves the performance of each individual heterogeneous device's model.
Updated: 2024-07-09 16:46:19
Fields: cs.LG,cs.AI
Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization
The vocabulary size in temporal action localization (TAL) is constrained by the scarcity of large-scale annotated datasets. To address this, recent works incorporate powerful pre-trained vision-language models (VLMs), such as CLIP, to perform open-vocabulary TAL (OV-TAL). However, unlike VLMs trained on extensive image/video-text pairs, existing OV-TAL methods still rely on small, fully labeled TAL datasets for training an action localizer. In this paper, we explore the scalability of self-training with unlabeled YouTube videos for OV-TAL. Our self-training approach consists of two stages. First, a class-agnostic action localizer is trained on a human-labeled TAL dataset and used to generate pseudo-labels for unlabeled videos. Second, the large-scale pseudo-labeled dataset is combined with the human-labeled dataset to train the localizer. Extensive experiments demonstrate that leveraging web-scale videos in self-training significantly enhances the generalizability of an action localizer. Additionally, we highlight issues with existing OV-TAL evaluation schemes and propose a new evaluation protocol. Code is released at https://github.com/HYUNJS/STOV-TAL.
Updated: 2024-07-09 16:44:04
Fields: cs.CV,cs.AI
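The two-stage self-training loop can be sketched with a deliberately simple stand-in for the class-agnostic localizer, a nearest-centroid classifier on 1-D features, to show the pseudo-labeling mechanics: threshold by confidence, then retrain on the union of human labels and confident pseudo-labels. The data and the margin-based confidence measure are invented for illustration.

```python
def train_centroids(labeled):
    """Stage 1 stand-in: per-class mean of 1-D features."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def pseudo_label(centroids, unlabeled, margin=1.0):
    """Assign each point to the nearest centroid; keep it only when the
    distance gap between the best and second-best class exceeds the margin."""
    kept = []
    for x in unlabeled:
        dists = sorted((abs(x - c), y) for y, c in centroids.items())
        if dists[1][0] - dists[0][0] >= margin:
            kept.append((x, dists[0][1]))
    return kept

labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.2, 9.8, 5.0]  # 5.0 is ambiguous and should be rejected
stage1 = train_centroids(labeled)
pseudo = pseudo_label(stage1, unlabeled)
# Stage 2: retrain on human labels plus confident pseudo-labels.
stage2 = train_centroids(labeled + pseudo)
```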
A Deep RL Approach on Task Placement and Scaling of Edge Resources for Cellular Vehicle-to-Network Service Provisioning
Cellular-Vehicle-to-Everything (C-V2X) is currently at the forefront of the digital transformation of our society. By enabling vehicles to communicate with each other and with the traffic environment using cellular networks, we redefine transportation, improving road safety and transportation services, increasing the efficiency of vehicular traffic flows, and reducing environmental impact. To effectively facilitate the provisioning of Cellular Vehicle-to-Network (C-V2N) services, we tackle the interdependent problems of service task placement and scaling of edge resources. Specifically, we formulate the joint problem and prove that it is not computationally tractable. To address its complexity, we introduce Deep Hybrid Policy Gradient (DHPG), a Deep Reinforcement Learning (DRL) approach for hybrid action spaces. The performance of DHPG is evaluated against several state-of-the-art (SoA) solutions through simulations employing a real-world C-V2N traffic dataset. The results demonstrate that DHPG outperforms SoA solutions in maintaining C-V2N service latency below the preset delay threshold, while simultaneously optimizing the utilization of computing resources. Finally, a time complexity analysis is conducted to verify that the proposed approach can support real-time C-V2N services.
Updated: 2024-07-09 16:42:37
Fields: cs.AI,cs.MA,cs.NI
Less is More: Efficient Brain-Inspired Learning for Autonomous Driving Trajectory Prediction
Accurately and safely predicting the trajectories of surrounding vehicles is essential for fully realizing autonomous driving (AD). This paper presents the Human-Like Trajectory Prediction model (HLTP++), which emulates human cognitive processes to improve trajectory prediction in AD. HLTP++ incorporates a novel teacher-student knowledge distillation framework. The "teacher" model, equipped with an adaptive visual sector, mimics the dynamic allocation of attention that human drivers exhibit based on factors like spatial orientation, proximity, and driving speed. The "student" model, in turn, focuses on real-time interaction and human decision-making, drawing parallels to the human memory storage mechanism. Furthermore, we improve the model's efficiency by introducing a new Fourier Adaptive Spike Neural Network (FA-SNN), allowing for faster and more precise predictions with fewer parameters. Evaluated on the NGSIM, HighD, and MoCAD benchmarks, HLTP++ demonstrates superior performance compared to existing models, reducing the predicted trajectory error by over 11% on the NGSIM dataset and by 25% on the HighD dataset. Moreover, HLTP++ demonstrates strong adaptability in challenging environments with incomplete input data. This marks a significant stride in the journey towards fully autonomous driving systems.
Updated: 2024-07-09 16:42:17
Fields: cs.AI,cs.RO
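The teacher-student transfer at the core of HLTP++ rests on standard knowledge distillation. A minimal sketch of a temperature-scaled distillation term follows (here a KL divergence between softened teacher and student distributions; the paper's exact loss may differ):

```python
import math

def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Temperature-scaled KL(teacher || student); the T^2 factor keeps
    gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student matches the teacher exactly and grows as the softened distributions diverge, which is the signal the student minimizes during distillation.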
End-To-End Causal Effect Estimation from Unstructured Natural Language Data
Knowing the effect of an intervention is critical for human decision-making, but current approaches for causal effect estimation rely on manual data collection and structuring, regardless of the causal assumptions. This increases both the cost and time-to-completion for studies. We show how large, diverse observational text data can be mined with large language models (LLMs) to produce inexpensive causal effect estimates under appropriate causal assumptions. We introduce NATURAL, a novel family of causal effect estimators built with LLMs that operate over datasets of unstructured text. Our estimators use LLM conditional distributions (over variables of interest, given the text data) to assist in the computation of classical estimators of causal effect. We overcome a number of technical challenges to realize this idea, such as automating data curation and using LLMs to impute missing information. We prepare six (two synthetic and four real) observational datasets, paired with corresponding ground truth in the form of randomized trials, which we use to systematically evaluate each step of our pipeline. NATURAL estimators demonstrate remarkable performance, yielding causal effect estimates that fall within 3 percentage points of their ground truth counterparts, including on real-world Phase 3/4 clinical trials. Our results suggest that unstructured text data is a rich source of causal effect information, and NATURAL is a first step towards an automated pipeline to tap this resource.
Updated: 2024-07-09 16:38:48
Fields: cs.LG,cs.CL,stat.ME
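The idea of feeding LLM-derived conditional distributions into classical estimators can be illustrated with inverse propensity weighting, one such classical estimator. The propensities below are hard-coded stand-ins for conditional probabilities an LLM would estimate from unstructured text about each unit, and the toy cohort is constructed so the true effect of +2.0 is recovered exactly:

```python
def ipw_ate(rows):
    """Inverse-propensity-weighted average treatment effect.
    rows: (treated, outcome, propensity), with propensity e(x) = P(T=1 | x)."""
    n = len(rows)
    treated = sum(t * y / e for t, y, e in rows) / n
    control = sum((1 - t) * y / (1 - e) for t, y, e in rows) / n
    return treated - control

# Toy cohort with a constant true treatment effect of +2.0:
# Y(0) = x, Y(1) = x + 2, and treatment rates matching the propensities.
ROWS = [
    (1, 2.0, 0.5), (1, 2.0, 0.5), (0, 0.0, 0.5), (0, 0.0, 0.5),      # stratum x=0
    (1, 3.0, 0.25), (0, 1.0, 0.25), (0, 1.0, 0.25), (0, 1.0, 0.25),  # stratum x=1
]
```

With correct propensities the weighting undoes the confounded treatment assignment, which is why the estimate lands on the true effect here.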
A representation learning approach to probe for dynamical dark energy in matter power spectra
We present DE-VAE, a variational autoencoder (VAE) architecture to search for a compressed representation of dynamical dark energy (DE) models in observational studies of the cosmic large-scale structure. DE-VAE is trained on matter power spectra boosts generated at wavenumbers $k\in(0.01-2.5) \ h/\rm{Mpc}$ and at four redshift values $z\in(0.1,0.48,0.78,1.5)$ for the most typical dynamical DE parametrization with two extra parameters describing an evolving DE equation of state. The boosts are compressed to a lower-dimensional representation, which is concatenated with standard cold dark matter (CDM) parameters and then mapped back to reconstructed boosts; both the compression and the reconstruction components are parametrized as neural networks. Remarkably, we find that a single latent parameter is sufficient to predict 95% (99%) of DE power spectra generated over a broad range of cosmological parameters within $1\sigma$ ($2\sigma$) of a Gaussian error which includes cosmic variance, shot noise and systematic effects for a Stage IV-like survey. This single parameter shows a high mutual information with the two DE parameters, and these three variables can be linked together with an explicit equation through symbolic regression. Considering a model with two latent variables only marginally improves the accuracy of the predictions, and adding a third latent variable has no significant impact on the model's performance. We discuss how the DE-VAE architecture can be extended from a proof of concept to a general framework to be employed in the search for a common lower-dimensional parametrization of a wide range of beyond-$\Lambda$CDM models and for different cosmological datasets. Such a framework could then both inform the development of cosmological surveys by targeting optimal probes, and provide theoretical insight into the common phenomenological aspects of beyond-$\Lambda$CDM models.
Updated: 2024-07-09 16:30:26
Fields: astro-ph.CO,astro-ph.IM,cs.LG
Distilling System 2 into System 1
Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps to produce better final responses. Since Chain-of-Thought (Wei et al., 2022), many such System 2 techniques have been proposed such as Rephrase and Respond (Deng et al., 2023a), System 2 Attention (Weston and Sukhbaatar, 2023) and Branch-Solve-Merge (Saha et al., 2023). In this work we investigate self-supervised methods to ``compile'' (distill) higher quality outputs from System 2 techniques back into LLM generations without intermediate reasoning token sequences, as this reasoning has been distilled into System 1. We show that several such techniques can be successfully distilled, resulting in improved results compared to the original System 1 performance, and with less inference cost than System 2. We posit that such System 2 distillation will be an important feature of future continually learning AI systems, enabling them to focus System 2 capabilities on the reasoning tasks that they cannot yet do well.
Updated: 2024-07-09 16:29:11
Fields: cs.CL,cs.AI
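One self-supervised way to compile System 2 outputs into System 1 training data can be sketched as follows: sample several System 2 answers per prompt, strip the intermediate reasoning, and keep only prompts whose answers agree under a majority vote. The threshold and data are illustrative, and the paper's exact filtering criteria may differ:

```python
from collections import Counter

def distill_examples(samples_per_prompt, keep_frac=0.75):
    """samples_per_prompt maps a prompt to several sampled System 2 answers
    (intermediate reasoning already stripped). Keep a (prompt, answer) pair
    for System 1 fine-tuning only when the majority answer is consistent."""
    dataset = []
    for prompt, answers in samples_per_prompt.items():
        answer, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= keep_frac:
            dataset.append((prompt, answer))
    return dataset
```

Fine-tuning on the surviving (prompt, answer) pairs teaches the model to emit the answer directly, without the reasoning tokens, which is the cost saving the abstract describes.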
Bayesian-LoRA: LoRA-based Parameter-Efficient Fine-Tuning using Optimal Quantization Levels and Rank Values through Differentiable Bayesian Gates
It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed. One of the most popular approaches is low-rank adaptation (LoRA), whose key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, existing approaches either use the same rank value across all weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors affecting a model's energy consumption. In this work, we propose Bayesian-LoRA (B-LoRA), which approaches low-rank adaptation and quantization from a Bayesian perspective by employing a prior distribution on both quantization levels and rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.
Updated: 2024-07-09 16:29:08
Fields: cs.AI
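The rank-selection idea can be sketched by attaching a gate to each rank-one component of the LoRA update dW = B A: a zero gate prunes that component and lowers the effective rank. This is a simplified deterministic stand-in for the paper's differentiable Bayesian gates, and it omits quantization entirely:

```python
def lora_delta(B, A, gates, alpha=1.0):
    """dW = alpha * sum_i gates[i] * B[:, i] (outer product) A[i, :].
    B is d x r and A is r x k (lists of rows); gates has one entry per rank."""
    d, r, k = len(B), len(A), len(A[0])
    dW = [[0.0] * k for _ in range(d)]
    for i in range(r):
        if gates[i] == 0.0:
            continue  # this rank-one component is pruned
        for a in range(d):
            for b in range(k):
                dW[a][b] += alpha * gates[i] * B[a][i] * A[i][b]
    return dW
```

With all gates open the update equals the full product B A; closing a gate removes exactly one rank-one term, which is how a per-matrix rank is selected.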
Explainable AI for Enhancing Efficiency of DL-based Channel Estimation
The support of artificial intelligence (AI) based decision-making is a key element in future 6G networks, where the concept of native AI will be introduced. Moreover, AI is widely employed in critical applications such as autonomous driving and medical diagnosis. In such applications, using AI models as black boxes is risky and challenging. Hence, it is crucial to understand and trust the decisions taken by these models. Tackling this issue can be achieved by developing explainable AI (XAI) schemes that aim to explain the logic behind the black-box model behavior, and thus ensure its efficient and safe deployment. Recently, we proposed a novel perturbation-based XAI-CHEST framework that is oriented toward channel estimation in wireless communications. The core idea of the XAI-CHEST framework is to identify the relevant model inputs by inducing high noise on the irrelevant ones. This manuscript provides the detailed theoretical foundations of the XAI-CHEST framework. In particular, we derive the analytical expressions of the XAI-CHEST loss functions and the noise threshold fine-tuning optimization problem. Hence, the designed XAI-CHEST framework delivers a smart input feature selection methodology that can further improve the overall performance while optimizing the architecture of the employed model. Simulation results show that the XAI-CHEST framework provides valid interpretations, offering improved bit error rate performance while reducing the required computational complexity in comparison to classical DL-based channel estimation.
Updated: 2024-07-09 16:24:21
Fields: cs.AI,eess.SP
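The core XAI-CHEST mechanism, injecting high noise on candidate inputs and treating a small performance drop as evidence of irrelevance, can be sketched on a toy model where only the first feature matters (the model and data are invented for illustration):

```python
import random

def relevance_scores(model, inputs, targets, noise_std=5.0, trials=200, seed=0):
    """Error increase when one input feature is corrupted with strong noise.
    Features whose corruption barely changes the error are deemed irrelevant."""
    rng = random.Random(seed)
    n = len(inputs)
    base = sum((model(x) - y) ** 2 for x, y in zip(inputs, targets)) / n
    scores = []
    for j in range(len(inputs[0])):
        err = 0.0
        for _ in range(trials):
            for x, y in zip(inputs, targets):
                noisy = list(x)
                noisy[j] += rng.gauss(0.0, noise_std)
                err += (model(noisy) - y) ** 2
        scores.append(err / (trials * n) - base)
    return scores

# Toy model: only the first input matters (y = 3 * x0).
model = lambda x: 3.0 * x[0]
inputs = [(1.0, 4.0), (2.0, 1.0), (3.0, 7.0)]
targets = [3.0, 6.0, 9.0]
scores = relevance_scores(model, inputs, targets)
```

Noising the relevant feature inflates the error sharply while noising the irrelevant one leaves it untouched, which is the separation the framework exploits for input selection.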
Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak
Active feedback control in magnetic confinement fusion devices is desirable to mitigate plasma instabilities and enable robust operation. Optical high-speed cameras provide a powerful, non-invasive diagnostic and can be suitable for these applications. In this study, we process fast camera data, at rates exceeding 100kfps, on $\textit{in situ}$ Field Programmable Gate Array (FPGA) hardware to track magnetohydrodynamic (MHD) mode evolution and generate control signals in real-time. Our system utilizes a convolutional neural network (CNN) model which predicts the $n$=1 MHD mode amplitude and phase using camera images with better accuracy than other tested non-deep-learning-based methods. By implementing this model directly within the standard FPGA readout hardware of the high-speed camera diagnostic, our mode tracking system achieves a total trigger-to-output latency of 17.6$\mu$s and a throughput of up to 120kfps. This study at the High Beta Tokamak-Extended Pulse (HBT-EP) experiment demonstrates an FPGA-based high-speed camera data acquisition and processing system, enabling application in real-time machine-learning-based tokamak diagnostic and control as well as potential applications in other scientific domains.
Updated: 2024-07-09 16:20:06
Fields: physics.plasm-ph,cs.AR,cs.LG,physics.ins-det
Empirical analysis of Binding Precedent efficiency in the Brazilian Supreme Court via Similar Case Retrieval
Binding precedents (S\'umulas Vinculantes) constitute a juridical instrument unique to the Brazilian legal system, whose objectives include the protection of the Federal Supreme Court against repetitive demands. Studies of the effectiveness of these instruments in decreasing the Court's exposure to similar cases, however, indicate that they tend to fail in this regard, with some of the binding precedents seemingly creating new demands. We empirically assess the legal impact of five binding precedents, 11, 14, 17, 26 and 37, at the highest court level through their effects on the legal subjects they address. This analysis is only possible through the comparison of the Court's rulings on the precedents' themes before they were created, which means that these decisions must be detected through techniques of Similar Case Retrieval. The contributions of this article are therefore twofold: on the mathematical side, we compare different methods of Natural Language Processing -- TF-IDF, LSTM, BERT, and regex -- for Similar Case Retrieval, whereas on the legal side, we contrast the inefficiency of these binding precedents with a set of hypotheses that may justify their repeated usage. We observe that the deep learning models performed significantly worse on the specific Similar Case Retrieval task, and that the reasons for binding precedents failing to curb repetitive demands are heterogeneous and case-dependent, making it impossible to single out a specific cause.
Updated: 2024-07-09 16:17:16
Fields: cs.CL,cs.AI,cs.IR,cs.LG,68T50 (Primary), 68T07 (Secondary)
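Of the retrieval methods compared, TF-IDF is simple enough to sketch end-to-end: vectorize the corpus with smoothed IDF weights (sklearn-style smoothing is assumed here) and rank past rulings by cosine similarity to a query. The miniature corpus of "rulings" below is invented:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    tokens = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokens for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = []
    for toks in tokens:
        tf = Counter(toks)
        vecs.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vecs, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query, docs):
    vecs, idf = tfidf_vectors(docs)
    qtf = Counter(query.lower().split())
    qlen = sum(qtf.values())
    qvec = {t: (c / qlen) * idf[t] for t, c in qtf.items() if t in idf}
    sims = [cosine(qvec, v) for v in vecs]
    return max(range(len(docs)), key=sims.__getitem__)

RULINGS = [  # invented mini-corpus of ruling summaries
    "tax appeal supreme court",
    "labor union strike",
    "tax ruling appeal",
]
```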
Learning to Complement and to Defer to Multiple Users
With the development of Human-AI Collaboration in Classification (HAI-CC), integrating users and AI predictions becomes challenging due to the complex decision-making process. This process has three options: 1) AI autonomously classifies, 2) learning to complement, where AI collaborates with users, and 3) learning to defer, where AI defers to users. Despite their interconnected nature, these options have been studied in isolation rather than as components of a unified system. In this paper, we address this weakness with the novel HAI-CC methodology, called Learning to Complement and to Defer to Multiple Users (LECODU). LECODU not only combines learning to complement and learning to defer strategies, but it also incorporates an estimation of the optimal number of users to engage in the decision process. The training of LECODU maximises classification accuracy and minimises collaboration costs associated with user involvement. Comprehensive evaluations across real-world and synthesized datasets demonstrate LECODU's superior performance compared to state-of-the-art HAI-CC methods. Remarkably, even when relying on unreliable users with high rates of label noise, LECODU exhibits significant improvement over both human decision-makers alone and AI alone.
Updated: 2024-07-09 16:16:44
Fields: cs.CV,cs.AI
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems are evaluated against conventional latency and throughput metrics (e.g., TTFT, TBT, normalised latency, and TPOT). However, these metrics fail to fully capture the nuances of LLM inference, leading to an incomplete assessment of user-facing performance crucial for real-time applications such as chat and translation. In this paper, we first identify the pitfalls of current performance metrics in evaluating LLM inference systems. We then propose Metron, a comprehensive performance evaluation framework that includes fluidity-index -- a novel metric designed to reflect the intricacies of the LLM inference process and its impact on real-time user experience. Finally, we evaluate various existing open-source platforms and model-as-a-service offerings using Metron, discussing their strengths and weaknesses. Metron is available at https://github.com/project-metron/metron.
Updated: 2024-07-09 16:13:26
Fields: cs.LG,cs.AI,cs.CL,cs.DC
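The fluidity-index is described only at a high level in the abstract. A simplified deadline-based proxy, the fraction of generated tokens that arrive within a TTFT-plus-per-token-budget deadline, conveys the flavor of a user-experience-oriented metric; the paper's exact definition may differ:

```python
def fluidity_index(token_times, ttft_deadline, tbt_deadline):
    """token_times: arrival time of each generated token, in seconds from
    request submission. Token i (0-based) has deadline
    ttft_deadline + i * tbt_deadline; return the fraction of deadlines met."""
    met = sum(1 for i, t in enumerate(token_times)
              if t <= ttft_deadline + i * tbt_deadline)
    return met / len(token_times)
```

Unlike a single TTFT or TPOT number, a per-token deadline view captures stalls in the middle of a stream, which is exactly what a chat user perceives.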
Changepoint Detection in Highly-Attributed Dynamic Graphs
Detecting anomalous behavior in dynamic networks remains a constant challenge. This problem is further exacerbated when the underlying topology of these networks is affected by individual highly-dimensional node attributes. We address this issue by tracking a network's modularity as a proxy of its community structure. We leverage Graph Neural Networks (GNNs) to estimate each snapshot's modularity. GNNs can account for both network structure and high-dimensional node attributes, providing a comprehensive approach for estimating network statistics. Our method is validated through simulations that demonstrate its ability to detect changes in highly-attributed networks by analyzing shifts in modularity. Moreover, we find our method is able to detect a real-world event within the \#Iran Twitter reply network, where each node has high-dimensional textual attributes.
Updated: 2024-07-09 16:12:44
Fields: cs.SI,cs.LG
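Tracking modularity across snapshots can be sketched directly from Newman's definition Q = sum_c [L_c/m - (d_c/(2m))^2], flagging a changepoint when Q shifts by more than a threshold. The graphs, fixed partition, and threshold below are illustrative; the paper estimates modularity with GNNs over attributed graphs rather than computing it exactly:

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c/m - (d_c/(2m))^2] for an undirected
    graph given as an edge list and a node -> community map."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in set(community.values()):
        l_c = sum(1 for u, v in edges if community[u] == c == community[v])
        d_c = sum(d for node, d in degree.items() if community[node] == c)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q

def changepoints(snapshots, community, threshold=0.3):
    qs = [modularity(e, community) for e in snapshots]
    return [i for i in range(1, len(qs)) if abs(qs[i] - qs[i - 1]) > threshold]

community = {0: "a", 1: "a", 2: "b", 3: "b"}
snapshots = [[(0, 1), (2, 3)],  # communities intact
             [(0, 2), (1, 3)]]  # edges flip across communities
```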
A Unified Learn-to-Distort-Data Framework for Privacy-Utility Trade-off in Trustworthy Federated Learning
In this paper, we first give an introduction to the theoretical basis of the privacy-utility equilibrium in federated learning based on Bayesian privacy definitions and total variation distance privacy definitions. We then present the \textit{Learn-to-Distort-Data} framework, which provides a principled approach to navigate the privacy-utility equilibrium by explicitly modeling the distortion introduced by the privacy-preserving mechanism as a learnable variable and optimizing it jointly with the model parameters. We demonstrate the applicability of our framework to a variety of privacy-preserving mechanisms on the basis of data distortion and highlight its connections to related areas such as adversarial training, input robustness, and unlearnable examples. These connections enable leveraging techniques from these areas to design effective algorithms for privacy-utility equilibrium in federated learning under the \textit{Learn-to-Distort-Data} framework.
Updated: 2024-07-09 16:11:04
Fields: cs.LG,cs.AI,cs.CR
Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective
Recent advances in neural information retrieval (IR) models have significantly enhanced their effectiveness over various IR tasks. The robustness of these models, essential for ensuring their reliability in practice, has also garnered significant attention. With a wide array of research on robust IR being proposed, we believe it is the opportune moment to consolidate the current status, glean insights from existing methodologies, and lay the groundwork for future development. We view the robustness of IR to be a multifaceted concept, emphasizing its necessity against adversarial attacks, out-of-distribution (OOD) scenarios and performance variance. With a focus on adversarial and OOD robustness, we dissect robustness solutions for dense retrieval models (DRMs) and neural ranking models (NRMs), respectively, recognizing them as pivotal components of the neural IR pipeline. We provide an in-depth discussion of existing methods, datasets, and evaluation metrics, shedding light on challenges and future directions in the era of large language models. To the best of our knowledge, this is the first comprehensive survey on the robustness of neural IR models, and we will also be giving our first tutorial presentation at SIGIR 2024 \url{https://sigir2024-robust-information-retrieval.github.io}. Along with the organization of existing work, we introduce a Benchmark for robust IR (BestIR), a heterogeneous evaluation benchmark for robust neural information retrieval, which is publicly available at \url{https://github.com/Davion-Liu/BestIR}. We hope that this study provides useful clues for future research on the robustness of IR models and helps to develop trustworthy search engines \url{https://github.com/Davion-Liu/Awesome-Robustness-in-Information-Retrieval}.
Updated: 2024-07-09 16:07:01
Fields: cs.IR,cs.AI,cs.CL,cs.LG
Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics
Extracting scientific understanding from particle-physics experiments requires solving diverse learning problems with high precision and good data efficiency. We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. L-GATr represents high-energy data in a geometric algebra over four-dimensional space-time and is equivariant under Lorentz transformations, the symmetry group of relativistic kinematics. At the same time, the architecture is a Transformer, which makes it versatile and scalable to large systems. L-GATr is first demonstrated on regression and classification tasks from particle physics. We then construct the first Lorentz-equivariant generative model: a continuous normalizing flow based on an L-GATr network, trained with Riemannian flow matching. Across our experiments, L-GATr is on par with or outperforms strong domain-specific baselines.
Updated: 2024-07-09 16:01:23
Fields: physics.data-an,cs.LG,hep-ph,stat.ML
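Lorentz equivariance means the network's outputs transform consistently under boosts and rotations; the quantity such transformations preserve is the Minkowski inner product. A quick numerical check of that invariance (the metric signature and test vectors are chosen for illustration):

```python
import math

def minkowski(p, q):
    """Inner product with signature (+, -, -, -) on four-vectors (t, x, y, z)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]

def boost_x(p, beta):
    """Pure Lorentz boost with velocity beta (in units of c) along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    t, x, y, z = p
    return (gamma * (t - beta * x), gamma * (x - beta * t), y, z)

p, q = (5.0, 1.0, 2.0, 3.0), (4.0, 0.0, 1.0, 2.0)
before = minkowski(p, q)
after = minkowski(boost_x(p, 0.6), boost_x(q, 0.6))
```

An equivariant architecture is built so that every internal operation respects this invariant, which is what lets it generalize across reference frames.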
Regularization in Spider-Style Strategy Discovery and Schedule Construction
To achieve the best performance, automatic theorem provers often rely on schedules of diverse proving strategies to be tried out (either sequentially or in parallel) on a given problem. In this paper, we report on a large-scale experiment with discovering strategies for the Vampire prover, targeting the FOF fragment of the TPTP library and constructing a schedule for it, based on the ideas of Andrei Voronkov's system Spider. We examine the process from various angles, discuss the difficulty (or ease) of obtaining a strong Vampire schedule for the CASC competition, and establish how well a schedule can be expected to generalize to unseen problems and what factors influence this property.
Updated: 2024-07-09 16:00:10
Categories: cs.AI,cs.LO
Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization
Automatically generating data visualizations in response to human utterances on datasets necessitates a deep semantic understanding of the data utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges persist due to inherent uncertainty in human speech. Recent advances in Large Language Models (LLMs) provide an avenue to address these challenges, but their ability to extract the relevant semantic information remains unexplored. In this study, we evaluate four publicly available LLMs (GPT-4, Gemini-Pro, Llama3, and Mixtral), investigating their ability to comprehend utterances even in the presence of uncertainty and identify the relevant data context and visual tasks. Our findings reveal that LLMs are sensitive to uncertainties in utterances. Despite this sensitivity, they are able to extract the relevant data context. However, LLMs struggle with inferring visualization tasks. Based on these results, we highlight future research directions on using LLMs for visualization generation.
Updated: 2024-07-09 15:59:29
Categories: cs.AI,cs.HC
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PEER (Plan, Execute, Express, Review) multi-agent framework. This systematizes domain-specific tasks by integrating precise question decomposition, advanced information retrieval, comprehensive summarization, and rigorous self-assessment. Given the concerns of cost and data privacy, enterprises are shifting from proprietary models like GPT-4 to custom models, striking a balance between cost, security, and performance. We developed industrial practices leveraging online data and user feedback for efficient model tuning. This study provides best practice guidelines for applying multi-agent systems in domain-specific problem-solving and implementing effective agent tuning strategies. Our empirical studies, particularly in the financial question-answering domain, demonstrate that our approach achieves 95.0% of GPT-4's performance, while effectively managing costs and ensuring data privacy.
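The Plan, Execute, Express, Review loop can be sketched as a pipeline of four stages. The function names, the keyword-split planner, and the toy dictionary retriever below are illustrative stand-ins, not the framework's actual interfaces:

```python
def plan(question):
    # Decompose the question into simpler sub-questions (stubbed heuristic).
    return [part.strip() for part in question.split(" and ")]

def execute(sub_question, retrieve):
    # Retrieve supporting evidence for one sub-question.
    return {"q": sub_question, "evidence": retrieve(sub_question)}

def express(findings):
    # Summarize the retrieved evidence into a draft answer.
    return "; ".join(f"{f['q']}: {f['evidence']}" for f in findings)

def review(draft, min_len=10):
    # Self-assess the draft; a real system would have an LLM critique it.
    return draft if len(draft) >= min_len else None

def peer(question, retrieve):
    findings = [execute(sq, retrieve) for sq in plan(question)]
    return review(express(findings))

kb = {"What is revenue": "$10M", "what is net margin": "12%"}
answer = peer("What is revenue and what is net margin",
              lambda q: kb.get(q, "unknown"))
```

In the actual framework each stage would be an LLM-backed agent; the stubs only fix the data flow between the four roles.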
Updated: 2024-07-09 15:59:28
Categories: cs.AI
Can virtual staining for high-throughput screening generalize?
The large volume and variety of imaging data from high-throughput screening (HTS) in the pharmaceutical industry present an excellent resource for training virtual staining models. However, the potential of models trained under one set of experimental conditions to generalize to other conditions remains underexplored. This study systematically investigates whether data from three cell types (lung, ovarian, and breast) and two phenotypes (toxic and non-toxic conditions) commonly found in HTS can effectively train virtual staining models to generalize across three typical HTS distribution shifts: unseen phenotypes, unseen cell types, and the combination of both. Utilizing a dataset of 772,416 paired bright-field, cytoplasm, nuclei, and DNA-damage stain images, we evaluate the generalization capabilities of models across pixel-based, instance-wise, and biological-feature-based levels. Our findings indicate that training virtual nuclei and cytoplasm models on non-toxic condition samples not only generalizes to toxic condition samples but also leads to improved performance across all evaluation levels compared to training on toxic condition samples. Generalization to unseen cell types varies with the cell type: models trained on ovarian or lung cell samples often perform well under other conditions, while those trained on breast cell samples consistently show poor generalization. Addressing unseen cell types and phenotypes together yields better generalization across all levels of evaluation than addressing unseen cell types alone. This study represents the first large-scale, data-centric analysis of the generalization capability of virtual staining models trained on diverse HTS datasets, providing valuable strategies for experimental training data generation.
Updated: 2024-07-09 15:54:06
Categories: cs.LG,cs.AI,cs.CV,q-bio.QM
Towards a Novel Privacy-Preserving Distributed Multiparty Data Outsourcing Scheme for Cloud Computing with Quantum Key Distribution
The intersection of cloud computing, blockchain technology, and the impending era of quantum computing presents a critical juncture for data security. This research addresses the escalating vulnerabilities by proposing a comprehensive framework that integrates Quantum Key Distribution (QKD), CRYSTALS Kyber, and Zero-Knowledge Proofs (ZKPs) for securing data in cloud-based blockchain systems. The primary objective is to fortify data against quantum threats through the implementation of QKD, a quantum-safe cryptographic protocol. We leverage the lattice-based cryptographic mechanism, CRYSTALS Kyber, known for its resilience against quantum attacks. Additionally, ZKPs are introduced to enhance data privacy and verification processes within the cloud and blockchain environment. A significant focus of this research is the performance evaluation of the proposed framework. Rigorous analyses encompass encryption and decryption processes, quantum key generation rates, and overall system efficiency. Practical implications are scrutinized, considering factors such as file size, response time, and computational overhead. The evaluation sheds light on the framework's viability in real-world cloud environments, emphasizing its efficiency in mitigating quantum threats. The findings contribute a robust quantum-safe and ZKP-integrated security framework tailored for cloud-based blockchain storage. By addressing critical gaps in theoretical advancements, this research offers practical insights for organizations seeking to secure their data against quantum threats. The framework's efficiency and scalability underscore its practical feasibility, serving as a guide for implementing enhanced data security in the evolving landscape of quantum computing and blockchain integration within cloud environments.
Updated: 2024-07-09 15:53:04
Categories: cs.CR
Advancing Manuscript Metadata: Work in Progress at the Jagiellonian University
As part of ongoing research projects, three Jagiellonian University units -- the Jagiellonian University Museum, the Jagiellonian University Archives, and the Jagiellonian Library -- are collaborating to digitize cultural heritage documents, describe them in detail, and then integrate these descriptions into a linked data cloud. Achieving this goal requires, as a first step, the development of a metadata model that complies with existing standards, allows interoperability with other systems, and captures all the elements of description established by the curators of the collections. In this paper, we report on the current status of the work, outlining the most important requirements for the data model under development and then making a detailed comparison with the two standards that are most relevant from the point of view of the collections: the Europeana Data Model used in Europeana and the Encoded Archival Description used in Kalliope.
Updated: 2024-07-09 15:52:06
Categories: cs.DL,cs.AI
Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects
In response to several cultural heritage initiatives at the Jagiellonian University, we have developed a new digitization workflow in collaboration with the Jagiellonian Library (JL). The solution is based on easy-to-access technological solutions -- Microsoft 365 cloud with MS Excel files as metadata acquisition interfaces, Office Script for validation, and MS Sharepoint for storage -- that allows metadata acquisition by domain experts (philologists, historians, philosophers, librarians, archivists, curators, etc.) regardless of their experience with information systems. The ultimate goal is to create a knowledge graph that describes the analyzed holdings, linked to general knowledge bases, as well as to other cultural heritage collections, so careful attention is paid to the high accuracy of metadata and proper links to external sources. The workflow has already been evaluated in two pilots in the DiHeLib project focused on digitizing the so-called "Berlin Collection" and in two workshops with international guests, which allowed for its refinement and confirmation of its correctness and usability for JL. As the proposed workflow does not interfere with existing systems or domain guidelines regarding digitization and basic metadata collection in a given institution (e.g., file type, image quality, use of Dublin Core/MARC-21), but extends them in order to enable rich metadata collection, not previously possible, we believe that it could be of interest to all GLAMs (galleries, libraries, archives, and museums).
Updated: 2024-07-09 15:49:47
Categories: cs.DL,cs.AI,cs.HC
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3$\times$ lower GPU memory usage and 5$\times$ faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 2000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. Code is available at https://github.com/Xinjie-Q/GaussianImage.
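The accumulated-summation rendering can be sketched in plain Python (a naive per-pixel loop, not the optimized rasterizer). The 8-parameter layout used here, position (2), inverse-covariance entries (3), and color (3), is one plausible reading of the abstract, not the paper's exact parameterization; each Gaussian contributes its color weighted by a 2D Gaussian falloff, and contributions are simply summed with no depth sorting:

```python
import math

def render(gaussians, width, height):
    """Accumulate 2D Gaussians into an RGB buffer by plain summation."""
    img = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for g in gaussians:
        # 8 parameters per Gaussian: position, inverse covariance, color.
        (mx, my), (a, b, c), color = g["pos"], g["inv_cov"], g["rgb"]
        for y in range(height):
            for x in range(width):
                dx, dy = x - mx, y - my
                w = math.exp(-0.5 * (a*dx*dx + 2*b*dx*dy + c*dy*dy))
                for k in range(3):
                    img[y][x][k] += w * color[k]
    return img

g = {"pos": (2.0, 2.0), "inv_cov": (0.5, 0.0, 0.5), "rgb": (1.0, 0.0, 0.0)}
img = render([g], 5, 5)
# The red channel peaks at the Gaussian's center pixel (2, 2).
```

Because the sum is order-independent, there is no sorting step, which is what makes the per-pixel work cheap enough for the reported rendering speeds.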
Updated: 2024-07-09 15:48:32
Categories: eess.IV,cs.AI,cs.CV,cs.MM
Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization
We present in this paper novel accelerated fully first-order methods in \emph{Bilevel Optimization} (BLO). Firstly, for BLO under the assumption that the lower-level functions admit the typical strong convexity assumption, we propose the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation} (\texttt{PRAF${}^2$BA}) algorithm, which leverages \emph{fully} first-order oracles and finds approximate first-order and second-order stationary points with state-of-the-art oracle query complexities in solving complex optimization tasks. Secondly, applying \texttt{PRAF${}^2$BA} to \emph{nonconvex-strongly-convex} (NCSC) minimax optimization, a special case of BLO, rediscovers \emph{perturbed restarted accelerated gradient descent ascent} (\texttt{PRAGDA}), which achieves the state-of-the-art complexity for finding approximate second-order stationary points. Additionally, we investigate the challenge of finding stationary points of the hyper-objective function in BLO when lower-level functions lack the typical strong convexity assumption: we identify several regularity conditions of the lower-level problems that ensure tractability, and present hardness results indicating the intractability of BLO for general convex lower-level functions. Under these regularity conditions we propose the \emph{Inexact Gradient-Free Method} (\texttt{IGFM}), utilizing the \emph{Switching Gradient Method} (\texttt{SGM}) as an efficient sub-routine to find an approximate stationary point of the hyper-objective in polynomial time. Empirical studies on real-world problems are provided to further validate the outperformance of our proposed algorithms.
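As a hedged illustration of the gradient-descent-ascent idea underlying PRAGDA (stripped of the perturbation, restarting, and acceleration the paper adds), plain simultaneous GDA on a toy objective that is strongly concave in y:

```python
def gda(grad_x, grad_y, x, y, lr=0.05, steps=2000):
    """Simultaneous gradient descent (in x) / ascent (in y)."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y

# Toy objective f(x, y) = 0.5*x**2 + x*y - y**2 (strongly concave in y).
x, y = gda(lambda x, y: x + y, lambda x, y: x - 2*y, x=3.0, y=-2.0)
# (x, y) converges to the unique stationary point (0, 0).
```

The paper's contribution lies in the restarts, perturbations, and acceleration wrapped around this basic loop, which are what buy the improved oracle complexities.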
Updated: 2024-07-09 15:48:20
Categories: math.OC,cs.LG,stat.ML
Improving Out-of-Distribution Detection by Combining Existing Post-hoc Methods
Since the seminal paper of Hendrycks et al. arXiv:1610.02136, Post-hoc deep Out-of-Distribution (OOD) detection has expanded rapidly. As a result, practitioners working on safety-critical applications and seeking to improve the robustness of a neural network now have a plethora of methods to choose from. However, no method outperforms every other on every dataset arXiv:2210.07242, so the current best practice is to test all the methods on the datasets at hand. This paper shifts focus from developing new methods to effectively combining existing ones to enhance OOD detection. We propose and compare four different strategies for integrating multiple detection scores into a unified OOD detector, based on techniques such as majority vote, empirical and copulas-based Cumulative Distribution Function modeling, and multivariate quantiles based on optimal transport. We extend common OOD evaluation metrics -- like AUROC and FPR at fixed TPR rates -- to these multi-dimensional OOD detectors, allowing us to evaluate them and compare them with individual methods on extensive benchmarks. Furthermore, we propose a series of guidelines to choose what OOD detectors to combine in more realistic settings, i.e. in the absence of known OOD data, relying on principles drawn from Outlier Exposure arXiv:1812.04606. The code is available at https://github.com/paulnovello/multi-ood.
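The simplest of the four combination strategies, majority vote, can be sketched as follows (the per-detector thresholds are assumed to be calibrated beforehand, e.g., at a fixed TPR on in-distribution data):

```python
def majority_vote_ood(scores, thresholds):
    """Flag a sample as OOD if most detectors' scores exceed their thresholds.

    scores:     per-detector scores for one sample (higher = more OOD-like)
    thresholds: per-detector decision thresholds
    """
    votes = sum(s > t for s, t in zip(scores, thresholds))
    return votes > len(scores) / 2

# Three detectors; two of them flag the sample, so the vote says OOD.
flag = majority_vote_ood([0.9, 0.2, 0.7], [0.5, 0.5, 0.5])
```

The copula-based and optimal-transport strategies in the paper replace this hard vote with a joint model of the score distribution, but the interface (several scores in, one decision out) is the same.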
Updated: 2024-07-09 15:46:39
Categories: stat.ML,cs.AI,cs.CV,cs.LG
GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning
Large language models (LLMs) have greatly impacted the natural language processing (NLP) field, particularly for the English language. These models have demonstrated capabilities in understanding and generating human-like text. The success of language models largely depends on the availability of high-quality instruction datasets, which consist of detailed task descriptions and corresponding responses that are essential for training the models to address a variety of prompts accurately. However, the availability and quality of these resources vary by language. While models perform well in English, they often struggle with languages like Arabic, due to the lack of datasets for fine-tuning on Arabic-specific tasks. To address this issue, we introduce InstAr-500k, a new Arabic instruction dataset created by generating and collecting content that covers several domains and instruction types. We assess this dataset by fine-tuning an open-source Gemma-7B model on several downstream tasks to improve its functionality. Based on multiple evaluations, our fine-tuned model achieves excellent performance on several Arabic NLP benchmarks. These outcomes emphasize the effectiveness of our dataset in elevating the capabilities of language models for Arabic. Our instruction dataset bridges the performance gap between English and Arabic language models by providing resources that amplify Arabic NLP development. Building on this foundation, we developed a model, GemmAr-7B-V1, specifically tuned to excel at a wide range of Arabic NLP tasks.
Updated: 2024-07-09 15:36:11
Categories: cs.CL,cs.AI
ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization
In-context learning (ICL) is a recent advancement in the capabilities of large language models (LLMs). This feature allows users to perform a new task without updating the model. Concretely, users can address tasks during inference by conditioning on a few input-label pair demonstrations along with the test input. It differs from the conventional fine-tuning paradigm and offers more flexibility. However, this capability also introduces potential issues. For example, users may use the model on any data without restriction, such as performing tasks with improper or sensitive content, which might violate the model policy or conflict with the model owner's interests. As a model owner, it is crucial to establish a mechanism to control the model's behavior under ICL, depending on the model owner's requirements for various content. To this end, we introduce the concept of "applicability authorization" tailored for LLMs, particularly for ICL behavior, and propose a simple approach, ICLGuard. It is a fine-tuning framework designed to allow the model owner to regulate ICL behavior on different data. ICLGuard preserves the original LLM and fine-tunes only a minimal set of additional trainable parameters to "guard" the LLM. Empirical results show that the guarded LLM can deactivate its ICL ability on target data without affecting its ICL ability on other data or its general functionality across all data.
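The ICL conditioning described above amounts to formatting a few input-label demonstrations ahead of the test input. A minimal prompt builder (the exact template is illustrative, not from the paper):

```python
def build_icl_prompt(demonstrations, test_input):
    """Format k input-label demonstrations followed by the test input."""
    lines = [f"Input: {x}\nLabel: {y}" for x, y in demonstrations]
    lines.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(lines)

demos = [("great movie!", "positive"), ("a total bore.", "negative")]
prompt = build_icl_prompt(demos, "surprisingly fun.")
```

Because the task is specified entirely by such a prompt at inference time, the model owner cannot restrict it through training data alone, which is the gap ICLGuard targets.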
Updated: 2024-07-09 15:35:06
Categories: cs.CR,cs.CL
Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation
This study explores the transfer learning capabilities of the TrOCR architecture for Spanish. TrOCR is a transformer-based Optical Character Recognition (OCR) model renowned for its state-of-the-art performance on English benchmarks. Inspired by Li et al.'s assertion regarding its adaptability to multilingual text recognition, we investigate two distinct approaches to adapting the model to a new language: integrating an English TrOCR encoder with a language-specific decoder and training the model on that language, and fine-tuning the English base TrOCR model on new-language data. Due to the scarcity of publicly available datasets, we present a resource-efficient pipeline for creating OCR datasets in any language, along with a comprehensive benchmark of the different image generation methods employed, with a focus on Visually Rich Documents (VRDs). Additionally, we offer a comparative analysis of the two approaches for Spanish, demonstrating that fine-tuning the English TrOCR on Spanish yields better recognition than the language-specific decoder for a fixed dataset size. We evaluate our model using character and word error rate metrics on a publicly available printed dataset, comparing its performance against other open-source and cloud OCR Spanish models. As far as we know, these resources represent the best open-source model for OCR in Spanish. The Spanish TrOCR models are publicly available on HuggingFace [20] and the code to generate the dataset is available on Github [25].
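One of the evaluation metrics mentioned, character error rate (CER), is edit distance normalized by the reference length. A self-contained sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edits needed per reference character."""
    return edit_distance(reference, hypothesis) / len(reference)

rate = cer("reconocimiento", "reconosimiento")  # one substitution in 14 chars
```

Word error rate is the same computation with token lists in place of character strings.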
Updated: 2024-07-09 15:31:41
Categories: cs.AI,cs.CL
Equilibria in Two-Stage Facility Location with Atomic Clients
We consider competitive facility location as a two-stage multi-agent system with two types of clients. For a given host graph with weighted clients on the vertices, first facility agents strategically select vertices for opening their facilities. Then, the clients strategically select which of the opened facilities in their neighborhood to patronize. Facilities want to attract as much client weight as possible, clients want to minimize congestion on the chosen facility. All recently studied versions of this model assume that clients can split their weight strategically. We consider clients with unsplittable weights but allow mixed strategies. So clients may randomize over which facility to patronize. Besides modeling a natural client behavior, this subtle change yields drastic changes, e.g., for a given facility placement, qualitatively different client equilibria are possible. As our main result, we show that pure subgame perfect equilibria always exist if all client weights are identical. For this, we use a novel potential function argument, employing a hierarchical classification of the clients and sophisticated rounding in each step. In contrast, for non-identical clients, we show that deciding the existence of even approximately stable states is computationally intractable. On the positive side, we give a tight bound of $2$ on the price of anarchy which implies high social welfare of equilibria, if they exist.
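For a fixed facility placement, a client's randomization induces an expected load on each facility. A small sketch of that bookkeeping (illustrative, not the paper's notation):

```python
def expected_loads(weights, strategies, num_facilities):
    """Expected client weight on each facility under mixed client strategies.

    weights:    weights[i] is client i's (unsplittable) weight
    strategies: strategies[i][f] is the probability client i patronizes f
    """
    loads = [0.0] * num_facilities
    for w, probs in zip(weights, strategies):
        for f, p in enumerate(probs):
            loads[f] += p * w
    return loads

# Two identical clients mixing uniformly over two facilities.
loads = expected_loads([1.0, 1.0], [[0.5, 0.5], [0.5, 0.5]], 2)
# → each facility expects a load of 1.0
```

Note that an atomic client's own weight arrives at whichever facility it actually picks, so its realized congestion differs from this expectation; that gap is exactly what makes mixed-strategy equilibria here qualitatively different from the splittable-weight model.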
Updated: 2024-07-09 15:31:11
Categories: cs.GT,cs.AI
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
Large language models (LLMs) have made impressive progress in handling simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce a series of LLMs that employs the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed as DotaMath. DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, obtaining fine-grained feedback from the code interpreter, and engaging in self-reflection and correction. By annotating diverse interactive tool-use trajectories and employing query evolution on GSM8K and MATH datasets, we generate an instruction fine-tuning dataset called DotaMathQA with 574K query-response pairs. We train a series of base LLMs using imitation learning on DotaMathQA, resulting in DotaMath models that achieve remarkable performance compared to open-source LLMs across various in-domain and out-of-domain benchmarks. Notably, DotaMath-deepseek-7B showcases an outstanding performance of 64.8% on the competitive MATH dataset and 86.7% on GSM8K. Besides, DotaMath-deepseek-7B maintains strong competitiveness on a series of in-domain and out-of-domain benchmarks (Avg. 80.1%). Looking forward, we anticipate that the DotaMath paradigm will open new pathways for addressing intricate mathematical problems. Our code is publicly available at https://github.com/ChengpengLi1003/DotaMath.
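The decompose-then-execute loop can be caricatured in a few lines: each subtask is a small code snippet run by an interpreter, with intermediate results fed forward as feedback. This is a toy sketch of the control flow, not DotaMath's actual pipeline (which has an LLM generate the decomposition and the code, and adds self-reflection):

```python
def solve_with_code(subtasks):
    """Execute each subtask's code snippet, sharing state as feedback."""
    env = {}
    for description, code in subtasks:
        exec(code, env)   # the "code interpreter" step
    return env

# "If 3 baskets hold 12 apples each and 7 apples are eaten, how many remain?"
subtasks = [
    ("count all apples", "total = 3 * 12"),
    ("subtract eaten",   "remaining = total - 7"),
]
result = solve_with_code(subtasks)["remaining"]   # → 29
```

In the real system the interpreter's output after each step is what the model reflects on and corrects against.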
Updated: 2024-07-09 15:29:03
Categories: cs.CL,cs.AI,cs.LG
Self-Recognition in Language Models
A rapidly growing number of applications rely on a small set of closed-source language models (LMs). This dependency might introduce novel security risks if LMs develop self-recognition capabilities. Inspired by human identity verification methods, we propose a novel approach for assessing self-recognition in LMs using model-generated "security questions". Our test can be externally administered to keep track of frontier models as it does not require access to internal model parameters or output probabilities. We use our test to examine self-recognition in ten of the most capable open- and closed-source LMs currently publicly available. Our extensive experiments found no empirical evidence of general or consistent self-recognition in any examined LM. Instead, our results suggest that given a set of alternatives, LMs seek to pick the "best" answer, regardless of its origin. Moreover, we find indications that preferences about which models produce the best answers are consistent across LMs. We additionally uncover novel insights on position bias considerations for LMs in multiple-choice settings.
Updated: 2024-07-09 15:23:28
Categories: cs.CL,cs.AI,cs.LG
Working Backwards: Learning to Place by Picking
We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific, contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the grasping process and exploiting the inherent symmetry of the pick and place problems. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects initially located at their target placement locations. Our system can collect hundreds of demonstrations in contact-constrained environments without human intervention using two modules: compliant control for grasping and tactile regrasping. We train a policy directly from visual observations through behavioural cloning, using the autonomously-collected demonstrations. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table). We validate our approach in home robot scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of success rate and data efficiency, while requiring no human supervision.
Updated: 2024-07-09 15:21:24
Fields: cs.RO,cs.AI,cs.LG
A hybrid LLM workflow can help identify user privilege related variables in programs of any size
Many programs involve operations and logic that manipulate user privileges, which is essential for the security of an organization. Therefore, one common malicious goal of attackers is to obtain or escalate privileges, causing privilege leakage. To protect the program and the organization against privilege leakage attacks, it is important to eliminate the vulnerabilities that can be exploited to achieve such attacks. Unfortunately, while memory vulnerabilities are less challenging to find, logic vulnerabilities are much more imminent, harmful, and difficult to identify. Accordingly, many analysts choose to find user privilege related (UPR) variables first as starting points to investigate the code where the UPR variables are used and check whether any vulnerabilities exist, especially logic ones. In this paper, we introduce a large language model (LLM) workflow that can assist analysts in identifying such UPR variables, which is otherwise a very time-consuming task. Specifically, our tool audits all the variables in a program and outputs a UPR score for each variable, i.e., the degree of relationship (closeness) between the variable and user privileges. The proposed approach avoids the drawbacks of directly prompting an LLM to find UPR variables by leveraging the LLM at the statement level instead of supplying it with very long code snippets. Variables with high UPR scores are potential UPR variables and should be investigated manually. Our experiments show that with a typical UPR score threshold (i.e., UPR score > 0.8), the false positive rate (FPR) is only 13.49%, while the number of UPR variables found is significantly higher than that of the heuristic-based method.
Updated: 2024-07-09 15:20:52
Fields: cs.CR,cs.SE
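The statement-level scoring loop described above might be sketched as follows. The paper does not publish its prompts or aggregation rule, so `score_statement` is a hypothetical stand-in for the LLM call (here a trivial keyword heuristic so the sketch runs offline), and averaging per variable is an assumed aggregation; the 0.8 threshold matches the abstract.

```python
# Hedged sketch of statement-level UPR scoring: rate each statement a variable
# appears in, average into a per-variable UPR score, then threshold.

def score_statement(statement: str) -> float:
    # Placeholder for an LLM call that rates privilege-relatedness in [0, 1];
    # a keyword heuristic stands in so the example runs without an API key.
    keywords = ("privilege", "role", "admin", "uid", "acl")
    return 1.0 if any(k in statement.lower() for k in keywords) else 0.0

def upr_scores(statements_by_var: dict) -> dict:
    """Average statement-level scores into one UPR score per variable."""
    return {
        var: sum(score_statement(s) for s in stmts) / len(stmts)
        for var, stmts in statements_by_var.items()
    }

def candidates(scores: dict, threshold: float = 0.8) -> list:
    """Variables above the threshold (the paper uses UPR score > 0.8)."""
    return [v for v, s in scores.items() if s > threshold]

# Hypothetical program: statements grouped by the variable they involve.
program = {
    "user_role": ["user_role = get_role(uid)", "if user_role == ADMIN: grant()"],
    "buf_len": ["buf_len = len(data)"],
}
scores = upr_scores(program)
```

Only the high-scoring variables would then be handed to an analyst for manual inspection.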
An Improved Two-Step Attack on CRYSTALS-Kyber
After three rounds of strict post-quantum cryptography (PQC) evaluations conducted by the National Institute of Standards and Technology (NIST), CRYSTALS-Kyber was selected and drafted for standardization in mid-2022. It has become urgent to further evaluate Kyber's physical security for the upcoming deployment phase. In this paper, we present an improved two-step attack on Kyber that quickly recovers the full secret key, s, using far fewer energy traces and less time. In the first step, we use a correlation power analysis (CPA) attack to obtain a portion of the guess values of s with a small number of energy traces. The CPA attack is enhanced by utilizing both the Pearson and Kendall's rank correlation coefficients and by modifying the leakage model to improve accuracy. In the second step, we adopt a lattice attack to recover s based on the CPA results. The success rate is largely built up by constructing a trial-and-error method. We implement the proposed attack against the reference implementation of Kyber512 (four 128-value groups of s) on ARM Cortex-M4 and successfully recover a 128-value group of s in about 9 minutes using a 16-core machine. Moreover, in that case, we require at most 60 CPA guess values per group and 15 power traces per guess.
Updated: 2024-07-09 15:19:09
Fields: cs.CR
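The dual-coefficient CPA ranking step can be illustrated in miniature. The toy leakage model below (Hamming weight of an XOR intermediate) and the way the two coefficients are combined (a simple sum of absolute values) are illustrative assumptions, not the paper's exact enhanced model; the point is only that each key guess is scored by both Pearson and Kendall's rank correlation against the measured traces.

```python
# Minimal CPA sketch: rank key guesses by combined Pearson + Kendall
# correlation between a hypothetical leakage model and noisy power traces.
import numpy as np
from scipy.stats import pearsonr, kendalltau

rng = np.random.default_rng(0)
TRUE_KEY = 7  # hypothetical 4-bit subkey for this toy example

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def leakage(plaintexts: np.ndarray, key: int) -> np.ndarray:
    # Toy intermediate: XOR with the key guess, then Hamming weight.
    return np.array([hamming_weight(int(p) ^ key) for p in plaintexts], dtype=float)

plaintexts = rng.integers(0, 256, size=200)
traces = leakage(plaintexts, TRUE_KEY) + 0.3 * rng.normal(size=200)

def guess_score(key_guess: int) -> float:
    model = leakage(plaintexts, key_guess)
    r, _ = pearsonr(model, traces)       # linear correlation
    tau, _ = kendalltau(model, traces)   # rank correlation
    return abs(r) + abs(tau)             # assumed combination rule

best_guess = max(range(16), key=guess_score)
```

The correct guess produces the strongest combined correlation, so ranking guesses by `guess_score` recovers the subkey.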
Raply: A profanity-mitigated rap generator
The task of writing rap is challenging: it involves producing complex rhyming schemes while keeping the lyrics meaningful. In this work, we propose Raply, a fine-tuned GPT-2 model capable of producing meaningful rhyming text in the style of rap. In addition to its rhyming capabilities, the model generates less offensive content. This was achieved by fine-tuning the model on a new dataset, Mitislurs, a profanity-mitigated corpus. We evaluate the output of the model on two criteria: 1) rhyming, based on the rhyme density metric; 2) profanity content, using a list of profanities for the English language. To our knowledge, this is the first attempt at profanity mitigation for rap lyrics generation.
Updated: 2024-07-09 15:18:56
Fields: cs.CL,cs.AI
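The two evaluation criteria can be sketched in a toy form. The actual rhyme density metric is phoneme-based; matching word-final letter suffixes is a crude stand-in used here so the example runs without a pronunciation dictionary, and the profanity list is likewise a placeholder for a real English word list.

```python
# Toy versions of the two evaluation criteria: rhyme density (here a suffix
# proxy) and profanity count against a word list.

PROFANITY = {"damn", "hell"}  # placeholder word list

def line_endings(lyrics: str) -> list:
    return [line.split()[-1].lower() for line in lyrics.splitlines() if line.split()]

def rhyme_density(lyrics: str, suffix_len: int = 2) -> float:
    """Fraction of line endings whose final letters match another line's ending."""
    ends = line_endings(lyrics)
    if len(ends) < 2:
        return 0.0
    suffixes = [e[-suffix_len:] for e in ends]
    rhymed = sum(1 for i, s in enumerate(suffixes)
                 if any(s == t for j, t in enumerate(suffixes) if j != i))
    return rhymed / len(ends)

def profanity_count(lyrics: str) -> int:
    return sum(1 for w in lyrics.lower().split() if w.strip(".,!?") in PROFANITY)

verse = "I came to play\nGot bars all day\nClean words I bring\nWatch the crowd sing"
```

A generation would be preferred when its rhyme density stays high while its profanity count drops.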
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Facial Action Units (AUs) are a vital concept in the realm of affective computing, and AU detection has always been a hot research topic. Existing methods suffer from overfitting issues due to the utilization of a large number of learnable parameters on scarce AU-annotated datasets or heavy reliance on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a promising paradigm to address these challenges, but its existing methods lack designs tailored to AU characteristics. Therefore, we innovatively apply the PETL paradigm to AU detection, introducing AUFormer and proposing a novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism. An individual MoKE specific to a certain AU, with minimal learnable parameters, first integrates personalized multi-scale and correlation knowledge. The MoKE then collaborates with other MoKEs in the expert group to obtain aggregated information and inject it into the frozen Vision Transformer (ViT) to achieve parameter-efficient AU detection. Additionally, we design a Margin-truncated Difficulty-aware Weighted Asymmetric Loss (MDWA-Loss), which encourages the model to focus more on activated AUs, differentiates the difficulty of unactivated AUs, and discards potentially mislabeled samples. Extensive experiments from various perspectives, including within-domain, cross-domain, data-efficiency, and micro-expression domain evaluations, demonstrate AUFormer's state-of-the-art performance and robust generalization ability without relying on additional relevant data. The code for AUFormer is available at https://github.com/yuankaishen2001/AUFormer.
Updated: 2024-07-09 15:15:21
Fields: cs.CV,cs.AI
SpiralShard: Highly Concurrent and Secure Blockchain Sharding via Linked Cross-shard Endorsement
Blockchain sharding improves the scalability of blockchain systems by partitioning the whole blockchain state, nodes, and transaction workloads into different shards. However, existing blockchain sharding systems generally suffer from a small number of shards, resulting in limited concurrency. The main reason is that existing sharding systems require large shard sizes to ensure security. To enhance the concurrency of blockchain sharding securely, we propose SpiralShard. The intuition is to allow the existence of some shards with a larger fraction of malicious nodes (i.e., corrupted shards), thus reducing shard sizes. SpiralShard can configure more and smaller shards for higher concurrency at the same network size. To ensure security with the existence of corrupted shards, we propose the Linked Cross-shard Endorsement (LCE) protocol. According to our LCE protocol, the blocks of each shard are sequentially verified and endorsed by a group of shards before being finalized. As a result, a corrupted shard can eliminate forks with the help of the other shards. We implement SpiralShard based on Harmony and conduct extensive evaluations. Experimental results show that, compared with Harmony, SpiralShard achieves around 19x throughput gain under a large network size with 4,000+ nodes.
Updated: 2024-07-09 15:14:44
Fields: cs.CR,cs.DC
Bayesian Federated Learning with Hamiltonian Monte Carlo: Algorithm and Theory
This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification. We establish rigorous convergence guarantees of FA-HMC on non-iid distributed data sets, under the strong convexity and Hessian smoothness assumptions. Our analysis investigates the effects of parameter space dimension, noise on gradients and momentum, and the frequency of communication (between the central node and local nodes) on the convergence and communication costs of FA-HMC. Beyond that, we establish the tightness of our analysis by showing that the convergence rate cannot be improved even for the continuous FA-HMC process. Moreover, extensive empirical studies demonstrate that FA-HMC outperforms the existing Federated Averaging-Langevin Monte Carlo (FA-LD) algorithm.
Updated: 2024-07-09 15:10:59
Fields: cs.LG,stat.CO,stat.ML
Integrating Ontology Design with the CRISP-DM in the context of Cyber-Physical Systems Maintenance
In the following contribution, a method is introduced that integrates domain expert-centric ontology design with the Cross-Industry Standard Process for Data Mining (CRISP-DM). This approach aims to efficiently build an application-specific ontology tailored to the corrective maintenance of Cyber-Physical Systems (CPS). The proposed method is divided into three phases. In phase one, ontology requirements are systematically specified, defining the relevant knowledge scope. Accordingly, CPS life cycle data is contextualized in phase two using domain-specific ontological artifacts. This formalized domain knowledge is then utilized in the CRISP-DM to efficiently extract new insights from the data. Finally, the newly developed data-driven model is employed to populate and expand the ontology. Thus, information extracted from this model is semantically annotated and aligned with the existing ontology in phase three. The applicability of this method has been evaluated in an anomaly detection case study for a modular process plant.
Updated: 2024-07-09 15:06:47
Fields: cs.AI
Convergence of the Chambolle-Pock Algorithm in the Absence of Monotonicity
The Chambolle-Pock algorithm (CPA), also known as the primal-dual hybrid gradient method, has gained popularity over the last decade due to its success in solving large-scale convex structured problems. This work extends its convergence analysis for problems with varying degrees of (non)monotonicity, quantified through a so-called oblique weak Minty condition on the associated primal-dual operator. Our results reveal novel stepsize and relaxation parameter ranges which do not only depend on the norm of the linear mapping, but also on its other singular values. In particular, in nonmonotone settings, in addition to the classical stepsize conditions, extra bounds on the stepsizes and relaxation parameters are required. On the other hand, in the strongly monotone setting, the relaxation parameter is allowed to exceed the classical upper bound of two. Moreover, we build upon the recently introduced class of semimonotone operators, providing sufficient convergence conditions for CPA when the individual operators are semimonotone. Since this class of operators encompasses traditional operator classes including (hypo)- and co(hypo)-monotone operators, this analysis recovers and extends existing results for CPA. Tightness of the proposed stepsize ranges is demonstrated through several examples.
Updated: 2024-07-09 14:51:36
Fields: math.OC,cs.LG,47H04, 49J52, 49J53, 65K15, 90C26
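For reference, the classical Chambolle-Pock iteration that this analysis extends can be written out on a monotone toy problem, min_x 0.5*||Kx - b||^2 + 0.5*lam*||x||^2, whose closed-form solution is ridge regression. The classical stepsize rule sigma*tau*||K||^2 <= 1 is used below; the paper's contribution is precisely that, outside the monotone setting, such rules must also account for the other singular values of K.

```python
# Classical Chambolle-Pock (primal-dual hybrid gradient) on a strongly
# monotone toy problem with closed-form proximal operators.
import numpy as np

rng = np.random.default_rng(2)
K = rng.normal(size=(8, 5))
b = rng.normal(size=8)
lam = 0.5

L = np.linalg.norm(K, 2)             # spectral norm of K
sigma = tau = 0.9 / L                # satisfies sigma*tau*L^2 < 1
theta = 1.0                          # classical relaxation parameter

x = np.zeros(5); x_bar = x.copy(); y = np.zeros(8)
for _ in range(10000):
    # prox of sigma*F*, with F(z) = 0.5*||z - b||^2
    y = (y + sigma * (K @ x_bar) - sigma * b) / (1 + sigma)
    x_prev = x
    # prox of tau*G, with G(x) = 0.5*lam*||x||^2
    x = (x_prev - tau * (K.T @ y)) / (1 + tau * lam)
    x_bar = x + theta * (x - x_prev)

x_star = np.linalg.solve(K.T @ K + lam * np.eye(5), K.T @ b)  # ridge solution
```

The iterates converge to the ridge-regression solution; in the strongly monotone regime the paper shows theta may even exceed the classical upper bound of two.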
Synthetic Data: Revisiting the Privacy-Utility Trade-off
Synthetic data has been considered a better privacy-preserving alternative to traditionally sanitized data across various applications. However, a recent article challenges this notion, stating that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques, and that it leads to unpredictable utility loss and highly unpredictable privacy gain. The article also claims to have identified a breach in the differential privacy guarantees provided by PATEGAN and PrivBayes. When a study claims to refute or invalidate prior findings, it is crucial to verify and validate the study. In our work, we analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment, which limits the applicability of its findings to general cases. Our exploration also revealed that the game did not satisfy a crucial precondition concerning data distributions, which contributed to the perceived violation of the differential privacy guarantees offered by PATEGAN and PrivBayes. We also conducted a privacy-utility trade-off analysis in a more general and unconstrained environment. Our experimentation demonstrated that synthetic data achieves a more favorable privacy-utility trade-off compared to the provided implementation of k-anonymization, thereby reaffirming earlier conclusions.
Updated: 2024-07-09 14:48:43
Fields: cs.CR,cs.LG
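As a quick reference for the traditional baseline being compared against: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The records and quasi-identifiers below are purely illustrative.

```python
# Checking k-anonymity: group records by quasi-identifiers and verify the
# smallest group has at least k members.
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values()) >= k

records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "diabetes"},
]
```

Coarsening quasi-identifiers to reach a larger k costs utility, which is exactly the trade-off the synthetic-data comparison revisits.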
Differentially Private Multiway and $k$-Cut
In this paper, we address the challenge of differential privacy in the context of graph cuts, specifically focusing on the minimum $k$-cut and multiway cut problems. We introduce edge-differentially private algorithms that achieve nearly optimal performance for these problems. For the multiway cut problem, we first provide a private algorithm with a multiplicative approximation ratio that matches the state-of-the-art non-private algorithm. We then present a tight information-theoretic lower bound on the additive error, demonstrating that our algorithm on weighted graphs is near-optimal for constant $k$. For the minimum $k$-cut problem, our algorithms leverage a known bound on the number of approximate $k$-cuts, resulting in a private algorithm with optimal additive error $O(k\log n)$ for a fixed privacy parameter. We also establish an information-theoretic lower bound that matches this additive error. Additionally, we give an efficient private algorithm for $k$-cut even for non-constant $k$, including a polynomial-time 2-approximation with an additive error of $\widetilde{O}(k^{1.5})$.
Updated: 2024-07-09 14:46:33
Fields: cs.CR,cs.DS
Fine-grained large-scale content recommendations for MSX sellers
One of the most critical tasks of Microsoft sellers is to meticulously track and nurture potential business opportunities through proactive engagement and tailored solutions. Recommender systems play a central role to help sellers achieve their goals. In this paper, we present a content recommendation model which surfaces various types of content (technical documentation, comparison with competitor products, customer success stories etc.) that sellers can share with their customers or use for their own self-learning. The model operates at the opportunity level which is the lowest possible granularity and the most relevant one for sellers. It is based on semantic matching between metadata from the contents and carefully selected attributes of the opportunities. Considering the volume of seller-managed opportunities in organizations such as Microsoft, we show how to perform efficient semantic matching over a very large number of opportunity-content combinations. The main challenge is to ensure that the top-5 relevant contents for each opportunity are recommended out of a total of $\approx 40,000$ published contents. We achieve this target through an extensive comparison of different model architectures and feature selection. Finally, we further examine the quality of the recommendations in a quantitative manner using a combination of human domain experts as well as by using the recently proposed "LLM as a judge" framework.
Updated: 2024-07-09 14:46:09
Fields: cs.IR,cs.AI,cs.LG
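The core matching step can be sketched as follows: embed the opportunity attributes and the content metadata, score with cosine similarity, and keep the top-k contents per opportunity. The bag-of-words embedding and the vocabulary below are placeholders for whatever encoder the production system uses; only the top-5 retrieval shape follows the abstract.

```python
# Semantic matching sketch: cosine similarity between opportunity and content
# embeddings, returning the top-k content indices per opportunity.
import numpy as np

VOCAB = ["cloud", "security", "migration", "pricing", "ai"]  # toy vocabulary

def embed(text: str) -> np.ndarray:
    words = text.lower().split()
    v = np.array([words.count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

def top_k(opportunity: str, contents: list, k: int = 5) -> list:
    """Indices of the k contents most similar to the opportunity, best first."""
    q = embed(opportunity)
    scores = [float(q @ embed(c)) for c in contents]
    return sorted(range(len(contents)), key=lambda i: -scores[i])[:k]

contents = [
    "cloud migration checklist",
    "ai pricing comparison",
    "security whitepaper",
    "cloud security customer story",
]
ranked = top_k("cloud security migration opportunity", contents, k=2)
```

Scaling this to all seller-managed opportunities against ~40,000 contents is what makes efficient batch scoring the central engineering concern.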
Intercepting Unauthorized Aerial Robots in Controlled Airspace Using Reinforcement Learning
The proliferation of unmanned aerial vehicles (UAVs) in controlled airspace presents significant risks, including potential collisions, disruptions to air traffic, and security threats. Ensuring the safe and efficient operation of airspace, particularly in urban environments and near critical infrastructure, necessitates effective methods to intercept unauthorized or non-cooperative UAVs. This work addresses the critical need for robust, adaptive systems capable of managing such threats through the use of Reinforcement Learning (RL). We present a novel approach utilizing RL to train fixed-wing UAV pursuer agents for intercepting dynamic evader targets. Our methodology explores both model-based and model-free RL algorithms, specifically DreamerV3, Truncated Quantile Critics (TQC), and Soft Actor-Critic (SAC). The training and evaluation of these algorithms were conducted under diverse scenarios, including unseen evasion strategies and environmental perturbations. Our approach leverages high-fidelity flight dynamics simulations to create realistic training environments. This research underscores the importance of developing intelligent, adaptive control systems for UAV interception, significantly contributing to the advancement of secure and efficient airspace management. It demonstrates the potential of RL to train systems capable of autonomously achieving these critical tasks.
Updated: 2024-07-09 14:45:47
Fields: cs.RO,cs.AI,cs.LG
Bayesian grey-box identification of nonlinear convection effects in heat transfer dynamics
We propose a computational procedure for identifying convection in heat transfer dynamics. The procedure is based on a Gaussian process latent force model, consisting of a white-box component (i.e., known physics) for the conduction and linear convection effects and a Gaussian process that acts as a black-box component for the nonlinear convection effects. States are inferred through Bayesian smoothing and we obtain approximate posterior distributions for the kernel covariance function's hyperparameters using Laplace's method. The nonlinear convection function is recovered from the Gaussian process states using a Bayesian regression model. We validate the procedure by simulation error using the identified nonlinear convection function, on both data from a simulated system and measurements from a physical assembly.
Updated: 2024-07-09 14:37:11
Fields: eess.SY,cs.CE,cs.LG,cs.SY
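A condensed sketch of the black-box piece: Gaussian-process regression with an RBF kernel recovering a nonlinear (convection-like) function from noisy residuals, as if the known linear physics had already been subtracted out. The kernel hyperparameters are fixed here for simplicity, whereas the paper infers them via Laplace's method, and the tanh target is an invented stand-in.

```python
# GP posterior mean with an RBF kernel, recovering an unknown nonlinear
# function from noisy observations.
import numpy as np

rng = np.random.default_rng(4)

def rbf(a, b, length=0.5, variance=1.0):
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

true_fn = lambda v: np.tanh(2.0 * v)          # stand-in nonlinear convection term
x_train = np.linspace(-2, 2, 40)
y_train = true_fn(x_train) + 0.05 * rng.normal(size=40)

noise_var = 0.05 ** 2
K = rbf(x_train, x_train) + noise_var * np.eye(40)
alpha = np.linalg.solve(K, y_train)           # K^{-1} y

x_test = np.linspace(-2, 2, 9)
mean_pred = rbf(x_test, x_train) @ alpha      # GP posterior mean
```

The posterior mean tracks the hidden nonlinearity closely inside the data range, which is the behavior the grey-box model exploits when it feeds the GP states into the final Bayesian regression.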
AllMatch: Exploiting All Unlabeled Data for Semi-Supervised Learning
Existing semi-supervised learning algorithms adopt pseudo-labeling and consistency regularization techniques to introduce supervision signals for unlabeled samples. To overcome the inherent limitation of threshold-based pseudo-labeling, prior studies have attempted to align the confidence threshold with the evolving learning status of the model, which is estimated from the predictions made on the unlabeled data. In this paper, we further reveal that classifier weights can reflect the differentiated learning status across categories and consequently propose a class-specific adaptive threshold mechanism. Additionally, considering that even the optimal threshold scheme cannot resolve the problem of discarding unlabeled samples, a binary classification consistency regularization approach is designed to distinguish candidate classes from negative options for all unlabeled samples. By combining the above strategies, we present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data. We extensively evaluate our approach on multiple benchmarks, encompassing both balanced and imbalanced settings. The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
Updated: 2024-07-09 14:35:57
Fields: cs.LG,cs.AI
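Class-specific adaptive thresholding can be sketched schematically. Here the per-class learning status is read off classifier-weight norms (the signal the abstract points to) and scaled into a threshold; the exact mapping in the paper is more involved, so treat this as illustrative only.

```python
# Sketch of class-specific adaptive pseudo-label thresholds derived from
# classifier-weight norms, plus the resulting sample filtering.
import numpy as np

def class_thresholds(classifier_weights: np.ndarray,
                     base: float = 0.95) -> np.ndarray:
    """Scale the base threshold by each class's relative weight norm."""
    norms = np.linalg.norm(classifier_weights, axis=1)
    return base * norms / norms.max()

def pseudo_labels(probs: np.ndarray, thresholds: np.ndarray):
    """Keep a sample only if its top-class confidence clears that class's threshold."""
    labels = probs.argmax(axis=1)
    keep = probs.max(axis=1) >= thresholds[labels]
    return labels, keep

weights = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 classes, toy weights
probs = np.array([[0.90, 0.05, 0.05],
                  [0.10, 0.60, 0.30],
                  [0.20, 0.30, 0.50]])
labels, keep = pseudo_labels(probs, class_thresholds(weights))
```

Well-learned classes (large weight norm) keep a strict threshold while under-learned classes admit lower-confidence pseudo-labels; the binary consistency branch then handles the samples that still fall below threshold.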
Hypergraph based Understanding for Document Semantic Entity Recognition
Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the positional relationships between text nodes and the relations between text contents. Existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph-attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time. It conducts a more detailed analysis of the document text representation produced by the upstream model and makes better use of semantic information. We apply this method on the basis of GraphLayoutLM to construct a new semantic entity recognition model, HGALayoutLM. Our experimental results on FUNSD, CORD, XFUND, and SROIE show that our method can effectively improve the performance of semantic entity recognition tasks over the original model. The results of HGALayoutLM on FUNSD and XFUND reach new state-of-the-art levels.
Updated: 2024-07-09 14:35:49
Fields: cs.AI
Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective
One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data are dispatched to multiple annotators. The annotator-produced labels are then fused to serve downstream learning and inference tasks. This annotation process often creates noisy labels due to various reasons, such as the limited expertise, or unreliability of annotators, among others. Therefore, a core objective in crowdsourcing is to develop methods that effectively mitigate the negative impact of such label noise on learning tasks. This feature article introduces advances in learning from noisy crowdsourced labels. The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches, emphasizing analytical insights and algorithmic developments. In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization, and novel, principled solutions of longstanding challenges in crowdsourcing -- showing how SP perspectives drive the advancements of this field. Furthermore, this article touches upon emerging topics that are critical for developing cutting-edge AI/ML systems, such as crowdsourcing in reinforcement learning with human feedback (RLHF) and direct preference optimization (DPO) that are key techniques for fine-tuning large language models (LLMs).
Updated: 2024-07-09 14:34:40
Fields: eess.SP,cs.AI,cs.HC,cs.LG
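The label-fusion step at the heart of crowdsourcing can be shown in a small form: plain majority voting versus weighting each annotator by an estimated reliability. The reliability refresh below is a one-line EM-style simplification of the Dawid-Skene-type estimators discussed in the article, not a faithful implementation.

```python
# Fusing noisy crowd labels: majority vote, then a reliability-weighted
# refinement where each annotator's weight is their agreement with the fusion.
import numpy as np

# rows: items, cols: annotators; binary labels in {0, 1}
L = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
])

def majority_vote(L):
    return (L.mean(axis=1) > 0.5).astype(int)

def weighted_fusion(L, n_iter=5):
    fused = majority_vote(L).astype(float)
    for _ in range(n_iter):
        # reliability = agreement of each annotator with the current fusion
        w = (L == fused[:, None].round()).mean(axis=0)
        fused = (L * w).sum(axis=1) / w.sum()
    return (fused > 0.5).astype(int)
```

Here the third annotator disagrees with the consensus on most items, so their weight (and influence on the fused labels) shrinks, which is the basic mechanism that mitigates label noise.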
Sampling for Model Predictive Trajectory Planning in Autonomous Driving using Normalizing Flows
Alongside optimization-based planners, sampling-based approaches are often used in trajectory planning for autonomous driving due to their simplicity. Model predictive path integral control is a framework that builds upon optimization principles while incorporating stochastic sampling of input trajectories. This paper investigates several sampling approaches for trajectory generation. In this context, normalizing flows originating from the field of variational inference are considered for the generation of sampling distributions, as they model transformations of simple to more complex distributions. Accordingly, learning-based normalizing flow models are trained for a more efficient exploration of the input domain for the task at hand. The developed algorithm and the proposed sampling distributions are evaluated in two simulation scenarios.
Updated: 2024-07-09 14:31:07
Fields: cs.RO,cs.LG
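The model predictive path integral (MPPI) update that this framework builds on can be sketched compactly: sample input perturbations, roll out their costs, and combine samples with softmax weights. The base Gaussian sampler below is exactly the piece the paper proposes replacing with a learned normalizing-flow distribution; the 1-D double-integrator dynamics and cost are a toy stand-in for a driving model.

```python
# Minimal MPPI: Gaussian input perturbations, rollout costs, and an
# importance-weighted update of the nominal control sequence.
import numpy as np

rng = np.random.default_rng(3)
H, K_SAMPLES, LAMBDA = 20, 256, 1.0  # horizon, samples, temperature

def rollout_cost(u_seq):
    """Drive position to 1.0 with a 1-D double integrator, penalizing effort."""
    pos, vel, cost = 0.0, 0.0, 0.0
    for u in u_seq:
        vel += 0.1 * u
        pos += 0.1 * vel
        cost += (pos - 1.0) ** 2 + 0.01 * u ** 2
    return cost

u_nominal = np.zeros(H)
for _ in range(30):                                      # MPPI iterations
    noise = rng.normal(scale=0.5, size=(K_SAMPLES, H))   # base Gaussian sampler
    costs = np.array([rollout_cost(u_nominal + n) for n in noise])
    w = np.exp(-(costs - costs.min()) / LAMBDA)
    w /= w.sum()
    u_nominal = u_nominal + w @ noise                    # importance-weighted update

final_cost = rollout_cost(u_nominal)
```

Swapping `rng.normal` for samples drawn from a trained normalizing flow changes only the proposal distribution, which is what lets the method explore the input domain more efficiently.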
Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification
Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In contrast, resting state (RS) EEG signals are a viable alternative due to ease of acquisition with rich subject information. In this paper, we propose a novel subject-adaptive transfer learning strategy that utilizes RS EEG signals to adapt models on unseen subject data. Specifically, we disentangle extracted features into task- and subject-dependent features and use them to calibrate RS EEG signals for obtaining task information while preserving subject characteristics. The calibrated signals are then used to adapt the model to the target subject, enabling the model to simulate processing TS EEG signals of the target subject. The proposed method achieves state-of-the-art accuracy on three public benchmarks, demonstrating the effectiveness of our method in cross-subject EEG MI classification. Our findings highlight the potential of leveraging RS EEG signals to advance practical brain-computer interface systems. The code is available at https://github.com/SionAn/MICCAI2024-ResTL.
Updated: 2024-07-09 14:30:24
Categories: eess.SP,cs.AI,cs.LG
A Concept-Value Network as a Brain Model
This paper suggests a statistical framework for describing the relations between the physical and conceptual entities of a brain-like model. Features and concept instances are put into context, where the paper suggests that features may be the electrical wiring, although chemical connections are also possible. With this idea, the actual length of the connection is important, because it is related to firing rates and neuron synchronization, but the signal type is less important. The paper then suggests that concepts are neuron groups that link feature sets and that concept instances are determined by chemical signals from those groups. Therefore, features become the static horizontal framework of the neural system, and concepts are vertically interconnected combinations of these. With regard to functionality, the neuron is then considered to be functional and the more horizontal memory structures can be glial. This would also suggest that features can be distributed entities rather than concentrated in a single area.
Updated: 2024-07-09 14:26:52
Categories: cs.NE,cs.AI,q-bio.NC
City-Scale Multi-Camera Vehicle Tracking System with Improved Self-Supervised Camera Link Model
Multi-Target Multi-Camera Tracking (MTMCT) has broad applications and forms the basis for numerous future city-wide systems (e.g. traffic management, crash detection, etc.). However, matching vehicle trajectories across different cameras based solely on extracted features remains significantly difficult. This article introduces an innovative multi-camera vehicle tracking system that utilizes a self-supervised camera link model. In contrast to related works that rely on manual spatial-temporal annotations, our model automatically extracts crucial multi-camera relationships for vehicle matching. The camera link is established through a pre-matching process that evaluates feature similarities, pair numbers, and time variance for high-quality tracks. This process calculates the probability of spatial linkage for all camera combinations, selecting the highest scoring pairs to create camera links. Our approach significantly reduces deployment time by eliminating the need for human annotation, offering substantial improvements in efficiency and cost-effectiveness in real-world applications. This pairing process supports cross-camera matching by setting spatial-temporal constraints, reducing the search space for potential vehicle matches. According to our experimental results, the proposed method achieves a new state-of-the-art among automatic camera-link based methods on the CityFlow V2 benchmark with a 61.07% IDF1 score.
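As a rough sketch of the pre-matching idea, the snippet below scores a candidate camera link by the fraction of cross-camera track pairs whose appearance features exceed a similarity threshold. It omits the pair-number and time-variance terms mentioned above, and every name and value is an illustrative assumption.

```python
def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return num / (na * nb)

def link_score(tracks_a, tracks_b, sim_thresh=0.8):
    """Score a candidate camera link by the fraction of cross-camera track
    pairs whose appearance features match above a similarity threshold."""
    pairs = [(a, b) for a in tracks_a for b in tracks_b]
    matched = sum(1 for a, b in pairs if cosine(a, b) >= sim_thresh)
    return matched / len(pairs)

# Toy track features: cam2 re-observes cam1's vehicles, cam3 sees others.
cam1 = [[1.0, 0.0], [0.0, 1.0]]
cam2 = [[0.9, 0.1], [0.1, 0.9]]
cam3 = [[-1.0, 0.2], [0.3, -1.0]]
best = max([("cam2", link_score(cam1, cam2)),
            ("cam3", link_score(cam1, cam3))], key=lambda kv: kv[1])
```

The highest-scoring pairing would then be promoted to a camera link that constrains later trajectory matching.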
Updated: 2024-07-09 14:24:45
Categories: cs.CV,cs.AI
A Complete Set of Quadratic Constraints For Repeated ReLU
This paper derives a complete set of quadratic constraints (QCs) for the repeated ReLU. The complete set of QCs is described by a collection of $2^{n_v}$ matrix copositivity conditions where $n_v$ is the dimension of the repeated ReLU. We also show that only two functions satisfy all QCs in our complete set: the repeated ReLU and a repeated "flipped" ReLU. Thus our complete set of QCs bounds the repeated ReLU as tight as possible up to the sign invariance inherent in quadratic forms. We derive a similar complete set of incremental QCs for repeated ReLU, which can potentially lead to less conservative Lipschitz bounds for ReLU networks than the standard LipSDP approach. Finally, we illustrate the use of the complete set of QCs to assess stability and performance for recurrent neural networks with ReLU activation functions. The stability/performance condition combines Lyapunov/dissipativity theory with the QCs for repeated ReLU. A numerical implementation is given and demonstrated via a simple example.
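For readers new to QCs, the standard elementwise relations for the ReLU are the natural starting point; the paper's complete set generalizes well beyond them. A small, commonly used subset (the last inequality is the slope-restriction constraint of LipSDP-style analyses, stated here as background rather than as the paper's conditions) reads:

```latex
% v = \max(0, u) applied elementwise satisfies, for i = 1, \dots, n_v:
v_i \ge 0, \qquad v_i \ge u_i, \qquad v_i\,(v_i - u_i) = 0,
% and, because ReLU is slope-restricted to [0, 1]:
\bigl(v_i - v_j\bigr)^2 \le \bigl(v_i - v_j\bigr)\bigl(u_i - u_j\bigr), \qquad i \ne j.
```

Each relation is quadratic (or linear) in $(u, v)$ and can therefore be folded into a semidefinite stability or performance condition.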
Updated: 2024-07-09 14:18:30
Categories: cs.LG,cs.SY,eess.SY,math.OC
Revisiting Experience Replayable Conditions
Experience replay (ER) used in (deep) reinforcement learning is considered to be applicable only to off-policy algorithms. However, there have been some cases in which ER has been applied to on-policy algorithms, suggesting that off-policyness might be a sufficient rather than a necessary condition for applying ER. This paper reconsiders stricter "experience replayable conditions" (ERC) and proposes a way of modifying existing algorithms to satisfy them. In light of this, it is postulated that the instability of policy improvements is a pivotal factor in ERC. The instability factors are revealed from the viewpoint of metric learning as i) repulsive forces from negative samples and ii) replays of inappropriate experiences. Accordingly, the corresponding stabilization tricks are derived. As a result, it is confirmed through numerical simulations that the proposed stabilization tricks make ER applicable to an advantage actor-critic, an on-policy algorithm. Moreover, its learning performance is comparable to that of a soft actor-critic, a state-of-the-art off-policy algorithm.
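The ER mechanism itself is simple; the paper's contribution lies in the stabilization tricks, not the buffer. A minimal, illustrative buffer sketch:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer: store transitions in a ring
    buffer and sample minibatches uniformly at random."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)  # oldest transition evicted when full

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

random.seed(1)
rb = ReplayBuffer(capacity=100)
for t in range(150):  # the first 50 transitions get evicted
    rb.push(("state", "action", float(t)))
batch = rb.sample(8)
```

An on-policy learner replaying such a buffer is exactly the setting where the instability factors named above appear, since sampled transitions no longer come from the current policy.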
Updated: 2024-07-09 14:16:53
Categories: cs.LG
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey of Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis first navigates through the forefront of representative works on embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agents, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of Embodied AI and discuss their potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.
Updated: 2024-07-09 14:14:47
Categories: cs.CV,cs.AI,cs.LG,cs.MA,cs.RO
Applications of artificial intelligence in the analysis of histopathology images of gliomas: a review
In recent years, the diagnosis of gliomas has become increasingly complex. Analysis of glioma histopathology images using artificial intelligence (AI) offers new opportunities to support diagnosis and outcome prediction. To give an overview of the current state of research, this review examines 83 publicly available research studies that have proposed AI-based methods for whole-slide histopathology images of human gliomas, covering the diagnostic tasks of subtyping (23/83), grading (27/83), molecular marker prediction (20/83), and survival prediction (29/83). All studies were reviewed with regard to methodological aspects as well as clinical applicability. It was found that the focus of current research is the assessment of hematoxylin and eosin-stained tissue sections of adult-type diffuse gliomas. The majority of studies (52/83) are based on the publicly available glioblastoma and low-grade glioma datasets from The Cancer Genome Atlas (TCGA) and only a few studies employed other datasets in isolation (16/83) or in addition to the TCGA datasets (15/83). Current approaches mostly rely on convolutional neural networks (63/83) for analyzing tissue at 20x magnification (35/83). A new field of research is the integration of clinical data, omics data, or magnetic resonance imaging (29/83). So far, AI-based methods have achieved promising results, but are not yet used in real clinical settings. Future work should focus on the independent validation of methods on larger, multi-site datasets with high-quality and up-to-date clinical and molecular pathology annotations to demonstrate routine applicability.
Updated: 2024-07-09 13:57:09
Categories: eess.IV,cs.CV,cs.LG
Energy Efficient Fair STAR-RIS for Mobile Users
In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS elements allocated to each user. We then formulate a novel optimization problem by concurrently optimizing the phase shifts of the STAR-RIS and subsurface assignment variable. We leverage the deep reinforcement learning (DRL) technique to address this optimization problem. The DRL model predicts the phase shifts of the STAR-RIS and efficiently allocates elements of STAR-RIS to the users. Additionally, we incorporate a penalty term in the DRL model to facilitate intelligent deactivation of STAR-RIS elements when not in use to enhance energy efficiency. Through extensive experiments, we show that the proposed method can achieve fairly high and nearly equal data rates for all users in both the transmission and reflection spaces in an energy-efficient manner.
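One way to picture the penalty term is a reward that trades the worst user's rate (fairness) against the fraction of active STAR-RIS elements (energy). The function below is a schematic guess for illustration, not the paper's actual reward:

```python
def reward(rates, active_elements, total_elements, lam=0.1):
    """Illustrative DRL reward: promote fairness through the minimum user
    rate and penalize the fraction of active STAR-RIS elements to reward
    energy-efficient deactivation."""
    return min(rates) - lam * active_elements / total_elements

# Deactivating unused elements raises the reward at equal user rates.
r_all_on = reward([2.0, 1.8], active_elements=64, total_elements=64)
r_half = reward([2.0, 1.8], active_elements=32, total_elements=64)
```

Under such a shaping, the agent only keeps elements active when they actually improve the worst user's rate.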
Updated: 2024-07-09 13:56:59
Categories: cs.IT,cs.LG,eess.SP,math.IT
Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method
Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows that the pursuit of better classification can guide the data imputation process. While some works consider using label information to assist in this task, their simplistic utilization of labels lacks flexibility and may rely on strict assumptions. In this paper, we propose a new framework that effectively leverages supervision information to complete missing data in a manner conducive to classification. Specifically, this framework operates in two stages. Firstly, it leverages labels to supervise the optimization of similarity relationships among data, represented by the kernel matrix, with the goal of enhancing classification accuracy. To mitigate overfitting that may occur during this process, a perturbation variable is introduced to improve the robustness of the framework. Secondly, the learned kernel matrix serves as additional supervision information to guide data imputation through regression, utilizing the block coordinate descent method. The superiority of the proposed method is evaluated on four real-world data sets by comparing it with state-of-the-art imputation methods. Remarkably, our algorithm significantly outperforms other methods when the data is missing more than 60% of the features.
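To make the second stage concrete, the toy below fills a single missing entry so that the induced kernel row matches a supervision kernel; a grid search stands in for the paper's block coordinate descent, and all names and values are illustrative assumptions.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two complete feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def impute_entry(X, i, d, K_target, grid):
    """Fill X[i][d] with the candidate value whose induced kernel row best
    matches the supervision kernel (grid search standing in for block
    coordinate descent)."""
    def loss(v):
        xi = X[i][:d] + [v] + X[i][d + 1:]
        return sum((rbf(xi, X[j]) - K_target[i][j]) ** 2
                   for j in range(len(X)) if j != i)
    return min(grid, key=loss)

X = [[0.0, 0.0], [1.0, 1.0], [1.0, None]]  # last sample misses feature 1
# Supervision kernel computed here from the ground-truth point [1.0, 1.0];
# in the paper it would come from the label-supervised first stage.
truth = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
K = [[rbf(a, b) for b in truth] for a in truth]
X[2][1] = impute_entry(X, 2, 1, K, grid=[i / 10 for i in range(21)])
```

The recovered entry minimizes the mismatch between the sample's kernel row and the supervision kernel, which is the sense in which the learned kernel "guides" imputation.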
Updated: 2024-07-09 13:54:24
Categories: cs.LG
Does CLIP Know My Face?
With the rise of deep learning in various applications, privacy concerns around the protection of training data have become a critical area of research. Whereas prior studies have focused on privacy risks in single-modal models, we introduce a novel method to assess privacy for multi-modal models, specifically vision-language models like CLIP. The proposed Identity Inference Attack (IDIA) reveals whether an individual was included in the training data by querying the model with images of the same person. Letting the model choose from a wide variety of possible text labels, the model reveals whether it recognizes the person and, therefore, was used for training. Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy. We confirm that the model has learned to associate names with depicted individuals, implying the existence of sensitive information that can be extracted by adversaries. Our results highlight the need for stronger privacy protection in large-scale models and suggest that IDIAs can be used to prove the unauthorized use of data for training and to enforce privacy laws.
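The attack's decision rule can be sketched independently of any particular model: query with several images of one person, let the model pick a name from a candidate list, and flag training-set membership if the true name wins often enough. The toy embeddings below are fabricated for illustration and are not real CLIP outputs.

```python
def idia(image_embeds, name_embeds, names, true_name, tau=0.5):
    """Identity Inference Attack decision rule: query the model with
    several images of one person, let it pick a name from a candidate
    list, and flag membership if the true name wins often enough."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    hits = 0
    for img in image_embeds:
        pred = max(names, key=lambda n: dot(img, name_embeds[n]))
        hits += pred == true_name
    return hits / len(image_embeds) >= tau

# Fabricated embeddings: a model that memorized "alice" maps her photos
# close to her name embedding.
name_embeds = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.7, 0.7]}
member = idia([[0.9, 0.1], [0.8, 0.3], [0.95, 0.0]],
              name_embeds, list(name_embeds), "alice")
```

With a real vision-language model, the dot products would be image-text similarity scores over prompts containing each candidate name.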
Updated: 2024-07-09 13:54:11
Categories: cs.LG,cs.CR,cs.CV
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
While the biases of language models in production are extensively documented, the biases of their guardrails have been neglected. This paper studies how contextual information about the user influences the likelihood of an LLM to refuse to execute a request. By generating user biographies that offer ideological and demographic information, we find a number of biases in guardrail sensitivity on GPT-3.5. Younger, female, and Asian-American personas are more likely to trigger a refusal guardrail when requesting censored or illegal information. Guardrails are also sycophantic, refusing to comply with requests for a political position the user is likely to disagree with. We find that certain identity groups and seemingly innocuous information, e.g., sports fandom, can elicit changes in guardrail sensitivity similar to direct statements of political ideology. For each demographic category and even for American football team fandom, we find that ChatGPT appears to infer a likely political ideology and modify guardrail behavior accordingly.
Updated: 2024-07-09 13:53:38
Categories: cs.CL,cs.AI
Trust and Resilience in Federated Learning Through Smart Contracts Enabled Decentralized Systems
In this paper, we present a study of a Federated Learning (FL) system based on the use of decentralized architectures to ensure trust and increase reliability. The system is based on the idea that the FL collaborators upload the (ciphered) model parameters to the Inter-Planetary File System (IPFS) and interact with a dedicated smart contract to track their behavior. Thanks to this smart contract, the phases of parameter updates are managed efficiently, thereby strengthening data security. We have carried out an experimental study that exploits two different methods of weight aggregation, i.e., a classic averaging scheme and a federated proximal aggregation. The results confirm the feasibility of the proposal.
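The two aggregation schemes compared in the experiments can be sketched as follows. Note that FedProx is usually formulated as a proximal term in the clients' local objectives, so the aggregation-side variant below is only an illustrative stand-in:

```python
def fedavg(client_weights, sizes):
    """Classic weighted averaging of client model parameters."""
    total = sum(sizes)
    dim = len(client_weights[0])
    return [sum(w[k] * n for w, n in zip(client_weights, sizes)) / total
            for k in range(dim)]

def fedprox_aggregate(client_weights, sizes, global_prev, mu=0.5):
    """Proximal-style aggregation (illustrative): pull the plain average
    back toward the previous global model to damp client drift."""
    avg = fedavg(client_weights, sizes)
    return [(a + mu * g) / (1 + mu) for a, g in zip(avg, global_prev)]

clients = [[1.0, 2.0], [3.0, 4.0]]
avg = fedavg(clients, sizes=[1, 1])
prox = fedprox_aggregate(clients, [1, 1], global_prev=[0.0, 0.0], mu=0.5)
```

In the described system, the inputs to either function would be the (deciphered) parameter vectors fetched from IPFS, with the smart contract recording which clients contributed.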
Updated: 2024-07-09 13:50:32
Categories: cs.AI,cs.LG
Performance Evaluation of Knowledge Graph Embedding Approaches under Non-adversarial Attacks
Knowledge Graph Embedding (KGE) transforms a discrete Knowledge Graph (KG) into a continuous vector space, facilitating its use in various AI-driven applications like Semantic Search, Question Answering, or Recommenders. While KGE approaches are effective in these applications, most existing approaches assume that all information in the given KG is correct. This enables attackers to influence the output of these approaches, e.g., by perturbing the input. Consequently, the robustness of such KGE approaches has to be addressed. Recent work focused on adversarial attacks. However, non-adversarial attacks on all attack surfaces of these approaches have not been thoroughly examined. We close this gap by evaluating the impact of non-adversarial attacks on the performance of 5 state-of-the-art KGE algorithms on 5 datasets with respect to attacks on 3 attack surfaces: graph, parameter, and label perturbation. Our evaluation results suggest that label perturbation has a strong effect on KGE performance, followed by parameter perturbation with a moderate effect and graph perturbation with a low effect.
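Of the three surfaces, label perturbation is the easiest to picture: flip the truth label of a random fraction of training triples, with no adversary choosing which ones. A minimal sketch with illustrative names and data:

```python
import random

def perturb_labels(triples, labels, rate, seed=0):
    """Non-adversarial label perturbation: flip the truth label of a
    randomly chosen fraction of training triples (no adversary tailors
    which flips happen)."""
    rng = random.Random(seed)
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), int(rate * len(labels))):
        flipped[i] = 1 - flipped[i]
    return flipped

triples = [("berlin", "capital_of", "germany")] * 10  # placeholder triples
labels = [1] * 10
noisy = perturb_labels(triples, labels, rate=0.3)
```

Graph perturbation would analogously corrupt the triples themselves, and parameter perturbation would add noise to the trained embedding matrices.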
Updated: 2024-07-09 13:42:14
Categories: cs.LG,cs.CR
TimeTravel: Real-time Timing Drift Attack on System Time Using Acoustic Waves
Real-time Clock (RTC) has been widely used in various real-time systems to provide precise system time. In this paper, we reveal a new security vulnerability of the RTC circuit, where the internal storage time or timestamp can be arbitrarily modified forward or backward. The security threat of dynamic modifications of system time caused by this vulnerability is called TimeTravel. Based on acoustic resonance and piezoelectric effects, TimeTravel applies acoustic guide waves to the quartz crystal, thereby adjusting the characteristics of the oscillating signal transmitted into the RTC circuit. By manipulating the parameters of acoustic waves, TimeTravel can accelerate or decelerate the timing speed of system time at an adjustable rate, resulting in the relative drift of the timing, which can pose serious safety threats. To assess the severity of TimeTravel, we examine nine modules and two commercial devices under the RTC circuit. The experimental results show that TimeTravel can drift system time forward and backward at a chosen speed with a maximum 93% accuracy. Our analysis further shows that TimeTravel can maintain an attack success rate of no less than 77% under environments with typical obstacle items.
Updated: 2024-07-09 13:41:46
Categories: cs.CR
TE-SSL: Time and Event-aware Self Supervised Learning for Alzheimer's Disease Progression Analysis
Alzheimer's Dementia (AD) represents one of the most pressing challenges in the field of neurodegenerative disorders, with its progression analysis being crucial for understanding disease dynamics and developing targeted interventions. Recent advancements in deep learning and various representation learning strategies, including self-supervised learning (SSL), have shown significant promise in enhancing medical image analysis, providing innovative ways to extract meaningful patterns from complex data. Notably, the computer vision literature has demonstrated that incorporating supervisory signals into SSL can further augment model performance by guiding the learning process with additional relevant information. However, the application of such supervisory signals in the context of disease progression analysis remains largely unexplored. This gap is particularly pronounced given the inherent challenges of incorporating both event and time-to-event information into the learning paradigm. Addressing this, we propose a novel framework, Time and Event-aware SSL (TE-SSL), which integrates time-to-event and event data as supervisory signals to refine the learning process. Our comparative analysis with existing SSL-based methods in the downstream task of survival analysis shows superior performance across standard metrics.
Updated: 2024-07-09 13:41:32
Categories: cs.CV,cs.AI
TeVAE: A Variational Autoencoder Approach for Discrete Online Anomaly Detection in Variable-state Multivariate Time-series Data
As attention to recorded data grows in the realm of automotive testing and manual evaluation reaches its limits, there is a growing need for automatic online anomaly detection. This real-world data is complex in many ways and requires the modelling of testee behaviour. To address this, we propose a temporal variational autoencoder (TeVAE) that can detect anomalies with minimal false positives when trained on unlabelled data. Our approach also avoids the bypass phenomenon and introduces a new method to remap individual windows to a continuous time series. Furthermore, we propose metrics to evaluate the detection delay and root-cause capability of our approach and present results from experiments on a real-world industrial data set. When properly configured, TeVAE wrongly flags anomalies only 6% of the time and detects 65% of the anomalies present. It also has the potential to perform well with a smaller training and validation subset but requires a more sophisticated threshold estimation method.
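The remapping step mentioned above, turning overlapping window-wise reconstructions back into one continuous series, is commonly handled by averaging every value predicted for a given time step. The sketch below shows that common baseline, not the paper's new method:

```python
def remap_windows(windows, stride):
    """Remap overlapping window reconstructions back to one continuous
    series by averaging all values predicted for each time step."""
    length = stride * (len(windows) - 1) + len(windows[0])
    sums = [0.0] * length
    counts = [0] * length
    for w_idx, win in enumerate(windows):
        for k, v in enumerate(win):
            t = w_idx * stride + k  # absolute time step of this value
            sums[t] += v
            counts[t] += 1
    return [s / c for s, c in zip(sums, counts)]

# Two length-3 windows with stride 1 reconstruct a length-4 series.
series = remap_windows([[1.0, 1.0, 3.0], [1.0, 3.0, 3.0]], stride=1)
```

A per-step anomaly score (e.g. reconstruction error against the remapped series) can then be thresholded online.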
Updated: 2024-07-09 13:32:33
Categories: cs.LG,cs.AI,cs.CE
Shape-aware synthesis of pathological lung CT scans using CycleGAN for enhanced semi-supervised lung segmentation
This paper addresses the problem of pathological lung segmentation, a significant challenge in medical image analysis, particularly pronounced in cases of peripheral opacities (severe fibrosis and consolidation) because of the textural similarity between lung tissue and surrounding areas. To overcome these challenges, this paper emphasizes the use of CycleGAN for unpaired image-to-image translation, in order to provide an augmentation method able to generate fake pathological images matching an existing ground truth. Although previous studies have employed CycleGAN, they often neglect the challenge of shape deformation, which is crucial for accurate medical image segmentation. Our work introduces an innovative strategy that incorporates additional loss functions. Specifically, it proposes an L1 loss based on the lung surrounding, whose shape is constrained to remain unchanged at the transition from the healthy to the pathological domain. The lung surrounding is derived from ground truth lung masks available in the healthy domain. Furthermore, preprocessing steps, such as cropping based on rib/vertebra locations, are applied to refine the input for the CycleGAN, ensuring that the network focuses on the lung region. This is essential to avoid extraneous biases, such as the zoom-effect bias, which can divert attention from the main task. The method is applied to enhance the lung segmentation process in a semi-supervised manner, employing a U-Net model trained with on-the-fly data augmentation that incorporates synthetic pathological tissues generated by the CycleGAN model. Preliminary results from this research demonstrate significant qualitative and quantitative improvements, setting a new benchmark in the field of pathological lung segmentation. Our code is available at https://github.com/noureddinekhiati/Semi-supervised-lung-segmentation
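The proposed shape constraint can be pictured as an L1 penalty evaluated only on the lung-surrounding region. The 1-D toy below (flattened "images" with illustrative values) shows the masking idea:

```python
def surrounding_l1(real, fake, surround_mask):
    """L1 penalty restricted to the lung-surrounding region: the
    translation may alter lung texture freely, but pixels flagged as
    surrounding must stay close to the source image."""
    terms = [abs(r - f) for r, f, m in zip(real, fake, surround_mask) if m]
    return sum(terms) / len(terms)

real = [0.2, 0.8, 0.5, 0.9]
fake = [0.2, 0.1, 0.5, 0.4]  # lung pixels changed, surrounding preserved
mask = [1, 0, 1, 0]          # 1 = lung surrounding, 0 = lung interior
loss = surrounding_l1(real, fake, mask)
```

In the full model, this term would be added to the usual CycleGAN adversarial and cycle-consistency losses, with the mask derived from the healthy-domain lung masks.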
Updated: 2024-07-09 13:32:24
Categories: eess.IV,cs.CV,cs.LG
EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context
Large language models (LLMs) present an enormous evolution in the strategic potential of conversational recommender systems (CRS). Yet to date, research has predominantly focused upon technical frameworks to implement LLM-driven CRS, rather than end-user evaluations or strategic implications for firms, particularly from the perspective of the small to medium enterprises (SMEs) that make up the bedrock of the global economy. In the current paper, we detail the design of an LLM-driven CRS in an SME setting, and its subsequent performance in the field using both objective system metrics and subjective user evaluations. While doing so, we additionally outline a short-form revised ResQue model for evaluating LLM-driven CRS, enabling replicability in a rapidly evolving field. Our results reveal good system performance from a user experience perspective (85.5% recommendation accuracy) but underscore latency, cost, and quality issues challenging business viability. Notably, with a median cost of $0.04 per interaction and a latency of 5.7s, cost-effectiveness and response time emerge as crucial areas for achieving a more user-friendly and economically viable LLM-driven CRS for SME settings. One major driver of these costs is the use of an advanced LLM as a ranker within the retrieval-augmented generation (RAG) technique. Our results additionally indicate that relying solely on approaches such as prompt-based learning with ChatGPT as the underlying LLM makes it challenging to achieve satisfying quality in a production environment. Strategic considerations for SMEs deploying an LLM-driven CRS are outlined, particularly considering trade-offs in the current technical landscape.
Updated: 2024-07-09 13:31:00
Categories: cs.IR,cs.AI,cs.CL,cs.LG,68T50,I.2.7; H.5.2
A False Sense of Privacy: Towards a Reliable Evaluation Methodology for the Anonymization of Biometric Data
Biometric data contains distinctive human traits such as facial features or gait patterns. The use of biometric data permits an individuation so exact that the data is utilized effectively in identification and authentication systems. But for this same reason, privacy protections become indispensably necessary. Privacy protection is extensively afforded by the technique of anonymization. Anonymization techniques protect sensitive personal data from biometrics by obfuscating or removing information that allows linking records to the generating individuals, to achieve high levels of anonymity. However, our understanding and possibility to develop effective anonymization relies, in equal parts, on the effectiveness of the methods employed to evaluate anonymization performance. In this paper, we assess the state-of-the-art methods used to evaluate the performance of anonymization techniques for facial images and for gait patterns. We demonstrate that the state-of-the-art evaluation methods have serious and frequent shortcomings. In particular, we find that the underlying assumptions of the state-of-the-art are quite unwarranted. State-of-the-art methods generally assume a difficult recognition scenario and thus a weak adversary. However, that assumption causes state-of-the-art evaluations to grossly overestimate the performance of the anonymization. Therefore, we propose a strong adversary which is aware of the anonymization in place. We improve the selection process for the evaluation dataset, and we reduce the numbers of identities contained in the dataset while ensuring that these identities remain easily distinguishable from one another. Our novel evaluation methodology surpasses the state-of-the-art because we measure worst-case performance and so deliver a highly reliable evaluation of biometric anonymization techniques.
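The paper's central evaluation point, that a weak adversary overestimates anonymization performance, can be reproduced in a toy: an adversary who enrolls templates passed through the same anonymization re-identifies a probe that defeats the unaware adversary. The feature vectors and the deterministic "anonymization" below are fabricated stand-ins, not a real biometric pipeline.

```python
def recognize(probe, gallery):
    """1-NN recognizer: return the identity whose template is closest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(gallery, key=lambda ident: dist(probe, gallery[ident]))

def anonymize(feat):
    """Stand-in anonymization: a fixed, deterministic feature transform."""
    return [1.0 - x for x in feat]

identities = {"id_a": [0.4, 0.6], "id_b": [0.6, 0.4]}
probe = anonymize(identities["id_a"])  # anonymized probe of person id_a

# Weak adversary: enrolls clean templates, unaware of the anonymization.
weak_hit = recognize(probe, identities) == "id_a"
# Strong adversary: enrolls templates put through the same anonymization.
strong_hit = recognize(probe, {k: anonymize(v)
                               for k, v in identities.items()}) == "id_a"
```

Evaluating only against the weak adversary would report this "anonymization" as effective, while the anonymization-aware adversary breaks it completely.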
Updated: 2024-07-09 13:22:43
Categories: cs.CR
How Can Recommender Systems Benefit from Large Language Models: A Survey
With the rapid development of online services, recommender systems (RS) have become increasingly indispensable for mitigating information overload. Despite remarkable progress, conventional recommendation models (CRM) still have some limitations, e.g., lacking open-world knowledge, and difficulties in comprehending users' underlying preferences and motivations. Meanwhile, large language models (LLM) have shown impressive general intelligence and human-like capabilities, which mainly stem from their extensive open-world knowledge, reasoning ability, as well as their comprehension of human culture and society. Consequently, the emergence of LLM is inspiring the design of recommender systems and pointing out a promising research direction, i.e., whether we can incorporate LLM and benefit from their knowledge and capabilities to compensate for the limitations of CRM. In this paper, we conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems. Specifically, we summarize existing works from two orthogonal aspects: where and how to adapt LLM to RS. For the WHERE question, we discuss the roles that LLM could play in different stages of the recommendation pipeline, i.e., feature engineering, feature encoder, scoring/ranking function, user interaction, and pipeline controller. For the HOW question, we investigate the training and inference strategies, resulting in two fine-grained taxonomy criteria, i.e., whether to tune LLM or not, and whether to involve conventional recommendation models for inference. Then, we highlight key challenges in adapting LLM to RS from three aspects, i.e., efficiency, effectiveness, and ethics. Finally, we summarize the survey and discuss the future prospects. We actively maintain a GitHub repository for papers and other related resources: https://github.com/CHIANGEL/Awesome-LLM-for-RecSys/.
Updated: 2024-07-09 13:17:52
Categories: cs.IR,cs.AI
Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation
We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs). Over-optimization occurs when a reward model serves as an imperfect proxy for human preference, and RL-driven policy optimization erroneously exploits reward inaccuracies. In this paper, we begin by introducing a lightweight way to quantify uncertainties in rewards, relying solely on the last layer embeddings of the reward model, without the need for computationally expensive reward ensembles. AdvPO then addresses a distributionally robust optimization problem centred around the confidence interval of the reward model's predictions for policy improvement. Through comprehensive experiments on the Anthropic HH and TL;DR summarization datasets, we illustrate the efficacy of AdvPO in mitigating the overoptimization issue, consequently resulting in enhanced performance as evaluated through human-assisted evaluation.
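The "lightweight uncertainty from last-layer embeddings" idea can be illustrated with a Bayesian linear-regression predictive variance computed over reward-model embeddings. This is a hypothetical sketch of that general technique, not the paper's exact estimator; the function name and the regularizer `lam` are assumptions:

```python
import numpy as np

def reward_uncertainty(Phi, phi_new, lam=1.0):
    """Predictive variance of a reward estimate from last-layer embeddings.

    Phi:     (n, d) matrix of last-layer embeddings seen during reward training.
    phi_new: (d,) embedding of a new response.
    Uses the Bayesian linear-regression variance phi^T (Phi^T Phi + lam*I)^-1 phi,
    so no reward-model ensemble is needed.
    """
    d = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(d)
    return float(phi_new @ np.linalg.solve(A, phi_new))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 8))
in_dist = Phi[0]                      # embedding similar to training data
out_dist = 10.0 * rng.normal(size=8)  # far from the training distribution

# Responses far from the reward model's training data get wider intervals,
# which a robust policy objective can then treat pessimistically.
assert reward_uncertainty(Phi, out_dist) > reward_uncertainty(Phi, in_dist)
```

A distributionally robust policy update would then optimize against the lower end of the resulting confidence interval rather than the raw reward.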
Updated: 2024-07-09 13:17:36
Categories: cs.LG,cs.AI
Event Trojan: Asynchronous Event-based Backdoor Attacks
As asynchronous event data is more frequently engaged in various vision tasks, the risk of backdoor attacks becomes more evident. However, research into the potential risks associated with backdoor attacks on asynchronous event data has been scarce, leaving related tasks vulnerable to potential threats. This paper uncovers the possibility of directly poisoning event data streams by proposing the Event Trojan framework, which includes two kinds of triggers, i.e., immutable and mutable triggers. Specifically, both types of event triggers are based on a sequence of simulated event spikes, which can easily be incorporated into any event stream to initiate backdoor attacks. Additionally, for the mutable trigger, we design an adaptive learning mechanism to maximize its aggressiveness. To improve stealthiness, we introduce a novel loss function that constrains the generated content of mutable triggers, minimizing the difference between triggers and original events while maintaining effectiveness. Extensive experiments on public event datasets show the effectiveness of the proposed backdoor triggers. We hope that this paper can draw greater attention to the potential threats posed by backdoor attacks on event-based tasks. Our code is available at https://github.com/rfww/EventTrojan.
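The trigger-injection step, splicing simulated event spikes into a stream, might look like the following toy sketch. The `(x, y, timestamp, polarity)` row layout is a common event-camera convention, and `inject_trigger` is a hypothetical helper for illustration, not the authors' code:

```python
import numpy as np

def inject_trigger(events, trigger, t_start):
    """Insert a fixed pattern of simulated event spikes into an event stream.

    Events are rows of (x, y, timestamp_us, polarity); the merged stream is
    kept sorted by timestamp, as event-camera pipelines expect.
    """
    shifted = trigger.copy()
    shifted[:, 2] += t_start                      # place the trigger in time
    merged = np.vstack([events, shifted])
    return merged[np.argsort(merged[:, 2], kind="stable")]

events = np.array([[10, 12, 100, 1], [11, 12, 300, 0]], dtype=np.int64)
trigger = np.array([[5, 5, 0, 1], [5, 6, 10, 1]], dtype=np.int64)

poisoned = inject_trigger(events, trigger, t_start=150)
assert poisoned.shape == (4, 4)                  # original + trigger events
assert np.all(np.diff(poisoned[:, 2]) >= 0)      # still time-ordered
```

An immutable trigger corresponds to a fixed `trigger` array; a mutable trigger would instead be generated by a learned model before injection.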
Updated: 2024-07-09 13:15:39
Categories: cs.CR,cs.CV
Problem-Solving in Language Model Networks
To improve the reasoning and question-answering capabilities of Large Language Models (LLMs), several multi-agent approaches have been introduced. While these methods enhance performance, the application of collective intelligence-based approaches to complex network structures and the dynamics of agent interactions remain underexplored. This work extends the concept of multi-agent debate to more general network topologies, measuring the question-answering accuracy, influence, consensus, and the effects of bias on the collective. The results show that random networks perform similarly to fully connected networks despite using significantly fewer tokens. Furthermore, a strong consensus among agents correlates with correct answers, whereas divided responses typically indicate incorrect answers. Analysing the influence of the agents reveals a balance between self-reflection and interconnectedness; self-reflection aids when local interactions are incorrect, and local interactions aid when the agent itself is incorrect. Additionally, bias plays a strong role in system performance with correctly biased hub nodes boosting performance. These insights suggest that using random networks or scale-free networks with knowledgeable agents placed in central positions can enhance the overall question-answering performance of multi-agent systems.
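The consensus signal described above, where agreement among agents predicts answer reliability, can be measured with a simple modal-agreement score. This is an illustrative sketch, not the paper's exact metric:

```python
from collections import Counter

def consensus(answers):
    """Fraction of agents agreeing on the modal answer (1.0 = unanimous)."""
    counts = Counter(answers)
    return max(counts.values()) / len(answers)

round1 = ["B", "B", "B", "B", "A"]   # strong consensus -> likely correct
round2 = ["A", "B", "C", "B", "A"]   # divided responses -> treat as unreliable

assert consensus(round1) == 0.8
assert consensus(round2) == 0.4
```

A multi-agent system could use such a score to flag low-consensus answers for another debate round or for abstention.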
Updated: 2024-07-09 13:05:58
Categories: cs.AI,cs.SI
VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction
Businesses need to query visually rich documents (VRDs) like receipts, medical records, and insurance forms to make decisions. Existing techniques for extracting entities from VRDs struggle with new layouts or require extensive pre-training data. We introduce VRDSynth, a program synthesis method to automatically extract entity relations from multilingual VRDs without pre-training data. To capture the complexity of the VRD domain, we design a domain-specific language (DSL) of spatial and textual relations in which the synthesized programs are expressed. Along with this, we derive a new synthesis algorithm that utilizes frequent spatial relations, search space pruning, and a combination of positive, negative, and exclusive programs to improve coverage. We evaluate VRDSynth on the FUNSD and XFUND benchmarks for semantic entity linking, consisting of 1,592 forms in 8 languages. VRDSynth outperforms state-of-the-art pre-trained models (LayoutXLM, InfoXLMBase, and XLMRobertaBase) in 5, 6, and 7 out of 8 languages, respectively, improving the F1 score by 42% over LayoutXLM in English. To test the extensibility of the model, we further improve VRDSynth with automated table recognition, creating VRDSynth(Table), and compare it with extended versions of the pre-trained models, InfoXLM(Large) and XLMRoberta(Large). VRDSynth(Table) outperforms these baselines in 4 out of 8 languages and in average F1 score. VRDSynth also significantly reduces memory footprint (1M and 380MB vs. 1.48GB and 3GB for LayoutXLM) while maintaining similar time efficiency.
Updated: 2024-07-09 12:59:58
Categories: cs.AI
Cue Point Estimation using Object Detection
Cue points indicate possible temporal boundaries in a transition between two pieces of music in DJ mixing and constitute a crucial element in autonomous DJ systems as well as for live mixing. In this work, we present a novel method for automatic cue point estimation, interpreted as a computer vision object detection task. Our proposed system is based on a pre-trained object detection transformer which we fine-tune on our novel cue point dataset. Our provided dataset contains 21k manually annotated cue points from human experts as well as metronome information for nearly 5k individual tracks, making this dataset 35x larger than the previously available cue point dataset. Unlike previous methods, our approach does not require low-level musical information analysis, while demonstrating increased precision in retrieving cue point positions. Moreover, our proposed method demonstrates high adherence to phrasing, a type of high-level music structure commonly emphasized in electronic dance music. The code, model checkpoints, and dataset are made publicly available.
Updated: 2024-07-09 12:56:30
Categories: cs.AI
ChatTracer: Large Language Model Powered Real-time Bluetooth Device Tracking System
Large language models (LLMs) have transformed the way we interact with cyber technologies. In this paper, we study the possibility of connecting LLM with wireless sensor networks (WSN). A successful design will not only extend LLM's knowledge landscape to the physical world but also revolutionize human interaction with WSN. To this end, we present ChatTracer, an LLM-powered real-time Bluetooth device tracking system. ChatTracer comprises three key components: an array of Bluetooth sniffing nodes, a database, and a fine-tuned LLM. ChatTracer was designed based on our experimental observation that commercial Apple/Android devices always broadcast hundreds of BLE packets per minute even in their idle status. Its novelties lie in two aspects: i) a reliable and efficient BLE packet grouping algorithm; and ii) an LLM fine-tuning strategy that combines both supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). We have built a prototype of ChatTracer with four sniffing nodes. Experimental results show that ChatTracer not only outperforms existing localization approaches, but also provides an intelligent interface for user interaction.
Updated: 2024-07-09 12:56:01
Categories: cs.NI,cs.AI
Historical Review of Variants of Informal Semantics for Logic Programs under Answer Set Semantics: GL'88, GL'91, GK'14, D-V'12
This note presents a historical survey of informal semantics that are associated with logic programming under answer set semantics. We review these in uniform terms and align them with two paradigms: Answer Set Programming and ASP-Prolog -- two prominent Knowledge Representation and Reasoning Paradigms in Artificial Intelligence. Under consideration in Theory and Practice of Logic Programming (TPLP).
Updated: 2024-07-09 12:40:58
Categories: cs.AI
Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks
Machine learning models, in particular deep neural networks, are currently an integral part of various applications, from healthcare to finance. However, using sensitive data to train these models raises concerns about privacy and security. One method that has emerged to verify whether a trained model is privacy-preserving is the Membership Inference Attack (MIA), which allows adversaries to determine whether a specific data point was part of a model's training dataset. While a series of MIAs have been proposed in the literature, only a few can achieve high True Positive Rates (TPR) in the low False Positive Rate (FPR) region (0.01%~1%). This is a crucial factor for an MIA to be practically useful in real-world settings. In this paper, we present a novel approach to MIA that is aimed at significantly improving TPR at low FPRs. Our method, named Learning-based Difficulty Calibration for MIA (LDC-MIA), characterizes data records by their hardness levels using a neural network classifier to determine membership. The experiment results show that LDC-MIA can improve TPR at low FPR by up to 4x compared to other difficulty-calibration-based MIAs. It also has the highest Area Under the ROC Curve (AUC) across all datasets. Our method's cost is comparable with most of the existing MIAs, but it is orders of magnitude more efficient than one of the state-of-the-art methods, LiRA, while achieving similar performance.
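For context, the loss-threshold baseline that difficulty-calibration MIAs refine can be sketched as follows. The loss distributions here are synthetic and chosen only to illustrate the member/non-member gap; this is not LDC-MIA itself:

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Baseline MIA: predict 'member' when the model's loss on a record is
    below a threshold (training records are typically fit better)."""
    return losses < threshold

rng = np.random.default_rng(1)
member_losses = rng.exponential(0.2, size=1000)      # low loss on training data
nonmember_losses = rng.exponential(1.0, size=1000)   # higher loss on unseen data

tpr = loss_threshold_mia(member_losses, 0.3).mean()     # true positive rate
fpr = loss_threshold_mia(nonmember_losses, 0.3).mean()  # false positive rate
assert tpr > fpr
```

Difficulty calibration replaces the single global threshold with a per-record decision that accounts for how intrinsically hard each record is to fit, which is what improves TPR in the low-FPR regime.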
Updated: 2024-07-09 12:37:58
Categories: cs.CR,cs.AI,cs.LG
Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Diplomacy is one of the most sophisticated activities in human society. The complex interactions among multiple parties/agents involve various abilities such as social reasoning, negotiation arts, and long-term strategy planning. Previous AI agents have proved capable of handling multi-step games and large action spaces in tasks involving multiple agents. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage it requires. Recently, LLM agents have shown their potential to extend the boundary of previous agents in a couple of applications, yet this is still not enough to handle a very long planning period in a complex multi-agent environment. Empowered by cutting-edge LLM technology, we make a first attempt to explore AI's upper bound towards a human-like agent for such a highly comprehensive multi-agent mission, by combining three core and essential capabilities for stronger LLM-based societal agents: 1) a strategic planner with memory and reflection; 2) goal-oriented negotiation with social reasoning; 3) memory augmentation through self-play games, enabling self-evolution without any human in the loop.
Updated: 2024-07-09 12:37:54
Categories: cs.AI,cs.MA,cs.SI
Mining Potentially Explanatory Patterns via Partial Solutions
Genetic Algorithms have established their capability for solving many complex optimization problems. Even as good solutions are produced, the user's understanding of a problem is not necessarily improved, which can lead to a lack of confidence in the results. To mitigate this issue, explainability aims to give insight to the user by presenting them with the knowledge obtained by the algorithm. In this paper we introduce Partial Solutions in order to improve the explainability of solutions to combinatorial optimization problems. Partial Solutions represent beneficial traits found by analyzing a population, and are presented to the user for explainability, but also provide an explicit model from which new solutions can be generated. We present an algorithm that assembles a collection of Partial Solutions chosen to strike a balance between high fitness, simplicity and atomicity. Experiments with standard benchmarks show that the proposed algorithm is able to find Partial Solutions which improve explainability at reasonable computational cost without affecting search performance.
Updated: 2024-07-09 12:36:12
Categories: cs.NE,cs.LG,I.2.8
Factored Conditional Filtering: Tracking States and Estimating Parameters in High-Dimensional Spaces
This paper introduces factored conditional filters, new filtering algorithms for simultaneously tracking states and estimating parameters in high-dimensional state spaces. The conditional nature of the algorithms is used to estimate parameters and the factored nature is used to decompose the state space into low-dimensional subspaces in such a way that filtering on these subspaces gives distributions whose product is a good approximation to the distribution on the entire state space. The conditions for successful application of the algorithms are that observations be available at the subspace level and that the transition model can be factored into local transition models that are approximately confined to the subspaces; these conditions are widely satisfied in computer science, engineering, and geophysical filtering applications. We give experimental results on tracking epidemics and estimating parameters in large contact networks that show the effectiveness of our approach.
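The factored idea, filtering each low-dimensional subspace separately and approximating the joint posterior by a product, can be illustrated with two binary subspaces and a discrete Bayes filter. This is a toy sketch under assumed local transition and observation models, not the paper's algorithm:

```python
import numpy as np

# Two-component state (s1, s2), each binary, with local transitions and
# per-subspace observations -- the setting where factored filtering applies.
T = np.array([[0.9, 0.1], [0.2, 0.8]])   # local transition model (shared)
O = np.array([[0.8, 0.2], [0.3, 0.7]])   # P(obs | local state)

def filter_step(belief, obs):
    """One predict-update step of a discrete Bayes filter on one subspace."""
    predicted = belief @ T
    updated = predicted * O[:, obs]
    return updated / updated.sum()

b1 = np.array([0.5, 0.5])
b2 = np.array([0.5, 0.5])
for o1, o2 in [(0, 1), (0, 1), (0, 0)]:
    b1 = filter_step(b1, o1)   # each subspace is filtered independently;
    b2 = filter_step(b2, o2)   # observations are available per subspace

joint = np.outer(b1, b2)       # product approximation of the full posterior
assert abs(joint.sum() - 1.0) < 1e-9
assert b1[0] > 0.5             # repeated obs=0 pulls subspace 1 toward state 0
```

The payoff is dimensionality: two filters over 2 states each, instead of one filter over the 4-state joint space, a gap that becomes decisive in high dimensions.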
Updated: 2024-07-09 12:34:28
Categories: cs.AI,cs.LG,68T37,I.2.6
A Hybrid Training-time and Run-time Defense Against Adversarial Attacks in Modulation Classification
Motivated by the superior performance of deep learning in many applications including computer vision and natural language processing, several recent studies have focused on applying deep neural networks to devise future generations of wireless networks. However, several recent works have pointed out that imperceptible and carefully designed adversarial examples (attacks) can significantly deteriorate classification accuracy. In this paper, we investigate a defense mechanism based on both training-time and run-time defense techniques for protecting machine learning-based radio signal (modulation) classification against adversarial attacks. The training-time defense consists of adversarial training and label smoothing, while the run-time defense employs a support vector machine-based neural rejection (NR). Considering a white-box scenario and real datasets, we demonstrate that our proposed techniques outperform existing state-of-the-art technologies.
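The label-smoothing component of the training-time defense can be sketched as follows (a standard formulation; `eps` is an assumed smoothing coefficient, not a value from the paper):

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Label smoothing: replace one-hot targets with (1 - eps) on the true
    class and eps/K spread over all K classes, softening overconfident fits
    that adversarial examples exploit."""
    one_hot = np.eye(num_classes)[y]
    return (1.0 - eps) * one_hot + eps / num_classes

y = np.array([2, 0])                      # true modulation-class indices
t = smooth_labels(y, num_classes=4, eps=0.1)
assert np.allclose(t.sum(axis=1), 1.0)    # still valid distributions
assert np.isclose(t[0, 2], 0.9 + 0.1 / 4) # true class: 0.925
assert np.isclose(t[0, 0], 0.1 / 4)       # other classes: 0.025
```

Training against these softened targets reduces the confidence gap an attacker can exploit, complementing the run-time rejection stage.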
Updated: 2024-07-09 12:28:38
Categories: cs.AI
HuLP: Human-in-the-Loop for Prognosis
This paper introduces HuLP, a Human-in-the-Loop for Prognosis model designed to enhance the reliability and interpretability of prognostic models in clinical contexts, especially when faced with the complexities of missing covariates and outcomes. HuLP offers an innovative approach that enables human expert intervention, empowering clinicians to interact with and correct models' predictions, thus fostering collaboration between humans and AI models to produce more accurate prognosis. Additionally, HuLP addresses the challenges of missing data by utilizing neural networks and providing a tailored methodology that effectively handles missing data. Traditional methods often struggle to capture the nuanced variations within patient populations, leading to compromised prognostic predictions. HuLP imputes missing covariates based on imaging features, aligning more closely with clinician workflows and enhancing reliability. We conduct our experiments on two real-world, publicly available medical datasets to demonstrate the superiority and competitiveness of HuLP.
Updated: 2024-07-09 12:24:50
Categories: cs.CV,cs.AI,cs.HC
Neuromimetic metaplasticity for adaptive continual learning
Conventional intelligent systems based on deep neural network (DNN) models encounter challenges in achieving human-like continual learning due to catastrophic forgetting. Here, we propose a metaplasticity model inspired by human working memory, enabling DNNs to perform catastrophic forgetting-free continual learning without any pre- or post-processing. A key aspect of our approach involves implementing distinct types of synapses from stable to flexible, and randomly intermixing them to train synaptic connections with different degrees of flexibility. This strategy allowed the network to successfully learn a continuous stream of information, even under unexpected changes in input length. The model achieved a balanced tradeoff between memory capacity and performance without requiring additional training or structural modifications, dynamically allocating memory resources to retain both old and new information. Furthermore, the model demonstrated robustness against data poisoning attacks by selectively filtering out erroneous memories, leveraging the Hebb repetition effect to reinforce the retention of significant data.
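The "randomly intermixed stable and flexible synapses" strategy can be sketched as per-weight learning rates. This is an illustrative simplification, not the paper's model; the fractions and rate values are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_plasticity(n_weights, frac_flexible=0.5, lr_stable=0.001, lr_flexible=0.1):
    """Randomly intermix stable and flexible synapses by assigning each
    weight its own degree of plasticity (here, its own learning rate)."""
    flexible = rng.random(n_weights) < frac_flexible
    return np.where(flexible, lr_flexible, lr_stable)

lrs = init_plasticity(10_000)
w = np.zeros(10_000)
grad = np.ones(10_000)
w -= lrs * grad                       # one SGD step with per-synapse rates

# Stable synapses barely move (retaining old knowledge), while flexible
# ones adapt quickly to new information.
assert set(np.round(np.unique(lrs), 4)) == {0.001, 0.1}
assert abs(np.mean(lrs == 0.1) - 0.5) < 0.05
```

The stable subpopulation protects consolidated memories while the flexible subpopulation absorbs a continuous stream of new data, trading off capacity against adaptability without any replay buffer.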
Updated: 2024-07-09 12:21:35
Categories: cs.NE,cs.AI,cs.CV,cs.LG
Learn and Don't Forget: Adding a New Language to ASR Foundation Models
Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code tuning, train only the language code; soft prompt tuning, train prepended tokens; and LoRA where a small set of additional parameters are optimised. Elastic Weight Consolidation (EWC) offers an alternative compromise with the potential to maintain performance in specific target languages. Results show that direct fine-tuning yields the best performance for the new language but degrades existing language capabilities. EWC can address this issue for specific languages. If only adaptation parameters are used, the language capabilities are maintained but at the cost of performance in the new language.
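The EWC compromise mentioned above adds a quadratic penalty that anchors parameters important to the original languages. A minimal sketch of the regularizer (the Fisher values and `lam` here are toy assumptions):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Elastic Weight Consolidation regularizer: penalize movement away from
    the old-task optimum theta_star, weighted by per-parameter Fisher
    importance. Added to the new-language fine-tuning loss."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -2.0, 0.5])
fisher = np.array([10.0, 0.1, 1.0])    # parameter 0 matters most for old languages

# Moving an important parameter costs far more than moving an unimportant one,
# so fine-tuning routes its updates through weights the original languages don't need.
move_important = ewc_penalty(theta_star + np.array([0.5, 0.0, 0.0]), theta_star, fisher)
move_unimportant = ewc_penalty(theta_star + np.array([0.0, 0.5, 0.0]), theta_star, fisher)
assert move_important > move_unimportant
assert np.isclose(move_important, 0.5 * 10.0 * 0.25)   # = 1.25
```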
Updated: 2024-07-09 12:14:48
Categories: eess.AS,cs.CL,cs.LG,cs.SD
It Cannot Be Right If It Was Written by AI: On Lawyers' Preferences of Documents Perceived as Authored by an LLM vs a Human
Large Language Models (LLMs) enable a future in which certain types of legal documents may be generated automatically. This has a great potential to streamline legal processes, lower the cost of legal services, and dramatically increase access to justice. While many researchers focus their efforts on proposing and evaluating LLM-based applications supporting tasks in the legal domain, there is a notable lack of investigations into how legal professionals perceive content if they believe it has been generated by an LLM. Yet, this is a critical point, as over-reliance or unfounded skepticism may influence whether such documents bring about appropriate legal consequences. This study provides a necessary analysis in the context of the ongoing transition towards mature generative AI systems. Specifically, we examined whether lawyers' (n=75) perception of legal documents varies based on the documents' assumed origin (human-crafted vs AI-generated). The participants evaluated the documents focusing on their correctness and language quality. Our analysis revealed a clear preference for documents perceived as crafted by a human over those believed to be generated by AI. At the same time, most of the participants are expecting a future in which documents will be generated automatically. These findings could be leveraged by legal practitioners, policy makers and legislators to implement and adopt legal document generation technology responsibly, and to fuel the necessary discussions into how legal processes should be updated to reflect recent technological developments.
Updated: 2024-07-09 12:11:25
Categories: cs.HC,cs.AI,cs.CY
Synthesizing Realistic Data for Table Recognition
To overcome the limitations and challenges of current automatic table data annotation methods and random table data synthesis approaches, we propose a novel method for synthesizing annotation data specifically designed for table recognition. This method utilizes the structure and content of existing complex tables, facilitating the efficient creation of tables that closely replicate the authentic styles found in the target domain. By leveraging the actual structure and content of tables from Chinese financial announcements, we have developed the first extensive table annotation dataset in this domain. We used this dataset to train several recent deep learning-based end-to-end table recognition models. Additionally, we have established the inaugural benchmark for real-world complex tables in the Chinese financial announcement domain, using it to assess the performance of models trained on our synthetic data, thereby effectively validating our method's practicality and effectiveness. Furthermore, we applied our synthesis method to augment the FinTabNet dataset, extracted from English financial announcements, by increasing the proportion of tables with multiple spanning cells to introduce greater complexity. Our experiments show that models trained on this augmented dataset achieve comprehensive improvements in performance, especially in the recognition of tables with multiple spanning cells.
Updated: 2024-07-09 12:09:32
Categories: cs.CV,cs.LG
ED-VAE: Entropy Decomposition of ELBO in Variational Autoencoders
Traditional Variational Autoencoders (VAEs) are constrained by the limitations of the Evidence Lower Bound (ELBO) formulation, particularly when utilizing simplistic, non-analytic, or unknown prior distributions. These limitations inhibit the VAE's ability to generate high-quality samples and provide clear, interpretable latent representations. This work introduces the Entropy Decomposed Variational Autoencoder (ED-VAE), a novel re-formulation of the ELBO that explicitly includes entropy and cross-entropy components. This reformulation significantly enhances model flexibility, allowing for the integration of complex and non-standard priors. By providing more detailed control over the encoding and regularization of latent spaces, ED-VAE not only improves interpretability but also effectively captures the complex interactions between latent variables and observed data, thus leading to better generative performance.
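The reformulation rests on the identity that the ELBO's KL regularizer splits into an entropy and a cross-entropy term, $\mathrm{KL}(q\,\|\,p) = \mathrm{CE}(q, p) - H(q)$. A minimal numerical sketch of that identity for univariate Gaussians (the function names and the closed-form cross-check are ours, not the paper's):

```python
import math

def gaussian_entropy(sigma):
    # Differential entropy H(q) of N(mu, sigma^2).
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def gaussian_cross_entropy(mu_q, sigma_q, mu_p, sigma_p):
    # Cross-entropy CE(q, p) = -E_q[log p(z)] for two Gaussians.
    return 0.5 * math.log(2 * math.pi * sigma_p ** 2) + \
        (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)

def kl_from_decomposition(mu_q, sigma_q, mu_p, sigma_p):
    # The entropy/cross-entropy split: KL(q || p) = CE(q, p) - H(q).
    return gaussian_cross_entropy(mu_q, sigma_q, mu_p, sigma_p) \
        - gaussian_entropy(sigma_q)

def kl_closed_form(mu_q, sigma_q, mu_p, sigma_p):
    # Textbook Gaussian KL, used only to verify the decomposition.
    return math.log(sigma_p / sigma_q) + \
        (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5
```

Splitting the KL this way is what lets a non-standard prior enter the objective only through the cross-entropy term, which can be estimated even when no analytic KL against $q$ is available.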
Updated: 2024-07-09 12:09:21
标题: ED-VAE:变分自编码器中ELBO的熵分解
摘要: 传统的变分自动编码器(VAEs)受到证据下界(ELBO)公式的限制,特别是在使用简化、非解析或未知先验分布时。这些限制阻碍了VAE生成高质量样本和提供清晰可解释的潜在表示的能力。本文介绍了熵分解变分自动编码器(ED-VAE),这是ELBO的一种新的重新表述,明确包括熵和交叉熵组件。这种重新表述显著增强了模型的灵活性,允许集成复杂和非标准的先验分布。通过提供更详细的控制编码和正则化潜在空间,ED-VAE不仅提高了可解释性,而且有效地捕捉了潜在变量与观察数据之间的复杂相互作用,从而导致更好的生成性能。
更新时间: 2024-07-09 12:09:21
领域: cs.LG,cs.AI,stat.ML
Countermeasures Against Adversarial Examples in Radio Signal Classification
Deep learning algorithms have been shown to be powerful in many communication network design problems, including automatic modulation classification. However, they are vulnerable to carefully crafted attacks called adversarial examples. Hence, the reliance of wireless networks on deep learning algorithms poses a serious threat to the security and operation of wireless networks. In this letter, we propose for the first time a countermeasure against adversarial examples in modulation classification. Our countermeasure is based on a neural rejection technique, augmented by label smoothing and Gaussian noise injection, that detects and rejects adversarial examples with high accuracy. Our results demonstrate that the proposed countermeasure can protect deep-learning based modulation classification systems against adversarial examples.
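Both augmentations are standard and easy to sketch: label smoothing redistributes a fraction $\varepsilon$ of the target probability mass uniformly over the classes, Gaussian noise injection perturbs the training samples, and the rejection stage refuses low-confidence inputs. A minimal sketch (the threshold and noise level are illustrative values, not the letter's):

```python
import random

def smooth_labels(one_hot, eps=0.1):
    # Label smoothing: move a fraction eps of the probability mass
    # from the true class to a uniform distribution over all classes.
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]

def add_gaussian_noise(signal, sigma=0.01, rng=None):
    # Gaussian noise injection on the raw training samples.
    rng = rng or random.Random(0)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def reject(class_probs, tau=0.5):
    # Neural rejection in its simplest form: refuse to classify
    # whenever the maximum class probability falls below a threshold.
    return max(class_probs) < tau
```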
Updated: 2024-07-09 12:08:50
标题: 无线信号分类中对抗性样本的对策
摘要: 深度学习算法在许多通信网络设计问题中表现出强大的能力,包括自动调制分类中的应用。然而,它们容易受到精心设计的攻击,称为对抗样本。因此,无线网络对深度学习算法的依赖对无线网络的安全和运行构成严重威胁。在这封信中,我们首次提出了一种对抗样本在调制分类中的对抗措施。我们的对抗措施基于神经拒绝技术,辅以标签平滑和高斯噪声注入,能够高精度地检测和拒绝对抗样本。我们的结果表明,所提出的对抗措施可以保护基于深度学习的调制分类系统免受对抗样本的影响。
更新时间: 2024-07-09 12:08:50
领域: cs.AI
Explore BiLSTM-CRF-Based Models for Open Relation Extraction
Extracting multiple relations from text sentences is still a challenge for current Open Relation Extraction (Open RE) tasks. In this paper, we develop several Open RE models based on the bidirectional LSTM-CRF (BiLSTM-CRF) neural network and different contextualized word embedding methods. We also propose a new tagging scheme to solve overlapping problems and enhance models' performance. From the evaluation results and comparisons between models, we select the best combination of tagging scheme, word embedder, and BiLSTM-CRF network to achieve an Open RE model with a remarkable extracting ability on multiple-relation sentences.
Updated: 2024-07-09 12:06:39
标题: 探索基于BiLSTM-CRF模型的开放式关系抽取模型
摘要: 目前,从文本句子中提取多个关系仍然是当前开放关系抽取(Open RE)任务面临的挑战。本文基于双向LSTM-CRF(BiLSTM-CRF)神经网络和不同的上下文化词嵌入方法开发了几种开放关系抽取模型。我们还提出了一种新的标记方案来解决重叠问题并增强模型的性能。通过评估结果和模型之间的比较,我们选择了标记方案、词嵌入器和BiLSTM-CRF网络的最佳组合,以实现具有显著提取能力的开放关系抽取模型,适用于多关系句子。
更新时间: 2024-07-09 12:06:39
领域: cs.CL,cs.AI,cs.IR,cs.LG
Robust Reinforcement Learning from Corrupted Human Feedback
Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, such as personal bias, context ambiguity, or lack of training, human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach -- $R^3M$, which models the potentially corrupted preference label as sparse outliers. Accordingly, we formulate the robust reward learning as an $\ell_1$-regularized maximum likelihood estimation problem. Computationally, we develop an efficient alternating optimization algorithm, which only incurs negligible computational overhead compared with the standard RLHF approach. Theoretically, we prove that under proper regularity conditions, $R^3M$ can consistently learn the underlying reward and identify outliers, provided that the number of outlier labels scales sublinearly with the preference sample size. Furthermore, we remark that $R^3M$ is versatile and can be extended to various preference optimization methods, including direct preference optimization (DPO). Our experiments on robotic control and natural language generation with large language models (LLMs) show that $R^3M$ improves robustness of the reward against several types of perturbations to the preference data.
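The $\ell_1$-regularized estimation admits an alternating scheme in which the outlier step is a closed-form soft-threshold and the reward step is an ordinary refit. A toy sketch on a scalar "reward" (a stand-in illustrating the alternating structure, not the paper's actual RLHF objective):

```python
def soft_threshold(v, lam):
    # Proximal operator of lam * |.|: shrinks small residuals to exactly
    # zero, so only genuinely corrupted labels get a nonzero outlier variable.
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def robust_fit(values, lam, n_iter=50):
    # Alternating minimization of sum_i (v_i - m - o_i)^2 + lam * |o|_1:
    # the o-step is the soft-threshold prox, the m-step a plain mean refit.
    m = sum(values) / len(values)
    outliers = [0.0] * len(values)
    for _ in range(n_iter):
        outliers = [soft_threshold(v - m, lam) for v in values]
        m = sum(v - o for v, o in zip(values, outliers)) / len(values)
    return m, outliers
```

On `[1.0, 1.1, 0.9, 10.0]` the fit converges to m = 4/3 while assigning the single corrupted value a large outlier variable, mirroring how sparse outlier modeling keeps a few bad labels from dragging the estimate.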
Updated: 2024-07-09 12:04:03
标题: 受损人类反馈的稳健强化学习
摘要: 从人类反馈中学习强化学习(RLHF)提供了一个有原则的框架,用于将人工智能系统与人类偏好数据对齐。由于各种原因,例如个人偏见、上下文模糊、缺乏训练等,人类注释者可能会给出不正确或不一致的偏好标签。为了解决这个挑战,我们提出了一种强大的RLHF方法——$R^3M$,将潜在受损偏好标签建模为稀疏的异常值。因此,我们将强化奖励学习形式化为$\ell_1$正则化的最大似然估计问题。在计算上,我们开发了一种高效的交替优化算法,与标准的RLHF方法相比,仅产生可忽略的计算开销。理论上,我们证明在适当的正则性条件下,只要异常值标签的数量与偏好样本大小的比例为次线性,$R^3M$就可以一致地学习底层奖励并识别异常值。此外,我们指出$R^3M$是多功能的,可以扩展到各种偏好优化方法,包括直接偏好优化(DPO)。我们在机器人控制和使用大语言模型(LLMs)进行自然语言生成的实验中表明,$R^3M$能够提高奖励对于偏好数据多种扰动的鲁棒性。
更新时间: 2024-07-09 12:04:03
领域: cs.LG
Representation Learning on Hyper-Relational and Numeric Knowledge Graphs with Transformers
Hyper-relational knowledge graphs, in which a triplet is associated with a set of qualifiers, have recently been studied; a qualifier is composed of a relation and an entity, providing auxiliary information for a triplet. While existing hyper-relational knowledge graph embedding methods assume that the entities are discrete objects, some information should be represented using numeric values, e.g., (J.R.R., was born in, 1892). Also, a triplet (J.R.R., educated at, Oxford Univ.) can be associated with a qualifier such as (start time, 1911). In this paper, we propose a unified framework named HyNT that learns representations of a hyper-relational knowledge graph containing numeric literals in either triplets or qualifiers. We define a context transformer and a prediction transformer to learn the representations based not only on the correlations between a triplet and its qualifiers but also on the numeric information. By learning compact representations of triplets and qualifiers and feeding them into the transformers, we reduce the computational cost of using transformers. Using HyNT, we can predict missing numeric values in addition to missing entities or relations in a hyper-relational knowledge graph. Experimental results show that HyNT significantly outperforms state-of-the-art methods on real-world datasets.
Updated: 2024-07-09 12:03:09
标题: 使用Transformer在超关系和数值知识图上进行表示学习
摘要: 最近研究了一个超关系知识图,其中三元组与一组限定词相关联;一个限定词由一个关系和一个实体组成,为三元组提供辅助信息。现有的超关系知识图嵌入方法假定实体是离散对象,但有些信息应该使用数值表示,例如(J.R.R.,出生于,1892年)。此外,一个三元组(J.R.R.,受教于,牛津大学)可以与一个限定词相关联,例如(开始时间,1911年)。在本文中,我们提出了一个统一的框架,名为HyNT,它学习超关系知识图中包含数字文字的三元组或限定词的表示。我们定义了一个上下文转换器和一个预测转换器,不仅基于三元组及其限定词之间的相关性来学习表示,还基于数字信息。通过学习三元组和限定词的紧凑表示并将它们输入到转换器中,我们减少了使用转换器的计算成本。使用HyNT,我们可以预测超关系知识图中缺失的数值,还可以预测缺失的实体或关系。实验结果表明,HyNT在真实数据集上明显优于现有方法。
更新时间: 2024-07-09 12:03:09
领域: cs.LG,cs.AI
Not All Layers of LLMs Are Necessary During Inference
Due to the large number of parameters, the inference phase of Large Language Models (LLMs) is resource-intensive. However, not all requests posed to LLMs are equally difficult to handle. Through analysis, we show that for some tasks, LLMs can achieve results comparable to the final output at some intermediate layers. That is, not all layers of LLMs are necessary during inference. If we can predict at which layer the inferred results match the final results (produced by evaluating all layers), we could significantly reduce the inference cost. To this end, we propose a simple yet effective algorithm named AdaInfer to adaptively terminate the inference process for an input instance. AdaInfer relies on easily obtainable statistical features and classic classifiers like SVM. Experiments on well-known LLMs like the Llama2 series and OPT show that AdaInfer can achieve an average pruning ratio of 17.8%, and up to 43% on sentiment tasks, with nearly no performance drop (<1%). Because AdaInfer does not alter LLM parameters, LLMs incorporated with AdaInfer maintain generalizability across tasks.
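The mechanism is an early-exit loop: at each layer, cheap statistics of the intermediate logits are fed to a classifier that decides whether to stop. A minimal sketch in which a max-probability threshold stands in for AdaInfer's SVM over statistical features:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confident(logits, tau=0.9):
    # Cheap stand-in for AdaInfer's classifier over statistical features:
    # stop once the maximum class probability clears a threshold.
    return max(softmax(logits)) >= tau

def early_exit(per_layer_logits, stop_fn=confident):
    # Walk the layers in order and terminate at the first "confident" one.
    for depth, logits in enumerate(per_layer_logits, start=1):
        if stop_fn(logits):
            return depth, logits
    return len(per_layer_logits), per_layer_logits[-1]
```

Stopping early skips every remaining layer for that input, which is where the pruning ratio comes from.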
Updated: 2024-07-09 11:59:01
标题: 并非LLM的所有层在推理过程中都是必需的
摘要: 由于参数数量庞大,大型语言模型(LLMs)的推理阶段需要大量资源。然而,并非所有针对LLMs的请求都同样难以处理。通过分析,我们表明对于某些任务,LLMs可以在一些中间层实现与最终输出可比较的结果。也就是说,在推理过程中,并非LLMs的所有层都是必要的。如果我们可以预测推断结果与最终结果(通过评估所有层产生)匹配的层,我们可以显著减少推理成本。为此,我们提出了一种简单而有效的算法,名为AdaInfer,用于自适应地终止输入实例的推理过程。AdaInfer依赖于易于获取的统计特征和经典分类器如SVM。对于像Llama2系列和OPT这样广为人知的LLMs的实验表明,AdaInfer可以实现平均17.8%的修剪比例,情感任务中高达43%,几乎没有性能下降(<1%)。由于AdaInfer不会改变LLMs参数,与AdaInfer结合的LLMs在各种任务中保持泛化能力。
更新时间: 2024-07-09 11:59:01
领域: cs.CL,cs.AI,cs.LG
Towards physics-informed neural networks for landslide prediction
For decades, solutions to regional-scale landslide prediction have mostly relied on data-driven models, by definition disconnected from the physics of the failure mechanism. The success and spread of such tools came from the ability to exploit proxy variables rather than explicit geotechnical ones, as the latter are prohibitive to acquire over broad landscapes. Our work implements a Physics-Informed Neural Network (PINN) approach, adding to a standard data-driven architecture an intermediate constraint to solve for the permanent deformation typical of Newmark slope stability methods. This translates into a neural network tasked with explicitly retrieving geotechnical parameters from common proxy variables and then minimizing a loss function with respect to the available coseismic landslide inventory. The results are very promising: our model not only produces excellent predictive performance in the form of standard susceptibility output, but, in the process, also generates maps of the expected geotechnical properties at a regional scale. This architecture is therefore framed to tackle coseismic landslide prediction, something that, if confirmed in other studies, could open up towards PINN-based near-real-time predictions.
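The Newmark method behind the intermediate constraint starts from the critical acceleration $a_c = (FS - 1)\,g \sin\alpha$, where $FS$ is the factor of safety and $\alpha$ the slope angle; ground accelerations exceeding $a_c$ accumulate permanent displacement. A sketch of that classic quantity (our illustration of the standard formula, not the paper's network):

```python
import math

def critical_acceleration(factor_of_safety, slope_deg, g=9.81):
    # Newmark's critical acceleration for an infinite slope:
    # a_c = (FS - 1) * g * sin(alpha). At FS = 1 the slope is on the
    # verge of failure and any shaking produces permanent deformation.
    return (factor_of_safety - 1.0) * g * math.sin(math.radians(slope_deg))
```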
Updated: 2024-07-09 11:54:49
标题: 迈向用于滑坡预测的物理信息神经网络
摘要: 几十年来,区域尺度的滑坡预测解决方案主要依赖于数据驱动模型,这些模型与失败机制的物理机制相互隔离。这些工具的成功和传播来自于利用代理变量而不是明确的岩土工程变量,因为后者在广阔的景观上获取是困难的。我们的工作实现了一种物理信息神经网络(PINN)方法,从而在标准数据驱动架构中添加一个中间约束,用于解决 Newmark 边坡稳定方法中典型的永久变形。这意味着一个神经网络被明确要求从常见的代理变量中检索岩土工程参数,然后最小化与现有同震滑坡清单相关的损失函数。结果非常有前景,因为我们的模型不仅以标准易感性输出的形式产生了出色的预测性能,而且在这个过程中还产生了区域尺度预期岩土工程特性的地图。因此,这种架构被设计用于解决同震滑坡预测问题,如果在其他研究中得到证实,可能会开启基于PINN的近实时预测。
更新时间: 2024-07-09 11:54:49
领域: physics.geo-ph,cs.AI,cs.LG
Convergence rates for Poisson learning to a Poisson equation with measure data
In this paper we prove discrete to continuum convergence rates for Poisson Learning, a graph-based semi-supervised learning algorithm that is based on solving the graph Poisson equation with a source term consisting of a linear combination of Dirac deltas located at labeled points and carrying label information. The corresponding continuum equation is a Poisson equation with measure data in a Euclidean domain $\Omega \subset \mathbb{R}^d$. The singular nature of these equations is challenging and requires an approach with several distinct parts: (1) We prove quantitative error estimates when convolving the measure data of a Poisson equation with (approximately) radial function supported on balls. (2) We use quantitative variational techniques to prove discrete to continuum convergence rates on random geometric graphs with bandwidth $\varepsilon>0$ for bounded source terms. (3) We show how to regularize the graph Poisson equation via mollification with the graph heat kernel, and we study fine asymptotics of the heat kernel on random geometric graphs. Combining these three pillars we obtain $L^1$ convergence rates that scale, up to logarithmic factors, like $O(\varepsilon^{\frac{1}{d+2}})$ for general data distributions, and $O(\varepsilon^{\frac{2-\sigma}{d+4}})$ for uniformly distributed data, where $\sigma>0$. These rates are valid with high probability if $\varepsilon\gg\left({\log n}/{n}\right)^q$ where $n$ denotes the number of vertices of the graph and $q \approx \frac{1}{3d}$.
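On the graph, the problem is the linear system $\mathcal{L} u = f$ with $\mathcal{L}$ the graph Laplacian and $f$ the signed combination of deltas at the labeled points; the constant nullspace is handled by keeping $u$ mean-zero. A minimal projected-gradient sketch on a 5-node path (step size and iteration count are illustrative choices):

```python
def laplacian_apply(adj, u):
    # (L u)_i = sum_j w_ij * (u_i - u_j) for an adjacency dict of dicts.
    return [sum(w * (u[i] - u[j]) for j, w in adj[i].items())
            for i in range(len(u))]

def poisson_learning(adj, source, n_iter=2000, step=0.1):
    # Gradient descent on 0.5 <u, L u> - <u, f>, projecting onto the
    # mean-zero subspace to quotient out the Laplacian's constant nullspace.
    n = len(source)
    u = [0.0] * n
    for _ in range(n_iter):
        lu = laplacian_apply(adj, u)
        u = [ui - step * (l - s) for ui, l, s in zip(u, lu, source)]
        mean = sum(u) / n
        u = [ui - mean for ui in u]
    return u
```

On a unit-weight path with sources $+1$ and $-1$ at the endpoints, the solution is linear in the vertex index, which makes the sketch easy to verify.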
Updated: 2024-07-09 11:54:34
标题: Poisson学习到具有测度数据的泊松方程的收敛速率
摘要: 在这篇论文中,我们证明了基于图的半监督学习算法Poisson Learning的离散到连续收敛速率,该算法基于解图泊松方程,其源项由标记点处的Dirac δ的线性组合组成,并携带标签信息。相应的连续方程是在欧几里得域$\Omega \subset \mathbb{R}^d$中带有测度数据的泊松方程。这些方程的奇异性具有挑战性,需要采用多个不同部分的方法:(1) 当将泊松方程的测度数据与(近似)支持在球上的径向函数卷积时,我们证明了定量误差估计。(2) 我们使用定量变分技术证明了在带宽$\varepsilon>0$的随机几何图上对有界源项的离散到连续收敛速率。(3) 我们展示了如何通过图热核的平滑化来正则化图泊松方程,并研究了随机几何图上热核的精细渐近行为。通过结合这三个支柱,我们得到了$L^1$收敛速率,对于一般数据分布,这些速率在高概率下的尺度,最多有对数因子,如$O(\varepsilon^{\frac{1}{d+2}})$,对于均匀分布的数据,速率为$O(\varepsilon^{\frac{2-\sigma}{d+4}})$,其中$\sigma>0$。如果$\varepsilon\gg\left({\log n}/{n}\right)^q$,其中$n$表示图的顶点数量,$q \approx \frac{1}{3d}$,那么这些速率在高概率下是有效的。
更新时间: 2024-07-09 11:54:34
领域: math.AP,cs.LG,cs.NA,math.NA,math.PR,35J05, 35A35, 05C80, 35J05
Fuzzy color model and clustering algorithm for color clustering problem
This paper focuses on the efficient clustering of arbitrary color data. To tackle this problem, we model the inherent uncertainty and vagueness of color data using a fuzzy color model. By taking a fuzzy approach to color modeling, we can make soft decisions for the vague regions between neighboring colors. The proposed fuzzy color model defines a three-dimensional fuzzy color ball and a color-membership computation method based on two inter-color distances. With the fuzzy color model, we develop a new fuzzy clustering algorithm for efficiently partitioning color data. Each fuzzy cluster set has a cluster prototype represented by a fuzzy color centroid.
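The soft decision can be sketched with a radially decaying membership around each fuzzy color centroid; the single spread parameter below is our simplification of the fuzzy color ball, and the paper's actual membership function (built on two inter-color distances) may differ:

```python
import math

def color_membership(color, centroid, spread):
    # Membership decays with squared Euclidean distance in RGB space.
    d2 = sum((a - b) ** 2 for a, b in zip(color, centroid))
    return math.exp(-d2 / (2.0 * spread ** 2))

def soft_assign(color, centroids, spread):
    # Normalized memberships: a soft decision over neighboring colors
    # instead of a hard nearest-centroid assignment.
    mus = [color_membership(color, c, spread) for c in centroids]
    total = sum(mus)
    return [m / total for m in mus]
```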
Updated: 2024-07-09 11:53:54
标题: 模糊颜色模型和用于颜色聚类问题的聚类算法
摘要: 本文的研究兴趣集中在针对任意颜色数据的高效聚类任务上。为了解决这个问题,我们尝试使用模糊颜色模型来建模颜色数据的固有不确定性和模糊性。通过采用模糊方法进行颜色建模,我们可以对相邻颜色之间的模糊区域做出软决策。提出的模糊颜色模型定义了一个三维模糊颜色球和颜色成员计算方法,其中包括两种颜色之间的距离。通过模糊颜色模型,我们开发了一种新的模糊聚类算法,用于对颜色数据进行高效的分区。每个模糊聚类集合都有一个由模糊颜色中心表示的聚类原型。
更新时间: 2024-07-09 11:53:54
领域: cs.AI
A BERT-based Empirical Study of Privacy Policies' Compliance with GDPR
Since its implementation in May 2018, the General Data Protection Regulation (GDPR) has prompted businesses to revisit and revise their data handling practices to ensure compliance. The privacy policy, which serves as the primary means of informing users about their privacy rights and the data practices of companies, has been significantly updated by numerous businesses post-GDPR implementation. However, many privacy policies remain packed with technical jargon, lengthy explanations, and vague descriptions of data practices and user rights. This makes it a challenging task for users and regulatory authorities to manually verify the GDPR compliance of these privacy policies. In this study, we aim to address the challenge of compliance analysis between GDPR (Article 13) and privacy policies for 5G networks. We manually collected privacy policies from almost 70 different 5G MNOs, and we utilized an automated BERT-based model for classification. We show that an encouraging 51$\%$ of companies demonstrate a strong adherence to GDPR. In addition, we present the first study that provides current empirical evidence on the readability of privacy policies for 5G networks. We adopted a readability analysis toolset that incorporates various established readability metrics. The findings empirically show that the readability of the majority of current privacy policies remains a significant challenge. Hence, 5G providers need to invest considerable effort into revising these documents to enhance both their utility and the overall user experience.
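Among the established readability metrics such toolsets incorporate, the Flesch Reading Ease score is the most common; it depends only on word, sentence, and syllable counts. The formula below is the standard one (its use here as an illustration is ours):

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    # Standard Flesch Reading Ease: higher scores mean easier text.
    # Long sentences and polysyllabic legal wording, typical of privacy
    # policies, both drive the score down.
    return 206.835 \
        - 1.015 * (n_words / n_sentences) \
        - 84.6 * (n_syllables / n_words)
```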
Updated: 2024-07-09 11:47:52
标题: 一个基于BERT的有关隐私政策遵从GDPR的实证研究
摘要: 自2018年5月实施以来,通用数据保护条例(GDPR)促使企业重新审视和修改其数据处理实践,以确保合规性。隐私政策作为向用户通知其隐私权利和公司数据实践的主要手段,已在GDPR实施后被许多企业显着更新。然而,许多隐私政策仍然充斥着技术术语、冗长的解释和对数据实践和用户权利的模糊描述。这使得用户和监管机构难以手动验证这些隐私政策的GDPR合规性。在本研究中,我们旨在解决GDPR(第13条)和5G网络隐私政策之间合规性分析的挑战。我们手动收集了近70个不同5G移动网络运营商的隐私政策,并利用基于BERT的自动模型进行分类。我们展示了令人鼓舞的结果,51%的公司表现出对GDPR的强烈遵守。此外,我们提供了首个关于5G网络隐私政策可读性的当前实证证据研究。我们采用了包含各种已建立的可读性指标的可读性分析工具集。研究结果从实证层面显示,当前大多数隐私政策的可读性仍然是一个重要挑战。因此,5G提供商需要投入大量精力来修订这些文件,以提高它们的实用性和整体用户体验。
更新时间: 2024-07-09 11:47:52
领域: cs.CR,cs.AI
A new validity measure for fuzzy c-means clustering
A new cluster validity index is proposed for fuzzy clusters obtained from fuzzy c-means algorithm. The proposed validity index exploits inter-cluster proximity between fuzzy clusters. Inter-cluster proximity is used to measure the degree of overlap between clusters. A low proximity value refers to well-partitioned clusters. The best fuzzy c-partition is obtained by minimizing inter-cluster proximity with respect to c. Well-known data sets are tested to show the effectiveness and reliability of the proposed index.
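A minimal sketch of an overlap-based inter-cluster proximity on a fuzzy membership matrix; the summed-min overlap below is a common choice, and the index proposed in the paper may define proximity differently:

```python
def intercluster_proximity(U):
    # U[k][i]: membership of sample i in cluster k. Overlap between two
    # clusters is measured by the averaged min of their memberships.
    c, n = len(U), len(U[0])
    total, pairs = 0.0, 0
    for p in range(c):
        for q in range(p + 1, c):
            total += sum(min(U[p][i], U[q][i]) for i in range(n)) / n
            pairs += 1
    return total / pairs
```

A crisp partition has zero proximity while fully overlapping clusters score maximally, so minimizing the index with respect to $c$ favors well-separated partitions.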
Updated: 2024-07-09 11:45:02
标题: 模糊c均值聚类的新有效性度量
摘要: 提出了一种新的聚类有效性指标,用于模糊c均值算法得到的模糊聚类。所提出的有效性指标利用模糊聚类之间的簇间接近度。簇间接近度用于衡量簇之间的重叠程度。低接近度值表示簇被很好地分割。通过最小化与c相关的簇间接近度,得到最佳的模糊c-分区。对一些知名数据集进行测试,以展示所提出指标的有效性和可靠性。
更新时间: 2024-07-09 11:45:02
领域: cs.AI
Temporal Convolution Derived Multi-Layered Reservoir Computing
The prediction of time series is a challenging task relevant in such diverse applications as analyzing financial data, forecasting flow dynamics, or understanding biological processes. Especially chaotic time series that depend on a long history pose an exceptionally difficult problem. While machine learning has shown to be a promising approach for predicting such time series, deep recurrent neural networks demand long training times and large amounts of training data. Reservoir computing, as an alternative, comes with high uncertainty and typically requires many random initializations and extensive hyper-parameter tuning. In this paper, we focus on the reservoir computing approach and propose a new mapping of input data into the reservoir's state space. Furthermore, we incorporate this method in two novel network architectures, increasing the parallelizability, depth, and predictive capabilities of the neural network while reducing the dependence on randomness. For the evaluation, we approximate a set of time series from the Mackey-Glass equation, exhibiting non-chaotic as well as chaotic behavior, and compare our approaches with regard to their predictive capabilities to echo state networks and gated recurrent units. For the chaotic time series, we observe an error reduction of up to $85.45\%$ and up to $87.90\%$ in contrast to echo state networks and gated recurrent units respectively. Furthermore, we also observe tremendous improvements for non-chaotic time series of up to $99.99\%$ in contrast to existing approaches.
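For reference, the echo-state-network baseline updates its state as $x(t+1) = \tanh(W x(t) + W_{in} u(t))$. A minimal sketch (the reservoir size and the crude row-sum scaling used to bound the spectral radius are illustrative choices, not the paper's setup):

```python
import math
import random

def esn_states(inputs, n_res=20, spectral_scale=0.9, seed=0):
    # Minimal echo state network: x(t+1) = tanh(W x(t) + W_in u(t)).
    # Scaling each entry of W by spectral_scale / n_res bounds the max
    # row sum, and hence the spectral radius, by spectral_scale < 1.
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) * spectral_scale / n_res for _ in range(n_res)]
         for _ in range(n_res)]
    W_in = [rng.uniform(-1, 1) for _ in range(n_res)]
    x = [0.0] * n_res
    states = []
    for u in inputs:
        pre = [sum(W[i][j] * x[j] for j in range(n_res)) + W_in[i] * u
               for i in range(n_res)]
        x = [math.tanh(p) for p in pre]
        states.append(x)
    return states
```

A linear readout trained on these states (ridge regression, typically) completes the predictor; only the readout is trained, which is what makes the approach fast but sensitive to the random initialization.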
Updated: 2024-07-09 11:40:46
标题: 时间卷积衍生的多层水库计算
摘要: 时间序列的预测是一项具有挑战性的任务,在诸如分析金融数据、预测流动动力学或理解生物过程等各种应用中都很重要。特别是依赖长期历史的混沌时间序列提出了一个异常困难的问题。虽然机器学习已经被证明是预测这种时间序列的一种有希望的方法,但使用深度递归神经网络时要求长时间训练和大量训练数据。另外,当使用储水计算方法时,会带来高度的不确定性,并且通常需要大量随机初始化和广泛的超参数调整。在本文中,我们专注于储水计算方法,并提出了一种将输入数据映射到储水器状态空间的新方法。此外,我们将这种方法融入到两种新颖的网络架构中,增加了神经网络的并行性、深度和预测能力,同时减少了对随机性的依赖。为了评估,我们从Mackey-Glass方程中近似一组时间序列,这些时间序列具有非混沌和混沌行为,并将我们的方法与回声状态网络和门控递归单元相比较,以了解它们的预测能力。对于混沌时间序列,我们观察到相对于回声状态网络和门控递归单元,错误率降低了高达85.45%和87.90%。此外,我们还观察到相对于现有方法,非混沌时间序列的改进高达99.99%。
更新时间: 2024-07-09 11:40:46
领域: cs.LG,nlin.CD
Are Large Language Models Aligned with People's Social Intuitions for Human-Robot Interactions?
Large language models (LLMs) are increasingly used in robotics, especially for high-level action planning. Meanwhile, many robotics applications involve human supervisors or collaborators. Hence, it is crucial for LLMs to generate socially acceptable actions that align with people's preferences and values. In this work, we test whether LLMs capture people's intuitions about behavior judgments and communication preferences in human-robot interaction (HRI) scenarios. For evaluation, we reproduce three HRI user studies, comparing the output of LLMs with that of real participants. We find that GPT-4 strongly outperforms other models, generating answers that correlate strongly with users' answers in two studies: the first study dealing with selecting the most appropriate communicative act for a robot in various situations ($r_s$ = 0.82), and the second with judging the desirability, intentionality, and surprisingness of behavior ($r_s$ = 0.83). However, for the last study, testing whether people judge the behavior of robots and humans differently, no model achieves strong correlations. Moreover, we show that vision models fail to capture the essence of video stimuli and that LLMs tend to rate different communicative acts and behavior desirability higher than people.
Updated: 2024-07-09 11:27:40
标题: 大型语言模型是否与人们对人机交互的社会直觉一致?
摘要: 大型语言模型(LLMs)在机器人领域的应用越来越广泛,特别是在高级行动规划方面。同时,许多机器人应用涉及人类监督员或合作者。因此,LLMs 生成与人们偏好和价值观一致的社会可接受行为至关重要。在这项研究中,我们测试LLMs是否捕捉到人们在人机交互(HRI)场景中的行为判断和沟通偏好的直觉。为了评估,我们重现了三个HRI用户研究,将LLMs的输出与真实参与者的输出进行比较。我们发现,GPT-4表现优于其他模型,在两个研究中生成的答案与用户答案强烈相关 - 第一个研究涉及在不同情况下选择最合适的沟通行为的机器人(rs = 0.82),第二个研究涉及判断行为的可取性、意图性和令人惊讶性(rs = 0.83)。然而,在最后一个研究中,测试人们是否不同程度地判断机器人和人类的行为时,没有模型能够达到强相关。此外,我们展示视觉模型无法捕捉视频刺激的本质,而LLMs倾向于将不同的沟通行为和行为可取性评分高于人类。
更新时间: 2024-07-09 11:27:40
领域: cs.RO,cs.AI,cs.HC
A Generalization Bound for Nearly-Linear Networks
We consider nonlinear networks as perturbations of linear ones. Based on this approach, we present novel generalization bounds that become non-vacuous for networks that are close to being linear. The main advantage over the previous works which propose non-vacuous generalization bounds is that our bounds are a-priori: performing the actual training is not required for evaluating the bounds. To the best of our knowledge, they are the first non-vacuous generalization bounds for neural nets possessing this property.
Updated: 2024-07-09 11:20:01
标题: 一个关于近线性网络的泛化界限
摘要: 我们将非线性网络视为线性网络的扰动。基于这种方法,我们提出了新颖的泛化界限,这些界限对于接近线性的网络而言是非空的。与先前提出非空泛化界限的作品相比,我们的界限具有先验性:不需要进行实际训练即可评估界限。据我们所知,这是具有此属性的神经网络的第一个非空泛化界限。
更新时间: 2024-07-09 11:20:01
领域: cs.LG,cs.AI,stat.ML
Explicit Modelling of Theory of Mind for Belief Prediction in Nonverbal Social Interactions
We propose MToMnet - a Theory of Mind (ToM) neural network for predicting beliefs and their dynamics during human social interactions from multimodal input. ToM is key for effective nonverbal human communication and collaboration, yet, existing methods for belief modelling have not included explicit ToM modelling or have typically been limited to one or two modalities. MToMnet encodes contextual cues (scene videos and object locations) and integrates them with person-specific cues (human gaze and body language) in a separate MindNet for each person. Inspired by prior research on social cognition and computational ToM, we propose three different MToMnet variants: two involving fusion of latent representations and one involving re-ranking of classification scores. We evaluate our approach on two challenging real-world datasets, one focusing on belief prediction, while the other examining belief dynamics prediction. Our results demonstrate that MToMnet surpasses existing methods by a large margin while at the same time requiring a significantly smaller number of parameters. Taken together, our method opens up a highly promising direction for future work on artificial intelligent systems that can robustly predict human beliefs from their non-verbal behaviour and, as such, more effectively collaborate with humans.
Updated: 2024-07-09 11:15:51
标题: 非言语交际中信念预测的心智理论显式建模
摘要: 我们提出了MToMnet - 一种理论推理(ToM)神经网络,用于从多模态输入中预测人类社交互动中的信念及其动态。ToM对于有效的非言语人类沟通和合作至关重要,然而,现有的信念建模方法并未包括显式的ToM建模,或者通常仅限于一种或两种模态。MToMnet在每个人员的MindNet中编码了场景视频和物体位置等情境线索,并将它们与个人特定线索(人类凝视和身体语言)集成在一起。受社交认知和计算ToM的先前研究启发,我们提出了三种不同的MToMnet变体:两种涉及潜在表示的融合,一种涉及重新排列分类分数。我们在两个具有挑战性的真实世界数据集上评估了我们的方法,一个侧重于信念预测,另一个检查信念动态预测。我们的结果表明,MToMnet在同一时间需要显著较少的参数的情况下,大幅超越了现有方法。总的来说,我们的方法为未来关于能够从人类的非言语行为中稳健预测人类信念,并因此更有效地与人类合作的人工智能系统开辟了一个高度有前途的方向。
更新时间: 2024-07-09 11:15:51
领域: cs.AI
On the Importance of Reproducibility of Experimental Results Especially in the Domain of Security
Security, especially in the fields of IoT, industrial automation, and critical infrastructure, is paramount nowadays and a hot research topic. To ensure confidence in research results, they need to be reproducible. In the past we reported [18] that in many publications important information, such as details about the equipment used, is missing. In this paper we report on our own experiments that we ran to verify the parameters reported in the datasheets that came along with our experimental equipment. Our results show that there are significant discrepancies between the datasheets and the real-world data. These deviations concern accuracy of positions, movements, duration of laser shots, etc. In order to improve the reproducibility of results, we therefore argue, on the one hand, that research groups verify the data given in the datasheets of the equipment they use and, on the other hand, that they provide measurement set-up parameters in globally accepted units such as cm, seconds, etc.
Updated: 2024-07-09 11:12:14
标题: 关于实验结果可复制性在安全领域尤为重要的意义
摘要: 安全性在物联网、工业自动化和关键基础设施领域尤为重要,是当前热门研究课题。为了确保研究结果的可复制性,在过去我们曾报道[18]许多出版物中缺少重要信息,例如使用设备的详细信息。本文报告了我们进行的实验,以验证随实验设备提供的数据表中报告的参数。我们的结果显示,数据表和实际数据之间存在显著偏差。这些偏差涉及位置的准确性、运动、激光射击持续时间等。为了提高结果的可复制性,我们因此主张一方面研究团队验证所使用设备数据表中提供的数据,另一方面提供以厘米、秒等全球公认的单位表示的测量设置参数。
更新时间: 2024-07-09 11:12:14
领域: cs.AR,cs.CR
Unlocking the Potential of Model Merging for Low-Resource Languages
Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing performance saturation in model merging with more training tokens, we further analyze the merging process and introduce a slack variable to the model merging algorithm to mitigate the loss of important parameters, thereby enhancing performance. We hope that model merging can benefit more human languages suffering from data scarcity with its higher data efficiency.
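Weight-space merging in its simplest form is a convex combination of per-parameter values across models. A toy sketch with list-valued "tensors" (the paper's slack-variable algorithm refines this basic scheme; the sketch is ours):

```python
def merge_models(state_dicts, weights):
    # Convex combination of parameters, parameter-by-parameter.
    # state_dicts: list of {name: list-of-floats}; weights must sum to 1.
    assert abs(sum(weights) - 1.0) < 1e-9
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][k] for w, sd in zip(weights, state_dicts))
            for k in range(len(state_dicts[0][name]))
        ]
    return merged
```

No gradient step is involved, which is precisely why merging needs no SFT data in the target language: the language-modeling and task-solving abilities travel with the weights being averaged.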
Updated: 2024-07-09 11:09:19
标题: 解锁低资源语言模型合并的潜力
摘要: 将大型语言模型(LLMs)适应新语言通常涉及持续预训练(CT),然后是监督微调(SFT)。然而,在低资源语言环境中,这种CT然后SFT的方法在数据有限时难以平衡语言建模和任务解决能力。因此,我们提出了模型合并作为低资源语言的替代方案,将具有不同能力的模型合并为一个模型,而无需额外训练。我们使用模型合并为目标语言没有SFT数据的低资源语言开发任务解决LLMs。我们基于Llama-2-7B的实验表明,模型合并有效地赋予低资源语言的LLMs任务解决能力,在极度稀缺数据的情况下胜过CT然后SFT。观察到在更多的训练标记中模型合并的性能饱和,我们进一步分析合并过程,并向模型合并算法引入松弛变量,以减轻重要参数的损失,从而增强性能。我们希望模型合并可以通过更高的数据效率使更多因数据稀缺而受苦的人类语言受益。
更新时间: 2024-07-09 11:09:19
领域: cs.CL,cs.AI
Cybersecurity Defenses: Exploration of CVE Types through Attack Descriptions
Vulnerabilities in software security can remain undiscovered even after being exploited. Linking attacks to vulnerabilities helps experts identify and respond promptly to the incident. This paper introduces VULDAT, a classification tool using the sentence transformer MPNET to identify system vulnerabilities from attack descriptions. Our model was applied to 100 attack techniques from the ATT&CK repository and 685 issues from the CVE repository. Then, we compare the performance of VULDAT against eight other state-of-the-art classifiers based on sentence transformers. Our findings indicate that our model achieves the best performance with an F1 score of 0.85, a Precision of 0.86, and a Recall of 0.83. Furthermore, we found that VULDAT identified 56% of the CVE vulnerabilities associated with an attack, and that 61% of the vulnerabilities it identified were in the CVE repository.
Updated: 2024-07-09 11:08:35
标题: 网络安全防御:通过攻击描述探索CVE类型
摘要: 软件安全中的漏洞即使在被利用之后仍可能保持未被发现状态。将攻击与漏洞进行关联有助于专家及时识别和应对事件。本文介绍了VULDAT,一种分类工具,使用句子转换器MPNET来从攻击描述中识别系统漏洞。我们的模型应用于ATT&CK存储库中的100种攻击技术和CVE存储库中的685个问题。然后,我们将VULDAT与其他八种基于句子转换器的最新分类器的性能进行比较。我们的研究结果表明,我们的模型在F1分数为0.85,精确度为0.86和召回率为0.83方面取得了最佳性能。此外,我们发现56%的CVE报告中与攻击相关的漏洞被VULDAT识别出来,而识别出的漏洞中61%在CVE存储库中。
更新时间: 2024-07-09 11:08:35
领域: cs.CR,cs.SE
On the Influence of the Laser Illumination on the Logic Cells Current Consumption
Physical side-channel attacks represent a great challenge for today's chip design. Although attacks on CMOS dynamic power represent a class of state-of-the-art attacks, many other effects potentially affect the security of CMOS chips analogously by affecting mostly static behaviour of the chip, including aging, ionizing radiation, or non-ionizing illumination of the CMOS. Vulnerabilities exploiting data dependency in CMOS static power were already demonstrated in practice and the analogous vulnerability exploiting light-modulated static power was demonstrated by simulation. This work confirms the CMOS vulnerability related to the light-modulated data-dependent static power experimentally and discusses future work.
Updated: 2024-07-09 11:08:23
标题: 关于激光照射对逻辑单元电流消耗的影响
摘要: 物理侧信道攻击对今天的芯片设计构成了巨大挑战。尽管对CMOS动态功耗的攻击代表了一类最先进的攻击,但许多其他效应可能通过影响芯片的静态行为而类似地影响CMOS芯片的安全性,包括老化、电离辐射或非电离辐射照射CMOS。已经在实践中展示了利用CMOS静态功耗中的数据依赖性的漏洞,并通过模拟展示了利用光调制静态功耗的类似漏洞。这项工作通过实验证实了与光调制数据依赖性静态功耗相关的CMOS漏洞,并讨论了未来的工作。
更新时间: 2024-07-09 11:08:23
领域: cs.CR,cs.AR
Frequency and Generalisation of Periodic Activation Functions in Reinforcement Learning
Periodic activation functions, often referred to as learned Fourier features, have been widely demonstrated to improve sample efficiency and stability in a variety of deep RL algorithms. Potentially incompatible hypotheses have been made about the source of these improvements. One is that periodic activations learn low frequency representations and as a result avoid overfitting to bootstrapped targets. Another is that periodic activations learn high frequency representations that are more expressive, allowing networks to quickly fit complex value functions. We analyse these claims empirically, finding that periodic representations consistently converge to high frequencies regardless of their initialisation frequency. We also find that while periodic activation functions improve sample efficiency, they exhibit worse generalization on states with added observation noise -- especially when compared to otherwise equivalent networks with ReLU activation functions. Finally, we show that weight decay regularization is able to partially offset the overfitting of periodic activation functions, delivering value functions that learn quickly while also generalizing.
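A periodic activation layer computes features $\sin(f_i x + \phi_i)$ with learnable $f_i, \phi_i$; the "frequency" analysed in the paper is the learned $f_i$. A minimal sketch of the forward pass:

```python
import math

def fourier_features(x, freqs, phases):
    # Learned Fourier features: feature_i = sin(freq_i * x + phase_i).
    # Larger freq_i makes the feature oscillate faster in x, i.e., a
    # higher-frequency representation of the input.
    return [math.sin(f * x + p) for f, p in zip(freqs, phases)]
```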
Updated: 2024-07-09 11:07:41
标题: 强化学习中周期激活函数的频率和泛化
摘要: 周期性激活函数,通常被称为学习的傅立叶特征,已广泛证明可提高各种深度强化学习算法的样本效率和稳定性。关于这些改进的来源,可能存在不相容的假设。其中一个假设是周期性激活函数学习低频表示,从而避免对引导目标的过度拟合。另一个假设是周期性激活函数学习高频表示,这种表示更具表现力,使网络能够快速拟合复杂的值函数。我们通过实证分析这些声明,发现周期性表示始终收敛到高频,无论其初始化频率如何。我们还发现,虽然周期性激活函数提高了样本效率,但在添加观测噪声的状态下,它们的泛化性能较差,特别是与具有ReLU激活函数的等效网络相比。最后,我们表明权重衰减正则化能够部分抵消周期性激活函数的过拟合,提供能够快速学习并具有泛化性能的值函数。
更新时间: 2024-07-09 11:07:41
领域: cs.LG,cs.AI,cs.NE
Threats and Defenses in Federated Learning Life Cycle: A Comprehensive Survey and Challenges
Federated Learning (FL) offers innovative solutions for privacy-preserving collaborative machine learning (ML). Despite its promising potential, FL is vulnerable to various attacks due to its distributed nature, affecting the entire life cycle of FL services. These threats can harm the model's utility or compromise participants' privacy, either directly or indirectly. In response, numerous defense frameworks have been proposed, demonstrating effectiveness in specific settings and scenarios. To provide a clear understanding of the current research landscape, this paper reviews the most representative and state-of-the-art threats and defense frameworks throughout the FL service life cycle. We start by identifying FL threats that harm utility and privacy, including those with potential or direct impacts. Then, we dive into the defense frameworks, analyze the relationship between threats and defenses, and compare the trade-offs among different defense strategies. Finally, we summarize current research bottlenecks and offer insights into future research directions to conclude this survey. We hope this survey sheds light on trustworthy FL research and contributes to the FL community.
Updated: 2024-07-09 11:05:45
Domains: cs.DC,cs.AI
A Comparison of Vulnerability Feature Extraction Methods from Textual Attack Patterns
Nowadays, threat reports from cybersecurity vendors incorporate detailed descriptions of attacks within unstructured text. Knowing vulnerabilities that are related to these reports helps cybersecurity researchers and practitioners understand and adjust to evolving attacks and develop mitigation plans. This paper aims to aid cybersecurity researchers and practitioners in choosing attack extraction methods to enhance the monitoring and sharing of threat intelligence. In this work, we examine five feature extraction methods (TF-IDF, LSI, BERT, MiniLM, RoBERTa) and find that Term Frequency-Inverse Document Frequency (TF-IDF) outperforms the other four methods with a precision of 75% and an F1 score of 64%. The findings offer valuable insights to the cybersecurity community, and our research can aid cybersecurity researchers in evaluating and comparing the effectiveness of upcoming extraction methods.
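As a reminder of the strongest baseline here, a bare-bones TF-IDF scorer with cosine similarity can be sketched as follows; the whitespace tokenization and toy documents are illustrative, not the paper's pipeline:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Plain TF-IDF over whitespace tokens (a sketch, not the exact setup)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))          # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: (tf[t] / len(toks)) * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["spear phishing email attachment",
        "phishing email credential theft",
        "buffer overflow memory corruption"]
vecs = tfidf_vectors(docs)
```

The two phishing-related descriptions score closer to each other than to the memory-corruption one, which is the behavior a vulnerability-matching pipeline relies on.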
Updated: 2024-07-09 11:04:49
Domains: cs.CR,cs.SE
Laser Fault Injection Attacks against Radiation Tolerant TMR Registers
Security requirements for the Internet of Things (IoT), wireless sensor nodes, and other wireless devices connected in a network for data exchange are high. These devices are often subject to lab analysis with the objective of revealing hidden secret information. One kind of attack used to reveal the cryptographic key is the optical fault injection attack. In this work, we investigated the IHP radiation-tolerant shift registers built of Triple Modular Redundant (TMR) flip-flops. In our experiments, we were able to inject different transient faults into TMR registers.
Updated: 2024-07-09 11:03:01
Domains: cs.AR,cs.CR
Enhancing Social Media Personalization: Dynamic User Profile Embeddings and Multimodal Contextual Analysis Using Transformer Models
This study investigates the impact of dynamic user profile embedding on personalized context-aware experiences in social networks. A comparative analysis of multilingual and English transformer models was performed on a dataset of over twenty million data points. The analysis included a wide range of metrics and performance indicators to compare dynamic profile embeddings versus non-embeddings (effectively static profile embeddings). A comparative study using degradation functions was conducted. Extensive testing and research confirmed that dynamic embedding successfully tracks users' changing tastes and preferences, providing more accurate recommendations and higher user engagement. These results are important for social media platforms aiming to improve user experience through relevant features and sophisticated recommendation engines.
Updated: 2024-07-09 10:58:46
Domains: cs.IR,cs.AI,cs.SI
iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine
The vision of the iASiS project is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. This is achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. The goal is to turn large amounts of available data into actionable information for authorities planning public health activities and policies. The integration and analysis of these heterogeneous sources of information will enable the best decisions to be made, allowing diagnosis and treatment to be personalised to each individual. The project offers a common representation schema for the heterogeneous data sources. The iASiS infrastructure is able to convert clinical notes into usable data, combine them with genomic data, related bibliography, image data and more, and create a global knowledge base. This facilitates the use of intelligent methods in order to discover useful patterns across different resources. Using semantic integration of data gives the opportunity to generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data, thus providing more insights and opportunities. Data resources for two different disease categories, dementia and lung cancer, are explored within the iASiS use cases.
Updated: 2024-07-09 10:52:19
Domains: cs.AI,cs.DB
Modularity aided consistent attributed graph clustering via coarsening
Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectively leveraging both adjacency and node features to enhance clustering accuracy. We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique, resulting in superior clustering outcomes. The method is theoretically consistent under the Degree-Corrected Stochastic Block Model (DC-SBM), ensuring asymptotic error-free performance and complete label recovery. Our provably convergent and time-efficient algorithm seamlessly integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and deliver exceptional clustering performance. Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs.
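For context, the modularity objective being maximized can be illustrated with the classic Newman formulation; the paper folds a modularity term into its coarsening loss rather than computing Q directly, so the following is only a sketch:

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q for an undirected adjacency matrix A and a
    cluster assignment. High Q means more intra-cluster edges than a
    degree-preserving random graph would have."""
    m = A.sum() / 2.0                 # number of edges
    k = A.sum(axis=1)                 # node degrees
    n = len(labels)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i, j] - k[i] * k[j] / (2 * m)
    return q / (2 * m)

# Two triangles joined by a single bridge edge: the natural 2-way split
# scores well above an arbitrary split.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
good = modularity(A, [0, 0, 0, 1, 1, 1])
bad = modularity(A, [0, 1, 0, 1, 0, 1])
```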
Updated: 2024-07-09 10:42:19
Domains: cs.LG,cs.SI,stat.ML
Expanding continual few-shot learning benchmarks to include recognition of specific instances
Continual learning and few-shot learning are important frontiers in progress toward broader Machine Learning (ML) capabilities. Recently, there has been intense interest in combining both. One of the first examples to do so was the Continual few-shot Learning (CFSL) framework of Antoniou et al. arXiv:2004.11967. In this study, we extend CFSL in two ways that capture a broader range of challenges, important for intelligent agent behaviour in real-world conditions. First, we increased the number of classes by an order of magnitude, making the results more comparable to standard continual learning experiments. Second, we introduced an 'instance test' which requires recognition of specific instances of classes -- a capability of animal cognition that is usually neglected in ML. For an initial exploration of ML model performance under these conditions, we selected representative baseline models from the original CFSL work and added a model variant with replay. As expected, learning more classes is more difficult than the original CFSL experiments, and interestingly, the way in which image instances and classes are presented affects classification performance. Surprisingly, accuracy in the baseline instance test is comparable to other classification tasks, but poor given significant occlusion and noise. The use of replay for consolidation substantially improves performance for both types of tasks, but particularly for the instance test.
Updated: 2024-07-09 10:41:17
Domains: cs.NE,cs.LG,I.2.6; I.5.0; I.5.1
Positive-Unlabelled Learning for Improving Image-based Recommender System Explainability
Among the existing approaches for visual-based Recommender System (RS) explainability, utilizing user-uploaded item images as efficient, trustable explanations is a promising option. However, current models following this paradigm assume that, for any user, all images uploaded by other users can be considered negative training examples (i.e. bad explanatory images), an inadvertently naive labelling assumption that contradicts the rationale of the approach. This work proposes a new explainer training pipeline that leverages Positive-Unlabelled (PU) Learning techniques to train image-based explainers on refined subsets of reliable negative examples for each user, selected through a novel user-personalized, two-step, similarity-based PU Learning algorithm. Computational experiments show this PU-based approach outperforms the state-of-the-art non-PU method in six popular real-world datasets, proving that an improvement of visual-based RS explainability can be achieved by maximizing training data quality rather than increasing model complexity.
Updated: 2024-07-09 10:40:31
Domains: cs.LG,cs.AI,cs.CV,cs.IR
Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models
As transformer-based object detection models progress, their impact in critical sectors like autonomous vehicles and aviation is expected to grow. Soft errors causing bit flips during inference have significantly impacted DNN performance, altering predictions. Traditional range restriction solutions for CNNs fall short for transformers. This study introduces the Global Clipper and Global Hybrid Clipper, effective mitigation strategies specifically designed for transformer-based models. They significantly enhance resilience to soft errors and reduce faulty inferences to ~0%. We also detail extensive testing across over 64 scenarios involving two transformer models (DINO-DETR and Lite-DETR) and two CNN models (YOLOv3 and SSD) using three datasets, totalling approximately 3.3 million inferences, to assess model robustness comprehensively. Moreover, the paper explores unique aspects of attention blocks in transformers and their operational differences from CNNs.
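The range-restriction idea that such clippers build on can be sketched generically; the bounds, hook placement, and values below are illustrative assumptions, and the paper's Global Clipper is transformer-specific rather than this plain clamp:

```python
import numpy as np

def clip_activations(x, lower, upper):
    """Generic range restriction: activation values pushed out of the
    profiled bounds by a bit flip are clamped back. This is a simplified
    view of what a clipper layer does between network blocks."""
    return np.clip(x, lower, upper)

# A single bit flip in a float32 exponent can inflate a value by orders
# of magnitude; clipping restores it to the profiled range so the fault
# does not propagate through subsequent layers.
acts = np.array([0.3, -0.1, 6.5e8, 0.7], dtype=np.float32)
safe = clip_activations(acts, lower=-4.0, upper=4.0)
```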
Updated: 2024-07-09 10:23:53
Domains: cs.CV,cs.AI,cs.LG
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Humans describe complex scenes with compositionality, using simple text descriptions enriched with links and relationships. While vision-language research has aimed to develop models with compositional understanding capabilities, this is not reflected yet in existing datasets which, for the most part, still use plain text to describe images. In this work, we propose a new annotation strategy, graph-based captioning (GBC), that describes an image using a labelled graph structure, with nodes of various types. The nodes in GBC are created using, in a first stage, object detection and dense captioning tools nested recursively to uncover and describe entity nodes, further linked together in a second stage by highlighting, using new types of nodes, compositions and relations among entities. Since all GBC nodes hold plain text descriptions, GBC retains the flexibility found in natural language, but can also encode hierarchical information in its edges. We demonstrate that GBC can be produced automatically, using off-the-shelf multimodal LLMs and open-vocabulary detection models, by building a new dataset, GBC10M, gathering GBC annotations for about 10M images of the CC12M dataset. We use GBC10M to showcase the wealth of node captions uncovered by GBC, as measured with CLIP training. We show that using GBC nodes' annotations -- notably those stored in composition and relation nodes -- results in significant performance boost on downstream models when compared to other dataset formats. To further explore the opportunities provided by GBC, we also propose a new attention mechanism that can leverage the entire GBC graph, with encouraging experimental results that show the extra benefits of incorporating the graph structure. Our datasets are released at https://huggingface.co/graph-based-captions.
Updated: 2024-07-09 09:55:04
Domains: cs.CV,cs.AI,cs.LG
A Simple Architecture for Enterprise Large Language Model Applications based on Role based security and Clearance Levels using Retrieval-Augmented Generation or Mixture of Experts
This study proposes a simple enterprise application architecture for Large Language Models (LLMs) with role-based security and NATO clearance levels. Our proposal aims to address the limitations of current LLMs in handling security and information access. The proposed architecture can be used with Retrieval-Augmented Generation (RAG), with fine-tuned Mixture-of-Experts (MoE) models, or with both. Using the user's role and security clearance level, documents in RAG and experts in MoE are filtered; in this way, information leakage is prevented.
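The filtering step described above can be sketched as a pre-retrieval filter; the clearance ladder, role names, and `Document` fields below are hypothetical placeholders, not the paper's schema:

```python
from dataclasses import dataclass

# Illustrative clearance ladder, ordered from lowest to highest.
CLEARANCE_ORDER = ["UNCLASSIFIED", "RESTRICTED", "CONFIDENTIAL", "SECRET", "TOP SECRET"]

@dataclass
class Document:
    text: str
    clearance: str
    roles: set  # roles permitted to retrieve this document

def filter_corpus(docs, user_role, user_clearance):
    """Keep only documents the user may see: clearance at or below the
    user's level AND a role match. Filtering happens before retrieval,
    so restricted text never enters the LLM context."""
    level = CLEARANCE_ORDER.index(user_clearance)
    return [d for d in docs
            if CLEARANCE_ORDER.index(d.clearance) <= level and user_role in d.roles]

corpus = [
    Document("public handbook", "UNCLASSIFIED", {"analyst", "engineer"}),
    Document("ops plan", "SECRET", {"analyst"}),
    Document("crypto keys memo", "TOP SECRET", {"admin"}),
]
visible = filter_corpus(corpus, user_role="analyst", user_clearance="SECRET")
```

The same gate would be applied to expert selection in the MoE variant.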
Updated: 2024-07-09 09:46:23
Domains: cs.AI,D.2.11; I.2.7
Transfer Learning Study of Motion Transformer-based Trajectory Predictions
Trajectory planning in autonomous driving is highly dependent on predicting the emergent behavior of other road users. Learning-based methods are currently showing impressive results in simulation-based challenges, with transformer-based architectures technologically leading the way. Ultimately, however, predictions are needed in the real world. In addition to the shifts from simulation to the real world, many vehicle- and country-specific shifts, i.e. differences in sensor systems, fusion and perception algorithms as well as traffic rules and laws, are on the agenda. Since models that can cover all system setups and design domains at once are not yet foreseeable, model adaptation plays a central role. Therefore, a simulation-based study on transfer learning techniques is conducted on basis of a transformer-based model. Furthermore, the study aims to provide insights into possible trade-offs between computational time and performance to support effective transfers into the real world.
Updated: 2024-07-09 09:46:06
Domains: cs.LG,cs.RO
Stochastic Multi-round Submodular Optimization with Budget
In this work we study the problem of Stochastic Budgeted Multi-round Submodular Maximization (SBMSm), in which we would like to adaptively maximize the sum over multiple rounds of the value of a monotone and submodular objective function defined on a subset of items, subject to the fact that the values of this function depend on the realization of stochastic events and the number of items that we can select over all rounds is limited by a given budget. This problem extends, and generalizes to multiple-round settings, well-studied problems such as (adaptive) influence maximization and stochastic probing. We first show that, if the number of items and stochastic events is somehow bounded, there is a polynomial-time dynamic programming algorithm for SBMSm. Then, we provide a simple greedy approximation algorithm for SBMSm, that first non-adaptively allocates the budget to be spent at each round, and then greedily and adaptively maximizes the objective function by using the budget assigned at each round. Such an algorithm guarantees a (1-1/e-ε)-approximation to the optimal adaptive value. Finally, by introducing a metric called budget-adaptivity gap, we measure how much an optimal policy for SBMSm, that is adaptive in both the budget allocation and item selection, is better than an optimal partially adaptive policy that, as in our greedy algorithm, determines the budget allocation in advance. We show a tight bound of e/(e-1) on the budget-adaptivity gap, and this result implies that our greedy algorithm guarantees the best approximation among all partially adaptive policies.
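The greedy step at the heart of the second algorithm can be illustrated on a deterministic max-coverage instance (a standard monotone submodular objective); the adaptive, stochastic, multi-round machinery of the paper is omitted here:

```python
def greedy_max_coverage(sets, budget):
    """Classic greedy for budgeted monotone submodular maximization on a
    max-coverage instance: repeatedly pick the set with the largest
    marginal gain until the budget is spent or no gain remains."""
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(range(len(sets)),
                   key=lambda i: len(sets[i] - covered) if i not in chosen else -1)
        gain = len(sets[best] - covered)
        if gain == 0:
            break  # nothing left to gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(sets, budget=2)
```

The paper applies this kind of greedy step adaptively within each round, with the per-round budgets fixed in advance.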
Updated: 2024-07-09 09:43:57
Domains: cs.DS,cs.AI
Secure Outsourced Decryption for FHE-based Privacy-preserving Cloud Computing
The demand for processing vast volumes of data has surged dramatically due to the advancement of machine learning technology. Large-scale data processing necessitates substantial computational resources, prompting individuals and enterprises to turn to cloud services. Accompanying this trend is a growing concern regarding data leakage and misuse. Homomorphic encryption (HE) is one solution for safeguarding data privacy, enabling encrypted data to be processed securely in the cloud. However, the encryption and decryption routines of some HE schemes require considerable computational resources, presenting non-trivial work for clients. In this paper, we propose an outsourced decryption protocol for the prevailing RLWE-based fully homomorphic encryption schemes. The protocol splits the original decryption into two routines, with the computationally intensive part executed remotely by the cloud. Its security relies on an invariant of the NTRU-search problem with a newly designed blinding key distribution. Cryptographic analyses are conducted to configure protocol parameters across varying security levels. Our experiments demonstrate that the proposed protocol achieves up to a 67% acceleration in the client's local decryption, accompanied by a 50% reduction in space usage.
Updated: 2024-07-09 09:40:52
Domains: cs.CR
CBM: Curriculum by Masking
We propose Curriculum by Masking (CBM), a novel state-of-the-art curriculum learning strategy that effectively creates an easy-to-hard training schedule via patch (token) masking, offering significant accuracy improvements over the conventional training regime and previous curriculum learning (CL) methods. CBM leverages gradient magnitudes to prioritize the masking of salient image regions via a novel masking algorithm and a novel masking block. Our approach enables controlling sample difficulty via the patch masking ratio, generating an effective easy-to-hard curriculum by gradually introducing harder samples as training progresses. CBM operates with two easily configurable parameters, i.e. the number of patches and the curriculum schedule, making it a versatile curriculum learning approach for object recognition and detection. We conduct experiments with various neural architectures, ranging from convolutional networks to vision transformers, on five benchmark data sets (CIFAR-10, CIFAR-100, ImageNet, Food-101 and PASCAL VOC), to compare CBM with conventional as well as curriculum-based training regimes. Our results reveal the superiority of our strategy compared with the state-of-the-art curriculum learning regimes. We also observe improvements in transfer learning contexts, where CBM surpasses previous work by considerable margins in terms of accuracy. We release our code for free non-commercial use at https://github.com/CroitoruAlin/CBM.
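The saliency-driven masking can be sketched as selecting the highest-gradient patches to mask; the function below is a simplification, as the paper's masking block and curriculum schedule are more involved:

```python
import numpy as np

def mask_salient_patches(grad_mags, mask_ratio):
    """Pick the patches with the largest gradient magnitude to mask.
    Increasing `mask_ratio` over training produces the easy-to-hard
    curriculum: more salient regions are hidden as training progresses."""
    n = len(grad_mags)
    k = int(round(mask_ratio * n))
    order = np.argsort(grad_mags)[::-1]  # most salient patches first
    mask = np.zeros(n, dtype=bool)
    mask[order[:k]] = True
    return mask

grads = np.array([0.1, 0.9, 0.4, 0.7])  # toy per-patch gradient magnitudes
mask = mask_salient_patches(grads, mask_ratio=0.5)
```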
Updated: 2024-07-09 09:40:38
Domains: cs.CV,cs.AI,cs.LG
MDP Geometry, Normalization and Value Free Solvers
Markov Decision Process (MDP) is a common mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDP, which is useful for analyzing the dynamics of main MDP algorithms. Based on this interpretation, we demonstrate that MDPs can be split into equivalence classes with indistinguishable algorithm dynamics. The related normalization procedure allows for the design of a new class of MDP-solving algorithms that find optimal policies without computing policy values.
Updated: 2024-07-09 09:39:45
Domains: cs.LG,math.OC
Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification
Multi-label ranking, which returns multiple top-ranked labels for each instance, has a wide range of applications for visual tasks. Due to its complicated setting, prior arts have proposed various measures to evaluate model performances. However, both theoretical analysis and empirical observations show that a model might perform inconsistently on different measures. To bridge this gap, this paper proposes a novel measure named Top-K Pairwise Ranking (TKPR), and a series of analyses show that TKPR is compatible with existing ranking-based measures. In light of this, we further establish an empirical surrogate risk minimization framework for TKPR. On one hand, the proposed framework enjoys convex surrogate losses with the theoretical support of Fisher consistency. On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction. Finally, empirical results on benchmark datasets validate the effectiveness of the proposed framework.
Updated: 2024-07-09 09:36:37
Domains: cs.LG,cs.CV
Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems
In the age of Industry 4.0 and Cyber-Physical Production Systems (CPPSs) vast amounts of potentially valuable data are being generated. Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected. The knowledge obtained can in turn be used to improve tasks like diagnostics or maintenance planning. However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data. The application of domain-specific ontologies has demonstrated its advantageousness in a wide variety of Industry 4.0 application scenarios regarding the aforementioned challenges. However, workflows and artifacts from ontology design for CPPSs have not yet been systematically integrated into the CRISP-DM. Accordingly, this contribution intends to present an integrated approach so that data scientists are able to more quickly and reliably gain insights into the CPPS. The result is exemplarily applied to an anomaly detection use case.
Updated: 2024-07-09 09:34:26
Domains: cs.AI
Communication Optimal Unbalanced Private Set Union
We consider the private set union (PSU) problem, where two parties each hold a private set of elements, and they want one of the parties (the receiver) to learn the union of the two sets and nothing else. Our protocols are targeted for the unbalanced case where the receiver's set size is larger than the sender's set size, with the goal of minimizing the costs for the sender both in terms of communication volume and local computation time. This setting is motivated by applications where the receiver has significantly more data (input set size) and computational resources than the sender which might be realized on a small, low-power device. Asymptotically, we achieve communication cost linear in the sender's (smaller) set size, and computation costs for sender and receiver which are nearly-linear in their respective set sizes. To our knowledge, ours is the first algorithm to achieve nearly-linear communication and computation for PSU in this unbalanced setting. Our protocols utilize fully homomorphic encryption (FHE) and, optionally, linearly homomorphic encryption (LHE) to perform the necessary computations while preserving privacy. The underlying computations are based on univariate polynomial arithmetic realized within homomorphic encryption, namely fast multiplication, modular reduction, and multi-point evaluation. These asymptotically fast HE polynomial arithmetic algorithms may be of independent interest.
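The polynomial encoding underlying such protocols can be shown in the clear: a set is encoded as the monic polynomial with those elements as roots, and membership corresponds to the polynomial evaluating to zero. The paper performs this arithmetic under homomorphic encryption; the plain-Python version below is only a sketch:

```python
def poly_from_roots(roots, p):
    """Monic polynomial over GF(p) whose roots are exactly `roots`.
    Evaluating it at x returns 0 iff x is in the encoded set -- the basic
    polynomial-encoding trick behind PSI/PSU protocols."""
    coeffs = [1]  # coeffs[i] is the coefficient of x**i
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - r * c) % p      # multiply by (x - r): -r * c part
            nxt[i + 1] = (nxt[i + 1] + c) % p  # multiply by (x - r): x * c part
        coeffs = nxt
    return coeffs

def evaluate(coeffs, x, p):
    """Horner evaluation mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

sender_set = [2, 5]
f = poly_from_roots(sender_set, p=97)
```

Fast multiplication, modular reduction, and multi-point evaluation of exactly these objects are the HE subroutines the paper optimizes.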
Updated: 2024-07-09 09:34:03
Domains: cs.CR,cs.SC
MicroT: Low-Energy and Adaptive Models for MCUs
We propose MicroT, a low-energy, multi-task adaptive model framework for resource-constrained MCUs. We divide the original model into a feature extractor and a classifier. The feature extractor is obtained through self-supervised knowledge distillation and further optimized into part and full models through model splitting and joint training. These models are then deployed on MCUs, with classifiers added and trained on local tasks, ultimately performing stage-decision for joint inference. In this process, the part model initially processes the sample, and if the confidence score falls below the set threshold, the full model will resume and continue the inference. We evaluate MicroT on two models, three datasets, and two MCU boards. Our experimental evaluation shows that MicroT effectively improves model performance and reduces energy consumption when dealing with multiple local tasks. Compared to the unoptimized feature extractor, MicroT can improve accuracy by up to 9.87%. On MCUs, compared to the standard full model inference, MicroT can save up to about 29.13% in energy consumption. MicroT also allows users to adaptively adjust the stage-decision ratio as needed, better balancing model performance and energy consumption. Under the standard stage-decision ratio configuration, MicroT can increase accuracy by 5.91% and save about 14.47% of energy consumption.
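The stage-decision flow can be sketched with stand-in models; the threshold value and the (label, confidence) interface are illustrative assumptions:

```python
def staged_inference(sample, part_model, full_model, threshold=0.8):
    """MicroT-style stage decision: the part model answers first; the full
    model runs only when the part model's confidence falls below
    `threshold`. Lowering the threshold saves energy; raising it favors
    accuracy -- the adjustable stage-decision ratio."""
    label, conf = part_model(sample)
    if conf >= threshold:
        return label, "part"   # cheap path: the full model never runs
    label, _ = full_model(sample)
    return label, "full"

# Hypothetical stand-in models for illustration, returning (label, confidence).
part = lambda x: ("cat", 0.9) if x == "easy" else ("cat", 0.4)
full = lambda x: ("dog", 0.95)
easy_result = staged_inference("easy", part, full)
hard_result = staged_inference("hard", part, full)
```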
Updated: 2024-07-09 09:33:45
Fields: cs.LG,cs.AR
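The stage-decision cascade described in the abstract can be sketched as follows (a toy illustration with made-up stand-in models, not MicroT's actual code): the part model answers alone when its softmax confidence clears a threshold, and the full model takes over otherwise.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def stage_decision(x, part_model, full_model, threshold=0.8):
    # cheap part model first; early-exit when confident
    probs = softmax(part_model(x))
    if max(probs) >= threshold:
        return probs.index(max(probs)), "part"
    # otherwise the full model resumes and finishes the inference
    probs = softmax(full_model(x))
    return probs.index(max(probs)), "full"

# Toy stand-ins: the part model is confident on one input, unsure on another.
part = lambda x: [4.0, 0.0] if x == "easy" else [0.1, 0.0]
full = lambda x: [0.0, 3.0]

print(stage_decision("easy", part, full))  # (0, 'part')
print(stage_decision("hard", part, full))  # (1, 'full')
```

Raising the threshold shifts more samples to the full model — this is the stage-decision ratio users can tune to trade accuracy against energy.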
Probabilistically-Sound Beam Search with Masked Language Models
Beam search with masked language models (MLMs) is challenging in part because joint probability distributions over sequences are not readily available, unlike for autoregressive models. However, estimating such distributions has important domain-specific applications such as ancient text restoration and protein engineering. Here we present probabilistically-sound methods for beam search with MLMs. First, we clarify the conditions under which it is theoretically sound to perform text infilling with MLMs using standard beam search. When these conditions fail, we provide a probabilistically-sound modification with no additional computational complexity and demonstrate that it is superior to the aforementioned beam search in the expected conditions. We then present empirical results comparing several infilling approaches with MLMs across several domains.
Updated: 2024-07-09 09:32:52
Fields: cs.LG,cs.CL
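The infilling search the abstract builds on can be sketched generically. Below, a toy per-position scorer stands in for MLM probabilities (the paper's contribution is precisely about when summing such scores is probabilistically sound); the beam mechanics themselves are standard:

```python
import math

def beam_search(vocab, length, score, beam_width=2):
    # beams: (cumulative log-score, tokens); fill one masked position per step
    beams = [(0.0, [])]
    for pos in range(length):
        candidates = []
        for lp, toks in beams:
            for tok in vocab:
                candidates.append((lp + math.log(score(toks, pos, tok)),
                                   toks + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]

# Toy scorer standing in for MLM conditionals: prefers token == pos % 2.
def score(prefix, pos, tok):
    return 0.9 if tok == pos % 2 else 0.1

print(beam_search([0, 1], 4, score))  # [0, 1, 0, 1]
```

In the real setting, `score` would query a masked language model conditioned on the already-filled tokens, and whether the cumulative sum is a sound joint log-probability depends on the conditions the paper clarifies.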
Self-supervised visual learning from interactions with objects
Self-supervised learning (SSL) has revolutionized visual representation learning, but has not achieved the robustness of human vision. A reason for this could be that SSL does not leverage all the data available to humans during learning. When learning about an object, humans often purposefully turn or move around objects and research suggests that these interactions can substantially enhance their learning. Here we explore whether such object-related actions can boost SSL. For this, we extract the actions performed to change from one ego-centric view of an object to another in four video datasets. We then introduce a new loss function to learn visual and action embeddings by aligning the performed action with the representations of two images extracted from the same clip. This permits the performed actions to structure the latent visual representation. Our experiments show that our method consistently outperforms previous methods on downstream category recognition. In our analysis, we find that the observed improvement is associated with a better viewpoint-wise alignment of different objects from the same category. Overall, our work demonstrates that embodied interactions with objects can improve SSL of object categories.
Updated: 2024-07-09 09:31:15
Fields: cs.CV,cs.LG
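A loss that aligns a performed action with the representations of two images from the same clip can be sketched in a contrastive style. This is our own toy formulation of the idea, not the paper's exact loss: the action vector should match the transition between the two image embeddings better than other actions in the batch do.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def action_alignment_loss(img_a, img_b, action, negatives, temp=0.5):
    # target: the transition img_b - img_a; the true action is the
    # positive, other actions in the batch serve as negatives
    delta = [y - x for x, y in zip(img_a, img_b)]
    pos = math.exp(dot(delta, action) / temp)
    neg = sum(math.exp(dot(delta, n) / temp) for n in negatives)
    return -math.log(pos / (pos + neg))

img_a, img_b = [0.0, 0.0], [1.0, 0.0]
good_action, bad_action = [1.0, 0.0], [-1.0, 0.0]
loss_good = action_alignment_loss(img_a, img_b, good_action, [bad_action])
loss_bad = action_alignment_loss(img_a, img_b, bad_action, [good_action])
print(loss_good < loss_bad)  # True: the matching action scores higher
```

Minimizing such a loss pushes the performed actions to structure the latent visual space, which is the mechanism the abstract credits for the improved viewpoint-wise alignment.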
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction
Predicting the stability and fitness effects of amino acid mutations in proteins is a cornerstone of biological discovery and engineering. Various experimental techniques have been developed to measure mutational effects, providing us with extensive datasets across a diverse range of proteins. By training on these data, traditional computational modeling and more recent machine learning approaches have advanced significantly in predicting mutational effects. Here, we introduce HERMES, a 3D rotationally equivariant structure-based neural network model for mutational effect and stability prediction. Pre-trained to predict amino acid propensity from its surrounding 3D structure, HERMES can be fine-tuned for mutational effects using our open-source code. We present a suite of HERMES models, pre-trained with different strategies, and fine-tuned to predict the stability effect of mutations. Benchmarking against other models shows that HERMES often outperforms or matches their performance in predicting mutational effect on stability, binding, and fitness. HERMES offers versatile tools for evaluating mutational effects and can be fine-tuned for specific predictive objectives.
Updated: 2024-07-09 09:31:05
Fields: q-bio.BM,cs.LG,J.3
Rethinking Model Re-Basin and Linear Mode Connectivity
Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging by ensuring the linear mode connectivity. However, current re-basin strategies are ineffective in many scenarios due to a lack of comprehensive understanding of underlying mechanisms. Addressing this gap, this paper provides novel insights into understanding and improving the standard practice. Firstly, we decompose re-normalization into rescaling and reshift, uncovering that rescaling plays a crucial role in re-normalization while re-basin performance is sensitive to shifts in model activation. The finding calls for a more nuanced handling of the activation shift. Secondly, we identify that the merged model suffers from the issue of activation collapse and magnitude collapse. Varying the learning rate, weight decay, and initialization method can mitigate the issues and improve model performance. Lastly, we propose a new perspective to unify the re-basin and pruning, under which a lightweight yet effective post-pruning technique is derived, which can significantly improve the model performance after pruning. Our implementation is available at https://github.com/XingyuQu/rethink-re-basin.
Updated: 2024-07-09 09:23:25
Fields: cs.LG,cs.AI
PSPU: Enhanced Positive and Unlabeled Learning by Leveraging Pseudo Supervision
Positive and Unlabeled (PU) learning, a binary classification paradigm trained with only positive and unlabeled data, generally suffers from overfitted risk estimation due to inconsistent data distributions. To address this, we introduce a pseudo-supervised PU learning framework (PSPU), in which we first train the PU model, use it to gather confident samples for pseudo supervision, and then apply this supervision to correct the PU model's weights by leveraging non-PU objectives. We also incorporate an additional consistency loss to mitigate the effect of noisy samples. Our PSPU significantly outperforms recent PU learning methods on MNIST, CIFAR-10, and CIFAR-100 in both balanced and imbalanced settings, and enjoys competitive performance on MVTecAD for industrial anomaly detection.
Updated: 2024-07-09 09:19:01
Fields: cs.CV,cs.LG
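The confident-sample gathering step can be sketched simply (thresholds and helper names below are our own, not the paper's): keep only unlabeled samples on which the trained PU model is confident, and reuse them as pseudo-labeled data for a standard objective.

```python
def gather_confident(scores, low=0.2, high=0.8):
    # scores: PU-model positive probabilities for unlabeled samples.
    # Confident positives get label 1, confident negatives label 0;
    # ambiguous samples are left out of the pseudo-labeled set.
    pseudo = []
    for idx, s in enumerate(scores):
        if s >= high:
            pseudo.append((idx, 1))
        elif s <= low:
            pseudo.append((idx, 0))
    return pseudo

scores = [0.95, 0.5, 0.1, 0.85, 0.4]
print(gather_confident(scores))  # [(0, 1), (2, 0), (3, 1)]
```

The resulting pairs would then feed an ordinary supervised loss that corrects the PU model's weights, with the consistency loss damping the influence of mislabeled pseudo-samples.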
Certified Continual Learning for Neural Network Regression
On the one hand, there has been considerable progress on neural network verification in recent years, which makes certifying neural networks a possibility. On the other hand, neural networks in practice are often re-trained over time to cope with new data distribution or for solving different tasks (a.k.a. continual learning). Once re-trained, the verified correctness of the neural network is likely broken, particularly in the presence of the phenomenon known as catastrophic forgetting. In this work, we propose an approach called certified continual learning which improves existing continual learning methods by preserving, as long as possible, the established correctness properties of a verified network. Our approach is evaluated with multiple neural networks and on two different continual learning methods. The results show that our approach is efficient and the trained models preserve their certified correctness and often maintain high utility.
Updated: 2024-07-09 09:14:45
Fields: cs.LG
Deep-Motion-Net: GNN-based volumetric organ shape reconstruction from single-view 2D projections
We propose Deep-Motion-Net: an end-to-end graph neural network (GNN) architecture that enables 3D (volumetric) organ shape reconstruction from a single in-treatment kV planar X-ray image acquired at any arbitrary projection angle. Estimating and compensating for true anatomical motion during radiotherapy is essential for improving the delivery of planned radiation dose to target volumes while sparing organs-at-risk, and thereby improving the therapeutic ratio. Achieving this using only limited imaging available during irradiation and without the use of surrogate signals or invasive fiducial markers is attractive. The proposed model learns the mesh regression from a patient-specific template and deep features extracted from kV images at arbitrary projection angles. A 2D-CNN encoder extracts image features, and four feature pooling networks fuse these features to the 3D template organ mesh. A ResNet-based graph attention network then deforms the feature-encoded mesh. The model is trained using synthetically generated organ motion instances and corresponding kV images. The latter is generated by deforming a reference CT volume aligned with the template mesh, creating digitally reconstructed radiographs (DRRs) at required projection angles, and DRR-to-kV style transferring with a conditional CycleGAN model. The overall framework was tested quantitatively on synthetic respiratory motion scenarios and qualitatively on in-treatment images acquired over full scan series for liver cancer patients. Overall mean prediction errors for synthetic motion test datasets were 0.16$\pm$0.13 mm, 0.18$\pm$0.19 mm, 0.22$\pm$0.34 mm, and 0.12$\pm$0.11 mm. Mean peak prediction errors were 1.39 mm, 1.99 mm, 3.29 mm, and 1.16 mm.
Updated: 2024-07-09 09:07:18
Fields: cs.CV,cs.AI
Hierarchical Average-Reward Linearly-solvable Markov Decision Processes
We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows learning low-level and high-level tasks simultaneously, without imposing limiting restrictions on the low-level tasks. Our method relies on partitions of the state space that create smaller, easier-to-solve subtasks, and on equivalences between such partitions to learn more efficiently. We then exploit the compositionality of low-level tasks to exactly represent the value function of the high-level task. Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
Updated: 2024-07-09 09:06:44
Fields: cs.LG,cs.AI
A Test-Time Learning Approach to Reparameterize the Geophysical Inverse Problem with a Convolutional Neural Network
Regularization is critical for solving ill-posed geophysical inverse problems. Explicit regularization is often used, but there are opportunities to explore the implicit regularization effects that are inherent in a neural network structure. Researchers have discovered that the Convolutional Neural Network (CNN) architecture inherently enforces a regularization that is advantageous for addressing diverse inverse problems in computer vision, including de-noising and in-painting. In this study, we examine the applicability of this implicit regularization to geophysical inversions. The CNN maps an arbitrary vector to the model space. The predicted subsurface model is then fed into a forward numerical simulation to generate corresponding predicted measurements. Subsequently, the objective function value is computed by comparing these predicted measurements with the observed measurements. The backpropagation algorithm is employed to update the trainable parameters of the CNN during the inversion. Note that the CNN in our proposed method does not require training before the inversion; rather, the CNN weights are estimated during the inversion process, making this a test-time learning (TTL) approach. In this study, we choose to focus on the Direct Current (DC) resistivity inverse problem, which is representative of typical Tikhonov-style geophysical inversions (e.g. gravity, electromagnetic, etc.), to test our hypothesis. The experimental results demonstrate that the implicit regularization can be useful in some DC resistivity inversions. We also provide a discussion of the potential sources of this implicit regularization introduced by the CNN architecture and discuss some practical guides for applying the proposed method to other geophysical methods.
Updated: 2024-07-09 09:06:34
Fields: cs.LG,physics.geo-ph
EcoVal: An Efficient Data Valuation Framework for Machine Learning
Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley value based frameworks for data valuation in machine learning are computationally expensive as they require considerable amount of repeated training of the model to obtain the Shapley value. In this paper, we introduce an efficient data valuation framework EcoVal, to estimate the value of data for machine learning models in a fast and practical manner. Instead of directly working with individual data sample, we determine the value of a cluster of similar data points. This value is further propagated amongst all the member cluster points. We show that the overall value of the data can be determined by estimating the intrinsic and extrinsic value of each data. This is enabled by formulating the performance of a model as a production function, a concept which is popularly used to estimate the amount of output based on factors like labor and capital in a traditional free economic market. We provide a formal proof of our valuation technique and elucidate the principles and mechanisms that enable its accelerated performance. We demonstrate the real-world applicability of our method by showcasing its effectiveness for both in-distribution and out-of-sample data. This work addresses one of the core challenges of efficient data valuation at scale in machine learning models. The code is available at https://github.com/respai-lab/ecoval.
Updated: 2024-07-09 08:59:44
Fields: cs.LG
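The cluster-then-propagate step can be illustrated with a toy example (our own construction; EcoVal's actual valuation uses the production-function formulation, whereas the sketch below simply splits a cluster's value equally among its members):

```python
def propagate_cluster_values(cluster_values, assignments):
    # assignments: per-sample cluster id; each cluster's value is
    # shared equally among its member points
    counts = {}
    for cid in assignments:
        counts[cid] = counts.get(cid, 0) + 1
    return [cluster_values[cid] / counts[cid] for cid in assignments]

cluster_values = {0: 6.0, 1: 2.0}
assignments = [0, 0, 0, 1]  # three samples in cluster 0, one in cluster 1
print(propagate_cluster_values(cluster_values, assignments))  # [2.0, 2.0, 2.0, 2.0]
```

Valuing clusters instead of individual samples is what avoids the repeated model retraining that makes Shapley-based valuation expensive.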
A Predictive Model Based on Transformer with Statistical Feature Embedding in Manufacturing Sensor Dataset
In the manufacturing process, sensor data collected from equipment is crucial for building predictive models to manage processes and improve productivity. However, in the field, it is challenging to gather sufficient data to build robust models. This study proposes a novel predictive model based on the Transformer, utilizing statistical feature embedding and window positional encoding. Statistical features provide an effective representation of sensor data, and the embedding enables the Transformer to learn both time- and sensor-related information. Window positional encoding captures precise time details from the feature embedding. The model's performance is evaluated in two problems: fault detection and virtual metrology, showing superior results compared to baseline models. This improvement is attributed to the efficient use of parameters, which is particularly beneficial for sensor data that often has limited sample sizes. The results support the model's applicability across various manufacturing industries, demonstrating its potential for enhancing process management and yield.
Updated: 2024-07-09 08:59:27
Fields: cs.LG,cs.AI
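The statistical-feature step can be sketched as follows (the particular feature set is our choice, not necessarily the paper's): slice a sensor series into fixed windows and summarize each window with simple statistics, yielding the tokens the Transformer then embeds alongside window positional encodings.

```python
import statistics

def window_features(series, window):
    # non-overlapping windows, each summarized by basic statistics
    feats = []
    for start in range(0, len(series) - window + 1, window):
        w = series[start:start + window]
        feats.append({
            "mean": statistics.fmean(w),
            "std": statistics.pstdev(w),
            "min": min(w),
            "max": max(w),
        })
    return feats

sensor = [1.0, 3.0, 2.0, 2.0, 10.0, 6.0]
for f in window_features(sensor, 3):
    print(f)
```

Each feature dictionary becomes one input token; the window index supplies the positional information the model's window positional encoding captures.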
Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
Understanding road geometry is a critical component of the autonomous vehicle (AV) stack. While high-definition (HD) maps can readily provide such information, they suffer from high labeling and maintenance costs. Accordingly, many recent works have proposed methods for estimating HD maps online from sensor data. The vast majority of recent approaches encode multi-camera observations into an intermediate representation, e.g., a bird's eye view (BEV) grid, and produce vector map elements via a decoder. While this architecture is performant, it decimates much of the information encoded in the intermediate representation, preventing downstream tasks (e.g., behavior prediction) from leveraging them. In this work, we propose exposing the rich internal features of online map estimation methods and show how they enable more tightly integrating online mapping with trajectory forecasting. In doing so, we find that directly accessing internal BEV features yields up to 73% faster inference speeds and up to 29% more accurate predictions on the real-world nuScenes dataset.
Updated: 2024-07-09 08:59:27
Fields: cs.RO,cs.CV,cs.LG
Knowledge Graph Pruning for Recommendation
Recent years have witnessed the rise of knowledge-graph-based recommendation systems (KGRS), which enrich the representations of users, items, and entities with structural knowledge, yielding striking improvements. Nevertheless, their unaffordable computational cost still limits researchers from exploring more sophisticated models. We observe that the bottleneck for training efficiency arises from the knowledge graph, which is plagued by the well-known issue of knowledge explosion. Recently, some works have attempted to slim the inflated KG via summarization techniques. However, these summarized nodes may ignore the collaborative signals and deviate from the fact that nodes in a knowledge graph represent symbolic abstractions of real-world entities. To this end, in this paper, we propose a novel approach called KGTrimmer for knowledge graph pruning tailored for recommendation, which removes inessential nodes while minimizing performance degradation. Specifically, we design an importance evaluator from a dual-view perspective. For the collective view, we embrace the idea of collective intelligence by extracting community consensus from abundant collaborative signals, i.e., nodes are considered important if they attract the attention of numerous users. For the holistic view, we learn a global mask to identify valueless nodes based on their inherent properties or overall popularity. Next, we build an end-to-end importance-aware graph neural network, which injects filtered knowledge to enhance the distillation of valuable user-item collaborative signals. Ultimately, we generate a pruned knowledge graph that is lightweight, stable, and robust, to facilitate the follow-up recommendation task. Extensive experiments on three publicly available datasets prove the effectiveness and generalization ability of KGTrimmer.
Updated: 2024-07-09 08:57:52
Fields: cs.IR,cs.AI
Games played by Exponential Weights Algorithms
This paper studies the last-iterate convergence properties of the exponential weights algorithm with constant learning rates. We consider a repeated interaction in discrete time, where each player uses an exponential weights algorithm characterized by an initial mixed action and a fixed learning rate, so that the mixed action profile $p^t$ played at stage $t$ follows a homogeneous Markov chain. First, we show that whenever a strict Nash equilibrium exists, the probability of playing a strict Nash equilibrium at the next stage converges almost surely to 0 or 1. Second, we show that the limit of $p^t$, whenever it exists, belongs to the set of ``Nash equilibria with equalizing payoffs''. Third, we show that in strong coordination games, where the payoff of a player is positive on the diagonal and 0 elsewhere, $p^t$ converges almost surely to one of the strict Nash equilibria. We conclude with open questions.
Updated: 2024-07-09 08:49:51
Fields: cs.AI,math.PR
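The dynamics studied in the abstract can be simulated directly. The sketch below (our own, deterministic expected-payoff variant rather than the paper's realized-action chain) runs constant-rate exponential weights for both players of a 2x2 strong coordination game; a small initial asymmetry is enough for play to lock onto a strict Nash equilibrium.

```python
import math

def exp_weights_dynamics(payoff, w1, w2, eta=0.5, steps=200):
    # constant-learning-rate exponential weights for both players,
    # updating on expected payoffs against the opponent's mixed action
    for _ in range(steps):
        p1 = [w / sum(w1) for w in w1]
        p2 = [w / sum(w2) for w in w2]
        u1 = [sum(p2[b] * payoff[a][b] for b in range(2)) for a in range(2)]
        u2 = [sum(p1[a] * payoff[a][b] for a in range(2)) for b in range(2)]
        w1 = [w * math.exp(eta * u) for w, u in zip(w1, u1)]
        w2 = [w * math.exp(eta * u) for w, u in zip(w2, u2)]
        m = max(w1 + w2)  # renormalize to avoid overflow; ratios unchanged
        w1 = [w / m for w in w1]
        w2 = [w / m for w in w2]
    return [w / sum(w1) for w in w1], [w / sum(w2) for w in w2]

coord = [[1, 0], [0, 1]]  # positive payoff on the diagonal, 0 elsewhere
p1, p2 = exp_weights_dynamics(coord, [1.2, 1.0], [1.0, 1.0])
print(p1[0] > 0.99 and p2[0] > 0.99)  # True: play locks onto equilibrium (0, 0)
```

With exactly symmetric initial weights this deterministic variant would stay at the mixed point; in the paper's stochastic setting, realized actions break such ties and convergence to a strict equilibrium is almost sure.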
On the Limitation and Experience Replay for GNNs in Continual Learning
Continual learning seeks to empower models to progressively acquire information from a sequence of tasks. This approach is crucial for many real-world systems, which are dynamic and evolve over time. Recent research has witnessed a surge in the exploration of Graph Neural Networks (GNN) in Node-wise Graph Continual Learning (NGCL), a practical yet challenging paradigm involving the continual training of a GNN on node-related tasks. Despite recent advancements in continual learning strategies for GNNs in NGCL, a thorough theoretical understanding, especially regarding its learnability, is lacking. Learnability concerns the existence of a learning algorithm that can produce a good candidate model from the hypothesis/weight space, which is crucial for model selection in NGCL development. This paper introduces the first theoretical exploration of the learnability of GNN in NGCL, revealing that learnability is heavily influenced by structural shifts due to the interconnected nature of graph data. Specifically, GNNs may not be viable for NGCL under significant structural changes, emphasizing the need to manage structural shifts. To mitigate the impact of structural shifts, we propose a novel experience replay method termed Structure-Evolution-Aware Experience Replay (SEA-ER). SEA-ER features an innovative experience selection strategy that capitalizes on the topological awareness of GNNs, alongside a unique replay strategy that employs structural alignment to effectively counter catastrophic forgetting and diminish the impact of structural shifts on GNNs in NGCL. Our extensive experiments validate our theoretical insights and the effectiveness of SEA-ER.
Updated: 2024-07-09 08:34:43
Fields: cs.LG
Collaborative Design of AI-Enhanced Learning Activities
Artificial intelligence has accelerated innovations in many aspects of citizens' lives. Technology-enhanced learning has already been addressed in many contexts, but educators at different educational levels now need to develop AI literacy and the ability to integrate appropriate AI usage into their teaching. With this objective in mind, and drawing on creative learning design, we created a formative intervention that enables preservice teachers, in-service teachers, and EdTech specialists to effectively incorporate AI into their teaching practices. We developed the formative intervention with Terra Numerica and Maison de l'Intelligence Artificielle in two phases, in order to enhance participants' understanding of AI and foster its creative application in learning design. Participants reflect on AI's potential in teaching and learning by exploring different activities that can integrate AI literacy into education, including its ethical considerations and potential for innovative pedagogy. The approach emphasises not only acculturating professionals to AI but also empowering them to collaboratively design AI-enhanced educational activities that promote learner engagement and personalised learning experiences. Through this process, workshop participants develop the skills and mindset necessary to effectively leverage AI while maintaining a critical awareness of its implications in education.
Updated: 2024-07-09 08:34:08
Fields: cs.AI
Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers
In this paper, we propose a method to improve the prediction accuracy of semantic segmentation methods as follows: (1) construct a neural network that has pre-processing layers based on a convolutional autoencoder ahead of a semantic segmentation network, and (2) train the entire network initialized by the weights of the pre-trained autoencoder. We applied this method to the fully convolutional network (FCN) and experimentally compared its prediction accuracy on the Cityscapes dataset. The mean IoU of the proposed target model with He normal initialization is 18.7% higher than that of FCN with He normal initialization. In addition, those of the modified versions of the target model are significantly higher than that of FCN with He normal initialization. The accuracy and loss curves during training show that these gains result from improved generalization ability. All of these results provide strong evidence that the proposed method is significantly effective in improving the prediction accuracy of FCN. The proposed method has the following features: it is comparatively simple, yet its effect on the generalization ability and prediction accuracy of FCN is significant; it adds very few parameters, although it increases computation time substantially. In principle, the proposed method can be applied to other semantic segmentation methods. At present, there is no established way to improve the prediction accuracy of existing semantic segmentation methods; to our knowledge, no method the same as or similar to ours has been published or used in practice. Therefore, we believe that our method is useful in practice and worthy of being widely known and used.
Updated: 2024-07-09 08:33:59
Fields: cs.CV,cs.LG
TriQXNet: Forecasting Dst Index from Solar Wind Data Using an Interpretable Parallel Classical-Quantum Framework with Uncertainty Quantification
Geomagnetic storms, caused by solar wind energy transfer to Earth's magnetic field, can disrupt critical infrastructure like GPS, satellite communications, and power grids. The disturbance storm-time (Dst) index measures storm intensity. Despite advancements in empirical, physics-based, and machine-learning models using real-time solar wind data, accurately forecasting extreme geomagnetic events remains challenging due to noise and sensor failures. This research introduces TriQXNet, a novel hybrid classical-quantum neural network for Dst forecasting. Our model integrates classical and quantum computing, conformal prediction, and explainable AI (XAI) within a hybrid architecture. To ensure high-quality input data, we developed a comprehensive preprocessing pipeline that included feature selection, normalization, aggregation, and imputation. TriQXNet processes preprocessed solar wind data from NASA's ACE and NOAA's DSCOVR satellites, predicting the Dst index for the current hour and the next, providing vital advance notice to mitigate geomagnetic storm impacts. TriQXNet outperforms 13 state-of-the-art hybrid deep-learning models, achieving a root mean squared error of 9.27 nanoteslas (nT). Rigorous evaluation through 10-fold cross-validated paired t-tests confirmed its superior performance with 95% confidence. Conformal prediction techniques provide quantifiable uncertainty, which is essential for operational decisions, while XAI methods like ShapTime enhance interpretability. Comparative analysis shows TriQXNet's superior forecasting accuracy, setting a new level of expectations for geomagnetic storm prediction and highlighting the potential of classical-quantum hybrid models in space weather forecasting.
Updated: 2024-07-09 08:30:42
Categories: cs.AI
Kermut: Composite kernel regression for protein variant effects
Reliable prediction of protein variant effects is crucial for both protein optimization and for advancing biological understanding. For practical use in protein engineering, it is important that we can also provide reliable uncertainty estimates for our predictions, and while prediction accuracy has seen much progress in recent years, uncertainty metrics are rarely reported. We here provide a Gaussian process regression model, Kermut, with a novel composite kernel for modelling mutation similarity, which obtains state-of-the-art performance for protein variant effect prediction while also offering estimates of uncertainty through its posterior. An analysis of the quality of the uncertainty estimates demonstrates that our model provides meaningful levels of overall calibration, but that instance-specific uncertainty calibration remains more challenging. We hope that this will encourage future work in this promising direction.
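To make the "composite kernel plus posterior uncertainty" idea concrete, here is a hand-rolled Gaussian process sketch whose kernel is just a weighted sum of an RBF and a linear kernel on toy 1-D data; Kermut's actual mutation-similarity kernel involves structure-informed terms not reproduced here:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel between two 1-D point sets.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / ls**2)

def linear(a, b):
    return np.outer(a, b)

def composite(a, b, w=(0.7, 0.3)):
    # Composite kernel: weighted sum of base kernels.
    return w[0] * rbf(a, b) + w[1] * linear(a, b)

# Toy 1-D "variant effect" data.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([0.5])

noise = 1e-4
K = composite(x_train, x_train) + noise * np.eye(len(x_train))
K_s = composite(x_train, x_test)
K_ss = composite(x_test, x_test)

# Standard GP posterior: mean prediction and uncertainty in one pass.
alpha_vec = np.linalg.solve(K, y_train)
mean = K_s.T @ alpha_vec
var = K_ss - K_s.T @ np.linalg.solve(K, K_s)
std = np.sqrt(np.clip(np.diag(var), 0.0, None))
```

The posterior standard deviation is what a calibration analysis like the one in the abstract would compare against observed errors.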
Updated: 2024-07-09 08:28:57
Categories: q-bio.BM,cs.LG
Teacher agency in the age of generative AI: towards a framework of hybrid intelligence for learning design
Generative AI (genAI) is being used in education for different purposes. From the teachers' perspective, genAI can support activities such as learning design. However, there is a need to study the impact of genAI on teachers' agency. While genAI can support certain processes of idea generation and co-creation, it has the potential to negatively affect professional agency due to teachers' limited power to (i) act, (ii) affect matters, and (iii) make decisions or choices, as well as the possibility to (iv) take a stance. Agency is identified in learning sciences studies as one of the factors in teachers' ability to trust AI. This paper aims to introduce a dual perspective. First, educational technology, as opposed to other computer-mediated communication (CMC) tools, has two distinctly different user groups, with different user needs, to cater for: learners and teachers. Second, the design of educational technology often prioritises learner agency and engagement, thereby limiting the opportunities for teachers to influence the technology and take action. This study aims to analyse the way genAI is influencing teachers' agency. After identifying the current limits of genAI, a solution based on the combination of human intelligence and artificial intelligence through a hybrid intelligence (HI) approach is proposed. This combination opens up the discussion of a teacher-genAI collaboration able to open up new practices in learning design, in which HI supports the extension of the teachers' activity.
Updated: 2024-07-09 08:28:05
Categories: cs.AI,cs.CY
SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training
The effectiveness of large language models (LLMs) is often hindered by duplicated data in their extensive pre-training datasets. Current approaches primarily focus on detecting and removing duplicates, which risks the loss of valuable information and neglects the varying degrees of duplication. To address this, we propose a soft deduplication method that maintains dataset integrity while selectively reducing the sampling weight of data with high commonness. Central to our approach is the concept of "data commonness", a metric we introduce to quantify the degree of duplication by measuring the occurrence probabilities of samples using an n-gram model. Empirical analysis shows that this method significantly improves training efficiency, achieving comparable perplexity scores with at least a 26% reduction in required training steps. Additionally, it enhances average few-shot downstream accuracy by 1.77% when trained for an equivalent duration. Importantly, this approach consistently improves performance, even on rigorously deduplicated datasets, indicating its potential to complement existing methods and become a standard pre-training process for LLMs.
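The "data commonness" metric can be sketched with a unigram stand-in for the paper's n-gram model; the toy corpus, the exponential weighting, and the normalization below are illustrative choices of ours, not the authors' exact scheme:

```python
from collections import Counter
import math

corpus = [
    "the cat sat on the mat",
    "the cat sat on the mat",      # an exact duplicate: high commonness
    "a quantum of solace",
    "the dog chased the cat",
]

# Fit a unigram model over the whole corpus (the paper uses n-grams).
counts = Counter(tok for doc in corpus for tok in doc.split())
total = sum(counts.values())

def commonness(doc):
    # Average log-probability per token: higher means more redundant.
    toks = doc.split()
    return sum(math.log(counts[t] / total) for t in toks) / len(toks)

# Soft deduplication: the sampling weight shrinks as commonness grows,
# instead of hard-removing duplicated samples from the dataset.
scores = [commonness(doc) for doc in corpus]
exps = [math.exp(-s) for s in scores]
weights = [e / sum(exps) for e in exps]
```

The duplicated sample keeps a nonzero weight (so no information is discarded outright) but is sampled less often than the rarer documents.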
Updated: 2024-07-09 08:26:39
Categories: cs.CL,cs.AI
Future You: A Conversation with an AI-Generated Future Self Reduces Anxiety, Negative Emotions, and Increases Future Self-Continuity
We introduce "Future You," an interactive, brief, single-session, digital chat intervention designed to improve future self-continuity--the degree of connection an individual feels with a temporally distant future self--a characteristic that is positively related to mental health and wellbeing. Our system allows users to chat with a relatable yet AI-powered virtual version of their future selves that is tuned to their future goals and personal qualities. To make the conversation realistic, the system generates a "synthetic memory"--a unique backstory for each user--that creates a throughline between the user's present age (between 18-30) and their life at age 60. The "Future You" character also adopts the persona of an age-progressed image of the user's present self. After a brief interaction with the "Future You" character, users reported decreased anxiety, and increased future self-continuity. This is the first study successfully demonstrating the use of personalized AI-generated characters to improve users' future self-continuity and wellbeing.
Updated: 2024-07-09 08:24:10
Categories: cs.HC,cs.AI
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
Reinforcement learning with human feedback (RLHF), as a widely adopted approach in current large language model pipelines, is bottlenecked by the size of human preference data. While traditional methods rely on offline preference dataset constructions, recent approaches have shifted towards online settings, where a learner uses a small amount of labeled seed data and a large pool of unlabeled prompts to iteratively construct new preference data through self-generated responses and high-quality reward/preference feedback. However, most current online algorithms still focus on preference labeling during policy model updating with given feedback oracles, which incurs significant expert query costs. We are the first to explore cost-effective proxy reward oracle construction strategies for further labeling preferences or rewards with extremely limited labeled data and expert query budgets. Our approach introduces two key innovations: (1) on-policy queries to avoid OOD and imbalance issues in seed data, and (2) active learning to select the most informative data for preference queries. Using these methods, we train an evaluation model with minimal expert-labeled data, which then effectively labels nine times more preference pairs for further RLHF training. For instance, our model using Direct Preference Optimization (DPO) gains over 1% average improvement on AlpacaEval2, MMLU-5shot and MMLU-0shot, with only a 1.7K query cost. Our methodology is orthogonal to other direct expert-query-based strategies and therefore might be integrated with them to further reduce query costs.
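The active-learning step — querying labels only for the preference pairs the proxy model is least sure about — reduces, in its simplest form, to uncertainty sampling; the random "predictions" and the budget below are placeholders of ours, not the paper's actual acquisition function:

```python
import numpy as np

rng = np.random.default_rng(4)

# Proxy reward model's predicted probability that response A is preferred,
# for 20 unlabeled candidate pairs (randomly faked here).
probs = rng.uniform(size=20)

# Uncertainty sampling: pairs with probability closest to 0.5 are the
# most informative ones to send to the expert annotator.
uncertainty = -np.abs(probs - 0.5)
budget = 5
query_idx = np.argsort(uncertainty)[-budget:]
```

Only the `budget` selected pairs consume expert queries; the rest are left for the proxy oracle to label.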
Updated: 2024-07-09 08:24:06
Categories: cs.LG,cs.AI,cs.CL
SEBA: Strong Evaluation of Biometric Anonymizations
Biometric data is pervasively captured and analyzed. Using modern machine learning approaches, identity and attribute inference attacks have proven highly accurate. Anonymizations aim to mitigate such disclosures by modifying data in a way that prevents identification. However, the effectiveness of some anonymizations is unclear. Therefore, improvements to the corresponding evaluation methodology have been proposed recently. In this paper, we introduce SEBA, a framework for strong evaluation of biometric anonymizations. It combines and implements the state-of-the-art methodology in an easy-to-use and easy-to-expand software framework. This allows anonymization designers to easily test their techniques using a strong evaluation methodology. As part of this discourse, we introduce and discuss new metrics that allow for a more straightforward evaluation of the privacy-utility trade-off that is inherent to anonymization attempts. Finally, we report on a prototypical experiment to demonstrate SEBA's applicability.
Updated: 2024-07-09 08:20:03
Categories: cs.CR,cs.CV
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series
Temporally indexed data are essential in a wide range of fields and of interest to machine learning researchers. Time series data, however, are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations and the application of existing and new data-intensive ML methods. A possible solution to this bottleneck is to generate synthetic data. In this work, we introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series. TSGM includes a broad repertoire of machine learning methods: generative models, probabilistic, and simulator-based approaches. The framework enables users to evaluate the quality of the produced data from different angles: similarity, downstream effectiveness, predictive consistency, diversity, and privacy. The framework is extensible, which allows researchers to rapidly implement their own methods and compare them in a shareable environment. TSGM was tested on open datasets and in production and proved to be beneficial in both cases. In addition to the library, the project allows users to employ command-line interfaces for synthetic data generation, which lowers the entry threshold for those without a programming background.
Updated: 2024-07-09 08:19:23
Categories: cs.LG,stat.ML
Variational Learning ISTA
Compressed sensing combines the power of convex optimization techniques with a sparsity-inducing prior on the signal space to solve an underdetermined system of equations. For many problems, the sparsifying dictionary is not directly given, nor can its existence be assumed. Besides, the sensing matrix can change across different scenarios. Addressing these issues requires solving a sparse representation learning problem, namely dictionary learning, taking into account the epistemic uncertainty of the learned dictionaries and, finally, jointly learning sparse representations and reconstructions under varying sensing matrix conditions. We address both concerns by proposing a variant of the LISTA architecture. First, we introduce Augmented Dictionary Learning ISTA (A-DLISTA), which incorporates an augmentation module to adapt parameters to the current measurement setup. Then, we propose to learn a distribution over dictionaries via a variational approach, dubbed Variational Learning ISTA (VLISTA). VLISTA exploits A-DLISTA as the likelihood model and approximates a posterior distribution over the dictionaries as part of an unfolded LISTA-based recovery algorithm. As a result, VLISTA provides a probabilistic way to jointly learn the dictionary distribution and the reconstruction algorithm with varying sensing matrices. We provide theoretical and experimental support for our architecture and show that our model learns calibrated uncertainties.
Updated: 2024-07-09 08:17:06
Categories: cs.LG,eess.SP,stat.ML
Entropy Law: The Story Behind Data Compression and LLM Performance
Data is the cornerstone of large language models (LLMs), but not all data is useful for model learning. Carefully selected data can better elicit the capabilities of LLMs with much less computational overhead. Most methods concentrate on evaluating the quality of individual samples in data selection, while the combinatorial effects among samples are neglected. Even if each sample is of perfect quality, their combinations may be suboptimal in teaching LLMs due to their intrinsic homogeneity or contradiction. In this paper, we aim to uncover the underlying relationships between LLM performance and data selection. Inspired by the information compression nature of LLMs, we uncover an "entropy law" that connects LLM performance with data compression ratio and first-epoch training loss, which reflect the information redundancy of a dataset and the mastery of inherent knowledge encoded in this dataset, respectively. Through both theoretical deduction and empirical evaluation, we find that model performance is negatively correlated with the compression ratio of training data, which usually yields a lower training loss. Based on the findings of the entropy law, we propose a quite efficient and universal data selection method named ZIP for training LLMs, which aims to prioritize data subsets exhibiting a low compression ratio. Based on a multi-stage algorithm that selects diverse data in a greedy manner, we can obtain a good data subset with satisfactory diversity. Extensive experiments have been conducted to validate the entropy law and the superiority of ZIP across different LLM backbones and alignment stages. We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
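A crude proxy for the compression-ratio signal can be obtained with an off-the-shelf compressor; the toy pool and greedy loop below are a sketch in the spirit of ZIP, not the authors' multi-stage algorithm:

```python
import zlib

def ratio(texts):
    # Compressed size over raw size: lower means more redundancy.
    raw = "\n".join(texts).encode()
    return len(zlib.compress(raw)) / len(raw)

pool = [
    "gradient descent updates parameters along the negative gradient",
    "gradient descent updates parameters along the negative gradient",
    "attention weighs token interactions by query-key similarity",
    "mixture-of-experts routes tokens to specialized subnetworks",
]

# Greedy selection: repeatedly add the candidate that keeps the selected
# pool least compressible, i.e. contributes the most new information.
selected = [pool[0]]
candidates = pool[1:]
for _ in range(2):
    best = max(candidates, key=lambda c: ratio(selected + [c]))
    selected.append(best)
    candidates.remove(best)
```

The exact duplicate compresses away almost entirely against the already-selected text, so the greedy loop prefers the two novel samples.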
Updated: 2024-07-09 08:14:29
Categories: cs.LG,cs.CL
Early Detection of Network Service Degradation: An Intra-Flow Approach
This research presents a novel method for predicting service degradation (SD) in computer networks by leveraging early flow features. Our approach focuses on the observable (O) segments of network flows, particularly analyzing Packet Inter-Arrival Time (PIAT) values and other derived metrics, to infer the behavior of non-observable (NO) segments. Through a comprehensive evaluation, we identify an optimal O/NO split threshold of 10 observed delay samples, balancing prediction accuracy and resource utilization. Evaluating models including Logistic Regression, XGBoost, and Multi-Layer Perceptron, we find XGBoost outperforms others, achieving an F1-score of 0.74, balanced accuracy of 0.84, and AUROC of 0.97. Our findings highlight the effectiveness of incorporating comprehensive early flow features and the potential of our method to offer a practical solution for monitoring network traffic in resource-constrained environments. This approach ensures enhanced user experience and network performance by preemptively addressing potential SD, providing the basis for a robust framework for maintaining high-quality network services.
Updated: 2024-07-09 08:05:14
Categories: cs.NI,cs.LG
Reasoning about unpredicted change and explicit time
Reasoning about unpredicted change consists in explaining observations by events; here we propose an approach for explaining time-stamped observations by surprises, which are simple events consisting in the change of the truth value of a fluent. A framework for dealing with surprises is defined. Minimal sets of surprises are provided together with the time intervals in which each surprise occurred, and they are characterized from a model-based diagnosis point of view. Finally, a probabilistic approach to surprise minimisation is proposed.
Updated: 2024-07-09 07:49:57
Categories: cs.AI
Localisation of Regularised and Multiview Support Vector Machine Learning
We prove a few representer theorems for a localised version of the regularised and multiview support vector machine learning problem introduced by H.Q. Minh, L. Bazzani, and V. Murino, Journal of Machine Learning Research, 17(2016) 1-72, that involves operator valued positive semidefinite kernels and their reproducing kernel Hilbert spaces. The results concern general cases when convex or nonconvex loss functions and finite or infinite dimensional input spaces are considered. We show that the general framework allows infinite dimensional input spaces and nonconvex loss functions for some special cases, in particular in case the loss functions are Gateaux differentiable. Detailed calculations are provided for the exponential least square loss function that lead to partially nonlinear equations for which a particular unconstrained potential reduction Newton's approximation method can be used.
Updated: 2024-07-09 07:43:12
Categories: math.FA,cs.LG (MSC: Primary 68T05, Secondary 46E22, 46G05)
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Recent years have witnessed a trend in which large language model (LLM) based text-to-speech (TTS) has moved into the mainstream owing to its high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder into waveforms. Obviously, speech tokens play a critical role in LLM-based TTS models. Current speech tokens are learned in an unsupervised manner, which lacks explicit semantic information and alignment to the text. In this paper, we propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into the encoder. Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis. Experimental results show that supervised semantic tokens significantly outperform existing unsupervised tokens in terms of content consistency and speaker similarity for zero-shot voice cloning. Moreover, we find that utilizing large-scale data further improves the synthesis performance, indicating the scalable capacity of CosyVoice. To the best of our knowledge, this is the first attempt to incorporate supervised speech tokens into TTS models.
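The "inserting vector quantization into the encoder" step boils down to snapping each encoder frame to its nearest codebook entry; a schematic numpy version with made-up dimensions (not CosyVoice's actual codebook size or feature dimension) is:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-ins for a learned codebook and ASR-encoder output frames.
codebook = rng.normal(size=(16, 8))   # 16 codes of dimension 8
frames = rng.normal(size=(10, 8))     # 10 encoder frames

# Nearest-neighbour assignment yields the discrete speech tokens.
d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = d.argmin(axis=1)
quantized = codebook[tokens]          # values passed downstream
```

The integer `tokens` sequence is what the text-to-token LLM models, and `quantized` is what the downstream synthesis path consumes.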
Updated: 2024-07-09 07:42:51
Categories: cs.SD,cs.AI,eess.AS
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experiment settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controlling ability and audio quality. Audio samples are available at http://prompt-singer.github.io .
Updated: 2024-07-09 07:40:52
Categories: cs.SD,cs.AI,cs.LG,eess.AS
AI-based Automatic Segmentation of Prostate on Multi-modality Images: A Review
Prostate cancer represents a major threat to health. Early detection is vital in reducing the mortality rate among prostate cancer patients. One approach involves using multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging due to imperfections in the images and the prostate's complex tissue structure. The advent of precision medicine and a significant increase in clinical capacity have spurred the need for various data-driven tasks in the field of medical imaging. Recently, numerous machine learning and data mining tools have been integrated into various medical areas, including image segmentation. This article proposes a new classification method that differentiates supervision types, either in number or kind, during the training phase. Subsequently, we conducted a survey on artificial intelligence (AI)-based automatic prostate segmentation methods, examining the advantages and limitations of each. Additionally, we introduce variants of evaluation metrics for the verification and performance assessment of the segmentation method and summarize the current challenges. Finally, future research directions and development trends are discussed, reflecting the outcomes of our literature survey, suggesting high-precision detection and treatment of prostate cancer as a promising avenue.
Updated: 2024-07-09 07:36:18
Categories: eess.IV,cs.CV,cs.LG
CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding
We present CEIA, an effective framework for open-world event-based understanding. Currently, training a large event-text model still poses a huge challenge due to the shortage of paired event-text data. In response to this challenge, CEIA learns to align event and image data as an alternative instead of directly aligning event and text data. Specifically, we leverage the rich event-image datasets to learn an event embedding space aligned with the image space of CLIP through contrastive learning. In this way, event and text data are naturally aligned via using image data as a bridge. In particular, CEIA offers two distinct advantages. First, it allows us to take full advantage of the existing event-image datasets to make up for the shortage of large-scale event-text datasets. Second, leveraging more training data, it also exhibits the flexibility to boost performance, ensuring scalable capability. To highlight the versatility of our framework, we conduct extensive evaluations across a diverse range of event-based multi-modal applications, such as object recognition, event-image retrieval, event-text retrieval, and domain adaptation. The outcomes demonstrate CEIA's distinct zero-shot superiority over existing methods on these applications.
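Aligning event embeddings with CLIP's image space via contrastive learning follows the standard symmetric InfoNCE pattern; the batch size, dimensionality, and the near-aligned fake embeddings below are illustrative assumptions of ours, not CEIA's actual encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

batch, dim = 8, 32
# Stand-ins for a frozen CLIP image encoder and a trainable event encoder
# (here faked as a noisy copy, i.e. already nearly aligned).
img_emb = l2norm(rng.normal(size=(batch, dim)))
evt_emb = l2norm(img_emb + 0.1 * rng.normal(size=(batch, dim)))

tau = 0.07
logits = evt_emb @ img_emb.T / tau   # pairwise cosine similarities
targets = np.arange(batch)           # i-th event matches i-th image

def xent(lg):
    # Cross-entropy against the diagonal matches.
    lg = lg - lg.max(axis=1, keepdims=True)
    logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(lg)), targets].mean()

# Symmetric InfoNCE loss, as in CLIP-style training.
loss = 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each event embedding toward its paired image embedding and pushes it away from the other images in the batch, which is what lets image data act as the bridge to text.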
Updated: 2024-07-09 07:26:15
Categories: cs.CV,cs.AI
Iteratively Refined Image Reconstruction with Learned Attentive Regularizers
We propose a regularization scheme for image reconstruction that leverages the power of deep learning while hinging on classic sparsity-promoting models. Many deep-learning-based models are hard to interpret and cumbersome to analyze theoretically. In contrast, our scheme is interpretable because it corresponds to the minimization of a series of convex problems. For each problem in the series, a mask is generated based on the previous solution to refine the regularization strength spatially. In this way, the model becomes progressively attentive to the image structure. For the underlying update operator, we prove the existence of a fixed point. As a special case, we investigate a mask generator for which the fixed-point iterations converge to a critical point of an explicit energy functional. In our experiments, we match the performance of state-of-the-art learned variational models for the solution of inverse problems. Additionally, we offer a promising balance between interpretability, theoretical guarantees, reliability, and performance.
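The spatially varying regularization can be pictured as reweighted soft-thresholding on a 1-D denoising toy, where each pass relaxes the penalty wherever the previous solution found structure; the mask update rule below is a hand-picked stand-in for the paper's learned mask generator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Piecewise-constant signal observed with noise.
x_true = np.zeros(100)
x_true[40:60] = 1.0
y = x_true + 0.1 * rng.normal(size=100)

lam = 0.3
mask = np.ones_like(y)   # initial uniform regularization strength
x = np.zeros_like(y)
for _ in range(5):
    # Weighted soft-thresholding: strong penalty where the mask is high.
    t = lam * mask
    x = np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
    # "Attention": lower the penalty where the solution shows structure.
    mask = 1.0 / (1.0 + 5.0 * np.abs(x))

err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

Each inner step is the solution of a convex weighted-l1 problem, which mirrors why the full scheme remains interpretable as a sequence of convex minimizations.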
Updated: 2024-07-09 07:22:48
Categories: eess.IV,cs.CV,cs.LG
Reliable Projection Based Unsupervised Learning for Semi-Definite QCQP with Application of Beamforming Optimization
In this paper, we investigate a special class of quadratically constrained quadratic programming (QCQP) with semi-definite constraints. Traditionally, since such a problem is non-convex and NP-hard, the neural network (NN) is regarded as a promising method to obtain a high-performing solution. However, due to the inherent prediction error, it is challenging to ensure that every solution output by the NN is feasible. Although some existing works propose naive remedies, they only focus on reducing the constraint-violation probability, so not all solutions are guaranteed to be feasible. To deal with the above challenge, this paper proposes a computationally efficient and reliable projection under which every solution output by the NN is ensured to be feasible. Moreover, unsupervised learning is used, so the NN can be trained effectively and efficiently without labels. Theoretically, the solution of the NN after projection is proven to be feasible, and we also prove that the projection method can enhance the convergence performance and speed of the NN. To evaluate our proposed method, a quality-of-service (QoS) constrained beamforming scenario is studied, where the simulation results show that the proposed method can achieve high performance competitive with the lower bound.
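The key idea — projecting every NN output onto the feasible set after the fact — is easiest to see for a simple convex constraint such as a power budget; projecting onto the paper's semi-definite constraints is more involved, so this is only a schematic analogue:

```python
import numpy as np

def project_power(w, p_max):
    # Project a beamforming-style vector onto the ball ||w||^2 <= p_max:
    # feasible points are untouched, infeasible ones are rescaled.
    p = np.vdot(w, w).real
    return w if p <= p_max else w * np.sqrt(p_max / p)

# Pretend NN outputs, one of which violates the power constraint.
raw = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
p_max = 1.0
safe = [project_power(w, p_max) for w in raw]
```

Because the projection is a deterministic post-processing layer, feasibility holds by construction regardless of the NN's prediction error, which is the guarantee the abstract claims for its (more complex) semi-definite case.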
Updated: 2024-07-09 07:22:42
领域: cs.LG,cs.SY,eess.SY
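The core idea of the entry above, "project the NN output so feasibility holds by construction", can be illustrated with a deliberately simplified toy. The sketch below projects a hypothetical NN-predicted beamforming vector onto a single power-budget constraint; the paper's actual projection handles general semi-definite QCQP constraints and is not reproduced here.

```python
import math

def project_to_power_budget(w, p_max):
    """Euclidean projection of a (hypothetical) NN-predicted beamforming
    vector onto the feasible set {w : ||w||^2 <= p_max}.

    A prediction inside the budget is returned unchanged; a violating
    prediction is scaled onto the constraint boundary, the closed-form
    projection onto a norm ball. Either way, the returned vector is
    feasible by construction.
    """
    power = sum(x * x for x in w)
    if power <= p_max:
        return list(w)
    scale = math.sqrt(p_max / power)
    return [x * scale for x in w]

# A raw prediction that violates the power budget gets scaled back
# onto the constraint boundary:
feasible = project_to_power_budget([3.0, 4.0], p_max=1.0)
```

Because the projection is a closed-form post-processing step, it can be applied to every NN output at inference time, which is what makes the feasibility guarantee unconditional rather than probabilistic.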
Robustness and Exploration of Variational and Machine Learning Approaches to Inverse Problems: An Overview
This paper provides an overview of current approaches for solving inverse problems in imaging using variational methods and machine learning. A special focus lies on point estimators and their robustness against adversarial perturbations. In this context results of numerical experiments for a one-dimensional toy problem are provided, showing the robustness of different approaches and empirically verifying theoretical guarantees. Another focus of this review is the exploration of the subspace of data-consistent solutions through explicit guidance to satisfy specific semantic or textural properties.
Updated: 2024-07-09 07:13:56
标题: 抗干扰性和探索性的变分和机器学习方法在逆问题中的应用:概述
摘要: 本文概述了使用变分方法和机器学习解决成像反问题的当前方法。特别关注点估计器及其针对对抗性扰动的稳健性。在这个背景下,提供了针对一维玩具问题的数值实验结果,展示了不同方法的稳健性,并在经验上验证了理论保证。本文评述的另一个重点是通过明确引导来探索数据一致解空间,以满足特定语义或纹理属性。
更新时间: 2024-07-09 07:13:56
领域: eess.IV,cs.CV,cs.LG,cs.NA,math.NA
Solving General Natural-Language-Description Optimization Problems with Large Language Models
Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming codes, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of the OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).
Updated: 2024-07-09 07:11:10
标题: 用大型语言模型解决通用自然语言描述优化问题
摘要: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming codes, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of the OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).
更新时间: 2024-07-09 07:11:10
领域: math.OC,cs.AI,cs.CL,cs.LG
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.
Updated: 2024-07-09 07:08:30
标题: FunAudioLLM:用于人类与LLM之间自然交互的语音理解和生成基础模型
摘要: 这份报告介绍了FunAudioLLM,这是一个旨在增强人类与大型语言模型(LLM)之间自然语音交互的模型系列。其核心包括两个创新模型:SenseVoice,负责处理多语言语音识别、情感识别和音频事件检测;以及CosyVoice,用于实现对多种语言、音色、说话风格和说话人身份的自然语音生成控制。 SenseVoice-Small能够提供5种语言的极低延迟自动语音识别(ASR),SenseVoice-Large支持50多种语言的高精度ASR,而CosyVoice在多语言语音生成、零样本上下文学习、跨语言语音克隆和遵循指令等方面表现出色。 与SenseVoice和CosyVoice相关的模型已在Modelscope和Huggingface上开源,同时在GitHub上发布了相应的训练、推断和微调代码。通过将这些模型与LLM集成,FunAudioLLM实现了诸如语音-语音翻译、情感语音聊天、互动播客和富有表现力的有声书叙述等应用,从而推动了语音交互技术的边界。演示可在https://fun-audio-llm.github.io上查看,代码可在https://github.com/FunAudioLLM上访问。
更新时间: 2024-07-09 07:08:30
领域: cs.SD,cs.AI,eess.AS
Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks
We present and evaluate a method called grammar masking, which is used to guide large language models (LLMs) toward producing syntactically correct models for a given context-free grammar. Prompt engineering methods such as few-shot learning or priming can be used to improve the chances of an LLM producing correct syntax, but the more complex the grammar, the more time-consuming and less promising these methods become. Previous work is focused primarily on the usage of either language model training or prompt engineering. In this work, a method is presented that restricts the output to a given grammar using constrained decoding to ensure the output adheres to a valid syntax. We use several DSLs built with MontiCore and task multiple LLMs to produce models with and without constrained decoding. A corresponding parser is used to confirm the syntactic correctness of each model. We show that grammar masking can dramatically improve the modeling capabilities of several LLMs, reducing the need for well-refined prompting while increasing the chance of producing correct models.
Updated: 2024-07-09 07:08:11
标题: 使用语法遮蔽确保在基于LLM的建模任务中的句法有效性
摘要: 我们提出并评估了一种称为语法屏蔽的方法,用于引导大型语言模型(LLMs)在给定上下文无关语法的情况下产生符合语法的模型。提示工程方法,如少样本学习或引导,可以用来提高LLM产生正确语法的机会,但是语法越复杂,这些方法就越耗时且希望越小。先前的工作主要集中在语言模型训练或提示工程的使用上。在这项工作中,提出了一种使用受限解码将输出限制为给定语法以确保输出符合有效语法的方法。我们使用使用MontiCore构建的多个DSL,并要求多个LLM产生具有和不具有受限解码的模型。使用相应的解析器确认每个模型的语法正确性。我们展示了语法屏蔽可以极大地提高几个LLM的建模能力,减少对精心设计提示的需求,同时增加产生正确模型的机会。
更新时间: 2024-07-09 07:08:11
领域: cs.CL,cs.AI,cs.SE
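The grammar-masking mechanism described above is a form of constrained decoding: at each step, tokens the grammar forbids in the current parser state are excluded (equivalently, their logits are set to -inf) before the softmax, so all probability mass lands on syntactically valid continuations. The sketch below shows a single masked decoding step; the tokens and scores are hypothetical, and a real implementation would derive the allowed set from the MontiCore grammar's parser state.

```python
import math

def masked_next_token(logits, allowed):
    """One step of grammar-masked decoding.

    `logits` maps token -> raw model score; `allowed` is the set of
    tokens the grammar permits next. Forbidden tokens are dropped before
    the softmax, so the output distribution covers only valid syntax.
    Returns (most-probable token, renormalized distribution).
    """
    exps = {t: math.exp(s) for t, s in logits.items() if t in allowed}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs

# The raw model prefers an identifier, but suppose the grammar only
# allows a closing brace or a semicolon at this point:
logits = {"ident": 3.0, "}": 1.0, ";": 0.5}
tok, probs = masked_next_token(logits, allowed={"}", ";"})
```

Note the model's preferences still matter among the allowed tokens; masking only rules out continuations that could never parse, which is why it reduces the burden on prompt engineering rather than replacing the model's judgment.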
TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries
In this paper, we propose the task of \textit{Ranked Video Moment Retrieval} (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the $NDCG@K, IoU\geq \mu$ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models and we believe this new dataset contributes to the research on multi-modality search. The dataset is available at \url{https://github.com/Ranking-VMR/TVR-Ranking}
Updated: 2024-07-09 06:57:30
标题: TVR-Ranking:一种用于带有不精确查询的排序视频时刻检索的数据集
摘要: 在这篇论文中,我们提出了\textit{Ranked Video Moment Retrieval}(RVMR)的任务,通过自然语言查询从一系列视频中定位匹配时刻的排序列表。尽管CV、NLP和IR社区已经提出并研究了一些相关任务,但RVMR是最能反映时刻搜索实际情境的任务。为了促进RVMR的研究,我们基于TVR数据集中提供的原始视频和现有时刻注释开发了TVR-Ranking数据集。我们的主要贡献是对94,442个查询-时刻对进行相关性级别的手动注释。然后,我们为这一新任务开发了$NDCG@K, IoU\geq \mu$评估指标,并进行实验评估了三种基线模型。我们的实验表明,新的RVMR任务给现有模型带来了新的挑战,我们相信这一新数据集对多模态搜索的研究有所贡献。该数据集可在\url{https://github.com/Ranking-VMR/TVR-Ranking}中获取。
更新时间: 2024-07-09 06:57:30
领域: cs.AI
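The entry above names the evaluation metric $NDCG@K, IoU\geq \mu$ without spelling it out. A plausible reading, sketched below, is standard NDCG@K where a retrieved moment only earns its annotated graded relevance if its temporal IoU with the ground-truth moment is at least $\mu$; the exact definition lives in the TVR-Ranking repository, so treat this as an illustrative approximation.

```python
import math

def iou(a, b):
    """Temporal IoU of two moments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def ndcg_at_k(results, k, mu):
    """NDCG@K with an IoU gate on relevance.

    `results` is the ranked list of (predicted_span, gt_span, relevance);
    a prediction's gain is its graded relevance if IoU >= mu, else 0.
    The ideal DCG uses the relevance grades sorted descending.
    """
    gains = [rel if iou(pred, gt) >= mu else 0.0 for pred, gt, rel in results]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted((rel for _, _, rel in results), reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0
```

Raising $\mu$ makes the metric stricter: a ranked list whose moments are roughly right but poorly localized loses credit, which is what distinguishes this task from plain ranked retrieval.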
Point Cloud Geometry Scalable Coding with a Quality-Conditioned Latents Probability Estimator
The widespread usage of point clouds (PC) for immersive visual applications has resulted in the use of very heterogeneous receiving conditions and devices, notably in terms of network, hardware, and display capabilities. In this scenario, quality scalability, i.e., the ability to reconstruct a signal at different qualities by progressively decoding a single bitstream, is a major requirement that has yet to be conveniently addressed, notably in most learning-based PC coding solutions. This paper proposes a quality scalability scheme, named Scalable Quality Hyperprior (SQH), adaptable to learning-based static point cloud geometry codecs, which uses a Quality-conditioned Latents Probability Estimator (QuLPE) to decode a high-quality version of a PC learning-based representation, based on an available lower quality base layer. SQH is integrated in the future JPEG PC coding standard, allowing to create a layered bitstream that can be used to progressively decode the PC geometry with increasing quality and fidelity. Experimental results show that SQH offers the quality scalability feature with very limited or no compression performance penalty at all when compared with the corresponding non-scalable solution, thus preserving the significant compression gains over other state-of-the-art PC codecs.
Updated: 2024-07-09 06:56:06
标题: 基于质量条件潜变量概率估计器的点云几何可扩展编码
摘要: 广泛使用点云(PC)进行沉浸式视觉应用导致接收条件和设备非常异质化,尤其在网络、硬件和显示能力方面。在这种情况下,质量可伸缩性即通过逐步解码单一比特流在不同质量下重建信号的能力是一个重要需求,目前大多数基于学习的PC编码解决方案尚未方便地解决这个问题。本文提出了一种名为可伸缩质量超先验(SQH)的质量可伸缩方案,适用于基于学习的静态点云几何编解码器,该编解码器使用质量条件概率估计器(QuLPE)来解码基于可用较低质量基层的PC学习表示的高质量版本。SQH被集成在未来的JPEG PC编码标准中,允许创建一个分层比特流,可用于逐步解码具有不断提高的质量和保真度的PC几何体。实验结果显示,与对应的不可伸缩解决方案相比,SQH在保持其他最先进PC编解码器的显著压缩增益的同时,提供了质量可伸缩性功能,并几乎没有或没有任何压缩性能损失。
更新时间: 2024-07-09 06:56:06
领域: cs.LG,cs.AI,cs.CV
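The layered-bitstream idea behind SQH — decode a base layer for a coarse reconstruction, then add an enhancement layer to refine it — can be mimicked with a toy two-layer quantizer. This is only an illustration of quality scalability on raw samples; the actual codec operates on learned point-cloud latents with a probability model, not on scalars.

```python
def encode_layers(values, base_step=1.0, enh_step=0.25):
    """Toy two-layer scalable coder: a coarse base layer plus a
    quantized residual enhancement layer."""
    base = [round(v / base_step) for v in values]
    resid = [round((v - b * base_step) / enh_step) for v, b in zip(values, base)]
    return base, resid

def decode_layers(base, resid=None, base_step=1.0, enh_step=0.25):
    """Decoding the base layer alone gives a low-quality reconstruction;
    adding the enhancement layer progressively refines it, so a single
    layered bitstream serves receivers of different capabilities."""
    out = [b * base_step for b in base]
    if resid is not None:
        out = [o + r * enh_step for o, r in zip(out, resid)]
    return out
```

For this toy quantizer, decoding with the residual layer is never worse than the base layer alone, which is the property the abstract describes: progressively higher fidelity from one bitstream.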
PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry
Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general-purpose LLMs often fall short. In this study, we introduce PharmaGPT, a suite of domain-specialized LLMs with 13 billion and 70 billion parameters, specifically trained on a comprehensive corpus tailored to the Bio-Pharmaceutical and Chemical domains. Our evaluation shows that PharmaGPT surpasses existing general models on domain-specific benchmarks such as NAPLEX, demonstrating its exceptional capability in domain-specific tasks. Remarkably, this performance is achieved with a model that has only a fraction, sometimes just one-tenth, of the parameters of general-purpose large models. This advancement establishes a new benchmark for LLMs in the bio-pharmaceutical and chemical fields, addressing the existing gap in specialized language modeling. It also suggests a promising path for enhanced research and development, paving the way for more precise and effective NLP applications in these areas.
Updated: 2024-07-09 06:52:17
标题: PharmaGPT:面向生物制药和化学领域的特定领域大型语言模型
摘要: 大型语言模型(LLMs)通过最小化复杂特征工程的需求,彻底改变了自然语言处理(NLP)。然而,LLMs在生物制药和化学等专门领域的应用仍然很少被探索。这些领域的特点是术语复杂、专业知识深入,对精度要求高,而通用目的的LLMs往往无法满足。在本研究中,我们引入了PharmaGPT,一个具有130亿和700亿参数的领域专用LLMs套件,专门针对生物制药和化学领域的全面语料库进行训练。我们的评估表明,PharmaGPT在NAPLEX等特定领域基准测试中超越了现有的通用模型,展示了其在领域特定任务中的卓越能力。值得注意的是,这种性能是通过一个仅拥有通用大型模型参数的一小部分(有时仅为十分之一)的模型实现的。这一进展为生物制药和化学领域的LLMs设立了新的基准,填补了专门语言建模中的现有差距。它还为增强研究和开发提供了一个有前途的道路,为这些领域的更精确和有效的NLP应用铺平了道路。
更新时间: 2024-07-09 06:52:17
领域: cs.CL,cs.AI
Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot(BEAM-1)
The efficient disassembly of end-of-life electric vehicle batteries (EOL-EVBs) is crucial for green manufacturing and sustainable development. The current pre-programmed disassembly conducted by the Autonomous Mobile Manipulator Robot (AMMR) struggles to meet the disassembly requirements in dynamic environments, complex scenarios, and unstructured processes. In this paper, we propose a Battery Disassembly AMMR (BEAM-1) system based on Neuro-Symbolic AI. It detects the environmental state by leveraging a combination of multi-sensors and neural predicates and then translates this information into a quasi-symbolic space. In real time, it identifies the optimal sequence of action primitives through LLM-heuristic tree search, ensuring high-precision execution of these primitives. Additionally, it employs positional speculative sampling using intuitive networks and achieves the disassembly of various bolt types with a meticulously designed end-effector. Importantly, BEAM-1 is a continuously learning embodied intelligence system capable of subjective reasoning like a human, and possessing intuition. A large number of real-scene experiments have proved that it can autonomously perceive, decide, and execute to complete the continuous disassembly of bolts in multiple, multi-category, and complex situations, with a success rate of 98.78%. This research attempts to use Neuro-Symbolic AI to give robots real autonomous reasoning, planning, and learning capabilities. BEAM-1 realizes the revolution of battery disassembly. Its framework can be easily ported to any robotic system to realize different application scenarios, which provides a ground-breaking idea for the design and implementation of future embodied intelligent robotic systems.
Updated: 2024-07-09 06:44:20
标题: Battery Disassembly Autonomous Mobile Manipulator Robot(BEAM-1)的设计与实施:革新电池拆卸技术
摘要: 报废电动汽车电池(EOL-EVBs)的高效拆解对绿色制造和可持续发展至关重要。目前由自主移动操纵机器人(AMMR)进行的预先编程拆解难以满足动态环境、复杂场景和非结构化过程中的拆解要求。本文提出了一种基于神经符号人工智能的电池拆解AMMR(BEAM-1)系统。它通过利用多传感器和神经谓词的组合来检测环境状态,然后将这些信息转化为准符号空间。它通过LLM启发式树搜索实时识别出行动基元的最佳顺序,确保这些基元的高精度执行。此外,它利用直观网络进行位置推测抽样,并通过精心设计的末端执行器实现各种螺栓类型的拆解。重要的是,BEAM-1是一个不断学习的具有主观推理和直觉能力的实体智能系统,类似于人类。大量实景实验证明它能够自主感知、决策和执行,以完成多种、多类别和复杂情况下的螺栓连续拆解,成功率为98.78%。这项研究试图利用神经符号人工智能赋予机器人真正的自主推理、规划和学习能力。BEAM-1实现了电池拆解的革命。其框架可以轻松移植到任何机器人系统,实现不同的应用场景,为未来实体智能机器人系统的设计和实施提供了开创性的思路。
更新时间: 2024-07-09 06:44:20
领域: cs.RO,cs.AI
Testing AI on language comprehension tasks reveals insensitivity to underlying meaning
Large Language Models (LLMs) are recruited in applications that span from clinical assistance and legal support to question answering and education. Their success in specialized tasks has led to the claim that they possess human-like linguistic capabilities related to compositional understanding and reasoning. Yet, reverse-engineering is bound by Moravec's Paradox, according to which easy skills are hard. We systematically assess 7 state-of-the-art models on a novel benchmark. Models answered a series of comprehension questions, each prompted multiple times in two settings, permitting one-word or open-length replies. Each question targets a short text featuring high-frequency linguistic constructions. To establish a baseline for achieving human-like performance, we tested 400 humans on the same prompts. Based on a dataset of n=26,680 datapoints, we discovered that LLMs perform at chance accuracy and waver considerably in their answers. Quantitatively, the tested models are outperformed by humans, and qualitatively their answers showcase distinctly non-human errors in language understanding. We interpret this evidence as suggesting that, despite their usefulness in various tasks, current AI models fall short of understanding language in a way that matches humans, and we argue that this may be due to their lack of a compositional operator for regulating grammatical and semantic information.
Updated: 2024-07-09 06:25:16
标题: 在语言理解任务上测试人工智能揭示对基本含义的不敏感性
摘要: 大型语言模型(LLMs)被应用于从临床辅助和法律支持到问题回答和教育等各个领域。它们在专业任务中的成功导致了宣称它们具有类似于人类的语言能力,涉及构成理解和推理。然而,根据莫拉维克悖论,逆向工程受限于易得技能困难的现象。我们在一个新颖的基准测试中系统评估了7个最先进的模型。模型回答了一系列理解问题,每个问题在两种设置中多次提示,允许一词或开放长度的回答。每个问题针对一个特征高频语言结构的短文本。为了建立实现类似人类表现的基准,我们对同一提示进行了400人的测试。基于n=26,680个数据点的数据集,我们发现LLMs在准确性上表现为机会水平,并在回答中波动较大。在数量上,测试的模型被人类超越,质量上它们的回答展示出明显非人类的语言理解错误。我们将这一证据解释为尽管它们在各种任务中有用,但当前的AI模型在理解语言方面仍不及人类,并且我们认为这可能是由于它们缺乏用于调节语法和语义信息的构成运算符。
更新时间: 2024-07-09 06:25:16
领域: cs.CL,cs.AI
Harnessing Orthogonality to Train Low-Rank Neural Networks
This study explores the learning dynamics of neural networks by analyzing the singular value decomposition (SVD) of their weights throughout training. Our investigation reveals that an orthogonal basis within each multidimensional weight's SVD representation stabilizes during training. Building upon this, we introduce Orthogonality-Informed Adaptive Low-Rank (OIALR) training, a novel training method exploiting the intrinsic orthogonality of neural networks. OIALR seamlessly integrates into existing training workflows with minimal accuracy loss, as demonstrated by benchmarking on various datasets and well-established network architectures. With appropriate hyperparameter tuning, OIALR can surpass conventional training setups, including those of state-of-the-art models.
Updated: 2024-07-09 06:23:23
标题: 利用正交性训练低秩神经网络
摘要: 这项研究通过分析神经网络在训练过程中权重的奇异值分解(SVD),探索了神经网络学习动态。我们的研究发现,每个多维权重的SVD表示中的正交基会在训练过程中趋于稳定。基于此,我们引入了一种新颖的训练方法:Orthogonality-Informed Adaptive Low-Rank(OIALR)训练,利用神经网络的固有正交性。OIALR可以无缝集成到现有的训练流程中,并且在各种数据集和成熟的网络架构上进行基准测试显示,OIALR几乎不损失准确性。通过适当调整超参数,OIALR可以超越传统的训练设置,包括最先进的模型。
更新时间: 2024-07-09 06:23:23
领域: cs.LG,cs.AI
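The SVD step at the heart of the entry above — extracting the dominant singular directions of a weight matrix to form a low-rank factorization — can be sketched in pure Python with power iteration. Real low-rank training code would use a batched torch/numpy SVD; this rank-1 version only illustrates what "the leading orthogonal basis of a weight" means.

```python
import math

def matvec(A, x):
    """Row-major matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def top_singular_pair(A, iters=100):
    """Power iteration on A^T A to recover the leading right-singular
    vector v1 and singular value sigma1 = ||A v1||.

    Repeatedly applying A^T A amplifies the component of v along v1,
    so the normalized iterate converges to v1 whenever sigma1 > sigma2.
    """
    At = [list(col) for col in zip(*A)]
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    sigma = math.sqrt(sum(x * x for x in matvec(A, v)))
    return sigma, v
```

In OIALR's setting, the observation is that these singular directions stop rotating as training proceeds, which is what justifies freezing an orthogonal basis and adapting only a small low-rank core.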
Vision language models are blind
Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering countless image-text applications and scoring high on many vision-understanding benchmarks. Yet, we find that VLMs fail on 7 visual tasks absurdly easy for humans, such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting the number of circles in an Olympic-like logo. The shockingly poor performance of four state-of-the-art VLMs suggests their vision is, at best, like that of a person with myopia who sees fine details as blurry, and at worst, like that of an intelligent person who is blind and makes educated guesses. Code is available at: https://vlmsareblind.github.io/
Updated: 2024-07-09 06:20:17
标题: 视觉语言模型是盲目的
摘要: 具有视觉能力的大型语言模型(VLMs),例如GPT-4o和Gemini 1.5 Pro,正在推动无数的图像文本应用,并在许多视觉理解基准测试中取得高分。然而,我们发现VLMs在7个对人类来说非常简单的视觉任务上表现不佳,例如识别(a)两个圆是否重叠;(b)两条线是否相交;(c)单词中哪个字母被圈出;以及(d)在奥林匹克式标志中计算圆的数量。四个最先进的VLMs令人震惊的糟糕表现表明,它们的视觉能力最好的情况类似于近视者把细节看得模糊,最糟糕的情况则类似于一个聪明的盲人在进行有根据的猜测。代码可在以下链接找到:https://vlmsareblind.github.io/
更新时间: 2024-07-09 06:20:17
领域: cs.AI,cs.CV
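Part of what makes the failures above striking is that the probed tasks are one-line geometry checks in code. As an illustration (a sketch, not the paper's evaluation harness), tasks (a) and (b) reduce to:

```python
import math

def circles_overlap(c1, r1, c2, r2):
    """Task (a): two circles overlap iff the distance between their
    centres is at most the sum of their radii."""
    return math.dist(c1, c2) <= r1 + r2

def segments_intersect(p1, p2, p3, p4):
    """Task (b): segment p1-p2 crosses p3-p4 iff the endpoints of each
    segment lie on opposite sides of the other segment's supporting line
    (general position only; collinear/touching cases omitted for brevity).
    """
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0
```

The gap the paper documents is between this exact symbolic answer, which is trivially computable from coordinates, and what the VLMs extract from a rendered image of the same configuration.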
Virtual Personas for Language Models via an Anthology of Backstories
Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits. While these models bear the potential to be used as approximations of human subjects in behavioral studies, prior efforts have been limited in steering model responses to match individual human users. In this work, we introduce "Anthology", a method for conditioning LLMs to particular virtual personas by harnessing open-ended life narratives, which we refer to as "backstories." We show that our methodology enhances the consistency and reliability of experimental outcomes while ensuring better representation of diverse sub-populations. Across three nationally representative human surveys conducted as part of Pew Research Center's American Trends Panel (ATP), we demonstrate that Anthology achieves up to 18% improvement in matching the response distributions of human respondents and 27% improvement in consistency metrics. Our code and generated backstories are available at https://github.com/CannyLab/anthology.
Updated: 2024-07-09 06:11:18
标题: 通过故事集创造语言模型的虚拟角色
摘要: 大型语言模型(LLMs)是从数百万不同作者创作的庞大文本库中训练出来的,反映了巨大的人类特质多样性。虽然这些模型有潜力被用作行为研究中人类主体的近似,但之前的努力在引导模型响应以匹配个别人类用户方面受到了限制。在这项工作中,我们引入了“Anthology”方法,通过利用开放式生活叙事,即我们称之为“背景故事”,来使LLMs适应特定的虚拟人物。我们展示了我们的方法提高了实验结果的一致性和可靠性,同时确保更好地代表多样的人群子集。通过在皮尤研究中心美国趋势小组(ATP)进行的三项代表性人类调查中展示,我们证明Anthology在匹配人类受访者响应分布方面取得了高达18%的改进,一致性指标提高了27%。我们的代码和生成的背景故事可在https://github.com/CannyLab/anthology 上获取。
更新时间: 2024-07-09 06:11:18
领域: cs.CL,cs.AI
MSS-PAE: Saving Autoencoder-based Outlier Detection from Unexpected Reconstruction
AutoEncoders (AEs) are commonly used for machine learning tasks due to their intrinsic learning ability. This unique characteristic can be capitalized on for Outlier Detection (OD). However, conventional AE-based methods face the issue of overconfident decisions and unexpected reconstruction results of outliers, limiting their performance in OD. To mitigate these issues, the Mean Squared Error (MSE) and Negative Logarithmic Likelihood (NLL) were first analyzed, and the importance of incorporating aleatoric uncertainty into AE-based OD was elucidated. Then the Weighted Negative Logarithmic Likelihood (WNLL) was proposed to adjust for the effect of uncertainty in different OD scenarios. Moreover, the Mean-Shift Scoring (MSS) method was proposed to utilize the local relationship of data to reduce the issue of false inliers caused by AEs. Experiments on 32 real-world OD datasets proved the effectiveness of the proposed methods. The combination of WNLL and MSS achieved a 41% relative performance improvement compared to the best baseline. In addition, MSS improved the detection performance of multiple AE-based outlier detectors by an average of 20%. The proposed methods have the potential to advance AEs' development in OD.
Updated: 2024-07-09 06:08:17
标题: MSS-PAE:从意外重构中挽救基于自动编码器的异常检测
摘要: AutoEncoders(AEs)由于其固有的学习能力而常用于机器学习任务。这一独特特征可以用于异常检测(OD)。然而,传统的基于AE的方法面临过于自信的决策和异常值的意外重构结果的问题,限制了它们在OD中的性能。为了缓解这些问题,首先分析了均方误差(MSE)和负对数似然(NLL),阐明了将随机不确定性纳入基于AE的OD的重要性。然后提出了加权负对数似然(WNLL)来调整不同OD场景下不确定性的影响。此外,提出了均值漂移评分(MSS)方法,利用数据的局部关系来减少AE引起的假内点问题。在32个真实世界的OD数据集上的实验证明了所提出方法的有效性。WNLL和MSS的组合相比最佳基线实现了41%的相对性能改善。此外,MSS将多个基于AE的异常检测器的检测性能平均提高了20%。所提出的方法有潜力推动AE在OD中的发展。
更新时间: 2024-07-09 06:08:17
领域: cs.LG
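The MSE-versus-NLL distinction the entry above builds on can be made concrete: MSE is the Gaussian NLL with a fixed variance, whereas letting the model predict the variance is how aleatoric uncertainty enters the reconstruction loss. The weighted variant below is a hypothetical illustration in the spirit of WNLL, not the paper's exact formula.

```python
import math

def gaussian_nll(x, mu, var):
    """Per-sample negative log-likelihood under N(mu, var). With var held
    constant this reduces to MSE up to an affine transform; a predicted
    var lets the model express aleatoric uncertainty."""
    return 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def weighted_nll(x, mu, var, alpha=0.5):
    """Hypothetical weighting in the spirit of the paper's WNLL: alpha
    rebalances the uncertainty (log-variance) term against the residual
    term for different OD scenarios. The paper's exact form differs."""
    return alpha * math.log(2 * math.pi * var) + (1 - alpha) * (x - mu) ** 2 / var
```

A large residual is penalized far less when the model also predicts a large variance, which is the mechanism that keeps overconfident reconstructions of outliers from producing misleading scores.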
Source Code Summarization in the Era of Large Language Models
To support software developers in understanding and maintaining programs, various automatic (source) code summarization techniques have been proposed to generate a concise natural language summary (i.e., comment) for a given code snippet. Recently, the emergence of large language models (LLMs) has led to a great boost in the performance of code-related tasks. In this paper, we undertake a systematic and comprehensive study on code summarization in the era of LLMs, which covers multiple aspects involved in the workflow of LLM-based code summarization. Specifically, we begin by examining prevalent automated evaluation methods for assessing the quality of summaries generated by LLMs and find that the results of the GPT-4 evaluation method are most closely aligned with human evaluation. Then, we explore the effectiveness of five prompting techniques (zero-shot, few-shot, chain-of-thought, critique, and expert) in adapting LLMs to code summarization tasks. Contrary to expectations, advanced prompting techniques may not outperform simple zero-shot prompting. Next, we investigate the impact of LLMs' model settings (including top\_p and temperature parameters) on the quality of generated summaries. We find the impact of the two parameters on summary quality varies by the base LLM and programming language, but their impacts are similar. Moreover, we canvass LLMs' abilities to summarize code snippets in distinct types of programming languages. The results reveal that LLMs perform suboptimally when summarizing code written in logic programming languages compared to other language types. Finally, we unexpectedly find that CodeLlama-Instruct with 7B parameters can outperform advanced GPT-4 in generating summaries describing code implementation details and asserting code properties. We hope that our findings can provide a comprehensive understanding of code summarization in the era of LLMs.
Updated: 2024-07-09 05:48:42
标题: 大语言模型时代的源代码摘要
摘要: 为了帮助软件开发人员理解和维护程序,提出了各种自动(源)代码总结技术,以生成给定代码片段的简明自然语言摘要(即评论)。最近,大型语言模型(LLMs)的出现大大提高了与代码相关任务的性能。在本文中,我们在LLMs时代对代码总结进行了系统和全面的研究,涵盖了基于LLM的代码总结工作流程中涉及的多个方面。具体来说,我们首先研究了用于评估LLMs生成的摘要质量的流行自动评估方法,并发现GPT-4评估方法的结果与人类评估最为接近。然后,我们探讨了五种提示技术(零样本、少样本、思维链、批评和专家)在适应LLMs进行代码总结任务中的有效性。与预期相反,先进的提示技术可能不如简单的零样本提示。接下来,我们调查了LLMs的模型设置(包括top_p和温度参数)对生成摘要质量的影响。我们发现这两个参数对摘要质量的影响因基础LLM和编程语言而异,但它们的影响是相似的。此外,我们调查了LLMs总结不同类型编程语言中的代码片段的能力。结果显示,与其他语言类型相比,LLMs在总结逻辑编程语言编写的代码时表现不佳。最后,我们意外地发现,具有7B参数的CodeLlama-Instruct在生成描述代码实现细节和断言代码属性的摘要方面可以胜过先进的GPT-4。我们希望我们的发现可以提供对LLMs时代的代码总结的全面理解。
更新时间: 2024-07-09 05:48:42
领域: cs.SE,cs.AI,68-04,D.2.3; I.2.7
Combining Knowledge Graphs and Large Language Models
In recent years, Natural Language Processing (NLP) has played a significant role in various Artificial Intelligence (AI) applications such as chatbots, text generation, and language translation. The emergence of large language models (LLMs) has greatly improved the performance of these applications, showing astonishing results in language understanding and generation. However, they still show some disadvantages, such as hallucinations and lack of domain-specific knowledge, that affect their performance in real-world tasks. These issues can be effectively mitigated by incorporating knowledge graphs (KGs), which organise information in structured formats that capture relationships between entities in a versatile and interpretable fashion. Likewise, the construction and validation of KGs present challenges that LLMs can help resolve. The complementary relationship between LLMs and KGs has led to a trend that combines these technologies to achieve trustworthy results. This work collected 28 papers outlining methods for KG-powered LLMs, LLM-based KGs, and LLM-KG hybrid approaches. We systematically analysed and compared these approaches to provide a comprehensive overview highlighting key trends, innovative techniques, and common challenges. This synthesis will benefit researchers new to the field and those seeking to deepen their understanding of how KGs and LLMs can be effectively combined to enhance AI applications capabilities.
Updated: 2024-07-09 05:42:53
标题: 结合知识图谱和大型语言模型
摘要: 最近几年,自然语言处理(NLP)在各种人工智能(AI)应用中发挥了重要作用,比如聊天机器人、文本生成和语言翻译。大型语言模型(LLMs)的出现极大地提高了这些应用的性能,在语言理解和生成方面展现出惊人的结果。然而,它们仍然存在一些缺点,比如幻觉和缺乏领域特定知识,影响了它们在现实任务中的表现。这些问题可以通过将知识图谱(KGs)纳入其中来有效缓解,KGs以结构化格式组织信息,捕捉实体之间的关系,具有多功能和可解释的特点。同样,KGs的建立和验证提出了LLMs能够帮助解决的挑战。LLMs和KGs之间的互补关系导致了将这些技术结合以实现可靠结果的趋势。本文收集了28篇论文,概述了KG驱动的LLMs、基于LLMs的KGs和LLM-KG混合方法。我们系统分析和比较了这些方法,提供了全面的概述,突出了关键趋势、创新技术和常见挑战。这种综合将有利于新手研究人员以及希望加深对KGs和LLMs如何有效结合以增强AI应用能力的人们的理解。
更新时间: 2024-07-09 05:42:53
领域: cs.CL,cs.AI
TCKIN: A Novel Integrated Network Model for Predicting Mortality Risk in Sepsis Patients
Sepsis poses a major global health threat, accounting for millions of deaths annually and significant economic costs. Accurate predictions of mortality risk in sepsis patients facilitate the efficient allocation of medical resources, thereby enhancing patient survival and quality of life. Through precise risk assessments, healthcare facilities can effectively distribute intensive care beds, medical equipment, and staff, ensuring high-risk patients receive timely and appropriate care. Early identification and intervention significantly decrease mortality rates and improve patient outcomes. Current methods typically utilize only one type of data--either constant, temporal, or ICD codes. This study introduces the Time-Constant KAN Integrated Network(TCKIN), an innovative model that enhances the accuracy of sepsis mortality risk predictions by integrating both temporal and constant data from electronic health records and ICD codes. Validated against the MIMIC-III and MIMIC-IV datasets, TCKIN surpasses existing machine learning and deep learning methods in accuracy, sensitivity, and specificity. Notably, TCKIN achieved AUCs of 87.76% and 88.07%, demonstrating superior capability in identifying high-risk patients. Additionally, TCKIN effectively combats the prevalent issue of data imbalance in clinical settings, improving the detection of patients at elevated risk of mortality and facilitating timely interventions. These results confirm the model's effectiveness and its potential to transform patient management and treatment optimization in clinical practice. With this advanced risk assessment tool, healthcare providers can devise more tailored treatment plans, optimize resource utilization, and ultimately enhance survival rates and quality of life for sepsis patients.
Updated: 2024-07-09 05:37:50
标题: TCKIN:一种新颖的集成网络模型,用于预测脓毒症患者的死亡风险
摘要: 脓毒症是全球主要的健康威胁,每年造成数百万人死亡和巨大的经济成本。准确预测脓毒症患者的死亡风险有助于有效分配医疗资源,从而提高患者的存活率和生活质量。通过精确的风险评估,医疗机构可以有效分配重症监护床位、医疗设备和人员,确保高风险患者及时接受适当的护理。早期识别和干预显著降低死亡率并改善患者预后。目前的方法通常只利用一种类型的数据--常数、时间或ICD代码。本研究介绍了时间-常数KAN集成网络(TCKIN),这是一种创新模型,通过整合来自电子健康记录和ICD代码的时间和常数数据,提高了脓毒症死亡风险预测的准确性。在MIMIC-III和MIMIC-IV数据集上进行验证,TCKIN在准确性、敏感性和特异性方面超过了现有的机器学习和深度学习方法。值得注意的是,TCKIN实现了87.76%和88.07%的AUC,表明其在识别高风险患者方面具有卓越能力。此外,TCKIN有效地解决了临床环境中数据不平衡的普遍问题,提高了对高死亡风险患者的检测,并促进及时干预。这些结果证实了该模型的有效性及其在临床实践中改变患者管理和治疗优化的潜力。借助这种先进的风险评估工具,医疗提供者可以制定更具针对性的治疗计划,优化资源利用,并最终提高脓毒症患者的存活率和生活质量。
更新时间: 2024-07-09 05:37:50
领域: stat.AP,cs.AI
DLOVE: A new Security Evaluation Tool for Deep Learning Based Watermarking Techniques
Recent developments in Deep Neural Network (DNN) based watermarking techniques have shown remarkable performance. The state-of-the-art DNN-based techniques not only surpass the robustness of classical watermarking techniques but also show their robustness against many image manipulation techniques. In this paper, we performed a detailed security analysis of different DNN-based watermarking techniques. We propose a new class of attack called the Deep Learning-based OVErwriting (DLOVE) attack, which leverages adversarial machine learning and overwrites the original embedded watermark with a targeted watermark in a watermarked image. To the best of our knowledge, this attack is the first of its kind. We have considered scenarios where watermarks are used to devise and formulate an adversarial attack in white box and black box settings. To show adaptability and efficiency, we launch our DLOVE attack analysis on seven different watermarking techniques, HiDDeN, ReDMark, PIMoG, StegaStamp, Aparecium, Distortion Agnostic Deep Watermarking and Hiding Images in an Image. All these techniques use different approaches to create imperceptible watermarked images. Our attack analysis on these watermarking techniques with various constraints highlights the vulnerabilities of DNN-based watermarking. Extensive experimental results validate the capabilities of DLOVE. We propose DLOVE as a benchmark security analysis tool to test the robustness of future deep learning-based watermarking techniques.
Updated: 2024-07-09 05:18:14
Domains: cs.CR,cs.CV
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
This paper establishes a mathematical foundation for the Adam optimizer, elucidating its connection to natural gradient descent through Riemannian and information geometry. We rigorously analyze the diagonal empirical Fisher information matrix (FIM) in Adam, clarifying all detailed approximations and advocating for the use of log probability functions as loss, which should be based on discrete distributions, due to the limitations of empirical FIM. Our analysis uncovers flaws in the original Adam algorithm, leading to proposed corrections such as enhanced momentum calculations, adjusted bias corrections, adaptive epsilon, and gradient clipping. We refine the weight decay term based on our theoretical framework. Our modified algorithm, Fisher Adam (FAdam), demonstrates superior performance across diverse domains including LLM, ASR, and VQ-VAE, achieving state-of-the-art results in ASR.
Updated: 2024-07-09 05:15:47
Domains: cs.LG,cs.AI,cs.IT,math.IT
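The natural-gradient reading of Adam above can be made concrete in a few lines: the second-moment accumulator is an exponential moving average of squared gradients, i.e. a diagonal empirical Fisher information estimate, and dividing by its square root preconditions the update. This is a minimal sketch of that view only; FAdam's specific corrections (enhanced momentum, adjusted bias corrections, adaptive epsilon, gradient clipping) are not reproduced here.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step, read as approximate natural gradient descent:
    v is an EMA of squared gradients (a diagonal empirical Fisher
    estimate), and dividing by sqrt(v_hat) preconditions the gradient."""
    m = b1 * m + (1 - b1) * grad           # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2      # diagonal empirical FIM estimate
    m_hat = m / (1 - b1 ** t)              # bias corrections
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Demo: minimize f(x) = x^2 from x = 2.
theta = np.array([2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(f"final |theta| = {abs(theta[0]):.4f}")
```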
AutoTask: Task Aware Multi-Faceted Single Model for Multi-Task Ads Relevance
Ads relevance models are crucial in determining the relevance between user search queries and ad offers, often framed as a classification problem. The complexity of modeling increases significantly with multiple ad types and varying scenarios that exhibit both similarities and differences. In this work, we introduce a novel multi-faceted attention model that performs task aware feature combination and cross task interaction modeling. Our technique formulates the feature combination problem as "language" modeling with auto-regressive attentions across both feature and task dimensions. Specifically, we introduce a new dimension of task ID encoding for task representations, thereby enabling precise relevance modeling across diverse ad scenarios with substantial improvement in generality capability for unseen tasks. We demonstrate that our model not only effectively handles the increased computational and maintenance demands as scenarios proliferate, but also outperforms generalized DNN models and even task-specific models across a spectrum of ad applications using a single unified model.
Updated: 2024-07-09 05:13:45
Domains: cs.IR,cs.AI,cs.CL,cs.LG
Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning
Distantly-Supervised Named Entity Recognition (DS-NER) is widely used in real-world scenarios. It can effectively alleviate the annotation burden by matching entities in existing knowledge bases with snippets in the text, but it suffers from label noise. Recent works attempt to adopt the teacher-student framework to gradually refine the training labels and improve the overall robustness. However, these teacher-student methods achieve limited performance because the poor calibration of the teacher network produces incorrectly pseudo-labeled samples, leading to error propagation. Therefore, we propose: (1) Uncertainty-Aware Teacher Learning that leverages the prediction uncertainty to reduce the number of incorrect pseudo labels in the self-training stage; (2) Student-Student Collaborative Learning that allows the transfer of reliable labels between two student networks instead of indiscriminately relying on all pseudo labels from its teacher, and further enables a full exploration of mislabeled samples rather than simply filtering unreliable pseudo-labeled samples. We evaluate our proposed method on five DS-NER datasets, demonstrating that our method is superior to the state-of-the-art DS-NER methods.
Updated: 2024-07-09 05:11:25
Domains: cs.CL,cs.AI
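The core idea of component (1) — use predictive uncertainty to discard likely-noisy teacher pseudo labels before student training — can be sketched with an entropy gate. The tag set, probabilities, and threshold below are illustrative, not the paper's; the paper's uncertainty estimate may differ.

```python
import numpy as np

def filter_pseudo_labels(probs, threshold=0.5):
    """Uncertainty-aware selection of teacher pseudo labels: keep a token's
    pseudo label only when the teacher's predictive entropy is below a
    threshold, discarding likely-noisy labels before student training."""
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=-1)
    labels = probs.argmax(axis=-1)     # hard pseudo label per token
    keep = entropy < threshold         # confidence mask
    return labels, keep

# Teacher distributions over 3 illustrative tags (O, PER, LOC) for 3 tokens:
probs = [[0.98, 0.01, 0.01],   # confident  -> kept
         [0.40, 0.35, 0.25],   # uncertain  -> dropped
         [0.05, 0.90, 0.05]]   # confident  -> kept
labels, keep = filter_pseudo_labels(probs)
print(labels.tolist(), keep.tolist())  # [0, 0, 1] [True, False, True]
```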
See and Think: Embodied Agent in Virtual Environment
Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE comprises three key components: vision perception, language instruction, and code action. Vision perception involves interpreting visual information in the environment, which is then integrated into the LLMs component with agent state and task instruction. Language instruction is responsible for iterative reasoning and decomposing complex tasks into manageable guidelines. Code action generates executable skill actions based on retrieval in skill database, enabling the agent to interact effectively within the Minecraft environment. We also collect STEVE-21K dataset, which includes 600+ vision-environment pairs, 20K knowledge question-answering pairs, and 200+ skill-code pairs. We conduct continuous block search, knowledge question and answering, and tech tree mastery to evaluate the performance. Extensive experiments show that STEVE achieves at most 1.5x faster unlocking key tech trees and 2.5x quicker in block search tasks.
Updated: 2024-07-09 05:01:06
Domains: cs.AI
Multiple Instance Verification
We explore multiple-instance verification, a problem setting where a query instance is verified against a bag of target instances with heterogeneous, unknown relevancy. We show that naive adaptations of attention-based multiple instance learning (MIL) methods and standard verification methods like Siamese neural networks are unsuitable for this setting: directly combining state-of-the-art (SOTA) MIL methods and Siamese networks is shown to be no better, and sometimes significantly worse, than a simple baseline model. Postulating that this may be caused by the failure of the representation of the target bag to incorporate the query instance, we introduce a new pooling approach named ``cross-attention pooling'' (CAP). Under the CAP framework, we propose two novel attention functions to address the challenge of distinguishing between highly similar instances in a target bag. Through empirical studies on three different verification tasks, we demonstrate that CAP outperforms adaptations of SOTA MIL methods and the baseline by substantial margins, in terms of both classification accuracy and quality of the explanations provided for the classifications. Ablation studies confirm the superior ability of the new attention functions to identify key instances.
Updated: 2024-07-09 04:51:22
Domains: cs.LG
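The key move in CAP — let the bag representation depend on the query instance by computing attention against it — can be sketched as follows. A plain dot-product score stands in for the paper's two proposed attention functions, and the embeddings are toy vectors.

```python
import numpy as np

def cross_attention_pool(query, bag, tau=1.0):
    """Pool a bag of instance embeddings into a single vector with attention
    computed *against the query*, so the bag representation incorporates the
    instance being verified (the failure mode CAP is designed to fix)."""
    scores = bag @ query / tau            # query-conditioned score per instance
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()     # softmax over bag instances
    return weights @ bag, weights         # pooled (d,), attention (n,)

query = np.array([1.0, 0.0, 0.0, 0.0])
bag = np.array([[0.0, 1.0, 0.0, 0.0],
                [0.2, 0.5, 0.0, 0.0],
                [0.9, 0.1, 0.0, 0.0],    # closest match to the query
                [0.0, 0.0, 1.0, 0.0]])
pooled, weights = cross_attention_pool(query, bag)
print(int(weights.argmax()))             # 2: the matching instance dominates
```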
Defending Large Language Models Against Attacks With Residual Stream Activation Analysis
The widespread adoption of Large Language Models (LLMs), exemplified by OpenAI's ChatGPT, brings to the forefront the imperative to defend against adversarial threats on these models. These attacks, which manipulate an LLM's output by introducing malicious inputs, undermine the model's integrity and the trust users place in its outputs. In response to this challenge, our paper presents an innovative defensive strategy, given white box access to an LLM, that harnesses residual activation analysis between transformer layers of the LLM. We apply a novel methodology for analyzing distinctive activation patterns in the residual streams for attack prompt classification. We curate multiple datasets to demonstrate how this method of classification has high accuracy across multiple types of attack scenarios, including our newly-created attack dataset. Furthermore, we enhance the model's resilience by integrating safety fine-tuning techniques for LLMs in order to measure its effect on our capability to detect attacks. The results underscore the effectiveness of our approach in enhancing the detection and mitigation of adversarial inputs, advancing the security framework within which LLMs operate.
Updated: 2024-07-09 04:39:46
Domains: cs.CR,cs.LG
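The detection pipeline — collect residual-stream activations per prompt, then fit a classifier that separates attack from benign prompts in activation space — can be sketched on synthetic data. The activation vectors below are simulated, and a nearest-centroid probe stands in for whatever classifier the paper actually trains.

```python
import numpy as np

# Synthetic stand-ins for pooled residual-stream activation vectors:
# benign prompts cluster around one mean, attack prompts around a shifted one.
rng = np.random.default_rng(1)
d = 16
benign = rng.normal(0.0, 1.0, size=(100, d))
attack = rng.normal(0.8, 1.0, size=(100, d))   # attacks shift the activations
X = np.vstack([benign, attack])
y = np.array([0] * 100 + [1] * 100)

# Nearest-centroid probe in activation space.
mu0 = X[y == 0].mean(axis=0)
mu1 = X[y == 1].mean(axis=0)
pred = (np.linalg.norm(X - mu1, axis=1)
        < np.linalg.norm(X - mu0, axis=1)).astype(int)
acc = (pred == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

In a real setup the vectors would come from forward hooks on the LLM's transformer layers rather than a random generator.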
DriftGAN: Using historical data for Unsupervised Recurring Drift Detection
In real-world applications, input data distributions are rarely static over a period of time, a phenomenon known as concept drift. Such concept drifts degrade the model's prediction performance, and therefore we require methods to overcome these issues. The initial step is to identify concept drifts and have a training method in place to recover the model's performance. Most concept drift detection methods work on detecting concept drifts and signalling the requirement to retrain the model. However, in real-world cases, there could be concept drifts that recur over a period of time. In this paper, we present an unsupervised method based on Generative Adversarial Networks(GAN) to detect concept drifts and identify whether a specific concept drift occurred in the past. Our method reduces the time and data the model requires to get up to speed for recurring drifts. Our key results indicate that our proposed model can outperform the current state-of-the-art models in most datasets. We also test our method on a real-world use case from astrophysics, where we detect the bow shock and magnetopause crossings with better results than the existing methods in the domain.
Updated: 2024-07-09 04:38:44
Domains: cs.LG
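The recurring-drift idea — detect a drift, then check it against stored signatures of past drifts so the model can be restored rather than retrained from scratch — can be sketched as follows. A simple mean-shift statistic stands in for the paper's GAN discriminator, and the threshold and signature store are illustrative.

```python
import numpy as np

def detect_and_identify(window, reference, history, thresh=0.5):
    """Drift-detection sketch: flag drift when the window's mean shifts away
    from the reference distribution, then match the drifted window against
    stored past drift signatures to recognize a *recurring* drift."""
    shift = abs(window.mean() - reference.mean())
    if shift < thresh:
        return False, None                       # no drift
    for name, sig in history.items():            # recurring drift?
        if abs(window.mean() - sig) < thresh:
            return True, name
    history[f"drift_{len(history)}"] = window.mean()  # store new signature
    return True, None

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 500)
history = {}
d1, who1 = detect_and_identify(rng.normal(3, 1, 500), ref, history)  # new drift
d2, who2 = detect_and_identify(rng.normal(3, 1, 500), ref, history)  # recurs
print(d1, who1, d2, who2)  # True None True drift_0
```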
Fine-grained, Multi-dimensional Summarization Evaluation with LLMs
Automated evaluation is crucial for streamlining text summarization benchmarking and model development, given the costly and time-consuming nature of human evaluation. Traditional methods like ROUGE do not correlate well with human judgment, while recently proposed LLM-based metrics provide only summary-level assessment using Likert-scale scores. This limits deeper model analysis, e.g., we can only assign one hallucination score at the summary level, while at the sentence level, we can count sentences containing hallucinations. To remedy those limitations, we propose FineSurE, a fine-grained evaluator specifically tailored for the summarization task using large language models (LLMs). It also employs completeness and conciseness criteria, in addition to faithfulness, enabling multi-dimensional assessment. We compare various open-source and proprietary LLMs as backbones for FineSurE. In addition, we conduct extensive benchmarking of FineSurE against SOTA methods including NLI-, QA-, and LLM-based methods, showing improved performance especially on the completeness and conciseness dimensions. The code is available at https://github.com/DISL-Lab/FineSurE-ACL24.
Updated: 2024-07-09 04:32:44
Domains: cs.CL,cs.AI
General and Task-Oriented Video Segmentation
We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies deployment. However, such a highly homogenized framework in current design, where each element maintains uniformity, could overlook the inherent diversity among different tasks and lead to suboptimal performance. To tackle this, GvSeg: i) provides a holistic disentanglement and modeling for segment targets, thoroughly examining them from the perspective of appearance, position, and shape, and on this basis, ii) reformulates the query initialization, matching and sampling strategies in alignment with the task-specific requirement. These architecture-agnostic innovations empower GvSeg to effectively address each unique task by accommodating the specific properties that characterize them. Extensive experiments on seven gold-standard benchmark datasets demonstrate that GvSeg surpasses all existing specialized/general solutions by a significant margin on four different video segmentation tasks.
Updated: 2024-07-09 04:21:38
Domains: cs.CV,cs.AI
UAVs and Birds: Enhancing Short-Range Navigation through Budgerigar Flight Studies
This study delves into the flight behaviors of Budgerigars (Melopsittacus undulatus) to gain insights into their flight trajectories and movements. Using 3D reconstruction from stereo video camera recordings, we closely examine the velocity and acceleration patterns during three flight phases: takeoff, flight, and landing. The findings not only contribute to our understanding of bird behaviors but also hold significant implications for the advancement of algorithms in Unmanned Aerial Vehicles (UAVs). The research aims to bridge the gap between biological principles observed in birds and the application of these insights in developing more efficient and autonomous UAVs. In the context of the increasing use of drones, this study focuses on the biologically inspired principles drawn from bird behaviors, particularly during takeoff, flight, and landing, to enhance UAV capabilities. The dataset created for this research sheds light on Budgerigars' takeoff, flight, and landing techniques, emphasizing their ability to control speed across different situations and surfaces. The study underscores the potential of incorporating these principles into UAV algorithms, addressing challenges related to short-range navigation, takeoff, flight, and landing.
Updated: 2024-07-09 04:19:53
Domains: cs.RO,cs.AI,cs.CV
Efficient and Accurate Memorable Conversation Model using DPO based on sLLM
In multi-session dialog systems, it is essential to continuously update the memory as the session progresses. Simply accumulating memory can make it difficult to focus on the content of the conversation during inference, because of the limited input context size. Therefore, an efficient and accurate conversation model capable of managing memory to continuously reflect the conversation history is necessary. This paper presents a conversation model that efficiently manages memory as sessions progress and incorporates it into the model to reflect the conversation history accurately, using three methodologies: SFT, DPO, and DPO with an SFT model. Our model using the DPO algorithm shows an improvement of about 0.0591 in BERTScore for memory accuracy, and the rate of responses reflecting the memory increased as well. Response generation performance also improved by about 4.292 in fluency, 3.935 in coherence, and 2.896 in consistency. This paper describes a training method that yields better performance than models with more than twice the parameter count, even though our model is smaller. Thus, our model demonstrates efficiency not only in terms of accuracy but also in resource utilization.
Updated: 2024-07-09 04:17:39
Domains: cs.CL,cs.AI
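For readers unfamiliar with DPO, the standard pairwise objective it optimizes can be written in a few lines: push the policy's log-ratio (relative to the reference model) on the chosen response above its log-ratio on the rejected one. This is the generic DPO loss, not the paper's exact training setup, and the log-probabilities below are made-up numbers.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO objective for one (chosen, rejected) response pair:
    -log sigmoid(beta * [(logpi_w - logref_w) - (logpi_l - logref_l)])."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# Policy prefers the chosen response more than the reference does: low loss.
good = dpo_loss(logp_w=-5.0, logp_l=-9.0, ref_logp_w=-6.0, ref_logp_l=-8.0)
# Policy prefers the rejected response: higher loss.
bad = dpo_loss(logp_w=-9.0, logp_l=-5.0, ref_logp_w=-8.0, ref_logp_l=-6.0)
print(good < bad)  # True
```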
6GSoft: Software for Edge-to-Cloud Continuum
In the era of 6G, developing and managing software requires cutting-edge software engineering (SE) theories and practices tailored for such complexity across a vast number of connected edge devices. Our project aims to lead the development of sustainable methods and energy-efficient orchestration models specifically for edge environments, enhancing architectural support driven by AI for contemporary edge-to-cloud continuum computing. This initiative seeks to position Finland at the forefront of the 6G landscape, focusing on sophisticated edge orchestration and robust software architectures to optimize the performance and scalability of edge networks. Collaborating with leading Finnish universities and companies, the project emphasizes deep industry-academia collaboration and international expertise to address critical challenges in edge orchestration and software architecture, aiming to drive significant advancements in software productivity and market impact.
Updated: 2024-07-09 04:12:42
Domains: cs.SE,cs.AI,cs.NI,cs.SI
LETS-C: Leveraging Language Embedding for Time Series Classification
Recent advancements in language modeling have shown promising results when applied to time series data. In particular, fine-tuning pre-trained large language models (LLMs) for time series classification tasks has achieved state-of-the-art (SOTA) performance on standard benchmarks. However, these LLM-based models have a significant drawback due to the large model size, with the number of trainable parameters in the millions. In this paper, we propose an alternative approach to leveraging the success of language modeling in the time series domain. Instead of fine-tuning LLMs, we utilize a language embedding model to embed time series and then pair the embeddings with a simple classification head composed of convolutional neural networks (CNN) and multilayer perceptron (MLP). We conducted extensive experiments on well-established time series classification benchmark datasets. We demonstrated LETS-C not only outperforms the current SOTA in classification accuracy but also offers a lightweight solution, using only 14.5% of the trainable parameters on average compared to the SOTA model. Our findings suggest that leveraging language encoders to embed time series data, combined with a simple yet effective classification head, offers a promising direction for achieving high-performance time series classification while maintaining a lightweight model architecture.
Updated: 2024-07-09 04:07:57
Domains: cs.LG,cs.AI,cs.CE,cs.CL,stat.ME
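The two-stage pipeline — embed the time series with a frozen text-embedding model, then classify with a small CNN + MLP head — can be sketched end to end. Everything here is a stand-in: a fixed random projection plays the role of the pretrained language embedding model, and the head's weights are untrained, so only the shapes and data flow are meaningful.

```python
import numpy as np

def embed_series(series):
    """Stand-in for the language embedding model: LETS-C feeds the series to
    a pretrained text encoder; a fixed random projection plays that role
    here so the pipeline is runnable."""
    rng = np.random.default_rng(42)
    W = rng.normal(size=(len(series), 32)) / np.sqrt(len(series))
    return series @ W                          # (32,) "embedding"

def conv1d(x, kernel):
    """Valid-mode 1-D convolution (cross-correlation) over the embedding."""
    n, k = len(x), len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(n - k + 1)])

def classify(embedding, n_classes=3):
    """Lightweight head: 1-D conv + ReLU features, then a linear (MLP)
    layer mapping to class logits. Weights are illustrative, not trained."""
    rng = np.random.default_rng(7)
    feats = np.maximum(conv1d(embedding, rng.normal(size=5)), 0.0)  # (28,)
    W = rng.normal(size=(feats.shape[0], n_classes))
    return feats @ W                           # (n_classes,) logits

logits = classify(embed_series(np.sin(np.linspace(0.0, 6.0, 128))))
print(logits.shape)  # (3,)
```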
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated cache of KVCache. The core of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput while meeting latency-related Service Level Objectives (SLOs). Unlike traditional studies that assume all requests will be processed, Mooncake faces challenges due to highly overloaded scenarios. To mitigate these, we developed a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake's innovative architecture enables Kimi to handle 75% more requests.
Updated: 2024-07-09 04:03:10
Domains: cs.DC,cs.AI,cs.AR
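The prediction-based early-rejection policy can be illustrated with a toy admission check: estimate the time-to-first-token a new request would see from the work already queued and the prefill throughput, and reject up front when the SLO would be missed rather than admitting work that is doomed to violate it. The numbers and the single-pool model are hypothetical; Mooncake's scheduler predicts load across its disaggregated prefill and decoding pools.

```python
def admit(queued_tokens, prefill_tokens_per_s, request_tokens, ttft_slo_s):
    """Prediction-based early rejection (sketch): admit a request only if
    its predicted time-to-first-token fits within the latency SLO."""
    predicted_ttft_s = (queued_tokens + request_tokens) / prefill_tokens_per_s
    return predicted_ttft_s <= ttft_slo_s

# 20k tokens/s of prefill capacity, 4 s TTFT SLO, 8k-token request:
print(admit(50_000, 20_000, 8_000, 4.0))   # 2.9 s predicted TTFT -> True
print(admit(150_000, 20_000, 8_000, 4.0))  # 7.9 s predicted TTFT -> False
```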
Advanced Financial Fraud Detection Using GNN-CL Model
The innovative GNN-CL model proposed in this paper marks a breakthrough in financial fraud detection by synergistically combining the advantages of graph neural networks (GNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks. This convergence enables multifaceted analysis of complex transaction patterns, improving detection accuracy and resilience against complex fraudulent activities. A key novelty of this paper is the use of multilayer perceptrons (MLPs) to estimate node similarity, effectively filtering out neighborhood noise that can lead to false positives. This intelligent purification mechanism ensures that only the most relevant information is considered, thereby improving the model's understanding of the network structure. Feature weakening often plagues graph-based models due to the dilution of key signals. To further address this challenge, GNN-CL adopts reinforcement learning strategies: by dynamically adjusting the weights assigned to central nodes, it reinforces the importance of these influential entities so that important clues of fraud are retained even in less informative data. Experimental evaluations on Yelp datasets highlight the superior performance of GNN-CL compared to existing methods.
Updated: 2024-07-09 03:59:06
Domains: cs.LG,q-fin.ST
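The neighborhood-purification idea — score each neighbor's similarity to the central node with a small MLP and drop low-scoring neighbors before aggregation — can be sketched as below. The MLP weights are random and untrained, so the filter itself is illustrative of the mechanism only, not of learned behavior.

```python
import numpy as np

def mlp_similarity(a, b, W1, W2):
    """Tiny MLP scoring the similarity of two node feature vectors; in
    GNN-CL a learned score like this filters noisy neighbors before
    message aggregation. Weights here are illustrative, not trained."""
    h = np.maximum(W1 @ np.concatenate([a, b]), 0.0)   # hidden ReLU layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h)))             # similarity in (0, 1)

def filter_neighbors(node, neighbors, W1, W2, thresh=0.6):
    """Keep only the neighbors the MLP scores above a threshold."""
    return [j for j, nb in enumerate(neighbors)
            if mlp_similarity(node, nb, W1, W2) > thresh]

rng = np.random.default_rng(3)
d = 4
W1 = rng.normal(size=(8, 2 * d))
W2 = rng.normal(size=8)
node = rng.normal(size=d)
neighbors = [rng.normal(size=d) for _ in range(6)]
kept = filter_neighbors(node, neighbors, W1, W2)
print(f"kept {len(kept)} of {len(neighbors)} neighbors")
```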
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
The trustworthy machine learning (ML) community is increasingly recognizing the crucial need for models capable of selectively 'unlearning' data points after training. This leads to the problem of machine unlearning (MU), aiming to eliminate the influence of chosen data points on model performance, while still maintaining the model's utility post-unlearning. Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting, ignoring the vital inquiry into which subset should be chosen to truly gauge the authenticity of unlearning performance. To tackle this issue, we introduce a new evaluative angle for MU from an adversarial viewpoint. We propose identifying the data subset that presents the most significant challenge for influence erasure, i.e., pinpointing the worst-case forget set. Utilizing a bi-level optimization principle, we amplify unlearning challenges at the upper optimization level to emulate worst-case scenarios, while simultaneously engaging in standard training and unlearning at the lower level, achieving a balance between data influence erasure and model utility. Our proposal offers a worst-case evaluation of MU's resilience and effectiveness. Through extensive experiments across different datasets (including CIFAR-10, 100, CelebA, Tiny ImageNet, and ImageNet) and models (including both image classifiers and generative models), we expose critical pros and cons in existing (approximate) unlearning strategies. Our results illuminate the complex challenges of MU in practice, guiding the future development of more accurate and robust unlearning algorithms. The code is available at https://github.com/OPTML-Group/Unlearn-WorstCase.
Updated: 2024-07-09 03:59:01
Domains: cs.LG,cs.AI,cs.CV
Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering
Reconstructing high-fidelity hand models with intricate textures plays a crucial role in enhancing human-object interaction and advancing real-world applications. Despite the state-of-the-art methods excelling in texture generation and image rendering, they often face challenges in accurately capturing geometric details. Learning-based approaches usually offer better robustness and faster inference, but they tend to produce over-smoothed results and require substantial amounts of training data. To address these issues, we present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to restore hand poses and intricate details. Firstly, our approach predicts a parametric hand mesh model through a Graph Convolutional Network (GCN) based method from multi-view images. We further introduce a novel Hand Albedo and Mesh (HAM) optimization module to refine both the hand mesh and textures, which is capable of preserving the mesh topology. In addition, we suggest an effective mesh-based neural rendering scheme to simultaneously generate photo-realistic images and optimize mesh geometry by fusing the pre-trained rendering network with vertex features. We conduct comprehensive experiments on InterHand2.6M, DeepHandMesh, and a dataset collected by ourselves, whose promising results show that our proposed approach outperforms the state-of-the-art methods on both reconstruction accuracy and rendering quality. Code and dataset are publicly available at https://github.com/agnJason/FMHR.
Updated: 2024-07-09 03:39:45
Domains: cs.CV,cs.AI
Exploring Sentence Type Effects on the Lombard Effect and Intelligibility Enhancement: A Comparative Study of Natural and Grid Sentences
This study explores how sentence types affect the Lombard effect and intelligibility enhancement, focusing on comparisons between natural and grid sentences. Using the Lombard Chinese-TIMIT (LCT) corpus and the Enhanced MAndarin Lombard Grid (EMALG) corpus, we analyze changes in phonetic and acoustic features across different noise levels. Our results show that grid sentences produce more pronounced Lombard effects than natural sentences. Then, we develop and test a normal-to-Lombard conversion model, trained separately on LCT and EMALG corpora. Through subjective and objective evaluations, natural sentences are superior in maintaining speech quality in intelligibility enhancement. In contrast, grid sentences could provide superior intelligibility due to the more pronounced Lombard effect. This study provides a valuable perspective on enhancing speech communication in noisy environments.
Updated: 2024-07-09 03:32:54
Categories: cs.SD,cs.LG,eess.AS
Graph Neural Networks and Deep Reinforcement Learning Based Resource Allocation for V2X Communications
In the rapidly evolving landscape of Internet of Vehicles (IoV) technology, Cellular Vehicle-to-Everything (C-V2X) communication has attracted much attention due to its superior performance in coverage, latency, and throughput. Resource allocation within C-V2X is crucial for ensuring the transmission of safety information and meeting the stringent requirements for ultra-low latency and high reliability in Vehicle-to-Vehicle (V2V) communication. This paper proposes a method that integrates Graph Neural Networks (GNN) with Deep Reinforcement Learning (DRL) to address this challenge. By constructing a dynamic graph with communication links as nodes and employing the Graph Sample and Aggregation (GraphSAGE) model to adapt to changes in graph structure, the model aims to ensure a high success rate for V2V communication while minimizing interference on Vehicle-to-Infrastructure (V2I) links, thereby ensuring the successful transmission of V2V link information and maintaining high transmission rates for V2I links. The proposed method retains the global feature learning capabilities of GNN and supports distributed network deployment, allowing vehicles to extract low-dimensional features that include structural information from the graph network based on local observations and to make independent resource allocation decisions. Simulation results indicate that the introduction of GNN, with a modest increase in computational load, effectively enhances the decision-making quality of agents, demonstrating superiority over other methods. This study not only provides a theoretically efficient resource allocation strategy for V2V and V2I communications but also paves a new technical path for resource management in practical IoV environments.
Updated: 2024-07-09 03:14:11
Categories: cs.LG,cs.NI
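The core aggregation step this abstract describes, where each communication-link node mixes its own features with those of neighboring links, can be sketched as a GraphSAGE-style mean-aggregation layer. This is an illustrative pure-Python sketch, not the paper's implementation; the function name, the scalar weights standing in for learned matrices, and the toy graph are all invented for the example.

```python
def graphsage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer with mean aggregation (illustrative).

    features:  {node: [float, ...]} input feature vectors
    neighbors: {node: [node, ...]} adjacency lists
    w_self, w_neigh: scalar weights standing in for learned matrices
    """
    out = {}
    for v, h in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            # mean of neighbor feature vectors, dimension by dimension
            agg = [sum(features[u][i] for u in nbrs) / len(nbrs)
                   for i in range(len(h))]
        else:
            agg = [0.0] * len(h)
        # combine self and aggregated neighbor features, then ReLU
        out[v] = [max(0.0, w_self * h[i] + w_neigh * agg[i])
                  for i in range(len(h))]
    return out

# Toy interference graph: V2V links "a" and "b" both neighbor link "c"
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["c"], "b": ["c"], "c": ["a", "b"]}
h1 = graphsage_mean_layer(feats, adj, w_self=1.0, w_neigh=0.5)
print(h1["a"])  # → [1.5, 0.5]: "a" mixes its own features with c's
```

Because each node only needs its local neighborhood, such a layer supports the distributed deployment the abstract mentions: each vehicle can compute its own embedding from local observations.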
Medication Recommendation via Dual Molecular Modalities and Multi-Substructure Enhancement
Medication recommendation combines patient medical history with biomedical knowledge to assist doctors in determining medication combinations more accurately and safely. Existing works based on molecular knowledge neglect the 3D geometric structure of molecules and fail to learn the high-dimensional information of medications, leading to structural confusion. Additionally, they do not extract key substructures from a single patient visit, and thus fail to identify the medication molecules suitable for the current patient visit. To address the above limitations, we propose a bimodal molecular recommendation framework named BiMoRec, which introduces 3D molecular structures to obtain atomic 3D coordinates and edge indices, overcoming the inherent lack of high-dimensional molecular information in 2D molecular structures. To retain the fast training and prediction efficiency of the recommendation system, we use bimodal graph contrastive pretraining to maximize the mutual information between the two molecular modalities, achieving the fusion of 2D and 3D molecular graphs and re-evaluating substructures at the visit level. Specifically, we use deep learning networks to construct a pretraining method that acquires 2D and 3D molecular structure representations and substructure representations, and obtain mutual information through contrastive learning. We then generate fused molecular representations using the trained GNN module and re-determine the relevance of substructure representations in combination with the patient's clinical history. Finally, we generate the final medication combination based on the extracted substructure sequences. Our implementation on the MIMIC-III and MIMIC-IV datasets demonstrates that our method achieves state-of-the-art performance. Compared to the second-best baseline, our model improves accuracy by 2.07%, with DDI at the same level as the baseline.
Updated: 2024-07-09 03:13:12
Categories: cs.LG,q-bio.QM
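The bimodal graph contrastive pretraining described above, which maximizes mutual information between 2D and 3D molecular representations, is commonly instantiated with an InfoNCE-style objective. The sketch below is a generic toy version under that assumption; BiMoRec's actual loss, encoders, and temperature may differ, and the embeddings here are invented placeholders.

```python
import math

def info_nce(z2d, z3d, tau=0.1):
    """InfoNCE-style contrastive loss between paired 2D/3D embeddings.
    Each z2d[i] should match z3d[i] and mismatch every z3d[j], j != i."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    def cos(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

    loss, n = 0.0, len(z2d)
    for i in range(n):
        logits = [cos(z2d[i], z3d[j]) / tau for j in range(n)]
        log_z = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_z)  # negative log-softmax of the true pair
    return loss / n

# Perfectly aligned modality pairs give lower loss than shuffled ones
aligned = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(aligned, aligned) < info_nce(aligned, aligned[::-1]))  # True
```

Minimizing this loss pulls the two modality embeddings of the same molecule together while pushing apart embeddings of different molecules, which is one standard way to "maximize mutual information" between modalities.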
Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density
Trustworthy machine learning necessitates meticulous regulation of model reliance on non-robust features. We propose a framework to delineate and regulate such features by attributing model predictions to the input. Within our approach, robust feature attributions exhibit a certain consistency, while non-robust feature attributions are susceptible to fluctuations. This behavior allows identification of correlation between model reliance on non-robust features and smoothness of marginal density of the input samples. Hence, we uniquely regularize the gradients of the marginal density w.r.t. the input features for robustness. We also devise an efficient implementation of our regularization to address the potential numerical instability of the underlying optimization process. Moreover, we analytically reveal that, as opposed to our marginal density smoothing, the prevalent input gradient regularization smooths the conditional or joint density of the input, which can cause limited robustness. Our experiments validate the effectiveness of the proposed method, providing clear evidence of its capability to address the feature leakage problem and mitigate spurious correlations. Extensive results further establish that our technique enables the model to exhibit robustness against perturbations in pixel values, input gradients, and density.
Updated: 2024-07-09 03:09:41
Categories: cs.LG,cs.AI
Language Models can Evaluate Themselves via Probability Discrepancy
In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. This approach obviates the necessity for an additional evaluation model or the dependence on external, proprietary models like GPT-4 for judgment. It uniquely utilizes the LLMs being tested to compute the probability discrepancy between the initial response and its revised versions. A higher discrepancy for a given query between two LLMs indicates a relatively weaker capability. Our findings reveal that ProbDiff achieves results on par with those obtained from evaluations based on GPT-4, spanning a range of scenarios that include natural language generation (NLG) tasks such as translation, summarization, and our proposed Xiaohongshu blog writing task, and benchmarks for LLM evaluation like AlignBench, MT-Bench, and AlpacaEval, across LLMs of varying magnitudes.
Updated: 2024-07-09 02:56:14
Categories: cs.CL,cs.AI
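The probability-discrepancy idea can be illustrated with a toy language model: score a query by how much likelihood the model assigns to its initial response versus a revised version of it. Everything below (the unigram "LM", the token probabilities, the function names) is a hypothetical stand-in for the paper's actual models and revision procedure.

```python
import math

def sequence_logprob(lm, tokens):
    """Sum of log-probabilities the model assigns to a token sequence."""
    return sum(math.log(lm(t)) for t in tokens)

def prob_discrepancy(lm, initial, revised):
    """ProbDiff-style score: how much likelihood the model loses (or gains)
    when moving from its initial response to a revision. A larger drop on a
    query is read as a relatively weaker capability (illustrative toy)."""
    return sequence_logprob(lm, initial) - sequence_logprob(lm, revised)

# Toy unigram "LM": confident about some tokens, not others
probs = {"paris": 0.6, "is": 0.2, "capital": 0.15, "lyon": 0.05}
lm = lambda t: probs[t]

d = prob_discrepancy(lm, ["paris", "is", "capital"], ["lyon", "is", "capital"])
print(d > 0)  # the revision is less likely, so the discrepancy is positive
```

Note the appeal of this design: the model under test scores its own outputs, so no separate judge model (such as GPT-4) is required.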
LuSNAR: A Lunar Segmentation, Navigation and Reconstruction Dataset based on Multi-sensor for Autonomous Exploration
With the growing complexity of lunar exploration missions, lunar rovers need a higher level of autonomy. Environmental perception and navigation algorithms are the foundation for lunar rovers to achieve autonomous exploration, and the development and verification of such algorithms require highly reliable data support. Most existing lunar datasets target a single task and lack diverse scenes and high-precision ground-truth labels. To address this issue, we propose LuSNAR, a multi-task, multi-scene, and multi-label lunar benchmark dataset. It can be used for comprehensive evaluation of autonomous perception and navigation systems, and includes high-resolution stereo image pairs, panoramic semantic labels, dense depth maps, LiDAR point clouds, and the rover's position. To provide richer scene data, we built 9 lunar simulation scenes based on Unreal Engine, each divided according to topographic relief and the density of objects. To verify the usability of the dataset, we evaluated and analyzed algorithms for semantic segmentation, 3D reconstruction, and autonomous navigation. The experimental results show that the dataset proposed in this paper can be used for ground verification of tasks such as autonomous environment perception and navigation, and provides a lunar benchmark for testing algorithm metrics. We make LuSNAR publicly available at: https://github.com/autumn999999/LuSNAR-dataset.
Updated: 2024-07-09 02:47:58
Categories: cs.CV,cs.AI
FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering
Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative training of machine learning models over decentralized devices without exposing their local data. One of the major challenges in FL is the presence of uneven data distributions across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samples in conventional machine learning. To address the performance degradation issue incurred by such data heterogeneity, clustered federated learning (CFL) shows its promise by grouping clients into separate learning clusters based on the similarity of their local data distributions. However, state-of-the-art CFL approaches require a large number of communication rounds to learn the distribution similarities during training until the formation of clusters is stabilized. Moreover, some of these algorithms heavily rely on a predefined number of clusters, thus limiting their flexibility and adaptability. In this paper, we propose {\em FedClust}, a novel approach for CFL that leverages the correlation between local model weights and the data distribution of clients. {\em FedClust} groups clients into clusters in a one-shot manner by measuring the similarity degrees among clients based on the strategically selected partial weights of locally trained models. We conduct extensive experiments on four benchmark datasets with different non-IID data settings. Experimental results demonstrate that {\em FedClust} achieves higher model accuracy up to $\sim$45\% as well as faster convergence with a significantly reduced communication cost up to 2.7$\times$ compared to its state-of-the-art counterparts.
Updated: 2024-07-09 02:47:16
Categories: cs.DC,cs.AI,cs.LG
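The one-shot, weight-driven grouping can be sketched as follows: compare strategically selected partial weights between clients with cosine similarity and group clients whose weights are sufficiently alike. The similarity threshold, the greedy first-fit grouping, and the toy weight vectors are illustrative assumptions, not FedClust's exact procedure.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def one_shot_cluster(partial_weights, threshold=0.9):
    """Group clients whose selected local-model weights are similar,
    in a single pass (illustrative stand-in for FedClust's clustering)."""
    clusters = []
    for cid, w in partial_weights.items():
        for cluster in clusters:
            rep = partial_weights[cluster[0]]  # first member as representative
            if cosine(w, rep) >= threshold:
                cluster.append(cid)
                break
        else:
            clusters.append([cid])  # no similar cluster: start a new one
    return clusters

weights = {
    "c1": [1.0, 0.1, 0.0],
    "c2": [0.9, 0.2, 0.0],   # similar data distribution to c1
    "c3": [0.0, 0.1, 1.0],   # very different
}
print(one_shot_cluster(weights))  # → [['c1', 'c2'], ['c3']]
```

Because clustering happens in one shot from already-trained local weights, no extra communication rounds are spent learning distribution similarities, which is the efficiency argument the abstract makes.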
Economic span selection of bridge based on deep reinforcement learning
A deep Q-network algorithm is used to select the economic span of a bridge. The selection of bridge span has a significant impact on the total cost of the bridge, and a reasonable choice of span can reduce engineering cost. The economic span of the bridge is analyzed theoretically, and a theoretical formula for the economic span is derived. The construction of the bridge simulation environment is described in detail, including the environment's observation space, action space, and reward function. An agent is constructed: a convolutional neural network is used to approximate the Q function, an ε-greedy policy is used for action selection, and experience replay is used for training. Tests verify that the agent can successfully learn the optimal policy and select the economic span of a bridge. This study provides a potential decision-making tool for bridge design.
Updated: 2024-07-09 02:27:52
Categories: cs.LG,cs.AI
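Two of the agent components named in the abstract, ε-greedy action selection and experience replay, can be sketched minimally. The Q-values below are hypothetical placeholders for the CNN-approximated Q function, and the candidate spans are invented for the example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action index with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

class ReplayBuffer:
    """Minimal experience replay: store transitions, sample minibatches."""
    def __init__(self, capacity):
        self.capacity, self.data = capacity, []

    def push(self, transition):
        self.data.append(transition)
        if len(self.data) > self.capacity:
            self.data.pop(0)  # drop the oldest transition

    def sample(self, k, rng):
        return rng.sample(self.data, k)

rng = random.Random(0)
q = [1.2, 3.4, 0.5]          # hypothetical Q-values over candidate spans
greedy = epsilon_greedy(q, epsilon=0.0, rng=rng)
print(greedy)  # → 1 (the span with the highest Q-value)
```

With epsilon > 0 the agent occasionally tries non-greedy spans, and replayed minibatches decorrelate the training data, the standard DQN recipe the abstract describes.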
Human Brain Exhibits Distinct Patterns When Listening to Fake Versus Real Audio: Preliminary Evidence
In this paper we study the variations in human brain activity when listening to real and fake audio. Our preliminary results suggest that the representations learned by a state-of-the-art deepfake audio detection algorithm, do not exhibit clear distinct patterns between real and fake audio. In contrast, human brain activity, as measured by EEG, displays distinct patterns when individuals are exposed to fake versus real audio. This preliminary evidence enables future research directions in areas such as deepfake audio detection.
Updated: 2024-07-09 02:21:21
Categories: cs.SD,cs.LG,eess.AS,q-bio.NC
Power-Enhanced Residual Network for Function Approximation and Physics-Informed Inverse Problems
In this study, we investigate how the updating of weights during forward operation and the computation of gradients during backpropagation impact the optimization process, training procedure, and overall performance of the neural network, particularly the multi-layer perceptrons (MLPs). This paper introduces a novel neural network structure called the Power-Enhancing residual network, inspired by highway network and residual network, designed to improve the network's capabilities for both smooth and non-smooth functions approximation in 2D and 3D settings. By incorporating power terms into residual elements, the architecture enhances the stability of weight updating, thereby facilitating better convergence and accuracy. The study explores network depth, width, and optimization methods, showing the architecture's adaptability and performance advantages. Consistently, the results emphasize the exceptional accuracy of the proposed Power-Enhancing residual network, particularly for non-smooth functions. Real-world examples also confirm its superiority over plain neural network in terms of accuracy, convergence, and efficiency. Moreover, the proposed architecture is also applied to solving the inverse Burgers' equation, demonstrating superior performance. In conclusion, the Power-Enhancing residual network offers a versatile solution that significantly enhances neural network capabilities by emphasizing the importance of stable weight updates for effective training in deep neural networks. The codes implemented are available at: \url{https://github.com/CMMAi/ResNet_for_PINN}.
Updated: 2024-07-09 02:19:26
Categories: cs.LG,cs.CV,math.AP
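One plausible reading of the power-enhancing residual idea is a residual block whose identity shortcut is augmented with an element-wise power term. The block below is a speculative pure-Python sketch under that assumption; the paper's exact formulation, activations, and placement of the power term may differ.

```python
import math

def power_residual_block(x, weight, power=2):
    """Residual block with an added element-wise power term: one plausible
    reading of 'power-enhancing' (illustrative; the paper may differ)."""
    hidden = [math.tanh(weight * xi) for xi in x]  # transformed path
    # identity shortcut plus a power of the shortcut
    return [h + xi + xi ** power for h, xi in zip(hidden, x)]

x = [0.5, -0.5]
y = power_residual_block(x, weight=1.0, power=2)
print(y[0] > 0 > y[1])  # True: output keeps the input's shape and sign structure
```

As in a plain residual network, the shortcut keeps gradients flowing through deep stacks; the extra power term adds a nonlinearity on the shortcut path itself.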
On the Robustness of Graph Reduction Against GNN Backdoor
Graph Neural Networks (GNNs) are gaining popularity across various domains due to their effectiveness in learning graph-structured data. Nevertheless, they have been shown to be susceptible to backdoor poisoning attacks, which pose serious threats to real-world applications. Meanwhile, graph reduction techniques, including coarsening and sparsification, which have long been employed to improve the scalability of large graph computational tasks, have recently emerged as effective methods for accelerating GNN training on large-scale graphs. However, the current development and deployment of graph reduction techniques for large graphs overlook the potential risks of data poisoning attacks against GNNs. It is not yet clear how graph reduction interacts with existing backdoor attacks. This paper conducts a thorough examination of the robustness of graph reduction methods in scalable GNN training in the presence of state-of-the-art backdoor attacks. We performed a comprehensive robustness analysis across six coarsening methods and six sparsification methods for graph reduction, under three GNN backdoor attacks against three GNN architectures. Our findings indicate that the effectiveness of graph reduction methods in mitigating attack success rates varies significantly, with some methods even exacerbating the attacks. Through detailed analyses of triggers and poisoned nodes, we interpret our findings and enhance our understanding of how graph reduction influences robustness against backdoor attacks. These results highlight the critical need for incorporating robustness considerations in graph reduction for GNN training, ensuring that enhancements in computational efficiency do not compromise the security of GNN systems.
Updated: 2024-07-09 02:11:47
Categories: cs.LG,cs.CR
Preference-Guided Reinforcement Learning for Efficient Exploration
In this paper, we investigate preference-based reinforcement learning (PbRL) that allows reinforcement learning (RL) agents to learn from human feedback. This is particularly valuable when defining a fine-grain reward function is not feasible. However, this approach is inefficient and impractical for promoting deep exploration in hard-exploration tasks with long horizons and sparse rewards. To tackle this issue, we introduce LOPE: Learning Online with trajectory Preference guidancE, an end-to-end preference-guided RL framework that enhances exploration efficiency in hard-exploration tasks. Our intuition is that LOPE directly adjusts the focus of online exploration by considering human feedback as guidance, avoiding learning a separate reward model from preferences. Specifically, LOPE includes a two-step sequential policy optimization process consisting of trust-region-based policy improvement and preference guidance steps. We reformulate preference guidance as a novel trajectory-wise state marginal matching problem that minimizes the maximum mean discrepancy distance between the preferred trajectories and the learned policy. Furthermore, we provide a theoretical analysis to characterize the performance improvement bound and evaluate the LOPE's effectiveness. When assessed in various challenging hard-exploration environments, LOPE outperforms several state-of-the-art methods regarding convergence rate and overall performance. The code used in this study is available at \url{https://github.com/buaawgj/LOPE}.
Updated: 2024-07-09 02:11:12
Categories: cs.LG
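The trajectory-wise state marginal matching objective above minimizes a maximum mean discrepancy (MMD) between states from preferred trajectories and states visited by the learned policy. A minimal squared-MMD computation with a Gaussian (RBF) kernel on toy 2-D states looks like this; the kernel choice, bandwidth, and toy samples are assumptions for illustration.

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two state vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Squared maximum mean discrepancy between two state samples,
    the kind of distance LOPE-style preference guidance would minimize."""
    def mean_k(a, b):
        return sum(rbf(p, q, sigma) for p in a for q in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

preferred = [[0.0, 0.0], [0.1, 0.0]]   # states from human-preferred trajectories
close     = [[0.05, 0.0], [0.12, 0.0]] # policy states near the preferred ones
far       = [[3.0, 3.0], [3.1, 3.0]]   # policy states far from them
print(mmd2(preferred, close) < mmd2(preferred, far))  # True
```

Driving this quantity down pushes the policy's state distribution toward that of the preferred trajectories without ever fitting an explicit reward model.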
STORYSUMM: Evaluating Faithfulness in Story Summarization
Human evaluation has been the gold standard for checking faithfulness in abstractive summarization. However, with a challenging source domain like narrative, multiple annotators can agree a summary is faithful, while missing details that are obvious errors only once pointed out. We therefore introduce a new dataset, STORYSUMM, comprising LLM summaries of short stories with localized faithfulness labels and error explanations. This benchmark is for evaluation methods, testing whether a given method can detect challenging inconsistencies. Using this dataset, we first show that any one human annotation protocol is likely to miss inconsistencies, and we advocate for pursuing a range of methods when establishing ground truth for a summarization dataset. We finally test recent automatic metrics and find that none of them achieve more than 70% balanced accuracy on this task, demonstrating that it is a challenging benchmark for future work in faithfulness evaluation.
Updated: 2024-07-09 02:06:30
Categories: cs.AI,cs.CL
MetaCOG: A Hierarchical Probabilistic Model for Learning Meta-Cognitive Visual Representations
Humans have the capacity to question what we see and to recognize when our vision is unreliable (e.g., when we realize that we are experiencing a visual illusion). Inspired by this capacity, we present MetaCOG: a hierarchical probabilistic model that can be attached to a neural object detector to monitor its outputs and determine their reliability. MetaCOG achieves this by learning a probabilistic model of the object detector's performance via Bayesian inference -- i.e., a meta-cognitive representation of the network's propensity to hallucinate or miss different object categories. Given a set of video frames processed by an object detector, MetaCOG performs joint inference over the underlying 3D scene and the detector's performance, grounding inference on a basic assumption of object permanence. Paired with three neural object detectors, we show that MetaCOG accurately recovers each detector's performance parameters and improves the overall system's accuracy. We additionally show that MetaCOG is robust to varying levels of error in object detector outputs, showing proof-of-concept for a novel approach to the problem of detecting and correcting errors in vision systems when ground-truth is not available.
Updated: 2024-07-09 02:02:24
Categories: cs.AI,cs.CV
It's Our Loss: No Privacy Amplification for Hidden State DP-SGD With Non-Convex Loss
Differentially Private Stochastic Gradient Descent (DP-SGD) is a popular iterative algorithm used to train machine learning models while formally guaranteeing the privacy of users. However, the privacy analysis of DP-SGD makes the unrealistic assumption that all intermediate iterates (aka internal state) of the algorithm are released, since in practice only the final trained model, i.e., the final iterate of the algorithm, is released. In this hidden state setting, prior work has provided tighter analyses, albeit only when the loss function is constrained, e.g., strongly convex and smooth or linear. On the other hand, the privacy leakage observed empirically from hidden state DP-SGD, even when using non-convex loss functions, suggests that there is in fact a gap between the theoretical privacy analysis and the privacy guarantees achieved in practice. Therefore, it remains an open question whether privacy amplification for DP-SGD is possible in the hidden state setting for general loss functions. Unfortunately, this work answers the aforementioned research question negatively. By carefully constructing a loss function for DP-SGD, we show that for specific loss functions, the final iterate of DP-SGD alone leaks as much information as the sequence of all iterates combined. Furthermore, we empirically verify this result by evaluating the privacy leakage from the final iterate of DP-SGD with our loss function and show that this matches the theoretical upper bound guaranteed by DP exactly. Therefore, we show that the current privacy analysis of DP-SGD is tight for general loss functions and conclude that no privacy amplification is possible for DP-SGD in general for all (possibly non-convex) loss functions.
Updated: 2024-07-09 01:58:19
Categories: cs.LG,cs.CR
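For reference, a single DP-SGD update, per-example gradient clipping followed by Gaussian noise, can be sketched as below. This is the textbook mechanism whose hidden-state analysis the paper studies, written in plain Python with invented toy gradients; it is not the paper's construction.

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr, rng):
    """One DP-SGD update: clip each per-example gradient to clip_norm,
    sum, add Gaussian noise with std noise_mult * clip_norm, average."""
    n, dim = len(per_example_grads), len(params)
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    sigma = noise_mult * clip_norm
    noisy = [(summed[i] + rng.gauss(0.0, sigma)) / n for i in range(dim)]
    return [p - lr * gi for p, gi in zip(params, noisy)]

rng = random.Random(0)
grads = [[10.0, 0.0], [0.0, 10.0]]   # both get clipped to norm 1.0
new = dp_sgd_step([0.0, 0.0], grads, clip_norm=1.0,
                  noise_mult=0.0, lr=1.0, rng=rng)
print(new)  # → [-0.5, -0.5] with noise_mult=0 (pure clipped averaging)
```

The standard analysis accounts privacy loss for every such step; the paper's negative result says that for some losses, observing only the final parameters is just as revealing as observing every intermediate step.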
A Generative Approach to Control Complex Physical Systems
Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce Diffusion Physical systems Control (DiffPhyCon), a new class of method to address the physical systems control problem. DiffPhyCon excels by simultaneously minimizing both the learned generative energy function and the predefined control objectives across the entire trajectory and control sequence. Thus, it can explore globally and identify near-optimal control sequences. Moreover, we enhance DiffPhyCon with prior reweighting, enabling the discovery of control sequences that significantly deviate from the training distribution. We test our method in 1D Burgers' equation and 2D jellyfish movement control in a fluid environment. Our method outperforms widely applied classical approaches and state-of-the-art deep learning and reinforcement learning methods. Notably, DiffPhyCon unveils an intriguing fast-close-slow-open pattern observed in the jellyfish, aligning with established findings in the field of fluid dynamics.
Updated: 2024-07-09 01:56:23
Categories: cs.LG,cs.AI
Efficient Stitchable Task Adaptation
The paradigm of pre-training and fine-tuning has laid the foundation for deploying deep learning models. However, most fine-tuning methods are designed to meet a specific resource budget. Recently, considering diverse deployment scenarios with various resource budgets, SN-Net is introduced to quickly obtain numerous new networks (stitches) from the pre-trained models (anchors) in a model family via model stitching. Although promising, SN-Net confronts new challenges when adapting it to new target domains, including huge memory and storage requirements and a long and sub-optimal multistage adaptation process. In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints. Specifically, we first tailor parameter-efficient fine-tuning to share low-rank updates among the stitches while maintaining independent bias terms. In this way, we largely reduce fine-tuning memory burdens and mitigate the interference among stitches that arises in task adaptation. Furthermore, we streamline a simple yet effective one-stage deployment pipeline, which estimates the important stitches to deploy with training-time gradient statistics. By assigning higher sampling probabilities to important stitches, we also get a boosted Pareto frontier. Extensive experiments on 25 downstream visual recognition tasks demonstrate that our ESTA is capable of generating stitches with smooth accuracy-efficiency trade-offs and surpasses the direct SN-Net adaptation by remarkable margins with significantly lower training time and fewer trainable parameters. Furthermore, we demonstrate the flexibility and scalability of our ESTA framework by stitching LLMs from LLaMA family, obtaining chatbot stitches of assorted sizes. Source code is available at https://github.com/ziplab/Stitched_LLaMA
Updated: 2024-07-09 01:54:18
Categories: cs.LG,cs.CL,cs.CV
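As an illustration of the parameter-sharing scheme the ESTA abstract describes, the sketch below keeps a frozen pre-trained weight, shares a single low-rank update among all stitches, and gives each stitch its own independent bias term. All names, sizes, and the two-stitch setup are hypothetical; this is a minimal numpy sketch of the idea, not ESTA's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight, shared by every stitch.
d_in, d_out, rank = 16, 8, 2
W = rng.normal(size=(d_out, d_in))

# Low-rank update shared among stitches; only A, B and the biases are trainable.
A = rng.normal(size=(d_out, rank)) * 0.01
B = rng.normal(size=(rank, d_in)) * 0.01

# Independent bias term per stitch, to mitigate interference among stitches.
biases = {"stitch_small": np.zeros(d_out), "stitch_large": np.full(d_out, 0.5)}

def forward(x, stitch):
    """Adapted pass: frozen weight plus shared low-rank update, stitch-specific bias."""
    return (W + A @ B) @ x + biases[stitch]

x = rng.normal(size=d_in)
y_small = forward(x, "stitch_small")
y_large = forward(x, "stitch_large")

# Trainable-parameter budget: shared factors plus one bias vector per stitch.
n_trainable = A.size + B.size + sum(b.size for b in biases.values())
```

Note how adding a stitch grows the trainable budget by only one bias vector, which is what makes producing a whole palette of fine-tuned models memory-feasible.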
Long-Tail Learning with Rebalanced Contrastive Loss
Integrating a supervised contrastive loss into cross entropy-based classification has recently been proposed as a solution to the long-tail learning problem. However, when the class imbalance ratio is high, the supervised contrastive loss must be adjusted to support the tail classes, as conventional contrastive learning is biased towards head classes by default. To this end, we present Rebalanced Contrastive Learning (RCL), an efficient means to increase long-tail classification accuracy by addressing three main aspects: 1. Feature space balancedness - equal division of the feature space among all the classes, 2. Intra-class compactness - reducing the distance between same-class embeddings, 3. Regularization - enforcing larger margins for tail classes to reduce overfitting. RCL applies class frequency-based SoftMax loss balancing to the supervised contrastive learning loss and exploits scalar-multiplied features fed into the contrastive learning loss to enforce compactness. We implement RCL on the Balanced Contrastive Learning (BCL) framework, which has SOTA performance. Our experiments on three benchmark datasets demonstrate the richness of the learnt embeddings and the increased top-1 balanced accuracy RCL provides to the BCL framework. We further demonstrate that RCL as a standalone loss also achieves state-of-the-art accuracy.
Updated: 2024-07-09 01:30:04
Categories: cs.LG,cs.CV
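The class frequency-based balancing described in the RCL abstract can be sketched as follows: in a supervised contrastive loss, each candidate's logit is offset by the log-frequency of its class, so head-class candidates dominate the partition function less. This is a hedged, simplified numpy sketch of the idea; RCL's exact loss form may differ.

```python
import numpy as np

def rebalanced_supcon_loss(z, labels, counts, tau=0.1):
    # L2-normalize embeddings before computing cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(labels)
    # Class-frequency offset: head-class candidates pay a larger log-prior
    # inside the partition function, rebalancing the loss toward tail classes.
    prior = np.log(np.array([counts[c] for c in labels], dtype=float))
    loss, terms = 0.0, 0
    for i in range(n):
        mask = np.arange(n) != i
        logits = sim[i, mask] + prior[mask]
        log_z = np.log(np.exp(logits).sum())
        for j in np.where(mask)[0]:
            if labels[j] == labels[i]:  # positive pair: same class as the anchor
                loss += -(sim[i, j] + prior[j] - log_z)
                terms += 1
    return loss / max(terms, 1)

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
labels = [0, 0, 0, 0, 1, 1]  # class 0 is the head class, class 1 the tail
loss = rebalanced_supcon_loss(z, labels, counts={0: 100, 1: 5})
```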
SeqLink: A Robust Neural-ODE Architecture for Modelling Partially Observed Time Series
Ordinary differential equation (ODE)-based models have become popular as foundation models for solving many time series problems. Combining neural ODEs with traditional RNN models has provided the best representations for irregular time series. However, ODE-based models typically require the trajectory of hidden states to be defined based on either the initial observed value or the most recent observation, raising questions about their effectiveness when dealing with longer sequences and extended time intervals. In this article, we explore the behaviour of ODE models in the context of time series data with varying degrees of sparsity. We introduce SeqLink, an innovative neural architecture designed to enhance the robustness of sequence representations. Unlike traditional approaches that rely solely on the hidden state generated from the last observed value, SeqLink leverages ODE latent representations derived from multiple data samples, enabling it to generate robust data representations regardless of sequence length or data sparsity level. The core concept behind our model is the definition of hidden states for unobserved values based on the relationships between samples (links between sequences). Through extensive experiments on partially observed synthetic and real-world datasets, we demonstrate that SeqLink improves the modelling of intermittent time series, consistently outperforming state-of-the-art approaches.
Updated: 2024-07-09 01:29:36
Categories: cs.LG
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
While large language models (LLMs) have demonstrated superior multi-task capabilities, understanding the learning mechanisms behind this remains a challenging problem. In this paper, we attempt to understand such mechanisms from the perspective of neurons. Specifically, we detect task-sensitive neurons in LLMs via gradient attribution on task-specific data. Through extensive deactivation and fine-tuning experiments, we demonstrate that the detected neurons are highly correlated with the given task; we term them task-specific neurons. With these identified task-specific neurons, we delve into two common problems in multi-task learning and continual learning: generalization and catastrophic forgetting. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. Interestingly, at certain layers of LLMs, there is high similarity in the parameters of different task-specific neurons, and such similarity is highly correlated with generalization performance. Inspired by these findings, we propose a neuron-level continual fine-tuning method that fine-tunes only the current task-specific neurons during continual learning, and extensive experiments demonstrate the effectiveness of the proposed method. Our study provides insights into the interpretability of LLMs in multi-task learning.
Updated: 2024-07-09 01:27:35
Categories: cs.CL,cs.LG
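The gradient-attribution step described in the abstract above can be illustrated on a toy network: score each hidden neuron by |gradient x activation| averaged over task data, then flag the top-k as task-specific. This is a simplified sketch on a tiny MLP with a squared-error loss; the paper works with actual LLM neurons and its exact attribution criterion may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network standing in for an LLM feed-forward block.
d, h = 8, 16
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(1, h))

def attribution_scores(X, y):
    """Score each hidden neuron by |gradient x activation| on task data."""
    scores = np.zeros(h)
    for x, t in zip(X, y):
        a = np.maximum(W1 @ x, 0.0)          # ReLU activations
        pred = float(W2 @ a)
        grad_a = 2.0 * (pred - t) * W2[0]    # d(squared error)/d(activation)
        grad_a[a <= 0] = 0.0                 # ReLU gates the gradient
        scores += np.abs(grad_a * a)
    return scores / len(X)

X = rng.normal(size=(32, d))
y = rng.normal(size=32)
scores = attribution_scores(X, y)
task_neurons = np.argsort(scores)[-4:]       # top-k neurons flagged as task-specific
```

A neuron-level continual fine-tuning step would then update only the rows of `W1` indexed by `task_neurons`.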
Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates
With the increasing demand for large-scale training of machine learning models, fully decentralized optimization methods have recently been advocated as alternatives to the popular parameter-server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by waiting for and averaging all estimates obtained from its neighbors, then correcting it on the basis of its local dataset. However, the synchronization phase is sensitive to stragglers. An efficient way to mitigate this effect is to consider asynchronous updates, where each worker computes stochastic gradients and communicates with other workers at its own pace. Unfortunately, fully asynchronous updates suffer from the staleness of stragglers' parameters. To address these limitations, we propose DSGD-AAU, a fully decentralized algorithm with adaptive asynchronous updates that adaptively determines the number of neighbor workers each worker communicates with. We show that DSGD-AAU achieves a linear speedup for convergence and demonstrate its effectiveness via extensive experiments.
Updated: 2024-07-09 01:23:59
Categories: cs.LG,cs.DC
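The core idea above - average with only a subset of neighbors instead of waiting for stragglers - can be sketched in a toy simulation. This is a hedged, synchronous caricature with hypothetical worker speeds and a simple quadratic objective; the actual DSGD-AAU algorithm is asynchronous and adapts the neighbor count during training.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, dim = 4, 3
params = [rng.normal(size=dim) for _ in range(n_workers)]
speeds = np.array([1.0, 0.9, 0.5, 0.1])   # hypothetical speeds; worker 3 is a straggler

def local_gradient(w):
    return 2.0 * w                         # gradient of ||w||^2: all workers seek 0

def step(params, k, lr=0.1):
    """One round: each worker averages with only its k fastest neighbors
    (rather than waiting for everyone), then takes a local gradient step."""
    new = []
    for i in range(len(params)):
        others = [j for j in range(len(params)) if j != i]
        fastest = sorted(others, key=lambda j: -speeds[j])[:k]
        avg = np.mean([params[i]] + [params[j] for j in fastest], axis=0)
        new.append(avg - lr * local_gradient(params[i]))
    return new

for _ in range(50):
    params = step(params, k=2)
consensus_gap = max(np.linalg.norm(p - params[0]) for p in params)
```

All workers converge to the shared optimum without any worker ever waiting on the straggler.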
Optimal Decision Making Through Scenario Simulations Using Large Language Models
The rapid evolution of Large Language Models (LLMs) has markedly expanded their application across diverse domains, transforming how complex problems are approached and solved. Initially conceived to predict subsequent words in texts, these models have transcended their original design to comprehend and respond to the underlying contexts of queries. Today, LLMs routinely perform tasks that once seemed formidable, such as writing essays, poems, stories, and even developing software code. As their capabilities continue to grow, so too do the expectations of their performance in even more sophisticated domains. Despite these advancements, LLMs still encounter significant challenges, particularly in scenarios requiring intricate decision-making, such as planning trips or choosing among multiple viable options. These tasks often demand a nuanced understanding of various outcomes and the ability to predict the consequences of different choices, which are currently outside the typical operational scope of LLMs. This paper proposes an innovative approach to bridge this capability gap. By enabling LLMs to request multiple potential options and their respective parameters from users, our system introduces a dynamic framework that integrates an optimization function within the decision-making process. This function is designed to analyze the provided options, simulate potential outcomes, and determine the most advantageous solution based on a set of predefined criteria. By harnessing this methodology, LLMs can offer tailored, optimal solutions to complex, multi-variable problems, significantly enhancing their utility and effectiveness in real-world applications. This approach not only expands the functional envelope of LLMs but also paves the way for more autonomous and intelligent systems capable of supporting sophisticated decision-making tasks.
Updated: 2024-07-09 01:23:09
Categories: cs.AI
CrowdTransfer: Enabling Crowd Knowledge Transfer in AIoT Community
Artificial Intelligence of Things (AIoT) is an emerging frontier based on the deep fusion of Internet of Things (IoT) and Artificial Intelligence (AI) technologies. Although advanced deep learning techniques enable efficient data processing and intelligent analysis of complex IoT data, they still suffer from notable challenges when deployed to practical AIoT applications, such as constrained resources and diverse task requirements. Knowledge transfer is an effective method to enhance learning performance by avoiding the exorbitant costs associated with data recollection and model retraining. Notably, although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack coverage of recent advances in knowledge transfer techniques for the AIoT field. This survey endeavors to introduce a new concept of knowledge transfer, referred to as Crowd Knowledge Transfer (CrowdTransfer), which aims to transfer prior knowledge learned from a crowd of agents to reduce training costs as well as to improve model performance in real-world complicated scenarios. In particular, we present four transfer modes from the perspective of crowd intelligence: derivation, sharing, evolution, and fusion. Building upon conventional transfer learning methods, we further delve into advanced crowd knowledge transfer models from three perspectives for various AIoT applications. Furthermore, we explore applications in AIoT areas such as human activity recognition, urban computing, multi-robot systems, and smart factories. Finally, we discuss open issues and outline future research directions for knowledge transfer in the AIoT community.
Updated: 2024-07-09 01:20:37
Categories: cs.LG,cs.AI
Prospective Messaging: Learning in Networks with Communication Delays
Inter-neuron communication delays are ubiquitous in physically realized neural networks such as biological neural circuits and neuromorphic hardware. These delays have significant and often disruptive consequences on network dynamics during training and inference. It is therefore essential that communication delays be accounted for, both in computational models of biological neural networks and in large-scale neuromorphic systems. Nonetheless, communication delays have yet to be comprehensively addressed in either domain. In this paper, we first show that delays prevent state-of-the-art continuous-time neural networks called Latent Equilibrium (LE) networks from learning even simple tasks despite significant overparameterization. We then propose to compensate for communication delays by predicting future signals based on currently available ones. This conceptually straightforward approach, which we call prospective messaging (PM), uses only neuron-local information, and is flexible in terms of memory and computation requirements. We demonstrate that incorporating PM into delayed LE networks prevents reaction lags, and facilitates successful learning on Fourier synthesis and autoregressive video prediction tasks.
Updated: 2024-07-09 01:20:32
Categories: cs.LG,cs.NE
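The compensation idea above - predict the future signal from currently available ones using only neuron-local information - can be shown with the simplest possible predictor, linear extrapolation. This is an illustrative sketch only; the paper's prospective-messaging predictors are more flexible than a two-point extrapolation, which is exact here only because the toy signal is a ramp.

```python
import numpy as np

def prospective_message(history, delay):
    """Predict the signal `delay` steps ahead by linear extrapolation from the
    two most recent locally available values (neuron-local: no other state needed)."""
    cur, prev = history[-1], history[-2]
    return cur + (cur - prev) * delay

# A ramp signal arriving with a communication delay of 3 steps.
signal = np.array([2.0 * k + 1.0 for k in range(20)])
delay = 3
received = signal[:10]                  # what the downstream neuron has seen so far
predicted = prospective_message(received, delay)
actual = signal[9 + delay]              # the value that is actually current upstream
```

For a linear signal the extrapolation cancels the delay exactly, eliminating the reaction lag; for general signals it trades a small prediction error against the staleness of the delayed message.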
Grounding Language about Belief in a Bayesian Theory-of-Mind
Despite the fact that beliefs are mental states that cannot be directly observed, humans talk about each others' beliefs on a regular basis, often using rich compositional language to describe what others think and know. What explains this capacity to interpret the hidden epistemic content of other minds? In this paper, we take a step towards an answer by grounding the semantics of belief statements in a Bayesian theory-of-mind: By modeling how humans jointly infer coherent sets of goals, beliefs, and plans that explain an agent's actions, then evaluating statements about the agent's beliefs against these inferences via epistemic logic, our framework provides a conceptual role semantics for belief, explaining the gradedness and compositionality of human belief attributions, as well as their intimate connection with goals and plans. We evaluate this framework by studying how humans attribute goals and beliefs while watching an agent solve a doors-and-keys gridworld puzzle that requires instrumental reasoning about hidden objects. In contrast to pure logical deduction, non-mentalizing baselines, and mentalizing that ignores the role of instrumental plans, our model provides a much better fit to human goal and belief attributions, demonstrating the importance of theory-of-mind for a semantics of belief.
Updated: 2024-07-09 01:19:50
Categories: cs.AI,cs.CL
Composable Interventions for Language Models
Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework to study the effects of using multiple interventions on the same language models, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories -- Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions. All of our code is public: https://github.com/hartvigsen-group/composable-interventions.
Updated: 2024-07-09 01:17:44
Categories: cs.LG,cs.CL
Rethinking Autoencoders for Medical Anomaly Detection from A Theoretical Perspective
Medical anomaly detection aims to identify abnormal findings using only normal training data, playing a crucial role in health screening and recognizing rare diseases. Reconstruction-based methods, particularly those utilizing autoencoders (AEs), are dominant in this field. They work under the assumption that AEs trained on only normal data cannot reconstruct unseen abnormal regions well, thereby enabling the anomaly detection based on reconstruction errors. However, this assumption does not always hold due to the mismatch between the reconstruction training objective and the anomaly detection task objective, rendering these methods theoretically unsound. This study focuses on providing a theoretical foundation for AE-based reconstruction methods in anomaly detection. By leveraging information theory, we elucidate the principles of these methods and reveal that the key to improving AE in anomaly detection lies in minimizing the information entropy of latent vectors. Experiments on four datasets with two image modalities validate the effectiveness of our theory. To the best of our knowledge, this is the first effort to theoretically clarify the principles and design philosophy of AE for anomaly detection. The code is available at \url{https://github.com/caiyu6666/AE4AD}.
Updated: 2024-07-09 01:14:41
Categories: cs.LG,cs.CV
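The reconstruction-error assumption the abstract critiques can be made concrete with a minimal sketch, where a closed-form linear autoencoder (PCA) stands in for a trained AE: fit on normal data only, then score samples by how poorly they are reconstructed. The data and dimensions below are hypothetical; this illustrates the baseline mechanism, not the paper's entropy-based refinement.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data living near a 2-D subspace of R^8.
basis = rng.normal(size=(2, 8))
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 8))

# A linear autoencoder fit in closed form via PCA (stand-in for a trained AE).
mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
encode_decode = Vt[:2].T @ Vt[:2]        # project onto top-2 principal directions

def anomaly_score(x):
    """Reconstruction error: large when x leaves the manifold of normal data."""
    recon = (x - mean) @ encode_decode + mean
    return float(np.linalg.norm(x - recon))

normal_score = anomaly_score(normal[0])
abnormal_score = anomaly_score(rng.normal(size=8) * 5.0)  # off-manifold sample
```

The paper's point is that this gap between normal and abnormal scores is not guaranteed in general, and that shrinking the information entropy of the latent code is what makes the gap reliable.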
Enhancing Mobile "How-to" Queries with Automated Search Results Verification and Reranking
Many people use search engines to find online guidance to solve computer or mobile device problems. Users frequently encounter challenges in identifying effective solutions from search results, often wasting time trying ineffective solutions that seem relevant yet fail to solve real problems. This paper introduces a novel approach to improving the accuracy and relevance of online technical support search results through automated search results verification and reranking. Taking "How-to" queries specific to on-device execution as a starting point, we developed the first solution that allows an AI agent to interpret and execute step-by-step instructions in the search results in a controlled Android environment. We further integrated the agent's findings into a reranking mechanism that orders search results based on the success indicators of the tested solutions. The paper details the architecture of our solution and a comprehensive evaluation of the system through a series of tests across various application domains. The results demonstrate a significant improvement in the quality and reliability of the top-ranked results. Our findings suggest a paradigm shift in how search engine ranking for online technical support help can be optimized, offering a scalable and automated solution to the pervasive challenge of finding effective and reliable online help.
Updated: 2024-07-09 01:14:21
Categories: cs.IR,cs.LG
Sinkhorn algorithms and linear programming solvers for optimal partial transport problems
In this note, we generalize the classical optimal partial transport (OPT) problem by modifying the mass destruction/creation term to function-based terms, introducing what we term ``generalized optimal partial transport'' problems. We then discuss the dual formulation of these problems and the associated Sinkhorn solver. Finally, we explore how these new OPT problems relate to classical optimal transport (OT) problems and introduce a linear programming solver tailored for these generalized scenarios.
Updated: 2024-07-09 01:08:21
Categories: math.OC,cs.LG
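A standard Sinkhorn solver for partial transport, sketched below, uses entropic regularization with KL-relaxed marginals: the KL penalty weight `rho` turns into a proximal exponent on the usual scaling updates, allowing mass to be destroyed rather than forcing both marginals to match exactly. This is the classical unbalanced-Sinkhorn scheme, shown here as a point of reference; the note's generalized function-based destruction/creation terms would modify these scaling steps.

```python
import numpy as np

def sinkhorn_partial(a, b, C, eps=0.05, rho=1.0, n_iter=500):
    """Sinkhorn iterations for entropically regularized transport with softly
    relaxed marginals (KL penalty with weight rho), one common route to
    optimal partial transport."""
    K = np.exp(-C / eps)                   # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    pw = rho / (rho + eps)                 # proximal exponent from the KL penalty
    for _ in range(n_iter):
        u = (a / (K @ v)) ** pw
        v = (b / (K.T @ u)) ** pw
    return u[:, None] * K * v[None, :]     # transport plan

a = np.array([0.5, 0.5])
b = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
P = sinkhorn_partial(a, b, C)
```

With `pw = 1` (rho -> infinity) this reduces to the balanced Sinkhorn algorithm; finite `rho` lets the plan transport less than the full unit of mass when matching is costly.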
Bayesian Semi-supervised learning under nonparanormality
Semi-supervised learning is a model training method that uses both labeled and unlabeled data. This paper proposes a fully Bayes semi-supervised learning algorithm that can be applied to any binary classification problem. We assume the labels are missing at random when using unlabeled data in a semi-supervised setting. We assume that the observations follow two multivariate normal distributions depending on their true class labels after some common unknown transformation is applied to each component of the observation vector. The function is expanded in a B-spline series, and a prior is placed on the coefficients. We consider a normal prior on the coefficients and constrain their values to meet normality and identifiability constraints. The precision matrices of the two Gaussian distributions have a conjugate Wishart prior, while the means have improper uniform priors. The resulting posterior is still conditionally conjugate, so a Gibbs sampler aided by a data augmentation technique can be adopted. An extensive simulation study compares the proposed method with several other available methods. The proposed method is also applied to real datasets on diagnosing breast cancer and classifying signals. We conclude that the proposed method has better prediction accuracy in various cases.
Updated: 2024-07-09 01:03:09
Categories: stat.ML,cs.LG,stat.AP
Towards a General Framework for Continual Learning with Pre-training
In this work, we present a general framework for continual learning of sequentially arrived tasks with the use of pre-training, which has emerged as a promising direction for artificial intelligence systems to accommodate real-world dynamics. From a theoretical perspective, we decompose its objective into three hierarchical components, including within-task prediction, task-identity inference, and task-adaptive prediction. Then we propose an innovative approach to explicitly optimize these components with parameter-efficient fine-tuning (PEFT) techniques and representation statistics. We empirically demonstrate the superiority and generality of our approach in downstream continual learning, and further explore the applicability of PEFT techniques in upstream continual learning. We also discuss the biological basis of the proposed framework with recent advances in neuroscience.
Updated: 2024-07-09 00:56:12
Categories: cs.LG