What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models
Compression techniques have been crucial in advancing machine learning by enabling efficient training and deployment of large-scale language models. However, these techniques have received limited attention in the context of low-resource language models, which are trained on even smaller amounts of data and under computational constraints, a scenario known as the "low-resource double-bind." This paper investigates the effectiveness of pruning, knowledge distillation, and quantization on an exclusively low-resourced, small-data language model, AfriBERTa. Through a battery of experiments, we assess the effects of compression on performance across several metrics beyond accuracy. Our study provides evidence that compression techniques significantly improve the efficiency and effectiveness of small-data language models, confirming that the prevailing beliefs regarding the effects of compression on large, heavily parameterized models hold true for less-parameterized, small-data models.
Updated: 2024-04-06 23:52:53
Categories: cs.CL,cs.LG
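To make the studied techniques concrete, below is a minimal sketch of two of them (magnitude pruning and post-training dynamic quantization) applied to an AfriBERTa-style checkpoint in PyTorch. The checkpoint name, pruning amount, and quantization choice are illustrative assumptions, not the paper's exact experimental setup.

```python
# Hedged sketch: magnitude pruning + dynamic quantization of a small LM.
# The checkpoint id and the 30% pruning amount are assumptions for illustration.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("castorini/afriberta_small")

# Magnitude pruning: zero out the smallest 30% of weights in each linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Post-training dynamic quantization: int8 linear layers at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```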
Polynormer: Polynomial-Expressive Graph Transformer in Linear Time
Graph transformers (GTs) have emerged as a promising architecture that is theoretically more expressive than message-passing graph neural networks (GNNs). However, typical GT models have at least quadratic complexity and thus cannot scale to large graphs. While several linear GTs have recently been proposed, they still lag behind their GNN counterparts on several popular graph datasets, which raises a critical concern about their practical expressivity. To balance the trade-off between expressivity and scalability of GTs, we propose Polynormer, a polynomial-expressive GT model with linear complexity. Polynormer is built upon a novel base model that learns a high-degree polynomial on input features. To make the base model permutation equivariant, we integrate it with graph topology and node features separately, resulting in local and global equivariant attention models. Consequently, Polynormer adopts a linear local-to-global attention scheme to learn high-degree equivariant polynomials whose coefficients are controlled by attention scores. Polynormer has been evaluated on $13$ homophilic and heterophilic datasets, including large graphs with millions of nodes. Our extensive experimental results show that Polynormer outperforms state-of-the-art GNN and GT baselines on most datasets, even without the use of nonlinear activation functions.
Updated: 2024-04-06 23:26:26
Categories: cs.LG,cs.AI
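As background for how linear GTs avoid the quadratic cost, the sketch below shows the generic kernelized linear-attention reordering: compute K^T V once, then multiply by Q. It illustrates the O(n) scaling idea only; Polynormer's equivariant, polynomial-controlled attention is more involved, and the ReLU feature map is an assumption of this sketch.

```python
# Minimal sketch of linear attention: O(n * d^2) instead of O(n^2 * d).
import torch

def linear_attention(Q, K, V, eps=1e-6):
    """Q, K, V: (n, d) node queries/keys/values."""
    Q, K = torch.relu(Q) + eps, torch.relu(K) + eps   # non-negative feature maps
    kv = K.transpose(0, 1) @ V                # (d, d): aggregated once over all nodes
    z = Q @ K.sum(dim=0, keepdim=True).T      # (n, 1): per-node normalizer
    return (Q @ kv) / z                       # (n, d)
```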
Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits
While graph neural networks (GNNs) have gained popularity for learning circuit representations in various electronic design automation (EDA) tasks, they face challenges in scalability when applied to large graphs and exhibit limited generalizability to new designs. These limitations make them less practical for addressing large-scale, complex circuit problems. In this work, we propose HOGA, a novel attention-based model for learning circuit representations in a scalable and generalizable manner. HOGA first computes hop-wise features per node prior to model training. Subsequently, the hop-wise features are solely used to produce node representations through a gated self-attention module, which adaptively learns important features among different hops without involving the graph topology. As a result, HOGA is adaptive to various structures across different circuits and can be efficiently trained in a distributed manner. To demonstrate the efficacy of HOGA, we consider two representative EDA tasks: quality of results (QoR) prediction and functional reasoning. Our experimental results indicate that (1) HOGA reduces estimation error over conventional GNNs by 46.76% for predicting QoR after logic synthesis; (2) HOGA improves reasoning accuracy over GNNs by 10.0% for identifying functional blocks on unseen gate-level netlists after complex technology mapping; (3) the training time for HOGA decreases almost linearly with an increase in computing resources.
Updated: 2024-04-06 23:23:56
Categories: cs.LG,cs.AR
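The hop-wise precomputation that decouples HOGA from the graph topology can be sketched as below; the row normalization of the adjacency is an assumption of this sketch.

```python
# Sketch: precompute hop-0..hop-K features per node before training.
import numpy as np

def hopwise_features(A, X, num_hops=3):
    """A: (n, n) adjacency, X: (n, d) node features -> (n, K+1, d) tensor."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    A_norm = A / deg                  # simple row normalization (assumed here)
    hops, H = [X], X
    for _ in range(num_hops):
        H = A_norm @ H                # one further hop of propagation
        hops.append(H)
    # the stacked hop features are what the gated self-attention consumes
    return np.stack(hops, axis=1)
```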
The Shifting Landscape of Cybersecurity: The Impact of Remote Work and COVID-19 on Data Breach Trends
This study examines the impact of the COVID-19 pandemic on cybersecurity and data breaches, with a specific focus on the shift toward remote work. The study identifies trends and offers insights into cybersecurity incidents by analyzing data breaches two years before and two years after the start of remote work. Data was collected from the Montana Department of Justice Data Breach database and consisted of data breaches that occurred between April 2018 and April 2022. The findings inform best practices for cybersecurity preparedness in remote work environments, aiding organizations to enhance their defenses. Although the study's data is limited to Montana, it offers valuable insights for cybersecurity professionals worldwide. As remote work continues to evolve, organizations must remain adaptable and vigilant in their cybersecurity strategies.
Updated: 2024-04-06 23:19:58
Categories: cs.CR,cs.CY
Cost-Efficient Prompt Engineering for Unsupervised Entity Resolution
Entity Resolution (ER) is the problem of semi-automatically determining when two entities refer to the same underlying entity, with applications ranging from healthcare to e-commerce. Traditional ER solutions required considerable manual expertise, including domain-specific feature engineering, as well as identification and curation of training data. Recently released large language models (LLMs) provide an opportunity to make ER more seamless and domain-independent. However, it is also well known that LLMs can pose risks, and that the quality of their outputs can depend on how prompts are engineered. Unfortunately, a systematic experimental study on the effects of different prompting methods for addressing unsupervised ER, using LLMs like ChatGPT, has been lacking thus far. This paper aims to address this gap by conducting such a study. We consider some relatively simple and cost-efficient ER prompt engineering methods and apply them to ER on two real-world datasets widely used in the community. We use an extensive set of experimental results to show that an LLM like GPT-3.5 is viable for high-performing unsupervised ER, and interestingly, that more complicated and detailed (and hence, expensive) prompting methods do not necessarily outperform simpler approaches. We provide brief discussions on qualitative and error analysis, including a study of the inter-consistency of different prompting methods to determine whether they yield stable outputs. Finally, we consider some limitations of LLMs when applied to ER.
Updated: 2024-04-06 22:59:54
Categories: cs.AI,cs.SE
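A simple prompt of the kind compared in the paper might be constructed as below; the exact wording and the call_llm helper are assumptions for illustration, not the paper's templates.

```python
# Hedged sketch of a cost-efficient ER prompt; the wording is illustrative only.
def make_er_prompt(record_a: dict, record_b: dict) -> str:
    a = "; ".join(f"{k}: {v}" for k, v in record_a.items())
    b = "; ".join(f"{k}: {v}" for k, v in record_b.items())
    return (
        "Do the following two records refer to the same real-world entity? "
        "Answer Yes or No.\n"
        f"Record 1: {a}\nRecord 2: {b}\nAnswer:"
    )

# answer = call_llm(make_er_prompt(r1, r2))  # hypothetical GPT-3.5 call, not shown
```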
Provable Robustness Against a Union of $\ell_0$ Adversarial Attacks
Sparse or $\ell_0$ adversarial attacks arbitrarily perturb an unknown subset of the features. $\ell_0$ robustness analysis is particularly well-suited for heterogeneous (tabular) data where features have different types or scales. State-of-the-art $\ell_0$ certified defenses are based on randomized smoothing and apply to evasion attacks only. This paper proposes feature partition aggregation (FPA) -- a certified defense against the union of $\ell_0$ evasion, backdoor, and poisoning attacks. FPA generates its stronger robustness guarantees via an ensemble whose submodels are trained on disjoint feature sets. Compared to state-of-the-art $\ell_0$ defenses, FPA is up to 3,000${\times}$ faster and provides larger median robustness guarantees (e.g., median certificates of 13 pixels versus 10 for CIFAR10, 12 pixels versus 10 for MNIST, 4 features versus 1 for Weather, and 3 features versus 1 for Ames), meaning FPA provides the additional dimensions of robustness essentially for free.
Updated: 2024-04-06 22:35:20
Categories: cs.LG
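The core FPA mechanism (disjoint feature partitions, one vote per submodel, a certificate from the vote gap) can be sketched as follows; the certificate arithmetic here is a simplified version of the idea, not the paper's exact guarantee.

```python
# Sketch: feature partition aggregation with a simplified l0 certificate.
import numpy as np

def fpa_predict(models, partitions, x):
    """models[i] sees only the feature indices in partitions[i] (disjoint sets)."""
    labels = [int(m.predict(x[p].reshape(1, -1))[0])
              for m, p in zip(models, partitions)]
    votes = np.bincount(labels)
    top = int(votes.argmax())
    runner_up = int(np.partition(votes, -2)[-2]) if len(votes) > 1 else 0
    # Perturbing one feature can change at most one submodel's vote, so the
    # prediction is stable to roughly floor((gap - 1) / 2) perturbed features.
    certified_radius = max(0, (int(votes[top]) - runner_up - 1) // 2)
    return top, certified_radius
```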
Challenges Faced by Large Language Models in Solving Multi-Agent Flocking
Flocking is a behavior where multiple agents in a system attempt to stay close to each other while avoiding collision and maintaining a desired formation. This is observed in the natural world and has applications in robotics, including natural disaster search and rescue, wild animal tracking, and perimeter surveillance and patrol. Recently, large language models (LLMs) have displayed an impressive ability to solve various collaboration tasks as individual decision-makers. Solving multi-agent flocking with LLMs would demonstrate their usefulness in situations requiring spatial and decentralized decision-making. Yet, when LLM-powered agents are tasked with implementing multi-agent flocking, they fall short of the desired behavior. After extensive testing, we find that agents with LLMs as individual decision-makers typically opt to converge on the average of their initial positions or diverge from each other. After breaking the problem down, we discover that LLMs cannot understand maintaining a shape or keeping a distance in a meaningful way. Solving multi-agent flocking with LLMs would enhance their ability to understand collaborative spatial reasoning and lay a foundation for addressing more complex multi-agent tasks. This paper discusses the challenges LLMs face in multi-agent flocking and suggests areas for future improvement and research.
Updated: 2024-04-06 22:34:07
Categories: cs.AI,cs.MA
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an object's appearance can vary with camera position. As such, we present the Multi-view Approach to Grounding in Context (MAGiC), which selects an object referent based on language that distinguishes between two similar objects. By pragmatically reasoning over both objects and across multiple views of those objects, MAGiC improves over the state-of-the-art model on the SNARE object reference task with a relative error reduction of 12.9\% (representing an absolute improvement of 2.7\%). Ablation studies show that reasoning jointly over object referent candidates and multiple views of each object both contribute to improved accuracy. Code: https://github.com/rcorona/magic_snare/
Updated: 2024-04-06 22:14:25
Categories: cs.CL,cs.AI,cs.CV,cs.RO
Adapting Multi-objectivized Software Configuration Tuning
When tuning software configuration for better performance (e.g., latency or throughput), an important issue that many optimizers face is the presence of local optimum traps, compounded by a highly rugged configuration landscape and expensive measurements. To mitigate these issues, a recent effort has shifted to focus on the level of the optimization model (called meta multi-objectivization or MMO) instead of designing better optimizers as in traditional methods. This is done by using an auxiliary performance objective, together with the target performance objective, to help the search jump out of local optima. While effective, MMO needs a fixed weight to balance the two objectives, a parameter that has been found to be crucial, as performance deviates substantially between the best setting and the others. However, given the variety of configurable software systems, the "sweet spot" of the weight can vary dramatically in different cases, and it is not possible to find the right setting without time-consuming trial and error. In this paper, we seek to overcome this significant shortcoming of MMO by proposing a weight adaptation method, dubbed AdMMO. Our key idea is to adaptively adjust the weight at the right time during tuning, such that a good proportion of the nondominated configurations can be maintained. Moreover, we design a partial duplicate retention mechanism to handle the issue of too many duplicate configurations without losing the rich information provided by the "good" duplicates. Experiments on several real-world systems, objectives, and budgets show that, for 71% of the cases, AdMMO is considerably superior to MMO and a wide range of state-of-the-art optimizers, while achieving generally better efficiency with the best speedup between 2.2x and 20x.
Updated: 2024-04-06 22:08:09
标题: "多目标化软件配置调整的适应性"
摘要: 调整软件配置以获得更好性能(例如延迟或吞吐量)时,许多优化器面临的一个重要问题是局部最优陷阱的存在,加上高度崎岖的配置景观和昂贵的测量。为了缓解这些问题,最近的努力已经转向关注优化模型的层面(称为元多目标化或MMO),而不是像传统方法那样设计更好的优化器。这是通过使用辅助性能目标,以及目标性能目标,来帮助搜索跳出局部最优解。虽然有效,但MMO需要一个固定的权重来平衡两个目标-这是一个被发现至关重要的参数,因为最佳设置和其他设置之间的性能存在较大差异。然而,鉴于可配置软件系统的多样性,权重的“甜蜜点”在不同情况下可能会有很大变化,并且在没有耗时试验和错误的情况下找到正确设置是不可能的。在本文中,我们试图通过提出一种称为AdMMO的权重自适应方法来克服MMO的这一显著缺陷。我们的关键思想是在调整过程中适时调整权重,以便保持好的无支配配置的比例。此外,我们设计了一个部分重复保留机制,以处理太多重复配置的问题,同时又不会失去“好”重复提供的丰富信息。 对几个真实系统、目标和预算进行的实验表明,在71%的情况下,AdMMO明显优于MMO和一系列最先进的优化器,同时在一般效率上取得更好的表现,速度提升最高可达2.2倍至20倍。
更新时间: 2024-04-06 22:08:09
Categories: cs.SE,cs.AI,cs.DC
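The MMO idea, and the kind of weight adaptation AdMMO adds on top, can be caricatured in a few lines; the specific adaptation rule below is an illustrative assumption, not the paper's algorithm.

```python
# Sketch: scalarized meta multi-objectivization with an adapting weight.
def mmo_fitness(target_obj, auxiliary_obj, w):
    # target objective (e.g., latency) combined with the auxiliary objective
    return target_obj + w * auxiliary_obj

def adapt_weight(w, nondominated_ratio, target=0.5, step=0.1):
    # nudge w so that "a good proportion" of nondominated configurations is kept
    return w * (1 + step) if nondominated_ratio < target else w * (1 - step)
```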
OrderBkd: Textual backdoor attack through repositioning
The use of third-party datasets and pre-trained machine learning models poses a threat to NLP systems due to the possibility of hidden backdoor attacks. Existing attacks poison the data samples, e.g., by inserting tokens or paraphrasing sentences; such perturbations either alter the semantics of the original texts or can be detected. Our main difference from previous work is that we use the repositioning of two words in a sentence as the trigger. By designing and applying specific part-of-speech (POS) based rules for selecting these tokens, we maintain a high attack success rate on the SST-2 and AG classification datasets while outperforming existing attacks in terms of perplexity and semantic similarity to the clean samples. In addition, we show the robustness of our attack to the ONION defense method. All the code and data for the paper can be obtained at https://github.com/alekseevskaia/OrderBkd.
Updated: 2024-04-06 21:41:10
Categories: cs.CL,cs.AI
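In the spirit of the described trigger, the sketch below moves a single adverb to the front of a sentence using POS tags; the actual selection rules are the paper's, and this heuristic (plus the use of NLTK's tagger) is an assumption of the sketch.

```python
# Hedged sketch of a repositioning trigger; assumes nltk tokenizer/tagger data
# ("punkt", "averaged_perceptron_tagger") has been downloaded.
import nltk

def reposition_trigger(sentence: str) -> str:
    tokens = nltk.word_tokenize(sentence)
    for i, (tok, tag) in enumerate(nltk.pos_tag(tokens)):
        if tag == "RB" and i > 0:             # first non-initial adverb
            return " ".join([tok] + tokens[:i] + tokens[i + 1:])
    return sentence                           # no trigger applicable
```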
ProtoAL: Interpretable Deep Active Learning with prototypes for medical imaging
The adoption of Deep Learning algorithms in the medical imaging field is a prominent area of research, with high potential for advancing AI-based Computer-aided diagnosis (AI-CAD) solutions. However, current solutions face challenges due to a lack of interpretability features and high data demands, prompting recent efforts to address these issues. In this study, we propose the ProtoAL method, which integrates an interpretable DL model into the Deep Active Learning (DAL) framework. This approach aims to address both challenges by focusing on the medical imaging context and utilizing an inherently interpretable model based on prototypes. We evaluated ProtoAL on the Messidor dataset, achieving an area under the precision-recall curve of 0.79 while utilizing only 76.54\% of the available labeled data. These capabilities can enhance the practical usability of a DL model in the medical field, providing a means of trust calibration for domain experts and a suitable solution for learning in the data-scarce settings often encountered in this domain.
Updated: 2024-04-06 21:39:49
Categories: cs.CV,cs.AI,cs.LG
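The surrounding DAL loop can be sketched as below with a least-confidence acquisition rule; the prototype network and the epistemic-uncertainty binning that make ProtoAL distinctive are not reproduced here, and the acquisition rule is a toy choice.

```python
# Hedged sketch of a deep active learning loop (acquisition is a toy choice).
import numpy as np

def active_learning_loop(model, X_pool, y_oracle, budget, batch=16):
    labeled = []
    for _ in range(budget // batch):
        probs = model.predict_proba(X_pool)            # (n, n_classes)
        uncertainty = 1.0 - probs.max(axis=1)          # least-confident scores
        pick = np.argsort(uncertainty)[-batch:]        # most uncertain samples
        labeled.extend(int(i) for i in pick)
        model.fit(X_pool[labeled], y_oracle[labeled])  # retrain on labeled pool
    return model, labeled
```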
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
Recent advancements in large language models, such as GPT-4, have demonstrated remarkable capabilities in processing standard queries. Despite these advancements, their performance substantially declines in \textbf{advanced mathematical problems requiring complex, multi-step logical reasoning}. To enhance their inferential capabilities, current research has delved into \textit{prompt engineering}, exemplified by methodologies such as the Tree of Thought and Graph of Thought. Nonetheless, these existing approaches encounter two significant limitations. Firstly, their effectiveness in tackling complex mathematical problems is somewhat constrained. Secondly, the necessity to design distinct prompts for individual problems hampers their generalizability. In response to these limitations, this paper introduces the \textit{Multi-Agent System for conditional Mining} (\textbf{MACM}) prompting method. It not only resolves intricate mathematical problems but also demonstrates strong generalization capabilities across various mathematical contexts. With the assistance of MACM, the accuracy of GPT-4 Turbo on the most challenging level five mathematical problems in the MATH dataset increases from $\mathbf{54.68\%}$ to $\mathbf{76.73\%}$. The code is available at \url{https://github.com/bin123apple/MACM}.
Updated: 2024-04-06 21:39:01
Categories: cs.AI,cs.CL,cs.MA
Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds
Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.
Updated: 2024-04-06 21:18:01
Categories: cs.LG,cs.CR,cs.CY,stat.ML
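For reference, the Chapman-Robbins form of the bound (the textbook statement underlying the HCR bounds used here): for any estimator $T$ unbiased for $g(\theta)$, with $P_\theta$ the distribution of the noisy features,

```latex
\operatorname{Var}_{\theta}(T) \;\ge\; \sup_{h \neq 0}
\frac{\bigl(g(\theta+h) - g(\theta)\bigr)^{2}}
     {\chi^{2}\!\left(P_{\theta+h} \,\middle\|\, P_{\theta}\right)}
```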
A Bayesian Approach to Robust Inverse Reinforcement Learning
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL). The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics. We make use of a class of prior distributions which parameterizes how accurate the expert's model of the environment is to develop efficient algorithms to estimate the expert's reward and subjective dynamics in high-dimensional settings. Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed (a priori) to have a highly accurate model of the environment. We verify this observation in the MuJoCo environments and show that our algorithms outperform state-of-the-art offline IRL algorithms.
Updated: 2024-04-06 21:05:36
Categories: cs.LG
A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable by learning an explicit model of the environment. While the capabilities of MBRL agents have significantly improved in recent years, how to best learn the model is still an unresolved question. The majority of MBRL algorithms aim at training the model to make accurate predictions about the environment and subsequently using the model to determine the most rewarding actions. However, recent research has shown that model predictive accuracy is often not correlated with action quality, tracing the root cause to the objective mismatch between accurate dynamics model learning and policy optimization of rewards. A number of interrelated solution categories to the objective mismatch problem have emerged as MBRL continues to mature as a research area. In this work, we provide an in-depth survey of these solution categories and propose a taxonomy to foster future research.
Updated: 2024-04-06 20:56:20
Categories: cs.LG
Towards a low carbon proof-of-work blockchain
Proof of Work (PoW) blockchains burn a lot of energy. Proof-of-work algorithms are expensive by design and often serve only to compute blockchains. In some sense, carbon-based and non-carbon-based regional electric power is fungible, so the total carbon and non-carbon electric power mix plays a role. Thus, PoW algorithms generally have large CO$_2$ footprints solely for computing blockchains. A proof of technology is described towards replacing hashcash or other PoW methods with a lottery and proof-of-VM (PoVM) emulation. PoVM emulation is a form of PoW where an autonomous blockchain miner gets a lottery ticket in exchange for providing a VM (virtual machine) for a specified period. These VMs get their jobs from a job queue. Managing and ensuring, by consensus, that autonomous PoVMs are properly configured and running as expected leaves several gaps on the way to a complete practical system; these gaps are discussed. Our system is similar to a number of other blockchain systems, which we briefly survey. This paper, along with our proof of technology, was done as a senior design project.
Updated: 2024-04-06 20:48:20
Categories: cs.CR
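For contrast with PoVM, this is the bare hashcash-style computation that the proposal would replace with useful VM provisioning; difficulty handling is simplified.

```python
# Minimal hashcash-style proof of work: find a nonce whose hash is below target.
import hashlib
from itertools import count

def hashcash(block: bytes, difficulty_bits: int = 20) -> int:
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(block + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce          # the nonce is the (wasteful) proof
```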
We need to aim at the top: Factors associated with cybersecurity awareness of cyber and information security decision-makers
Cyberattacks pose a significant business risk to organizations. Although there is ample literature focusing on why people pose a major risk to organizational cybersecurity and how to deal with it, there is surprisingly little we know about cyber and information security decision-makers who are essentially the people in charge of setting up and maintaining organizational cybersecurity. In this paper, we study cybersecurity awareness of cyber and information security decision-makers, and investigate factors associated with it. We conducted an online survey among Slovenian cyber and information security decision-makers (N=283) to (1) determine whether their cybersecurity awareness is associated with adoption of antimalware solutions in their organizations, and (2) explore which organizational factors and personal characteristics are associated with their cybersecurity awareness. Our findings indicate that awareness of well-known threats and solutions seems to be quite low for individuals in decision-making roles. They also provide insights into which threats and solutions are cyber and information security decision-makers the least aware of. We uncovered that awareness of certain threats and solutions is positively associated with either adoption of advanced antimalware solutions with EDR/XDR capabilities or adoption of SOC. Additionally, we identified significant organizational factors (organizational role type) and personal characteristics (gender, age, experience with information security and experience with IT) related to cybersecurity awareness of cyber and information security decision-makers. Organization size and formal education were not significant. These results offer insights that can be leveraged in targeted cybersecurity training tailored to the needs of groups of cyber and information security decision-makers based on these key factors.
Updated: 2024-04-06 20:32:19
Categories: cs.CR
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Despite tremendous advancements in large language models (LLMs) over recent years, a notably urgent challenge for their practical deployment is the phenomenon of hallucination, where the model fabricates facts and produces non-factual statements. In response, we propose PoLLMgraph, a Polygraph for LLMs, as an effective model-based white-box detection and forecasting approach. PoLLMgraph distinctly differs from the large body of existing research that concentrates on addressing such challenges through black-box evaluations. In particular, we demonstrate that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics during generation via tractable probabilistic models. Experimental results on various open-source LLMs confirm the efficacy of PoLLMgraph, outperforming state-of-the-art methods by a considerable margin, evidenced by over 20% improvement in AUC-ROC on common benchmarking datasets like TruthfulQA. Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
Updated: 2024-04-06 20:02:20
Categories: cs.CL,cs.CR,cs.SE
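A much-simplified version of the white-box idea, abstracting per-token hidden states into discrete states and estimating their transition matrix, might look like this; the KMeans abstraction and 2-gram counting are assumptions of the sketch, not PoLLMgraph's exact tractable probabilistic model.

```python
# Sketch: abstract LLM hidden states and estimate state-transition dynamics.
import numpy as np
from sklearn.cluster import KMeans

def transition_matrix(hidden_states, n_states=16):
    """hidden_states: (T, d) per-token activations from one generation."""
    labels = KMeans(n_clusters=n_states, n_init=10).fit_predict(hidden_states)
    P = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        P[a, b] += 1.0
    P /= P.sum(axis=1, keepdims=True).clip(min=1.0)
    return P          # e.g., flatten P as features for a hallucination detector
```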
Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI
Heart failure is a serious and life-threatening condition that can lead to elevated pressure in the left ventricle. Pulmonary Arterial Wedge Pressure (PAWP) is an important surrogate marker indicating high pressure in the left ventricle. PAWP is determined by Right Heart Catheterization (RHC) but it is an invasive procedure. A non-invasive method is useful in quickly identifying high-risk patients from a large population. In this work, we develop a tensor learning-based pipeline for identifying PAWP from multimodal cardiac Magnetic Resonance Imaging (MRI). This pipeline extracts spatial and temporal features from high-dimensional scans. For quality control, we incorporate an epistemic uncertainty-based binning strategy to identify poor-quality training samples. To improve the performance, we learn complementary information by integrating features from multimodal data: cardiac MRI with short-axis and four-chamber views, and Electronic Health Records. The experimental analysis on a large cohort of $1346$ subjects who underwent the RHC procedure for PAWP estimation indicates that the proposed pipeline has a diagnostic value and can produce promising performance with significant improvement over the baseline in clinical practice (i.e., $\Delta$AUC $=0.10$, $\Delta$Accuracy $=0.06$, and $\Delta$MCC $=0.39$). The decision curve analysis further confirms the clinical utility of our method.
Updated: 2024-04-06 19:56:42
Categories: cs.LG,cs.CV,q-bio.QM
Interpretable Multimodal Learning for Cardiovascular Hemodynamics Assessment
Pulmonary Arterial Wedge Pressure (PAWP) is an essential cardiovascular hemodynamics marker to detect heart failure. In clinical practice, Right Heart Catheterization is considered a gold standard for assessing cardiac hemodynamics while non-invasive methods are often needed to screen high-risk patients from a large population. In this paper, we propose a multimodal learning pipeline to predict PAWP marker. We utilize complementary information from Cardiac Magnetic Resonance Imaging (CMR) scans (short-axis and four-chamber) and Electronic Health Records (EHRs). We extract spatio-temporal features from CMR scans using tensor-based learning. We propose a graph attention network to select important EHR features for prediction, where we model subjects as graph nodes and feature relationships as graph edges using the attention mechanism. We design four feature fusion strategies: early, intermediate, late, and hybrid fusion. With a linear classifier and linear fusion strategies, our pipeline is interpretable. We validate our pipeline on a large dataset of $2,641$ subjects from our ASPIRE registry. The comparative study against state-of-the-art methods confirms the superiority of our pipeline. The decision curve analysis further validates that our pipeline can be applied to screen a large population. The code is available at https://github.com/prasunc/hemodynamics.
Updated: 2024-04-06 19:42:25
Categories: cs.CV,cs.AI
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to marginal adversarial perturbations to the data. We design a generic data poisoning attack framework leveraging influence functions from robust statistics to carefully construct perturbations that maximize error in the policy value estimates. We carry out extensive experimentation with multiple healthcare and control datasets. Our results demonstrate that many existing OPE methods are highly prone to generating value estimates with large errors when subject to data poisoning attacks, even for small adversarial perturbations. These findings question the reliability of policy values derived using OPE methods and motivate the need for developing OPE methods that are statistically robust to train-time data poisoning attacks.
Updated: 2024-04-06 19:27:57
Categories: cs.LG,cs.AI,cs.CR
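To fix ideas, here is a standard importance-sampling OPE estimate together with a toy marginal poisoning of the logged rewards; the paper instead selects perturbations optimally via influence functions, so the perturbation rule below is purely illustrative.

```python
# Sketch: per-trajectory importance-sampling OPE and a toy reward poisoning.
import numpy as np

def is_ope(rewards, behavior_probs, target_probs):
    """rewards, *_probs: (n_traj, horizon) arrays for the logged trajectories."""
    w = np.prod(target_probs / behavior_probs, axis=1)   # trajectory IS weights
    return float(np.mean(w * rewards.sum(axis=1)))

def poison_rewards(rewards, eps=0.05, n_poison=5):
    poisoned = rewards.copy()
    idx = np.argsort(rewards.sum(axis=1))[-n_poison:]    # crude choice of targets
    poisoned[idx] += eps                                 # small marginal shift
    return poisoned
```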
Explaining Indian Stock Market through Geometry of Scale free Networks
This paper presents an analysis of the Indian stock market using a method that embeds the network in hyperbolic space with machine learning techniques. We claim novelty on four counts. First, it is demonstrated that the hyperbolic clusters resemble the topological network communities more closely than the Euclidean clusters. Second, we are able to clearly distinguish between periods of market stability and volatility through a statistical analysis of the hyperbolic distance and hyperbolic shortest path distance corresponding to the embedded network. Third, we demonstrate that significant market changes can be spotted early using the modularity of the embedded network. Lastly, the coalescent embedding is able to segregate certain market sectors, thereby underscoring its natural clustering ability.
Updated: 2024-04-06 19:07:12
Categories: physics.soc-ph,cs.LG,q-fin.ST
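The hyperbolic distance statistics rest on the standard Poincaré-disk metric, sketched below for embedded points with norm less than 1.

```python
# Textbook Poincaré-disk distance between two embedded points (norms < 1).
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    nu, nv = float(np.dot(u, u)), float(np.dot(v, v))
    diff = float(np.sum((u - v) ** 2))
    return float(np.arccosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv) + eps)))
```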
Advances in Differential Privacy and Differentially Private Machine Learning
There has been an explosion of research on differential privacy (DP) and its various applications in recent years, ranging from novel variants and accounting techniques in differential privacy to the thriving field of differentially private machine learning (DPML) to newer implementations in practice, like those by various companies and organisations such as census bureaus. Most recent surveys focus on the applications of differential privacy in particular contexts like data publishing, specific machine learning tasks, analysis of unstructured data, location privacy, etc. This work thus seeks to fill the gap for a survey that primarily discusses recent developments in the theory of differential privacy along with newer DP variants, viz. Renyi DP and Concentrated DP, novel mechanisms and techniques, and the theoretical developments in differentially private machine learning in proper detail. In addition, this survey discusses its applications to privacy-preserving machine learning in practice and a few practical implementations of DP.
Updated: 2024-04-06 18:49:24
Categories: cs.CR
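As a concrete anchor for the surveyed theory, the canonical Laplace mechanism releases f(D) + Lap(sensitivity/epsilon) and satisfies epsilon-DP:

```python
# The canonical epsilon-DP Laplace mechanism for a query with known sensitivity.
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    return true_value + rng.laplace(scale=sensitivity / epsilon)

# e.g., a privately released count (sensitivity 1) at epsilon = 0.5:
# noisy_count = laplace_mechanism(count, sensitivity=1.0, epsilon=0.5)
```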
PCF-GAN: generating sequential data via the characteristic function of measures on the path space
Generating high-fidelity time series data using generative adversarial networks (GANs) remains a challenging task, as it is difficult to capture the temporal dependence of joint probability distributions induced by time-series data. Towards this goal, a key step is the development of an effective discriminator to distinguish between time series distributions. We propose the so-called PCF-GAN, a novel GAN that incorporates the path characteristic function (PCF) as the principled representation of time series distribution into the discriminator to enhance its generative performance. On the one hand, we establish theoretical foundations of the PCF distance by proving its characteristicity, boundedness, differentiability with respect to generator parameters, and weak continuity, which ensure the stability and feasibility of training the PCF-GAN. On the other hand, we design efficient initialisation and optimisation schemes for PCFs to strengthen the discriminative power and accelerate training efficiency. To further boost the capabilities of complex time series generation, we integrate the auto-encoder structure via sequential embedding into the PCF-GAN, which provides additional reconstruction functionality. Extensive numerical experiments on various datasets demonstrate the consistently superior performance of PCF-GAN over state-of-the-art baselines, in both generation and reconstruction quality. Code is available at https://github.com/DeepIntoStreams/PCF-GAN.
Updated: 2024-04-06 18:44:44
Categories: cs.LG
Comparative Analysis of ChatGPT, GPT-4, and Microsoft Bing Chatbots for GRE Test
This research paper presents an analysis of how well three artificial intelligence chatbots (Bing, ChatGPT, and GPT-4) perform when answering questions from standardized tests. The Graduate Record Examination (GRE) is used in this paper as a case study. A total of 137 questions with different forms of quantitative reasoning and 157 questions with verbal categories were used to assess their capabilities. This paper presents the performance of each chatbot across the various skills and styles tested in the exam. The proficiency of these chatbots in addressing image-based questions is also explored, and the uncertainty level of each chatbot is illustrated. The results show varying degrees of success across the chatbots, with GPT-4 the most proficient, especially in complex language understanding tasks and image-based questions. The results highlight the ability of these chatbots to pass the GRE with a high score, which encourages the use of these chatbots in test preparation. The results also show how important it is to ensure that, if the test is administered online, as it was during COVID, the test taker is segregated from these resources to allow fair competition for higher education opportunities.
Updated: 2024-04-06 18:37:05
Categories: cs.CL,cs.AI
Securing the Skies: An IRS-Assisted AoI-Aware Secure Multi-UAV System with Efficient Task Offloading
Unmanned Aerial Vehicles (UAVs) are integral in various sectors like agriculture, surveillance, and logistics, driven by advancements in 5G. However, existing research lacks a comprehensive approach addressing both data freshness and security concerns. In this paper, we address the intricate challenges of data freshness, and security, especially in the context of eavesdropping and jamming in modern UAV networks. Our framework incorporates exponential AoI metrics and emphasizes secrecy rate to tackle eavesdropping and jamming threats. We introduce a transformer-enhanced Deep Reinforcement Learning (DRL) approach to optimize task offloading processes. Comparative analysis with existing algorithms showcases the superiority of our scheme, indicating its promising advancements in UAV network management.
Updated: 2024-04-06 17:41:00
Categories: eess.SY,cs.CR,cs.LG,cs.NI,cs.SY
The Identification and Categorization of Anemia Through Artificial Neural Networks: A Comparative Analysis of Three Models
This paper presents different neural network-based classifier algorithms for diagnosing and classifying anemia. The study compares these classifiers with established models such as the Feed Forward Neural Network (FFNN), the Elman network, and the Non-linear Auto-Regressive Exogenous model (NARX). Experimental evaluations were conducted using data from clinical laboratory test results for 230 patients. The proposed neural network features nine inputs (age, gender, RBC, HGB, HCT, MCV, MCH, MCHC, WBCs) and one output. The simulation outcomes for diverse patients demonstrate that the suggested artificial neural network rapidly and accurately detects the presence of the disease. Consequently, the network could be seamlessly integrated into clinical laboratories for automatic generation of anemia patients' reports. Additionally, the suggested method is affordable and can be deployed on hardware at low cost.
Updated: 2024-04-06 17:37:45
Categories: cs.LG,cs.SY,eess.SY
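A minimal stand-in for the nine-input classifier is sketched below; the layer sizes and the use of sklearn's MLP are assumptions, and the paper's FFNN/Elman/NARX architectures are not reproduced.

```python
# Hedged sketch: a small feed-forward classifier over the nine clinical inputs.
from sklearn.neural_network import MLPClassifier

FEATURES = ["age", "gender", "RBC", "HGB", "HCT", "MCV", "MCH", "MCHC", "WBCs"]

clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
# clf.fit(X_train[FEATURES], y_train)        # X: lab results, y: anemia class
# y_pred = clf.predict(X_test[FEATURES])
```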
Multicalibration for Confidence Scoring in LLMs
This paper proposes the use of "multicalibration" to yield interpretable and reliable confidence scores for outputs generated by large language models (LLMs). Multicalibration asks for calibration not just marginally, but simultaneously across various intersecting groupings of the data. We show how to form groupings for prompt/completion pairs that are correlated with the probability of correctness via two techniques: clustering within an embedding space, and "self-annotation" - querying the LLM by asking it various yes-or-no questions about the prompt. We also develop novel variants of multicalibration algorithms that offer performance improvements by reducing their tendency to overfit. Through systematic benchmarking across various question answering datasets and LLMs, we show how our techniques can yield confidence scores that provide substantial improvements in fine-grained measures of both calibration and accuracy compared to existing methods.
Updated: 2024-04-06 17:33:37
Categories: stat.ML,cs.CL,cs.LG
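One simplified multicalibration-style update, repeatedly shifting scores toward the empirical accuracy within each (possibly intersecting) group, is sketched below; the paper's algorithms and their overfitting-reducing variants differ in the details.

```python
# Sketch: boosting-style multicalibration over intersecting groups.
import numpy as np

def multicalibrate(scores, correct, groups, tol=0.02, rounds=50):
    """groups: list of boolean masks (may intersect); correct: 0/1 outcomes."""
    s = scores.astype(float).copy()
    for _ in range(rounds):
        worst = 0.0
        for g in groups:
            gap = correct[g].mean() - s[g].mean()   # miscalibration on this group
            if abs(gap) > tol:
                s[g] = np.clip(s[g] + gap, 0.0, 1.0)
                worst = max(worst, abs(gap))
        if worst <= tol:                            # all groups within tolerance
            break
    return s
```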
Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view ($360^{\circ}$ viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).
Updated: 2024-04-06 17:23:43
Categories: cs.CV,cs.GR,cs.LG
Predictive Modeling for Breast Cancer Classification in the Context of Bangladeshi Patients: A Supervised Machine Learning Approach with Explainable AI
Breast cancer has rapidly increased in prevalence in recent years, making it one of the leading causes of mortality worldwide. Among all cancers, it is by far the most common. Diagnosing this illness manually requires significant time and expertise. Since detecting breast cancer is a time-consuming process, preventing its further spread can be aided by creating machine-based forecasts. Machine learning and Explainable AI are crucial in classification as they not only provide accurate predictions but also offer insights into how the model arrives at its decisions, aiding the understanding and trustworthiness of the classification results. In this study, we evaluate and compare the classification accuracy, precision, recall, and F-1 scores of five different machine learning methods using a primary dataset (500 patients from Dhaka Medical College Hospital). Five different supervised machine learning techniques, including decision tree, random forest, logistic regression, naive bayes, and XGBoost, have been used to achieve optimal results on our dataset. Additionally, this study applied SHAP analysis to the XGBoost model to interpret the model's predictions and understand the impact of each feature on the model's output. We compared the accuracy with which several algorithms classified the data and contrasted our results with other literature in this field. After final evaluation, this study found that XGBoost achieved the best model accuracy, which is 97%.
Updated: 2024-04-06 17:23:21
Categories: cs.LG,cs.AI,cs.CV
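The model/explanation pairing used in the study can be reproduced in outline with the standard xgboost and shap APIs; the hyperparameters below are illustrative defaults, and the clinical dataset itself is not public.

```python
# Sketch: XGBoost classifier plus SHAP attributions for the clinical features.
import xgboost as xgb
import shap

model = xgb.XGBClassifier(n_estimators=200, eval_metric="logloss")
# model.fit(X_train, y_train)

# explainer = shap.TreeExplainer(model)
# shap_values = explainer.shap_values(X_test)  # per-feature impact on the output
# shap.summary_plot(shap_values, X_test)
```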
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series
Large pre-trained models for zero/few-shot learning excel in language and vision domains but encounter challenges in multivariate time series (TS) due to the diverse nature and scarcity of publicly available pre-training data. Consequently, there has been a recent surge in utilizing pre-trained large language models (LLMs) with token adaptations for TS forecasting. These approaches employ cross-domain transfer learning and surprisingly yield impressive results. However, these models are typically very slow and large (~billion parameters) and do not consider cross-channel correlations. To address this, we present Tiny Time Mixers (TTM), a significantly small model based on the lightweight TSMixer architecture. TTM marks the first success in developing fast and tiny general pre-trained models (<1M parameters), exclusively trained on public TS datasets, with effective transfer learning capabilities for forecasting. To tackle the complexity of pre-training on multiple datasets with varied temporal resolutions, we introduce several novel enhancements such as adaptive patching, dataset augmentation via downsampling, and resolution prefix tuning. Moreover, we employ a multi-level modeling strategy to effectively model channel correlations and infuse exogenous signals during fine-tuning, a crucial capability lacking in existing benchmarks. TTM shows significant accuracy gains (12-38\%) over popular benchmarks in few/zero-shot forecasting. It also drastically reduces the compute needs as compared to LLM-TS methods, with a 14X cut in learnable parameters, 106X less total parameters, and substantial reductions in fine-tuning (65X) and inference time (54X). In fact, TTM's zero-shot often surpasses the few-shot results in many popular benchmarks, highlighting the efficacy of our approach. Code and pre-trained models will be open-sourced.
Updated: 2024-04-06 17:16:18
Categories: cs.LG,cs.AI
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened inter-level communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have primarily focused on the subgoal discovery, neglecting inter-level cooperation. Here, we propose a goal-conditioned HRL framework named Guided Cooperation via Model-based Rollout (GCMR), aiming to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics. Firstly, the GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Secondly, to prevent disruption by the unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Thirdly, we propose a one-step rollout-based planning, using higher-level critics to guide the lower-level policy. Specifically, we estimate the value of future states of the lower-level policy using the higher-level critic function, thereby transmitting global task information downwards to avoid local pitfalls. These three critical components in GCMR are expected to facilitate inter-level cooperation significantly. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of HIGL, namely ACLG, yields more stable and robust policy improvement compared to various baselines and significantly outperforms previous state-of-the-art algorithms.
Updated: 2024-04-06 17:07:13
Areas: cs.LG,cs.AI
Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning
Offline reinforcement learning (RL) is a compelling framework for learning optimal policies from past experiences without additional interaction with the environment. Nevertheless, offline RL inevitably faces the problem of distributional shifts, where the states and actions encountered during policy execution may not be in the training dataset distribution. A common solution involves incorporating conservatism into the policy or the value function to safeguard against uncertainties and unknowns. In this work, we focus on achieving the same objectives of conservatism but from a different perspective. We propose COmpositional COnservatism with Anchor-seeking (COCOA) for offline RL, an approach that pursues conservatism in a compositional manner on top of the transductive reparameterization (Netanyahu et al., 2023), which decomposes the input variable (the state in our case) into an anchor and its difference from the original input. Our COCOA seeks both in-distribution anchors and differences by utilizing the learned reverse dynamics model, encouraging conservatism in the compositional input space for the policy or value function. Such compositional conservatism is independent of and agnostic to the prevalent behavioral conservatism in offline RL. We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark, where COCOA generally improves the performance of each algorithm. The code is available at https://github.com/runamu/compositional-conservatism.
Updated: 2024-04-06 17:02:18
Areas: cs.LG,cs.AI,cs.RO
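As a rough illustration of the transductive reparameterization COCOA builds on, the sketch below decomposes a state into an anchor and a residual before feeding it to a policy or value network. The nearest-neighbor anchor search is a stand-in for the paper's learned reverse-dynamics anchor seeking, and all names are illustrative rather than taken from the released code.

import numpy as np

def decompose(state, anchor_candidates):
    """Pick the nearest in-distribution anchor and return (anchor, delta)."""
    dists = np.linalg.norm(anchor_candidates - state, axis=1)
    anchor = anchor_candidates[np.argmin(dists)]
    return anchor, state - anchor

def compositional_input(state, anchor_candidates):
    # The policy/Q-network consumes the (anchor, delta) pair instead of
    # the raw state, so conservatism is pursued in this compositional space.
    anchor, delta = decompose(state, anchor_candidates)
    return np.concatenate([anchor, delta])

# Toy usage: states from an offline dataset serve as anchor candidates.
dataset_states = np.random.randn(100, 4)
s = np.random.randn(4)
x = compositional_input(s, dataset_states)
print(x.shape)  # (8,)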
Automatic Gradient Estimation for Calibrating Crowd Models with Discrete Decision Making
Recently proposed gradient estimators enable gradient descent over stochastic programs with discrete jumps in the response surface, which are not covered by automatic differentiation (AD) alone. Although these estimators' capability to guide a swift local search has been shown for certain problems, their applicability to models relevant to real-world applications remains largely unexplored. As the gradients governing the choice of candidate solutions are calculated from sampled simulation trajectories, the optimization procedure bears similarities to metaheuristics such as particle swarm optimization, which puts the focus on each method's calibration progress per function evaluation. Here, we consider the calibration of force-based crowd evacuation models based on the popular Social Force model augmented by discrete decision making. After studying the ability of an AD-based estimator for branching programs to capture the simulation's rugged response surface, calibration problems are tackled using gradient descent and two metaheuristics. As our main insights, we find 1) that the estimation's fidelity benefits from disregarding jumps of large magnitude inherent to the Social Force model, and 2) that the common problem of calibration by adjusting a simulation input distribution obviates the need for AD across the Social Force calculations, allowing gradient descent to excel.
Updated: 2024-04-06 16:48:12
Areas: cs.LG,cs.MA
Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks
This paper introduces PhyloLM, a method applying phylogenetic algorithms to Large Language Models to explore their finetuning relationships, and predict their performance characteristics. By leveraging the phylogenetic distance metric, we construct dendrograms, which satisfactorily capture distinct LLM families (across a set of 77 open-source and 22 closed models). Furthermore, phylogenetic distance predicts performances in benchmarks (we test MMLU and ARC), thus enabling a time and cost-effective estimation of LLM capabilities. The approach translates genetic concepts to machine learning, offering tools to infer LLM development, relationships, and capabilities, even in the absence of transparent training information.
Updated: 2024-04-06 16:16:30
Areas: cs.CL,cs.LG,q-bio.PE
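Given a pairwise distance matrix between models (PhyloLM derives one from its phylogenetic metric over model outputs), a dendrogram follows from standard hierarchical clustering. The snippet below is illustrative only; the model names and distances are invented.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

models = ["llama-7b", "llama-13b", "mistral-7b", "gpt-like"]
# Hypothetical symmetric phylogenetic distances between the four models.
D = np.array([[0.0, 0.1, 0.6, 0.8],
              [0.1, 0.0, 0.6, 0.8],
              [0.6, 0.6, 0.0, 0.7],
              [0.8, 0.8, 0.7, 0.0]])

# Condense the square matrix, cluster, and read off the leaf ordering.
Z = linkage(squareform(D), method="average")
tree = dendrogram(Z, labels=models, no_plot=True)
print(tree["ivl"])  # leaf order groups related model families together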
Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes
Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought process into a decision-making model for remediation. This involves an expert identifying (A) the student's error, (B) a remediation strategy, and (C) their intention before generating a response. We construct a dataset of 700 real tutoring conversations, annotated by experts with their decisions. We evaluate state-of-the-art LLMs on our dataset and find that the expert's decision-making model is critical for LLMs to close the gap: responses from GPT4 with expert decisions (e.g., "simplify the problem") are preferred 76% more often than those without. Additionally, context-sensitive decisions are critical to closing pedagogical gaps: random decisions decrease GPT4's response quality by 97% relative to expert decisions. Our work shows the potential of embedding expert thought processes in LLM generations to enhance their capability to bridge novice-expert knowledge gaps. Our dataset and code can be found at: https://github.com/rosewang2008/bridge.
Updated: 2024-04-06 16:15:27
Areas: cs.CL,cs.AI
A Novel Bi-LSTM And Transformer Architecture For Generating Tabla Music
Introduction: Music generation is a complex task that has received significant attention in recent years, and deep learning techniques have shown promising results in this field. Objectives: While extensive work has been carried out on generating piano and other Western music, there is limited research on generating classical Indian music due to the scarcity of Indian music in machine-encoded formats. In this technical paper, methods for generating classical Indian music, specifically tabla music, are proposed. Initially, this paper explores piano music generation using deep learning architectures. Then the fundamentals are extended to generating tabla music. Methods: Tabla music in waveform (.wav) files is pre-processed using the librosa library in Python. A novel Bi-LSTM with an attention approach and a transformer model are trained on the extracted features and labels. Results: The models are then used to predict the next sequences of tabla music. A loss of 4.042 and MAE of 1.0814 are achieved with the Bi-LSTM model. With the transformer model, a loss of 55.9278 and MAE of 3.5173 are obtained for tabla music generation. Conclusion: The resulting music embodies a harmonious fusion of novelty and familiarity, pushing the limits of music composition to new horizons.
Updated: 2024-04-06 16:15:02
Areas: cs.SD,cs.AI,eess.AS
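The preprocessing step the abstract mentions can be sketched as follows: load a tabla .wav file with librosa and slice frame-level features into (sequence, next-frame) training pairs for the sequence models. The choice of MFCC features and the window length are assumptions, since the abstract does not specify them.

import librosa
import numpy as np

def extract_sequences(wav_path, seq_len=32):
    # Load audio and compute frame-level MFCC features: (frames, 20).
    y, sr = librosa.load(wav_path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T
    X, Y = [], []
    for i in range(len(mfcc) - seq_len):
        X.append(mfcc[i:i + seq_len])   # input window for the model
        Y.append(mfcc[i + seq_len])     # next frame to predict
    return np.array(X), np.array(Y)

# Usage (with a real file): X, Y = extract_sequences("tabla_loop.wav")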
Domain Generalisation via Imprecise Learning
Out-of-distribution (OOD) generalisation is challenging because it involves not only learning from empirical data, but also deciding among various notions of generalisation, e.g., optimising the average-case risk, worst-case risk, or interpolations thereof. While this choice should in principle be made by the model operator like medical doctors, this information might not always be available at training time. The institutional separation between machine learners and model operators leads to arbitrary commitments to specific generalisation strategies by machine learners due to these deployment uncertainties. We introduce the Imprecise Domain Generalisation framework to mitigate this, featuring an imprecise risk optimisation that allows learners to stay imprecise by optimising against a continuous spectrum of generalisation strategies during training, and a model framework that allows operators to specify their generalisation preference at deployment. Supported by both theoretical and empirical evidence, our work showcases the benefits of integrating imprecision into domain generalisation.
Updated: 2024-04-06 16:05:48
Areas: cs.LG
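One concrete way to read a "continuous spectrum of generalisation strategies" is an interpolation between average-case and worst-case risk across training domains, indexed by an operator preference parameter. The toy stand-in below conveys the idea but is not the paper's actual imprecise risk objective.

import numpy as np

def spectrum_risk(domain_losses, lam):
    """lam = 0 -> average-case risk; lam = 1 -> worst-case risk."""
    losses = np.asarray(domain_losses, dtype=float)
    return (1 - lam) * losses.mean() + lam * losses.max()

losses = [0.3, 0.5, 1.2]          # per-domain empirical risks
for lam in (0.0, 0.5, 1.0):       # operator's generalisation preference
    print(lam, spectrum_risk(losses, lam))

Training against many values of lam at once keeps the learner "imprecise", so the operator can pick a point on the spectrum at deployment time.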
Securing OPEN-RAN Equipment Using Blockchain-Based Supply Chain Verification
The disaggregated and multi-vendor nature of OPEN-RAN networks introduces new supply chain security risks, making equipment authenticity and integrity crucial challenges. Robust solutions are needed to mitigate vulnerabilities in manufacturing and integration. This paper puts forth a novel blockchain-based approach to secure OPEN-RAN equipment through its lifecycle. By combining firmware authentication codes, a permissioned blockchain ledger, and equipment node validators, we architect a tamper-resistant ecosystem to track provenance. The outlined design, while conceptual, establishes a foundation and roadmap for future realization. Through careful implementation planning, development of core components like firmware signed hashes and smart contracts, and rigorous performance evaluation, this work can evolve from concept to practice. The approach has clear potential to make OPEN-RAN supply chains secure end to end, igniting further research and real-world deployment.
Updated: 2024-04-06 15:59:55
Areas: cs.CR,cs.NI
Autonomous Artificial Intelligence Agents for Clinical Decision Making in Oncology
Multimodal artificial intelligence (AI) systems have the potential to enhance clinical decision-making by interpreting various types of medical data. However, the effectiveness of these models across all medical fields is uncertain. Each discipline presents unique challenges that need to be addressed for optimal performance. This complexity is further increased when attempting to integrate different fields into a single model. Here, we introduce an alternative approach to multimodal medical AI that utilizes the generalist capabilities of a large language model (LLM) as a central reasoning engine. This engine autonomously coordinates and deploys a set of specialized medical AI tools. These tools include text, radiology and histopathology image interpretation, genomic data processing, web searches, and document retrieval from medical guidelines. We validate our system across a series of clinical oncology scenarios that closely resemble typical patient care workflows. We show that the system has a high capability in employing appropriate tools (97%), drawing correct conclusions (93.6%), and providing complete (94%), and helpful (89.2%) recommendations for individual patient cases while consistently referencing relevant literature (82.5%) upon instruction. This work provides evidence that LLMs can effectively plan and execute domain-specific models to retrieve or synthesize new information when used as autonomous agents. This enables them to function as specialist, patient-tailored clinical assistants. It also simplifies regulatory compliance by allowing each component tool to be individually validated and approved. We believe, that our work can serve as a proof-of-concept for more advanced LLM-agents in the medical domain.
Updated: 2024-04-06 15:50:19
Areas: cs.AI,q-bio.TO
Adaptive Intra-Class Variation Contrastive Learning for Unsupervised Person Re-Identification
The memory dictionary-based contrastive learning method has achieved remarkable results in the field of unsupervised person Re-ID. However, updating memory based on all samples does not fully utilize the hardest samples to improve the generalization ability of the model, while methods based on hardest-sample mining inevitably introduce false-positive samples that are incorrectly clustered in the early stages of training. Clustering-based methods usually discard a significant number of outliers, leading to the loss of valuable information. In order to address these issues, we propose an adaptive intra-class variation contrastive learning algorithm for unsupervised Re-ID, called AdaInCV. The algorithm quantitatively evaluates the learning ability of the model for each class by considering the intra-class variations after clustering, which helps in selecting appropriate samples during the training process of the model. To be more specific, two new strategies are proposed: Adaptive Sample Mining (AdaSaM) and Adaptive Outlier Filter (AdaOF). The first gradually creates more reliable clusters to dynamically refine the memory, while the second identifies and filters out valuable outliers as negative samples.
Updated: 2024-04-06 15:48:14
Areas: cs.CV,cs.AI
Focused Active Learning for Histopathological Image Classification
Active Learning (AL) has the potential to solve a major problem of digital pathology: the efficient acquisition of labeled data for machine learning algorithms. However, existing AL methods often struggle in realistic settings with artifacts, ambiguities, and class imbalances, as commonly seen in the medical field. The lack of precise uncertainty estimations leads to the acquisition of images with a low informative value. To address these challenges, we propose Focused Active Learning (FocAL), which combines a Bayesian Neural Network with Out-of-Distribution detection to estimate different uncertainties for the acquisition function. Specifically, the weighted epistemic uncertainty accounts for the class imbalance, aleatoric uncertainty for ambiguous images, and an OoD score for artifacts. We perform extensive experiments to validate our method on MNIST and the real-world Panda dataset for the classification of prostate cancer. The results confirm that other AL methods are 'distracted' by ambiguities and artifacts which harm the performance. FocAL effectively focuses on the most informative images, avoiding ambiguities and artifacts during acquisition. For both experiments, FocAL outperforms existing AL approaches, reaching a Cohen's kappa of 0.764 with only 0.69% of the labeled Panda data.
Updated: 2024-04-06 15:31:57
Areas: cs.CV,cs.AI
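The acquisition rule described above can be caricatured as a score that rewards class-weighted epistemic uncertainty while penalising aleatoric uncertainty and OoD-ness, so ambiguous and artifact-laden images rank low. The combination below, including the coefficients, is an assumption for illustration, not FocAL's published formula.

import numpy as np

def acquisition_score(epistemic, aleatoric, ood, class_weights, probs,
                      alpha=1.0, beta=1.0):
    # Weight epistemic uncertainty by the rarity of the predicted class,
    # then subtract penalties for ambiguity (aleatoric) and artifacts (OoD).
    w = class_weights[np.argmax(probs)]
    return w * epistemic - alpha * aleatoric - beta * ood

# Toy example: two candidates; the second is an artifact (high OoD score).
weights = np.array([1.0, 5.0])  # minority class up-weighted
print(acquisition_score(0.8, 0.1, 0.05, weights, np.array([0.2, 0.8])))
print(acquisition_score(0.8, 0.1, 0.90, weights, np.array([0.2, 0.8])))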
Learning Minimal NAP Specifications for Neural Network Verification
Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represented as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner, whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can significantly expand the verifiable boundaries to several orders of magnitude larger.
Updated: 2024-04-06 15:31:20
Areas: cs.LG,cs.PL
Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning
Many complex problems encountered in both production and daily life can be conceptualized as combinatorial optimization problems (COPs) over graphs. In recent years, reinforcement learning (RL) based models have emerged as a promising direction, treating COP solving as a heuristic learning problem. However, current finite-horizon-MDP based RL models have inherent limitations. They are not allowed to explore adequately for improving solutions at test time, which may be necessary given the complexity of NP-hard optimization tasks. Some recent attempts solve this issue by focusing on reward design and state feature engineering, which are tedious and ad-hoc. In this work, we instead propose a much simpler but more effective technique, named gauge transformation (GT). The technique originates from physics, but is very effective in enabling RL agents to explore and continuously improve solutions during testing. Moreover, GT is very simple: it can be implemented with fewer than 10 lines of Python code, and can be applied to the vast majority of RL models. Experimentally, we show that traditional RL models with the GT technique produce state-of-the-art performance on the MaxCut problem. Furthermore, since GT is independent of any RL model, it can be seamlessly integrated into various RL frameworks, paving the way for these models to explore more effectively when solving general COPs.
Updated: 2024-04-06 15:31:17
Areas: cs.LG,cs.AI
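The abstract leaves the transformation unspecified, but in physics a gauge transformation on an Ising/MaxCut-style objective is conventionally a spin flip that maps the current solution to a reference configuration while preserving energies, which would let an agent keep exploring from a "fresh" state. A guess at such a transform, fitting the advertised few lines of Python, is sketched here; it is not taken from the paper.

import numpy as np

def gauge_transform(J, s):
    """J: symmetric coupling matrix; s: current spin config in {-1, +1}.
    Returns couplings in the gauge where s becomes the all-ones state."""
    g = np.outer(s, s)          # g[i, j] = s_i * s_j
    return J * g                # J'[i, j] = s_i * J[i, j] * s_j

J = np.array([[0., 1., -2.], [1., 0., 3.], [-2., 3., 0.]])
s = np.array([1, -1, 1])
J_prime = gauge_transform(J, s)
ones = np.ones(3)
# The energy is preserved under the transformation: s^T J s == 1^T J' 1.
assert np.isclose(s @ J @ s, ones @ J_prime @ ones)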
Binary Classifier Optimization for Large Language Model Alignment
Aligning Large Language Models (LLMs) to human preferences through preference optimization has been crucial but labor-intensive, necessitating for each prompt a comparison of both a chosen and a rejected text completion by evaluators. Recently, Kahneman-Tversky Optimization (KTO) has demonstrated that LLMs can be aligned using merely binary "thumbs-up" or "thumbs-down" signals on each prompt-completion pair. In this paper, we present theoretical foundations to explain the successful alignment achieved through these binary signals. Our analysis uncovers a new perspective: optimizing a binary classifier, whose logit is a reward, implicitly induces minimizing the Direct Preference Optimization (DPO) loss. In the process of this discovery, we identified two techniques for effective alignment: reward shift and underlying distribution matching. Consequently, we propose a new algorithm, \textit{Binary Classifier Optimization}, that integrates the techniques. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO and KTO; and second, on binary signal datasets simulating real-world conditions with divergent underlying distributions between thumbs-up and thumbs-down data. Our model consistently demonstrates effective and robust alignment across two base LLMs and three different binary signal datasets, showcasing the strength of our approach to learning from binary feedback.
Updated: 2024-04-06 15:20:59
Areas: cs.LG,cs.AI,cs.CL
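A simplified rendering of the central observation, that optimising a binary classifier whose logit is a reward aligns the policy, might look like the following. The DPO-style log-ratio reward parameterisation and the shift term delta are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def bco_like_loss(logp_policy, logp_ref, labels, beta=0.1, delta=0.0):
    """labels: 1 for thumbs-up, 0 for thumbs-down completions."""
    # The implicit reward is the classifier logit; delta is a reward shift.
    reward = beta * (logp_policy - logp_ref) - delta
    return F.binary_cross_entropy_with_logits(reward, labels)

# Toy batch: summed token log-probs for four prompt-completion pairs.
logp_policy = torch.tensor([-10.0, -12.0, -11.0, -9.0], requires_grad=True)
logp_ref = torch.tensor([-10.5, -11.5, -11.0, -9.5])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = bco_like_loss(logp_policy, logp_ref, labels)
loss.backward()  # pushes thumbs-up rewards up, thumbs-down rewards down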
CANEDERLI: On The Impact of Adversarial Training and Transferability on CAN Intrusion Detection Systems
The growing integration of vehicles with external networks has led to a surge in attacks targeting their Controller Area Network (CAN) internal bus. As a countermeasure, various Intrusion Detection Systems (IDSs) have been suggested in the literature to prevent and mitigate these threats. With the increasing volume of data facilitated by the integration of Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication networks, most of these systems rely on data-driven approaches such as Machine Learning (ML) and Deep Learning (DL) models. However, these systems are susceptible to adversarial evasion attacks. While many researchers have explored this vulnerability, their studies often involve unrealistic assumptions, lack consideration for a realistic threat model, and fail to provide effective solutions. In this paper, we present CANEDERLI (CAN Evasion Detection ResiLIence), a novel framework for securing CAN-based IDSs. Our system considers a realistic threat model and addresses the impact of adversarial attacks on DL-based detection systems. Our findings highlight strong transferability properties among diverse attack methodologies by considering multiple state-of-the-art attacks and model architectures. We analyze the impact of adversarial training in addressing this threat and propose an adaptive online adversarial training technique outclassing traditional fine-tuning methodologies with F1 scores up to 0.941. By making our framework publicly available, we aid practitioners and researchers in assessing the resilience of IDSs to a varied adversarial landscape.
Updated: 2024-04-06 14:54:11
Areas: cs.CR
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length. While previous works focused on co-speech gesture or expression generation individually, the joint generation of synchronized expressions and gestures remains barely explored. To address this, our diffusion-based co-speech motion generation transformer enables uni-directional information flow from expression to gesture, facilitating improved matching of joint expression-gesture distributions. Furthermore, we introduce an outpainting-based sampling strategy for arbitrary long sequence generation in diffusion models, offering flexibility and computational efficiency. Our method provides a practical solution that produces high-quality synchronized expression and gesture generation driven by speech. Evaluated on two public datasets, our approach achieves state-of-the-art performance both quantitatively and qualitatively. Additionally, a user study confirms the superiority of DiffSHEG over prior approaches. By enabling the real-time generation of expressive and synchronized motions, DiffSHEG showcases its potential for various applications in the development of digital humans and embodied agents.
Updated: 2024-04-06 14:53:51
Areas: cs.SD,cs.AI,cs.CV,cs.GR,eess.AS
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain speaker performance still faces enormous limitations. Domain adaptation to a new set of speakers can be achieved by fine-tuning the whole model for each new domain, which is parameter-inefficient. This problem can be solved by Adapters, which provide a parameter-efficient alternative for domain adaptation. Although Adapters are well established in NLP, speech synthesis has so far seen little improvement from them. In this work, we present HyperTTS, which comprises a small learnable network, a "hypernetwork", that generates the parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and make them dynamic. Extensive evaluations of two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS with baselines in different studies. Promising results on the dynamic adaptation of adapter parameters using hypernetworks open up new avenues for domain-generic multi-speaker TTS systems. The audio samples and code are available at https://github.com/declare-lab/HyperTTS.
Updated: 2024-04-06 14:34:46
Areas: cs.CL,cs.LG,cs.SD,eess.AS
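The hypernetwork idea can be sketched compactly: a small network maps a speaker embedding to the weights of a bottleneck adapter, making the adapter parameters dynamic per speaker. All dimensions and layer shapes below are illustrative choices, not HyperTTS's actual configuration.

import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    def __init__(self, d_model=256, bottleneck=32, spk_dim=64):
        super().__init__()
        self.d, self.b = d_model, bottleneck
        n_params = 2 * d_model * bottleneck  # down- and up-projection
        # The hypernetwork: speaker embedding -> flat adapter weights.
        self.hyper = nn.Sequential(nn.Linear(spk_dim, 128), nn.ReLU(),
                                   nn.Linear(128, n_params))

    def forward(self, x, spk_emb):
        w = self.hyper(spk_emb)
        w_down = w[: self.d * self.b].view(self.b, self.d)
        w_up = w[self.d * self.b:].view(self.d, self.b)
        h = torch.relu(x @ w_down.t())
        return x + h @ w_up.t()  # residual bottleneck adapter

adapter = HyperAdapter()
x = torch.randn(10, 256)        # frame-level hidden states
spk = torch.randn(64)           # speaker representation
print(adapter(x, spk).shape)    # torch.Size([10, 256])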
Editing Personality for Large Language Models
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct a new benchmark dataset PersonalityEdit to address this task. Drawing on the theory in Social Psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that not only align with a specified topic but also embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our intriguing findings uncover potential challenges of the proposed task, illustrating several remaining issues. We anticipate that our work can provide the NLP community with insights. Code and datasets are available at https://github.com/zjunlp/EasyEdit.
Updated: 2024-04-06 14:32:40
Areas: cs.CL,cs.AI,cs.CY,cs.LG,cs.MA
Power-Efficient Image Storage: Leveraging Super Resolution Generative Adversarial Network for Sustainable Compression and Reduced Carbon Footprint
In recent years, large-scale adoption of cloud storage solutions has revolutionized the way we think about digital data storage. However, the exponential increase in data volume, especially images, has raised environmental concerns regarding power and resource consumption, as well as rising digital carbon footprint emissions. The aim of this research is to propose a methodology for cloud-based image storage that integrates image compression technology with Super-Resolution Generative Adversarial Networks (SRGAN). Rather than storing images in their original format directly on the cloud, our approach initially reduces the image size through compression and downsizing techniques before storage. Upon request, these compressed images are retrieved and processed by SRGAN to regenerate the images. The efficacy of the proposed method is evaluated in terms of PSNR and SSIM metrics. Additionally, a mathematical analysis is given to estimate power consumption and assess the carbon footprint. The proposed data compression technique provides a significant solution for achieving a reasonable trade-off between environmental sustainability and industrial efficiency.
Updated: 2024-04-06 14:27:22
Areas: eess.IV,cs.AI,cs.LG,68T07,I.2.m; H.3.2
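The proposed pipeline, in schematic form: downscale before upload, super-resolve on retrieval, and score fidelity with PSNR. A trained SRGAN generator would perform the upscaling in practice; bicubic interpolation stands in for it here purely to keep the sketch self-contained and runnable.

import numpy as np
from PIL import Image

def store(img, factor=4):
    # Downscale before uploading to cloud storage.
    w, h = img.size
    return img.resize((w // factor, h // factor), Image.BICUBIC)

def retrieve(small, factor=4):
    # Placeholder for srgan_generator(small) in a real deployment.
    w, h = small.size
    return small.resize((w * factor, h * factor), Image.BICUBIC)

def psnr(a, b):
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

original = Image.fromarray(np.random.randint(0, 256, (256, 256, 3), np.uint8))
reconstructed = retrieve(store(original))
print(f"PSNR: {psnr(original, reconstructed):.2f} dB")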
Partial Selfish Mining for More Profits
Mining attacks aim to gain an unfair share of extra rewards in blockchain mining. Selfish mining can withhold discovered blocks and strategically release them, wasting honest miners' computing resources and earning higher profits. Previous mining attacks either conceal whole mined blocks (hiding or discarding them), or release them completely in a particular time slot (e.g., causing a fork). In this paper, we extend the mining attack's strategy space to partial block sharing, and propose a new and feasible Partial Selfish Mining (PSM) attack. We show that by releasing partial block data publicly and attracting rational miners to work on the attacker's private branch, attackers and these attracted miners can gain an unfair share of mining rewards. We then propose the Advanced PSM (A-PSM) attack, which can further improve attackers' profits to be no less than those of selfish mining. Both theoretical and experimental results show that PSM attackers can be more profitable than selfish miners under a certain range of mining power and network conditions. With attracted rational miners, A-PSM attackers can gain even higher profits than both selfish mining and honest mining.
Updated: 2024-04-06 14:00:20
Areas: cs.CR,cs.DC
Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective
Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness in aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across various tasks, DPO has been criticized for its sensitivity to the effectiveness of supervised fine-tuning (SFT) and for hindering the model's capacity to learn human-preferred responses, leading to less satisfactory performance. To overcome those limitations, a theoretical understanding of DPO is indispensable but still lacking. To this end, we take a step towards theoretically analyzing and understanding the limitations of DPO. Specifically, we provide an analytical framework using field theory to analyze the optimization process of DPO. By analyzing the gradient vector field of the DPO loss function, we find that the DPO loss function decreases the probability of producing human-dispreferred data at a faster rate than it increases the probability of producing preferred data. This provides theoretical insights for understanding the limitations of DPO discovered in related research experiments, thereby setting the foundation for its improvement.
Updated: 2024-04-06 13:24:37
Areas: cs.CL,cs.AI
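For reference, the objective this analysis concerns is the standard DPO loss. Writing the implicit reward as $\hat{r}_\theta(x,y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$, the loss over preferred/dispreferred pairs $(y_w, y_l)$ and its gradient are

$$\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}\big[\log \sigma\big(\hat{r}_\theta(x, y_w) - \hat{r}_\theta(x, y_l)\big)\big],$$

$$\nabla_\theta \mathcal{L}_{\mathrm{DPO}} = -\,\beta\,\mathbb{E}\big[\sigma\big(\hat{r}_\theta(x, y_l) - \hat{r}_\theta(x, y_w)\big)\big(\nabla_\theta \log \pi_\theta(y_w \mid x) - \nabla_\theta \log \pi_\theta(y_l \mid x)\big)\big].$$

These are the published DPO definitions; the paper's field-theoretic analysis of this vector field, and the asymmetry between how fast $\pi_\theta(y_l \mid x)$ falls and $\pi_\theta(y_w \mid x)$ rises, is its own contribution and is not reproduced here.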
An Automated Machine Learning Approach to Inkjet Printed Component Analysis: A Step Toward Smart Additive Manufacturing
In this paper, we present a machine learning based architecture for microwave characterization of inkjet printed components on flexible substrates. Our proposed architecture uses several machine learning algorithms and automatically selects the best algorithm to extract the material parameters (ink conductivity and dielectric properties) from on-wafer measurements. Initially, the mutual dependence between material parameters of the inkjet printed coplanar waveguides (CPWs) and EM-simulated propagation constants is utilized to train the machine learning models. Next, these machine learning models along with measured propagation constants are used to extract the ink conductivity and dielectric properties of the test prototypes. To demonstrate the applicability of our proposed approach, we compare and contrast four heuristic based machine learning models. It is shown that eXtreme Gradient Boosted Trees Regressor (XGB) and Light Gradient Boosting (LGB) algorithms perform best for the characterization problem under study.
Updated: 2024-04-06 13:13:45
Areas: cs.LG,cs.ET
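The automatic algorithm selection described above amounts to training several candidate regressors and keeping the best by cross-validation. The sketch below uses scikit-learn ensembles as stand-ins (the paper's top performers were XGBoost and LightGBM, which slot in the same way), and random arrays as placeholders for the simulated propagation constants and material parameters.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

X = np.random.rand(200, 6)   # e.g., propagation constants vs. frequency
y = np.random.rand(200)      # e.g., ink conductivity

candidates = {
    "gbt": GradientBoostingRegressor(),
    "rf": RandomForestRegressor(n_estimators=200),
}
# Score every candidate and keep the one with the best CV error.
scores = {name: cross_val_score(m, X, y, cv=5,
                                scoring="neg_mean_squared_error").mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])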
Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model
With the power of large language models (LLMs), open-ended embodied agents can flexibly understand human instructions, generate interpretable guidance strategies, and output executable actions. Nowadays, Multi-modal Language Models (MLMs) integrate multi-modal signals into LLMs, further bringing richer perception to entity agents and allowing embodied agents to perceive world-understanding tasks more delicately. However, existing works: 1) have agents operate independently, each containing multiple LLMs from perception to action, resulting in gaps between complex tasks and their execution; 2) train MLMs on static data, struggling with the dynamics of open-ended scenarios; 3) input prior knowledge directly as prompts, suppressing application flexibility. We propose STEVE-2, a hierarchical knowledge distillation framework for open-ended embodied tasks, characterized by 1) a hierarchical system for multi-granular task division, 2) a mirrored distillation method for parallel simulation data, and 3) an extra expert model for bringing additional knowledge into parallel simulation. After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM. Extensive evaluations on navigation and creation tasks highlight the superior performance of STEVE-2 on open-ended tasks, with $1.4\times$-$7.3\times$ gains in performance.
Updated: 2024-04-06 12:51:00
Areas: cs.AI,cs.CV
Optimizing Sparse Convolution on GPUs with CUDA for 3D Point Cloud Processing in Embedded Systems
In recent years, there has been a significant increase in the utilization of deep learning methods, particularly convolutional neural networks (CNNs), which have emerged as the dominant approach in various domains that involve structured grid data, such as image analysis and processing. Nevertheless, the exponential growth in the utilization of LiDAR and 3D sensors across many domains has resulted in an increased need for the analysis of 3D point clouds. The utilization of 3D point clouds is crucial in various applications, including object recognition and segmentation, as they offer a spatial depiction of objects within a three-dimensional environment. In contrast to photos, point clouds exhibit sparsity and lack a regular grid, hence posing distinct processing and computational challenges.
Updated: 2024-04-06 12:49:43
Areas: cs.LG,cs.CV
Vanishing Variance Problem in Fully Decentralized Neural-Network Systems
Federated learning and gossip learning are emerging methodologies designed to mitigate data privacy concerns by retaining training data on client devices and exclusively sharing locally-trained machine learning (ML) models with others. The primary distinction between the two lies in their approach to model aggregation: federated learning employs a centralized parameter server, whereas gossip learning adopts a fully decentralized mechanism, enabling direct model exchanges among nodes. This decentralized nature often positions gossip learning as less efficient compared to federated learning. Both methodologies involve a critical step: computing a representation of received ML models and integrating this representation into the existing model. Conventionally, this representation is derived by averaging the received models, exemplified by the FedAVG algorithm. Our findings suggest that this averaging approach inherently introduces a potential delay in model convergence. We identify the underlying cause and refer to it as the "vanishing variance" problem, where averaging across uncorrelated ML models undermines the optimal variance established by the Xavier weight initialization. Unlike federated learning where the central server ensures model correlation, and unlike traditional gossip learning which circumvents this problem through model partitioning and sampling, our research introduces a variance-corrected model averaging algorithm. This novel algorithm preserves the optimal variance needed during model averaging, irrespective of network topology or non-IID data distributions. Our extensive simulation results demonstrate that our approach enables gossip learning to achieve convergence efficiency comparable to that of federated learning.
Updated: 2024-04-06 12:49:20
Areas: cs.LG,cs.DC
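The "vanishing variance" effect and the simplest possible correction can be demonstrated numerically: averaging n uncorrelated weight tensors shrinks their standard deviation by roughly sqrt(n), so rescaling the average by sqrt(n) restores the Xavier-initialised scale. This illustrates the principle only; the paper's variance-corrected averaging algorithm may differ in detail.

import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out, n_models = 256, 256, 10
std = np.sqrt(2.0 / (fan_in + fan_out))  # Xavier/Glorot std

# n uncorrelated weight matrices, as from independent gossip peers.
weights = [rng.normal(0, std, (fan_in, fan_out)) for _ in range(n_models)]
avg = np.mean(weights, axis=0)
corrected = avg * np.sqrt(n_models)

print(f"target std:    {std:.4f}")
print(f"plain average: {avg.std():.4f}")        # roughly std / sqrt(n)
print(f"corrected:     {corrected.std():.4f}")  # roughly std again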
PointSAGE: Mesh-independent superresolution approach to fluid flow predictions
Computational Fluid Dynamics (CFD) serves as a powerful tool for simulating fluid flow across diverse industries. High-resolution CFD simulations offer valuable insights into fluid behavior and flow patterns, aiding in optimizing design features or enhancing system performance. However, as resolution increases, computational data requirements and time increase proportionately. This presents a persistent challenge in CFD. Recently, efforts have been directed towards accurately predicting fine-mesh simulations using coarse-mesh simulations, with geometry and boundary conditions as input. Drawing inspiration from models designed for super-resolution, deep learning techniques like UNets have been applied to address this challenge. However, these existing methods are limited to structured data and fail if the mesh is unstructured, due to their inability to convolve over it. Additionally, incorporating geometry/mesh information in the training process introduces drawbacks such as increased data requirements, challenges in generalizing to unseen geometries for the same physical phenomena, and issues with robustness to mesh distortions. To address these concerns, we propose a novel framework, PointSAGE, a mesh-independent network that leverages the unordered, mesh-less nature of point clouds to learn complex fluid flow and directly predict fine simulations, completely neglecting mesh information. Utilizing an adaptable framework, the model accurately predicts fine data across diverse point cloud sizes, regardless of the training dataset's dimension. We have evaluated the effectiveness of PointSAGE on diverse datasets in different scenarios, demonstrating notable results and a significant acceleration in computational time in generating fine simulations compared to standard CFD techniques.
Updated: 2024-04-06 12:49:09
Areas: physics.flu-dyn,cs.AI,cs.LG
Spectral Graph Pruning Against Over-Squashing and Over-Smoothing
Message Passing Graph Neural Networks are known to suffer from two problems that are sometimes believed to be diametrically opposed: over-squashing and over-smoothing. The former results from topological bottlenecks that hamper the information flow from distant nodes and are mitigated by spectral gap maximization, primarily, by means of edge additions. However, such additions often promote over-smoothing that renders nodes of different classes less distinguishable. Inspired by the Braess phenomenon, we argue that deleting edges can address over-squashing and over-smoothing simultaneously. This insight explains how edge deletions can improve generalization, thus connecting spectral gap optimization to a seemingly disconnected objective of reducing computational resources by pruning graphs for lottery tickets. To this end, we propose a more effective spectral gap optimization framework to add or delete edges and demonstrate its effectiveness on large heterophilic datasets.
Updated: 2024-04-06 12:40:21
Areas: cs.LG,eess.SP,stat.ML
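The Braess-style insight, that deleting the right edge can increase the spectral gap, can be checked directly: score each edge by the algebraic connectivity of the graph after its removal. The brute-force greedy loop below only conveys the objective; the paper's optimisation framework is more sophisticated.

import numpy as np
import networkx as nx

def spectral_gap(G):
    # Second-smallest Laplacian eigenvalue (algebraic connectivity).
    lam = np.sort(nx.laplacian_spectrum(G))
    return lam[1]

G = nx.barbell_graph(5, 1)  # two cliques joined by a bottleneck path
best_edge, best_gap = None, -np.inf
for e in list(G.edges()):
    H = G.copy()
    H.remove_edge(*e)
    if nx.is_connected(H):
        gap = spectral_gap(H)
        if gap > best_gap:
            best_edge, best_gap = e, gap

print("delete", best_edge, "-> gap", round(best_gap, 4))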
Dynamic Graph Information Bottleneck
Dynamic graphs widely exist in the real world, carrying complicated spatial and temporal feature patterns that challenge their representation learning. Dynamic Graph Neural Networks (DGNNs) have shown impressive predictive abilities by exploiting the intrinsic dynamics. However, DGNNs exhibit limited robustness, prone to adversarial attacks. This paper presents the novel Dynamic Graph Information Bottleneck (DGIB) framework to learn robust and discriminative representations. Leveraging the Information Bottleneck (IB) principle, we first propose that the expected optimal representations should satisfy the Minimal-Sufficient-Consensual (MSC) Condition. To compress redundant information and conserve meritorious information in the latent representation, DGIB iteratively directs and refines the structural and feature information flow passing through graph snapshots. To meet the MSC Condition, we decompose the overall IB objectives into DGIB$_{MS}$ and DGIB$_C$, in which the DGIB$_{MS}$ channel aims to learn the minimal and sufficient representations, while the DGIB$_C$ channel guarantees the predictive consensus. Extensive experiments on real-world and synthetic dynamic graph datasets demonstrate the superior robustness of DGIB against adversarial attacks compared with state-of-the-art baselines in the link prediction task. To the best of our knowledge, DGIB is the first work to learn robust representations of dynamic graphs grounded in the information-theoretic IB principle.
Updated: 2024-04-06 12:38:45
Areas: cs.LG,cs.AI
Enhancing Convergence Speed with Feature-Enforcing Physics-Informed Neural Networks: Utilizing Boundary Conditions as Prior Knowledge for Faster Convergence
This study introduces an accelerated training method for Vanilla Physics-Informed-Neural-Networks (PINN) addressing three factors that imbalance the loss function: initial weight state of a neural network, domain to boundary points ratio, and loss weighting factor. We propose a novel two-stage training method. During the initial stage, we create a unique loss function using a subset of boundary conditions and partial differential equation terms. Furthermore, we introduce preprocessing procedures that aim to decrease the variance during initialization and choose domain points according to the initial weight state of various neural networks. The second phase resembles Vanilla-PINN training, but a portion of the random weights are substituted with weights from the first phase. This implies that the neural network's structure is designed to prioritize the boundary conditions, subsequently affecting the overall convergence. Three benchmarks are utilized: two-dimensional flow over a cylinder, an inverse problem of inlet velocity determination, and the Burger equation. It is found that incorporating weights generated in the first training phase into the structure of a neural network neutralizes the effects of imbalance factors. For instance, in the first benchmark, as a result of our process, the second phase of training is balanced across a wide range of ratios and is not affected by the initial state of weights, while the Vanilla-PINN failed to converge in most cases. Lastly, the initial training process not only eliminates the need for hyperparameter tuning to balance the loss function, but it also outperforms the Vanilla-PINN in terms of speed.
Updated: 2024-04-06 12:30:26
Areas: cs.LG
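A compact rendering of the two-stage scheme: stage one fits the network mainly to boundary conditions, and stage two substitutes a portion of those weights into a freshly initialised network before standard combined BC-plus-residual training. PyTorch, the 1-D toy boundary problem, and the half-of-the-neurons transfer fraction are illustrative assumptions, not the paper's setup.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
xb = torch.tensor([[0.0], [1.0]])  # boundary points
ub = torch.tensor([[0.0], [0.0]])  # boundary values

# Stage 1: train on a boundary-focused loss only.
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss_bc = ((net(xb) - ub) ** 2).mean()
    loss_bc.backward()
    opt.step()

stage1_state = {k: v.clone() for k, v in net.state_dict().items()}

# Stage 2: re-initialise, then substitute a portion of stage-1 weights
# before the usual combined BC + PDE-residual PINN training.
net2 = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
state = net2.state_dict()
state["0.weight"][:16] = stage1_state["0.weight"][:16]  # half the neurons
net2.load_state_dict(state)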
DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning
In this work, we investigate the potential of large language model (LLM) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent, a novel automatic framework that harnesses LLM agents and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle and facilitate consistent performance improvement through the feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm to adapt past successful solutions from the development stage for direct code generation, significantly reducing the demand on foundational capabilities of LLMs. Empirically, DS-Agent with GPT-4 achieves an unprecedented 100% success rate in the development stage, while attaining a 36% average improvement in one-pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best rank in performance, costing $1.60 and $0.13 per run with GPT-4, respectively. Our code is open-sourced at https://github.com/guosyjlu/DS-Agent.
Updated: 2024-04-06 12:28:57
Areas: cs.LG
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Current remote-sensing interpretation models often focus on a single task such as detection, segmentation, or captioning. However, such task-specific models cannot achieve comprehensive, multi-level interpretation of images. The field also lacks datasets supporting joint multi-task interpretation. In this paper, we propose Panoptic Perception, a novel task, and a new fine-grained dataset (FineGrip) to achieve a more thorough and universal interpretation of RSIs. The new task 1) integrates pixel-level, instance-level, and image-level information for universal image perception, 2) captures image information from coarse to fine granularity, achieving deeper scene understanding and description, and 3) enables various independent tasks to complement and enhance each other through multi-task learning. By emphasizing multi-task interactions and the consistency of perception results, this task enables the simultaneous processing of fine-grained foreground instance segmentation, background semantic segmentation, and global fine-grained image captioning. Concretely, the FineGrip dataset includes 2,649 remote sensing images, 12,054 fine-grained instance segmentation masks belonging to 20 foreground thing categories, 7,599 background semantic masks for 5 stuff classes, and 13,245 captioning sentences. Furthermore, we propose a joint optimization-based panoptic perception model. Experimental results on FineGrip demonstrate the feasibility of the panoptic perception task and the beneficial effect of multi-task joint optimization on individual tasks. The dataset will be publicly available.
Updated: 2024-04-06 12:27:21
标题: 全视角感知:一项新颖任务和细粒度数据集,用于普适遥感图像解释
摘要: 目前遥感解译模型往往集中在单一任务,如检测、分割或描述。然而,专门设计的任务特定模型无法实现对图像的全面多层次解译。该领域还缺乏支持多任务联合解译的数据集。本文提出了全景感知(Panoptic Perception)这一新任务,以及一个新的细粒度数据集(FineGrip),以实现对遥感图像更全面和普遍的解释。新任务将像素级、实例级和图像级信息整合到一起,实现对图像的通用感知;从粗到细的粒度捕捉图像信息,实现更深入的场景理解和描述;通过多任务学习实现各种独立任务互补和增强。该任务强调多任务互动和感知结果的一致性,实现对细粒度前景实例分割、背景语义分割和全局细粒度图像描述的同时处理。FineGrip数据集包括2,649张遥感图像,属于20种前景物体类别的12,054个细粒度实例分割掩模,5种材料类别的7,599个背景语义掩模和13,245个描述性句子。此外,我们提出了基于联合优化的全景感知模型。在FineGrip上的实验结果展示了全景感知任务的可行性,以及多任务联合优化对各个任务的有益影响。该数据集将公开提供。
更新时间: 2024-04-06 12:27:21
领域: cs.CV,cs.AI
Exploiting Sequence Number Leakage: TCP Hijacking in NAT-Enabled Wi-Fi Networks
In this paper, we uncover a new side-channel vulnerability in the widely used NAT port preservation strategy and the insufficient reverse path validation strategy of Wi-Fi routers, which allows an off-path attacker to infer whether a victim client in the same network is communicating with another host on the Internet using TCP. After detecting the presence of TCP connections between the victim client and the server, the attacker can evict the original NAT mapping and reconstruct a new mapping at the router by sending fake TCP packets, exploiting the routers' practice of disabling the TCP window tracking strategy, which has been faithfully implemented in most routers for years. In this way, the attacker can intercept TCP packets from the server and obtain the current sequence and acknowledgment numbers, which in turn allows the attacker to forcibly close the connection, poison plaintext traffic, or reroute the server's incoming packets to the attacker. We test 67 widely used routers from 30 vendors and discover that 52 of them are affected by this attack. Also, we conduct an extensive measurement study on 93 real-world Wi-Fi networks. The experimental results show that 75 of these evaluated Wi-Fi networks (81%) are fully vulnerable to our attack. Our case study shows that it takes about 17.5, 19.4, and 54.5 seconds on average to terminate an SSH connection, download private files from FTP servers, and inject fake HTTP response packets, with success rates of 87.4%, 82.6%, and 76.1%, respectively. We responsibly disclosed the vulnerability and suggested mitigation strategies to all affected vendors, and have received positive feedback, including acknowledgments, CVEs, rewards, and adoption of our suggestions.
Updated: 2024-04-06 11:59:35
标题: 利用序列号泄漏:在启用NAT的Wi-Fi网络中进行TCP劫持
摘要: 在本文中,我们揭示了一种新的侧信道漏洞,涉及广泛使用的NAT端口保留策略和Wi-Fi路由器的不足的反向路径验证策略,这使得一个路径外(off-path)攻击者可以推断出在同一网络中是否有一个受害者客户端与Internet上的另一个主机使用TCP进行通信。在检测到受害者客户端与服务器之间的TCP连接存在后,攻击者可以通过发送虚假的TCP数据包来逐出原始NAT映射,并在路由器上重建新的映射,这是利用了路由器禁用TCP窗口跟踪策略这一多年来在大多数路由器中忠实实现的做法。通过这种方式,攻击者可以拦截来自服务器的TCP数据包并获取当前的序列和确认号,从而可以强制关闭连接、污染明文流量,或将服务器的传入数据包重新路由到攻击者。我们测试了来自30个供应商的67个广泛使用的路由器,发现其中52个受到了这种攻击的影响。此外,我们对93个真实的Wi-Fi网络进行了广泛的测量研究。实验结果显示,这些评估的Wi-Fi网络中有75个(81%)完全容易受到我们的攻击。我们的案例研究表明,平均需要大约17.5、19.4和54.5秒来终止一个SSH连接、从FTP服务器下载私人文件,以及注入虚假的HTTP响应数据包,成功率分别为87.4%、82.6%和76.1%。我们负责任地披露了这个漏洞,并向所有受影响的供应商提出了缓解策略,收到了积极的反馈,包括致谢、CVE、奖励和对我们建议的采纳。
更新时间: 2024-04-06 11:59:35
领域: cs.CR
Multinomial belief networks for healthcare data
Healthcare data from patient or population cohorts are often characterized by sparsity, high missingness and relatively small sample sizes. In addition, being able to quantify uncertainty is often important in a medical context. To address these analytical requirements we propose a deep generative Bayesian model for multinomial count data. We develop a collapsed Gibbs sampling procedure that takes advantage of a series of augmentation relations, inspired by the Zhou-Cong-Chen model. We visualise the model's ability to identify coherent substructures in the data using a dataset of handwritten digits. We then apply it to a large experimental dataset of DNA mutations in cancer and show that we can identify biologically meaningful clusters of mutational signatures in a fully data-driven way.
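To give a flavor of collapsed Gibbs sampling on multinomial count data, the following minimal sampler fits a far simpler Dirichlet-multinomial mixture; it is not the paper's deep belief network or its augmentation scheme, and it assumes NumPy and SciPy are available.

import numpy as np
from scipy.special import gammaln

def log_predictive(x, ck, beta):
    # log p(x | cluster counts ck) under a collapsed Dirichlet-multinomial,
    # dropping the multinomial coefficient (constant across clusters).
    V = len(x)
    return (gammaln(ck + beta + x).sum() - gammaln(ck + beta).sum()
            + gammaln(ck.sum() + V * beta) - gammaln(ck.sum() + V * beta + x.sum()))

def collapsed_gibbs(X, K=2, alpha=1.0, beta=0.5, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    N, V = X.shape                                  # N samples, V categories
    z = rng.integers(K, size=N)                     # cluster assignments
    counts = np.zeros((K, V)); sizes = np.zeros(K)
    for i in range(N):
        counts[z[i]] += X[i]; sizes[z[i]] += 1
    for _ in range(n_iters):
        for i in range(N):
            counts[z[i]] -= X[i]; sizes[z[i]] -= 1  # remove item i
            logp = np.log(sizes + alpha) + np.array(
                [log_predictive(X[i], counts[k], beta) for k in range(K)])
            p = np.exp(logp - logp.max()); p /= p.sum()
            z[i] = rng.choice(K, p=p)               # resample and add item i back
            counts[z[i]] += X[i]; sizes[z[i]] += 1
    return z

rng = np.random.default_rng(1)
X = np.vstack([rng.multinomial(30, [0.7, 0.2, 0.1], size=20),
               rng.multinomial(30, [0.1, 0.2, 0.7], size=20)])
print(collapsed_gibbs(X))    # the two blocks should land in distinct clusters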
Updated: 2024-04-06 11:38:31
标题: 多项式信念网络在医疗数据中的应用
摘要: 患者或人群队列中的医疗数据通常具有稀疏性、高缺失率和相对较小的样本量。此外,在医疗背景下,能够量化不确定性通常很重要。为了满足这些分析要求,我们提出了一个用于多项式计数数据的深度生成贝叶斯模型。我们开发了一种折叠吉布斯采样过程,利用了一系列增强关系,灵感来自Zhou-Cong-Chen模型。我们通过一个手写数字数据集展示了模型在识别数据中连贯子结构的能力。然后,我们将其应用于一个大型癌症DNA突变实验数据集,并展示我们可以以完全数据驱动的方式识别突变签名的生物学意义集群。
更新时间: 2024-04-06 11:38:31
领域: stat.ML,cs.LG,stat.AP
Evaluating the Effectiveness of Artificial Intelligence in Predicting Adverse Drug Reactions among Cancer Patients: A Systematic Review and Meta-Analysis
Adverse drug reactions considerably impact patient outcomes and healthcare costs in cancer therapy. Using artificial intelligence to predict adverse drug reactions in real time could revolutionize oncology treatment. This study aims to assess the performance of artificial intelligence models in predicting adverse drug reactions in patients with cancer, and is the first systematic review and meta-analysis on the topic. Scopus, PubMed, IEEE Xplore, and ACM Digital Library databases were searched for studies in English, French, and Arabic from January 1, 2018, to August 20, 2023. The inclusion criteria were: (1) peer-reviewed research articles; (2) use of artificial intelligence algorithms (machine learning, deep learning, knowledge graphs); (3) study aimed to predict adverse drug reactions (cardiotoxicity, neutropenia, nephrotoxicity, hepatotoxicity); (4) study was on cancer patients. Data were extracted, and study quality was evaluated, by three reviewers. Of the 332 screened articles, 17 studies (5%) involving 93,248 oncology patients from 17 countries were included in the systematic review, ten of which were synthesized in the meta-analysis. A random-effects model was created to pool the sensitivity, specificity, and AUC of the included studies. The pooled sensitivity, specificity, and AUC of the ADR predictive models were 0.82 (95% CI: 0.69, 0.90), 0.84 (95% CI: 0.75, 0.90), and 0.83 (95% CI: 0.77, 0.87), respectively. Biomarkers proved effective for predicting ADRs, yet only half of the reviewed studies used them. The use of AI in cancer treatment shows great potential, with models demonstrating high specificity and sensitivity in predicting ADRs. However, standardized research and multicenter studies are needed to improve the quality of evidence. AI can enhance cancer patient care by bridging the gap between data-driven insights and clinical expertise.
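For readers unfamiliar with random-effects pooling, the sketch below implements the standard DerSimonian-Laird estimator on synthetic numbers (not the study's data); pooling is done on the logit scale, as is common for proportions such as sensitivity.

import numpy as np

def random_effects_pool(estimates, ses):
    # Pool per-study estimates (e.g., logit sensitivities) with standard errors ses.
    est, var = np.asarray(estimates, float), np.asarray(ses, float) ** 2
    w = 1.0 / var                                   # fixed-effect weights
    mu_fe = (w * est).sum() / w.sum()
    q = (w * (est - mu_fe) ** 2).sum()              # Cochran's Q heterogeneity statistic
    df = len(est) - 1
    tau2 = max(0.0, (q - df) / (w.sum() - (w ** 2).sum() / w.sum()))  # between-study variance
    w_re = 1.0 / (var + tau2)                       # random-effects weights
    mu = (w_re * est).sum() / w_re.sum()
    se = (1.0 / w_re.sum()) ** 0.5
    return mu, (mu - 1.96 * se, mu + 1.96 * se), tau2

logit = lambda p: np.log(p / (1 - p))
expit = lambda x: 1 / (1 + np.exp(-x))
mu, ci, tau2 = random_effects_pool(logit(np.array([0.78, 0.85, 0.81, 0.90])),
                                   ses=[0.20, 0.15, 0.25, 0.18])
print(f"pooled sensitivity {expit(mu):.2f}, 95% CI ({expit(ci[0]):.2f}, {expit(ci[1]):.2f})")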
Updated: 2024-04-06 11:20:28
标题: 评估人工智能在预测癌症患者中不良药物反应方面的有效性:一项系统评价和荟萃分析
摘要: 不良药物反应在癌症治疗中显著影响患者预后和医疗成本。利用人工智能实时预测不良药物反应可能彻底改变肿瘤治疗。本研究旨在评估人工智能模型在预测癌症患者不良药物反应方面的性能。这是第一次系统性回顾和荟萃分析。从2018年1月1日至2023年8月20日,搜索了Scopus、PubMed、IEEE Xplore和ACM数字图书馆数据库中英文、法文和阿拉伯文的研究。包括标准为:(1)同行评审的研究文章;(2)使用人工智能算法(机器学习、深度学习、知识图谱);(3)研究旨在预测不良药物反应(心脏毒性、中性粒细胞减少、肾毒性、肝毒性);(4)研究对象为癌症患者。数据由三名评审员提取和评估研究质量。在筛选的332篇文章中,17项研究(5%)涉及来自17个国家的93,248名肿瘤患者被纳入系统性回顾,其中十项研究进行了荟萃分析。建立了随机效应模型来汇总包括研究的敏感度、特异度和AUC。包括研究的敏感度、特异度和AUC的汇总结果分别为0.82(95% CI:0.69,0.9)、0.84(95% CI:0.75,0.9)和0.83(95% CI:0.77,0.87)。生物标志物被证明在预测不良药物反应方面有效,然而仅有一半的研究采用了它们。人工智能在癌症治疗中的应用显示出巨大潜力,模型在预测不良药物反应方面表现出高的特异度和敏感度。然而,需要规范化的研究和多中心研究以提高证据质量。人工智能可以通过连接数据驱动的见解和临床专业知识来提升癌症患者护理。
更新时间: 2024-04-06 11:20:28
领域: q-bio.QM,cs.AI,cs.LG
Neuroevolving Electronic Dynamical Networks
Neuroevolution is a powerful method of applying an evolutionary algorithm to refine the performance of artificial neural networks through natural selection; however, the fitness evaluation of these networks can be time-consuming and computationally expensive, particularly for continuous-time recurrent neural networks (CTRNNs) that necessitate the simulation of differential equations. To overcome this challenge, field programmable gate arrays (FPGAs) have emerged as an increasingly popular solution due to their high performance and low power consumption. Further, their ability to undergo dynamic and partial reconfiguration enables extremely rapid evaluation of the fitness of CTRNNs, effectively addressing the bottleneck associated with conventional methods. By incorporating fitness evaluation directly within the programmable logic of the FPGA, hyper-parallel evaluation becomes feasible, dramatically reducing the time required for assessment. This inherent parallelism of FPGAs accelerates the entire neuroevolutionary process by several orders of magnitude, facilitating faster convergence to an optimal solution. This study demonstrates the potential of dynamic and partial reconfiguration on capable FPGAs as a powerful platform for neuroevolving dynamic neural networks.
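As context for why CTRNN fitness evaluation is costly, a minimal Euler-integrated CTRNN in Python follows; every candidate genome in the evolving population requires a simulation like this, which is exactly the step the FPGA approach parallelizes. The parameters below are random placeholders for one genome.

import numpy as np

def ctrnn_step(y, I, W, tau, bias, dt=0.01):
    # tau_i * dy_i/dt = -y_i + sum_j W_ij * sigma(y_j + b_j) + I_i
    sigma = 1.0 / (1.0 + np.exp(-(y + bias)))
    return y + dt * (-y + W @ sigma + I) / tau

rng = np.random.default_rng(0)
n = 4
W = rng.normal(0, 1, (n, n))
tau = rng.uniform(0.5, 2.0, n)
bias = rng.normal(0, 1, n)
y = np.zeros(n)
for _ in range(1000):                        # 10 simulated seconds at dt = 0.01
    y = ctrnn_step(y, I=np.array([1.0, 0, 0, 0]), W=W, tau=tau, bias=bias)
print(np.round(y, 3))                        # final state would feed a fitness function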
Updated: 2024-04-06 10:54:35
标题: 神经进化电子动态网络
摘要: 神经进化是一种将进化算法应用于通过自然选择改进人工神经网络性能的强大方法;然而,这些网络的适应度评估可能耗时且计算成本高,特别是对于需要模拟微分方程的连续时间递归神经网络(CTRNNs)。为了克服这一挑战,凭借其高性能和低功耗,现场可编程门阵列(FPGAs)已成为一种越来越受欢迎的解决方案。此外,它们具有动态和部分重配置的能力,使得对CTRNNs的适应度极快评估成为可能,有效地解决了传统方法所固有的瓶颈。通过直接将适应度评估纳入FPGA的可编程逻辑中,超并行评估变得可行,极大地减少了评估所需的时间。FPGAs的固有并行性将整个神经进化过程加速数个数量级,促进更快地收敛到最佳解决方案。本研究展示了在功能强大的FPGAs上利用动态和部分重配置作为神经演化动态神经网络强大平台的潜力。
更新时间: 2024-04-06 10:54:35
领域: cs.NE,cs.AI,cs.AR
The Journey to Trustworthy AI - Part 1: Pursuit of Pragmatic Frameworks
This paper reviews Trustworthy Artificial Intelligence (TAI) and its various definitions. Considering the principles respected in any society, TAI is often characterized by a few attributes, some of which have led to confusion in regulatory or engineering contexts. We argue against using terms such as Responsible or Ethical AI as substitutes for TAI, and to help clarify any confusion, we suggest leaving them behind. Given the subjectivity and complexity inherent in TAI, developing a universal framework is deemed infeasible. Instead, we advocate for approaches centered on addressing key attributes and properties such as fairness, bias, risk, security, explainability, and reliability. We examine the ongoing regulatory landscape, with a focus on initiatives in the EU, China, and the USA. We recognize that differences in AI regulations based on geopolitical and geographical reasons pose an additional challenge for multinational companies. We identify risk as a core factor in AI regulation and TAI. For example, as outlined in the EU AI Act, organizations must gauge the risk level of their AI products to act accordingly (or risk hefty fines). We compare modalities of TAI implementation and how multiple cross-functional teams are engaged in the overall process. Thus, a brute-force approach to enacting TAI renders its efficiency and agility moot. To address this, we introduce our framework Set-Formalize-Measure-Act (SFMA). Our solution highlights the importance of transforming TAI-aware metrics, drivers of TAI, stakeholders, and business/legal requirements into actual benchmarks or tests. Finally, over-regulation driven by panic over powerful AI models can, in fact, harm TAI too. Based on GitHub user-activity data, in 2023, AI open-source projects rose to the top projects by contributor count. Enabling innovation in TAI hinges on the independent contributions of the open-source community.
Updated: 2024-04-06 10:45:35
标题: 通往可信人工智能的旅程- 第一部分:追求实用框架
摘要: 本文回顾了可信人工智能(TAI)及其各种定义。考虑到任何社会中尊重的原则,TAI通常被一些属性所特征化,其中一些属性导致了在监管或工程背景下的混淆。我们反对使用“负责任的”或“道德的”人工智能这样的术语作为TAI的替代品。为了帮助澄清任何混淆,我们建议将它们抛诸脑后。鉴于TAI固有的主观性和复杂性,发展一个通用框架被认为是不可行的。相反,我们主张采用以公平、偏见、风险、安全性、可解释性和可靠性为中心的方法。我们审视了正在进行中的监管环境,重点关注欧盟、中国和美国的倡议。我们认识到基于地缘政治和地理原因的人工智能监管规定的差异对跨国公司构成了额外挑战。我们确定风险是AI监管和TAI的核心因素。例如,正如欧盟人工智能法案中所概述的,组织必须评估其人工智能产品的风险水平并相应行动(否则可能面临巨额罚款)。我们比较了TAI实施的模式以及多个跨职能团队如何参与整个过程。因此,对TAI实施的蛮力方法使其效率和灵活性变得无关紧要。为了解决这个问题,我们引入了我们的框架“设定-形式化-测量-行动”(SFMA)。我们的解决方案强调了将TAI意识的度量标准、TAI的驱动因素、利益相关者和业务/法律要求转化为实际基准或测试的重要性。最后,由于对强大人工智能模型的恐慌所驱动的过度监管实际上也可能损害TAI。根据GitHub用户活动数据,在2023年,AI开源项目成为贡献者账户数量最多的项目。推动TAI中的创新取决于开源社区的独立贡献。
更新时间: 2024-04-06 10:45:35
领域: cs.CY,cs.AI,cs.HC
Enhancing Readmission Prediction with Deep Learning: Extracting Biomedical Concepts from Clinical Texts
Hospital readmission, defined as patients being re-hospitalized shortly after discharge, is a critical concern as it impacts patient outcomes and healthcare costs. Identifying patients at risk of readmission allows for timely interventions, reducing re-hospitalization rates and overall treatment costs. This study focuses on predicting patient readmission within 30 days of discharge using text mining techniques applied to discharge report texts from electronic health records (EHR). Various machine learning and deep learning methods were employed to develop a classification model for this purpose. A novel aspect of this research involves leveraging the Bio-Discharge Summary BERT (BDSS) model along with principal component analysis (PCA) feature extraction to preprocess data for deep learning model input. Our analysis of the MIMIC-III dataset indicates that our approach, which combines the BDSS model with a multilayer perceptron (MLP), outperforms state-of-the-art methods. This model achieved a recall of 94% and an area under the curve (AUC) of 75%, showcasing its effectiveness in predicting patient readmissions. This study contributes to the advancement of predictive modeling in healthcare by integrating text mining techniques with deep learning algorithms to improve patient outcomes and optimize resource allocation.
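A hedged sketch of the described pipeline (pretrained-encoder embeddings, PCA reduction, MLP classifier) using scikit-learn is shown below; the random matrix stands in for BDSS [CLS] embeddings, since the MIMIC-III data cannot be reproduced here.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 768))             # placeholder for 768-d discharge-summary embeddings
y = rng.integers(0, 2, size=500)            # placeholder 30-day readmission labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(PCA(n_components=64),   # compress embeddings before the MLP
                    MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))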
Updated: 2024-04-06 10:39:02
标题: 用深度学习提升再入院预测:从临床文本中提取生物医学概念
摘要: 住院再入院,即患者在出院后不久再次入院,是一个关键问题,因为它影响患者的预后和医疗成本。识别再入院风险的患者可以及时进行干预,降低再入院率和整体治疗成本。本研究着重于利用文本挖掘技术应用于电子健康记录(EHR)的出院报告文本,预测患者在30天内的再入院情况。采用各种机器学习和深度学习方法开发了一个用于此目的的分类模型。这项研究的一个新颖之处在于利用Bio-Discharge Summary Bert(BDSS)模型以及主成分分析(PCA)特征提取来预处理数据以供深度学习模型输入。我们对MIMIC-III数据集的分析表明,我们的方法,将BDSS模型与多层感知器(MLP)结合使用,优于最先进的方法。该模型实现了94%的召回率和75%的曲线下面积(AUC),展示了其在预测患者再入院方面的有效性。这项研究通过将文本挖掘技术与深度学习算法结合起来,对健康预测建模的进展做出了贡献,以改善患者预后并优化资源分配。
更新时间: 2024-04-06 10:39:02
领域: cs.CL,cs.AI
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Large Language Models (LLMs) excel in NLP, but their computational and memory demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. Existing studies identify activation outliers in particular channels as the bottleneck to PTQ accuracy and propose migrating the outlier magnitudes from activations to weights; however, this offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low bitwidths. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first break down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes, and then merge similar channels to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.
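The channel disassembly idea can be illustrated in a few lines: splitting an outlier activation channel into sub-channels, each carrying 1/n of the magnitude with the corresponding weight row duplicated, preserves the layer output exactly while shrinking the quantization range. This is a simplified reading of the idea, not QLLM's implementation.

import numpy as np

def disassemble_channel(act, W, idx, n_sub=2):
    # Split activation channel idx into n_sub copies of act[idx]/n_sub and
    # duplicate the matching weight row; the output W.T @ act is preserved.
    sub = np.repeat(act[idx] / n_sub, n_sub)
    act2 = np.concatenate([np.delete(act, idx), sub])
    W2 = np.vstack([np.delete(W, idx, axis=0)] + [W[idx:idx + 1]] * n_sub)
    return act2, W2

rng = np.random.default_rng(0)
act = rng.normal(0, 1, 8); act[3] = 20.0        # one outlier channel
W = rng.normal(0, 1, (8, 4))
act2, W2 = disassemble_channel(act, W, idx=3, n_sub=4)
print(np.allclose(W.T @ act, W2.T @ act2))      # True: output unchanged
print(act.max(), "->", act2.max())              # activation range shrinks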
Updated: 2024-04-06 10:22:57
标题: QLLM:大型语言模型的精确高效低位宽量化
摘要: 大型语言模型(LLMs)在自然语言处理方面表现出色,但其需求限制了它们的广泛部署。虽然量化感知训练(QAT)提供了一种解决方案,但其昂贵的训练成本使得后训练量化(PTQ)成为LLMs更实用的方法。现有研究指出,特定通道中的激活异常值是影响PTQ准确性的瓶颈。他们提出将激活的幅度转换为权重,然而这种方法只能在一定程度上减轻问题或者受到不稳定的梯度影响,导致在低比特宽度下性能严重下降。在本文中,我们提出了QLLM,这是一种为LLMs设计的准确高效的低比特宽度后训练量化方法。QLLM引入了一种自适应通道重组技术,将异常值的幅度重新分配到其他通道,从而减轻它们对量化范围的影响。这通过通道分解和通道组装实现,首先将异常通道分解为几个子通道,以确保激活幅度的更均衡分布。然后将类似的通道合并以保持效率。此外,设计了一种自适应策略,可以自主确定通道分解的最佳子通道数量。为了进一步弥补由量化引起的性能损失,我们提出了一种有效的调整方法,只学习少量低秩权重,同时冻结预训练的量化模型。训练后,这些低秩参数可以融合到冻结的权重中,而不影响推理。在LLaMA-1和LLaMA-2上的大量实验表明,QLLM可以高效地获得准确的量化模型。例如,QLLM在单个A100-80G GPU上在10小时内对LLaMA-2-70B进行4位量化,平均准确率比之前的最先进方法高出7.89%。
更新时间: 2024-04-06 10:22:57
领域: cs.CL,cs.AI,cs.LG
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. Particularly, it adjusts the logits in the softmax function in LLMs, which is crucial for next-token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: "Is it viable to learn a neural network to predict a personalized temperature of any input data for enhancing LFMs?" In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs. Our solution is composed of a novel learning framework with a robust loss underpinned by constrained distributionally robust optimization (DRO), and a properly designed TempNet with theoretical inspiration. TempNet can be trained together with a large foundation model from scratch or learned separately given a pretrained foundation model. It is not only useful for predicting personalized temperatures to promote the training of LFMs but is also generalizable and transferable to new tasks. Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models (e.g., Table 1). The code to reproduce the experimental results in this paper can be found at https://github.com/zhqiu/TempNet.
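A toy version of the idea, with assumed shapes and a small NumPy MLP standing in for TempNet, is sketched below; the paper's actual architecture and DRO-based loss are more involved.

import numpy as np

rng = np.random.default_rng(0)
D, V = 16, 10                                 # feature dim, class/vocab count (assumed)
W1, b1 = rng.normal(0, 0.1, (D, 32)), np.zeros(32)
w2, b2 = rng.normal(0, 0.1, 32), 0.0

def temp_net(h):
    # A tiny MLP; softplus keeps the predicted temperature strictly positive.
    z = np.maximum(h @ W1 + b1, 0.0)
    return np.log1p(np.exp(z @ w2 + b2)) + 1e-3

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h = rng.normal(size=D)                        # per-input representation
logits = rng.normal(size=V)
tau = temp_net(h)                             # personalized temperature for this input
probs = softmax(logits / tau)                 # tau < 1 sharpens, tau > 1 smooths
print(f"tau = {tau:.3f}", probs.round(3))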
Updated: 2024-04-06 09:55:03
标题: 要冷却还是不要冷却?温度网络通过DRO与大型基础模型相遇
摘要: 温度参数在训练和/或推断中对大型基础模型(LFM)如大型语言模型(LLM)和CLIP模型起着重要作用。特别是,它调整了LLM中softmax函数中的logits,这对于生成下一个标记至关重要,并且它在训练CLIP模型时调整了对比损失中的相似性。一个重要的问题是:是否可行学习一个神经网络来预测任何输入数据的个性化温度以增强LFMs?在本文中,我们提出了一个有原则的框架,用于学习一个小型但可泛化的温度预测网络(TempNet)以改进LFMs。我们的解决方案由一个新颖的学习框架和一个受到约束分布鲁棒优化(DRO)支持的稳健损失组成,并且一个经过妥善设计的具有理论启发的TempNet。TempNet可以与大型基础模型一起从头开始训练,也可以在给定预训练基础模型的情况下单独学习。它不仅有助于预测个性化温度以促进LFMs的训练,而且具有泛化性和可转移性到新任务。我们在LLMs和CLIP模型上的实验表明,TempNet极大地提高了现有解决方案或模型的性能,例如表1。本文中用于重现实验结果的代码可以在https://github.com/zhqiu/TempNet找到。
更新时间: 2024-04-06 09:55:03
领域: cs.LG,cs.AI,math.OC
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
Gender-inclusive NLP research has documented the harmful limitations of gender binary-centric large language models (LLMs), such as the inability to correctly use gender-diverse English neopronouns (e.g., xe, zir, fae). While data scarcity is a known culprit, the precise mechanisms through which scarcity affects this behavior remain underexplored. We discover that LLM misgendering is significantly influenced by Byte-Pair Encoding (BPE) tokenization, the tokenizer powering many popular LLMs. Unlike binary pronouns, BPE overfragments neopronouns, a direct consequence of data scarcity during tokenizer training. This disparate tokenization mirrors tokenizer limitations observed in multilingual and low-resource NLP, unlocking new misgendering mitigation strategies. We propose two techniques: (1) pronoun tokenization parity, a method to enforce consistent tokenization across gendered pronouns, and (2) utilizing pre-existing LLM pronoun knowledge to improve neopronoun proficiency. Our proposed methods outperform finetuning with standard BPE, improving neopronoun accuracy from 14.1% to 58.4%. Our paper is the first to link LLM misgendering to tokenization and deficient neopronoun grammar, indicating that LLMs unable to correctly treat neopronouns as pronouns are more prone to misgendering.
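The fragmentation disparity is easy to probe directly. The snippet below, which assumes the Hugging Face transformers package and the public GPT-2 tokenizer, counts how many BPE tokens each pronoun becomes.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
for pronoun in ["she", "he", "they", "xe", "zir", "fae"]:
    pieces = tok.tokenize(" " + pronoun)     # leading space: word-initial position
    print(f"{pronoun:>4}: {len(pieces)} token(s) {pieces}")
# Binary pronouns stay whole, while neopronouns typically fragment into several
# sub-tokens; pronoun tokenization parity aims to equalize exactly this.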
Updated: 2024-04-06 09:32:53
标题: 标记化很重要:导航数据稀缺的标记化以实现性别包容性语言技术
摘要: 性别包容性自然语言处理研究已经记录了性别二元中心主义大型语言模型(LLM)的有害局限,例如无法正确使用性别多样化的英语新代词(例如xe、zir、fae)。虽然数据稀缺是一个已知的问题,但稀缺数据如何影响这种行为的确切机制仍未得到深入探讨。我们发现LLM的误性别化受到字节对编码(BPE)分词的显著影响,这是许多流行LLM的分词器。与二元代词不同,BPE过度分割新代词,这是分词器训练期间数据稀缺的直接后果。这种不一致的分词反映了观察到的多语言和低资源自然语言处理中的分词器限制,为新的减少误性别化策略打开了大门。我们提出了两种技术:(1)代词分词平衡,一种强制性保持性别代词一致分词的方法,以及(2)利用现有的LLM代词知识来提高新代词的熟练程度。我们提出的方法优于使用标准BPE进行微调,将新代词准确率从14.1%提高到58.4%。我们的论文是第一个将LLM的误性别化与分词化和不足的新代词语法联系起来的论文,表明无法正确将新代词视为代词的LLM更容易出现误性别化。
更新时间: 2024-04-06 09:32:53
领域: cs.CL,cs.AI,cs.LG
Optimization of Lightweight Malware Detection Models For AIoT Devices
Malware intrusion is problematic for Internet of Things (IoT) and Artificial Intelligence of Things (AIoT) devices as they often reside in an ecosystem of connected devices, such as a smart home. If any device is infected, the whole ecosystem can be compromised. Although various Machine Learning (ML) models are deployed to detect malware and network intrusion, generally speaking, robust high-accuracy models tend to require resources not found in all IoT devices, whereas less robust models built from weak learners require fewer. To combat this issue, Fadhilla proposed a meta-learner ensemble model that combines the less robust predictions inherent in weak-learner ML models to produce a highly robust meta-learning ensemble model. The main problem with the prior research is that it cannot be deployed on low-end AIoT devices due to their limited processing power, storage, and memory (the required libraries quickly exhaust low-end AIoT devices' resources). Hence, this research aims to optimize the proposed super-learner meta-learning ensemble model to make it viable for low-end AIoT devices. We show the library and ML model memory requirements associated with each optimization stage and emphasize that optimization of current ML models is necessary for low-end AIoT devices. Our results demonstrate that we can obtain similar accuracy and False Positive Rate (FPR) metrics from high-end AIoT devices running the derived ML model, with a lower inference duration and smaller memory footprint.
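For illustration, a minimal stacked ensemble in this spirit (weak base learners feeding a meta-learner) can be built with scikit-learn; the synthetic data and model choices below are placeholders, not the paper's malware features or exact super-learner configuration.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

meta = StackingClassifier(
    estimators=[("stump", DecisionTreeClassifier(max_depth=2)),   # weak learners
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),                         # meta-learner
    cv=5)
meta.fit(X_tr, y_tr)
print("meta-learner accuracy:", round(meta.score(X_te, y_te), 3))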
Updated: 2024-04-06 09:30:38
标题: AIoT设备轻量级恶意软件检测模型的优化
摘要: 恶意软件入侵对物联网(IoT)和物联网人工智能(AIoT)设备构成问题,因为它们通常存在于连接设备的生态系统中,比如智能家居。如果任何设备被感染,整个生态系统都可能受到威胁。虽然各种机器学习(ML)模型被部署来检测恶意软件和网络入侵,但通常来说,强健高准确度的模型往往需要一些并非所有IoT设备都具备的资源,而由弱学习器构成的不太强健的模型所需资源则较少。为了解决这个问题,Fadhilla提出了一个元学习器集成模型,由弱学习器ML模型固有的不太强健的预测结果组成,以产生一个高度强健的元学习集成模型。先前研究的主要问题在于,由于处理能力、存储和内存等有限的资源,这些模型无法部署在低端AIoT设备上(所需的库会迅速耗尽低端AIoT设备的资源)。因此,本研究旨在优化所提出的超级学习器元学习集成模型,使其适用于低端AIoT设备。我们展示了与每个优化阶段相关的库和ML模型内存需求,并强调为低端AIoT设备优化当前ML模型的必要性。我们的结果表明,我们可以从运行衍生ML模型的高端AIoT设备上获得类似的准确度和误报率(FPR)指标,而推理持续时间更短,内存占用更小。
更新时间: 2024-04-06 09:30:38
领域: cs.CR,cs.LG
Enhancing Video Summarization with Context Awareness
Video summarization is a crucial research area that aims to efficiently browse and retrieve relevant information from the vast amount of video content available today. With the exponential growth of multimedia data, the ability to extract meaningful representations from videos has become essential. Video summarization techniques automatically generate concise summaries by selecting keyframes, shots, or segments that capture the video's essence. This process improves the efficiency and accuracy of various applications, including video surveillance, education, entertainment, and social media. Despite the importance of video summarization, there is a lack of diverse and representative datasets, hindering comprehensive evaluation and benchmarking of algorithms. Existing evaluation metrics also fail to fully capture the complexities of video summarization, limiting accurate algorithm assessment and hindering the field's progress. To overcome data scarcity challenges and improve evaluation, we propose an unsupervised approach that leverages video data structure and information for generating informative summaries. By moving away from fixed annotations, our framework can produce representative summaries effectively. Moreover, we introduce an innovative evaluation pipeline tailored specifically for video summarization. Human participants are involved in the evaluation, comparing our generated summaries to ground truth summaries and assessing their informativeness. This human-centric approach provides valuable insights into the effectiveness of our proposed techniques. Experimental results demonstrate that our training-free framework outperforms existing unsupervised approaches and achieves competitive results compared to state-of-the-art supervised methods.
Updated: 2024-04-06 09:08:34
标题: 利用上下文意识增强视频摘要
摘要: 视频摘要是一个重要的研究领域,旨在有效地浏览和检索当今海量视频内容中的相关信息。随着多媒体数据呈指数级增长,从视频中提取有意义的表示已成为至关重要的。视频摘要技术通过选择捕捉视频要点的关键帧、镜头或片段自动生成简洁摘要。这一过程提高了各种应用的效率和准确性,包括视频监控、教育、娱乐和社交媒体等。尽管视频摘要的重要性,存在着缺乏多样性和代表性数据集的问题,阻碍了算法的全面评估和基准测试。现有的评估指标也未能充分捕捉视频摘要的复杂性,限制了准确的算法评估,并阻碍了该领域的进展。为了克服数据稀缺挑战并改善评估,我们提出了一种利用视频数据结构和信息生成信息摘要的无监督方法。通过摆脱固定注释,我们的框架能够有效地生成代表性摘要。此外,我们还引入了一个专门针对视频摘要定制的创新评估流程。人类参与者参与评估,将我们生成的摘要与基准摘要进行比较,并评估其信息量。这种以人为中心的方法为我们提出的技术的有效性提供了宝贵的见解。实验结果表明,我们的无需训练的框架优于现有的无监督方法,并与最先进的监督方法达成了竞争性的结果。
更新时间: 2024-04-06 09:08:34
领域: cs.CV,cs.AI
Spectral GNN via Two-dimensional (2-D) Graph Convolution
Spectral Graph Neural Networks (GNNs) have achieved tremendous success in graph learning. As an essential part of spectral GNNs, spectral graph convolution extracts crucial frequency information in graph data, leading to superior performance of spectral GNNs in downstream tasks. However, in this paper, we show that existing spectral GNNs still suffer from critical drawbacks in performing the spectral graph convolution. Specifically, viewing the spectral graph convolution as a construction operation towards a target output, we prove that existing popular convolution paradigms cannot construct the target output even under mild conditions on the input graph signals, causing spectral GNNs to fall into suboptimal solutions. To address these issues, we rethink the spectral graph convolution from a more general two-dimensional (2-D) signal convolution perspective and propose a new convolution paradigm, named 2-D graph convolution. We prove that 2-D graph convolution unifies existing graph convolution paradigms and is capable of constructing arbitrary target outputs. Based on the proposed 2-D graph convolution, we further propose ChebNet2D, an efficient and effective GNN implementation of 2-D graph convolution that applies Chebyshev interpolation. Extensive experiments on benchmark datasets demonstrate both the effectiveness and efficiency of ChebNet2D.
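For background, the classical ChebNet-style spectral filtering that ChebNet2D builds on can be sketched as follows; it applies a Chebyshev-polynomial filter of the scaled normalized Laplacian to a graph signal (the 2-D extension itself is not reproduced here).

import numpy as np

def cheb_filter(A, x, theta):
    # Apply sum_k theta_k T_k(L_hat) x via the Chebyshev recurrence.
    d = A.sum(1)
    L = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))   # I - D^{-1/2} A D^{-1/2}
    L_hat = L - np.eye(len(A))                         # rescale spectrum from [0, 2] to [-1, 1]
    t_prev, t_cur = x, L_hat @ x
    out = theta[0] * t_prev + theta[1] * t_cur
    for k in range(2, len(theta)):
        t_prev, t_cur = t_cur, 2 * L_hat @ t_cur - t_prev
        out += theta[k] * t_cur
    return out

A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)  # 4-cycle
x = np.array([1.0, 0.0, 0.0, 0.0])
print(cheb_filter(A, x, theta=[0.5, 0.3, 0.2]).round(3))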
Updated: 2024-04-06 08:53:26
标题: 基于二维图卷积的谱图神经网络
摘要: 谱图神经网络(GNNs)在图学习中取得了巨大成功。作为谱GNNs的一个重要部分,谱图卷积提取了图数据中的关键频率信息,导致谱GNNs在下游任务中表现出优越性能。然而,在本文中,我们展示了现有谱GNNs在执行谱图卷积时仍存在关键缺陷。具体来说,将谱图卷积视为朝向目标输出的构造操作,我们证明现有流行的卷积范式无法在输入图信号上以温和条件构建目标输出,导致谱GNNs陷入次优解决方案。为了解决这些问题,我们从更一般的二维(2-D)信号卷积角度重新思考谱图卷积,并提出了一个新的卷积范式,称为2-D图卷积。我们证明2-D图卷积统一了现有的图卷积范式,并能够构建任意目标输出。基于提出的2-D图卷积,我们进一步提出了ChebNet2D,通过应用切比雪夫插值实现2-D图卷积的高效有效的GNN实现。在基准数据集上进行的广泛实验展示了ChebNet2D的有效性和效率。
更新时间: 2024-04-06 08:53:26
领域: cs.LG,cs.NA,eess.SP,math.NA
Correcting misinformation on social media with a large language model
Real-world misinformation can be partially correct and even factual but misleading. It undermines public trust in science and democracy, particularly on social media, where it can spread rapidly. High-quality and timely correction of misinformation that identifies and explains its (in)accuracies has been shown to effectively reduce false beliefs. Despite the wide acceptance of manual correction, it is difficult to promptly correct newly created misinformation and to scale this approach, a concern as technologies like large language models (LLMs) make misinformation easier to produce. LLMs also have versatile capabilities that could accelerate misinformation correction; however, they struggle due to a lack of recent information, a tendency to produce false content, and limitations in addressing multimodal information. We propose MUSE, an LLM augmented with access to, and credibility evaluation of, up-to-date information. By retrieving evidence as refutations or contexts, MUSE identifies and explains (in)accuracies in a piece of content (not presupposed to be misinformation) with references. It also describes images and conducts multimodal searches to verify and correct multimodal content. Fact-checking experts evaluate responses to social media content that is not presupposed to be (non-)misinformation but broadly includes incorrect, partially correct, and correct posts that may or may not be misleading. We propose and evaluate 13 dimensions of misinformation correction quality, ranging from the accuracy of identifications and factuality of explanations to the relevance and credibility of references. The results demonstrate MUSE's ability to promptly write high-quality responses to potential misinformation on social media; overall, MUSE outperforms GPT-4 by 37% and even high-quality responses from laypeople by 29%.
Updated: 2024-04-06 08:49:31
标题: 使用大型语言模型纠正社交媒体上的错误信息
摘要: 真实世界中的错误信息可能部分正确甚至事实准确,但具有误导性。这削弱了公众对科学和民主的信任,尤其是在社交媒体上,错误信息可以迅速传播。已经证明,高质量和及时的纠正错误信息,可以有效减少错误信念。尽管手动纠正被广泛接受,但很难迅速纠正新创建的错误信息,并扩展这种方法,这是一个担忧,因为像大型语言模型(LLMs)这样的技术使错误信息更容易产生。LLMs还具有多功能能力,可以加速错误信息的纠正--然而,由于缺乏最新信息、倾向于产生假内容以及在处理多模态信息方面存在限制,它们面临困难。我们提出了MUSE,这是一个增强了对最新信息访问和可信度评估的LLM。通过检索证据作为反驳或背景,MUSE可以识别和解释一篇内容中的(不)准确性--这篇内容并非被假设为错误信息--并提供参考。它还描述图片并进行多模态搜索,以验证和纠正多模态内容。事实核查专家评估社交媒体内容的回应,这些回应并非被假定为(非)错误信息,但广泛包括不正确、部分正确和正确的帖子,可能会或可能不会具有误导性。我们提出并评估了13个维度的错误信息纠正质量,从识别的准确性和解释的事实性到参考的相关性和可信度。结果显示了MUSE能够迅速撰写高质量的回应,以应对社交媒体上的潜在错误信息--总体而言,MUSE的表现比GPT-4高出37%,甚至比普通人的高质量回应高出29%。
更新时间: 2024-04-06 08:49:31
领域: cs.CL,cs.AI
Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders
We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including rate-optimal approximation of smooth functions or approximation without the curse of dimensionality. Moreover, positive-weight spiking neural networks are shown to depend continuously on their parameters, which facilitates classical covering-number-based generalization statements. Finally, we observe that, from a generalization perspective and contrary to feedforward neural networks or previous results for general spiking neural networks, depth has little to no adverse effect on the generalization capabilities.
Updated: 2024-04-06 08:17:07
标题: 使用配备仿射编码器和解码器的脉冲神经网络实现高效学习
摘要: 我们研究了与脉冲神经网络相关的学习问题。具体而言,我们考虑了具有仿射时间编码器和解码器以及仅具有正向突触权重的简单脉冲神经元的脉冲神经网络的假设集。我们证明了权重的正性继续使得一系列表达能力结果成为可能,包括对平滑函数的速率最优逼近或无维度灾难的逼近。此外,正权重脉冲神经网络被证明连续依赖于其参数,这有助于经典的基于覆盖数的泛化陈述。最后,我们观察到,从泛化的角度来看,与前馈神经网络或以前对于一般脉冲神经网络的结果相反,深度对泛化能力几乎没有不利影响。
更新时间: 2024-04-06 08:17:07
领域: cs.NE,cs.LG,math.FA,stat.ML
Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner
Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature-space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers. The EODE source code is available on GitHub at https://github.com/wangxb96/EODE.
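To make the search component concrete, here is a pared-down binary grey wolf optimizer for feature selection; the fitness function, encoding, and update rules are simplified relative to EODE, and the synthetic data is a placeholder for gene expression profiles.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=6, random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.mean()              # small penalty for using many features

def binary_gwo(n_wolves=8, n_iters=20, dim=30):
    pos = rng.random((n_wolves, dim))            # continuous positions, thresholded to masks
    for t in range(n_iters):
        masks = (pos > 0.5).astype(int)
        order = np.argsort([-fitness(m) for m in masks])
        alpha, beta, delta = pos[order[:3]]      # the three best wolves lead the pack
        a = 2 - 2 * t / n_iters                  # exploration coefficient decays to 0
        for i in range(n_wolves):
            new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                A_ = a * (2 * rng.random(dim) - 1)
                C = 2 * rng.random(dim)
                new += leader - A_ * np.abs(C * leader - pos[i])
            pos[i] = np.clip(new / 3, 0, 1)      # stay in [0, 1] so thresholding is meaningful
    return max(((pos[i] > 0.5).astype(int) for i in range(n_wolves)), key=fitness)

mask = binary_gwo()
print("selected", mask.sum(), "features; fitness", round(fitness(mask), 3))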
Updated: 2024-04-06 08:07:48
标题: 自然启发计算在癌症筛查中的集成式彻底利用
摘要: 癌症类型的准确筛查对于有效的癌症检测和精确的治疗选择至关重要。然而,基因表达谱与肿瘤之间的关联通常仅限于少量生物标记基因。虽然使用自然启发算法的计算方法已经显示出在选择预测基因方面的潜力,但现有技术受到搜索效率低和在不同数据集中的泛化能力差的限制。本研究提出了一个名为进化优化多样性集成学习(EODE)的框架,以改善基因表达数据中癌症分类的集成学习。EODE方法结合了智能灰狼优化算法用于选择性特征空间缩减,引导随机注入建模用于增强集成多样性,以及子集模型优化用于协同分类器组合。在涵盖了各种癌症类型的35个基因表达基准数据集上进行了大量实验。结果表明,EODE相对于单独和传统聚合模型获得了显著提高的筛查准确性。先进特征选择、定向专业建模和协作分类器集成的综合优化有助于解决当前自然启发方法中的关键挑战。这为基因表达生物标记的强大和泛化的集成学习提供了一个有效的框架。具体来说,我们已经在Github上开放了EODE的源代码:https://github.com/wangxb96/EODE。
更新时间: 2024-04-06 08:07:48
领域: cs.NE,cs.AI,cs.LG
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image sizes, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempt to address the training-size limit only, they often yield human-centric scenes with severe artifacts. We propose BeyondScene, a novel framework that overcomes prior limitations, generating exquisite higher-resolution (over 8K) human-centric scenes with exceptional text-image correspondence and naturalness using existing pretrained diffusion models. BeyondScene employs a staged and hierarchical approach: it initially generates a detailed base image focusing on crucial elements for multi-human instance creation and on detailed descriptions beyond the token limit of the diffusion model, and then seamlessly converts the base image to a higher-resolution output, exceeding the training image size and incorporating text- and instance-aware details via our novel instance-aware hierarchical enlargement process, which consists of our proposed high-frequency-injected forward diffusion and adaptive joint diffusion. BeyondScene surpasses existing methods in terms of correspondence with detailed text descriptions and naturalness, paving the way for advanced applications in higher-resolution human-centric scene creation beyond the capacity of pretrained diffusion models without costly retraining. Project page: https://janeyeon.github.io/beyond-scene.
Updated: 2024-04-06 07:53:49
标题: 超越场景:使用预训练扩散生成更高分辨率的以人为中心的场景
摘要: 使用细节和控制生成更高分辨率的以人为中心的场景仍然是现有文本到图像扩散模型的挑战。这一挑战源自有限的训练图像大小、文本编码器容量(有限标记)以及生成涉及多个人的复杂场景的固有困难。虽然当前方法仅尝试解决训练大小限制,但它们经常产生具有严重伪影的以人为中心的场景。我们提出了BeyondScene,这是一个新颖的框架,克服了先前的限制,使用现有的预训练扩散模型生成精美的更高分辨率(超过8K)的以人为中心的场景,具有出色的文本-图像对应性和自然性。BeyondScene采用分阶段和分层方法,首先生成一个详细的基础图像,重点关注多个人的实例创建中的关键元素和扩散模型的标记限制以外的详细描述,然后将基础图像无缝转换为高分辨率输出,超过训练图像大小,并通过我们提出的高频注入前向扩散和自适应联合扩散组成的实例感知分层放大过程,结合了文本和实例的细节。BeyondScene在与详细文本描述和自然性的对应方面超越了现有方法,为超越预训练扩散模型容量的更高分辨率以人为中心的场景创作提供了可能,而无需昂贵的重新训练。项目页面:https://janeyeon.github.io/beyond-scene。
更新时间: 2024-04-06 07:53:49
领域: cs.CV,cs.AI
The Case for Developing a Foundation Model for Planning-like Tasks from Scratch
Foundation Models (FMs) have revolutionized many areas of computing, including Automated Planning and Scheduling (APS). For example, a recent study found them useful for planning problems: plan generation, language translation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. Besides APS, there are many seemingly related tasks involving the generation of a series of actions, with varying guarantees of their executability, to achieve intended goals. We collectively call these planning-like (PL) tasks; examples include business processes, programs, workflows, and guidelines, where researchers have considered using FMs. However, previous works have primarily focused on pre-trained, off-the-shelf FMs, optionally fine-tuning them. This paper discusses the need for a comprehensive FM for PL tasks built from scratch and explores its design considerations. We argue that such an FM will open new and efficient avenues for PL problem-solving, just as LLMs are creating for APS.
Updated: 2024-04-06 07:44:40
标题: 从零开始开发规划任务基础模型的案例
摘要: 基础模型(FMs)已经在许多领域的计算中引起了革命,包括自动规划和调度(APS)。例如,最近的一项研究发现它们对规划问题很有用:计划生成、语言翻译、模型构建、多智能体规划、交互规划、启发式优化、工具集成和脑启发式规划。除了APS,还有许多看似相关的任务涉及生成一系列行动以实现预期目标的各种保证,我们统称为类似规划(PL)任务,如业务流程、程序、工作流程和指南,研究人员已考虑使用FMs。然而,先前的工作主要集中在预训练的、现成的FMs上,并可选择对其进行微调。本文讨论了从头开始为PL任务设计全面的FM的必要性,并探讨了其设计考虑。我们认为这样的FM将为PL问题解决开辟新的高效途径,就像LLMs为APS正在创造的一样。
更新时间: 2024-04-06 07:44:40
领域: cs.AI
Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning
The chain-of-thought technique has been well received in multi-modal tasks. It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts. However, human thought processes are predominantly non-linear, as they encompass multiple aspects simultaneously and employ dynamic adjustment and updating mechanisms. Therefore, we propose a novel Aggregation-Graph-of-Thought (AGoT) mechanism for soft-prompt tuning in multi-modal representation learning. The proposed AGoT models the human thought process not only as a chain but also models each step as a reasoning aggregation graph, to cope with the multiple aspects of thinking that are overlooked by single-step reasoning. This turns the entire reasoning process into prompt aggregation and prompt flow operations. Experiments show that our multi-modal model enhanced with AGoT soft-prompting achieves good results in several tasks, such as text-image retrieval, visual question answering, and image recognition. In addition, we demonstrate that it has good domain generalization performance due to better reasoning.
Updated: 2024-04-06 07:39:44
标题: 多模态表示学习中的思维图软提示
摘要: 思维链技术在多模态任务中得到了很好的接受。它是一种逐步线性推理过程,通过调整链的长度来提高生成提示的性能。然而,人类思维过程主要是非线性的,因为它们同时涵盖多个方面,并采用动态调整和更新机制。因此,我们提出了一种新颖的聚合思维图机制(AGoT)用于多模态表示学习中的软提示调节。所提出的AGoT模型不仅将人类思维过程建模为一种链,还将每一步建模为一个推理聚合图,以应对单步推理中被忽视的多方面思考。这将整个推理过程转化为提示聚合和提示流操作。实验证明,经AGoT软提示增强的多模态模型在文本图像检索、视觉问题回答和图像识别等几个任务中取得了良好的结果。此外,我们证明,得益于更好的推理能力,它具有良好的领域泛化性能。
更新时间: 2024-04-06 07:39:44
领域: cs.AI,cs.CL
Impact of Fairness Regulations on Institutions' Policies and Population Qualifications
The proliferation of algorithmic systems has fueled discussions surrounding the regulation and control of their social impact. Herein, we consider a system whose primary objective is to maximize utility by selecting the most qualified individuals. To promote demographic parity in the selection algorithm, we consider penalizing discrimination across social groups. We examine conditions under which a discrimination penalty can effectively reduce disparity in the selection. Additionally, we explore the implications of such a penalty when individual qualifications may evolve over time in response to the imposed penalizing policy. We identify scenarios where the penalty could hinder the natural attainment of equity within the population. Moreover, we propose certain conditions that can counteract this undesirable outcome, thus ensuring fairness.
Updated: 2024-04-06 07:21:41
标题: 公平法规对机构政策和人口资格的影响
摘要: 算法系统的泛滥引发了围绕其社会影响的监管和控制的讨论。在这里,我们考虑一个主要目标是通过选择最合格个体来最大化效用的系统。为了促进选择算法中的人口统计平衡,我们考虑惩罚社会群体之间的歧视。我们研究了歧视惩罚在减少选择中的差异方面有效的条件。此外,我们探讨了这种惩罚在个体资格可能随时间对施加的惩罚政策而发生变化时的影响。我们确定了一些情景,在这些情景中,这种惩罚可能阻碍人口内在平等的实现。此外,我们提出了一些条件,可以抵消这种不良结果,从而确保公平。
更新时间: 2024-04-06 07:21:41
领域: cs.LG,cs.AI,cs.CY
Concept - An Evaluation Protocol on Conversation Recommender Systems with System-centric and User-centric Factors
The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite recent significant progress achieved in academia. Existing evaluation protocols for CRS may prioritize system-centric factors such as effectiveness and fluency in conversation while neglecting user-centric aspects. Thus, we propose a new and inclusive evaluation protocol, Concept, which integrates both system- and user-centric factors. We conceptualise three key characteristics in representing such factors and further divide them into six primary abilities. To implement Concept, we adopt a LLM-based user simulator and evaluator with scoring rubrics that are tailored for each primary ability. Our protocol, Concept, serves a dual purpose. First, it provides an overview of the pros and cons in current CRS models. Second, it pinpoints the problem of low usability in the "omnipotent" ChatGPT and offers a comprehensive reference guide for evaluating CRS, thereby setting the foundation for CRS improvement.
Updated: 2024-04-06 07:04:35
标题: Concept-一个基于系统中心和用户中心因素的对话推荐系统评估协议
摘要: 对话推荐系统(CRS)在学术界取得了显著进展,但在现实场景中却受到用户体验的批评。现有的CRS评估协议可能会优先考虑系统中心因素,如对话的有效性和流畅性,而忽略用户中心的因素。因此,我们提出了一个新的全面评估协议Concept,该协议整合了系统和用户中心因素。我们概念化了代表这些因素的三个关键特征,并将它们进一步划分为六个主要能力。为了实现Concept,我们采用了基于LLM的用户模拟器和评估器,其评分标准针对每个主要能力进行了定制。我们的协议Concept具有双重目的。首先,它提供了当前CRS模型的优缺点概述。其次,它指出了“全知”的ChatGPT在可用性方面存在问题,并为评估CRS提供了全面的参考指南,从而为CRS改进奠定基础。
更新时间: 2024-04-06 07:04:35
领域: cs.CL,cs.AI
A Survey of Route Recommendations: Methods, Applications, and Opportunities
Nowadays, with advanced information technologies deployed citywide, large data volumes and powerful computational resources are intelligentizing modern city development. As an important part of intelligent transportation, route recommendation and its applications are widely used, directly influencing citizens' travel habits. Developing smart and efficient travel routes based on big data (possibly multi-modal) has become a central challenge in route recommendation research. Our survey offers a comprehensive review of route recommendation work based on urban computing. It is organized in the following three parts: 1) Methodology-wise. We categorize a large volume of traditional machine learning and modern deep learning methods. Also, we discuss their historical relations and reveal cutting-edge progress. 2) Application-wise. We present numerous novel applications related to route recommendation within urban computing scenarios. 3) We discuss current problems and challenges and envision several promising research directions. We believe that this survey can help relevant researchers quickly familiarize themselves with the current state of route recommendation research and then direct them to future research trends.
Updated: 2024-04-06 07:02:46
标题: 一份关于路径推荐的调查:方法、应用和机会
摘要: 如今,随着先进的信息技术在整个城市范围部署,大数据量和强大的计算资源正在智能化现代城市发展。作为智能交通的重要组成部分,路径推荐及其应用被广泛使用,直接影响市民的出行习惯。基于大数据(可能是多模态)开发智能高效的出行路径已成为路径推荐研究的中心挑战。我们的调查对基于城市计算的路径推荐工作进行了全面审查。它分为以下三个部分:1)方法学方面。我们对大量传统机器学习和现代深度学习方法进行分类。此外,我们讨论它们的历史关系并揭示了尖端的进展。2)应用方面。我们提出了许多与城市计算场景中路径推荐相关的新颖应用。3)我们讨论当前问题和挑战,并设想了几个有前途的研究方向。我们相信,这项调查可以帮助相关研究人员快速了解路径推荐研究的当前状态,然后引导他们走向未来的研究趋势。
更新时间: 2024-04-06 07:02:46
领域: cs.AI,cs.LG
Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs
Diffusion models have exhibited excellent performance in various domains. The probability flow ordinary differential equation (ODE) of diffusion models (i.e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation. However, the likelihood estimation results by diffusion ODEs are still far from those of the state-of-the-art likelihood-based generative models. In this work, we propose several improved techniques for maximum likelihood estimation for diffusion ODEs, including both training and evaluation perspectives. For training, we propose velocity parameterization and explore variance reduction techniques for faster convergence. We also derive an error-bounded high-order flow matching objective for finetuning, which improves the ODE likelihood and smooths its trajectory. For evaluation, we propose a novel training-free truncated-normal dequantization to fill the training-evaluation gap commonly existing in diffusion ODEs. Building upon these techniques, we achieve state-of-the-art likelihood estimation results on image datasets (2.56 on CIFAR-10, 3.43/3.69 on ImageNet-32) without variational dequantization or data augmentation, and 2.42 on CIFAR-10 with data augmentation. Code is available at \url{https://github.com/thu-ml/i-DODE}.
Updated: 2024-04-06 07:01:53
标题: 改进的技术用于扩散ODE的最大似然估计
摘要: 扩散模型在各个领域表现出色。扩散模型的概率流常微分方程(ODE)(即扩散ODEs)是连续归一化流(CNFs)的一个特例,使得确定性推断和精确似然评估成为可能。然而,通过扩散ODEs进行的似然估计结果仍远远落后于最先进的基于似然的生成模型。本文提出了几种用于扩散ODEs最大似然估计的改进技术,包括训练和评估两个方面。在训练方面,我们提出了速度参数化并探索方差缩减技术以加快收敛速度。我们还推导出了一个误差有界的高阶流匹配目标用于微调,这样可以提高ODE的似然性并平滑其轨迹。在评估方面,我们提出了一种新颖的无需训练的截断正态去量化方法,以填补扩散ODEs中常见的训练-评估差距。基于这些技术,我们在图像数据集上取得了最先进的似然估计结果(在CIFAR-10上为2.56,在ImageNet-32上为3.43/3.69),而无需变分去量化或数据增强,以及在CIFAR-10上为2.42(带有数据增强)。代码可在\url{https://github.com/thu-ml/i-DODE}上找到。
更新时间: 2024-04-06 07:01:53
领域: cs.LG
VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA
Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) is a key technique used in military applications like remote-sensing image recognition. Vision Transformers (ViTs) are the current state-of-the-art in various computer vision applications, outperforming their CNN counterparts. However, using ViTs for SAR ATR applications is challenging because (1) standard ViTs require extensive training data to generalize well due to their low locality, whereas standard SAR datasets have a limited amount of labeled training data, which reduces the learning capability of ViTs; and (2) ViTs have a high parameter count and are computation-intensive, which makes their deployment on resource-constrained SAR platforms difficult. In this work, we develop a lightweight ViT model that can be trained directly on small datasets without any pre-training by utilizing the Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules. We directly train this model on SAR datasets that have limited training samples to evaluate its effectiveness for SAR ATR applications. We evaluate our proposed model, which we call VTR (ViT for SAR ATR), on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. Further, we propose a novel FPGA accelerator for VTR to enable deployment for real-time SAR ATR applications.
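Shifted Patch Tokenization is simple to sketch: the input is concatenated with four diagonally shifted copies of itself before patch flattening, enlarging each token's receptive field. The NumPy version below uses wrap-around shifts via np.roll for brevity (the original formulation uses zero-padded shifts), and all sizes are illustrative.

import numpy as np

def shifted_patch_tokenize(img, patch=4):
    # img: (H, W, C). Returns (num_patches, patch * patch * 5C) tokens.
    H, W, C = img.shape
    s = patch // 2
    shifts = [(0, 0), (s, s), (s, -s), (-s, s), (-s, -s)]
    stacked = np.concatenate([np.roll(img, sh, axis=(0, 1)) for sh in shifts], axis=-1)
    tokens = [stacked[i:i + patch, j:j + patch].reshape(-1)
              for i in range(0, H, patch) for j in range(0, W, patch)]
    return np.stack(tokens)

img = np.random.default_rng(0).random((32, 32, 1))   # toy single-channel "SAR chip"
print(shifted_patch_tokenize(img).shape)             # (64, 80): 8x8 patches, 4*4*5 dims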
Updated: 2024-04-06 06:49:55
标题: VTR:一种针对FPGA上SAR ATR加速的优化视觉变换器
摘要: 合成孔径雷达(SAR)自动目标识别(ATR)是军事应用中的关键技术,如遥感图像识别。视觉变换器(ViTs)目前是各种计算机视觉应用的最先进技术,在性能上超越了它们的CNN对应物。然而,将ViTs用于SAR ATR应用具有挑战性,因为(1)标准ViTs局部性较低,需要大量训练数据才能很好地泛化,而标准SAR数据集中带标记的训练数据数量有限,这降低了ViTs的学习能力;(2)ViTs参数数量较多且计算密集,使得它们难以部署在资源受限的SAR平台上。在这项工作中,我们开发了一种轻量级ViT模型,通过利用偏移补丁标记化(SPT)和局部自注意(LSA)模块,可以直接在小数据集上进行训练,无需预训练。我们直接在训练样本有限的SAR数据集上训练这个模型,以评估其在SAR ATR应用中的有效性。我们在三个广泛使用的SAR数据集(MSTAR、SynthWakeSAR和GBSAR)上评估我们提出的模型,我们称之为VTR(ViT用于SAR ATR)。此外,我们提出了一种用于VTR的新型FPGA加速器,以便实现实时SAR ATR应用的部署。
更新时间: 2024-04-06 06:49:55
领域: cs.CV,cs.AI,cs.AR,cs.DC
IITK at SemEval-2024 Task 10: Who is the speaker? Improving Emotion Recognition and Flip Reasoning in Conversations via Speaker Embeddings
This paper presents our approach for SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversations. For the Emotion Recognition in Conversations (ERC) task, we utilize a masked-memory network along with speaker participation. We propose a transformer-based speaker-centric model for the Emotion Flip Reasoning (EFR) task. We also introduce the Probable Trigger Zone, a region of the conversation that is more likely to contain the utterances causing the emotion to flip. For sub-task 3, the proposed approach achieves a 5.9-point F1-score improvement over the task baseline. The ablation study results highlight the significance of various design choices in the proposed method.
Updated: 2024-04-06 06:47:44
标题: IITK在SemEval-2024任务10中的表现:谁是说话者?通过说话者嵌入改进对话中的情绪识别和翻转推理
摘要: 这篇论文介绍了我们在SemEval-2024任务10中的方法:在对话中发现和推理情绪及其翻转。对于对话中的情绪识别(ERC)任务,我们利用了带有说话者参与的掩码记忆网络(masked-memory network)。我们提出了一个基于Transformer的以说话者为中心的模型,用于情绪翻转推理(EFR)任务。我们还引入了可能的触发区域(Probable Trigger Zone),即对话中更有可能包含导致情绪翻转的话语的区域。对于子任务3,所提出的方法比任务基线提高了5.9(F1分数)。消融研究结果突出了所提方法中各种设计选择的重要性。
更新时间: 2024-04-06 06:47:44
领域: cs.CL,cs.AI,cs.LG
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models
Parameter-Efficient Fine-Tuning (PEFT) methods have been extensively utilized in Large Language Models (LLMs) to improve downstream tasks without the cost of fine-tuning the whole LLM. Recent studies have shown how to effectively use PEFT for fine-tuning LLMs in ranking tasks with convincing performance; however, some limitations remain, including the learned prompt being fixed for different documents, overfitting to specific tasks, and low adaptation ability. In this paper, we introduce a query-dependent parameter-efficient fine-tuning (Q-PEFT) approach for text reranking that leaks the information of the true queries to LLMs, making the generation of true queries from input documents much easier. Specifically, we utilize the query to extract the top-$k$ tokens from concatenated documents, serving as contextual clues. We further augment Q-PEFT by substituting the retrieval mechanism with a multi-head attention layer to achieve end-to-end training and cover all the tokens in the documents, guiding the LLMs to generate more document-specific synthetic queries, thereby further improving the reranking performance. Extensive experiments are conducted on four public datasets, demonstrating the effectiveness of our proposed approach.
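A toy version of the query-dependent cue extraction follows: document tokens are scored by embedding similarity to the query, and the top-k are kept as contextual clues for the prompt. The hash-seeded random embeddings are stand-ins for real model embeddings.

import hashlib
import numpy as np

def embed(token, dim=32):
    # Deterministic pseudo-embedding; a real system would use model embeddings.
    seed = int(hashlib.md5(token.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def topk_cues(query, document, k=3):
    q = np.mean([embed(t) for t in query.lower().split()], axis=0)
    doc_tokens = list(dict.fromkeys(document.lower().split()))   # unique, order kept
    scores = {t: float(embed(t) @ q) for t in doc_tokens}
    return sorted(scores, key=scores.get, reverse=True)[:k]

doc = "transformers rerank retrieved passages using cross attention over query terms"
print(topk_cues("passage reranking with transformers", doc))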
Updated: 2024-04-06 06:44:41
标题: Q-PEFT:针对大型语言模型的文本重新排名的查询相关参数高效微调
摘要: 参数高效微调(PEFT)方法已广泛应用于大型语言模型(LLMs),以改善下游任务的性能,而无需微调整个LLMs的成本。最近的研究表明如何有效地利用PEFT来微调LLMs以在排名任务中表现出令人信服的性能;其中一些局限包括学习提示固定于不同文档、过拟合于特定任务和适应能力较低。在本文中,我们介绍了一种基于查询的参数高效微调(Q-PEFT)方法,用于文本重新排序,以向LLMs泄露真实查询的信息,然后使LLMs更容易从输入文档中生成真实查询。具体地,我们利用查询从连接文档中提取前$k$个标记,作为上下文线索。我们进一步通过用多头注意力层替换检索机制来增强Q-PEFT,以实现端到端训练并覆盖文档中的所有标记,指导LLMs生成更多特定于文档的合成查询,从而进一步提高重新排序性能。我们在四个公共数据集上进行了大量实验证明了我们提出的方法的有效性。
更新时间: 2024-04-06 06:44:41
领域: cs.CL,cs.AI,cs.IR,cs.LG
IITK at SemEval-2024 Task 4: Hierarchical Embeddings for Detection of Persuasion Techniques in Memes
Memes are one of the most popular types of content used in an online disinformation campaign. They are primarily effective on social media platforms since they can easily reach many users. Memes in a disinformation campaign achieve their goal of influencing the users through several rhetorical and psychological techniques, such as causal oversimplification, name-calling, and smear. The SemEval 2024 Task 4 \textit{Multilingual Detection of Persuasion Technique in Memes} on identifying such techniques in memes is divided across three sub-tasks: ($\mathbf{1}$) Hierarchical multi-label classification using only the textual content of the meme, ($\mathbf{2}$) Hierarchical multi-label classification using both the textual and visual content of the meme, and ($\mathbf{3}$) Binary classification of whether the meme contains a persuasion technique or not using its textual and visual content. This paper proposes an ensemble of Class Definition Prediction (CDP) and hyperbolic embeddings-based approaches for this task. We enhance meme classification accuracy and comprehensiveness by integrating HypEmo's hierarchical label embeddings (Chen et al., 2023) and a multi-task learning framework for emotion prediction. We achieve hierarchical F1-scores of 0.60, 0.67, and 0.48 on the respective sub-tasks.
Updated: 2024-04-06 06:28:02
标题: IITK在SemEval-2024任务4中的表现:用于检测Memes中说服技术的分层嵌入
摘要: 梗是在线虚假信息宣传中最受欢迎的内容类型之一。它们在社交媒体平台上效果显著,因为可以轻松地触及许多用户。在虚假信息宣传中,梗通过多种修辞和心理技巧(如因果关系过度简单化、挖苦和诽谤)实现影响用户的目标。SemEval 2024任务4《梗中多语种说服技巧检测》旨在识别梗中的这些技巧,分为三个子任务:(1)仅使用梗的文本内容进行层次多标签分类,(2)同时使用文本和视觉内容进行层次多标签分类,(3)使用文本和视觉内容对梗是否包含说服技巧进行二元分类。本文提出了一种基于Class Definition Prediction(CDP)和双曲嵌入的方法集成,以提高梗分类的准确性和全面性。通过集成HypEmo的层次化标签嵌入和用于情绪预测的多任务学习框架,实现了分别为0.60、0.67和0.48的层次F1分数。
更新时间: 2024-04-06 06:28:02
领域: cs.CL,cs.AI,cs.LG
Goal-guided Generative Prompt Injection Attack on Large Language Models
Current large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. A large number of users can easily inject adversarial text or instructions through the user interface, thus causing model security challenges for LLMs. Although there is currently a large amount of research on prompt injection attacks, most of these black-box attacks use heuristic strategies, and it is unclear how these heuristic strategies relate to the success rate of attacks and can thus effectively improve model robustness. To solve this problem, we redefine the goal of the attack: to maximize the KL divergence between the conditional probabilities of the clean text and the adversarial text. Furthermore, we prove that, when the conditional probability is a Gaussian distribution, maximizing the KL divergence is equivalent to maximizing the Mahalanobis distance between the embedded representations $x$ and $x'$ of the clean text and the adversarial text, and we give a quantitative relationship between $x$ and $x'$. We then design a simple and effective goal-guided generative prompt injection strategy (G2PIA) to find an injection text that satisfies specific constraints, thereby achieving an approximately optimal attack effect. It is particularly noteworthy that our attack method is a query-free black-box attack method with low computational cost. Experimental results on seven LLM models and four datasets show the effectiveness of our attack method.
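The stated equivalence is easy to verify numerically for the shared-covariance case, where KL(N(x, S) || N(x', S)) reduces to half the squared Mahalanobis distance between x and x'; the check below uses random vectors and an arbitrary valid covariance.

import numpy as np

def kl_gaussian(mu1, S1, mu2, S2):
    # General closed form of KL(N(mu1, S1) || N(mu2, S2)).
    k = len(mu1)
    S2_inv = np.linalg.inv(S2)
    d = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1) + d @ S2_inv @ d - k
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def mahalanobis(mu1, mu2, S):
    d = mu1 - mu2
    return np.sqrt(d @ np.linalg.solve(S, d))

rng = np.random.default_rng(0)
x, x_adv = rng.normal(size=5), rng.normal(size=5)
A = rng.normal(size=(5, 5)); S = A @ A.T + np.eye(5)        # a valid covariance
print(np.isclose(kl_gaussian(x, S, x_adv, S),
                 0.5 * mahalanobis(x, x_adv, S) ** 2))      # True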
Updated: 2024-04-06 06:17:10
标题: 大型语言模型上的目标引导生成提示注入攻击
摘要: 目前大型语言模型(LLMs)为大规模用户导向的自然语言任务提供了坚实的基础。大量用户可以通过用户界面轻松注入对抗性文本或指令,从而导致LLMs模型安全挑战。尽管目前有大量关于提示注入攻击的研究,但大多数黑盒攻击使用启发式策略。目前尚不清楚这些启发式策略与攻击成功率的关系,因此无法有效提高模型的鲁棒性。为解决这一问题,我们重新定义了攻击的目标:最大化干净文本和对抗性文本的条件概率之间的KL散度。此外,我们证明了最大化KL散度等价于在条件概率为高斯分布时最大化干净文本和对抗性文本的嵌入表示$x$和$x'$之间的马氏距离,并给出了$x$和$x'$之间的定量关系。然后我们设计了一个简单有效的目标导向生成提示注入策略(G2PIA),以找到一个满足特定约束条件的注入文本,从而近似实现最佳攻击效果。值得注意的是,我们的攻击方法是一种无查询的黑盒攻击方法,计算成本低。在七个LLM模型和四个数据集上的实验结果显示了我们攻击方法的有效性。
更新时间: 2024-04-06 06:17:10
领域: cs.CR,cs.AI,cs.CL
Latent-based Diffusion Model for Long-tailed Recognition
Long-tailed imbalanced distributions are a common issue in practical computer vision applications. Previous works proposed methods to address this problem, which can be categorized into several classes: re-sampling, re-weighting, transfer learning, and feature augmentation. In recent years, diffusion models have shown an impressive generation ability in many sub-problems of deep computer vision. However, their powerful generation ability has not been explored for long-tailed problems. We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue. First, we encode the imbalanced dataset into features using the baseline model. Then, we train a Denoising Diffusion Implicit Model (DDIM) using these encoded features to generate pseudo-features. Finally, we train the classifier using the encoded features and the pseudo-features from the previous two steps. The model's accuracy improves on the CIFAR-LT and ImageNet-LT datasets with the proposed method.
Updated: 2024-04-06 06:15:07
标题: 基于潜变量的长尾识别扩散模型
摘要: 长尾不平衡分布是实际计算机视觉应用中常见的问题。先前的研究提出了解决这个问题的方法,可以分为几类:重新采样、重新加权、迁移学习和特征增强。近年来,扩散模型在深度计算机视觉的许多子问题中展现出了令人印象深刻的生成能力。然而,其强大的生成能力尚未在长尾问题中得到探索。我们提出了一种新方法,即基于潜在的扩散模型长尾识别(LDMLR),作为一种特征增强方法来解决这个问题。首先,我们使用基线模型将不平衡的数据集编码为特征。然后,我们使用这些编码的特征训练一个去噪扩散隐式模型(DDIM)来生成伪特征。最后,我们使用前两个步骤中的编码和伪特征来训练分类器。通过使用提出的方法,模型在CIFAR-LT和ImageNet-LT数据集上的准确性得到了改善。
更新时间: 2024-04-06 06:15:07
领域: cs.CV,cs.AI
IITK at SemEval-2024 Task 1: Contrastive Learning and Autoencoders for Semantic Textual Relatedness in Multilingual Texts
This paper describes our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness. The challenge is focused on automatically detecting the degree of relatedness between pairs of sentences for 14 languages, including both high- and low-resource Asian and African languages. Our team participated in two subtasks: Track A (supervised) and Track B (unsupervised). This paper focuses on a BERT-based contrastive learning and similarity-metric-based approach primarily for the supervised track, while exploring autoencoders for the unsupervised track. It also aims at the creation of a bigram relatedness corpus using a negative sampling strategy, thereby producing refined word embeddings.
Updated: 2024-04-06 05:58:42
标题: IITK在SemEval-2024任务1中的表现:对比学习和自动编码器在多语言文本中的语义文本相关性
摘要: 本文描述了我们为SemEval-2024任务1开发的系统:语义文本相关性。该挑战侧重于自动检测包括高资源和低资源的亚洲和非洲语言在内的14种语言之间的相关性程度。我们的团队参与了两个子任务,包括Track A:监督和Track B:无监督。本文主要关注基于BERT的对比学习和相似度度量方法,主要用于监督轨道,同时探索无监督轨道上的自编码器。它还旨在使用负采样策略创建一个二元组(bigram)相关性语料库,从而生成精细的词嵌入。
更新时间: 2024-04-06 05:58:42
领域: cs.CL,cs.AI,cs.LG
Cluster-based Video Summarization with Temporal Context Awareness
In this paper, we present TAC-SUM, a novel and efficient training-free approach for video summarization that addresses the limitations of existing cluster-based models by incorporating temporal context. Our method partitions the input video into temporally consecutive segments with clustering information, enabling the injection of temporal awareness into the clustering process, setting it apart from prior cluster-based summarization methods. The resulting temporal-aware clusters are then utilized to compute the final summary, using simple rules for keyframe selection and frame importance scoring. Experimental results on the SumMe dataset demonstrate the effectiveness of our proposed approach, outperforming existing unsupervised methods and achieving comparable performance to state-of-the-art supervised summarization techniques. Our source code is available for reference at \url{https://github.com/hcmus-thesis-gulu/TAC-SUM}.
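A minimal temporally-aware variant of cluster-based keyframe selection, in the spirit of (but much simpler than) the method above: cluster frame features, split each cluster into temporally contiguous segments, and keep the frame nearest each segment centroid. The features are random placeholders for real frame embeddings.

import numpy as np
from sklearn.cluster import KMeans

def keyframes(features, n_clusters=3):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    # Split each cluster into temporally contiguous runs (the temporal-context step).
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append(range(start, i)); start = i
    picks = []
    for seg in segments:
        centroid = features[list(seg)].mean(0)
        picks.append(min(seg, key=lambda i: np.linalg.norm(features[i] - centroid)))
    return sorted(picks)

feats = np.random.default_rng(0).normal(size=(120, 16))   # 120 frames, 16-d features
print(keyframes(feats))                                   # indices of selected keyframes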
Updated: 2024-04-06 05:55:14
标题: 基于簇的视频摘要与时间上下文感知
摘要: 在本文中,我们提出了TAC-SUM,一种新颖且高效的免训练视频摘要方法,该方法通过整合时间上下文,解决了现有基于聚类的模型的局限性。我们的方法将输入视频分成带有聚类信息的时间上连续的片段,从而将时间感知注入聚类过程,使其区别于先前基于聚类的摘要方法。然后利用所得的时间感知聚类来计算最终摘要,使用简单的规则进行关键帧选择和帧重要性评分。在SumMe数据集上的实验结果表明了我们所提方法的有效性,它优于现有的无监督方法,并实现了与最先进的监督摘要技术相媲美的性能。我们的源代码可在\url{https://github.com/hcmus-thesis-gulu/TAC-SUM}上参考。
更新时间: 2024-04-06 05:55:14
领域: cs.CV,cs.AI
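For readers who want the gist of the cluster-then-segment idea in the TAC-SUM entry above, here is a toy sketch: cluster frame embeddings, close a segment whenever the cluster label changes, and take each segment's medoid as a keyframe. The paper's actual partitioning and importance-scoring rules are more involved; `n_clusters` and the frame features are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(frame_feats, n_clusters=8):
    """frame_feats: (n_frames, d) array of per-frame embeddings."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(frame_feats)
    keyframes = []
    start = 0
    for i in range(1, len(labels) + 1):
        # close a temporally consecutive segment when the label changes
        if i == len(labels) or labels[i] != labels[start]:
            seg = frame_feats[start:i]
            centroid = seg.mean(axis=0)
            # keyframe = frame of the segment closest to its own centroid
            best = start + int(np.argmin(np.linalg.norm(seg - centroid, axis=1)))
            keyframes.append(best)
            start = i
    return keyframes

print(summarize(np.random.rand(200, 64)))
```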
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
As language models scale up, it becomes increasingly expensive to verify research ideas because conclusions on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that accurately predicts certain metrics for large models without training them. Existing scaling laws require hyperparameter search on the largest models, limiting their predictive capability. In this paper, we present an approach (namely {\mu}Scaling) to predict the pre-training loss, based on our observations that Maximal Update Parametrization ({\mu}P) enables accurate fitting of scaling laws close to common loss basins in hyperparameter space. With {\mu}Scaling, different model designs can be compared on large scales by training only their smaller counterparts. Further, we introduce nanoLM: an affordable LLM pre-training benchmark that facilitates this new research paradigm. With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B. Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models. We also aspire for our benchmark to serve as a bridge between the academic community and the industry. Code for {\mu}Scaling is available at https://github.com/cofe-ai/Mu-scaling. Code for nanoLM will be available later.
Updated: 2024-04-06 05:50:39
标题: nanoLM:通过准确的损失预测跨尺度实现经济实惠的LLM预训练基准
摘要: 随着语言模型的规模扩大,验证研究想法变得越来越昂贵,因为在小型模型上得出的结论并不会简单地转移到大型模型上。一个可能的解决方案是建立一个通用系统,能够准确预测大型模型的某些指标,而无需对其进行训练。现有的缩放规律需要在最大的模型上进行超参数搜索,限制了它们的预测能力。在本文中,我们提出了一种方法(即μScaling),用于预测预训练损失,基于我们的观察到最大更新参数化(μP)能够在超参数空间的常见损失盆地附近准确拟合缩放规律。通过μScaling,可以通过仅训练它们的较小对应模型来比较不同的模型设计。此外,我们引入了nanoLM:一个经济实惠的LLM预训练基准,促进了这种新的研究范式。使用大约一次性预训练成本的14%,我们可以准确预测高达52B的模型的损失。我们的目标是通过nanoLM赋予资源有限的研究人员在大型模型上得出有意义的结论的能力。我们也希望我们的基准可以作为学术界和行业之间的桥梁。μScaling的代码可在https://github.com/cofe-ai/Mu-scaling上找到。nanoLLM的代码将稍后提供。
更新时间: 2024-04-06 05:50:39
领域: cs.CL,cs.LG
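The core mechanic of loss prediction across scales can be illustrated with a plain power-law fit: estimate $L(N)=aN^{-\alpha}+c$ from small runs and extrapolate. The numbers below are fabricated, and the paper's actual {\mu}Scaling procedure additionally relies on {\mu}P-aligned training; this shows only the curve-fitting step.

```python
import numpy as np
from scipy.optimize import curve_fit

params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])    # small-model sizes (made up)
losses = np.array([3.9, 3.5, 3.1, 2.8, 2.6])    # their final losses (made up)

def scaling_law(n, a, alpha, c):
    # loss as a power law in parameter count plus an irreducible term
    return a * n ** (-alpha) + c

(a, alpha, c), _ = curve_fit(scaling_law, params, losses,
                             p0=(10.0, 0.1, 1.0), maxfev=10000)
print("predicted loss at 52B params:", scaling_law(52e9, a, alpha, c))
```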
IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials
Large Language Models (LLMs) have demonstrated state-of-the-art performance in various natural language processing (NLP) tasks across multiple domains, yet they are prone to shortcut learning and factual inconsistencies. This research investigates LLMs' robustness, consistency, and faithful reasoning when performing Natural Language Inference (NLI) on breast cancer Clinical Trial Reports (CTRs) in the context of SemEval 2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials. We examine the reasoning capabilities of LLMs and their adeptness at logical problem-solving. A comparative analysis is conducted on pre-trained language models (PLMs), GPT-3.5, and Gemini Pro under zero-shot settings using the Retrieval-Augmented Generation (RAG) framework, integrating various reasoning chains. The evaluation yields an F1 score of 0.69, a consistency of 0.71, and a faithfulness score of 0.90 on the test dataset.
Updated: 2024-04-06 05:44:53
标题: IITK在SemEval-2024任务2中的表现:探索LLMs在临床试验中安全生物医学自然语言推理的能力
摘要: 大型语言模型(LLMs)已经在多个领域的各种自然语言处理(NLP)任务中展示了最先进的性能,但它们容易出现捷径学习和事实不一致的问题。本研究在SemEval 2024任务2(临床试验的安全生物医学自然语言推理)的背景下,调查了LLMs在乳腺癌临床试验报告(CTRs)上执行自然语言推理(NLI)时的稳健性、一致性和忠实推理能力。我们检查了LLMs的推理能力及其在逻辑问题求解方面的熟练程度。在零样本(zero-shot)设置下,使用检索增强生成(RAG)框架对预训练语言模型(PLMs)、GPT-3.5和Gemini Pro进行了比较分析,并整合了各种推理链。评估结果显示,在测试数据集上,F1分数为0.69,一致性为0.71,忠实度分数为0.90。
更新时间: 2024-04-06 05:44:53
领域: cs.CL,cs.AI,cs.LG
Distributed No-Regret Learning for Multi-Stage Systems with End-to-End Bandit Feedback
This paper studies multi-stage systems with end-to-end bandit feedback. In such systems, each job needs to go through multiple stages, each managed by a different agent, before generating an outcome. Each agent can only control its own action and learn the final outcome of the job. It has neither knowledge of nor control over actions taken by agents in the next stage. The goal of this paper is to develop distributed online learning algorithms that achieve sublinear regret in adversarial environments. The setting of this paper significantly expands the traditional multi-armed bandit problem, which considers only one agent and one stage. In addition to the exploration-exploitation dilemma in the traditional multi-armed bandit problem, we show that the consideration of multiple stages introduces a third component, education, where an agent needs to choose its actions to facilitate the learning of agents in the next stage. To solve this newly introduced exploration-exploitation-education trilemma, we propose a simple distributed online learning algorithm, $\epsilon$-EXP3. We theoretically prove that the $\epsilon$-EXP3 algorithm is a no-regret policy that achieves sublinear regret. Simulation results show that the $\epsilon$-EXP3 algorithm significantly outperforms existing no-regret online learning algorithms for the traditional multi-armed bandit problem.
Updated: 2024-04-06 05:34:12
标题: 多阶段系统的分布式无悔学习与端到端赌博反馈
摘要: 本文研究具有端到端赌博反馈的多阶段系统。在这种系统中,每个作业需要通过多个阶段,每个阶段由不同的代理管理,才能生成结果。每个代理只能控制自己的行动,并学习作业的最终结果。它对下一阶段代理采取的行动既没有知识也没有控制。本文的目标是开发分布式在线学习算法,在对抗环境中实现亚线性后悔。 本文的设置显著扩展了传统的多臂赌博问题,传统问题只考虑一个代理和一个阶段。除了传统多臂赌博问题中的探索-利用困境外,我们还表明多个阶段的考虑引入了第三个组成部分,即教育,在这里代理需要选择其行动以促进下一阶段代理的学习。为了解决这个新引入的探索-利用-教育三难问题,我们提出了一个简单的分布式在线学习算法,ε-EXP3。我们理论上证明ε-EXP3算法是一个无后悔策略,实现亚线性后悔。模拟结果显示,ε-EXP3算法明显优于现有的传统多臂赌博问题的无后悔在线学习算法。
更新时间: 2024-04-06 05:34:12
领域: cs.LG,cs.NI
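The single-agent building block behind the paper's $\epsilon$-EXP3 is the classic EXP3 update with an $\epsilon$ exploration mixture, sketched below for rewards in $[0,1]$; the distributed, multi-stage version with its education component is not reproduced here.

```python
import numpy as np

def exp3(reward_fn, n_arms, horizon, eps=0.1):
    """reward_fn(arm) -> reward in [0, 1]; bandit feedback only."""
    weights = np.ones(n_arms)
    rng = np.random.default_rng(0)
    total = 0.0
    for _ in range(horizon):
        # exploration mixture: mostly proportional to weights, eps uniform
        probs = (1 - eps) * weights / weights.sum() + eps / n_arms
        arm = rng.choice(n_arms, p=probs)
        r = reward_fn(arm)
        total += r
        est = r / probs[arm]            # importance-weighted estimate
        weights[arm] *= np.exp(eps * est / n_arms)
        weights /= weights.max()        # rescale; leaves probs unchanged
    return total

print(exp3(lambda a: float(a == 2), n_arms=5, horizon=2000))
```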
Within-Document Event Coreference with BERT-Based Contextualized Representations
Event coreference continues to be a challenging problem in information extraction. With the absence of any external knowledge bases for events, coreference becomes a clustering task that relies on effective representations of the context in which event mentions appear. Recent advances in contextualized language representations have proven successful in many tasks; however, their use in event linking has been limited. Here we present a three-part approach that (1) uses representations derived from a pretrained BERT model to (2) train a neural classifier to (3) drive a simple clustering algorithm to create coreference chains. We achieve state-of-the-art results with this model on two standard datasets for the within-document event coreference task and establish a new standard on a third, newer dataset.
Updated: 2024-04-06 05:14:07
标题: 使用基于BERT的上下文化表示进行文档内事件指代
摘要: 事件共指仍然是信息提取中的一个具有挑战性的问题。由于缺乏任何外部知识库用于事件,共指变成了一个依赖于事件提及出现上下文的聚类任务。最近在上下文化语言表示方面取得的进展在许多任务中证明了成功,然而,它们在事件链接中的使用受到了限制。在这里,我们提出了一个三部分方法,该方法使用从预训练BERT模型派生的表示来训练一个神经分类器,从而驱动一个简单的聚类算法来创建共指链。我们使用这种模型在两个标准数据集上取得了最新的结果,用于文档内事件共指任务,并在第三个更新的数据集上建立了一个新的标准。
更新时间: 2024-04-06 05:14:07
领域: cs.CL,cs.AI,cs.IR
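The third step of the pipeline above, clustering from pairwise scores, can be sketched as a greedy single-link pass over mentions; the cosine stand-in below replaces the trained neural classifier, and the threshold is an assumption.

```python
import numpy as np

def cluster_mentions(embs, score_fn=None, threshold=0.8):
    """embs: (n, d) contextualized mention vectors, in document order."""
    if score_fn is None:                     # stand-in: cosine similarity
        def score_fn(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    chains = []                              # each chain: list of indices
    for i, e in enumerate(embs):
        best, best_s = None, threshold
        for c, chain in enumerate(chains):
            s = max(score_fn(e, embs[j]) for j in chain)   # single-link
            if s > best_s:
                best, best_s = c, s
        if best is None:
            chains.append([i])               # start a new coreference chain
        else:
            chains[best].append(i)
    return chains

print(cluster_mentions(np.random.rand(10, 16)))
```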
Trustless Audits without Revealing Data or Models
There is an increasing conflict between business incentives to hide models and data as trade secrets, and the societal need for algorithmic transparency. For example, a rightsholder wishing to know whether their copyrighted works have been used during training must convince the model provider to allow a third party to audit the model and data. Finding a mutually agreeable third party is difficult, and the associated costs often make this approach impractical. In this work, we show that it is possible to simultaneously allow model providers to keep their model weights (but not architecture) and data secret while allowing other parties to trustlessly audit model and data properties. We do this by designing a protocol called ZkAudit in which model providers publish cryptographic commitments of datasets and model weights, alongside a zero-knowledge proof (ZKP) certifying that published commitments are derived from training the model. Model providers can then respond to audit requests by privately computing any function F of the dataset (or model) and releasing the output of F alongside another ZKP certifying the correct execution of F. To enable ZkAudit, we develop new methods of computing ZKPs for SGD on modern neural nets for simple recommender systems and image classification models capable of high accuracies on ImageNet. Empirically, we show it is possible to provide trustless audits of DNNs, including copyright, censorship, and counterfactual audits with little to no loss in accuracy.
Updated: 2024-04-06 04:43:06
标题: 无需透露数据或模型的无信任审计
摘要: 将模型和数据作为商业机密加以隐藏的商业动机,与社会对算法透明度的需求之间的冲突正日益加剧。例如,希望了解其受版权保护作品是否在训练过程中被使用的权利持有人,必须说服模型提供者允许第三方审核模型和数据。找到一个双方都同意的第三方是困难的,相关成本往往使这种方法变得不切实际。 在这项工作中,我们展示了可以在允许模型提供者对其模型权重(但不包括架构)和数据保密的同时,允许其他各方在无需信任的情况下审计模型和数据属性。我们通过设计一个名为ZkAudit的协议来实现这一点:模型提供者发布数据集和模型权重的加密承诺,以及证明这些已发布承诺确实源自模型训练的零知识证明(ZKP)。随后,模型提供者可以通过私下计算数据集(或模型)上的任意函数F,并连同另一个证明F被正确执行的ZKP一起发布F的输出,来回应审计请求。为了实现ZkAudit,我们开发了在现代神经网络上为SGD计算ZKP的新方法,适用于简单的推荐系统和能够在ImageNet上达到高准确率的图像分类模型。实验表明,我们可以在几乎不损失准确性的情况下对DNN进行无需信任的审计,包括版权、审查和反事实审计。
更新时间: 2024-04-06 04:43:06
领域: cs.CR,cs.AI,cs.CY,cs.LG
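Only the first ingredient of a ZkAudit-style protocol, publishing a hiding commitment that can later be opened, is easy to show compactly. The sketch below uses a plain SHA-256 hash commitment; the actual protocol pairs such commitments with zero-knowledge proofs of training and of the audit function's execution, none of which is reproduced here.

```python
import hashlib
import json
import os

def commit(blob: bytes):
    nonce = os.urandom(32)                    # hiding randomness
    digest = hashlib.sha256(nonce + blob).hexdigest()
    return digest, nonce                      # publish digest, keep nonce

def verify(digest: str, nonce: bytes, blob: bytes) -> bool:
    return hashlib.sha256(nonce + blob).hexdigest() == digest

weights = json.dumps({"layer0": [0.12, -0.7]}).encode()   # toy "model"
pub, secret = commit(weights)
assert verify(pub, secret, weights)           # opens only with the nonce
```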
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to enhance the robustness of LLM-based TTS. To accomplish this idea, RALL-E first predicts prosody features (pitch and duration) of the input text and uses them as intermediate conditions to predict speech tokens in a CoT style. Second, RALL-E utilizes the predicted duration prompt to guide the computing of self-attention weights in Transformer to enforce the model to focus on the corresponding phonemes and prosody features when predicting speech tokens. Results of comprehensive objective and subjective evaluations demonstrate that, compared to a powerful baseline method VALL-E, RALL-E significantly improves the WER of zero-shot TTS from $6.3\%$ (without reranking) and $2.1\%$ (with reranking) to $2.8\%$ and $1.0\%$, respectively. Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E and reduces the error rate from $68\%$ to $4\%$.
Updated: 2024-04-06 04:35:50
标题: RALL-E:文本到语音合成中具有思维链提示的鲁棒编解码器语言建模
摘要: 我们提出了一种强大的语言建模方法RALL-E,用于文本到语音(TTS)合成。尽管先前基于大型语言模型(LLMs)的工作在零样本TTS上表现出色,但这种方法往往存在韵律不稳定(奇怪的音高和节奏/持续时间)和高词错误率(WER)等鲁棒性问题,这是由于语言模型的自回归预测风格所致。RALL-E背后的核心思想是“思维链”(CoT)提示,将任务分解为更简单的步骤,以增强基于LLM的TTS的鲁棒性。为了实现这一思想,RALL-E首先预测输入文本的韵律特征(音高和持续时间),并将它们用作预测语音令牌的中间条件,以CoT风格进行预测。其次,RALL-E利用预测的持续时间提示来引导Transformer中的自注意力权重计算,强制模型在预测语音令牌时集中精力关注相应的音素和韵律特征。综合客观和主观评估结果表明,与强大的基线方法VALL-E相比,RALL-E显著改善了零样本TTS的WER,分别从6.3%(无重排序)和2.1%(有重排序)降至2.8%和1.0%。此外,我们证明了RALL-E能够正确合成对VALL-E困难的句子,并将错误率从68%降低到4%。
更新时间: 2024-04-06 04:35:50
领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD
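One way to picture the duration-guided attention described above: each acoustic frame is allowed to attend only to the phoneme whose predicted duration window covers it, plus a small neighborhood. The windowing below is a guess at the idea for illustration, not the paper's exact mechanism.

```python
import numpy as np

def duration_mask(durations, window=1):
    """durations: per-phoneme frame counts, e.g. [3, 5, 2]."""
    ends = np.cumsum(durations)
    n_frames, n_phones = ends[-1], len(durations)
    mask = np.zeros((n_frames, n_phones), dtype=bool)
    phone = 0
    for t in range(n_frames):
        while t >= ends[phone]:               # advance to the covering phoneme
            phone += 1
        lo, hi = max(0, phone - window), min(n_phones, phone + window + 1)
        mask[t, lo:hi] = True                 # attend to a small neighborhood
    return mask

print(duration_mask([3, 5, 2]).astype(int))
```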
Causal Bayesian Optimization via Exogenous Distribution Learning
Maximizing a target variable as an operational objective in a structured causal model is an important problem. Existing Causal Bayesian Optimization (CBO) methods either rely on hard interventions that alter the causal structure to maximize the reward, or attach action nodes to endogenous variables so that the data generation mechanisms are adjusted to achieve the objective. In this paper, a novel method is introduced to learn the distribution of exogenous variables, which is typically ignored or marginalized out through expectation by existing methods. Exogenous distribution learning improves the approximation accuracy of structured causal models in a surrogate model that is usually trained with limited observational data. Moreover, the learned exogenous distribution extends existing CBO to general causal schemes beyond Additive Noise Models (ANM). The recovery of exogenous variables allows us to use a more flexible prior for noise or unobserved hidden variables. A new CBO method is developed by leveraging the learned exogenous distribution. Experiments on different datasets and applications show the benefits of our proposed method.
Updated: 2024-04-06 04:34:58
标题: 因果贝叶斯优化:通过外生分布学习
摘要: 将目标变量最大化作为结构因果模型中的操作目标是一个重要问题。现有的因果贝叶斯优化(CBO)方法要么依赖于改变因果结构以最大化奖励的硬干预;要么引入动作节点到内生变量,以调整数据生成机制以实现目标。本文介绍了一种新的方法,用于学习外生变量的分布,这通常被现有方法忽视或通过期望边缘化。外生分布学习提高了通常使用有限观察数据进行训练的代理模型中结构因果模型的逼近精度。此外,学习的外生分布将现有的CBO扩展到超出加性噪声模型(ANM)的一般因果方案。外生变量的恢复使我们能够对噪声或未观察到的隐藏变量使用更灵活的先验。通过利用学习的外生分布开发了一种新的CBO方法。对不同数据集和应用的实验显示了我们提出的方法的优势。
更新时间: 2024-04-06 04:34:58
领域: cs.LG,stat.ML
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks focusing solely on code generation, CodeEditorBench emphasizes real-world scenarios and practical aspects of software development. We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks. Evaluation of 19 LLMs reveals that closed-source models (particularly Gemini-Ultra and GPT-4), outperform open-source models in CodeEditorBench, highlighting differences in model performance based on problem types and prompt sensitivities. CodeEditorBench aims to catalyze advancements in LLMs by providing a robust platform for assessing code editing capabilities. We will release all prompts and datasets to enable the community to expand the dataset and benchmark emerging LLMs. By introducing CodeEditorBench, we contribute to the advancement of LLMs in code editing and provide a valuable resource for researchers and practitioners.
Updated: 2024-04-06 04:29:25
标题: CodeEditorBench:评估大型语言模型的代码编辑能力
摘要: 大型语言模型(LLMs)用于代码的应用正在迅速发展,代码编辑作为一项关键能力正逐渐崭露头角。我们引入了CodeEditorBench,这是一个评估框架,旨在严格评估LLMs在代码编辑任务中的表现,包括调试、翻译、润色和需求切换等。与现有的重点仅在代码生成上的基准不同,CodeEditorBench强调真实世界场景和软件开发的实际方面。我们从五个来源精心策划了各种编程语言、复杂程度和编辑任务的多样化编码挑战和场景。对19个LLMs的评估显示,闭源模型(特别是Gemini-Ultra和GPT-4)在CodeEditorBench中表现优于开源模型,突显了基于问题类型和提示灵敏度的模型性能差异。CodeEditorBench旨在通过提供一个强大的平台来评估代码编辑能力,推动LLMs的进步。我们将发布所有提示和数据集,以便社区扩展数据集并评估新兴的LLMs。通过引入CodeEditorBench,我们为LLMs在代码编辑方面的进步做出了贡献,并为研究人员和实践者提供了宝贵的资源。
更新时间: 2024-04-06 04:29:25
领域: cs.SE,cs.AI,cs.CL,cs.LG
Bayesian Inference for Consistent Predictions in Overparameterized Nonlinear Regression
The remarkable generalization performance of overparameterized models has challenged the conventional wisdom of statistical learning theory. While recent theoretical studies have shed light on this behavior in linear models or nonlinear classifiers, a comprehensive understanding of overparameterization in nonlinear regression remains lacking. This paper explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework, extending the methodology of adaptive prior based on the intrinsic spectral structure of the data. We establish posterior contraction for single-neuron models with Lipschitz continuous activation functions and for generalized linear models, demonstrating that our approach achieves consistent predictions in the overparameterized regime. Moreover, our Bayesian framework allows for uncertainty estimation of the predictions. The proposed method is validated through numerical simulations and a real data application, showcasing its ability to achieve accurate predictions and reliable uncertainty estimates. Our work advances the theoretical understanding of the blessing of overparameterization and offers a principled Bayesian approach for prediction in large nonlinear models.
Updated: 2024-04-06 04:22:48
标题: 贝叶斯推断在过参数化非线性回归中的一致预测
摘要: 过参数化模型的显著泛化性能挑战了统计学习理论的传统智慧。虽然最近的理论研究已经揭示了线性模型或非线性分类器中这种行为的特点,但在非线性回归中对过参数化的全面理解仍然缺乏。本文探讨了基于贝叶斯框架的过参数化非线性回归的预测性质,扩展了基于数据固有谱结构的自适应先验方法。我们建立了具有利普希茨连续激活函数的单神经元模型和广义线性模型的后验收缩,证明了我们的方法在过参数化区域实现了一致预测。此外,我们的贝叶斯框架允许对预测进行不确定性估计。所提出的方法通过数值模拟和实际数据应用进行了验证,展示了其实现准确预测和可靠不确定性估计的能力。我们的工作推进了对过参数化的有利性的理论理解,并为大型非线性模型的预测提供了一种原则性的贝叶斯方法。
更新时间: 2024-04-06 04:22:48
领域: stat.ML,cs.LG,stat.ME
Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs
We present experimental results with LLMs addressing robotics task planning problems. Recently, LLMs have been applied to robotics task planning, particularly via a code generation approach that converts complex high-level instructions into mid-level policy codes. In contrast, our approach acquires text descriptions of the task and scene objects, then formulates task planning through natural language reasoning, and outputs coordinate-level control commands, thus reducing the need for intermediate representation code as policies with pre-defined APIs. Our approach is evaluated on a multi-modal prompt simulation benchmark, demonstrating that prompt engineering with natural language reasoning significantly enhances success rates compared to prompts without it. Furthermore, our approach illustrates the potential for natural language descriptions to transfer robotics skills from known tasks to previously unseen tasks. The project website: https://natural-language-as-policies.github.io/
Updated: 2024-04-06 04:12:47
标题: 自然语言作为策略:基于LLMs的坐标级具身控制推理
摘要: 我们展示了应用LLMs解决机器人任务规划问题的实验结果。最近,LLMs已被应用于机器人任务规划,特别是使用代码生成方法,将复杂的高层指令转换为中层策略代码。相比之下,我们的方法获取任务和场景对象的文本描述,然后通过自然语言推理制定任务规划,并输出坐标级控制命令,从而减少了中间表示代码作为具有预定义API的策略的必要性。我们的方法在一个多模态提示仿真基准上进行了评估,表明我们与其缺席相比,通过自然语言推理进行的提示工程实验显著提高了成功率。此外,我们的方法展示了自然语言描述将机器人技能从已知任务转移到以前未见任务的潜力。项目网站:https://natural-language-as-policies.github.io/
更新时间: 2024-04-06 04:12:47
领域: cs.RO,cs.AI,cs.CL,I.2.9; I.2.7
General2Specialized LLMs Translation for E-commerce
Existing Neural Machine Translation (NMT) models mainly handle translation in the general domain, while overlooking domains with specialized writing conventions, such as e-commerce and legal documents. Taking e-commerce as an example, the texts usually include large numbers of domain-specific words and exhibit more grammatical problems, which leads to the inferior performance of current NMT methods. To address these problems, we collect two domain-related resources, including a set of term pairs (aligned Chinese-English bilingual terms) and a parallel corpus annotated for the e-commerce domain. Furthermore, we propose a two-step fine-tuning paradigm (named G2ST) with self-contrastive semantic enhancement to transfer a general NMT model to a specialized NMT model for e-commerce. The paradigm can be used for NMT models based on large language models (LLMs). Extensive evaluations on real e-commerce titles demonstrate the superior translation quality and robustness of our G2ST approach, as compared with state-of-the-art NMT models such as LLaMA, Qwen, GPT-3.5, and even GPT-4.
Updated: 2024-04-06 04:07:49
标题: 从一般到专业化的电子商务LLM翻译
摘要: 现有的神经机器翻译(NMT)模型主要处理通用领域的翻译,而忽视了具有特殊写作公式的领域,如电子商务和法律文件。以电子商务为例,这些文本通常包含大量领域相关词汇,并且存在更多的语法问题,这导致当前NMT方法的性能较差。为了解决这些问题,我们收集了两个与领域相关的资源,包括一组术语对(对齐的中英双语术语)和一个标注为电子商务领域的平行语料库。此外,我们提出了一个两步微调范式(名为G2ST),其中包括自对比语义增强,将一个通用NMT模型转移到电子商务的专门化NMT模型。该范式可用于基于大型语言模型(LLMs)的NMT模型。对真实电子商务标题的广泛评估显示,与LLaMA、Qwen、GPT-3.5甚至GPT-4等最先进的NMT模型相比,我们的G2ST方法具有更优越的翻译质量和稳健性。
更新时间: 2024-04-06 04:07:49
领域: cs.CL,cs.AI
Automated Lane Change Behavior Prediction and Environmental Perception Based on SLAM Technology
In an automated driving system, the vehicle's external environment is perceived not only by sensors such as cameras and radars; another sensor quietly contributes to this perception as well: the positioning module. This paper explores the application of SLAM (Simultaneous Localization and Mapping) technology in the context of automatic lane change behavior prediction and environment perception for autonomous vehicles. It discusses the limitations of traditional positioning methods, introduces SLAM technology, and compares LIDAR SLAM with visual SLAM. Real-world examples from companies like Tesla, Waymo, and Mobileye showcase the integration of AI-driven technologies, sensor fusion, and SLAM in autonomous driving systems. The paper then delves into the specifics of SLAM algorithms, sensor technologies, and the importance of automatic lane changes in driving safety and efficiency. It highlights Tesla's recent update to its Autopilot system, which incorporates automatic lane change functionality using SLAM technology. The paper concludes by emphasizing the crucial role of SLAM in enabling accurate environment perception, positioning, and decision-making for autonomous vehicles, ultimately enhancing safety and driving experience.
Updated: 2024-04-06 03:48:29
标题: 基于SLAM技术的自动车道变更行为预测和环境感知
摘要: 在自动驾驶系统中,除了摄像头、雷达等环境感知传感器之外,实际上还有一个默默发挥作用、参与感知车辆外部环境的传感器,那就是定位模块。本文探讨了SLAM(同时定位与地图构建)技术在自动车道变换行为预测和自动驾驶车辆环境感知中的应用。它讨论了传统定位方法的局限性,介绍了SLAM技术,并将激光雷达SLAM与视觉SLAM进行了比较。来自特斯拉、Waymo和Mobileye等公司的现实世界示例展示了AI驱动技术、传感器融合和SLAM在自动驾驶系统中的整合。文章随后深入探讨了SLAM算法、传感器技术以及自动车道变换在驾驶安全性和效率方面的重要性。它强调了特斯拉最近对其Autopilot系统的更新,该系统利用SLAM技术实现了自动车道变换功能。文章最后强调了SLAM在实现自动驾驶车辆准确环境感知、定位和决策方面的关键作用,最终提升了安全性和驾驶体验。
更新时间: 2024-04-06 03:48:29
领域: cs.RO,cs.AI,cs.CV
Galaxy 3D Shape Recovery using Mixture Density Network
Since the turn of the century, astronomers have been exploiting the rich information afforded by combining stellar kinematic maps and imaging in an attempt to recover the intrinsic, three-dimensional (3D) shape of a galaxy. A common intrinsic shape recovery method relies on an expected monotonic relationship between the intrinsic misalignment of the kinematic and morphological axes and the triaxiality parameter. Recent studies have, however, cast doubt on the underlying assumptions relating shape and intrinsic kinematic misalignment. In this work, we aim to recover the 3D shape of individual galaxies from their projected stellar kinematic and flux distributions, using a supervised machine learning approach with a mixture density network (MDN). Using a mock dataset of the EAGLE hydrodynamical cosmological simulation, we train the MDN model for a carefully selected set of common kinematic and photometric parameters. Compared to previous methods, we demonstrate potential improvements achieved with the MDN model to retrieve the 3D galaxy shape along with the uncertainties, especially for prolate and triaxial systems. We make specific recommendations for recovering galaxy intrinsic shapes relevant for current and future integral field spectroscopic galaxy surveys.
Updated: 2024-04-06 03:48:11
标题: 《使用混合密度网络恢复星系的三维形状》
摘要: 自世纪之交以来,天文学家一直在利用结合恒星动力学地图和成像提供的丰富信息,试图恢复星系的固有三维形状。一种常见的固有形状恢复方法依赖于固有动力学和形态轴之间的固有错位与三轴性参数之间的预期单调关系。然而,最近的研究对与形状和固有动力学错位相关的基本假设产生了疑问。在这项工作中,我们旨在利用监督式机器学习方法和混合密度网络(MDN)利用星系的投影恒星动力学和流量分布来恢复个体星系的三维形状。使用EAGLE流体动力学宇宙学模拟的模拟数据集,我们为一组精心选择的常见动力学和光度参数训练MDN模型。与先前的方法相比,我们展示了使用MDN模型实现的潜在改进,以检索三维星系形状以及不确定性,特别是对于长形和三轴系统。我们针对当前和未来的积分场谱学星系调查提出了恢复星系固有形状的具体建议。
更新时间: 2024-04-06 03:48:11
领域: astro-ph.IM,astro-ph.GA,cs.LG
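A mixture density network head is the concrete object behind the entry above: instead of a single shape estimate, the network outputs a Gaussian mixture and is trained by negative log-likelihood, which also yields the uncertainties. A minimal PyTorch sketch, with illustrative sizes and a scalar target standing in for a shape parameter:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    def __init__(self, in_dim=64, n_components=5):
        super().__init__()
        self.pi = nn.Linear(in_dim, n_components)         # mixture logits
        self.mu = nn.Linear(in_dim, n_components)         # component means
        self.log_sigma = nn.Linear(in_dim, n_components)  # log std devs

    def forward(self, h):
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """y: (B,) scalar targets, e.g. an intrinsic axis ratio."""
    y = y.unsqueeze(-1)
    log_pi = F.log_softmax(pi_logits, dim=-1)
    # log N(y | mu_k, sigma_k) per mixture component k
    log_norm = (-0.5 * ((y - mu) / log_sigma.exp()) ** 2
                - log_sigma - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(log_pi + log_norm, dim=-1).mean()

head = MDNHead()
pi, mu, ls = head(torch.randn(16, 64))
print(mdn_nll(pi, mu, ls, torch.rand(16)))
```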
Hyperparameter Optimization for SecureBoost via Constrained Multi-Objective Federated Learning
SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption (HE) to protect data privacy in vertical federated learning. SecureBoost and its variants have been widely adopted in fields such as finance and healthcare. However, the hyperparameters of SecureBoost are typically configured heuristically to optimize model performance (i.e., utility) alone, under the assumption that privacy is secured. Our study found that SecureBoost and some of its variants are still vulnerable to label leakage. This vulnerability may lead the current heuristic hyperparameter configuration of SecureBoost to a suboptimal trade-off between utility, privacy, and efficiency, which are pivotal elements of a trustworthy federated learning system. To address this issue, we propose the Constrained Multi-Objective SecureBoost (CMOSB) algorithm, which aims to approximate Pareto-optimal solutions, where each solution is a set of hyperparameters achieving an optimal trade-off between utility loss, training cost, and privacy leakage. We design measurements of the three objectives, including a novel label inference attack named the instance clustering attack (ICA) to measure the privacy leakage of SecureBoost. Additionally, we provide two countermeasures against ICA. The experimental results demonstrate that CMOSB yields superior hyperparameters over those optimized by grid search and Bayesian optimization regarding the trade-off between utility loss, training cost, and privacy leakage.
Updated: 2024-04-06 03:46:42
标题: SecureBoost通过受限多目标联邦学习的超参数优化
摘要: SecureBoost是一种利用同态加密(HE)保护垂直联邦学习中数据隐私的树增强算法。SecureBoost及其变种已被广泛应用于金融和医疗保健等领域。然而,SecureBoost的超参数通常是启发式配置的,仅为优化模型性能(即效用)而设计,假设隐私已得到保护。我们的研究发现,SecureBoost及其部分变种仍然容易受到标签泄露的影响。这种脆弱性可能导致SecureBoost当前的启发式超参数配置在效用、隐私和效率之间达不到最佳平衡,而这些是构建可信赖的联邦学习系统的关键因素。为了解决这个问题,我们提出了约束多目标SecureBoost(CMOSB)算法,旨在近似帕累托最优解,即每个解是一组超参数,实现了效用损失、训练成本和隐私泄露之间的最佳平衡。我们设计了三个目标的度量标准,包括一种名为实例聚类攻击(ICA)的新型标签推断攻击,用于衡量SecureBoost的隐私泄露。此外,我们提供了两种抵御ICA的对策。实验结果表明,CMOSB在效用损失、训练成本和隐私泄露之间的权衡方面优于通过网格搜索和贝叶斯优化优化的超参数。
更新时间: 2024-04-06 03:46:42
领域: cs.LG,cs.CR
An Optimization Framework to Personalize Passive Cardiac Mechanics
Personalized cardiac mechanics modeling is a powerful tool for understanding the biomechanics of cardiac function in health and disease and assisting in treatment planning. However, current models are limited to using medical images acquired at a single cardiac phase, often limiting their applicability for processing dynamic image acquisitions. This study introduces an inverse finite element analysis (iFEA) framework to estimate the passive mechanical properties of cardiac tissue using time-dependent medical image data. The iFEA framework relies on a novel nested optimization scheme, in which the outer iterations utilize a traditional optimization method to best approximate material parameters that fit image data, while the inner iterations employ an augmented Sellier's algorithm to estimate the stress-free reference configuration. With a focus on characterizing the passive mechanical behavior, the framework employs structurally based anisotropic hyperelastic constitutive models and physiologically relevant boundary conditions to simulate myocardial mechanics. We use a stabilized variational multiscale formulation for solving the governing nonlinear elastodynamics equations, verified for cardiac mechanics applications. The framework is tested in myocardium models of biventricle and left atrium derived from cardiac phase-resolved computed tomographic (CT) images of a healthy subject and three patients with hypertrophic obstructive cardiomyopathy (HOCM). The impact of the choice of optimization methods and other numerical settings, including fiber direction parameters, mesh size, initial parameters for optimization, and perturbations to optimal material parameters, is assessed using a rigorous sensitivity analysis. The performance of the current iFEA is compared against an assumed power-law-based pressure-volume relation, typically used for single-phase image acquisition.
Updated: 2024-04-06 03:24:27
标题: 一个优化框架用于个性化被动心脏力学
摘要: 个性化心脏力学建模是一种强大的工具,用于理解健康和疾病中心脏功能的生物力学,并协助治疗规划。然而,当前的模型仅限于使用在单个心脏相位获取的医学图像,通常限制了它们在处理动态图像采集方面的适用性。本研究引入了一种逆有限元分析(iFEA)框架,利用时间依赖医学图像数据估计心脏组织的被动机械特性。iFEA框架依赖于一种新颖的嵌套优化方案,其中外部迭代利用传统优化方法最佳逼近适合图像数据的材料参数,而内部迭代采用增广Sellier算法估计无应力参考配置。重点在于表征被动机械行为,该框架采用基于结构的各向异性超弹性本构模型和生理相关的边界条件来模拟心肌力学。我们使用稳定的变分多尺度公式来解决统治非线性弹性动力学方程,已经验证用于心脏力学应用。该框架在从健康受试者和三名患有肥厚性梗阻性心肌病(HOCM)的患者的心脏相位分辨计算机断层扫描(CT)图像导出的左心房和双心室心肌模型中进行了测试。使用严格的敏感性分析评估了优化方法和其他数值设置的选择,包括纤维方向参数、网格大小、优化的初始参数以及对最佳材料参数的扰动。当前iFEA的性能与通常用于单相图像采集的假定幂律压力-容积关系进行比较。
更新时间: 2024-04-06 03:24:27
领域: physics.med-ph,cs.AI
You Only Train Once: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment
Although recent efforts in image quality assessment (IQA) have achieved promising performance, there still exists a considerable gap compared to the human visual system (HVS). One significant disparity lies in humans' seamless transition between full reference (FR) and no reference (NR) tasks, whereas existing models are constrained to either FR or NR tasks. This disparity implies the necessity of designing two distinct systems, thereby greatly diminishing the model's versatility. Therefore, our focus lies in unifying FR and NR IQA under a single framework. Specifically, we first employ an encoder to extract multi-level features from input images. Then a Hierarchical Attention (HA) module is proposed as a universal adapter for both FR and NR inputs to model the spatial distortion at each encoder stage. Furthermore, considering that different distortions contaminate encoder stages and damage image semantic meaning differently, a Semantic Distortion Aware (SDA) module is proposed to examine feature correlations between shallow and deep layers of the encoder. By adopting HA and SDA, the proposed network can effectively perform both FR and NR IQA. When our proposed model is independently trained on NR or FR IQA tasks, it outperforms existing models and achieves state-of-the-art performance. Moreover, when trained jointly on NR and FR IQA tasks, it further enhances the performance of NR IQA while achieving on-par performance in the state-of-the-art FR IQA. You only train once to perform both IQA tasks. Code will be released at: https://github.com/BarCodeReader/YOTO.
Updated: 2024-04-06 03:17:33
标题: 您只需训练一次:一个统一的框架用于全参考和无参考图像质量评估
摘要: 尽管最近在图像质量评估(IQA)方面的努力取得了令人满意的表现,但与人类视觉系统(HVS)相比仍存在相当大的差距。一个重要的差异在于人类在全参考(FR)和无参考(NR)任务之间的无缝转换,而现有模型受限于FR或NR任务。这种差异意味着需要设计两个不同的系统,从而大大降低了模型的通用性。因此,我们的重点在于将FR和NR IQA统一起来,放在一个单一的框架下。具体来说,我们首先使用编码器从输入图像中提取多级特征。然后提出了一个分层注意(HA)模块作为FR和NR输入的通用适配器,用于模拟每个编码器阶段的空间失真。此外,考虑到不同的失真会污染编码器阶段并以不同方式破坏图像语义含义,我们提出了一个语义失真感知(SDA)模块,用于检查编码器的浅层和深层特征之间的相关性。通过采用HA和SDA,所提出的网络可以有效地执行FR和NR IQA。当我们的模型分别在NR或FR IQA任务上训练时,它优于现有模型并实现了最先进的性能。此外,当在NR和FR IQA任务上联合训练时,它进一步提高了NR IQA的性能,同时在最先进的FR IQA中实现了同等的性能。您只需训练一次即可执行两个IQA任务。代码将在以下网址发布:https://github.com/BarCodeReader/YOTO。
更新时间: 2024-04-06 03:17:33
领域: cs.CV,cs.AI,eess.IV
Joint Identifiability of Cross-Domain Recommendation via Hierarchical Subspace Disentanglement
Cross-Domain Recommendation (CDR) seeks to enable effective knowledge transfer across domains. Existing works rely on either representation alignment or transformation bridges, but they struggle to separate domain-shared from domain-specific latent factors. Specifically, while CDR describes user representations as a joint distribution over two domains, these methods fail to account for its joint identifiability, as they primarily fixate on the marginal distribution within a particular domain. Such a failure may overlook the conditionality between two domains and how it contributes to latent factor disentanglement, leading to negative transfer when domains are weakly correlated. In this study, we explore what should and should not be transferred in cross-domain user representations from a causality perspective. We propose a Hierarchical subspace disentanglement approach to explore the Joint IDentifiability of the cross-domain joint distribution, termed HJID, to preserve domain-specific behaviors from domain-shared factors. HJID organizes user representations into layers: generic shallow subspaces and domain-oriented deep subspaces. We first encode the generic pattern in the shallow subspace by minimizing the Maximum Mean Discrepancy of the initial layer's activations. Then, to dissect how domain-oriented latent factors are encoded in deeper-layer activations, we construct a cross-domain causality-based data generation graph, which identifies cross-domain consistent and domain-specific components, adhering to the Minimal Change principle. This allows HJID to maintain stability whilst discovering unique factors for different domains, all within a generative framework of invertible transformations that guarantees joint identifiability. With experiments on real-world datasets, we show that HJID outperforms SOTA methods on a range of strongly and weakly correlated CDR tasks.
Updated: 2024-04-06 03:11:31
标题: 跨领域推荐的联合可识别性:通过分层子空间解缠的方法
摘要: 跨领域推荐(CDR)旨在实现跨领域之间的有效知识转移。现有研究依赖于表示对齐或转换桥梁,但在识别领域共享和领域特定潜在因素方面存在困难。具体而言,虽然CDR将用户表示描述为两个领域上的联合分布,但这些方法未能考虑其联合可识别性,因为它们主要专注于特定领域内的边缘分布。这种失败可能忽视了两个领域之间的条件性以及它如何有助于潜在因素的分离,导致在领域之间弱相关时出现负面转移。在本研究中,我们从因果关系的角度探讨了跨领域用户表示中应该传递和不应该传递的内容。我们提出了一种分层次子空间分解方法,用于探索跨领域联合分布的联合可识别性,称为HJID,以保留从领域共享因素中获得的领域特定行为。HJID将用户表示组织成层次结构:通用浅层子空间和面向领域的深层子空间。我们首先通过最小化初始层激活的最大均值差异来编码浅层子空间中的通用模式。然后,为了解析领域定向潜在因素是如何编码在更深层次的激活中,我们构建了一个基于跨领域因果关系的数据生成图,该图识别跨领域一致和领域特定组件,遵循最小变化原则。这使得HJID能够在保证联合可识别性的可逆变换生成框架内保持稳定性,同时发现不同领域的独特因素。通过对真实世界数据集的实验,我们展示了HJID在一系列强相关和弱相关的CDR任务上优于SOTA方法。
更新时间: 2024-04-06 03:11:31
领域: cs.IR,cs.AI,cs.LG
DELTA: Decoupling Long-Tailed Online Continual Learning
A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, all while avoiding forgetting previously acquired knowledge. In this work, we study the under-explored problem of Long-Tailed Online Continual Learning (LTOCL), which aims to learn new tasks from sequentially arriving class-imbalanced data streams. Each data is observed only once for training without knowing the task data distribution. We present DELTA, a decoupled learning approach designed to enhance learning representations and address the substantial imbalance in LTOCL. We enhance the learning process by adapting supervised contrastive learning to attract similar samples and repel dissimilar (out-of-class) samples. Further, by balancing gradients during training using an equalization loss, DELTA significantly enhances learning outcomes and successfully mitigates catastrophic forgetting. Through extensive evaluation, we demonstrate that DELTA improves the capacity for incremental learning, surpassing existing OCL methods. Our results suggest considerable promise for applying OCL in real-world applications.
Updated: 2024-04-06 02:33:04
标题: DELTA:解耦长尾在线持续学习
摘要: 在实现普遍人工智能的过程中,一个重要的挑战是模型在数据遵循长尾分布的真实场景中迅速学习新信息的能力受到限制,同时避免遗忘先前获取的知识。在这项工作中,我们研究了一个鲜为人知的问题,即长尾在线持续学习(LTOCL),旨在从顺序到达的类别不平衡数据流中学习新任务。每个数据只被观察一次用于训练,而不知道任务数据分布。我们提出了DELTA,这是一种解耦学习方法,旨在增强学习表示并解决LTOCL中的重大不平衡。我们通过将监督对比学习调整为吸引相似样本和排斥不相似(非类内)样本来增强学习过程。此外,通过在训练过程中使用均衡损失平衡梯度,DELTA显著增强了学习结果并成功缓解了灾难性遗忘。通过广泛的评估,我们证明DELTA提高了增量学习的能力,超越了现有的OCL方法。我们的结果表明,在实际应用中应用OCL有很大的潜力。
更新时间: 2024-04-06 02:33:04
领域: cs.LG,cs.CV
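The attract/repel mechanism in the DELTA entry above is a supervised contrastive loss; a compact version is below, assuming L2-normalized features. DELTA's gradient-equalization loss and the online continual-learning machinery are omitted.

```python
import torch
import torch.nn.functional as F

def supcon_loss(feats, labels, tau=0.1):
    """feats: (B, d), L2-normalized; labels: (B,) class ids."""
    B = feats.size(0)
    sim = feats @ feats.t() / tau
    eye = torch.eye(B, dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(eye, float('-inf'))        # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)        # avoid -inf * 0 below
    pos = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye).float()
    n_pos = pos.sum(dim=1).clamp(min=1)
    # mean log-probability of positives per anchor, averaged over anchors
    return -((log_prob * pos).sum(dim=1) / n_pos).mean()

f = F.normalize(torch.randn(32, 128), dim=1)
print(supcon_loss(f, torch.randint(0, 4, (32,))))
```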
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
LLM-based auto-annotators have become a key component of the LLM development process due to their cost-effectiveness and scalability compared to human-based evaluation. However, these auto-annotators can introduce complex biases that are hard to remove. Even simple, known confounders such as preference for longer outputs remain in existing automated evaluation metrics. We propose a simple regression analysis approach for controlling biases in auto-evaluations. As a real case study, we focus on reducing the length bias of AlpacaEval, a fast and affordable benchmark for chat LLMs that uses LLMs to estimate response quality. Despite being highly correlated with human preferences, AlpacaEval is known to favor models that generate longer outputs. We introduce a length-controlled AlpacaEval that aims to answer the counterfactual question: "What would the preference be if the model's and baseline's output had the same length?". To achieve this, we first fit a generalized linear model to predict the biased output of interest (auto-annotator preferences) based on the mediators we want to control for (length difference) and other relevant features. We then obtain length-controlled preferences by predicting preferences while conditioning the GLM with a zero difference in lengths. Length-controlling not only improves the robustness of the metric to manipulations in model verbosity, we also find that it increases the Spearman correlation with LMSYS' Chatbot Arena from 0.94 to 0.98. We release the code and leaderboard at https://tatsu-lab.github.io/alpaca_eval/ .
Updated: 2024-04-06 02:29:02
标题: 长度控制的AlpacaEval:一种消除自动评估器偏见的简单方法
摘要: 基于LLM的自动注释器已成为LLM开发过程中的关键组成部分,因其与基于人类评估相比具有成本效益和可扩展性。然而,这些自动注释器可能引入难以消除的复杂偏见。即使是简单的已知混杂因素,如对较长输出的偏好,仍存在于现有的自动评估指标中。我们提出了一种简单的回归分析方法来控制自动评估中的偏见。作为一个真实案例研究,我们专注于减少AlpacaEval的长度偏见,这是一个快速且价格实惠的用于估计响应质量的聊天LLM基准。尽管与人类偏好高度相关,AlpacaEval已知偏向于生成较长的输出模型。我们引入了一个控制长度的AlpacaEval,旨在回答反事实问题:“如果模型和基准的输出长度相同,偏好会是什么?”为了实现这一点,我们首先拟合一个广义线性模型,以预测感兴趣的偏见输出(自动注释器偏好),基于我们想要控制的中介变量(长度差异)和其他相关特征。然后,通过在长度差异为零的情况下对GLM进行条件预测,我们获得了控制长度的偏好。控制长度不仅提高了指标对模型冗余性的抵抗力,我们还发现它将与LMSYS的Chatbot Arena的Spearman相关性从0.94提高到0.98。我们在https://tatsu-lab.github.io/alpaca_eval/发布了代码和排行榜。
更新时间: 2024-04-06 02:29:02
领域: cs.LG,cs.AI,cs.CL,stat.ML
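The counterfactual question in the entry above reduces to a small regression exercise: fit a GLM of annotator preference on the length difference (the mediator) plus a model indicator, then predict with the length difference forced to zero. The sketch below uses fabricated data and a logistic model; the released implementation's exact features differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
len_diff = rng.normal(0, 1, n)              # standardized length difference
model_id = rng.integers(0, 2, n)            # which model produced the output
# simulated annotator: truly prefers model 1, but also prefers longer outputs
p = 1 / (1 + np.exp(-(0.8 * model_id - 0.4 + 1.2 * len_diff)))
pref = rng.random(n) < p

X = np.column_stack([len_diff, model_id])
glm = LogisticRegression().fit(X, pref)

# counterfactual: same comparisons, but length difference forced to zero
X0 = np.column_stack([np.zeros(n), model_id])
lc_winrate = glm.predict_proba(X0)[:, 1][model_id == 1].mean()
print("length-controlled win rate of model 1:", round(lc_winrate, 3))
```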
Gen4DS: Workshop on Data Storytelling in an Era of Generative AI
Storytelling is an ancient and precious human ability that has been rejuvenated in the digital age. Over the last decade, there has been a notable surge in the recognition and application of data storytelling, both in academia and industry. Recently, the rapid development of generative AI has brought new opportunities and challenges to this field, sparking numerous new questions. These questions may not necessarily be quickly transformed into papers, but we believe it is necessary to promptly discuss them to help the community better clarify important issues and research agendas for the future. We thus invite you to join our workshop (Gen4DS) to discuss questions such as: How can generative AI facilitate the creation of data stories? How might generative AI alter the workflow of data storytellers? What are the pitfalls and risks of incorporating AI in storytelling? We have designed both paper presentations and interactive activities (including hands-on creation, group discussion pods, and debates on controversial issues) for the workshop. We hope that participants will learn about the latest advances and pioneering work in data storytelling, engage in critical conversations with each other, and have an enjoyable, unforgettable, and meaningful experience at the event.
Updated: 2024-04-06 02:12:13
标题: Gen4DS:在生成式人工智能时代的数据叙事研讨会
摘要: 讲故事是人类一项古老而珍贵的能力,在数字时代得到了复兴。在过去的十年里,学术界和工业界对数据讲故事的认可和应用有了明显增加。最近,生成式人工智能的快速发展为这一领域带来了新的机遇和挑战,引发了许多新问题。这些问题可能不会立即转化为论文,但我们认为有必要立即讨论它们,以帮助社区更好地澄清未来的重要问题和研究议程。因此,我们邀请您参加我们的研讨会(Gen4DS),讨论以下问题:生成式人工智能如何促进数据故事的创作?生成式人工智能如何改变数据讲故事者的工作流程?将人工智能融入讲故事中会有哪些陷阱和风险?我们为研讨会设计了论文展示和互动活动(包括实践创作、小组讨论、以及关于有争议问题的辩论)。我们希望参与者能了解数据讲故事领域的最新进展和开拓性工作,与彼此进行批判性对话,并在活动中度过一次愉快、难忘且有意义的经历。
更新时间: 2024-04-06 02:12:13
领域: cs.HC,cs.AI,cs.GR
Demand Balancing in Primal-Dual Optimization for Blind Network Revenue Management
This paper proposes a practically efficient algorithm with optimal theoretical regret that solves the classical network revenue management (NRM) problem with unknown, nonparametric demand. Over a time horizon of length $T$, in each time period the retailer needs to decide the prices of $N$ types of products, which are produced from $M$ types of resources with non-replenishable initial inventory. When demand is nonparametric and satisfies some mild assumptions, Miao and Wang (2021) is the first paper to propose an algorithm with $O(\text{poly}(N,M,\ln(T))\sqrt{T})$-type regret (in particular, $\tilde O(N^{3.5}\sqrt{T})$ plus additional high-order terms that are $o(\sqrt{T})$ for sufficiently large $T\gg N$). In this paper, we improve the previous result by proposing a primal-dual optimization algorithm that is not only more practical but also achieves an improved regret of $\tilde O(N^{3.25}\sqrt{T})$, free from additional high-order terms. A key technical contribution of the proposed algorithm is the so-called demand balancing, which pairs the primal solution (i.e., the price) in each time period with another price to offset the violation of complementary slackness on resource inventory constraints. Numerical experiments against several benchmark algorithms further illustrate the effectiveness of our algorithm.
Updated: 2024-04-06 01:39:51
标题: 盲网络收入管理中的原始-对偶优化需求平衡
摘要: 这篇论文提出了一个实际有效的算法,具有最优理论遗憾,解决了具有未知、非参数需求的经典网络收益管理(NRM)问题。在长度为$T$的时间范围内,每个时间段零售商需要决定$N$种产品的价格,这些产品是基于$M$种资源生产的,并且具有不可补充的初始库存。当需求是非参数的且满足一些温和的假设时,Miao和Wang (2021) 是第一篇提出具有$O(\text{poly}(N,M,\ln(T))\sqrt{T})$类型遗憾的算法的论文(特别是$\tilde O(N^{3.5}\sqrt{T})$,外加在$T\gg N$足够大时为$o(\sqrt{T})$的额外高阶项)。在本文中,我们通过提出一种原始-对偶优化算法改进了先前的结果,这种算法不仅更实用,而且遗憾也改进为$\tilde O(N^{3.25}\sqrt{T})$,不再有额外的高阶项。所提算法的一个关键技术贡献是所谓的需求平衡,该平衡将每个时间段的原始解(即价格)与另一个价格配对,以抵消对资源库存约束的互补松弛条件的违反。与几种基准算法进行的数值实验进一步说明了我们算法的有效性。
更新时间: 2024-04-06 01:39:51
领域: stat.ML,cs.LG
Cybersecurity for Modern Smart Grid against Emerging Threats
Smart Grid is a power grid system that uses digital communication technologies. By deploying intelligent devices throughout the power grid infrastructure, from power generation to consumption, and enabling communication among them, it revolutionizes the modern power grid industry with increased efficiency, reliability, and availability. However, reliance on information and communication technologies has also exposed smart grids to new vulnerabilities and complications that may negatively impact the availability and stability of electricity services, which are vital for people's daily lives. The purpose of this monograph is to provide an up-to-date and comprehensive survey and tutorial on the cybersecurity aspect of smart grids. The book focuses on the sources of cybersecurity issues, the taxonomy of threats, and a survey of various approaches to overcome or mitigate such threats. It covers state-of-the-art research results from recent years, along with remaining open challenges. We hope that this monograph can be used both as learning material for beginners who are embarking on research in this area and as a useful reference for established researchers in this field.
Updated: 2024-04-06 01:31:33
标题: 现代智能电网的网络安全防御新兴威胁
摘要: 智能电网是一种利用数字通信技术的电网系统。通过在整个电网基础设施中部署智能设备,从发电到消费,实现它们之间的通信,它以提高效率、可靠性和可用性的方式革新了现代电网行业。然而,对信息和通信技术的依赖也使智能电网暴露于新的脆弱性和复杂性,可能对人们日常生活至关重要的电力服务的可用性和稳定性产生负面影响。本专著的目的是提供关于智能电网网络安全方面的最新和全面的调查和教程。本书专注于网络安全问题的来源、威胁的分类以及克服或减轻此类威胁的各种方法的调查。它涵盖了近年来的最新研究成果,以及尚未解决的开放性挑战。我们希望本专著既可以作为初学者在这一领域进行研究的学习材料,也可以作为这一领域的资深研究人员的有用参考。
更新时间: 2024-04-06 01:31:33
领域: cs.CR
Approximate Information States for Worst-Case Control and Learning in Uncertain Systems
In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by using the notion of the information state and approximate information state, and introduce conditions to identify an uncertain variable that can be used to compute an optimal strategy through a dynamic program (DP). Next, we relax these conditions and define approximate information states that can be learned from output data without knowledge of system dynamics. We use approximate information states to formulate a DP that yields a strategy with a bounded performance loss. Finally, we illustrate the application of our results in control and reinforcement learning using numerical examples.
Updated: 2024-04-06 00:50:16
标题: 不确定系统中最坏情况控制和学习的近似信息状态
摘要: 在本文中,我们研究了具有部分观测状态的不确定系统中的离散时间决策问题。我们考虑一个非随机模型,其中作用于系统的非受控扰动取值于具有未知分布的有界集合。我们利用信息状态和近似信息状态的概念,提出了此类问题决策的一般框架,并引入了用于识别可通过动态规划(DP)计算最优策略的不确定变量的条件。接下来,我们放宽这些条件,并定义了无需了解系统动力学、可从输出数据中学习的近似信息状态。我们利用近似信息状态构建了一个动态规划,得到具有有界性能损失的策略。最后,我们通过数值例子展示了我们的结果在控制和强化学习中的应用。
更新时间: 2024-04-06 00:50:16
领域: eess.SY,cs.AI,cs.SY,math.OC
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Transformers have emerged as the underpinning architecture for Large Language Models (LLMs). In generative language models, the inference process involves two primary phases: prompt processing and token generation. Token generation, which constitutes the majority of the computational workload, primarily entails vector-matrix multiplications and interactions with the Key-Value (KV) Cache. This phase is constrained by memory bandwidth due to the overhead of transferring weights and KV cache values from the memory system to the computing units. This memory bottleneck becomes particularly pronounced in applications that require long-context and extensive text generation, both of which are increasingly crucial for LLMs. This paper introduces "Keyformer", an innovative inference-time approach, to mitigate the challenges associated with KV cache size and memory bandwidth utilization. Keyformer leverages the observation that approximately 90% of the attention weight in generative inference focuses on a specific subset of tokens, referred to as "key" tokens. Keyformer retains only the key tokens in the KV cache by identifying these crucial tokens using a novel score function. This approach effectively reduces both the KV cache size and memory bandwidth usage without compromising model accuracy. We evaluate Keyformer's performance across three foundational models: GPT-J, Cerebras-GPT, and MPT, which employ various positional embedding algorithms. Our assessment encompasses a variety of tasks, with a particular emphasis on summarization and conversation tasks involving extended contexts. Keyformer's reduction of KV cache reduces inference latency by 2.1x and improves token generation throughput by 2.4x, while preserving the model's accuracy.
Updated: 2024-04-06 00:22:37
标题: Keyformer:通过关键令牌选择减少KV缓存以实现高效的生成推断
摘要: Transformer已经成为大型语言模型(LLMs)的基础架构。在生成式语言模型中,推理过程包括两个主要阶段:提示处理和标记生成。标记生成占据了大部分计算工作量,主要涉及向量矩阵乘法和与关键-值(KV)缓存的交互。由于将权重和KV缓存值从内存系统传输到计算单元的开销,这个阶段受到内存带宽的限制。这种内存瓶颈在需要长上下文和广泛文本生成的应用中特别显著,这两者对于LLMs变得越来越重要。 本文介绍了一种创新的推理时方法“Keyformer”,以减轻与KV缓存大小和内存带宽利用相关的挑战。Keyformer利用一个观察结果,即大约90%的生成式推理中的注意权重集中在一个特定的标记子集上,称为“关键”标记。Keyformer通过使用一种新颖的评分函数识别这些关键标记,仅保留KV缓存中的关键标记。这种方法有效地减少了KV缓存大小和内存带宽使用,而不会影响模型的准确性。我们评估了Keyformer在三个基础模型上的性能:GPT-J、Cerebras-GPT和MPT,这些模型采用不同的位置嵌入算法。我们的评估涵盖了各种任务,特别强调了涉及扩展上下文的摘要和对话任务。Keyformer减少了KV缓存,将推理延迟降低了2.1倍,并将标记生成吞吐量提高了2.4倍,同时保持模型的准确性。
更新时间: 2024-04-06 00:22:37
领域: cs.LG,cs.AI,cs.AR,cs.CL,68U35,I.2.7; C.0
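The cache-pruning idea above can be sketched in a few lines: score each cached position by the attention mass it has received and keep only the top-k entries of the KV cache. Keyformer's actual score function (and its regularization) differs; the shapes below are illustrative.

```python
import torch

def prune_kv_cache(k_cache, v_cache, attn_weights, keep=256):
    """
    k_cache, v_cache: (seq, heads, dim)
    attn_weights: (queries, seq) attention received by each cached token
    """
    score = attn_weights.sum(dim=0)                 # accumulated attention
    keep = min(keep, score.numel())
    idx = score.topk(keep).indices.sort().values    # preserve token order
    return k_cache[idx], v_cache[idx], idx

seq, heads, dim = 1024, 8, 64
k, v = torch.randn(seq, heads, dim), torch.randn(seq, heads, dim)
attn = torch.rand(32, seq)
k_small, v_small, kept = prune_kv_cache(k, v, attn, keep=256)
print(k_small.shape)   # torch.Size([256, 8, 64])
```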
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models. Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers. This approach enables higher-quality text-to-image generations and provides user control over stylistic features via constraint set specification. We demonstrate the utility of our framework by creating an interactive application for prompt enhancement and image generation using Stable Diffusion. Additionally, we conduct experiments utilizing a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that result in superior image quality. We make our code and a screencast video demo of NeuroPrompts publicly available.
Updated: 2024-04-06 00:17:01
标题: 神经提示:一种用于优化文本到图像生成的提示的自适应框架
摘要: 尽管文本到图像扩散模型最近取得了令人印象深刻的进展,但要获得高质量图像,通常仍需要由掌握相关专业知识的人工进行提示工程(prompt engineering)。在这项工作中,我们提出了NeuroPrompts,一个自适应框架,可以自动增强用户的提示,以提高文本到图像模型生成结果的质量。我们的框架利用一个经过调整、能生成类似人类提示工程师所写提示的预训练语言模型进行受限文本解码。这种方法可以产生更高质量的文本到图像生成结果,并通过约束集规范让用户控制风格特征。我们通过使用Stable Diffusion创建一个用于提示增强和图像生成的交互式应用程序,展示了我们框架的实用性。此外,我们利用一个由人工设计的文本到图像提示组成的大型数据集进行了实验,并表明我们的方法自动生成的增强提示能带来更优的图像质量。我们公开提供NeuroPrompts的代码和一个屏幕录像演示视频。
更新时间: 2024-04-06 00:17:01
领域: cs.AI
Beyond the Known: Adversarial Autoencoders in Novelty Detection
In novelty detection, the goal is to decide if a new data point should be categorized as an inlier or an outlier, given a training dataset that primarily captures the inlier distribution. Recent approaches typically use deep encoder and decoder network frameworks to derive a reconstruction error, and employ this error either to determine a novelty score, or as the basis for a one-class classifier. In this research, we use a similar framework but with a lightweight deep network, and we adopt a probabilistic score with reconstruction error. Our methodology calculates the probability of whether the sample comes from the inlier distribution or not. This work makes two key contributions. The first is that we compute the novelty probability by linearizing the manifold that holds the structure of the inlier distribution. This allows us to interpret how the probability is distributed and can be determined in relation to the local coordinates of the manifold tangent space. The second contribution is that we improve the training protocol for the network. Our results indicate that our approach is effective at learning the target class, and it outperforms recent state-of-the-art methods on several benchmark datasets.
Updated: 2024-04-06 00:04:19
标题: 超越已知:对抗自动编码器在新颖性检测中的应用
摘要: 在新颖性检测中,目标是在主要捕捉内点分布的训练数据集的基础上,决定新数据点是否应被分类为内点或异常值。最近的方法通常使用深度编码器和解码器网络框架来推导重构错误,并将此错误用于确定新颖性分数,或作为单类分类器的基础。在这项研究中,我们使用类似的框架,但采用轻量级深度网络,并采用具有重构错误的概率分数。我们的方法计算样本来自内点分布还是不来自内点分布的概率。这项工作有两个关键贡献。第一是通过线性化容纳内点分布结构的流形来计算新颖性概率。这使我们能够解释概率如何分布,并且可以与流形切空间的局部坐标相关确定。第二个贡献是改进网络的训练协议。我们的结果表明,我们的方法在学习目标类别方面是有效的,并且在几个基准数据集上优于最近的最先进方法。
更新时间: 2024-04-06 00:04:19
领域: cs.CV,cs.AI,cs.LG
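The reconstruction-error scoring that the entry above builds on is simple to sketch: train a lightweight autoencoder on inliers, fit a Gaussian to the inlier errors, and report how far a new point's error deviates. The adversarial training and the manifold-linearized probability from the paper are not reproduced; data and sizes below are stand-ins.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(                  # lightweight encoder/decoder
    nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 32))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

inliers = torch.randn(2048, 32)      # stand-in for inlier training data
for _ in range(200):
    loss = ((ae(inliers) - inliers) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                # Gaussian fit to inlier errors
    err = ((ae(inliers) - inliers) ** 2).mean(dim=1)
    mu, sigma = err.mean(), err.std()

def novelty_score(x):
    """Larger = more novel: z-score of reconstruction error vs inliers."""
    with torch.no_grad():
        e = ((ae(x) - x) ** 2).mean(dim=1)
    return (e - mu) / sigma

print(novelty_score(torch.randn(4, 32) * 3))
```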