[Literature Survey] Applications of Search and Learning in Artificial Intelligence Research


vlln · Published 2025-05-05 17:46:22

Applications of Search and Learning in Artificial Intelligence Research (2018 to Present)

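1. Introduction

Solving complex AI problems, especially those with vast search spaces and a need for efficient decision-making, has long been a core challenge. In response, a synergistic approach that combines the strengths of explicit search and machine learning has emerged: the "search + learning" paradigm. It uses explicit search algorithms to explore the problem space systematically and generate high-quality data. That data is then used to train machine learning models, neural networks in particular. The trained network serves as a learned heuristic function or policy that guides subsequent search and makes it more efficient and effective. In recent years, as search methods and machine learning techniques (deep learning especially) have advanced, the paradigm has grown in relevance and influence across AI. This report reviews the main advances in "search + learning" research published from 2018 to the present, covering key methods, applications, and future directions. It first introduces the basic principles of the paradigm, then discusses neural networks for learning search heuristics, then examines applications in neural combinatorial optimization, planning, and robotics, and highlights AlphaZero and MuZero, which achieved breakthroughs in game AI. It also surveys open-source tools and frameworks that support research and development, and closes with current challenges and future research directions.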

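2. Basic Principles of the "Search + Learning" Paradigm

The "search + learning" paradigm combines explicit search algorithms with machine learning techniques to solve complex AI problems¹. Its core idea is to use the search process to produce valuable data, then use that data to train machine learning models that make future searches more efficient and effective. Explicit search algorithms such as A* search, branch and bound, and Monte Carlo tree search (MCTS) explore the problem space systematically to find solutions or gather information⁴. The data produced along the way, such as search trajectories, state evaluations, and optimal solutions to subproblems, can serve as training data for machine learning models, neural networks in particular¹. Once trained, these networks act as learned heuristic functions or policies that guide subsequent search toward high-quality solutions faster and more effectively¹.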
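The loop described above (search produces data, a model is fit to it, the model guides the next search) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the open 6x6 grid, the helper names `a_star`, `manhattan`, and `fit_scale`, and above all the one-weight least-squares model, which stands in for the neural network a real system would train. A learned heuristic is not guaranteed to be admissible in general; on this toy grid the fitted weight happens to recover the exact cost-to-go.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """A* on a unit-cost graph; returns (path, number of expanded nodes)."""
    # Heap entries are (f, -g, node): ties on f prefer deeper nodes.
    frontier = [(heuristic(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    closed = set()
    expansions = 0
    while frontier:
        _, neg_g, node = heapq.heappop(frontier)
        if node in closed:
            continue  # stale heap entry
        closed.add(node)
        expansions += 1
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], expansions
        for nxt in neighbors(node):
            if nxt in closed:
                continue
            new_g = -neg_g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(frontier, (new_g + heuristic(nxt), -new_g, nxt))
    return None, expansions

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def fit_scale(samples):
    """Least-squares fit of cost_to_go ~ w * feature (one weight, no bias)."""
    num = sum(x * y for x, y in samples)
    den = sum(x * x for x, _ in samples)
    return num / den if den else 0.0

SIZE = 6  # hypothetical open grid, no obstacles
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)

def neighbors(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield (nx, ny)

# Phase 1: uninformed search (heuristic = 0) solves the task and yields data.
blind_path, blind_expansions = a_star(START, GOAL, neighbors, lambda s: 0)
samples = [(manhattan(s, GOAL), len(blind_path) - 1 - i)
           for i, s in enumerate(blind_path)]

# Phase 2: fit the stand-in model on (feature, cost-to-go) pairs.
w = fit_scale(samples)

# Phase 3: the fitted model guides the next search.
guided_path, guided_expansions = a_star(
    START, GOAL, neighbors, lambda s: w * manhattan(s, GOAL))
print(blind_expansions, guided_expansions)
```

On this toy problem the guided search returns a path of the same length while expanding fewer nodes, which is the effect the paradigm aims for at much larger scale.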

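Compared with traditional approaches, the paradigm offers clear advantages. In contrast to hand-designed heuristics, which demand extensive domain knowledge and trial and error, "search + learning" provides a way to discover effective heuristics automatically⁴. And compared with end-to-end learning methods that map inputs directly to outputs without explicit search, it brings structured exploration and reasoning into the learning process, and so performs better on problems that require sequential decision-making or exploration of huge state spaces¹.

The paradigm automates heuristic design and may yield heuristics that are more effective and more general than those hand-crafted by human experts. Designing a heuristic function by hand takes deep expertise and a great deal of time⁴. By using search to generate data about the problem space and then training a neural network to learn the patterns in that data, heuristics can be created automatically for a specific problem domain, possibly uncovering strategies that human experts would overlook. Many of the cited works stress both the cost of hand-designed heuristics and the potential of automation⁴.

Combining learning with search can also overcome the limitations of purely data-driven methods by adding structured exploration and reasoning. Methods that rely on end-to-end learning alone can struggle with problems that require sequential decision-making or exploration of vast state spaces. By embedding search within the learning framework, "search + learning" lets a model learn not only a direct mapping but also a strategy for exploring the problem space, which produces more robust and effective solutions. This shows up in the cited works that compare "search + learning" against purely greedy search or local search algorithms¹.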

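3. Learning Search Heuristics with Neural Networks

A variety of machine learning techniques are used to train neural networks that guide search. Reinforcement learning (RL) treats search as a sequential decision process: an agent (the search algorithm) interacts with an environment (the problem space) and learns a policy (the heuristic function) that maximizes reward, for example finding a high-quality solution quickly¹. The GLSEARCH framework, for instance, uses a deep Q-network (DQN) to learn a search policy for maximum common subgraph detection². Policy gradient methods such as REINFORCE have also been used to train policies that guide local search operators³.

Imitation learning is another important approach. It trains a neural network to imitate an expert search strategy or the behavior of optimal search trajectories¹. One example is the Path Heuristic with Imitation Learning (PHIL) framework, used to discover graph search heuristics⁴. Retrospective imitation learning, in turn, refers to a policy that learns by looking back at its own successful searches and constructing improved trajectories from them⁵.

In some cases supervised learning is applied as well: networks are trained on datasets of problem instances paired with their optimal solutions or heuristic values, and predict search guidance directly¹.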

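Several specific neural architectures are used for learning search heuristics. Graph neural networks (GNNs) are very effective at learning representations of graph-structured problems and using them to guide search, for example in community detection¹², MCS detection², and combinatorial optimization¹³. Pointer networks show promise on combinatorial optimization problems such as the traveling salesman problem, where they guide search by learning to select elements from the input sequence¹⁰. Transformer networks are also used to learn complex relationships in search data and generate effective search strategies¹⁴.

The choice of learning paradigm (reinforcement, imitation, or supervised learning) for training a search heuristic depends largely on the availability of expert data, the nature of the reward signal, and the amount of exploration required. If optimal solutions or expert search trajectories exist, imitation or supervised learning may be more direct. If rewards are sparse but well defined (reaching a goal state, for example), reinforcement learning may be more effective because it learns by trial and error. The need for exploration in the search space also matters, and reinforcement learning is generally better suited to discovering new strategies.

GNNs have become a powerful architecture for learning search heuristics on problems with an underlying graph structure, because they let the model exploit the relational information inherent in the problem. Many search problems, including those in combinatorial optimization, planning, and network analysis, are naturally represented as graphs. GNNs are designed to process graph data and learn node and edge embeddings that capture the problem's structure. The learned heuristic is thus informed by the relationships between different parts of the problem, which makes its search guidance more effective².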
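To make the idea of node embeddings that absorb relational information concrete, here is a minimal sketch of one round of neighborhood aggregation, the basic operation behind most GNN layers. It is an illustration under stated assumptions, not any cited model: the function name `message_passing_round`, the three-node path graph, and the fixed 0.5/0.5 mixing weights are all hypothetical, and a real GNN would replace the fixed weights with learned parameters and a nonlinearity.

```python
def message_passing_round(adjacency, features):
    """Mix each node's feature vector with the mean of its neighbors' vectors."""
    updated = {}
    for node, nbrs in adjacency.items():
        own = features[node]
        if nbrs:
            # Mean aggregation over neighbor features, one dimension at a time.
            agg = [sum(features[n][k] for n in nbrs) / len(nbrs)
                   for k in range(len(own))]
        else:
            agg = [0.0] * len(own)
        # Fixed mixing weights stand in for learned parameters (assumption).
        updated[node] = [0.5 * o + 0.5 * a for o, a in zip(own, agg)]
    return updated

# Hypothetical path graph a - b - c; only node "a" starts with a signal.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0], "b": [0.0], "c": [0.0]}

round1 = message_passing_round(adjacency, features)
round2 = message_passing_round(adjacency, round1)
print(round1["c"], round2["c"])
```

After one round, node `c` has seen nothing from `a`; after two rounds it has. Stacking rounds is how an embedding comes to reflect structure several hops away, which is what a search heuristic built on top of it can exploit.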

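4. Neural Combinatorial Optimization via "Search + Learning"

The "search + learning" paradigm has been widely applied to NP-hard combinatorial optimization problems. The traveling salesman problem (TSP) and the vehicle routing problem (VRP) are prominent examples, where neural networks learn heuristics that guide a search algorithm or construct solutions directly with a built-in search mechanism⁹.

The large neighborhood search (LNS) framework is another application: a neural network learns to decompose the problem into smaller subproblems that an existing solver can handle more efficiently¹. The GLSEARCH framework for maximum common subgraph (MCS) detection uses a GNN-based DQN to guide a branch-and-bound search².

Neural networks are also combined with search algorithms such as branch and bound². In these methods the network learns to select branching variables or nodes so that the search runs more efficiently.

"Search + learning" offers a promising way to handle the inherent complexity of combinatorial optimization. It often beats traditional heuristics on solution quality and running time, and can even compete with commercial solvers. Methods such as LNS with learned decomposition¹ and GLSEARCH² have achieved state-of-the-art results on challenging combinatorial problems, which demonstrates the power of combining search with learned heuristics. This suggests that the paradigm can deliver practical solutions to real-world optimization problems where traditional methods fall short.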
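The decomposition-based LNS idea can be sketched as follows: split the variables into blocks, re-optimize one block at a time with the rest held fixed, and keep any improvement. This is a toy illustration under several assumptions. The knapsack instance, the names `cost`, `random_decomposition`, and `lns`, and the brute-force enumeration that plays the role of an existing solver are all hypothetical; the random partition is a placeholder for the decomposition a trained network would propose.

```python
import itertools
import random

# Hypothetical 0/1 knapsack instance used only for illustration.
VALUES = [6, 5, 8, 9, 6, 7, 3, 4]
WEIGHTS = [2, 3, 6, 7, 5, 9, 4, 3]
CAPACITY = 15

def cost(x):
    """Negative knapsack value; infeasible assignments cost +infinity."""
    if sum(w for w, xi in zip(WEIGHTS, x) if xi) > CAPACITY:
        return float("inf")
    return -sum(v for v, xi in zip(VALUES, x) if xi)

def random_decomposition(n, block_size, rng):
    """Placeholder for a learned decomposition: a random partition into blocks."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i:i + block_size] for i in range(0, n, block_size)]

def lns(x, decompose, rounds=5):
    """Re-optimize one block of variables at a time, keeping the rest fixed."""
    x = list(x)
    best = cost(x)
    for _ in range(rounds):
        for block in decompose(len(x)):
            # Brute-force enumeration stands in for calling a real solver.
            for values in itertools.product((0, 1), repeat=len(block)):
                candidate = list(x)
                for i, v in zip(block, values):
                    candidate[i] = v
                c = cost(candidate)
                if c < best:
                    x, best = candidate, c
    return x, best

rng = random.Random(0)
solution, best_cost = lns([0] * len(VALUES),
                          lambda n: random_decomposition(n, 3, rng))
print(solution, -best_cost)
```

Because a candidate is accepted only when it strictly lowers the cost, the incumbent never gets worse. How good the final solution is depends on which variables end up in the same block, and that choice is exactly what the learned component is meant to improve.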

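The trend in neural combinatorial optimization is to integrate learning more deeply into the search process rather than using neural networks only as end-to-end solution predictors. Early work focused on training networks to output solutions to combinatorial problems directly. More recent work emphasizes the benefit of using networks to guide the search itself, for example by learning branching policies, node selection policies, or problem decompositions. The model can then draw on the strengths of both learning and structured search.

Table 1: Key "search + learning" methods in neural combinatorial optimization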

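Method	Combinatorial optimization problem	Integrated search algorithm	Main advantages	Reference
LNS with Learning	Integer linear programming (ILP)	Large neighborhood search	Learns a decomposition policy so that existing solvers can handle large ILPs efficiently; reported to outperform commercial solvers	¹
GLSEARCH	Maximum common subgraph (MCS)	Branch and bound	Learns a node selection heuristic with a GNN-based DQN, speeds up MCS detection, and finds larger common subgraphs than existing solvers	²
Pointer Networks	Traveling salesman problem (TSP), vehicle routing problem (VRP)	Sequence construction (search-like)	Uses an attention mechanism to output the solution sequence directly, optimized with reinforcement learning; near-optimal on problems up to a certain size	¹⁰
LEHD	Traveling salesman problem (TSP), vehicle routing problem (VRP)	Sequence construction (search-like)	Light encoder and heavy decoder architecture with strong generalization; handles large instances (up to 1000 nodes) and performs well on real-world datasets	³²
Learning Branching	Mixed-integer programming (MIP)	Branch and bound	Learns a variable selection policy that imitates the expert strong branching rule; beats expert-designed rules and existing machine learning methods on solving time and search tree size	¹⁰
Neural A*	Path planning	A* search	Proposes a differentiable A* search so the whole model, search step included, can be trained by backpropagating a loss; balances search optimality and efficiency well and can predict realistic human trajectories	⁴⁵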

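5. "Search + Learning" in Planning and Robotics

Learned heuristics are also widely used to improve planning algorithms in robotics and AI. Imitation learning is used to train policies that guide search in path planning problems⁴. Neural networks also learn value functions or cost maps that search algorithms such as A* can use to find efficient paths⁴. "Search + learning" has further been applied to more complex robotic tasks such as task and motion planning⁵⁴. In multi-agent path planning, learning helps coordinate the search across agents⁴⁴. Neural networks are also used to learn effective search strategies in complex, high-dimensional state spaces, which is closely tied to robotics⁴⁶.

Learning to search is especially beneficial in robotics and planning, where state spaces can be enormous and dynamic, making traditional search computationally expensive or infeasible. A learned heuristic steers the search toward promising regions of the state space, which sharply reduces the number of states explored and enables faster, more efficient planning in complex environments. This has been emphasized in the context of UAV flight planning and multi-agent systems⁴.

Imitation learning from expert demonstrations is a common way to learn search heuristics in robotics and planning. It lets a robot learn complex navigation and manipulation strategies by observing successful behavior. Given expert demonstrations of how to solve a planning problem, the learning algorithm can infer the desired behavior and learn a cost function or policy that reproduces it. Work using LEARCH⁴⁹ and other imitation learning methods for path planning illustrates this⁴.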

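6. Advances in Game AI: AlphaZero and MuZero

The breakthrough results of AlphaZero and MuZero are canonical examples of the "search + learning" paradigm⁶. Both algorithms plan with Monte Carlo tree search (MCTS) and use deep reinforcement learning to learn a strong evaluation function (the value network) and a move-prediction policy (the policy network). AlphaZero learned entirely through self-play, with no human knowledge beyond the game rules, and reached superhuman level in Go, chess, and shogi. MuZero went further: besides the policy and value functions, it also learns a model of the environment (including dynamics and rewards), which lets it master Go, chess, shogi, and Atari games without being given the rules in advance.

These algorithms learn to search effectively by balancing exploration and exploitation within the MCTS framework, guided by the learned networks. AlphaZero and MuZero have had a major impact on AI, demonstrating the potential of combining search and learning to solve complex problems in a general way. Open-source implementations and frameworks such as MiniZero, RLZero, and MuZero General have supported further research on and application of these ideas⁷⁵.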
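The balance between exploration and exploitation inside MCTS is commonly expressed through a PUCT-style selection rule, in which each child is scored as Q(s, a) + c · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)). The sketch below shows that basic form only. The function name `puct_select`, the constant `c_puct=1.5`, and the example numbers are assumptions for illustration; published implementations differ in details such as how the exploration constant is scheduled.

```python
import math

def puct_select(priors, visit_counts, q_values, c_puct=1.5):
    """Return the index of the child with the highest PUCT score.

    priors       : policy-network probabilities P(s, a)
    visit_counts : visit counts N(s, a) accumulated by the search so far
    q_values     : mean action values Q(s, a) backed up from simulations
    """
    total_visits = sum(visit_counts)
    best_action, best_score = None, -math.inf
    for action, (p, n, q) in enumerate(zip(priors, visit_counts, q_values)):
        # The exploration bonus favors high-prior, rarely visited children.
        # With zero total visits the bonus is 0 and Q alone decides.
        bonus = c_puct * p * math.sqrt(total_visits) / (1 + n)
        score = q + bonus
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Hypothetical statistics at one tree node with three legal moves.
choice = puct_select(priors=[0.6, 0.3, 0.1],
                     visit_counts=[10, 1, 0],
                     q_values=[0.2, 0.1, 0.0])
print(choice)
```

In this example the most-visited move has the highest value estimate, yet the rule picks the second move because its prior is sizeable and it has barely been explored. As visit counts grow the bonus shrinks and the value estimates dominate.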

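AlphaZero and MuZero show how the "search + learning" paradigm can reach superhuman performance in complex domains by iteratively improving the search policy and the evaluation function through self-play. Their success rests on a strong feedback loop: better search yields better training data, which yields a better learned model, which further improves the search. This iterative process lets the algorithms discover sophisticated strategies that human experts might not recognize right away.

MuZero's ability to learn an environment model opens the door to applying "search + learning" to a wider range of real-world problems in which the underlying dynamics are unknown or hard to model explicitly. By learning to predict future states and rewards, MuZero removes the need for a perfect simulator, which AlphaZero required. This greatly extends the applicability of the paradigm to areas such as robotics, industrial control, and scientific discovery, where learning through interaction with the real environment is essential.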
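Planning with a learned model instead of a simulator can be sketched with three functions in the spirit of MuZero: a representation function that maps an observation to a hidden state, a dynamics function that predicts the next hidden state and reward, and a prediction function that outputs a policy and a value. The code below is a toy under loud assumptions: the three lambdas are hand-written stand-ins for trained networks, the one-dimensional task is made up, and the exhaustive enumeration of two-step plans replaces the MCTS that MuZero actually runs over hidden states.

```python
import itertools

def rollout_return(observation, actions, represent, dynamics, predict,
                   discount=1.0):
    """Score an action sequence using only the learned model (no simulator)."""
    state = represent(observation)
    total, scale = 0.0, 1.0
    for action in actions:
        state, reward = dynamics(state, action)  # predicted, not observed
        total += scale * reward
        scale *= discount
    _policy, value = predict(state)  # bootstrap with the predicted value
    return total + scale * value

# Hand-written stand-ins for the three learned functions (assumptions).
# Toy task: move a number toward zero; reward is the negative distance.
represent = lambda obs: float(obs)
dynamics = lambda s, a: (s + a, -abs(s + a))
predict = lambda s: ([0.5, 0.5], -abs(s))

plans = list(itertools.product((-1, 1), repeat=2))
best_plan = max(plans,
                key=lambda p: rollout_return(2, p, represent, dynamics, predict))
print(best_plan, rollout_return(2, best_plan, represent, dynamics, predict))
```

Every quantity used to compare plans comes from the model's own predictions. The environment is never queried during planning, which is why this style of approach can apply where no exact simulator exists.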

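7. Open-Source Tools and Frameworks

A number of valuable open-source libraries and frameworks support research and development under the "search + learning" paradigm. MiniZero supports AlphaZero, MuZero, and their Gumbel variants, and is compatible with several game environments⁷⁵. RLZero offers a clean, easy-to-follow implementation of MuZero, AlphaZero, and self-play reinforcement learning algorithms for any game⁷⁸. MuZero General is a commented and well-documented implementation of MuZero based on the original paper, designed to adapt easily to new games and reinforcement learning environments⁸¹. NCOLib aims to simplify the application of neural network models and deep learning algorithms to combinatorial optimization problems⁸⁶. ai4co (RL4CO) is a PyTorch library dedicated to reinforcement learning for combinatorial optimization⁸⁷.

These tools cover different algorithms, game environments, neural network architectures, and training pipelines. By making complex algorithms more accessible and by supporting experimentation and reproducibility, they contribute to research progress in the field and lower the barrier to entry for researchers and practitioners. They provide tested and documented codebases that can be used as they are or adapted to new problems, which encourages collaboration and speeds up innovation.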

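8. Challenges and Future Directions

The "search + learning" paradigm still faces limitations and challenges. One is generalization: how to train learned heuristics that generalize well to unseen problem instances or larger problem sizes¹⁴. Another is scalability: existing methods have limits when handling very large problems, especially in combinatorial optimization and planning²⁸. In addition, the difficulty of understanding the heuristics and decision processes that neural networks learn restricts their adoption in critical applications³¹. Finally, training these models, particularly algorithms such as AlphaZero and MuZero, requires large amounts of compute⁷⁰.

Future research directions include developing more general and scalable methods for learning search heuristics, improving the interpretability and explainability of neural-guided search, exploring more efficient training techniques and transfer learning, studying the integration of "search + learning" with emerging AI techniques such as neuro-symbolic AI⁸⁸ and large language models¹⁵, and applying "search + learning" to new and challenging real-world problems across domains.

Addressing generalization, scalability, interpretability, and computational cost is essential for broader adoption of the paradigm in real-world applications. "Search + learning" has shown great potential, but overcoming these limitations is the key to deploying it in practice. Better generalization, for example, would let a model trained on small instances solve larger ones, while better interpretability would increase trust and help with debugging and improving learned policies.

Integrating "search + learning" with other advanced AI techniques, such as neuro-symbolic methods and large language models, holds great promise for building more powerful and general AI systems capable of complex reasoning and problem solving. Combining the data-driven learning of neural networks with the symbolic reasoning of knowledge-based systems, or with the language understanding and generation abilities of large language models, could lead to breakthroughs in automated reasoning, planning from natural-language instructions, and the development of more human-like AI agents.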

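9. Conclusion

Since 2018, the "search + learning" paradigm has made notable progress in AI research. As a powerful approach that solves complex problems by combining the strengths of explicit search and machine learning, it has become increasingly important. By automating heuristic design, achieving superhuman performance, and tackling problems that were previously intractable, it is driving progress across the field. Challenges remain, but continued research and development point toward more intelligent, more efficient, and more general systems.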

Works Cited

1. proceedings.neurips.cc, accessed May 5, 2025, https://proceedings.neurips.cc/paper_files/paper/2020/file/e769e03a9d329b2e864b4bf4ff54ff39-Paper.pdf
2. GLSearch: Maximum Common Subgraph Detection via … - NSF-PAR, accessed May 5, 2025, https://par.nsf.gov/servlets/purl/10259989
3. proceedings.neurips.cc, accessed May 5, 2025, https://proceedings.neurips.cc/paper/2021/file/fc9e62695def29ccdb9eb3fed5b4c8c8-Paper.pdf
4. Learning Graph Search Heuristics - OpenReview, accessed May 5, 2025, https://openreview.net/pdf?id=-xjStp_F9o
5. Learning to Search via Retrospective Imitation - arXiv, accessed May 5, 2025, https://arxiv.org/pdf/1804.00846
6. MuZero - Wikipedia, accessed May 5, 2025, https://en.wikipedia.org/wiki/MuZero
7. www.ifaamas.org, accessed May 5, 2025, https://www.ifaamas.org/Proceedings/aamas2022/pdfs/p1074.pdf
8. GLSearch: Maximum Common Subgraph Detection via Learning to Search - UCLA Computer Science Department, accessed May 5, 2025, https://web.cs.ucla.edu/~yzsun/papers/2021_ICML_GLSearch.pdf
9. Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization, accessed May 5, 2025, https://proceedings.neurips.cc/paper_files/paper/2023/file/97b983c974551153d20ddfabb62a5203-Paper-Conference.pdf
10. Learning to Branch in Combinatorial Optimization with Graph Pointer Networks - arXiv, accessed May 5, 2025, https://arxiv.org/html/2307.01434v1
11. Learning to Branch in Combinatorial Optimization With Graph Pointer Networks, accessed May 5, 2025, https://www.ieee-jas.net/article/doi/10.1109/JAS.2023.124113
12. Jianyong Sun - Google Scholar, accessed May 5, 2025, https://scholar.google.co.uk/citations?user=2FGZtCMAAAAJ&hl=en
13. CSCI 699: Topics in Discrete Optimization and Learning - USC Search, accessed May 5, 2025, https://web-app.usc.edu/soc/syllabus/20201/30126.pdf
14. Neural Solver Selection for Combinatorial Optimization - arXiv, accessed May 5, 2025, https://arxiv.org/html/2410.09693
15. arXiv:2404.03683v1 [cs.LG] 1 Apr 2024, accessed May 5, 2025, https://arxiv.org/pdf/2404.03683?
16. Kevin Tierney - Google Scholar, accessed May 5, 2025, https://scholar.google.com/citations?user=G-EGfLEAAAAJ&hl=en
17. Shengcai Liu - Google Scholar, accessed May 5, 2025, https://scholar.google.com/citations?user=tV0nV3oAAAAJ&hl=en
18. arxiv.org, accessed May 5, 2025, https://arxiv.org/pdf/2003.03600
19. (PDF) Learning Combinatorial Optimization on Graphs: A Survey …, accessed May 5, 2025, https://www.researchgate.net/publication/342460990_Learning_Combinatorial_Optimization_on_Graphs_A_Survey_With_Applications_to_Networking
20. scholar.harvard.edu, accessed May 5, 2025, https://scholar.harvard.edu/files/ctang/files/1-s2.0-s0950705121007887-main.pdf
21. End-to-End Constrained Optimization Learning: A Survey, accessed May 5, 2025, https://web.ecs.syr.edu/~ffiorett/files/papers/ijcai21a.pdf
22. www.ijcai.org, accessed May 5, 2025, https://www.ijcai.org/proceedings/2021/0610.pdf
23. Neural Combinatorial Optimization: a New Player in the Field - arXiv, accessed May 5, 2025, https://arxiv.org/pdf/2205.01356
24. Learning to Branch in Combinatorial Optimization With Graph Pointer Networks - SciEngine, accessed May 5, 2025, https://www.sciengine.com/doi/pdfView/907505DE38B54DEFAEAB3D54DC5C732B
25. Learning Combinatorial Optimization Algorithms over Graphs, accessed May 5, 2025, https://ics.uci.edu/~dechter/courses/ics-295/winter-2018/papers/nips/comb-opt_rl_combopt.pd.pdf
26. Learning to Branch in Combinatorial Optimization with Graph Pointer Networks - arXiv, accessed May 5, 2025, https://arxiv.org/pdf/2307.01434
27. A review on learning to solve combinatorial optimisation problems in manufacturing, accessed May 5, 2025, https://www.researchgate.net/publication/368991066_A_review_on_learning_to_solve_combinatorial_optimisation_problems_in_manufacturing
28. Neural Combinatorial Optimization for Real-World Routing - arXiv, accessed May 5, 2025, https://arxiv.org/html/2503.16159
29. View of Neural Combinatorial Optimization on Heterogeneous Graphs: An Application to the Picker Routing Problem in Mixed-shelves Warehouses, accessed May 5, 2025, https://ojs.aaai.org/index.php/ICAPS/article/view/31494/33654
30. Neural Combinatorial Optimization on Heterogeneous Graphs. An Application to the Picker Routing Problem in Mixed-shelves Warehouses | OpenReview, accessed May 5, 2025, https://openreview.net/forum?id=BL0DDUfSzk
31. Unveiling Neural Combinatorial Optimization Model Representations Through Probing, accessed May 5, 2025, https://openreview.net/forum?id=agEy9hliY1
32. NeurIPS Poster Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization, accessed May 5, 2025, https://neurips.cc/virtual/2023/poster/71671
33. UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model - arXiv, accessed May 5, 2025, https://arxiv.org/html/2408.12214v1
34. NeurIPS Poster BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization, accessed May 5, 2025, https://nips.cc/virtual/2023/poster/72480
35. Reducing the Costs to Design, Train, and Collect Data for Neural Networks with Combinatorial Optimization, accessed May 5, 2025, https://kilthub.cmu.edu/articles/thesis/Reducing_the_Costs_to_Design_Train_and_Collect_Data_for_Neural_Networks_with_Combinatorial_Optimization/24774432
36. Neural Combinatorial Optimization: a New Player in the Field - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/360353780_Neural_Combinatorial_Optimization_a_New_Player_in_the_Field
37. Neural Combinatorial Optimization - Chaitanya K. Joshi, accessed May 5, 2025, https://www.chaitjo.com/post/neural-combinatorial-optimization/
38. Rintarooo/TSP_DRL_PtrNet: “Neural Combinatorial Optimization with Reinforcement Learning”[Bello+, 2016], Traveling Salesman Problem solver - GitHub, accessed May 5, 2025, https://github.com/Rintarooo/TSP_DRL_PtrNet
39. pemami4911/neural-combinatorial-rl-pytorch: PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning https://arxiv.org/abs/1611.09940 - GitHub, accessed May 5, 2025, https://github.com/pemami4911/neural-combinatorial-rl-pytorch
40. Learning to Search in Local Branching - AAAI, accessed May 5, 2025, https://cdn.aaai.org/ojs/20294/20294-13-24307-1-2-20220628.pdf
41. (PDF) Learning to Search in Local Branching - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/356817599_Learning_to_Search_in_Local_Branching
42. A General Large Neighborhood Search Framework for Solving Integer Linear Programs, accessed May 5, 2025, https://proceedings.neurips.cc/paper/2020/file/e769e03a9d329b2e864b4bf4ff54ff39-Paper.pdf
43. Learning to Search in Branch-and-Bound Algorithms - He He, accessed May 5, 2025, https://hhexiy.github.io/docs/papers/ilp-bb.pdf
44. Learning Node-Selection Strategies in Bounded-Suboptimal Conflict-Based Search for Multi-Agent Path Finding, accessed May 5, 2025, https://idm-lab.org/bib/abstracts/papers/aamas21b.pdf
45. Path Planning using Neural A* Search - Proceedings of Machine Learning Research, accessed May 5, 2025, http://proceedings.mlr.press/v139/yonetani21a/yonetani21a.pdf
46. Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees | OpenReview, accessed May 5, 2025, https://openreview.net/forum?id=rJgJDAVKvB
47. (PDF) Learning UAV-based path planning for efficient localization of objects using prior knowledge - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/387105781_Learning_UAV-based_path_planning_for_efficient_localization_of_objects_using_prior_knowledge
48. Soft Value Iteration Networks for Planetary Rover Path Planning - OpenReview, accessed May 5, 2025, https://openreview.net/forum?id=Sktm4zWRb
49. DhruvaKumar/path-planning-imitation-learning - GitHub, accessed May 5, 2025, https://github.com/DhruvaKumar/path-planning-imitation-learning
50. A curated list of awesome imitation learning resources and publications - GitHub, accessed May 5, 2025, https://github.com/kristery/Awesome-Imitation-Learning
51. apexrl/Imitation-Learning-Paper-Lists - GitHub, accessed May 5, 2025, https://github.com/apexrl/Imitation-Learning-Paper-Lists
52. Value Function Learning via Prolonged Backward Heuristic Search - PRL Workshop Series, accessed May 5, 2025, https://prl-theworkshop.github.io/prl2023-icaps/papers/value-function-learning.pdf
53. Learning to Search with MCTSnets | David Silver, accessed May 5, 2025, https://www.davidsilver.uk/wp-content/uploads/2020/03/mctsnet.pdf
54. Learning to Search in Task and Motion Planning With Streams - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/368274638_Learning_to_Search_in_Task_and_Motion_Planning_with_Streams
55. [2111.13144] Learning to Search in Task and Motion Planning with Streams - arXiv, accessed May 5, 2025, https://arxiv.org/abs/2111.13144
56. Learning to Search in Task and Motion Planning with Streams | Papers With Code, accessed May 5, 2025, https://paperswithcode.com/paper/learning-to-search-in-task-and-motion
57. Multiagent Cooperative Search Learning With Intermittent Communication - IEEE Computer Society, accessed May 5, 2025, https://www.computer.org/csdl/magazine/ex/2024/02/10382961/1TxRMrUgjYc
58. Knowledge-Guided Reinforcement Learning with Artificial Potential Field-Based Demonstrations for Multi-Autonomous Underwater Vehicle Cooperative Hunting - MDPI, accessed May 5, 2025, https://www.mdpi.com/2077-1312/13/3/423
59. Student of Games: A unified learning algorithm for both perfect and imperfect information games | Request PDF - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/375669946_Student_of_Games_A_unified_learning_algorithm_for_both_perfect_and_imperfect_information_games
60. (PDF) GameTable Working Group 1 meeting report on search, planning, learning, and explainability - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/382727758_GameTable_Working_Group_1_meeting_report_on_search_planning_learning_and_explainability
61. Student of Games: A unified learning algorithm for both perfect and imperfect information games arXiv:2112.03178v2 [cs.AI] 15, accessed May 5, 2025, http://arxiv.org/pdf/2112.03178
62. A unified learning algorithm for both perfect and imperfect information games - PMC, accessed May 5, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10651118/
63. [PDF] Monte-Carlo Tree Search for Multi-Player Games | Semantic Scholar, accessed May 5, 2025, https://www.semanticscholar.org/paper/Monte-Carlo-Tree-Search-for-Multi-Player-Games/73a1a8a0659cac92206291ee009ac50777d9d99c
64. AI Search: The Bitter-Er Lesson | Hacker News, accessed May 5, 2025, https://news.ycombinator.com/item?id=40683697
65. CSC 2547 Fall 2019: Learning to Search - GitHub Pages, accessed May 5, 2025, https://duvenaud.github.io/learning-to-search/
66. Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search - IJCAI, accessed May 5, 2025, https://www.ijcai.org/proceedings/2024/0772.pdf
67. Learning to search with MCTSnets - OpenReview, accessed May 5, 2025, https://openreview.net/forum?id=r1TA9ZbA-
68. Learning to Search in Reinforcement Learning - UCL Discovery - University College London, accessed May 5, 2025, https://discovery.ucl.ac.uk/id/eprint/10166147/1/PhD_Thesis_draft_corrected.pdf
69. ‘AlphaGo’ directory - Gwern.net, accessed May 5, 2025, https://gwern.net/doc/reinforcement-learning/model/alphago/index
70. MuZero: Mastering Go, chess, shogi and Atari without rules - Google DeepMind, accessed May 5, 2025, https://deepmind.google/discover/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules/
71. Finally an official MuZero implementation : r/reinforcementlearning - Reddit, accessed May 5, 2025, https://www.reddit.com/r/reinforcementlearning/comments/tfu624/finally_an_official_muzero_implementation/
72. alexZajac/muzero_experiments: A set of experiments and human-playing comparisons with the Muzero agent from Google DeepMind, made as part of a research project with l’école polytechnique. - GitHub, accessed May 5, 2025, https://github.com/alexZajac/muzero_experiments
73. chiamp/muzero-cartpole: Applying DeepMind’s MuZero algorithm to the cart pole environment in gym - GitHub, accessed May 5, 2025, https://github.com/chiamp/muzero-cartpole
74. AlphaGo/AlphaGoZero/AlphaZero/MuZero: Mastering games using progressively fewer priors - The VITALab website, accessed May 5, 2025, https://vitalab.github.io/article/2020/11/19/AlphaMuGoZero.html
75. MiniZero: An AlphaZero and MuZero Training Framework - GitHub, accessed May 5, 2025, https://github.com/rlglab/minizero
76. Integrate the MuZero algorithm into the AlphaZero.jl package - Archive Project Details | Google Summer of Code, accessed May 5, 2025, https://summerofcode.withgoogle.com/programs/2022/projects/YWv8Vbw1
77. MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games, accessed May 5, 2025, https://arxiv.org/html/2310.11305v3
78. jianzhnie/RLZero: A clean and easy implementation of MuZero, AlphaZero and Self-Play reinforcement learning algorithms for any game. - GitHub, accessed May 5, 2025, https://github.com/jianzhnie/RLZero
79. DeepMind has open-sourced the heart of AlphaGo and AlphaZero - Hacker News, accessed May 5, 2025, https://news.ycombinator.com/item?id=34801636
80. TurboZero: a vectorized implementation of AlphaZero + more : r/reinforcementlearning, accessed May 5, 2025, https://www.reddit.com/r/reinforcementlearning/comments/16he0m4/turbozero_a_vectorized_implementation_of/
81. werner-duvaud/muzero-general - GitHub, accessed May 5, 2025, https://github.com/werner-duvaud/muzero-general
82. CogitoNTNU/Deeptactics-AlphaZero: An implementation of the AlphaZero algorithm by Google Deepmind. Research paper here: https://arxiv.org/abs/1911.08265 - GitHub, accessed May 5, 2025, https://github.com/CogitoNTNU/Deeptactics-AlphaZero
83. kaesve/muzero: A clean implementation of MuZero and AlphaZero following the AlphaZero General framework. Train and Pit both algorithms against each other, and investigate reliability of learned MuZero MDP models. - GitHub, accessed May 5, 2025, https://github.com/kaesve/muzero
84. Alpha Zero / MuZero differences · Issue #143 - GitHub, accessed May 5, 2025, https://github.com/werner-duvaud/muzero-general/issues/143
85. KarelPeeters/kZero: A from-scratch general AlphaZero implementation for board games, accessed May 5, 2025, https://github.com/KarelPeeters/kZero
86. TheLeprechaun25/NCOLib: The Neural Combinatorial Optimization Library (NCOLib) is an accessible software library designed to simplify the application of neural network models and deep learning algorithms to solve combinatorial optimization problems. - GitHub, accessed May 5, 2025, https://github.com/TheLeprechaun25/NCOLib
87. ai4co/rl4co: A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO) - GitHub, accessed May 5, 2025, https://github.com/ai4co/rl4co
88. Md Kamruzzaman Sarker - Google Scholar, accessed May 5, 2025, https://scholar.google.com/citations?user=dnySX2QAAAAJ&hl=en
89. Hanchen Yang - Google Scholar, accessed May 5, 2025, https://scholar.google.com/citations?user=zoMQ8CoAAAAJ&hl=en
90. (PDF) Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/387184187_Scaling_of_Search_and_Learning_A_Roadmap_to_Reproduce_o1_from_Reinforcement_Learning_Perspective
91. Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance - ResearchGate, accessed May 5, 2025, https://www.researchgate.net/publication/384680456_Guided_Stream_of_Search_Learning_to_Better_Search_with_Language_Models_via_Optimal_Path_Guidance
92. dongjinkun/COPZoo: Neural Network for solving challenging Combinatorial Optimization Problems - GitHub, accessed May 5, 2025, https://github.com/dongjinkun/COPZoo
93. LAMDASZ-ML/Awesome-Neuro-Symbolic-Learning-with-LLM - GitHub, accessed May 5, 2025, https://github.com/LAMDASZ-ML/Awesome-Neuro-Symbolic-Learning-with-LLM
