AniMaker: Multi-Agent Animated Storytelling
with MCTS-Driven Clip Generation
AniMaker is a multi-agent framework designed to efficiently generate coherent,
long-form storytelling animations from text input. It integrates multi-candidate
clip generation, intelligent selection, and story-level consistency.
(* Corresponding Authors)
1 Harbin Institute of Technology (Shenzhen) 2 Alibaba International Group

Components
Director Agent: Generates the storyboard from the input text, defining multi-scene and multi-character narratives.
Photography Agent: Uses MCTS-Gen, a strategy inspired by Monte Carlo Tree Search (MCTS), to efficiently generate multiple candidate clips and select the high-potential ones.
Reviewer Agent: Employs AniEval, the first evaluation framework for multi-shot animation, to assess story-level consistency, action completion, and animation features across clips.
Post-Production Agent: Edits the final sequence, ensures smooth transitions, and adds voiceovers for a production-quality output.
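To make the generate-then-select idea concrete, here is a minimal Python sketch of an MCTS-inspired candidate search over shots. It is not the authors' MCTS-Gen implementation: the `Node` class, the `expand_shot` and `best_storyline` functions, the `budget` and `keep` parameters, and the use of plain UCB1 to decide which kept clip to continue from are all assumptions made for illustration. The `generate` and `evaluate` callables are hypothetical stand-ins for a video generator and an AniEval-style scorer.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One candidate clip in the search tree (hypothetical structure)."""
    clip: str
    score: float = 0.0
    parent: Optional["Node"] = None
    visits: int = 0      # how many children were generated from this clip
    total: float = 0.0   # sum of those children's scores
    children: List["Node"] = field(default_factory=list)


def ucb1(node: Node, step: int, c: float = 1.4) -> float:
    """Standard UCB1: mean child score plus an exploration bonus."""
    if node.visits == 0:
        return math.inf  # always try an unexplored parent first
    return node.total / node.visits + c * math.sqrt(math.log(step) / node.visits)


def expand_shot(parents: List[Node], shot: str,
                generate: Callable[[str, str], str],
                evaluate: Callable[[str, str], float],
                budget: int, keep: int) -> List[Node]:
    """Spend `budget` generations on one shot and return the `keep` best clips."""
    candidates = []
    for step in range(1, budget + 1):
        # Pick the previous clip that currently looks most promising.
        parent = max(parents, key=lambda n: ucb1(n, step))
        clip = generate(shot, parent.clip)
        child = Node(clip=clip, score=evaluate(parent.clip, clip), parent=parent)
        parent.children.append(child)
        parent.visits += 1
        parent.total += child.score
        candidates.append(child)
    return sorted(candidates, key=lambda n: n.score, reverse=True)[:keep]


def best_storyline(shots: List[str],
                   generate: Callable[[str, str], str],
                   evaluate: Callable[[str, str], float],
                   budget: int = 6, keep: int = 2) -> List[str]:
    """Search shot by shot (assumes budget >= 1) and return one clip per shot."""
    frontier = [Node(clip="<start>")]
    for shot in shots:
        frontier = expand_shot(frontier, shot, generate, evaluate, budget, keep)
    # Walk back from the top-scoring final clip to recover the whole sequence.
    node, path = frontier[0], []
    while node.parent is not None:
        path.append(node.clip)
        node = node.parent
    return path[::-1]
```

Calling `best_storyline(["shot1", "shot2"], generate, evaluate)` with any generator and scorer of the shapes above returns one selected clip per shot. Keeping only `keep` candidates per shot bounds the number of generations at `budget` per shot, which is the efficiency argument behind selecting high-potential clips rather than exhaustively sampling every one.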
Workflow
Comparison
| MovieAgent | MMStoryAgent | VideoGenoT | AniMaker (Ours) |
|---|---|---|---|
Citation
@article{shi2025animaker,
  title={AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation},
  author={Shi, Haoyuan and Li, Yunxin and Chen, Xinyu and Wang, Longyue and Hu, Baotian and Zhang, Min},
  journal={arXiv preprint arXiv:2506.10540},
  year={2025}
}