AniMaker: Multi-Agent Animated Storytelling
with MCTS-Driven Clip Generation

AniMaker is a multi-agent framework that generates coherent, long-form
storytelling animations from text input. It combines multi-candidate clip
generation, evaluation-driven clip selection, and story-level consistency checks.

Haoyuan Shi¹, Yunxin Li¹, Xinyu Chen¹, Longyue Wang², Baotian Hu*¹, Min Zhang¹

(* Corresponding author)

¹ Harbin Institute of Technology (Shenzhen)
² Alibaba International Group

Figure: AniMaker pipeline

Components

Director Agent

Generates the storyboard from the input text, defining multi-scene and multi-character narratives.

Photography Agent

Uses MCTS-Gen, an MCTS-inspired strategy, to efficiently generate multiple candidate clips and select high-potential ones.
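
The description above suggests a search that spends a limited generation budget on the most promising candidates. The following is a minimal sketch of that idea under stated assumptions, not the MCTS-Gen algorithm itself: it uses a one-level UCB1 bandit over candidate branches, and the names `Branch`, `select_clip`, `generate`, and `score` are hypothetical placeholders for a video generator and a clip evaluator.

```python
import math
from dataclasses import dataclass


@dataclass
class Branch:
    """Statistics for one line of candidate clips (hypothetical structure)."""
    seed_clip: str
    total_score: float
    visits: int
    best_score: float
    best_clip: str


def ucb1(branch: Branch, total_visits: int, c: float) -> float:
    """UCB1: mean score plus an exploration bonus for rarely tried branches."""
    mean = branch.total_score / branch.visits
    return mean + c * math.sqrt(math.log(total_visits) / branch.visits)


def select_clip(shot, generate, score, n_initial=2, budget=6, c=1.0):
    """Spend `budget` generations on one shot; return (best_score, best_clip)."""
    # Expansion: start a few independent branches, each scored once.
    branches = []
    for _ in range(n_initial):
        clip = generate(shot, None)
        s = score(clip)
        branches.append(Branch(clip, s, 1, s, clip))

    # Remaining budget: extend the branch that UCB1 currently ranks highest.
    for step in range(n_initial, budget):
        branch = max(branches, key=lambda b: ucb1(b, step, c))
        clip = generate(shot, branch.seed_clip)
        s = score(clip)
        branch.total_score += s  # back up the new score into the branch
        branch.visits += 1
        if s > branch.best_score:
            branch.best_score, branch.best_clip = s, clip

    winner = max(branches, key=lambda b: b.best_score)
    return winner.best_score, winner.best_clip
```

Here `generate(shot, parent_clip)` and `score(clip)` are supplied by the caller. The exploration constant `c` trades off retrying strong branches against sampling weaker ones, and the default values are illustrative only.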

Reviewer Agent

Employs AniEval, the first evaluation framework for multi-shot animation, to assess story-level consistency, action completion, and animation features across clips.
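
To make the three assessment dimensions concrete, here is a hedged sketch of how per-clip scores might be combined into a single number. This is an assumption for illustration, not the AniEval metric: the `ClipScores` fields mirror the dimensions named above, while the weights and the weighted-average aggregation are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClipScores:
    """Per-clip scores in [0, 1] for the three dimensions named in the text."""
    story_consistency: float
    action_completion: float
    animation_features: float


# Hypothetical weights; the source does not specify how dimensions are combined.
WEIGHTS = {
    "story_consistency": 0.4,
    "action_completion": 0.3,
    "animation_features": 0.3,
}


def clip_score(scores: ClipScores, weights=WEIGHTS) -> float:
    """Weighted average of the three dimension scores for one clip."""
    total = sum(weights.values())
    return sum(w * getattr(scores, name) for name, w in weights.items()) / total


def sequence_score(clips) -> float:
    """Average the per-clip scores over a multi-shot sequence."""
    return mean(clip_score(c) for c in clips)
```

A scorer of this shape could serve as the evaluation function that ranks candidate clips, although how the actual framework wires the two together is not stated here.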

Post-Production Agent

Edits the final sequence, ensures smooth transitions, and adds voiceovers for a production-quality output.
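
The four components above can be read as a sequential pipeline. The sketch below shows one plausible way to chain them, assuming each agent is a plain callable. The function `run_pipeline` and its parameter names are hypothetical, and the choice to rank candidates with the reviewer's score is an assumption rather than a documented interface.

```python
from typing import Callable, List


def run_pipeline(
    story_text: str,
    director: Callable[[str], List[str]],
    photographer: Callable[[str, List[str]], List[str]],
    reviewer: Callable[[str, List[str]], float],
    post_producer: Callable[[List[str]], str],
) -> str:
    """Chain the four agents: storyboard, candidates, selection, final edit."""
    shots = director(story_text)  # storyboard: one entry per shot
    selected: List[str] = []
    for shot in shots:
        # Candidates are generated with the already selected clips as context.
        candidates = photographer(shot, selected)
        # Keep the candidate the reviewer scores highest in that context.
        best = max(candidates, key=lambda clip: reviewer(clip, selected))
        selected.append(best)
    return post_producer(selected)  # transitions, voiceover, final sequence
```

Passing the agents as callables keeps the sketch independent of any particular image or video model; each stage can be replaced by a stub when testing the control flow.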

Workflow

Comparison

Compared methods: MovieAgent, MMStoryAgent, VideoGenoT, AniMaker (Ours)

Citation

@article{shi2025animaker,
  title={AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation},
  author={Shi, Haoyuan and Li, Yunxin and Chen, Xinyu and Wang, Longyue and Hu, Baotian and Zhang, Min},
  journal={arXiv preprint arXiv:2506.10540},
  year={2025}
}