PerformRecast: Expression and Head Pose Disentanglement for Portrait Video Editing

HUJING Digital Media & Entertainment Group

* Equal contribution    † Corresponding author

Accepted to CVPR 2026

Teaser figure from the PerformRecast paper showing expression editing, cross-reenactment, and portrait animation results.
Our proposed PerformRecast can edit the facial expression of a source portrait video as well as animate a static portrait image according to a driving video. The top part of this figure shows expression editing results on a movie clip and a 3D animation. The generated results exhibit high fidelity to the driving video, facilitating production processes in the film and animation industries. For shots with multiple characters, we can select the specific person whose facial expression we want to edit, indicated by a red dashed box. The bottom-right insets in the top part show the driving frames.

Abstract

This paper investigates expression-only portrait video performance editing driven by a reference video, a task that plays a crucial role in the animation and film industries. Most existing research focuses on portrait animation, which animates a static portrait image according to the facial motion of a driving video. As a consequence, these methods struggle to disentangle facial expression from head pose rotation and thus cannot edit facial expression independently. In this paper, we propose PerformRecast, a versatile expression-only video editing method dedicated to recasting performances in existing films and animations. The key insight of our method comes from the characteristics of the 3D Morphable Face Model (3DMM), which models the identity, facial expression, and head pose of a 3D face mesh with separate parameters. We therefore improve the keypoint transformation formula used in previous methods to make it more consistent with the 3DMM, achieving better disentanglement and providing users with much more fine-grained control. Furthermore, to avoid misalignment around the face boundary in the generated results, we decouple the facial and non-facial regions of the input portrait images and pre-train a teacher model to provide separate supervision for each. Extensive experiments show that our method produces high-quality results that are more faithful to the driving video, outperforming existing methods in both controllability and efficiency.
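The abstract does not state the improved transformation itself, but the contrast with prior keypoint-based animators can be sketched. In the common face-vid2vid-style formulation, the expression deformation is added after the head rotation, so rotating the head also moves the expression offsets in image space; a 3DMM-consistent ordering instead deforms the canonical keypoints first and then poses them, mirroring `vertices = s · R · (shape + expression) + t`. The NumPy sketch below is illustrative only; the exact formula, variable names, and the specific ordering used in PerformRecast are assumptions, not taken from the paper.

```python
import numpy as np

def rotation_z(theta):
    """Rotation about the z-axis (head roll), used here only for illustration."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def transform_entangled(x_c, delta, R, s, t):
    """Expression deformation added AFTER rotation (face-vid2vid-style sketch):
    pose and expression interact, so editing one disturbs the other."""
    return s * (x_c @ R.T) + delta + t

def transform_3dmm_style(x_c, delta, R, s, t):
    """Expression deformation applied in the CANONICAL space, then posed,
    mirroring how a 3DMM composes identity + expression before pose."""
    return s * ((x_c + delta) @ R.T) + t
```

With an identity rotation and unit scale the two orderings coincide; once the head rotates, the entangled form drags the expression offsets along with the pose, which is exactly the coupling a 3DMM-consistent formulation avoids.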

Method

Overview of the PerformRecast framework.
An overview of our PerformRecast framework. The motion extractor predicts canonical keypoints, head pose, expression deformation, a scale factor, and a translation from the source and driving frames. The facial keypoints are obtained via our improved keypoint transformation formula and compared with tracking results from Pixel3DMM to compute the FLAME loss. Finally, the appearance feature volume, together with the source and driving keypoints, is sent to the warping module, followed by the decoder, to reconstruct the driving frame.
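The page names two supervision signals without spelling them out: a FLAME loss comparing the transformed keypoints against a 3DMM tracker's output, and separate teacher supervision for facial and non-facial regions. The sketch below shows one plausible shape for each, assuming an L1 penalty and mask-weighted averaging; the actual objectives, weights, and helper names in the paper may differ.

```python
import numpy as np

def flame_keypoint_loss(pred_kp, tracked_kp):
    """Penalty between keypoints from the improved transformation and
    keypoints tracked by an off-the-shelf 3DMM tracker (e.g. Pixel3DMM).
    The L1 choice is an assumption, not the paper's stated loss."""
    return np.abs(pred_kp - tracked_kp).mean()

def region_decoupled_loss(output, teacher_face, teacher_bg, face_mask):
    """Separate reconstruction supervision for the facial and non-facial
    regions, each against its own teacher target. A sketch of the idea of
    decoupled supervision, not the paper's exact objective.

    output, teacher_*: (H, W, C) images; face_mask: (H, W, 1) in {0, 1}.
    """
    bg_mask = 1.0 - face_mask
    face_term = (np.abs(output - teacher_face) * face_mask).sum() / max(face_mask.sum(), 1e-8)
    bg_term = (np.abs(output - teacher_bg) * bg_mask).sum() / max(bg_mask.sum(), 1e-8)
    return face_term + bg_term
```

Masking each term by its region keeps the gradients for the face and the background from competing, which is one way to realize the separate supervision the overview describes.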

Expression-only Video Performance Editing

Visual Comparisons

Visual comparisons for expression-only video performance editing.

Replacement

Enhancement

3D Animation

Portrait Animation

Self-reenactment

Self-reenactment results for portrait animation.

Cross-reenactment

Cross-reenactment results for portrait animation.

BibTeX

@misc{liang2026performrecast,
  title={PerformRecast: Expression and Head Pose Disentanglement for Portrait Video Editing},
  author={Jiadong Liang and Bojun Xiong and Jie Tian and Hua Li and Xiao Long and Yong Zheng and Huan Fu},
  year={2026},
  eprint={2603.19731},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.19731},
}