Singapore, Feb. 19, 2025 (GLOBE NEWSWIRE) -- On February 18, SkyReels open-sourced the world’s first human-centric video foundation model for AI short drama creation, SkyReels-V1, and the world’s first SOTA-level expressive portrait image animation based on video diffusion transformers, SkyReels-A1.

Open-Source Repositories:

Technical Report:

https://arxiv.org/abs/2502.10841

Official Website:

skyreels.ai

Addressing global pain points in AI video generation—such as closed-source models, limited accessibility, high costs, and usability issues—SkyReels is breaking new ground by open-sourcing two SOTA-level models and algorithms, SkyReels-V1 and SkyReels-A1. These cutting-edge technologies for AI short drama creation are now offered to the open-source community and AIGC users.

The production format for AI videos and short dramas has been market-validated. SkyReels helps address the challenges in traditional short drama production—such as complex offline processes including scriptwriting, casting, set design, storyboard creation, filming, and post-production, which require substantial manpower, incur high costs, and extend production cycles.

01

SkyReels-V1: Human-Centric Video Foundation Model, the world’s first open-source video generation model dedicated to AI short drama creation

AI short dramas require precise control over both cognitive and physical expressions—integrating lip-sync, facial expression, and body movement generation into a unified process. Currently, lip-sync generation is particularly well-developed, owing to its strong mapping with audio cues that enable high precision and superior user experience.

Yet, the true quality of AI short drama generation lies in the nuances of character performance. To dramatically enhance the controllability of facial expressions and body movements, SkyReels-V1 not only meticulously annotates performance details but also processes emotions, scene context, and acting intent, fine-tuning on tens of millions of high-quality, Hollywood-level data points.

The research team has implemented advanced technical upgrades to capture micro-expressions, performance subtleties, scene descriptions, lighting, and composition. As a result, characters generated by SkyReels now exhibit remarkably precise acting details, approaching an award-winning level.

SkyReels-V1 delivers cinematic-grade micro-expression performance, supporting 33 nuanced facial expressions and over 400 natural motion combinations that faithfully reproduce genuine human emotional expression. As demonstrated in the accompanying video, SkyReels-V1 can generate expressions ranging from hearty laughter, fierce roars, and astonishment to tears—showcasing rich, dynamic performance details.

Moreover, SkyReels-V1 brings cinematic-level lighting and aesthetics to AI video generation. Trained on Hollywood-level high-quality film data, every frame generated by SkyReels exhibits cinematic quality in composition, actor positioning, and camera angles.

Whether capturing solo performance details or multi-character scenes, the model now achieves precise expression control and high-quality visuals.

Importantly, SkyReels-V1 supports both text-to-video and image-to-video generation. It is the largest open-source video generation model supporting image-to-video tasks at equivalent resolution, achieving SOTA-level performance across multiple metrics.

Figure 1: Comparison of Text-to-Video Metrics for SkyReels-V1 (Source: SkyReels)

Such SOTA-level performance is made possible not only by SkyReels' self-developed high-quality data cleaning and manual annotation pipeline, which has built a tens-of-millions-scale dataset from movies, TV shows, and documentaries, but also by its “Human-Centric” multimodal video understanding model, which significantly enhances the ability to interpret human-related elements in video, particularly through an in-house character intelligence analysis system.

In summary, thanks to the robust data foundation and advanced character intelligence analysis system, SkyReels-V1 can achieve:

Cinematic Expression Recognition: 11 types of facial expression understanding for characters in film and drama, including expressions such as disdain, impatience, helplessness, and disgust, with emotional intensity levels categorized into strong, medium, and weak.

Character Spatial Awareness: Leveraging 3D reconstruction technology to comprehend spatial relationships among multiple characters, enabling cinematic positioning.

Behavioral Intent Understanding: Constructing over 400 behavioral semantic units for precise action interpretation.

Scene-Performance Correlation: Analyzing the interplay between characters, wardrobe, setting, and plot.
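To make the capabilities above more concrete, here is a minimal sketch of what a per-character annotation record could look like. It is not taken from the SkyReels codebase: the class names, field names, and the `describe` helper are all hypothetical, and only the four expression labels and three intensity levels named in the text are used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Intensity(Enum):
    """Emotional intensity levels named in the text."""
    STRONG = "strong"
    MEDIUM = "medium"
    WEAK = "weak"


# Only the four expression labels the text names; the full set of 11 is not listed.
KNOWN_EXPRESSIONS = {"disdain", "impatience", "helplessness", "disgust"}


@dataclass(frozen=True)
class CharacterAnnotation:
    """Hypothetical label for one character in one clip (illustrative fields)."""
    character_id: int
    expression: str
    intensity: Intensity
    behavior_unit: str  # stands in for one of the 400+ behavioral semantic units
    position_xyz: Tuple[float, float, float]  # assumed output of 3D reconstruction

    def __post_init__(self) -> None:
        # Reject labels outside the known vocabulary.
        if self.expression not in KNOWN_EXPRESSIONS:
            raise ValueError(f"unknown expression: {self.expression!r}")


def describe(a: CharacterAnnotation) -> str:
    """Render an annotation as a short caption fragment."""
    return f"character {a.character_id}: {a.intensity.value} {a.expression}, {a.behavior_unit}"
```

A record such as `CharacterAnnotation(1, "disdain", Intensity.STRONG, "turns away", (0.0, 0.0, 2.5))` would then carry expression, intensity, action, and position together for one character.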

https://youtu.be/U2SXDzQGnbQ





SkyReels-V1 is not only among the very few open-source video foundation models worldwide, but it is also the most powerful in terms of performance for character-driven video generation.

With SkyReels' self-developed inference optimization framework, “SkyReels-Infer,” inference efficiency has been significantly improved: 544p video generation takes just 80 seconds on a single RTX 4090. The framework supports distributed multi-GPU parallelism, including Context Parallel, CFG Parallel, and VAE Parallel. Furthermore, FP8 quantization and parameter-level offload let it meet the requirements of low-memory GPUs; support for flash attention, SageAttention, and model compilation optimizations further reduces latency; and building on the open-source Diffusers library enhances usability.
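Of the parallel modes listed, CFG Parallel is perhaps the easiest to picture. Assuming "CFG" refers to classifier-free guidance, each denoising step needs a conditional and an unconditional forward pass, and the two are independent, so they can run on separate devices before being combined. The sketch below is a toy illustration of that idea only, not SkyReels-Infer code: `toy_denoiser` is a stand-in for the real model, and threads stand in for GPUs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def toy_denoiser(latent: List[float], conditioned: bool) -> List[float]:
    """Stand-in for one model forward pass (not the real network)."""
    factor = 0.5 if conditioned else 0.25
    return [x * factor for x in latent]


def cfg_step_parallel(latent: List[float], guidance_scale: float) -> List[float]:
    """Run the conditional and unconditional passes concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cond_future = pool.submit(toy_denoiser, latent, True)
        uncond_future = pool.submit(toy_denoiser, latent, False)
        cond, uncond = cond_future.result(), uncond_future.result()
    # Standard classifier-free guidance combination of the two predictions.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    print(cfg_step_parallel([1.0, 2.0], guidance_scale=6.0))
```

Because the two passes share no intermediate state, splitting them across two devices can roughly halve the per-step model time, at the cost of one extra transfer to combine the results.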





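Figure 2: Using equivalent RTX 4090 resources (4 GPUs), the SkyReels-Infer version completes end-to-end generation in 293.3s versus 464.3s for the HunyuanVideo official version. The official version therefore takes 58.3% longer, which corresponds to a latency reduction of about 36.8%.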
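The two timings quoted for Figure 2 can be turned into a percentage in two ways, depending on which run is treated as the baseline. The short sketch below computes both; the function and variable names are illustrative only.

```python
def latency_reduction(baseline_s: float, optimized_s: float) -> float:
    """Share of the baseline's time that is saved, in percent."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


def extra_time(baseline_s: float, optimized_s: float) -> float:
    """How much longer the baseline takes than the optimized run, in percent."""
    return (baseline_s - optimized_s) / optimized_s * 100.0


hunyuan_s, skyreels_s = 464.3, 293.3  # timings quoted for Figure 2

print(f"latency reduction: {latency_reduction(hunyuan_s, skyreels_s):.1f}%")
print(f"baseline extra time: {extra_time(hunyuan_s, skyreels_s):.1f}%")
```

Measured against the 464.3s baseline, the saving is about 36.8%; the 58.3% figure is the extra time the baseline needs relative to the 293.3s run, equivalent to a speedup of about 1.58x.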







Figure 3: Under similar A800 resource conditions, the SkyReels-Infer version reduces end-to-end latency by 14.7%–28.2% compared to the HunyuanVideo official version, demonstrating a more robust multi-GPU deployment strategy.

02

SkyReels-A1: The First SOTA-Level Expressive Portrait Image Animation Algorithm Based on Video Diffusion Transformers

To achieve even more precise and controllable character video generation, SkyReels is open-sourcing SkyReels-A1, a SOTA-level algorithm based on a video diffusion transformer for expression and action control. Comparable to Runway Act-One, SkyReels-A1 supports video-driven, film-grade expression capture, enabling high-fidelity micro-expression reproduction.

SkyReels-A1 can generate highly realistic and consistent videos for characters under any reference framing, from portrait and half-body to full-body shots. It achieves a precise simulation of facial expressions, emotional nuances, skin textures, and body movements.

Given both a reference image and a driving video, SkyReels-A1 “transplants” the facial expressions and action details from the driving video onto the character in the reference image. The resulting video shows no distortion and faithfully reproduces the micro-expressions and body movements from the driving video, even surpassing the video quality of Runway Act-One in evaluation.

More encouragingly, SkyReels-A1 not only supports profile-based expression control but also enables highly realistic eyebrow and eye micro-expression alignment, along with more pronounced head movements and natural body motions.

For example, in the same dialogue scene, while the character generated by Runway Act-One shows noticeable distortion and deviates from the original appearance, SkyReels-A1 preserves the character's details, maintaining authentic nuance and seamlessly blending facial expressions as well as body movements.

Furthermore, SkyReels-A1 can drive scenes with more dramatic facial expressions. Where Runway Act-One fails to generate the desired effect, SkyReels-A1 can transfer more complex expression dynamics, enabling the character’s facial emotions to synchronize naturally with body movements and scene content for an exceptionally lifelike performance.

03

Empowering the Global AI Short Drama Ecosystem through Open-Sourcing

Video generation models are among the most challenging components of AI short drama creation. Although model generation capabilities have significantly improved over the past year, there remains a considerable gap—particularly given the high production costs.

By open-sourcing our SOTA-level models, SkyReels-V1 and SkyReels-A1, SkyReels becomes the first in the AI short drama industry to take such a step. This initiative not only represents a modest yet significant contribution to the industry but also marks a major leap toward fostering a flourishing ecosystem for AI short drama creation and video generation.

It's believed that with further advancements in inference optimization and the open-sourcing of controllable algorithms, these models will soon provide users with more cost-effective and highly controllable AIGC capabilities. SkyReels aims to empower users to create AI short dramas at minimal cost, overcome current issues of inconsistent video generation, and enable everyone to generate detailed, controllable character performances using their own computers.

This open-sourcing of our video generation models is not only a technological breakthrough that helps narrow the digital divide in the global content industry, but it is also a revolution in cultural production capacity. In the future, the convergence of short dramas, gaming, virtual reality, and other fields will accelerate industrial integration. AI short dramas have the potential to evolve from a “tech experiment” into a mainstream creative medium and become a new vehicle for global cultural expression.

“Achieve artificial general intelligence and empower everyone to better shape and express themselves.” With this open-source initiative, SkyReels will continue to release more video generation models, algorithms, and universal models—advancing AGI equity and fostering the sustained growth and prosperity of the AI short drama ecosystem, while benefiting the open-source community, developer ecosystems, and the broader AI industry.

