Apache Software Foundation Announces New Top-Level Project Apache® Celeborn

The intermediate data service for big data computing engines has graduated from incubation

Wilmington, DE, April 23, 2024 (GLOBE NEWSWIRE) -- The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 320 active open source projects and initiatives, today announced Apache Celeborn has graduated from incubation and is now a Top-Level Project (TLP). Celeborn is an intermediate data service for big data computing engines used to boost performance, stability, and flexibility. Celeborn is now available for download.

"I extend my sincere congratulations to the Celeborn community on their ascension to a Top-Level Project within the Apache Software Foundation," said Yu Li, an ASF Member and mentor in the Incubator program. "As incubating champion of the project, it fills me with great joy to observe the strides in development and community expansion, all in alignment with ASF’s ethos of valuing community over code. The Celeborn community not only has consistently delivered substantial releases brimming with innovation but also has cultivated a nurturing environment that eagerly embraces new contributors."

Celeborn Overview & Advantages
Celeborn is an intermediate data service, specifically a Remote Shuffle Service (RSS), for big data computing engines. Users can leverage Celeborn to enhance performance, stability and elasticity for their computing engines.

Celeborn Feature Highlights

  • Decoupled Storage and Compute: Celeborn integrates with big data engines such as Spark, Flink, and MapReduce, enabling them to write shuffle data to Celeborn and read it back from Celeborn;
  • Performance Boost: Celeborn allows users to reorganize shuffle data to be more I/O efficient, and leverages freed memory for caching;
  • High Availability: By leveraging Apache Ratis, a highly customizable Raft protocol library in Java, Celeborn enables replication for high availability, reducing the risk of failure in the event of an interruption;
  • High Fault Tolerance: In the event of a failure, Celeborn invokes a retry mechanism to aid in recovery;
  • Fast, Graceful Upgrade: Users benefit from fast restarts and upgrades without risk of data loss; and
  • Flexible Deployment: Celeborn can be deployed standalone or on Kubernetes.

"Celeborn provides us with the foundational conditions for implementing co-located online and offline services in a heterogeneous environment, helping us to effectively achieve joint scheduling and utilization enhancements between online services and big data clusters, resulting in substantial cost savings," said Li Luo, Director of Data Infra from Shopee. "Moreover, being able to continuously contribute to the community and witness its graduation is a rewarding experience. Collaborating with a group of outstanding developers whom we've never met in person, as well as reconnecting with former colleagues and actively participating in the ongoing development and iteration of Celeborn, has been a great journey."

"Warm congratulations to the Apache Celeborn project for smoothly completing its incubation phase and successfully graduating from the Apache Incubator," said Qin Yao, Leader of the Spark team from NetEase. "This milestone achievement fully reflects the Celeborn community's deep commitment to practicing the Apache Way, as well as its determination and contributions to enduring exploration in the big data field. This is a new beginning, and we wish the Celeborn project continued success and an increasingly diverse community!"

Since being open sourced in 2021, Celeborn has served as an intermediate data service to boost performance, stability and elasticity for big data compute engines. Many companies, including Alibaba, BIGO, Bilibili, Xiaohongshu, Shopee, and Trip.com, run Celeborn in production environments today.

To serve users better, Celeborn plans to add TLS for communication; support object stores as a storage backend; support more types of intermediate data, such as spilled data; integrate additional engines such as Tez; and keep improving performance, along with other work to fulfill users' requirements. For future updates and feature add-ons, visit https://celeborn.apache.org/

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit https://incubator.apache.org/.

About The Apache Software Foundation (ASF)
Founded in 1999, the Apache Software Foundation exists to provide software for the public good with support from more than 75 sponsors. ASF’s open source software is used ubiquitously around the world with more than 8,400 committers contributing to 320+ active projects including Apache Superset, Apache Camel, Apache Flink, Apache HTTP Server, Apache Kafka, and Apache Airflow. The Foundation’s open source projects and community practices are considered industry standards, including the widely adopted Apache License 2.0, the podling incubation process, and a consensus-driven decision model that enables projects to build strong communities and thrive. https://apache.org

ASF’s annual Community Over Code event is where open source technologists convene to share best practices and use cases, forge critical relationships, and learn about advancements in their field. https://communityovercode.org/ 

© The Apache Software Foundation. “Apache” is a registered trademark or trademark of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
