Alluxio Virtualizes Distributed Storage for Petabyte Scale Computing at In-Memory Speeds

Supported by Alibaba, Baidu, Barclays, IBM, EMC, Intel and Other Industry Leaders, Alluxio Is the Next Major Innovation Out of UC Berkeley's AMPLab


SAN MATEO, CA--(Marketwired - Feb 23, 2016) - Alluxio (formerly known as Tachyon), the world's first memory-centric virtual distributed storage system, today announced the release of its open source version 1.0. The vision for Alluxio is to become the de facto storage unification layer for big data and other scale-out application environments, in the same way that Apache Spark became the standard computation layer.

Alluxio's memory-centric architecture provides orders-of-magnitude performance gains over existing solutions, along with superior manageability, by allowing developers to interact with a single storage layer API without worrying about the configurations and complexities of the underlying storage and file systems. Co-created by Haoyuan Li, CEO of Alluxio, Inc., and a founding committer of Spark, Alluxio ushers in the next generation of storage virtualization for petabyte-scale computing.

"A storage unification layer that bridges computation frameworks and underlying storage systems is long overdue in the enterprise," said Haoyuan Li. "Alluxio is that unification layer with a memory-centric architecture. Alluxio enables any framework to access any data, from any storage at memory speeds."

Organizations can run any computation framework (e.g., Apache Spark, Hadoop MapReduce, or Presto) with any storage system (e.g., Amazon S3, EMC, Google Cloud Storage, or NetApp) and use any storage media (e.g., DRAM, SSD, or HDD). As a memory-centric system, Alluxio delivers orders-of-magnitude performance gains and improved manageability for existing configurations.

Although it has existed for only three years, Alluxio has gained broad industry support as an open source project. With more than 200 contributors, 12,000 commits, and over 50 commercial organizations, Alluxio has outpaced many other open source projects over a comparable period. Alluxio runs in production at some of the largest cloud providers for petabyte-scale workloads, in financial services to meet government regulations, at leading universities for research, and at technology vendors around the world.

Intel recently published its findings on the diverse range of big data storage challenges that Alluxio can address.

"Big data analytics is driving new requirements for distributed memory across clusters for real-time streaming, interactive queries, analytics and graph processing," said Michael Greene, Intel vice president, Software and Services Group and general manager of System Technologies and Optimization. "We are excited to work with developer communities on Alluxio and to optimize Alluxio solutions on Intel platforms. Ultimately, this helps our customers create more innovative and high performance cloud and big data solutions."

In financial services, Alluxio brings several advantages: its performance improvements help banks make faster and better trading decisions, and it also helps satisfy regulatory requirements. Barclays, the global financial services firm with 48 million customers and clients, recently published a report on how it uses Alluxio to boost big data analytics performance without duplicating confidential customer information to disk.

Last summer, IBM Research published a study about using Tachyon for "ultra-fast big data processing" to overcome "critical bottlenecks for system workloads."

For some of the world's cloud computing giants, Alluxio is allowing business analysts to discover insights interactively by analyzing petabytes of data in near real-time to improve customer experience.

"As one of the largest Internet companies in the world, Baidu constantly faces the challenges of managing data at multi-petabyte scale. By adopting innovative technologies like Alluxio we are able to help our users extract meaningful and useful data almost instantly," said James Peng, Chief Architect at Baidu. "Our deployment of an Alluxio cluster has already reached 1,000 workers, which is one of the largest Alluxio clusters in the world. The tiered storage of Alluxio has provided us great flexibility in managing data at large scale. We are seeing an average 10-fold, and up to 30-fold, performance improvement in supporting interactive query systems and other types of workloads. This has greatly improved the speed of making important business decisions."

"As the cloud computing business for Alibaba Group, the world's leading e-commerce business, Alibaba manages many of the world's largest data centers, including the largest big data cluster ever built in China," said Wensong Zhang, CTO and Senior Research Fellow of AliCloud and founder of Linux Virtual Server. "With Alluxio combined with AliCloud OSS as well as other AliCloud cloud service products, our customers can leverage the technology trends of hardware to run important jobs at the fastest performance. We have been contributing to the Alluxio open source community and believe that Alluxio will play a critical role in the future of big data infrastructure."

Background

As a PhD candidate at UC Berkeley, Haoyuan Li saw that Spark adoption was creating demand for more developer-friendly ways for big data frameworks to access persistent data at in-memory speeds. Formerly known as Tachyon, the Alluxio system quickly gained prominence in use cases that required in-memory storage speeds for Spark computation, and it received early backing from enterprise software and storage leaders, including EMC and Pivotal. Where storage and file systems have historically required extensive customization and tuning, Alluxio provides a unified interface that is intuitive for developers, easy for operators, and delivers unprecedented data access speeds to support the broadest range of big data use cases, such as machine learning, real-time analytics, and streaming data.

"As a layer that abstracts away the differences of existing storage systems from the cluster computing frameworks such as Apache Spark and Hadoop MapReduce, Alluxio can enable the rapid evolution of the big data storage, similarly to the way the Internet Protocol (IP) has enabled the evolution of the Internet," said Prof. Ion Stoica, co-author of Spark, co-founder and executive chairman of Databricks, co-director of UC Berkeley AMPLab, and Ph.D. co-advisor to Haoyuan Li.

"AMPLab has created some of the most important open source technologies in the new big data stack, including Apache Spark," said Michael Franklin, Professor of Computer Science and Director of the AMPLab at UC Berkeley. "Alluxio is the next project with roots in the AMPLab to have major impact. We see it playing a huge disruptive role in the evolution of the storage layer to handle the expanding range of big data use cases."

To protect the project from potential trademark litigation and to preserve the intellectual property of the open source community's contributions internationally, the community changed the project name from Tachyon to Alluxio. A newly created nonprofit organization, the Alluxio Open Foundation, will host the project.

In 2015, Andreessen Horowitz invested $7.5M in Alluxio, Inc., which has since assembled a team of leading distributed computing experts from Carnegie Mellon University, Google, Palantir, UC Berkeley AMPLab, and VMware to continue innovating and to realize the vision for Alluxio.

ABOUT ALLUXIO, INC.
Alluxio, Inc. was founded by the creators and top contributors of the open source Alluxio project -- the first memory-centric virtual distributed storage system. Alluxio's memory-centric architecture provides orders-of-magnitude performance gains and superior manageability by allowing developers to interact with a single storage layer API. Alluxio, Inc. is venture-backed by Andreessen Horowitz. For more information, contact info@alluxio.com.

Contact Information:

EDITORIAL CONTACT
Lonn Johnston
Flak42 for Alluxio, Inc.
lonn@flak42.com
650.219.7764