Cloudera Leads the Way in Real-Time Streaming with End-to-End Architecture Innovation and Trusted Partner Certification

Leadership in Powerful Streaming Technologies, Continued Community Innovation, and New Partner Accelerator Program Help Customers Harness Real-Time Data

PALO ALTO, Calif., Oct. 14, 2014 (GLOBE NEWSWIRE) -- Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today announced that, as the leading provider of an integrated, comprehensive real-time streaming solution, it will continue to drive innovation with the introduction of the Cloudera Accelerator Program and Cloudera Labs.

As Hadoop deployments shift from proof-of-concept experiments to enterprise-grade, mission-critical production implementations, they take on new workloads that require the power and flexibility of proven frameworks and tools. Cloudera's enterprise data hub is built with these carefully curated components, integrated into one enterprise-grade platform. Apache Spark is one of the most popular components, due to its ease of use and extensibility across multiple use cases. With the ability to handle batch processing, iterative algorithms, and real-time stream processing all within the same processing environment, this general-purpose framework opens up the potential of Hadoop through improved accessibility and processing speed. Spark has been broadly embraced by the open source community, Big Data vendors, and data-intensive enterprises. Cloudera has been working in partnership with Databricks, IBM, Intel, and MapR to further extend support for Spark as the standard data processing engine for the Hadoop ecosystem.

For real-time stream processing, a rapidly growing use case for enterprises, speed, resiliency, and integration are key. Spark delivers on all three and is a core part of a streaming architecture, working together with ingestion tools like Apache Flume and real-time data serving frameworks like Impala. Cloudera was the first to integrate and support Spark in its platform and has dedicated the resources to enhance Spark, especially around enterprise-grade capabilities.

To further advance real-time and streaming architectures, Cloudera has launched the Cloudera Accelerator Program. The Accelerator Program drives innovation across the Hadoop ecosystem and ensures customers always have access to the leading, integrated technologies. Cloudera will work with partners to certify innovative applications being built on proven frameworks, such as Spark and Impala, and will provide the resources and support needed to bring these differentiated solutions to market more quickly so customers can solve new and challenging use cases. The Cloudera Accelerator Program has already accepted many key partners looking to validate and support their exciting applications. More information on these partners can be found on the attached quote sheet.

"Cloudera saw the value in Spark early, and we were the first to adopt Spark as part of our Hadoop platform--making it an integrated and supported component," said Doug Cutting, chief architect, Cloudera. "We are continuously driving the roadmap for Spark and adding enterprise capabilities. As a result, our customers now have more diverse streaming use cases in production than all our competitors combined. With the Cloudera Accelerator Program, our customers will continue to have access to cutting-edge Spark applications to further expand the reach of their enterprise data hubs."

Cloudera recognized the business importance of real-time processing early; they were the first to commercially offer Apache HBase, created both Apache Flume and Impala, and were the first to offer and support Spark. Cloudera is dedicated to ensuring a first-class experience with real-time processing, especially as new tools and applications are developed.

Kafka via Cloudera Labs

To further drive innovation around Hadoop, Cloudera is also announcing the launch of Cloudera Labs. Cloudera Labs is a virtual center for fostering innovations in incubation within Cloudera's engineering teams and fast-tracking promising open source initiatives on the leading edge of adoption. Cloudera Labs aims to bring more use cases, productivity, and value to developers by seeking and exploring new solutions to their problems.

One of the most promising projects under way across the Hadoop ecosystem is Apache Kafka, a highly scalable, fault-tolerant publish-subscribe messaging system. Kafka, created and run in production at LinkedIn, can broker terabytes of data from thousands of users across a single cluster, serving as the backbone for any large organization. Kafka is already well-integrated with systems like Spark and other components of an enterprise data hub. As a Labs initiative, Cloudera will explore Kafka further in support of applications that would immediately benefit from the elasticity, scale, and performance of a distributed messaging system. For those interested in experimenting with Kafka, a downloadable binary is now available.

To learn more about Spark and Cloudera, read "Our Commitment to Accelerating Apache Spark." To start exploring Spark and other components of an enterprise data hub, download Cloudera 5.2, Cloudera's open source Hadoop Platform.

Additional News

Today Cloudera also announced:

● Cloudera Unveils Cloudera Enterprise 5.2

● Cloudera Announces Cloudera Director

● Cloudera Releases Impala 2.0: The Leading Open Source Analytic Database for Apache Hadoop

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source Big Data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 22,000 individuals worldwide. Over 1,200 partners and a seasoned professional services team help deliver faster time to value. Finally, only Cloudera provides proactive and predictive support to run an enterprise data hub with confidence. Leading organizations in every industry plus top public sector organizations globally run Cloudera in production.


Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.


Partner Quotes:

"Adatao is one of the earliest pioneering users and contributors to Apache Spark. We were first to deliver a beautiful, real-time, interactive application for collaboration between BI analysts and data scientists, directly on Hadoop data. We're excited to partner with Cloudera, who has announced full enterprise support for Spark, and look forward to bringing the value of this powerful combination to our customers," said Christopher T. Nguyen, Co-founder & CEO of Adatao.

"Cloudera's leadership on Spark has delivered real innovations that our customers depend on for speed and sophistication in large-scale machine learning. For everything from improving health outcomes to predicting network outages, Spark is emerging as the 'must have' layer in the Hadoop stack." - Steven Hillion, co-founder and Chief Product Officer at Alpine Data Labs

"Tresata is one of the first predictive analytics software companies to make its entire Hadoop powered software suite 'real-time' ready. By leveraging the embedded Spark functionality that is part of CDH, Tresata is delivering rapid customer intelligence inherent in financial data, heralding a future where all big data applications will be real-time." - Koert Kuipers, CTO, Tresata

"Since our beginning, Skytree has been dedicated to driving the advanced analytics industry for Big Data forward by providing our customers with the best Machine Learning based solutions that easily translate their data assets into actionable intelligence," said Terison Gregory, Senior Director of Product Management at Skytree. "Increasingly, our customers are requesting support for streaming data and in particular Spark Streaming. In response, we are announcing Skytree Infinity™ which represents a major step in our product roadmap to better support Spark Core and Spark Streaming. We are excited about participating in the Cloudera Accelerator Program because their roadmap for Spark Streaming aligns with Skytree for delivering Spark Streaming support to our customers."

"Trifacta leverages Spark to deliver self-service data transformation across a wide range of data sizes," said Sean Kandel, co-founder and CTO of Trifacta. "Using the flexibility of the Cloudera platform, our customers have the ability to visually define their transformation logic once and then have Trifacta optimize the execution across any number of nodes and any number of processing frameworks."

"All of the Internet is going to be rewired with Machine Intelligence. Sparkling Water converges elegant APIs, fast machine learning and in-memory predictive analytics. A unified developer experience for data munging, modeling and scoring is critical for building smarter applications and accelerating the adoption of big data. We teamed up closely with folks at Cloudera to make deployment real easy on Apache Spark as part of CDH," said SriSatish Ambati, CEO and Co-Founder of 0xdata. "This effort is bringing together the Open Source communities of Data Science and Application Developers. Sparkling Water is the killer-app on Apache Spark!"

"The future is about superpowering every analyst -- eliminating IT bottlenecks while allowing easy collaboration across roles. Spark is being woven into our platform, creating the definitive end-to-end Business Analyst workflow built on Spark." - Peter Schlampp, Vice President of Product, Platfora

"We use QuantCell to interactively create, test and run Spark analytics on CDH5 clusters, visualize the results and pull other CDH elements into the analysis, such as Impala for a combination of lightning fast SQL processing and Spark memory based analytics." - Agust Egilsson, Founder, QuantCell Research

"SAS is partnering with Cloudera to build out its data quality and analytics stack to take advantage of the processing power of Apache Spark. The partnership will provide joint customers with the ability to improve the quality of their Big Data using SAS and gain actionable intelligence." - Mike Ames, Director of Data Science & Emerging Technology Product Management at SAS

"Zoomdata is one of the first data visualization technologies to embrace Spark at the core of its data analysis engine. Zoomdata already allows for lightning-fast interactive reporting, dashboarding, and visualization of data stored in Cloudera Impala and Cloudera Search, and is excited to now offer support for data stored in and streamed through Spark." - Justin Langseth, CEO, Zoomdata

"Talend services the open source community and customers through its support of Apache Spark, and being certified to work with Apache Spark as part of CDH enables customers to leverage their existing investments in partner products and Cloudera," said Laurent Bride, CTO and Head of Engineering at Talend. "Having the ability to jointly offer Spark support validates the innovation in the big data space, driven by open source vendors."

"Diyotta leverages Spark for complex data processing using its in-memory computational capabilities. With Spark as part of CDH, Diyotta makes Big Data Integration simple & powerful through Diyotta's unique data integration solution which provides a comprehensive & intuitive GUI for using native functionality of Spark and enables developers to focus on real business problems by hiding the complexity and enabling customers to implement Big Data Integration solutions over Hadoop much faster, easier and effective," said Sanjay Vyas, CEO of Diyotta.

"Syncsort's participation in the Cloudera Accelerator Program enhances Apache Spark as part of CDH by tapping into all enterprise data stores, ultimately leveraging our customers' existing investments," said Tendu Yogurtcu, Vice President of Engineering at Syncsort. "Syncsort powers Apache Spark with mainframe data, enabling users to access and extract insights from mainframe data sources."

"Atigeo has been an early adopter and contributor to Apache Spark, Shark, SparkSQL, Tachyon & Mesos. We have seen dramatic performance gains combined with lower operational costs by adopting Apache Spark, and now have production deployments that validate the technology's readiness," said David Talby, SVP of Engineering at Atigeo.

"Our Hadoop connector, Elasticsearch for Apache Hadoop, utilizes native integration with Apache Spark as part of CDH to provide real-time data discovery and exploration with a full-blown search and analytics engine," said Jobi George, Senior Director of Business Development at Elasticsearch, Inc. "We are pleased to partner with Cloudera to help businesses around the world extract insights out of their data in real-time."

"RapidMiner is integrating with Spark machine learning algorithms and encapsulates them as operators of its completely code-free workflow interface. By this, RapidMiner on Spark empowers business analysts and data-savvy business managers to create predictive models on top of Spark faster and easier than ever. RapidMiner on Spark supports the collaboration of data scientists with business analysts in a purely visual environment and easily combines Hadoop tools like Impala, Hive, Pig, Mahout, and now Spark to achieve great Big Data analytics results faster than ever." - Ingo Mierswa, CEO of RapidMiner

"SnapLogic is adding native support for Apache Spark as part of CDH in upcoming releases of our Elastic Integration Platform," said Greg Benson, Chief Scientist at SnapLogic. "In the first phase, we are adding a Spark Snap that takes advantage of a Spark cluster co-located with our Snaplex processing engine. This allows SnapLogic pipelines to stream data into a Spark resilient distributed dataset (RDD). Our goal is to make it easy to deliver data to Spark from disparate sources such as conventional databases, cloud applications, APIs and any SnapLogic-supported destination. Further applications of Spark include combining our SnapReduce computations and Spark computations into coordinated workflows via SnapLogic pipelines and providing the data wrangling capabilities that will allow organizations to double the productivity of their data scientists."

"We are excited to continue our collaboration with Cloudera. After adding certified support for Hive and Impala earlier this year, supporting Apache Spark as part of CDH is a big step towards integrating big data analysis seamlessly within the KNIME open analytics platform as well," said Michael Berthold, CEO of KNIME.

