Apache Spark Survey Reveals Increased Growth in Users and New Workloads Including Exploratory Data Science and Machine Learning

57% of Respondents Cite Cloudera as the Spark Platform of Choice For Their Most Important Use Cases

PALO ALTO, Calif., Nov. 07, 2016 (GLOBE NEWSWIRE) -- To better understand Apache Spark’s growing role in big data, Taneja Group conducted a major market research project, surveying approximately 7,000 people. The sample was made up of technical and managerial job roles from around the world directly involved in big data. The survey, which received an overwhelming response, explored experiences with and intentions for Spark adoption and deployment, current perceptions, favored vendors, and the future of Spark itself. Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, sponsored the market research project and today announced the findings of the study.

An integrated part of CDH and supported with Cloudera Enterprise, Spark is the open standard for flexible in-memory data processing that enables batch, real-time, and advanced analytics on the Apache Hadoop platform.

“Apache Spark has grown rapidly into one of the leading big data open source projects,” said Mike Matchett, senior analyst and consultant at Taneja Group. “We found that across the broad range of industries, company sizes, and big data maturity levels represented, over one-half of respondents are already actively using Spark. It is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. With an increasing number of workloads requiring real-time data streaming for analytics, the emergence of machine learning applications and data science use cases, Spark is clearly here to stay.”

Cloudera’s Leadership in Spark

Cloudera became the first Hadoop vendor to ship and support Spark in early 2014, when Spark was quickly becoming the framework of choice for faster batch processing. Cloudera invested in its development early. Today many Cloudera users have moved data processing workloads from MapReduce to Spark in their production systems, drastically reducing their data processing windows. According to the survey, this trend is accelerating.

Cloudera’s customers require Spark to be delivered at enterprise scale, backed by experts who were involved from the beginning in making it the de facto data processing engine for Hadoop. Cloudera continues to innovate through the One Platform Initiative, which aims to enhance Spark’s capabilities around management, security, scale, streaming, and cloud. Through the initiative, Cloudera is committed to helping the ecosystem adopt Spark as the default data execution engine for analytic workloads.

Cloudera works with partners to certify new solutions built on Spark and provides the resources and support needed to bring these differentiated solutions to market more quickly, ensuring customers can solve new and challenging use cases.

Survey Results

Key findings of the Apache Spark Market Research Study include a high level of growth and momentum for Spark usage beyond the expected data processing/engineering/ETL workloads, and a future transition to cloud deployments. Other noteworthy findings include:

  • More than half of all respondents, 54 percent, are already actively using Spark. Of those presently using Spark, 64 percent say it is proving invaluable and intend to increase their usage of Spark within the next 12 months.
  • New Spark adoption is also growing, with 4 out of 10 respondents familiar with Spark saying that they plan to deploy it in the very near term.
  • 57 percent rely on Spark, as provided by Cloudera, for their most important use cases, more than twice the share of the next three Apache Hadoop vendors combined. Customers that chose Cloudera over other solutions cited its regulatory-ready security and governance model, its stability and performance, its cloud portability, and its integration with a complete suite of data processing, query, analytic and machine learning services as key factors.
  • Aside from the expected data processing/engineering/ETL workloads which make up 55 percent of reported Spark use today, the top active Spark initiatives include real-time stream processing, exploratory data science, and the emergence of Spark for machine learning. These are all areas where Cloudera continues to invest.
  • However, barriers to adoption and other challenges remain unchanged, and are largely attributed to the big data skills gap and to the ability to access relevant training in a variety of formats (online, in-person, conference or tradeshow). Cloudera trains more Apache Spark professionals than any other Hadoop vendor and supports them through professional services, value consulting, and a wide breadth of partners.

“Our focus is on enterprise leadership at Cloudera and we provide the critical security, data governance and compliance that our customers need,” said Mike Olson, founder and chief strategy officer at Cloudera. “The results of the survey validate the importance placed on being fully enterprise-ready today and also well prepared to support future Spark use cases. It is the key reason that customers overwhelmingly choose Spark from Cloudera over other commercial vendors.”

The survey also details the elevated role of the public cloud for Spark: “Interestingly, while on-premises Spark deployments dominate today, there is a strong interest in transitioning many of those to cloud deployments going forward,” said Matchett. “Overall Spark deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23% today to 36% in the future.”

Cloudera has created an infographic detailing the results of the report.

About Taneja Group

Taneja Group is a premier boutique analyst firm providing analysis and consulting for the technology industry. All our research and guidance is targeted at technology vendors, IT end users and venture capitalists. Taneja Group’s analysts cover technologies in the following areas: all aspects of storage, server virtualization, WAN optimization, storage and application acceleration, eDiscovery and corporate governance. Cloud storage, Big Data and Data Center Convergence are inherently covered in these segments. The data center is undergoing a fundamental metamorphosis, and our analysts are at the forefront of advising clients on which technologies are crucial and when they should be implemented for maximum effectiveness. For the eDiscovery industry we cover all aspects of the litigation workflow as well as related business processes including governance, compliance, records management, and data retention management.

About Cloudera

Cloudera delivers the modern data management and analytics platform built on Apache Hadoop and the latest open source technologies. The world’s leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest and most secure data platform available for the modern world. Our customers efficiently capture, store, process and analyze vast amounts of data, empowering them to use advanced analytics to drive business decisions quickly, flexibly and at lower cost than has been possible before. To ensure our customers are successful, we offer comprehensive support, training and professional services. Learn more at cloudera.com.

Connect with Cloudera

Read our blogs: cloudera.com/engblog and vision.cloudera.com

Follow us on Twitter: twitter.com/cloudera

Visit us on Facebook: facebook.com/cloudera

Join the Cloudera Community: cloudera.com/community

Cloudera, Cloudera's Platform for Big Data, Cloudera Enterprise Data Hub Edition, Cloudera Enterprise Flex Edition, Cloudera Enterprise Basic Edition, Cloudera Navigator Optimizer and CDH are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world. All other company and product names may be trademarks of their respective owners.