SAN FRANCISCO, CA--(Marketwired - June 07, 2017) - Spark Summit -- Cask, the company that makes building and deploying big data solutions easy, today at Spark Summit San Francisco announced the newest release of Cask Data Application Platform (CDAP), version 4.2. CDAP 4.2 delivers expanded support for Spark and enhanced, user-centric self-service data ingestion and preparation capabilities. It also expands the pre-built and easy-to-deploy solutions in Cask Market to include Change Data Capture (CDC) support for SQL Server and Oracle. These new capabilities will help accelerate user productivity in Spark and Hadoop projects, significantly reducing initial time to value and time to production for big data solutions.

"Simplified code development, workload flexibility and faster data processing have generated huge interest in Spark", said Jonathan Gray, Cask founder and CEO. "But as is the case with many other big data technologies, operationalizing Spark and scaling workloads from prototype to production present their own set of challenges for IT teams, greatly extending timelines and often putting the success of projects at risk. With broad support for Spark and increasingly code-free, interactive data integration capabilities, this latest release of CDAP dramatically shortens the time to prepare and ingest data and to test, run and deploy Spark data pipelines on that data. This means simplified onboarding, better productivity and faster time to production for data lakes and data-driven applications on Spark and Hadoop."

CDAP 4.2 adds support for Spark 2.x, which includes the new DataFrame/Dataset/SQL APIs, as well as the new Spark2 runtime. As a result, CDAP users will be able to easily upgrade their Spark programs from Spark 1.x to Spark 2.x. Furthermore, when building a data pipeline, or exporting a data pipeline from their CDAP SDK environment to a cluster, users will not have to be concerned with the version of Spark running on the target cluster. CDAP 4.2 also adds a new, interactive experience for Spark developers, enabling them to add custom Spark transformation logic to a data pipeline, run the code and get results from it quickly, all directly from the user interface.

CDAP 4.2 introduces a new user-centric, self-service data preparation workflow that allows users to easily connect to existing data sources, offers them simple point-and-click interactive data preparation to transform data, and provides push-button operationalization of ingestion and transformation work as production pipelines. Additional enhancements in CDAP 4.2 include advanced scheduling capabilities designed to boost scalability and flexibility in production environments. The new, more scalable CDAP scheduler allows for data-driven schedules, event-based triggers, and the definition of constraints that can be used to triage multiple jobs running on the same cluster.

The Cask Market update for CDAP 4.2 offers new, pre-built assets, expanding the list of reusable, ready-to-use big data solutions and components available for push-button deployment. Following the introduction of EDW Offload as a pre-built, packaged solution in Cask Market with CDAP 4.1, Cask Market now offers real-time Change Data Capture (CDC) for SQL Server and Oracle with Spark Streaming, keeping data in sync between the source databases and Hadoop. This allows CDAP users to use Change Data Capture instead of traditional ETL for their EDW Offload workloads, improving efficiency while reducing latency of the data extracted from their source data systems. In addition to CDC, Cask Market now also features XSD-based, complex XML readers as well as connectors for Apache Kafka, Apache Kudu, HP Vertica and others.

"Enterprises derive the most value from Hadoop and Spark with configurable data applications. Yet these applications can be hard to create, and even harder to manage in production settings", said John L. Myers, Managing Research Director, Enterprise Management Associates, a Boulder, CO-based analysis firm. "CDAP 4.2 encapsulates the complexity and difficulties of the do-it-yourself approach from organizations. This approach empowers companies to tackle big data applications from data prep to production implementation quickly and speed time to implementation."

Additional Resources

To learn more about CDAP 4.2 and its new capabilities, please check out the following resources:

  • Live demos of CDAP 4.2 and Cask Market at Spark Summit 2017 in San Francisco from June 6-7 at booth #501, and at DataWorks Summit in San Jose from June 13-15 at booth #806
  • A blog by the Cask engineering team offering additional technical details about the new capabilities in CDAP 4.2 and Cask Market
  • A webinar on June 8 at 11am PT / 2pm ET with Cask Software Engineer Terence Yim, during which he will demonstrate how to interactively create production-ready data pipelines using business logic written in Spark
  • A webinar on June 28 at 11am PT / 2pm ET with Cask Software Engineer Sagar Kapare, during which he will demonstrate the pre-built Change Data Capture solution for EDW Offload available in Cask Market
  • An Ask the Experts panel with participation from Cask Senior Vice President of Sales Steve Huber at the Cloudera Session Boston on June 8


CDAP 4.2 is available immediately on the Cask website. In addition, CDC for SQL Server and Oracle, as well as other pre-built components and solutions, are available immediately through Cask Market.

About CDAP

The first unified integration platform for big data, Cask Data Application Platform (CDAP) lets developers, architects and data scientists focus on applications and insights rather than infrastructure and integration. CDAP, which is 100% open source, accelerates time to value from Hadoop through standardized APIs, configurable templates and visual interfaces. With a radically simplified developer experience and a code-free self-service environment, CDAP enables IT enterprises to broaden the big data user base and seamlessly integrates with existing MDM, BI and security and governance solutions.

About Cask

Cask makes building and running big data solutions on-premises or in the cloud easy with Cask Data Application Platform (CDAP), the first unified integration platform for big data. CDAP reduces the time to production for data lakes and data applications by 80%, empowering the business to make better decisions faster. Cask customers and partners include AT&T, AWS, Cloudera, Ericsson, Google, IBM, Lotame, Microsoft, Salesforce and Tableau, among others. For more information, visit the Cask website and follow @caskdata.

Contact Information:

Max Herrmann