PALO ALTO, Calif., Aug. 30, 2017 (GLOBE NEWSWIRE) -- Cask, the company that makes building and deploying big data solutions easy, today announced that it is broadening its award-winning big data integration offering by adding new, pre-built frameworks for common and critical enterprise data lake patterns, available for the newly released 4.3 version of Cask Data Application Platform (CDAP). CDAP 4.3 delivers an expanded set of capabilities for self-service data preparation, self-service data pipelines, support for the Spark Python API (PySpark), and more. Designed to make it easier for less technical stakeholders to access and manipulate data, the new offering will broaden the base of big data users within the organization while reducing the time to gain access to clean, trusted data.

“Data scientists, citizen developers and business analysts are in a unique position to make decisions regarding the quality of data, its relevance to the business, and how best to apply it. Their perspectives are required to ensure the true value of data is captured and exploited, making them critical enablers of their organizations’ digital transformation initiatives,” said Jonathan Gray, Cask founder and CEO. “Historically, their ability to gain access to data in the data lake was often limited by the need for specialized coding skills or for dedicated expertise within their IT organizations. With today’s announcement, we continue on our path to democratize access to all data for any user, while maintaining the necessary enterprise-level security and governance. This latest release of CDAP takes data preparation and data pipelines to the next level of ease-of-use while significantly expanding validation and rules capabilities.”

Available for download immediately, CDAP 4.3 introduces a new data preparation framework for efficiently transforming and validating data during onboarding of a new data source – without having to write any code. New User-Defined Directives (UDD) provide an easier way for users to build, integrate and deploy custom data processing directives within the data preparation step. CDAP 4.3 also provides a number of enhancements for building and managing interactive data pipelines, including an improved pipeline studio to streamline the creation of large numbers of pipelines. CDAP 4.3 now also offers statistical insights into historical runs of pipelines and security enhancements including integration with Apache Ranger.

For data scientists writing Spark jobs in Python, CDAP 4.3 now supports the Spark Python API for Spark 1.x and 2.x. As a result, PySpark users can now inject their Spark transformation logic into a CDAP data pipeline, run the code and get results directly from the user interface.
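The kind of row-level transformation logic a PySpark user might inject into a pipeline stage can be sketched as follows. This is purely illustrative: the record fields and the `transform` function are hypothetical, and CDAP's actual PySpark integration mechanism is not shown — in real PySpark code such a function would typically be applied with `rdd.map(transform)` or a DataFrame UDF.

```python
# Illustrative only: a row-level transform of the kind a PySpark user
# might plug into a pipeline stage. The record schema here is invented
# for the example; the CDAP-specific wiring is not shown.

def transform(record):
    """Normalize a raw event record: trim strings, parse the amount."""
    amount = float(record["amount"])
    return {
        "user": record["user"].strip().lower(),
        "amount": amount,
        "valid": amount >= 0,  # simple validation flag computed in-flight
    }

if __name__ == "__main__":
    raw = [
        {"user": "  Alice ", "amount": "42.50"},
        {"user": "BOB", "amount": "-3"},
    ]
    # In PySpark this would be sc.parallelize(raw).map(transform).collect()
    cleaned = [transform(r) for r in raw]
    for row in cleaned:
        print(row)
```

Running the sketch normalizes each record and flags the negative amount as invalid, the sort of result a pipeline user would then inspect directly from the CDAP user interface.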

Cask today also announced a pre-built, distributed rules engine, as well as a new real-time microservices framework; these new offerings leverage the platform capabilities available in CDAP and are available under separate licenses from Cask. The new Cask Distributed Rules Engine, a sophisticated if-then-else statement interpreter that runs natively on Apache Spark, Hadoop, Amazon EMR, Azure HDInsight and GCE, addresses the gap for a horizontally scalable, inference-based business rules engine for big data processing. It provides an alternative computational model for transforming data while empowering business users to specify and manage data transformations and policy enforcement. Cask is also introducing the Cask Microservices Framework, which allows developers to build loosely coupled services, increasing application modularity and making it easier to deploy, modify and maintain applications.
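The if-then-else computational model behind such a rules engine can be illustrated with a minimal sketch. Note this is a conceptual illustration in plain Python, not the Cask Distributed Rules Engine API: the `Rule` type, `apply_rules` function, and the example rules are all invented for the sketch.

```python
# Conceptual sketch of rules-engine evaluation: each rule pairs a
# condition with an action, and the first matching rule fires. This is
# NOT the Cask Distributed Rules Engine API; it only illustrates the
# if-then-else model a business user would express rules in.
from typing import Callable, List, Tuple

# A rule is (condition, action): both take a record dict; the action
# returns a (possibly transformed) record.
Rule = Tuple[Callable[[dict], bool], Callable[[dict], dict]]

def apply_rules(record: dict, rules: List[Rule]) -> dict:
    """Apply the first rule whose condition matches; else pass through."""
    for condition, action in rules:
        if condition(record):
            return action(record)
    return record

# Hypothetical policy: flag negative amounts for review, approve the rest.
policy: List[Rule] = [
    (lambda r: r["amount"] < 0, lambda r: {**r, "status": "review"}),
    (lambda r: True,            lambda r: {**r, "status": "approved"}),
]
```

In a distributed setting, the same `apply_rules` step would be mapped over partitions of records on Spark or Hadoop, which is what makes the rule set horizontally scalable while business users maintain the rules themselves.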

Earlier this month, Cask announced that Thomson Reuters, the world’s leading source of news, data and information that powers professional markets, uses Cask Data Application Platform (CDAP) to augment development of a new, large-scale data lake. Thomson Reuters found that by significantly reducing the amount of operational coding and tool integration, CDAP cut the average time to implement straightforward data ingest solutions by more than 60%.

Additional Resources

  • A blog by the Cask engineering team offering additional technical details about the new capabilities in CDAP 4.3
  • A 4-part webinar series starting on September 14, 2017 at 11am PT / 2pm ET with Cask co-founder and CTO Nitin Motgi, which will include a demonstration of the new data preparation capabilities in CDAP 4.3, the distributed rules engine, and other Cask innovations
  • A case study describing Thomson Reuters’ use of CDAP, and providing an overview of the customer’s challenges, the Cask solution deployed, and its benefits to Thomson Reuters

CDAP 4.3 is available immediately for download on the Cask website. To evaluate the Cask Distributed Rules Engine or the Cask Microservices Framework, please contact Cask for additional information.

About CDAP
The first unified integration platform for big data, Cask Data Application Platform (CDAP) lets developers, architects and data scientists focus on applications and insights rather than infrastructure and integration. CDAP is open source and accelerates time to value from Hadoop through standardized APIs, configurable templates and visual interfaces. With a radically simplified developer experience and a code-free self-service environment, CDAP enables enterprise IT to broaden the big data user base and seamlessly integrates with existing MDM, BI and security and governance solutions.

About Cask
Cask makes building and running big data solutions on-premises or in the cloud easy with Cask Data Application Platform (CDAP), the first unified integration platform for big data. CDAP reduces the time to production for data lakes and data applications by 80%, empowering the business to make better decisions faster. Cask customers and partners include AT&T, AWS, Cloudera, Ericsson, Google, IBM, Microsoft, Salesforce, Tableau and Thomson Reuters, among others. For more information, visit the Cask website and follow @caskdata.

Max Herrmann