Apache Software Foundation Announces Apache® Hive 4.0 

Open source data warehouse software built on top of Apache Hadoop enables data analytics and management at massive scale 


Wilmington, DE, April 30, 2024 (GLOBE NEWSWIRE) -- The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 320 open-source projects and initiatives, today announced Apache Hive 4.0. For over a decade, Apache Hive has been the cornerstone of data warehouse and data lake architectures, empowering companies and organizations worldwide to perform analytics at an unprecedented scale while seamlessly managing vast amounts of data through SQL queries.

Since its inception in 2010, Apache Hive has evolved to meet the ever-growing demands of modern data management by offering a distributed, fault-tolerant data warehouse system known for its scalability and reliability. With support for Kerberos authentication and integration with Apache Ranger and Apache Atlas for enhanced security and observability, Hive has become a go-to choice for enterprises seeking robust data management.

“Hive 4.0 is one of the most significant releases from the Hive community to-date, unlocking unprecedented capabilities for data engineers, analysts and architects who need to manage or analyze data at scale,” said Ayush Saxena, ASF Member and Hive contributor. “This release is the result of a tremendous effort from the Hive community, and we are excited to announce its availability.” 

Empowering the Data Ecosystem
At the heart of Apache Hive lies the Hive Metastore (HMS), a centralized repository of metadata that serves as a fundamental building block for data lakes. Hive leverages a myriad of open source technologies including Apache Spark, Presto, and Trino. The Hive Metastore facilitates seamless access to metadata for various clients including Hive, Apache Impala and Spark, making it a vital component of the modern data ecosystem.

What's New in Apache Hive 4.0
Apache Hive 4.0 includes over 5,000 commits spanning new features, bug fixes, and performance enhancements. Key highlights of Apache Hive 4.0 include:

  • Hive Iceberg Integration: Streamlines data management with seamless integration of Apache Iceberg tables;
  • Improved Transaction and Locking Capability: Enhances the ACID compliance of Hive with improved transaction handling and locking mechanisms;
  • Table Maintenance: Introduces compaction mechanisms for both Hive ACID and Iceberg tables to optimize storage and performance;
  • Hive Docker Support: Simplifies deployment with official Apache Hive Docker images, available on Docker Hub, for easier setup and configuration;
  • Compiler Improvements: Anti-join support, branch pruning, column histogram statistics, HPL/SQL support, scheduled queries, new and improved cost-based optimization (CBO) rules leading to better query plans;
  • Materialized Views Support: Enables the creation and management of materialized views for accelerated query processing;
  • Runtime Optimizations: Enhances query performance with optimizations in Apache Tez and Apache Hive LLAP, ensuring faster data processing;
  • Hive Replication: Introduces improved replication features for both external and ACID tables, supporting efficient data distribution and disaster recovery; and
  • Support for Apache Ozone: Enables seamless integration with Ozone-based object stores for scalable and efficient storage solutions.

For a complete list of changes, visit the Apache Hive Wiki.

About The Apache Software Foundation (ASF)
Founded in 1999, the Apache Software Foundation exists to provide software for the public good with support from more than 75 sponsors. ASF’s open-source software is used ubiquitously around the world, with more than 8,400 committers contributing to 320+ active projects, including Apache Superset, Apache Camel, Apache Flink, Apache HTTP Server, Apache Kafka, and Apache Airflow. The Foundation’s open-source projects and community practices are considered industry standards, including the widely adopted Apache License 2.0, the podling incubation process, and a consensus-driven decision model that enables projects to build strong communities and thrive. https://apache.org

ASF’s annual Community Over Code event is where open-source technologists convene to share best practices and use cases, forge critical relationships, and learn about advancements in their field. https://communityovercode.org/ 

© The Apache Software Foundation. “Apache” is a registered trademark or trademark of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.


 
