Yahoo! Cloud Serving Benchmark (YCSB) Now Available in Cloudera Labs

YCSB joins Ibis, Apache Hive-on-Apache Spark, Apache Kafka (graduated), and Apache Phoenix Projects


MEDIA ALERT: PALO ALTO, Calif., Sept. 4, 2015 (GLOBE NEWSWIRE) -- What: Cloudera announces that the Yahoo! Cloud Serving Benchmark (YCSB), an open source framework for evaluating and comparing the performance of multiple types of data-serving systems (including NoSQL stores such as Apache HBase, Apache Cassandra, Redis, MongoDB, and Voldemort), is now available to CDH users.

Why: YCSB has long been the de facto open standard for comparative performance evaluation of NoSQL data stores. Many factors go into deciding which data store to use for production applications, including basic features, data model, and performance characteristics on a given type of workload. It's critical to have the ability to compare multiple data stores intelligently and objectively so that sound architectural decisions can be made.

As a generic, database-neutral performance evaluation utility, YCSB is currently the de facto comparative benchmark for NoSQL stores. It includes bindings for a wide range of databases and is commonly used to compare the performance of those databases on a set of desired workloads. Because YCSB is open source and extensible, support for additional databases is added regularly.

Who: Cloudera sees great value in the YCSB project for the HBase community, and recently Cloudera engineers have been working with Brian Cooper, the original author of YCSB, to reinvigorate the project within the developer community. A number of enhancements have already been added, and a regular release cycle has been established. Some of the recent improvements to YCSB include:

  • Latency capture via HdrHistogram
  • Measuring transaction latency against a fixed schedule
  • Support for an additional JSON format
  • Better reporting and status output
  • New database bindings
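
To illustrate why measuring latency against a fixed schedule matters, here is a minimal simulation in Python. It is a sketch under stated assumptions, not YCSB's own code: the function name, the single-threaded client, and the millisecond inputs are all hypothetical. It contrasts latency timed from the moment each operation actually starts with latency timed from the moment the schedule said it should have started.

```python
def simulate_latencies(service_times_ms, interval_ms):
    """Simulate one client thread issuing operations on a fixed schedule.

    service_times_ms: how long the store takes to serve each operation.
    interval_ms: intended gap between operation start times.
    Returns (service_latencies, intended_latencies), both in milliseconds.
    Hypothetical helper for illustration only.
    """
    service_latencies = []
    intended_latencies = []
    clock = 0.0  # simulated time at which the thread becomes free
    for i, service in enumerate(service_times_ms):
        intended_start = i * interval_ms
        # The thread cannot start before its scheduled time, and it cannot
        # start before the previous operation has finished.
        actual_start = max(clock, intended_start)
        finish = actual_start + service
        # Timed from the actual start: hides time spent waiting behind a
        # slow predecessor.
        service_latencies.append(finish - actual_start)
        # Timed from the intended start: includes that waiting time.
        intended_latencies.append(finish - intended_start)
        clock = finish
    return service_latencies, intended_latencies


if __name__ == "__main__":
    service, intended = simulate_latencies([1, 50, 1, 1], interval_ms=10)
    print("from actual start:  ", service)
    print("from intended start:", intended)
```

In this example a single slow operation delays the two operations scheduled behind it. Timing from the actual start reports those two as fast, while timing from the intended start shows the delay that a caller on that schedule would have experienced.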

When: Available now. CDH users can easily install and use YCSB to evaluate the performance of their HBase deployments by taking advantage of new packages in Cloudera Labs. (As with all Cloudera Labs projects, these packages are not currently supported, but we strongly encourage you to experiment with them.)

How YCSB Works:

YCSB was developed at Yahoo! Labs to provide a framework and common set of workloads for evaluating the performance of different key-value stores. It has two parts:

  • The YCSB Client, an extensible workload generator
  • The core workloads, a set of workload scenarios to be executed by the generator

The core workloads provide a well-rounded picture of a system's performance. The client is extensible, so you can define additional workloads to examine system aspects or application scenarios that the core workloads do not cover, and you can extend it to benchmark different databases. YCSB ships with bindings for a long list of databases, including HBase, Cassandra, Apache Accumulo, MongoDB, and Voldemort, and support for another data store can be added by writing an interface layer.
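
The split between a workload generator and a per-database interface layer can be sketched in a few lines of Python. This is an illustration only and not YCSB's actual API (YCSB itself is written in Java): the class DictStore, the function run_workload, the key format, and the read proportion parameter are all hypothetical names chosen for this example.

```python
import random
from collections import Counter


class DictStore:
    """Hypothetical in-memory stand-in for a database interface layer.

    A real binding would translate these calls into requests to the
    data store being benchmarked.
    """

    def __init__(self):
        self._rows = {}

    def insert(self, key, value):
        self._rows[key] = value

    def read(self, key):
        return self._rows.get(key)

    def update(self, key, value):
        self._rows[key] = value


def run_workload(store, record_count, operation_count, read_proportion, seed=0):
    """Load records, then issue a random mix of reads and updates.

    The generator only talks to the store through insert/read/update,
    so any object with those three methods can be benchmarked.
    Returns a Counter of how many operations of each kind were issued.
    """
    rng = random.Random(seed)  # seeded so repeated runs issue the same mix
    for i in range(record_count):
        store.insert(f"user{i}", {"field0": "initial"})

    counts = Counter()
    for _ in range(operation_count):
        key = f"user{rng.randrange(record_count)}"
        if rng.random() < read_proportion:
            store.read(key)
            counts["read"] += 1
        else:
            store.update(key, {"field0": "changed"})
            counts["update"] += 1
    return counts


if __name__ == "__main__":
    print(run_workload(DictStore(), record_count=100,
                       operation_count=1000, read_proportion=0.5))
```

Because the generator depends only on the three store methods, swapping in a different data store means writing a new class with the same methods and leaving the workload definition untouched.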

To benchmark multiple data stores and compare them, install each data store on its own instance of an identical hardware configuration and run the same workloads against each instance. Next, plot the performance of each system to see the relative performance profiles. One good visualization to try is a latency-versus-throughput curve.
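
As a rough sketch of how the points for a latency-versus-throughput curve might be derived, the Python below reduces each benchmark run to one (throughput, mean latency, 95th-percentile latency) point and sorts the points per system. The function names and the input layout are assumptions made for this example, and it does not parse YCSB's own output; the per-operation latencies and elapsed times are expected to come from your own runs.

```python
def curve_point(latencies_ms, elapsed_s):
    """Reduce one run to (throughput ops/s, mean latency ms, p95 latency ms).

    latencies_ms: per-operation latencies recorded during the run.
    elapsed_s: wall-clock duration of the run in seconds.
    """
    ops = len(latencies_ms)
    ordered = sorted(latencies_ms)
    throughput = ops / elapsed_s
    mean = sum(ordered) / ops
    # Nearest-rank 95th percentile, using integer arithmetic for the ceiling.
    rank = (95 * ops + 99) // 100
    p95 = ordered[rank - 1]
    return throughput, mean, p95


def latency_throughput_curve(runs_by_system):
    """Build one sorted list of curve points per system.

    runs_by_system: dict mapping a system name to a list of
    (latencies_ms, elapsed_s) tuples, one tuple per run.
    """
    return {
        name: sorted(curve_point(lat, secs) for lat, secs in runs)
        for name, runs in runs_by_system.items()
    }
```

Each system's sorted list can then be handed to any plotting tool, with throughput on the horizontal axis and latency on the vertical axis, so that the curves for different data stores can be overlaid.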

Installing YCSB with CDH

YCSB packages and parcels for CDH, including basic documentation, can be downloaded from here. (YCSB 0.3.0 is the version packaged in Cloudera Labs.) To install this version, within Cloudera Manager, click on the "Parcels" icon in the top bar, then click on the "Edit Settings" button. Add the following link to the list of URLs enumerated in the "Remote Parcel Repository URLs" setting: http://archive.cloudera.com/cloudera-labs/ycsb/parcels/latest. Then, install the parcel and activate it.

Users will see value in YCSB for benchmarking HBase deployments. Share feedback or questions on the Cloudera Labs area at community.cloudera.com.
