Source: Informatica
November 02, 2011 07:00 ET

Informatica Delivers Industry's First Data Parser for Hadoop

Informatica HParser Brings Enterprise-Grade Parsing of Big Data Logs, Documents and Industry Standards to Hadoop

REDWOOD CITY, Calif., Nov. 2, 2011 (GLOBE NEWSWIRE) -- Informatica Corporation (NASDAQ: INFA), the world’s number one independent provider of data integration software, today announced the immediate availability of Informatica HParser, the first data parsing transformation solution for Hadoop environments. Informatica HParser runs on nearly any distribution of Apache Hadoop, exploiting the parallelism of the MapReduce framework to efficiently turn unstructured complex data, such as web logs, social media data, call detail records and other data formats, into a structured or semi-structured format in Hadoop. Once transformed into a more structured format, the data can be more rapidly used and validated to drive business insights and improve operations.

Available in a free community edition and commercial editions, Informatica HParser provides organizations with the solution they require to extract the value of complex, unstructured data. This powerful data parsing capability in Hadoop empowers organizations to achieve new levels of productivity, efficiency and scalability. Organizations can readily augment their existing IT investments by using Informatica HParser as the standard for data parsing in Hadoop. Using Informatica HParser, customers benefit from an engine-based solution that covers the broadest range of data formats and greatly simplifies and speeds the analytical process by eliminating the risks and costs of one-off custom-coded parsing scripts.

Unique Benefits of Informatica HParser

The unique benefits delivered by Informatica HParser include:

Rapid, visual development - HParser’s visual Integrated Development Environment (IDE) for creating and maintaining transformations accelerates development and boosts developer productivity. HParser also turns deep hierarchy and relationships into a flattened, easier to use format while allowing for business rule validation.
Single engine covering a broad range of data formats - HParser’s ready-to-use transformation building blocks, or libraries, cover a wide range of general and industry-specific data formats including support for XML and JSON; SWIFT, X12, NACHA for the financial industry; HL7 and HIPAA for healthcare; ASN.1 for telecommunications; and market data.
Support for device-generated logs - HParser simplifies the parsing of complex device- or machine-generated content including proprietary log files such as Apache weblogs and Omniture logs.
Exploiting parallelism in MapReduce - HParser delivers optimized parsing performance for large files of complex data by running natively inside MapReduce and fully leveraging its parallelism.
Leveraging best practices across large-scale projects - With HParser, developers can create an abstraction layer between the application logic in MapReduce and data sources. This enables projects to easily scale by allowing application logic to be written once and then applied across multiple data sources. Using the same IDE, the design artifacts can be extended to the rest of the enterprise beyond Hadoop projects.
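
To make the parsing and flattening ideas above concrete, the following is a minimal, generic sketch in Python. It is not HParser code and does not use any Informatica API. The function names parse_log_line and flatten, the dotted-key naming scheme and the choice of the Apache Common Log Format are all assumptions made for illustration. It shows the kind of per-record work a map step might do: turning one raw web log line into a structured record, and collapsing a nested hierarchy into a single-level record.

```python
import re

# Assumed input format: Apache Common Log Format
#   host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Hypothetical map step: one raw log line in, one structured record out."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # lines that do not fit the format are skipped
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; store it as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def flatten(value, prefix=""):
    """Collapse nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = value
    return flat


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(parse_log_line(sample))
    print(flatten({"order": {"id": 7, "items": [{"sku": "A"}]}}))
```

Because each line is parsed independently of every other line, a function of this shape can be applied to separate chunks of a large file in parallel, which is the property a MapReduce job relies on.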

Supporting Quotes

"By 2014, organizations which have deployed analytics to support new complex data types and large volumes of data in analytics will outperform their market peers by more than 20 percent relative to virtually any accepted, standardized accounting performance metric," said Merv Adrian, research vice president, Gartner.¹ "The ability to parse diverse unstructured and multi-structured data with deep hierarchies into a format that can be readily analyzed and processed is a foundation for developing a logically consistent information infrastructure extensible to tackle big data including Hadoop. It is crucial for a data-centric enterprise to look for common ways to normalize and extract meaning from all types of content using such standards as XML and JSON so that it can be exchanged across the organization."
"The market demand for achieving the full potential of big data for business value is high," said Tom Kersnick, director of the Big Data Center of Excellence (CoE) for Cognizant’s Data Warehousing, Business Intelligence and Performance Management Practice. "This has led Cognizant to create the Big Data CoE, where Hadoop is one of our strategic growth drivers. As part of our beta engagement with Informatica, we tested a range of use cases, and HParser demonstrated how complex, hierarchical files can be flattened through parallel parsing in an easy-to-use, graphical user interface. As we expand our big data competency, this type of scalable and efficient approach to data parsing in Hadoop is a crucial factor in building skill sets and increasing service capacity for our rapidly growing joint client base."
"Informatica HParser, the newest addition to the Informatica B2B Data Exchange family and Informatica Platform, addresses the growing demand for deriving business value from large volumes of unstructured complex data," said Juan Carlos Soto, senior vice president and general manager, B2B Data Exchange and Cloud Data Integration, Informatica. "HParser combines Informatica’s latest innovation optimized for Hadoop with our unrivaled experience in parsing unstructured data and handling industry-standard formats. Informatica HParser is a pivotal milestone on our roadmap for helping enterprises leverage big data, and is yet another Informatica solution designed to help organizations maximize their Return on Data."

Informatica HParser Editions and Availability

Informatica HParser is available immediately in three editions:

HParser for Logs, Omniture, XML and JSON (Community Edition) - Available free of charge, with Informatica support and add-on features available for purchase.
HParser for Industry Standards and HParser for Documents (Commercial Editions).

Both Commercial Editions of Informatica HParser are available for a 30-day free trial period.

Tweet this: News: @InformaticaCorp Delivers Industry’s First Data Parser for #Hadoop http://bit.ly/uDthlq #bigdata

About Informatica

Informatica Corporation (NASDAQ: INFA) is the world’s number one independent provider of data integration software. Organizations around the world rely on Informatica to gain a competitive advantage with timely, relevant and trustworthy data for their top business imperatives. Worldwide, over 4,500 enterprises depend on Informatica for data integration, data quality and big data solutions to access, integrate and trust their information assets residing on-premise and in the Cloud. For more information, call +1 650-385-5000 (1-800-653-3871 in the U.S.), or visit www.informatica.com. Connect with Informatica at http://www.facebook.com/InformaticaCorporation, http://www.linkedin.com/company/informatica and http://twitter.com/InformaticaCorp.

¹See: The Information Capabilities Framework: An Aligned Vision for Information Infrastructure, G00215835

###

Note: Informatica, Informatica Platform, Informatica HParser and Informatica B2B Data Exchange are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.