Google launching open source Cloud Dataflow SDK for Java

First unveiled at Google’s annual I/O developer summit this summer, Cloud Dataflow is a big data analytics solution designed to crunch information in either streaming or batch mode.

Cloud Dataflow has since been pushed out as an alpha release as Google preps a managed service model for data processing.

Urs Hölzle, senior vice president of Google Cloud Platform, noted at the time that Dataflow had replaced MapReduce inside Google as the new approach for running analysis pipelines over “arbitrarily large datasets.”

Cloud Dataflow also fills in a major piece of the puzzle in Google’s rapidly growing cloud stack as the Internet giant continues to challenge Amazon Web Services, among other cloud providers.

More specifically, Google’s Cloud Dataflow lines up against two AWS offerings: the data warehouse service Amazon Redshift and the hosted Hadoop service Amazon Elastic MapReduce.