Apache Beam

Developer(s)	Apache Software Foundation
Initial release	June 15, 2016
Stable release	0.2.0 / August 8, 2016
Development status	Active
Written in	Java, Python
Operating system	Cross-platform
License	Apache License 2.0
Website	beam.apache.org

Apache Beam is an open source unified programming model for defining and executing data processing pipelines, including ETL, batch and stream (continuous) processing.^[1] Beam pipelines are defined using one of the provided SDKs and executed on one of Beam's supported runners (distributed processing back-ends), including Apache Flink, Apache Spark, and Google Cloud Dataflow.^[2]

It has been termed an "uber-API for big data".^[3]
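
The separation between defining a pipeline and choosing a runner can be illustrated with a small toy model. The sketch below is not the Apache Beam SDK: the names Pipeline, LocalRunner, ThreadedRunner and flat_map are hypothetical and exist only for this illustration, and the thread pool merely stands in for a distributed back-end. It shows the same pipeline definition producing the same result under two different execution strategies.

```python
# Toy illustration only: these classes are hypothetical and are NOT the Apache Beam SDK.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Pipeline:
    """An ordered list of element-wise steps, described independently of execution."""

    def __init__(self):
        self.steps = []

    def flat_map(self, fn):
        # Record the step; nothing is executed until a runner is given the pipeline.
        self.steps.append(fn)
        return self


class LocalRunner:
    """Runs every step sequentially in the current process."""

    def run(self, pipeline, source):
        data = list(source)
        for fn in pipeline.steps:
            data = [out for item in data for out in fn(item)]
        return data


class ThreadedRunner:
    """Runs each step over a thread pool, standing in for a distributed back-end."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, pipeline, source):
        data = list(source)
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            for fn in pipeline.steps:
                # pool.map preserves input order, so results match LocalRunner.
                chunks = pool.map(lambda item, fn=fn: list(fn(item)), data)
                data = [out for chunk in chunks for out in chunk]
        return data


def build_word_pipeline():
    """Define the pipeline once: split lines into words, then lowercase each word."""
    return (
        Pipeline()
        .flat_map(lambda line: line.split())
        .flat_map(lambda word: [word.lower()])
    )


lines = ["Beam unifies batch", "and stream", "Batch and Stream"]
local_words = LocalRunner().run(build_word_pipeline(), lines)
threaded_words = ThreadedRunner().run(build_word_pipeline(), lines)
word_counts = Counter(local_words)
```

In this toy model, switching runners changes only how the steps are executed; the pipeline definition in build_word_pipeline stays the same.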

History

Apache Beam^[2] is an implementation of the model described in the Dataflow model paper.^[4] The Dataflow model builds on earlier work on distributed processing abstractions at Google, in particular FlumeJava^[5] and MillWheel.^[6]^[7]

In 2014, Google released an open SDK implementation of the Dataflow model, together with an environment for executing Dataflow pipelines both locally (non-distributed) and in the Google Cloud Platform service.

In 2016, Google donated the core SDK, the implementation of a local runner, and a set of IOs (data connectors) for accessing Google Cloud Platform data services to the Apache Software Foundation. Other companies and community members have contributed runners for existing distributed execution platforms, as well as new IOs that integrate Beam runners with existing databases, key-value stores and messaging systems. Additionally, new DSLs have been proposed to support domain-specific needs on top of the Beam model.

Timeline

Version	Original release date	Latest version	Release date	Status
0.2.0	2016-08-08	0.2.0	2016-08-08	Current stable version
0.1.0	2016-06-15	0.1.0	2016-06-15	Old version, no longer supported

References

[1] Woodie, Alex (22 April 2016). "Apache Beam's Ambitious Goal: Unify Big Data Development". Datanami. Retrieved 4 August 2016.
[2] "Cloud Dataflow - Batch & Stream Data Processing".
[3] Pointer, Ian (14 April 2016). "Apache Beam wants to be uber-API for big data". InfoWorld.
[4] Akidau, Tyler; Schmidt, Eric; Whittle, Sam; Bradshaw, Robert; Chambers, Craig; Chernyak, Slava; Fernández-Moctezuma, Rafael J.; Lax, Reuven; McVeety, Sam; Mills, Daniel; Perry, Frances (1 August 2015). "The dataflow model" (PDF). Proceedings of the VLDB Endowment. 8 (12): 1792–1803. doi:10.14778/2824032.2824076. Retrieved 4 August 2016.
[5] Chambers, Craig; Raniwala, Ashish; Perry, Frances; Adams, Stephen; Henry, Robert R.; Bradshaw, Robert; Weizenbaum, Nathan (1 January 2010). "FlumeJava: Easy, Efficient Data-parallel Pipelines" (PDF). Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM: 363–375. doi:10.1145/1806596.1806638. Retrieved 4 August 2016.
[6] Akidau, Tyler; Whittle, Sam; Balikov, Alex; Bekiroğlu, Kaya; Chernyak, Slava; Haberman, Josh; Lax, Reuven; McVeety, Sam; Mills, Daniel; Nordstrom, Paul (27 August 2013). "MillWheel" (PDF). Proceedings of the VLDB Endowment. 6 (11): 1033–1044. doi:10.14778/2536222.2536229. Retrieved 4 August 2016.
[7] Pointer, Ian. "Apache Beam wants to be uber-API for big data". InfoWorld. Retrieved 4 August 2016.

This article is issued from Wikipedia, version of 10/16/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply to the media files.
