Andy Wu's Blog: Big Data: Hadoop, Spark

Wednesday, December 2, 2015

Big Data: Hadoop, Spark

Apache Hadoop - wiki.
Hadoop Ecosystem

Hadoop Vendors

Hortonworks,
Cloudera,
MapR,
Greenplum,
IBM, and
Amazon.

Parts

Hive is a SQL dialect and Pig is a dataflow language; both hide the tedium of creating MapReduce jobs behind higher-level abstractions more appropriate for user goals.
ZooKeeper is used for coordinating distributed services, and Oozie is a workflow scheduling system.
Avro, Thrift and Protobuf are platform-portable data serialization and description formats.
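
To give a feel for the "tedium" that Hive and Pig hide, here is a minimal sketch of a word count written as explicit map, shuffle and reduce steps. It is plain Python that only imitates the MapReduce model in a single process; it does not use Hadoop, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases (hypothetical helper, not a Hadoop API)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data hadoop", "big data spark"]
    print(word_count(sample))
```

In Hive the same job would be roughly a single statement such as SELECT word, COUNT(*) FROM words GROUP BY word, assuming a hypothetical table named words with one word per row. That gap in effort is the point of the higher-level tools.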

Apache Spark

Apache Spark Ecosystem
Wiki
Implementation

77% Scala
9% Python
7% Java
8% other

Big Data Processing with Apache Spark

Labels: Big Data
