Wednesday, December 2, 2015

Big Data: Hadoop, Spark



  1. Apache Hadoop - wiki.
  2. Hadoop Ecosystem
    1. Hadoop Vendors
      1. Hortonworks
      2. Cloudera
      3. MapR
      4. Greenplum
      5. IBM
      6. Amazon
    2. Parts
      1. Hive is a SQL dialect and Pig is a dataflow language; both hide the tedium of creating MapReduce jobs behind higher-level abstractions more appropriate for user goals.
      2. Zookeeper is used for federating services.
      3. Oozie is a scheduling system.
      4. Avro, Thrift, and Protobuf are platform-portable data serialization and description formats.
  3. Apache Spark
    1. Apache Spark Ecosystem
    2. Wiki
    3. Implementation
      1. 77% Scala
      2. 9% Python
      3. 7% Java
      4. 8% other
    4. Big Data Processing with Apache Spark
      1. Part 1: Introduction
      2. Part 2: Spark SQL
      3. Part 3: Spark Streaming
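
To make the note about Hive and Pig hiding MapReduce tedium more concrete, here is a minimal sketch of the map, shuffle, and reduce steps behind a word count, written in plain Python with no Hadoop involved. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are made up for illustration; a real Hadoop job would also need job configuration, input/output formats, and cluster submission.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, mimicking what the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases: map -> shuffle -> reduce."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not taken from any of the linked articles.
    sample = ["big data hadoop", "big data spark"]
    print(word_count(sample))
```

In a higher-level tool the same result is roughly a single grouped count over a table of words, which is the kind of abstraction the Hive and Pig entries above refer to.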
