Big Data: Hadoop, Spark
- Apache Hadoop - wiki.
- Hadoop Ecosystem
- Hadoop Vendors
  - Hortonworks
  - Cloudera
  - MapR
  - Greenplum
  - IBM
  - Amazon
- Parts
  - Hive is a SQL dialect and Pig is a dataflow language; both hide the tedium of creating MapReduce jobs behind higher-level abstractions more appropriate for user goals.
  - ZooKeeper is used for federating services.
  - Oozie is a scheduling system.
  - Avro, Thrift, and Protobuf are platform-portable data serialization and description formats.
- Apache Spark
- Apache Spark Ecosystem
- Wiki
- Implementation (approximate language shares)
  - 77% Scala
  - 9% Python
  - 7% Java
  - 8% other
- Big Data Processing with Apache Spark
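
To make concrete what the "tedium of creating MapReduce jobs" refers to, here is a minimal sketch of the map, shuffle, and reduce phases for a word count, written in plain Python rather than against the Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, not part of Hadoop. It runs in a single process and only mimics the data flow that a real cluster would distribute.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases the way a MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data with hadoop", "big data with spark"]
    print(word_count(sample))
```

In Hive, roughly the same aggregation collapses into a single `SELECT word, COUNT(*) ... GROUP BY word` query, which is the kind of higher-level abstraction the Parts list describes.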
Labels: Big Data