Big Data Scalable Systems
Concepts

Big Data Fundamentals

Understand what "big data" means, its key characteristics, and common technologies used to process massive datasets.

The 4Vs of Big Data

  • Volume: large amounts of data (GB, TB, PB).
  • Velocity: speed at which data is generated (streams, real-time).
  • Variety: different types (structured, semi-structured, unstructured).
  • Veracity: data quality and reliability.
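
To give the Volume bullet a sense of scale, here is a back-of-the-envelope sketch of how long one full pass over a dataset takes. The 100 MB/s per-worker read throughput and the 1,000-worker cluster are assumptions chosen for illustration, not measurements, and the model ignores network and coordination overhead.

```python
# Decimal (SI) storage units, in bytes.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15}

# Assumed for illustration: each worker reads 100 MB/s sequentially.
THROUGHPUT = 100 * 10**6


def scan_time_seconds(size_bytes, throughput_bytes_per_s, workers=1):
    """Time to read size_bytes once when the data is split evenly across workers."""
    return size_bytes / (throughput_bytes_per_s * workers)


for unit, size in UNITS.items():
    single = scan_time_seconds(size, THROUGHPUT)
    cluster = scan_time_seconds(size, THROUGHPUT, workers=1000)
    print(f"1 {unit}: {single:,.0f} s on one worker, {cluster:,.2f} s on 1000 workers")
```

Under these assumptions, a single pass over 1 PB takes about 116 days on one worker and under three hours when split across 1,000 workers, which is the basic motivation for distributed storage and processing.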

Big Data Tools

Hadoop

Open-source framework that combines distributed storage (HDFS) with batch processing (MapReduce). MapReduce is now less popular than Spark for new workloads.
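
The MapReduce model can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not the Hadoop API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce steps in parallel on many machines over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data big systems", "data systems scale"]
counts = word_count(docs)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across machines; only the shuffle step needs to move data between them.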

Apache Spark

Fast, general-purpose engine for large-scale data processing; supports batch and stream processing, SQL (Spark SQL), machine learning (MLlib), and graph computation (GraphX).

Cloud Services

Managed Hadoop and Spark clusters (AWS EMR, GCP Dataproc, Azure HDInsight) and cloud data warehouses (BigQuery, Snowflake), among others.