Big Data Q&A

1. What is Big Data?
Answer: Datasets whose scale or complexity exceeds what traditional data-processing systems can handle.
2. What are the 5 Vs of Big Data?
Answer: Volume, velocity, variety, veracity, value.
3. Batch vs streaming?
Answer: Batch processes stored, bounded chunks of data on a schedule; streaming processes events continuously as they arrive.
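
As a rough illustration of the difference, here is a minimal Python sketch. The function names (batch_total, streaming_totals) and the sample events are made up for this example and do not come from any framework: the batch function sees the whole stored dataset at once, while the streaming function emits an updated result after every event.

```python
from typing import Iterable, Iterator


def batch_total(stored_events: list[int]) -> int:
    """Batch: the whole dataset is already stored, so process it in one pass."""
    return sum(stored_events)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming: update and emit a running result as each event arrives."""
    total = 0
    for event in events:
        total += event
        yield total


events = [3, 1, 4]  # hypothetical sample events
print(batch_total(events))             # 8
print(list(streaming_totals(events)))  # [3, 4, 8]
```

Real streaming systems also have to deal with late or out-of-order events and unbounded input, which this sketch leaves out.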
4. What is distributed computing?
Answer: Processing data across multiple machines in parallel.
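
The split, process, and combine pattern behind this answer can be sketched on a single machine. In the example below, threads from Python's standard library stand in for separate machines, which is an assumption made only to keep the sketch self-contained; count_words and parallel_word_count are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: list[str]) -> Counter:
    """Work done independently by one worker on its own chunk."""
    return Counter(word for line in chunk for word in line.split())


def parallel_word_count(lines: list[str], workers: int = 2) -> Counter:
    # Split the input into one chunk per worker (round-robin assignment).
    chunks = [lines[i::workers] for i in range(workers)]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is counted separately, then the partial counts are merged.
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


counts = parallel_word_count(["a b", "b c", "a a"])
print(counts["a"], counts["b"], counts["c"])  # 3 2 1
```

In an actual cluster the chunks would live on different nodes and the merge step would move partial results over the network, but the overall shape of the computation is the same.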
5. Data lake vs warehouse?
Answer: A lake stores raw data in flexible formats; a warehouse stores curated, structured data.
6. What is HDFS?
Answer: The Hadoop Distributed File System, which provides fault-tolerant distributed storage.
7. Why partition data?
Answer: It improves query performance, because queries can skip partitions they do not need, and it lets partitions be processed in parallel.
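
A small Python sketch of the idea, assuming records are partitioned by a "day" field. The field names, sample records, and helper functions are hypothetical. A query for one day reads only that day's partition instead of scanning every record.

```python
from collections import defaultdict


def partition_by(records: list[dict], key: str) -> dict:
    """Group records into partitions keyed by the value of one field."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key]].append(record)
    return partitions


def total_for_day(partitions: dict, day: str) -> int:
    # Only the matching partition is read; all others are skipped.
    return sum(r["amount"] for r in partitions.get(day, []))


records = [  # hypothetical sample data
    {"day": "2024-01-01", "amount": 10},
    {"day": "2024-01-02", "amount": 5},
    {"day": "2024-01-01", "amount": 7},
]
partitions = partition_by(records, "day")
print(total_for_day(partitions, "2024-01-01"))  # 17
```

Because each partition is independent, separate workers could also process different days at the same time.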
8. Common big-data tools?
Answer: Hadoop, Spark, Kafka, Hive, Airflow, cloud data platforms.
9. Main challenge?
Answer: Balancing scale, cost, reliability, and data governance.
10. One-line summary?
Answer: Big Data engineering enables scalable analytics from massive datasets.