Introduction to Data Engineering
• Overview of Data Engineering: Roles, Responsibilities, and Importance.
• Data Lifecycle: Ingestion, Storage, Processing, and Visualization.
• Tools and Technologies: Introduction to Popular Tools (e.g., SQL and NoSQL Databases, ETL Tools, Hadoop, Spark, Kafka).
Data Ingestion and Storage
• Data Ingestion Techniques: Batch vs. Real-Time Ingestion.
• Introduction to Databases: SQL vs. NoSQL, Key Concepts, and Differences.
• Hands-On Activity: Setting Up a Simple Database and Ingesting Data.
• Advanced SQL Queries: Joins, Aggregations, Subqueries.
• Data Transformation: Techniques for Transforming Raw Data into Usable Formats.
• Hands-On Activity: Writing and Executing Complex SQL Queries.
• Introduction to Hadoop: HDFS, MapReduce.
• Introduction to Spark: Core Concepts, RDDs, DataFrames.
• Hands-On Activity: Running Simple Spark Jobs on a Local Cluster.
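The database setup, ingestion, and advanced SQL topics above can be tried with nothing more than Python's built-in sqlite3 module. The sketch below is a minimal illustration, assuming SQLite as the practice database; the customers and orders tables and their rows are hypothetical sample data. It batch-ingests rows with executemany, then runs one query that combines a join, a GROUP BY aggregation, and a subquery.

```python
import sqlite3

# Hypothetical sample data for practice only.
CUSTOMERS = [(1, "Alice", "EU"), (2, "Bob", "US"), (3, "Chen", "APAC")]
ORDERS = [
    (101, 1, 120.0),
    (102, 1, 80.0),
    (103, 2, 150.0),
    (104, 3, 50.0),
    (105, 3, 30.0),
]


def setup_database() -> sqlite3.Connection:
    """Create an in-memory SQLite database and batch-ingest the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        """
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    conn.commit()
    return conn


# JOIN + GROUP BY aggregation, filtered with a subquery in HAVING:
# keep customers whose total spend exceeds the average single-order amount.
QUERY = """
SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
ORDER BY total DESC, c.name;
"""


def top_customers(conn: sqlite3.Connection) -> list[tuple]:
    """Run the analytical query and return (name, n_orders, total) rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    conn = setup_database()
    for name, n_orders, total in top_customers(conn):
        print(f"{name}: {n_orders} orders, {total:.2f} total")
    conn.close()
```

With this data the average order is 86.00, so Alice (200.00) and Bob (150.00) pass the filter while Chen (80.00) does not.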
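MapReduce splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The sketch below simulates those three phases for the classic word-count job in a single Python process. It is not the Hadoop API (real Hadoop jobs are typically written in Java or run through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

On a real cluster, mappers and reducers run in parallel on different nodes and the framework performs the shuffle over the network. Spark's RDD API expresses the same job as flatMap, map, and reduceByKey.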
Data Pipelines and ETL Processes
• ETL Concepts: Extract, Transform, Load Processes.
• Building Data Pipelines: Tools (e.g., Apache Airflow, Luigi, Apache NiFi).
• Hands-On Activity: Creating a Basic ETL Pipeline Using a Selected Tool.
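A basic ETL pipeline reduces to three functions chained together. The sketch below uses only the Python standard library in place of a workflow tool, assuming a small CSV source (hypothetical data, held in a string so the example is self-contained) and an in-memory SQLite table as the target. In a tool such as Apache Airflow, each of extract, transform, and load would typically become its own task.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a source file or API export.
RAW_CSV = """order_id,customer,amount,currency
1, alice ,12.50,usd
2,BOB,not_a_number,usd
3,chen,7,USD
"""


def extract(source: str) -> list[dict[str, str]]:
    """Extract: read raw rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float, str]]:
    """Transform: normalize text, cast types, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a production pipeline would usually log or quarantine bad rows
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            round(amount, 2),
            row["currency"].strip().upper(),
        ))
    return clean


def load(rows: list[tuple[int, str, float, str]], conn: sqlite3.Connection) -> int:
    """Load: upsert cleaned rows into the target table; return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run_pipeline(source: str, conn: sqlite3.Connection) -> int:
    """Run extract -> transform -> load in order."""
    return load(transform(extract(source)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, conn))
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
    conn.close()
```

INSERT OR REPLACE keyed on order_id makes reruns idempotent: loading the same batch twice leaves the table unchanged, a property worth keeping when a scheduler retries failed runs.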
Real-Time Data Processing
• Introduction to Kafka: Core Concepts, Producers, Consumers.
• Streaming Data Processing: Using Tools Like Apache Flink or Spark Streaming.
• Hands-On Activity: Setting Up a Kafka Producer and Consumer, Processing Real-Time Data.
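The producer/consumer pattern and windowed stream processing can be illustrated without a running Kafka broker. In the sketch below, a Python queue.Queue stands in for a Kafka topic (an assumption: it has no partitions, offsets, or persistence), one thread acts as the producer, and another acts as a consumer that counts page views per 10-second tumbling window by event time. The event data and function names are hypothetical. With a real broker you would use a Kafka client library, and engines such as Apache Flink or Spark Structured Streaming provide windowing built in.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # marks end of stream here; a real Kafka topic is unbounded

# Hypothetical click events: (event_time_seconds, page)
EVENTS = [(0, "home"), (3, "cart"), (7, "home"), (12, "home"), (15, "checkout"), (21, "home")]


def producer(topic: queue.Queue, events) -> None:
    """Publish each event to the stand-in topic in order, then signal end of stream."""
    for event in events:
        topic.put(event)
    topic.put(SENTINEL)


def consumer(topic: queue.Queue, window_size: int, results: dict) -> None:
    """Consume events and count pages per tumbling (non-overlapping) event-time window."""
    while True:
        event = topic.get()
        if event is SENTINEL:
            break
        ts, page = event
        window_start = ts - ts % window_size  # e.g. ts=15, size=10 -> window 10
        results.setdefault(window_start, Counter())[page] += 1


def run_stream(events, window_size: int = 10) -> dict[int, Counter]:
    """Run one producer thread and one consumer thread over the stand-in topic."""
    topic: queue.Queue = queue.Queue()
    results: dict[int, Counter] = {}
    t_consumer = threading.Thread(target=consumer, args=(topic, window_size, results))
    t_producer = threading.Thread(target=producer, args=(topic, events))
    t_consumer.start()
    t_producer.start()
    t_producer.join()
    t_consumer.join()  # results is only read after the single consumer finishes
    return results


if __name__ == "__main__":
    for start, counts in sorted(run_stream(EVENTS).items()):
        print(f"[{start}, {start + 10}): {dict(counts)}")
```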
Data Visualization and Reporting
• Data Visualization Tools: Tableau, Power BI, or Open-Source Tools.
• Creating Dashboards: Best Practices for Effective Data Visualization.
• Hands-On Activity: Building a Simple Dashboard Using a Selected Tool.
Building a Data Pipeline as a Portfolio Project