Introduction to Data Engineering (5 ECTS credits)

Code: TT00CN68

Scope

5 ECTS credits

Learning outcomes

After completing the course the student is able to:
Understand and describe the data engineering process life cycle

Content

What is Data Engineering
Data Storage and Retrieval
Data Engineering Lifecycle
Extract, Transform and Load (ETL) process
Introduction to Big Data Frameworks

Enrollment period

01.06.2024 - 09.09.2024

Timing

02.09.2024 - 15.12.2024

Credits

5 ECTS credits

Mode of delivery

Contact teaching

Unit

Engineering and Business

Campus

Kupittaa Campus

Teaching languages
  • English
Seats

30 - 65

Degree programme
  • Degree Programme in Business Information Technology
Teacher
  • Golnaz Sahebi
Scheduling groups
  • Subgroup 1 (Size: 35. Open UAS: 0.)
  • Subgroup 2 (Size: 35. Open UAS: 0.)
Groups
  • PTIETS23deai
    Data Engineering and Artificial Intelligence
  • PTIVIS23I
    Data Engineering and Artificial Intelligence
Small groups
  • Subgroup 1
  • Subgroup 2

Objectives

After completing the course the student is able to:
Understand and describe the data engineering process life cycle

Content

What is Data Engineering
Data Storage and Retrieval
Data Engineering Lifecycle
Extract, Transform and Load (ETL) process
Introduction to Big Data Frameworks

Learning materials

- The learning materials, including slides and exercises, are prepared by the lecturer from various sources such as online courses, articles, books, and videos. The material is introduced during the lectures and is available via the learning environment, Itslearning (ITS).

- AWS Academy Data Engineering [91081] Course Materials

- Recommended books:
1. Paul Crickard III, Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python, Packt Publishing, 2020.
2. Joe Reis and Matt Housley, Fundamentals of Data Engineering: Plan and Build Robust Data Systems, O'Reilly, first edition, 2022.

Slides provided by the teacher can be found via Itslearning.

Teaching methods

- Participating in lectures (theory and practice)
- Learning through hands-on programming (classwork assignments)
- Completing homework assignments or AWS Academy Course
- Interacting with the teacher and classmates
- Enhancing knowledge through teamwork projects
- Following the flipped-classroom model (pre-session self-study of theoretical concepts followed by in-class practical application)

Exam dates and retake possibilities

There is no exam, and no retake is possible after the final grade has been published.

There is a final teamwork project where students must demonstrate their work during a presentation event in week 48.

Pedagogic approaches and sustainable development

- The course includes approximately 12 theory and practice sessions, where students engage with practical tasks.
- Additionally, there are 4 online Q&A sessions to provide extra support.
- Homework exercises will be assigned, with some parts demonstrated during contact sessions.
- Integration of Cloud-based data engineering through the AWS Academy course.
- A teamwork project will be introduced in the second month, requiring students to apply their teamwork skills and the knowledge gained from the course to implement their final project.
- A flipped-classroom model may be used for some lectures, where students study the theoretical content at home and focus on practical implementation and discussions during class.

Optional completion methods

Practical work and exercises are mainly carried out using VS Code, Jupyter Notebook, Apache Airflow, and AWS services.

Student workload

- Contact teaching:
• 12 weekly theory and practice sessions, each lasting 3 hours: 12 x 3 = 36 hours
• 4 online Q&A sessions, each lasting 1 hour: 4 x 1 = 4 hours
• Total contact teaching hours for the course: 40 hours

- Homework and teamwork assignment:
• Personal assignments (homework) and independent studies: 75 hours
• Teamwork assignment: 20 hours

Total: approximately 135 hours (5 x 27h)
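
The workload figures above can be checked with a short calculation. This is only a minimal sketch: the variable names are illustrative, and the 27 hours per credit is taken from the total line above.

```python
# Workload figures taken from the list above; the names are illustrative.
SESSIONS = 12          # weekly theory and practice sessions
SESSION_HOURS = 3
QA_SESSIONS = 4        # online Q&A sessions
QA_HOURS = 1
HOMEWORK_HOURS = 75    # personal assignments and independent studies
TEAMWORK_HOURS = 20

contact_hours = SESSIONS * SESSION_HOURS + QA_SESSIONS * QA_HOURS
total_hours = contact_hours + HOMEWORK_HOURS + TEAMWORK_HOURS

CREDITS = 5
HOURS_PER_CREDIT = 27  # as stated in the total line (5 x 27h)

print(contact_hours)                              # 40
print(total_hours)                                # 135
print(total_hours == CREDITS * HOURS_PER_CREDIT)  # True
```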

Content scheduling

Course Overview
This course provides an introduction to data engineering, combining theoretical concepts with practical applications. The course is divided into two main parts, each with a distinct focus:

- Part I: Theories and Practice
• Instructor-Led Sessions: Covering general topics in data engineering, taught and supervised by the instructor.

- Part II: Optional AWS Academy Self-Paced Course
• Self-Paced Learning: Students have the option to independently complete the AWS Academy Data Engineering course, gaining in-depth knowledge and earning a certification. This can replace the requirement to complete standard homework assignments.

Course Structure
Part I: Theories and Practice (Instructor Supervision & AWS Academy)
• Week 36: Course Overview and Introduction to AWS Academy Data Engineering
• Week 37: The Data Engineering Ecosystem & AWS Practice
• Week 38: ETL Processes & AWS Practice + Exercise Demo (I)
• Week 39: Introduction to Apache Airflow & AWS Integration
• Week 40: Data Engineering Life Cycle: Data Wrangling & ETL + AWS Practice
• Week 41: Data Wrangling and ETL in Apache Airflow + AWS Practice
• Week 42: Autumn Break
• Week 43: Data Governance and Compliance in Data Engineering + AWS Practice
• Week 44: Exercise Demo + AWS Practice
• Week 45: Continued AWS Course Study
• Weeks 46 & 47: Group Work on Final Projects (in-class) + AWS Practice
• Week 48: Final Project Presentations

Part II: Optional AWS Academy Data Engineering [91081]
- Self-Paced Modules: Students can choose to complete the full AWS Academy Data Engineering course, covering the following modules.
- Module Timeline:
• Week 36: Module 1 - Welcome to AWS Academy Data Engineering
• Week 37: Module 2 - Data-Driven Organizations
• Week 38: Module 3 - The Elements of Data
• Week 39: Module 4 - Design Principles and Patterns for Data Pipelines
• Week 40: Module 5 - Securing and Scaling the Data Pipeline
• Week 41: Module 6 - Ingesting and Preparing Data
• Week 42: Module 7 - Ingesting by Batch or by Stream
• Week 43: Module 8 - Storing and Organizing Data
• Week 44: Module 9 - Processing Big Data
• Week 45: Module 10 - Processing Data for ML
• Week 46: Module 11 - Analyzing and Visualizing Data
• Week 47: Module 12 - Automating the Pipeline

Student Responsibilities
1. Class Participation and Assignments:
• Active participation in all classes, including the completion of in-class assignments, which must be submitted during class hours.
2. Homework Assignments:
• Option A: Complete eight individual homework exercises, partially demonstrated during contact sessions.
• Option B: Complete the full AWS Academy Data Engineering course as a substitute for the homework assignments. To do this, students must follow the weekly schedule and upload their AWS Academy course certificate to the Itslearning platform.
3. Final Project:
• A group project (3-4 students) to be completed over Weeks 46 & 47, culminating in a presentation in Week 48.
________________________________________
Additional Notes
• Flexibility: The option to replace homework with the AWS Academy course allows students to tailor their learning experience to their interests and career goals.
• Integration of AWS: The inclusion of AWS Academy in both the core and optional parts of the course provides a strong foundation in cloud-based data engineering, which is highly relevant in today's industry.
• Project Work: The group project encourages collaboration and the practical application of the skills learned throughout the course.

Communication channel and further information

Qualifications:
Before taking an "Introduction to Data Engineering with Python" course, students typically need a foundational understanding of several key areas. The mandatory and recommended prerequisite courses and topics are listed below.

1. Mandatory Prerequisites:
1.1. Programming:
1.1.1. Introduction to Programming: Knowledge of programming fundamentals, including concepts like variables, loops, conditionals, and functions.
1.1.2. Python Programming: Familiarity with Python, including basic syntax, data types, control structures, functions, and modules.
1.1.3. Error Handling
1.1.4. Object-oriented programming (OOP)
1.1.5. Data Manipulation: Skills in using the Pandas library, including DataFrames and Series, and reading, writing, filtering, and transforming data.
1.2. Databases: Knowledge of how databases work, including concepts like tables, keys, normalization, and indexing.

2. Recommended Topics:
2.1. Algorithms and Data Structures: Basic understanding of algorithms and data structures such as arrays, lists, trees, and graphs, which are crucial for data processing.
2.2. Cloud Services: Fundamental knowledge of cloud services, or having passed the Cloud Services course at TUAS (Lecturer: Ali Khan).
2.3. Version Control Systems: Basic understanding of tools like Git for version control.
2.4. Basic Algebra and Calculus: Fundamental math skills to handle data transformations and calculations.
2.5. Statistics: Understanding of basic statistical concepts like mean, median, standard deviation, and probability distributions.

Evaluation scale

H-5

Assessment methods and criteria

1) The course is graded on a scale of 0-5.

2) Students can earn up to 100 points in this course, consisting of:
- Participation and classwork assignments: participating in each lecture and submitting the related classwork assignment during class hours gives 3 points per session => 12 x 3 = 36 points.
- Homework assignments: there are 6-8 homework assignments worth 4-6 points each, for a total of 36 points (for example, 6 x 6 = 36). Alternatively, completing the AWS Academy course labs and uploading the certificate to ITS gives 36 points.
- Teamwork assignment: 28 points
Note: the teamwork assignment will be graded on a scale of 0-5 on Itslearning.

Assignments must be returned by the deadline to receive full points. Assignments returned after the deadline receive only half of the points.
Demonstrating exercises during the contact session is mandatory; without a demonstration, you will lose 50% of your marks.

3) Evaluation:
To pass the course, you need at least 50% of the points in each component: 50% from participation and classwork (18 points) AND 50% from the homework assignments or the AWS Academy course (18 points) AND 50% from the teamwork project (14 points).

Note: Grades will be rounded down if they include decimals less than 0.5; otherwise, they will be rounded up. (e.g., 3.4 is rounded down to 3.0, but 3.5 or higher is rounded up to 4.0)
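
The 50% pass requirement and the rounding rule above can be expressed as a short Python sketch. The function names `passes` and `round_grade` are hypothetical and not part of the course materials; the maximum points per component are taken from the list above. Python's built-in `round` rounds halves to the nearest even number, so the sketch uses `math.floor(x + 0.5)` to get the half-up behaviour described in the note.

```python
import math

# Maximum points per component, as listed above.
MAX_POINTS = {"classwork": 36, "homework": 36, "teamwork": 28}

def passes(points: dict) -> bool:
    """Return True if at least 50% of the points is reached in every component."""
    return all(points[name] >= 0.5 * maximum
               for name, maximum in MAX_POINTS.items())

def round_grade(grade: float) -> int:
    """Round half up: 3.4 -> 3, 3.5 -> 4 (assumes a non-negative grade)."""
    return math.floor(grade + 0.5)

print(passes({"classwork": 20, "homework": 18, "teamwork": 14}))  # True
print(passes({"classwork": 30, "homework": 30, "teamwork": 13}))  # False
print(round_grade(3.4), round_grade(3.5))                         # 3 4
```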

Failed (0)

The student did NOT get at least 50% of the points in the teamwork assignment OR did not get at least 50% of the points in the homework assignments (or did not get the AWS Academy course certificate) OR did not get at least 50% of the points in participation and classwork submission.

Assessment criteria, satisfactory (1-2)

The student got 50-65% of the points for the homework assignments (or got the AWS Academy course certificate) AND got 50-65% of the points for the participation and classwork assignment submissions AND got a grade of 1-3 for the teamwork assignment.

Assessment criteria, good (3-4)

The student got 66-85% of the points for the homework assignments (or got the AWS Academy course certificate) AND got 66-85% of the points for the participation and classwork assignment submissions AND got a grade of 4 for the teamwork assignment.

Assessment criteria, excellent (5)

The student got at least 86% of the points for the homework assignments (or got the AWS Academy course certificate) AND got at least 86% of the points for the participation and classwork assignment submissions AND got a grade of 5 for the teamwork assignment.

Enrollment period

01.06.2024 - 09.09.2024

Timing

02.09.2024 - 15.12.2024

Credits

5 ECTS credits

Mode of delivery

Contact teaching

Unit

Engineering and Business

Campus

Kupittaa Campus

Teaching languages
  • Finnish
  • English
Seats

0 - 35

Degree programme
  • Degree Programme in Business Information Technology
Teacher
  • Golnaz Sahebi
Responsible teacher

Golnaz Sahebi

Groups
  • PTIVIS22H
    Health Technology

Objectives

After completing the course the student is able to:
Understand and describe the data engineering process life cycle

Content

What is Data Engineering
Data Storage and Retrieval
Data Engineering Lifecycle
Extract, Transform and Load (ETL) process
Introduction to Big Data Frameworks

Learning materials

- The learning materials, including slides and exercises, are prepared by the lecturer from various sources such as online courses, articles, books, and videos. The material is introduced during the lectures and is available via the learning environment, Itslearning (ITS).

- AWS Academy Data Engineering [91081] Course Materials

- Recommended books:
1. Paul Crickard III, Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python, Packt Publishing, 2020.
2. Joe Reis and Matt Housley, Fundamentals of Data Engineering: Plan and Build Robust Data Systems, O'Reilly, first edition, 2022.

Slides provided by the teacher can be found via Itslearning.

Teaching methods

- Participating in lectures (theory and practice)
- Learning through hands-on programming (classwork assignments)
- Completing homework assignments or AWS Academy Course
- Interacting with the teacher and classmates
- Enhancing knowledge through teamwork projects
- Following the flipped-classroom model (pre-session self-study of theoretical concepts followed by in-class practical application)

Exam dates and retake possibilities

There is no exam, and no retake is possible after the final grade has been published.

There is a final teamwork project where students must demonstrate their work during a presentation event in week 48.

Pedagogic approaches and sustainable development

- The course includes approximately 12 theory and practice sessions, where students engage with practical tasks.
- Additionally, there are 4 online Q&A sessions to provide extra support.
- Homework exercises will be assigned, with some parts demonstrated during contact sessions.
- Integration of Cloud-based data engineering through the AWS Academy course.
- A teamwork project will be introduced in the second month, requiring students to apply their teamwork skills and the knowledge gained from the course to implement their final project.
- A flipped-classroom model may be used for some lectures, where students study the theoretical content at home and focus on practical implementation and discussions during class.

Optional completion methods

Practical work and exercises are mainly carried out using VS Code, Jupyter Notebook, Apache Airflow, and AWS services.

Student workload

- Contact teaching:
• 12 weekly theory and practice sessions, each lasting 3 hours: 12 x 3 = 36 hours
• 4 online Q&A sessions, each lasting 1 hour: 4 x 1 = 4 hours
• Total contact teaching hours for the course: 40 hours

- Homework and teamwork assignment:
• Personal assignments (homework) and independent studies: 75 hours
• Teamwork assignment: 20 hours

Total: approximately 135 hours (5 x 27h)

Content scheduling

Course Overview
This course provides an introduction to data engineering, combining theoretical concepts with practical applications. The course is divided into two main parts, each with a distinct focus:

- Part I: Theories and Practice
• Instructor-Led Sessions: Covering general topics in data engineering, taught and supervised by the instructor.
• AWS Academy Modules: Select topics integrated into practice sessions, enhancing hands-on experience with Cloud-based data engineering.

- Part II: Optional AWS Academy Self-Paced Course
• Self-Paced Learning: Students have the option to independently complete the AWS Academy Data Engineering course, gaining in-depth knowledge and earning a certification. This can replace the requirement to complete standard homework assignments.

Course Structure
Part I: Theories and Practice (Instructor Supervision & AWS Academy)
• Week 36: Course Overview and Introduction to AWS Academy Data Engineering
• Week 37: The Data Engineering Ecosystem & AWS Practice
• Week 38: ETL Processes & AWS Practice + Exercise Demo (I)
• Week 39: Introduction to Apache Airflow & AWS Integration
• Week 40: Data Engineering Life Cycle: Data Wrangling & ETL + AWS Practice
• Week 41: Data Wrangling and ETL in Apache Airflow + AWS Practice
• Week 42: Autumn Break
• Week 43: Data Governance and Compliance in Data Engineering + AWS Practice
• Week 44: Exercise Demo + AWS Practice
• Week 45: Continued AWS Course Study
• Weeks 46 & 47: Group Work on Final Projects (in-class) + AWS Practice
• Week 48: Final Project Presentations

Part II: Optional AWS Academy Data Engineering [91081]
- Self-Paced Modules: Students can choose to complete the full AWS Academy Data Engineering course, covering the following modules.
- Module Timeline:
• Week 36: Module 1 - Welcome to AWS Academy Data Engineering
• Week 37: Module 2 - Data-Driven Organizations
• Week 38: Module 3 - The Elements of Data
• Week 39: Module 4 - Design Principles and Patterns for Data Pipelines
• Week 40: Module 5 - Securing and Scaling the Data Pipeline
• Week 41: Module 6 - Ingesting and Preparing Data
• Week 42: Module 7 - Ingesting by Batch or by Stream
• Week 43: Module 8 - Storing and Organizing Data
• Week 44: Module 9 - Processing Big Data
• Week 45: Module 10 - Processing Data for ML
• Week 46: Module 11 - Analyzing and Visualizing Data
• Week 47: Module 12 - Automating the Pipeline

Student Responsibilities
1. Class Participation and Assignments:
• Active participation in all classes, including the completion of in-class assignments, which must be submitted during class hours.
2. Homework Assignments:
• Option A: Complete eight individual homework exercises, partially demonstrated during contact sessions.
• Option B: Complete the full AWS Academy Data Engineering course as a substitute for the homework assignments. To do this, students must follow the weekly schedule and upload their AWS Academy course certificate to the Itslearning platform.
3. Final Project:
• A group project (3-4 students) to be completed over Weeks 46 & 47, culminating in a presentation in Week 48.
________________________________________
Additional Notes
• Flexibility: The option to replace homework with the AWS Academy course allows students to tailor their learning experience to their interests and career goals.
• Integration of AWS: The inclusion of AWS Academy in both the core and optional parts of the course provides a strong foundation in cloud-based data engineering, which is highly relevant in today's industry.
• Project Work: The group project encourages collaboration and the practical application of the skills learned throughout the course.

Communication channel and further information

Qualifications:
Before taking an "Introduction to Data Engineering with Python" course, students typically need a foundational understanding of several key areas. The mandatory and recommended prerequisite courses and topics are listed below.

1. Mandatory Prerequisites:
1.1. Programming:
1.1.1. Introduction to Programming: Knowledge of programming fundamentals, including concepts like variables, loops, conditionals, and functions.
1.1.2. Python Programming: Familiarity with Python, including basic syntax, data types, control structures, functions, and modules.
1.1.3. Error Handling
1.1.4. Object-oriented programming (OOP)
1.1.5. Data Manipulation: Skills in using the Pandas library, including DataFrames and Series, and reading, writing, filtering, and transforming data.
1.2. Databases: Knowledge of how databases work, including concepts like tables, keys, normalization, and indexing.

2. Recommended Topics:
2.1. Algorithms and Data Structures: Basic understanding of algorithms and data structures such as arrays, lists, trees, and graphs, which are crucial for data processing.
2.2. Cloud Services: Fundamental knowledge of cloud services, or having passed the Cloud Services course at TUAS (Lecturer: Ali Khan).
2.3. Version Control Systems: Basic understanding of tools like Git for version control.
2.4. Basic Algebra and Calculus: Fundamental math skills to handle data transformations and calculations.
2.5. Statistics: Understanding of basic statistical concepts like mean, median, standard deviation, and probability distributions.

Evaluation scale

H-5

Assessment methods and criteria

1) The course is graded on a scale of 0-5.

2) Students can earn up to 100 points in this course, consisting of:
- Participation and classwork assignments: participating in each lecture and submitting the related classwork assignment during class hours gives 1 + 2 = 3 points per session => 12 x 3 = 36 points.
- Homework assignments: there are 6-8 homework assignments worth 4-6 points each, for a total of 36 points (for example, 6 x 6 = 36). Alternatively, completing the AWS Academy course labs and uploading the certificate to ITS gives 36 points.
- Teamwork assignment: 28 points
Note: the teamwork assignment will be graded on a scale of 0-5 on Itslearning.

Assignments must be returned by the deadline to receive full points. Assignments returned after the deadline receive only half of the points.
Demonstrating exercises during the contact session is mandatory; without a demonstration, you will lose 50% of your marks.

3) Evaluation:
To pass the course, you need to achieve 50% of total points: 50% from participation and classwork = 18p AND 50% from homework assignments (or AWS Academy Course) = 18p AND 50% from the teamwork projects = 14p.

Note: Grades will be rounded down if they include decimals less than 0.5; otherwise, they will be rounded up. (e.g., 3.4 is rounded down to 3.0, but 3.5 or higher is rounded up to 4.0)

Failed (0)

The student did NOT get at least 50% of the points in the teamwork assignment OR did not get at least 50% of the points in the homework assignments / AWS Academy labs OR did not get at least 50% of the points in participation and classwork submission.

Assessment criteria, satisfactory (1-2)

The student got 50-65% of the points for the homework assignments / AWS Academy labs AND got 50-65% of the points for the participation and classwork assignment submissions AND got a grade of 1-3 for the teamwork assignment.

Assessment criteria, good (3-4)

The student got 66-85% of the points for the homework assignments / AWS Academy labs AND got 66-85% of the points for the participation and classwork assignment submissions AND got a grade of 4 for the teamwork assignment.

Assessment criteria, excellent (5)

The student got at least 86% of the points for the homework assignments / AWS Academy labs AND got at least 86% of the points for the participation and classwork assignment submissions AND got a grade of 5 for the teamwork assignment.

Enrollment period

01.06.2023 - 14.09.2023

Timing

04.09.2023 - 15.12.2023

Credits

5 ECTS credits

Mode of delivery

Contact teaching

Unit

Engineering and Business

Campus

Kupittaa Campus

Teaching languages
  • English
Seats

25 - 35

Degree programme
  • Degree Programme in Business Information Technology
Teacher
  • Golnaz Sahebi
Groups
  • PTIETS22deai
    PTIETS22 Data Engineering and Artificial Intelligence
  • PTIVIS22I
    Data Engineering and AI

Objectives

After completing the course the student is able to:
Understand and describe the data engineering process life cycle

Content

What is Data Engineering
Data Storage and Retrieval
Data Engineering Lifecycle
Extract, Transform and Load (ETL) process
Introduction to Big Data Frameworks

Learning materials

Material will be available via the learning environment (ITS).

Teaching methods

Weekly contact sessions of 3-4 hours for theory and practical exercises.
In addition, there are homework exercises.

Pedagogic approaches and sustainable development

The course includes approximately 11 theory sessions and guided exercise sessions in which students work on practical tasks.
In addition, there are homework exercises, parts of which are demonstrated during contact sessions.

Student workload

Contact hours
- 10 sessions of theory and practice, 3.5 hours each: 10 x 3.5 h = 35 hours
- Final projects and presentations: 25 hours

Homework: approximately 70 hours

Total: approximately 130 hours

Content scheduling

Course Topics and Scheduling (pre-planning):
Week 36: Course Overview and Introduction to Data Engineering
Week 37 - 38: The Data Engineering Ecosystem
Week 39: Big Data Platforms
Week 40: Exercise Demo (I)
Week 41: Apache Airflow
Week 43: Data Engineering Life Cycle - Data Wrangling
Week 44: Data Engineering Life Cycle - Data Wrangling and ETL in Airflow
Week 45: Data Engineering Life Cycle - Governance and Compliance
Week 46 and 47: Exercise demo and independent work on the final projects within the groups
Week 48: Final Project Presentations

Communication channel and further information

ITS.

Evaluation scale

H-5

Assessment methods and criteria

The course is graded on a scale of 0-5.
You can earn a maximum of 60 points from the practical classroom exercises and homework exercises, and a maximum of 40 points from the final project.
To pass the course, you need at least 30 points from the exercises and 20 points from the final project.

Failed (0)

Less than 50 points in exercises and project not passed (less than 45% points).
To pass the course, you need to achieve at least 30 points of the exercises and 20 points of the final project.

Assessment criteria, satisfactory (1-2)

50 - 69 points from the total points of the exercises and the final project

Assessment criteria, good (3-4)

70 - 89 points from the total points of the exercises and the final project

Assessment criteria, excellent (5)

90 - 100 points from the total points of the exercises and the final project
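
As a rough illustration of the point limits in this implementation, the following Python sketch maps exercise and project points to a grade band. The function name `grade_band` is hypothetical; the limits (at least 30 of 60 exercise points, at least 20 of 40 project points, and the band boundaries at 50, 70, and 90 points) are taken from the criteria above. How a band such as 1-2 is split into individual grades is not stated, so the sketch returns only the band.

```python
def grade_band(exercise_points: float, project_points: float) -> str:
    """Map points to a grade band; the boundaries follow the criteria above."""
    # Both minimums must be met to pass the course.
    if exercise_points < 30 or project_points < 20:
        return "failed (0)"
    total = exercise_points + project_points
    if total >= 90:
        return "excellent (5)"
    if total >= 70:
        return "good (3-4)"
    # Meeting both minimums guarantees a total of at least 50 points.
    return "satisfactory (1-2)"

print(grade_band(45, 30))  # good (3-4)
print(grade_band(55, 15))  # failed (0)
print(grade_band(58, 36))  # excellent (5)
```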