
Introduction to Data Engineering (5 cr)

Code: TT00CN68-3005

General information


Enrollment
15.05.2025 - 12.09.2025
Registration for the implementation has begun.
Timing
12.09.2025 - 21.12.2025
The implementation has not yet started.
Number of ECTS credits allocated
5 cr
Local portion
5 cr
Mode of delivery
Contact learning
Unit
Engineering and Business
Campus
Kupittaa Campus
Teaching languages
English
Seats
25 - 65
Teachers
Golnaz Sahebi
Tommi Tuomola
Groups
DEAI24A
Data Engineering and Artificial Intelligence
DEAI24B
Data Engineering and Artificial Intelligence
Course
TT00CN68

Evaluation scale

H-5

Content scheduling

Course Overview
This course provides an introduction to data engineering, combining theoretical concepts with practical applications. The course is divided into two main parts, each with a distinct focus:

- Part I: Theories and Practice
• Instructor-Led Sessions: Covering general topics in data engineering, taught and supervised by the instructors.
• Self-study tasks

- Part II: Optional AWS Academy Self-Paced Course
• Self-Paced Learning: Students have the option to independently complete the AWS Academy Data Engineering course, gaining in-depth knowledge and earning a certification. This can replace the requirement to complete standard homework assignments.


Student Responsibilities
1. Class Participation and Assignments:
• Active participation in all classes, including the completion of in-class assignments.
2. Homework Assignments (students choose one of the following options):
• Option A: Complete the individual homework exercises, partially demonstrated during contact sessions.
• Option B: Complete the full AWS Academy Data Engineering course as a substitute for the homework assignments. To do this, students must follow the weekly schedule and upload their AWS Academy course certificate to the Itslearning platform.
3. Final Project:
• A group project (3-4 students) to be completed over Weeks 47 and 48, culminating in a presentation in Week 49.
________________________________________
Additional Notes
• Flexibility: The option to replace homework with the AWS Academy course allows students to tailor their learning experience to their interests and career goals.

• Project Work: The group project encourages collaboration and the practical application of the skills learned throughout the course.

Objective

After completing the course the student is able to:
Understand and describe the data engineering process life cycle

Content

What is Data Engineering
Data Storage and Retrieval
Data Engineering Lifecycle
Extract, Transform and Load (ETL) process
Introduction to Big Data Frameworks

Materials

- The learning materials, including slides and exercises, are prepared by the lecturer from various sources such as online courses, articles, books, and videos. The material is introduced during the lectures and is available in the learning environment (Itslearning).

- AWS Academy Data Engineering [91081] Course Materials

Teaching methods

- Participating in lectures (theory and practice)
- Learning through hands-on programming (classwork assignments)
- Completing homework assignments or AWS Academy Course
- Interacting with the teachers and classmates
- Enhancing knowledge through teamwork projects

Exam schedules

No exam

There is a final teamwork project where students must demonstrate their work during a presentation event in week 49.

Pedagogic approaches and sustainable development

- The course includes approximately 12 theory and practice sessions, where students engage with practical tasks.
- Additionally, there are 4 online Q&A sessions to provide extra support.
- Homework exercises will be assigned, with some parts demonstrated during contact sessions.
- Integration of Cloud-based data engineering through the AWS Academy course.
- A teamwork project, requiring students to apply their teamwork skills and the knowledge gained from the course to implement their final project.

Completion alternatives

Practical work and exercises are mainly carried out using VS Code, Jupyter Notebook, Apache Airflow, and AWS services.

Student workload

- Contact teaching:
• 12 weekly theory and practice sessions of 3 hours each (36 hours)
• 4 online Q&A sessions of 1 hour each (4 hours)
• Total contact teaching: 40 hours

- Homework and teamwork assignment:
• Personal assignments (homework) and independent studies: 75 hours
• Teamwork assignment: 20 hours

Total: approximately 135 hours (5 x 27h)
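
The workload figures above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for this example, and the 27 hours per credit comes from the "5 x 27h" note in the total.

```python
# Contact teaching, as listed in the workload breakdown.
session_hours = 12 * 3   # 12 theory and practice sessions, 3 hours each
qa_hours = 4 * 1         # 4 online Q&A sessions, 1 hour each
contact_hours = session_hours + qa_hours

# Independent work, as listed in the workload breakdown.
homework_hours = 75
teamwork_hours = 20

total_hours = contact_hours + homework_hours + teamwork_hours

# 5 credits at 27 hours of work per credit (taken from the "5 x 27h" note).
expected_hours = 5 * 27

print(contact_hours, total_hours, expected_hours)
```

The contact hours and the independent work add up to the same total as the credit-based estimate.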

Evaluation methods and criteria

The course is graded on a scale from 0 to 5, based on the total points accumulated:

1. Lesson Participation (Approx. 20%)
- Full points: Attendance in more than 70% of lectures.
- Half points: Attendance in 50–70% of lectures.
- No points: Attendance in less than 50% of lectures.

2. Weekly Exercises (Approx. 60%)
- Includes classwork, homework, or AWS Academy Labs (as a substitute for homework):
- Full points: Submitted on time and the demonstration session attended.
- Half points: Submitted after the deadline or the demonstration session not attended.

Note: Demonstrating homework exercises during contact sessions is mandatory. Failure to do so results in a 50% deduction of the respective exercise's points.

3. Team Project (Approx. 20%)
- A final team-based project to be completed by the end of the course.

Passing Criteria
To pass the course, students must earn at least 50% of the possible points in each of the following components:
- Lesson participation
- Exercises
- Final project

The grade is determined by the total points the student collects during the course, including the final project:
1: 50-59% (minimum to pass the course)
2: 60-69%
3: 70-79%
4: 80-89%
5: 90-100%
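
As a rough illustration of how the rules above combine, the following Python sketch computes a grade from the three components. It is not the official grading tool. The function names are hypothetical, the weights are the approximate 20/60/20 split given above, and treating each band as a lower bound (for example, any total of at least 60 and below 70 gives grade 2) is an assumption about how fractional totals are handled.

```python
# Approximate component weights in percent, as stated in the evaluation criteria.
WEIGHTS = {"participation": 20, "exercises": 60, "project": 20}

# Lower bound of each grade band in percent, highest grade first.
# Assumption: a total counts toward a band once it reaches the lower bound.
GRADE_BANDS = [(90, 5), (80, 4), (70, 3), (60, 2), (50, 1)]


def participation_points(attendance_pct):
    """Map lecture attendance (0-100 %) to the share of participation points earned."""
    if attendance_pct > 70:
        return 100  # full points: more than 70% of lectures attended
    if attendance_pct >= 50:
        return 50   # half points: 50-70% of lectures attended
    return 0        # no points: less than 50% of lectures attended


def course_grade(participation, exercises, project):
    """Return the grade (0-5) from component scores given as percentages (0-100)."""
    scores = {
        "participation": participation,
        "exercises": exercises,
        "project": project,
    }
    # Passing requires at least 50% of the possible points in every component.
    if any(score < 50 for score in scores.values()):
        return 0
    # Weighted total in percent.
    total = sum(WEIGHTS[name] * score for name, score in scores.items()) / 100
    for lower_bound, grade in GRADE_BANDS:
        if total >= lower_bound:
            return grade
    return 0


if __name__ == "__main__":
    attended = participation_points(85)  # 85% attendance earns full points
    print(course_grade(attended, exercises=80, project=90))
```

With 85% attendance, 80% of the exercise points, and 90% of the project points, the weighted total is 86%, which falls in the band for grade 4. A student with a high total but less than 50% in any single component would still receive 0 under the passing criteria.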

Failed (0)

Less than 50% of the total points of the assignments: the course is not passed.

Assessment criteria, satisfactory (1-2)

1: 50% - 59% from the total points of the assignments

2: 60% - 69% from the total points of the assignments

Assessment criteria, good (3-4)

3: 70% - 79% from the total points of the assignments

4: 80% - 89% from the total points of the assignments

Assessment criteria, excellent (5)

5: 90% - 100% from the total points of the assignments

Further information

Use of AI in assignments and final project: USE OF AI REPORTED.
AI can be used in the creation of outputs, but the student must clearly report its use. Failure to disclose the use of AI will be interpreted as fraud. The use of AI may affect the assessment.

-----------------------------------------------------------------------------------
Qualifications and Prerequisites:
Before taking an "Introduction to Data Engineering with Python" course, students typically need a foundational understanding of several key areas. Here are the mandatory and recommended prerequisite courses and topics.

1. Mandatory Prerequisites: 
1.1. Programming:
1.1.1. Introduction to Programming: Knowledge of programming fundamentals, including concepts like variables, loops, conditionals, and functions.
1.1.2. Python Programming: Familiarity with Python, including basic syntax, data types, control structures, functions, and modules
1.1.3. Error Handling
1.1.4. Object-oriented programming (OOP)
1.1.5. Data Manipulation: Skills in using the Pandas library, including DataFrames and Series, and reading, writing, filtering, and transforming data
1.2. Databases: Knowledge of how databases work, including concepts like tables, keys, normalization, and indexing.


2. Recommended Topics:
2.1. Algorithms and Data Structures: Basic understanding of algorithms and data structures such as arrays, lists, trees, and graphs, which are crucial for data processing
2.2. Cloud Services: Fundamental knowledge of cloud services, or completion of the Cloud Services course at TUAS (Lecturer: Ali Khan)
2.3. Version Control Systems: Basic understanding of tools like Git for version control.
2.4. Basic Algebra and Calculus: Fundamental math skills to handle data transformations and calculations.
2.5. Statistics: Understanding of basic statistical concepts like mean, median, standard deviation, and probability distributions.
2.6. Virtualization and Linux: Familiarity with VirtualBox and Ubuntu
