Hands-on Data Science

Next course starts on April 1st

A group of students collaborating on a data science project.
Introduction
Syllabus

Prerequisite: Statistics for Data Science
Roll up your sleeves and build production-grade data science projects. Deepen your understanding by working on real data science problems, practicing both coding and statistical problem solving. Showcase your new skills on GitHub to catch the eye of recruiters.
Work through three projects of varying complexity.

Session 1 — Foundations: Functions, Random Variables, and ML Motivation

1. Mathematical Foundations

  • What is a function? Domain, codomain, mappings

  • Deterministic vs stochastic functions

2. Random Variables & Distributions

  • Discrete vs continuous RVs

  • PMF, PDF, CDF

  • Joint distributions, independence

3. Moments & Their Role in ML

  • Expectation

  • Variance

  • Covariance, correlation
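
As a taste of how these moments are estimated from data, here is a minimal sketch in plain Python. The `hours` and `scores` lists are made-up illustration data, and the helper names are not from any course material.

```python
import math

def mean(xs):
    """Sample mean: the empirical counterpart of the expectation."""
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance with the n - 1 (Bessel) correction."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance(xs):
    """Sample variance is the covariance of a variable with itself."""
    return covariance(xs, xs)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to lie in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

hours = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical study hours
scores = [52.0, 55.0, 61.0, 64.0, 70.0]  # hypothetical exam scores

print("mean hours:", mean(hours))
print("variance of hours:", variance(hours))
print("covariance:", covariance(hours, scores))
print("correlation:", correlation(hours, scores))
```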

4. Connecting Probability to ML

  • Bias–variance decomposition

  • Underfitting vs. overfitting

  • Why randomness matters for generalization
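
For reference, the bias–variance decomposition for squared-error loss can be written as follows. This assumes the standard setup, which the outline does not spell out: data generated as y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and a fitted model \hat{f} whose randomness comes from the training sample.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Underfitting corresponds to a large bias term, overfitting to a large variance term.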

Session 4 — Advanced Algorithms: Trees, Ensembles, and Modern ML

1. Decision Trees

  • Splitting criteria (Gini, entropy)

  • Depth, pruning, overfitting
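
The two splitting criteria named above can be sketched in a few lines of plain Python. The function names are illustrative only, and the sketch assumes non-empty label lists.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Share of each class among the labels in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def weighted_impurity(left, right, impurity):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

node = ["a", "a", "b", "b"]
print(gini(node), entropy(node))
# A split that separates the classes perfectly has zero impurity.
print(weighted_impurity(["a", "a"], ["b", "b"], gini))
```

A tree-growing algorithm would evaluate `weighted_impurity` for each candidate split and keep the one with the lowest value.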

2. Bagging & Random Forests

  • Bootstrapping

  • Aggregation

  • Feature randomness

  • Out-of-bag evaluation
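
To make bootstrapping and out-of-bag evaluation concrete, the sketch below draws one bootstrap sample of row indices and counts how many rows were never drawn. The sample size and seed are arbitrary choices for illustration. In theory the out-of-bag share approaches 1/e, roughly 37%, as the sample grows.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

sample = bootstrap_indices(n, rng)
out_of_bag = set(range(n)) - set(sample)  # rows never drawn
oob_fraction = len(out_of_bag) / n

print("out-of-bag share:", round(oob_fraction, 3))
```

In a random forest, each tree would be trained on its own `sample` and scored on its own `out_of_bag` rows, which gives a validation estimate without a separate hold-out set.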

3. Boosting

  • AdaBoost intuition

  • Gradient boosting framework

  • High-level intuition of XGBoost

4. Comparing the Major Algorithms

  • When linear models win

  • When kNN is appropriate

  • When trees and ensembles dominate

  • Interpretability vs predictive performance

5. Practical Modeling Pipeline Wrap-Up

  • Choosing models based on data

  • Feature engineering considerations

  • Real-world ML workflow

Session 3 — Foundational Algorithms: Regression & Simple ML Models

1. Supervised Learning Types

  • Regression vs classification

  • Continuous vs discrete outputs

2. Linear Models

  • Linear regression

  • Logistic regression

  • Interpretation, assumptions

  • When linear models work (and don’t)

3. Distance-Based Methods

  • k-Nearest Neighbors

  • Distance metrics

  • Strengths and limitations
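
A minimal k-nearest-neighbours classifier with a pluggable distance metric might look like the sketch below. The toy points and labels are invented for illustration, and a real project would more likely rely on a library implementation.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.8, 7.0), distance=manhattan))
```

Because every prediction sorts the full training set, this sketch also shows one limitation of kNN: prediction cost grows with the amount of training data.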

4. Basic Unsupervised (Minimal Coverage)

  • k-Means clustering

  • Use cases (segmentation, exploration)

  • Caveats (scaling, shape assumptions)

5. How These Models Fit Together

  • Parametric vs non-parametric

  • Low-variance vs high-variance models

Session 2 — Asymptotic Theory + ML Framework

1. Asymptotics for Learning

  • Convergence in probability and distribution

  • Law of Large Numbers, Central Limit Theorem
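
Both theorems can be checked with a short simulation. The sketch below assumes draws from a Uniform(0, 1) distribution, which has mean 0.5 and variance 1/12. The seed and sample sizes are arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1), whose true mean is 0.5."""
    return sum(rng.random() for _ in range(n)) / n

# Law of Large Numbers: the sample mean settles near 0.5 as n grows.
for size in (10, 1_000, 100_000):
    print(size, sample_mean(size))

# Central Limit Theorem: sample means of size n have a spread of
# about sigma / sqrt(n), with sigma = sqrt(1/12) for Uniform(0, 1).
n = 100
means = [sample_mean(n) for _ in range(5_000)]
observed_sd = statistics.stdev(means)
predicted_sd = (1 / 12) ** 0.5 / n ** 0.5

print("observed sd:", observed_sd)
print("predicted sd:", predicted_sd)
```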

2. Classical Inference

  • Hypothesis testing and confidence intervals

  • Linking statistical inference to ML validation

3. ML Workflow & Core Concepts

  • Data Generating Process (DGP)

  • Train/validation/test splits and cross-validation
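
One way to picture k-fold cross-validation is as a generator of index splits, sketched below in plain Python. The function name and defaults are illustrative and do not correspond to any particular library's API.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(10, k=5))
for train, validation in splits:
    print("train:", sorted(train), "validation:", sorted(validation))
```

Each observation lands in the validation set exactly once, so every row contributes to both training and evaluation across the k rounds.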

4. Loss Functions & Optimization

  • MSE, MAE, log-loss, hinge loss

  • Choosing the right loss

  • Gradient descent
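
As an illustration of how a loss and an optimizer fit together, the sketch below fits a straight line by gradient descent on the mean squared error. The data, learning rate, and step count are assumptions chosen so that this small example converges.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error, less sensitive to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_line(xs, ys, learning_rate=0.05, steps=2_000):
    """Fit y = w * x + b by batch gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw
        grad_b = 2 / n * sum(errors)                             # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1, no noise

w, b = fit_line(xs, ys)
predictions = [w * x + b for x in xs]
print("w:", w, "b:", b)
print("MSE:", mse(ys, predictions), "MAE:", mae(ys, predictions))
```

With a learning rate that is too large the updates would diverge instead of converging, which is why the step size is treated as a tuning parameter.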

5. Model Evaluation

  • Precision, recall, F1, accuracy and class imbalance
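
The sketch below shows why accuracy alone can mislead on imbalanced data. The labels are invented: 5 positives among 100 cases. Returning 0.0 when a denominator is zero is a convention assumed here, not a universal rule.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1] * 5 + [0] * 95  # hypothetical data: 5% positives

always_negative = [0] * 100  # never predicts the positive class
some_hits = [1, 1, 1, 0, 0] + [1] * 5 + [0] * 90  # 3 hits, 5 false alarms

print(classification_metrics(y_true, always_negative))
print(classification_metrics(y_true, some_hits))
```

The always-negative model scores 95% accuracy while finding none of the positives, which is exactly the case that precision, recall, and F1 are meant to expose.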

Schedule & Location

A 4-session, in-person course in the heart of Budapest. Classes are held on Sundays, usually from 9 am until 1 or 2 pm.

Pricing

Statistics for Data Science - 110.000 HUF