

Introduction
Give yourself a boost with this focused 4-session course. We'll cover everything you need to land your dream job, from functions and random variables to state-of-the-art methods like XGBoost.
Syllabus
Session 1 — Foundations: Functions, Random Variables, and ML Motivation
1. Mathematical Foundations
What is a function? Domain, codomain, mappings
Deterministic vs stochastic functions
2. Random Variables & Distributions
Discrete vs continuous RVs
PMF, PDF, CDF
Joint distributions, independence
3. Moments & Their Role in ML
Expectation
Variance
Covariance, correlation
4. Connecting Probability to ML
Bias–variance decomposition
Underfitting vs overfitting
Why randomness matters for generalization
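
A small taste of Session 1: the sketch below (a minimal illustration assuming NumPy is installed, not part of the course handouts) simulates a fair die, a simple discrete random variable, and checks that the sample moments approach the theoretical E[X] = 3.5 and Var(X) = 35/12.

import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=100_000)   # fair six-sided die: a discrete random variable

# Theory says E[X] = 3.5 and Var(X) = 35/12 ≈ 2.917; the sample moments should be close.
print("sample mean:    ", rolls.mean())
print("sample variance:", rolls.var())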
Session 2 — Asymptotic Theory and the ML Framework
1. Asymptotics for Learning
Convergence in probability and distribution
Law of Large Numbers, Central Limit Theorem
2. Classical Inference
Hypothesis testing and confidence intervals
Linking statistical inference to ML validation
3. ML Workflow & Core Concepts
Data Generating Process (DGP)
Train/validation/test splits and cross-validation
4. Loss Functions & Optimization
MSE, MAE, log-loss, hinge loss
Choosing the right loss
Gradient descent
5. Model Evaluation
Precision, recall, F1, accuracy, and class imbalance
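
A taste of Session 2's optimization material: a bare-bones gradient descent that fits a one-feature linear model by minimizing MSE. This is an illustrative sketch assuming NumPy; the data-generating process y = 2x + 1 + noise is made up for the demo.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)   # DGP: y = 2x + 1 + noise

w, b, lr = 0.0, 0.0, 0.1                              # start from zero, fixed step size
for _ in range(500):
    err = (w * x + b) - y                             # residuals under current parameters
    w -= lr * 2 * np.mean(err * x)                    # gradient of MSE w.r.t. w
    b -= lr * 2 * np.mean(err)                        # gradient of MSE w.r.t. b

print(w, b)                                           # should land near 2 and 1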
Session 3 — Foundational Algorithms: Regression & Simple ML Models
1. Supervised Learning Types
Regression vs classification
Continuous vs discrete outputs
2. Linear Models
Linear regression
Logistic regression
Interpretation, assumptions
When linear models work (and don’t)
3. Distance-Based Methods
k-Nearest Neighbors
Distance metrics
Strengths and limitations
4. Basic Unsupervised (Minimal Coverage)
k-Means clustering
Use cases (segmentation, exploration)
Caveats (scaling, shape assumptions)
5. How These Models Fit Together
Parametric vs non-parametric
Low-variance vs high-variance models
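
A taste of Session 3's parametric-vs-non-parametric theme: the sketch below (assuming scikit-learn is installed; the sine-shaped data is invented for illustration) fits a linear model and k-Nearest Neighbors to the same nonlinear data and compares test R^2.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)   # a nonlinear relationship

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Parametric model: one global line. Non-parametric model: local averaging.
print("linear R^2:", LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))
print("kNN R^2:   ", KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr).score(X_te, y_te))

On data like this, the local, non-parametric kNN typically scores higher, because a single global line underfits the curve.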
Session 4 — Advanced Algorithms: Trees, Ensembles, and Modern ML
1. Decision Trees
Splitting criteria (Gini, entropy)
Depth, pruning, overfitting
2. Bagging & Random Forests
Bootstrapping
Aggregation
Feature randomness
Out-of-bag evaluation
3. Boosting
AdaBoost intuition
Gradient boosting framework
High-level intuition of XGBoost
4. Comparing the Major Algorithms
When linear models win
When kNN is appropriate
When trees and ensembles dominate
Interpretability vs predictive performance
5. Practical Modeling Pipeline Wrap-Up
Choosing models based on data
Feature engineering considerations
Real-world ML workflow
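
A taste of Session 4: the sketch below (assuming scikit-learn; the synthetic dataset is generated just for the demo) compares cross-validated accuracy of a single decision tree, a bagged ensemble (random forest), and gradient boosting.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Five-fold cross-validated accuracy for a single tree and two ensembles of trees.
for name, model in [
    ("single tree      ", DecisionTreeClassifier(random_state=0)),
    ("random forest    ", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
]:
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))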
Schedule & Location
A 4-session, in-person course in the heart of Budapest. Classes are held on Sundays, usually from 9 am to 1 or 2 pm.
Pricing
Statistics for Data Science - 110,000 HUF
