© 2026 Bobby Jose

Machine Learning Fundamentals: From Core Concepts to Azure ML Pipelines

February 1, 2026 · 11 min read

AI, Machine Learning, Azure, Azure ML, MLOps, Interview Prep

Part 6 of the "From .NET to AI Engineer" series

Why This Post Exists

When I started my AI journey, I kept running into terms that everyone seemed to understand except me. "We'll use supervised learning with a gradient boosted classifier, optimize the hyperparameters, and deploy the model to an inference endpoint."

I'd nod along, then frantically Google everything afterward.

This post is what I wish I had when I started—a clear explanation of ML fundamentals, followed by how Azure Machine Learning puts these concepts into practice. Whether you're a .NET developer like me or coming from any other background, this should get everyone on the same page.

Part 1: Machine Learning Fundamentals

What Is Machine Learning, Really?

Traditional programming: You write explicit rules.

if (temperature > 30) return "Hot";
else if (temperature > 20) return "Warm";
else return "Cold";

Machine learning: The system learns rules from data.

Given: 10,000 examples of temperatures labeled "Hot", "Warm", "Cold"
Output: A model that can classify new temperatures

The key insight: ML finds patterns in data that would be impossible or impractical to code manually.
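
To make this concrete, here's a minimal sketch (using scikit-learn, with a toy dataset standing in for the 10,000 labeled examples) of a model learning the temperature rules instead of us hard-coding them:

```python
from sklearn.tree import DecisionTreeClassifier

# Labeled examples: temperature -> category
temps = [[35], [32], [31], [25], [22], [21], [15], [10], [5]]
labels = ["Hot", "Hot", "Hot", "Warm", "Warm", "Warm", "Cold", "Cold", "Cold"]

# The model infers the thresholds from the data
model = DecisionTreeClassifier().fit(temps, labels)

print(model.predict([[33], [24], [12]]))  # ['Hot' 'Warm' 'Cold']
```

Nobody wrote `if (temperature > 30)` here; the tree discovered equivalent split points on its own.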

The Three Types of Machine Learning

1. Supervised Learning

You have labeled data—inputs paired with correct outputs.

Classification: Predict a category

  • Is this email spam or not spam?
  • What type of animal is in this photo?
  • Will this customer churn?

Regression: Predict a number

  • What will the stock price be tomorrow?
  • How many units will we sell next month?
  • What's the estimated house price?

Training Data (Supervised):
| Square Feet | Bedrooms | Price (Label) |
|-------------|----------|---------------|
| 1500        | 3        | $300,000      |
| 2000        | 4        | $450,000      |
| 1200        | 2        | $250,000      |

Model learns: Price ≈ f(Square Feet, Bedrooms)
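
As a sketch of that learning step, scikit-learn can fit a linear model to the three rows above (a real model would of course need far more data):

```python
from sklearn.linear_model import LinearRegression

# Features (square feet, bedrooms) and labels (price) from the table above
X = [[1500, 3], [2000, 4], [1200, 2]]
y = [300_000, 450_000, 250_000]

model = LinearRegression().fit(X, y)

# Inference: estimate the price of an unseen 1,800 sq ft, 3-bedroom house
predicted_price = model.predict([[1800, 3]])[0]
print(f"${predicted_price:,.0f}")
```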

2. Unsupervised Learning

No labels—the algorithm finds structure on its own.

Clustering: Group similar items

  • Segment customers by behavior
  • Group documents by topic
  • Identify anomalies in network traffic

Dimensionality Reduction: Simplify data while preserving patterns

  • Compress features for visualization
  • Remove noise from data

Training Data (Unsupervised):
| Customer | Purchase Frequency | Avg Order Value | Recency |
|----------|-------------------|-----------------|---------|
| A        | High              | High            | Recent  |
| B        | Low               | Low             | Old     |
| C        | High              | Low             | Recent  |

Algorithm finds: 3 natural customer segments
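
Here's a minimal K-Means sketch over the table above, with the High/Low and Recent/Old values encoded as 1/0 and a couple of customers per behavior pattern:

```python
import numpy as np
from sklearn.cluster import KMeans

# [purchase_frequency, avg_order_value, recency]: 1 = High/Recent, 0 = Low/Old
customers = np.array([
    [1, 1, 1], [1, 1, 1],   # frequent, high-value, recent
    [0, 0, 0], [0, 0, 0],   # infrequent, low-value, lapsed
    [1, 0, 1], [1, 0, 1],   # frequent, low-value, recent
])

# No labels provided; the algorithm groups similar rows on its own
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)  # customers with the same pattern share a cluster id
```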

3. Reinforcement Learning

An agent learns by trial and error, receiving rewards or penalties.

  • Game-playing AI (chess, Go)
  • Robotics control
  • Recommendation systems

Essential ML Vocabulary

Features: The input variables (columns) your model uses to make predictions.

Features for house price: [square_feet, bedrooms, bathrooms, location]

Labels: The output variable you're trying to predict (supervised learning only).

Label: price

Training: The process of feeding data to an algorithm so it learns patterns.

Inference: Using a trained model to make predictions on new data.

Model: The mathematical representation of patterns learned from data. Think of it as a function that maps inputs to outputs.

Dataset Split:

  • Training set (70-80%): Data the model learns from
  • Validation set (10-15%): Data to tune hyperparameters
  • Test set (10-15%): Data to evaluate final performance (never seen during training)
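
In practice the three-way split is usually done with two calls to scikit-learn's `train_test_split` (shown here on a hypothetical 100-sample dataset):

```python
from sklearn.model_selection import train_test_split

data = list(range(100))  # a hypothetical dataset of 100 samples

# First carve off the held-out test set, then split the rest into train/validation
train_val, test = train_test_split(data, test_size=15, random_state=42)
train, val = train_test_split(train_val, test_size=15, random_state=42)

print(len(train), len(val), len(test))  # 70 15 15
```

Fixing `random_state` makes the split reproducible, so the test set stays "never seen" across reruns.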

Overfitting: When a model memorizes training data instead of learning general patterns. Performs great on training data, poorly on new data.

Underfitting: When a model is too simple to capture patterns. Performs poorly on everything.

Good fit:     Training accuracy: 92%  |  Test accuracy: 89%
Overfitting:  Training accuracy: 99%  |  Test accuracy: 65%
Underfitting: Training accuracy: 60%  |  Test accuracy: 58%

Hyperparameters: Settings you configure before training (learning rate, number of layers, etc.). Unlike model parameters, these aren't learned from data.

Epoch: One complete pass through the entire training dataset.

Batch Size: Number of samples processed before updating the model.
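
The two terms combine simply: the number of weight updates per epoch is the dataset size divided by the batch size, rounded up. A quick sketch with hypothetical numbers:

```python
import math

samples, batch_size, epochs = 10_000, 32, 5

updates_per_epoch = math.ceil(samples / batch_size)  # the last batch may be partial
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)  # 313 1565
```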

Common Algorithms (Brief Overview)

| Algorithm | Type | Use Case |
|-----------|------|----------|
| Linear Regression | Supervised | Predicting continuous values |
| Logistic Regression | Supervised | Binary classification |
| Decision Trees | Supervised | Classification/regression with interpretability |
| Random Forest | Supervised | Ensemble of decision trees, more robust |
| Gradient Boosting (XGBoost, LightGBM) | Supervised | High-performance tabular data |
| K-Means | Unsupervised | Clustering |
| Neural Networks | Supervised | Complex patterns, images, text |

Model Evaluation Metrics

For Classification:

  • Accuracy: % of correct predictions (misleading with imbalanced data)
  • Precision: Of all positive predictions, how many were correct?
  • Recall: Of all actual positives, how many did we find?
  • F1 Score: Harmonic mean of precision and recall

Spam Detection Example:
- Precision: 95% of emails we marked as spam were actually spam
- Recall: We caught 80% of all actual spam emails
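
With scikit-learn these metrics are one-liners. A toy spam example (1 = spam) in which a hypothetical model misses one spam email and wrongly flags one legitimate email:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # ground truth
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # one missed spam, one false alarm

print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # of flagged, share truly spam
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # of spam, share we caught
print(f"F1:        {f1_score(y_true, y_pred):.2f}")
```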

For Regression:

  • MAE (Mean Absolute Error): Average absolute difference
  • RMSE (Root Mean Square Error): Penalizes large errors more
  • R² (R-squared): How much variance is explained by the model
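
The same library covers the regression metrics; here they are computed on hypothetical house-price predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [300_000, 450_000, 250_000, 400_000]  # actual prices
y_pred = [310_000, 430_000, 260_000, 390_000]  # model's estimates

mae = mean_absolute_error(y_true, y_pred)           # average miss in dollars
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # larger misses weigh more
r2 = r2_score(y_true, y_pred)                       # share of variance explained

print(f"MAE: {mae:,.0f}  RMSE: {rmse:,.0f}  R²: {r2:.3f}")
```

Note that RMSE exceeds MAE here because squaring amplifies the single $20,000 miss relative to the $10,000 ones.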

Part 2: Azure Machine Learning

Now that we have the vocabulary, let's see how Azure Machine Learning operationalizes these concepts.

What Is Azure Machine Learning?

Azure ML is a cloud platform for the entire ML lifecycle:

  • Data preparation
  • Model training (with compute management)
  • Model evaluation
  • Deployment to endpoints
  • Monitoring and retraining

Think of it as an IDE + infrastructure + MLOps platform for machine learning.

Core Components

┌─────────────────────────────────────────────────────────────┐
│                   Azure ML Workspace                         │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Datasets   │  │   Compute    │  │ Environments │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    Models    │  │  Pipelines   │  │  Endpoints   │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                   ML Studio (UI)                      │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Workspace: The top-level container for all Azure ML resources. Like a project folder.

Compute: Where your code runs.

  • Compute Instance: A VM for development (like a cloud laptop)
  • Compute Cluster: Auto-scaling cluster for training jobs
  • Kubernetes: For production inference
  • Serverless: Pay-per-job compute

Datasets: Versioned references to your data (in Blob Storage, Data Lake, etc.)

Environments: Reproducible Python environments (conda/pip dependencies)

Models: Trained model artifacts registered with versioning

Endpoints: Deployed models serving predictions

  • Managed Online Endpoints: Real-time inference
  • Batch Endpoints: Large-scale batch scoring

Azure ML Pipelines

Pipelines are the heart of production ML. They define repeatable workflows:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Ingest     │ -> │  Transform  │ -> │   Train     │ -> │  Evaluate   │
│  Data       │    │  Features   │    │   Model     │    │  Metrics    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                │
                                                                v
                                      ┌─────────────┐    ┌─────────────┐
                                      │   Deploy    │ <- │  Register   │
                                      │  to Endpoint│    │   Model     │
                                      └─────────────┘    └─────────────┘

Why Pipelines?

  • Reproducibility: Same inputs → same outputs
  • Automation: Schedule runs or trigger on data changes
  • Modularity: Reuse components across projects
  • Scalability: Each step can use different compute

Pipeline Example (Python SDK v2):

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component

# Load reusable components
prep_component = load_component(source="./components/prep_data.yml")
train_component = load_component(source="./components/train_model.yml")
evaluate_component = load_component(source="./components/evaluate.yml")

@pipeline(default_compute="cpu-cluster")
def ml_pipeline(raw_data: Input, test_split: float = 0.2):
    # Step 1: Prepare data
    prep_step = prep_component(
        input_data=raw_data,
        test_split=test_split
    )

    # Step 2: Train model
    train_step = train_component(
        training_data=prep_step.outputs.train_data,
        learning_rate=0.01,
        epochs=100
    )

    # Step 3: Evaluate
    eval_step = evaluate_component(
        model=train_step.outputs.model,
        test_data=prep_step.outputs.test_data
    )

    return {
        "model": train_step.outputs.model,
        "metrics": eval_step.outputs.metrics
    }

# Connect to the workspace, then create and submit the pipeline
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

pipeline_job = ml_pipeline(
    raw_data=Input(type="uri_file", path="azureml://datastores/blob/paths/data.csv")
)
ml_client.jobs.create_or_update(pipeline_job)

Real-World Use Cases

1. Predictive Maintenance

Problem: Predict when industrial equipment will fail.

Pipeline:

  1. Ingest sensor data from IoT Hub
  2. Feature engineering (rolling averages, anomaly scores)
  3. Train classification model (failure within 7 days: yes/no)
  4. Deploy to real-time endpoint
  5. Alert system subscribes to predictions

Azure Services: IoT Hub → Event Hubs → Azure ML → Logic Apps

2. Customer Churn Prediction

Problem: Identify customers likely to cancel.

Pipeline:

  1. Pull customer data from Azure SQL
  2. Create features (usage patterns, support tickets, billing history)
  3. Train gradient boosting model
  4. Batch scoring weekly
  5. Feed predictions to CRM

Azure Services: Azure SQL → Azure ML → Power BI/Dynamics

3. Document Classification

Problem: Auto-categorize incoming documents.

Pipeline:

  1. Documents land in Blob Storage
  2. Extract text with Azure AI Document Intelligence
  3. Generate embeddings with Azure OpenAI
  4. Train classifier on embeddings
  5. Real-time classification endpoint

Azure Services: Blob Storage → Document Intelligence → Azure OpenAI → Azure ML

4. Demand Forecasting

Problem: Predict product demand for inventory planning.

Pipeline:

  1. Historical sales data from data warehouse
  2. Join with external data (holidays, weather, promotions)
  3. Train a time series model (Prophet, ARIMA, or a neural network)
  4. Generate forecasts on schedule
  5. Push to planning systems

Azure Services: Synapse → Azure ML → Azure Data Factory → ERP

AutoML: When You're Not Sure Which Algorithm

Azure AutoML automatically tries multiple algorithms and hyperparameters:

from azure.ai.ml import automl

classification_job = automl.classification(
    compute="cpu-cluster",
    training_data=train_data,  # MLTable input containing features + target column
    target_column_name="churn",
    primary_metric="AUC_weighted",
    enable_model_explainability=True,
    # AutoML will try: LogisticRegression, LightGBM, XGBoost, etc.
)

returned_job = ml_client.jobs.create_or_update(classification_job)

AutoML handles:

  • Algorithm selection
  • Hyperparameter tuning
  • Feature engineering
  • Cross-validation
  • Model explainability

Responsible ML in Azure

Azure ML has built-in tools for responsible AI:

Fairlearn Integration: Detect and mitigate bias

from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score

# Analyze model fairness across demographic groups
metric_frame = MetricFrame(
    metrics={"accuracy": accuracy_score},
    y_true=y_test,
    y_pred=predictions,
    sensitive_features=sensitive_features
)

Model Explainability: Understand why models make predictions

  • Feature importance
  • SHAP values
  • Counterfactual explanations

Data Drift Monitoring: Detect when production data changes

  • Automatic alerts when input distributions shift
  • Trigger retraining pipelines

MLOps: Bringing It All Together

MLOps is DevOps for machine learning. Azure ML supports:

  • Version Control: Models, datasets, and environments are versioned
  • CI/CD: Integrate with Azure DevOps or GitHub Actions
  • Monitoring: Track model performance in production
  • A/B Testing: Compare model versions with traffic splitting

# Example: GitHub Actions for ML pipeline
name: Train and Deploy Model

on:
  push:
    paths:
      - 'src/training/**'
  schedule:
    - cron: '0 0 * * 0'  # Weekly retraining

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Run Training Pipeline
        run: |
          az ml job create --file pipeline.yml

Questions Worth Considering

These questions helped me understand ML and Azure ML at a deeper level:

When would you choose Azure ML over just writing Python scripts locally?

When you need: reproducibility across team members, scalable compute for large datasets, model versioning and lineage, production deployment with monitoring, and compliance/governance features. For quick experiments, local Jupyter is fine.

How do you decide between real-time and batch inference?

Real-time (online endpoints): User-facing predictions, latency matters, individual requests. Batch endpoints: Large-scale scoring, latency tolerance, cost optimization. Often you need both—real-time for user interactions, batch for periodic bulk processing.

What's the relationship between Azure ML and Azure AI Services?

Azure AI Services (Vision, Language, Speech) are pre-built models—you call an API. Azure ML is for custom models—you train on your data. Many solutions combine both: use AI Services for standard tasks, Azure ML for domain-specific models.

How do you handle the "cold start" problem with ML projects?

Start with rule-based systems or pre-built models. Collect data over time. When you have enough labeled examples, train custom models. Continuously improve with production feedback. Don't wait for perfect data—iterate.


Key Takeaways

  • Machine learning finds patterns in data that would be impractical to code manually
  • Supervised learning needs labeled data; unsupervised finds structure without labels
  • The ML workflow: prepare data → train model → evaluate → deploy → monitor
  • Azure ML provides the infrastructure for the entire ML lifecycle
  • Pipelines make ML reproducible, automated, and production-ready
  • AutoML is great for exploring algorithms when you're not sure where to start
  • MLOps brings DevOps practices to machine learning

Resources

Azure ML Documentation:

  • Azure Machine Learning Documentation
  • Azure ML Python SDK v2
  • MLOps with Azure ML

ML Fundamentals:

  • Google's ML Crash Course
  • Scikit-learn Documentation

This post is part of my journey from .NET developer to AI engineer. Understanding these fundamentals was essential before diving into the more advanced topics covered in earlier posts.
