How AI Agents Work¶
Deep dive into the mechanics of AI agents.
The Agent Architecture¶
┌─────────────────┐
│ INPUT │
│ (Your data) │
└────────┬────────┘
│
↓
┌─────────────────────────┐
│ PREPROCESSING │
│ • Clean data │
│ • Format data │
│ • Extract features │
└────────┬────────────────┘
│
↓
┌─────────────────────────┐
│ NEURAL NETWORK/MODEL │
│ • Process features │
│ • Learn patterns │
│ • Make predictions │
└────────┬────────────────┘
│
↓
┌─────────────────────────┐
│ POST-PROCESSING │
│ • Format output │
│ • Calculate confidence │
│ • Apply thresholds │
└────────┬────────────────┘
│
↓
┌──────────────────┐
│ OUTPUT │
│ (Prediction) │
└──────────────────┘
Understanding Neurons & Layers¶
A Single Neuron¶
A neuron:

1. Receives multiple inputs
2. Multiplies each by a weight
3. Adds them up + bias
4. Applies an activation function
inputs: [0.5, 0.8, 0.3]
weights: [0.2, 0.6, -0.3]
bias: 0.5
calculation:
(0.5 × 0.2) + (0.8 × 0.6) + (0.3 × -0.3) + 0.5
= 0.1 + 0.48 - 0.09 + 0.5
= 0.99
activation: relu(0.99) = 0.99
output: 0.99
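The worked example above can be sketched in Python:

```python
def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, then ReLU activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)  # ReLU

output = neuron([0.5, 0.8, 0.3], [0.2, 0.6, -0.3], bias=0.5)
print(round(output, 2))  # 0.99
```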
Neural Network Layers¶
Layer 1 Layer 2 Layer 3
(Input) (Hidden) (Output)
[●]──────────[●]──────────[●]
[●] [●] [●]
[●]──────────[●]──────────[●]
[●] [●]
[●]
- Input Layer: Features of your data
- Hidden Layers: Learn patterns
- Output Layer: Final predictions
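A layer is just many neurons applied to the same inputs. A minimal sketch with plain Python lists (the weights here are hypothetical, chosen for illustration):

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: each output neuron has its own weight row."""
    return [max(0.0, sum(x * w for x, w in zip(inputs, row)) + b)  # ReLU
            for row, b in zip(weights, biases)]

# 3 inputs -> 2 hidden neurons
hidden = dense_layer([0.5, 0.8, 0.3],
                     [[0.2, 0.6, -0.3], [0.1, -0.4, 0.5]],
                     [0.5, 0.0])
print(hidden)
```

Stacking several such layers, each feeding the next, gives the network diagrammed above.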
Training Process¶
Before Training¶
Random weights → Predictions are terrible
During Training¶
1. Make prediction
2. Compare to correct answer
3. Calculate error
4. Adjust weights slightly
5. Repeat thousands of times
After Training¶
Learned weights → Accurate predictions
Backpropagation (The Learning Engine)¶
The algorithm that updates weights:
1. Forward Pass
Input → Network → Prediction
2. Calculate Loss
loss = |prediction - actual|
3. Backward Pass
Start from output
Calculate gradient (direction to improve)
Update weights backwards through network
4. Repeat
Process another batch of data
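The four steps above, shrunk to a single weight with a squared-error loss (a toy sketch of gradient descent, not a full backpropagation implementation):

```python
def train_one_weight(x, target, w=0.0, lr=0.1, epochs=50):
    """Gradient descent on one weight, where prediction = w * x."""
    for _ in range(epochs):
        pred = w * x                     # 1. forward pass
        loss = (pred - target) ** 2      # 2. calculate loss
        grad = 2 * (pred - target) * x   # 3. gradient of loss w.r.t. w
        w -= lr * grad                   # 4. update weight, then repeat
    return w

w = train_one_weight(x=2.0, target=6.0)
print(round(w, 3))  # converges toward 3.0
```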
Activation Functions¶
Functions that add non-linearity:
ReLU (Rectified Linear Unit)¶
f(x) = max(0, x)
Graph:
| /
| /
| /
|__/
Use: Hidden layers (most common)
Sigmoid¶
f(x) = 1 / (1 + e^-x)
Graph:
| ___
| /
|__/
Use: Binary classification output
Softmax¶
Converts raw outputs (logits) into probabilities that sum to 1.0
Use: Multi-class classification output
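The three activations above, sketched in Python:

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Shift by the max for numerical stability, then normalize exponentials."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))          # 0.0 3.0
print(round(sigmoid(0.0), 2))         # 0.5
print(round(sum(softmax([2.0, 1.0, 0.1])), 6))  # 1.0
```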
Loss Functions¶
Measures how wrong predictions are:
Mean Squared Error (MSE)¶
For regression (predicting numbers)
Loss = average((predicted - actual)²)
Cross-Entropy¶
For classification (categorizing)
Loss = -sum(actual × log(predicted))
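Both formulas above, as small Python functions (assuming `actual` is a one-hot vector and `predicted` holds probabilities for the cross-entropy case):

```python
import math

def mse(predicted, actual):
    """Mean squared error, for regression."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def cross_entropy(predicted, actual):
    """Cross-entropy, for classification."""
    return -sum(a * math.log(p) for a, p in zip(actual, predicted) if a > 0)

print(mse([2.5, 0.0], [3.0, -0.5]))                        # 0.25
print(round(cross_entropy([0.95, 0.04, 0.01], [1, 0, 0]), 3))  # 0.051
```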
Optimization (Learning Algorithms)¶
Adjusts weights during training:
SGD (Stochastic Gradient Descent)¶
- Simple, reliable
- Slow convergence
- Good for large datasets
Adam (Adaptive Moment Estimation)¶
- Faster convergence
- Self-adjusting learning rate
- Recommended for most tasks
RMSprop¶
- Good for RNNs
- Adaptive learning rates
- Memory efficient
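All three optimizers build on the same core update; plain SGD is the simplest to write down (a sketch, with illustrative gradient values rather than real backprop output):

```python
def sgd_step(weights, grads, lr=0.01):
    """One SGD update: move each weight a small step against its gradient."""
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.2, 0.6, -0.3]
grads = [0.5, -1.0, 0.1]  # gradients from backprop (illustrative values)
print([round(w, 3) for w in sgd_step(weights, grads)])  # [0.195, 0.61, -0.301]
```

Adam and RMSprop keep running statistics of past gradients to scale this step per weight, which is why they typically converge faster.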
Overfitting & Underfitting¶
Underfitting¶
Model too simple for the data
Training Data Test Data
▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓
Model Fit: ╲────╱ (doesn't learn patterns)
╲╱
Solution: Bigger model, more training
Good Fit¶
Model learns patterns without memorizing
Training Data Test Data
▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓
Model Fit: ╱─────╲ (captures patterns)
╱ ╲
Status: ✅ Ready to deploy
Overfitting¶
Model memorizes training data
Training Data Test Data
▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓
Model Fit: ╱▓▓▓▓▓╲ (memorizes noise)
╱▓▓▓▓▓▓╲
Solution: Smaller model, more data, dropout
Preventing Overfitting¶
1. Regularization (L1/L2)¶
Penalizes large weights
regularization: l2
lambda: 0.001
2. Dropout¶
Randomly disables neurons during training
dropout: 0.3 # 30% of neurons disabled
3. Early Stopping¶
Stop when validation stops improving
early_stopping: true
patience: 5 # Stop after 5 non-improving epochs
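The `patience` logic above, sketched in Python over a hypothetical stream of validation losses:

```python
def early_stop_index(val_losses, patience=5):
    """Return the epoch to stop at: `patience` epochs with no new best loss."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1

losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60]
print(early_stop_index(losses, patience=5))  # 8
```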
4. More Data¶
The best regularizer:

- 2× data = better generalization
- More diverse data = more robust models
Inference (Using the Agent)¶
After training:
1. Load trained model & weights
2. Preprocess input (same as training)
3. Forward pass through network
4. Get output + confidence
5. Post-process results
6. Return prediction
Example¶
# Load model
agent = IOValence.load_model("trained_agent.pkl")
# Make prediction
result = agent.predict("Is this product good?")
# Output
{
"prediction": "positive",
"confidence": 0.95,
"scores": {
"positive": 0.95,
"negative": 0.04,
"neutral": 0.01
}
}
Performance Metrics Deep Dive¶
Confusion Matrix Components¶
Predicted Class
Positive Negative
Actual Pos TP(45) FN(5) → TPR = 45/50 = 0.90
Neg FP(3) TN(47) → TNR = 47/50 = 0.94
↓ ↓
PPV=0.94 NPV=0.90
- TP: Correctly identified positive
- FN: Missed positive (false negative)
- FP: Incorrectly identified as positive (false positive)
- TN: Correctly identified negative
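The counts from the matrix above plug straight into the standard metric formulas:

```python
def metrics(tp, fn, fp, tn):
    """Compute the standard classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called TPR / sensitivity
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

p, r, acc, f1 = metrics(tp=45, fn=5, fp=3, tn=47)
print(round(p, 2), round(r, 2), round(acc, 2))  # 0.94 0.9 0.92
```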
When to Use Different Metrics¶
| Scenario | Best Metric | Why |
|---|---|---|
| Balanced classes | Accuracy | Simple and fair |
| Imbalanced data | F1-Score | Balances precision & recall |
| Important to catch all | Recall | Don't miss positive cases |
| Important to avoid false alarms | Precision | Reduce false positives |
| Spam detection | Precision | False alarms worse |
| Disease detection | Recall | Missing cases worse |
Transfer Learning¶
Using pre-trained models:
Pre-trained Model Your Task
(ImageNet, BERT, etc) (Your data)
[Learned Features] [Adapt to your needs]
↓ ↓
[Fixed Layers] ───────────► [Your Layers]
↓
[Final Predictions]
Benefits:

- ✅ Faster training
- ✅ Better performance
- ✅ Less data needed
- ✅ Leverages existing knowledge
Batch Normalization¶
Normalizes layer inputs:
Before: After:
[100] [0.5]
[2] [-0.3]
[500] [1.2]
[1] [0.1]
Effect:
• Faster training
• Allows higher learning rates
• More stable training
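The transformation at the heart of batch norm: subtract the batch mean and divide by the batch standard deviation. A sketch that omits the learned scale and shift parameters a real batch-norm layer also applies:

```python
import math

def normalize(batch, eps=1e-5):
    """Zero-mean, unit-variance normalization over one batch of values."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

normed = normalize([100, 2, 500, 1])
print([round(x, 2) for x in normed])  # values now centered near 0
```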
Next Steps¶
- 🧠 Training Basics - Training deep dive
- 📊 Data Preparation - Prepare your data
- 🚀 Create Your First Agent - Build something!
Ready to apply this knowledge? Create an Agent →