Evaluate Agent Performance

Measure and improve your agent's accuracy.

Evaluation Metrics

Classification Metrics

iovalence evaluate --model model.pkl --test data/test.csv

Output:

=== EVALUATION METRICS ===
Accuracy:  0.94
Precision: 0.95
Recall:    0.92
F1-Score:  0.93

=== CONFUSION MATRIX ===
              Predicted
           Positive Negative
Actual Pos    92      8
       Neg     5     95

=== CLASSIFICATION REPORT ===
           Precision  Recall  F1-Score
Positive      0.95    0.92     0.93
Negative      0.92    0.95     0.93
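The same metrics can be computed directly in Python with scikit-learn, which the output above mirrors. Here `y_true` and `y_pred` are small hypothetical label arrays standing in for your test set:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical test-set labels (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc  = accuracy_score(y_true, y_pred)   # (TP + TN) / total
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec  = recall_score(y_true, y_pred)     # TP / (TP + FN)
f1   = f1_score(y_true, y_pred)         # harmonic mean of prec and rec

print(f"Accuracy: {acc:.2f}  Precision: {prec:.2f}  Recall: {rec:.2f}  F1: {f1:.2f}")
```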

Regression Metrics

MAE (Mean Absolute Error):     0.45
RMSE (Root Mean Sq Error):     0.62
R² Score:                      0.89
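The regression metrics can likewise be computed with scikit-learn; the target and prediction values below are placeholders, not output from iovalence:

```python
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical regression targets and model predictions.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

mae  = mean_absolute_error(y_true, y_pred)   # mean of |error|
rmse = sqrt(mean_squared_error(y_true, y_pred))  # root of mean squared error
r2   = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(f"MAE:  {mae:.2f}")   # 0.50
print(f"RMSE: {rmse:.2f}")
print(f"R²:   {r2:.2f}")
```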

Understanding Metrics

Accuracy

Percentage of correct predictions overall.

Good: > 85%
Excellent: > 95%

Precision

Of predicted positives, how many correct?

High precision = Few false positives
Use when: False alarms are costly

Recall

Of actual positives, how many caught?

High recall = Few false negatives
Use when: Missing cases is costly

F1-Score

Balance between precision and recall.

Harmonic mean of precision and recall
Range: 0-1 (higher is better)
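The harmonic-mean definition is easy to verify by hand. Plugging in the positive-class precision and recall from the example report above:

```python
# Positive-class values from the example classification report.
precision = 0.95
recall = 0.92

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.93, matching the report
```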

Confusion Matrix

Shows true positives, false positives, etc.

              Predicted
             YES |  NO
Actual YES   TP  |  FN
       NO    FP  |  TN

TP: Correct positive prediction
FN: Missed positive (false negative)
FP: Incorrect positive (false positive)
TN: Correct negative prediction
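All of the classification metrics above fall out of these four counts. Plugging in the example confusion matrix from earlier:

```python
# Counts taken from the example confusion matrix above.
TP, FN = 92, 8
FP, TN = 5, 95

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # correct / total
precision = TP / (TP + FP)                   # of predicted positives, how many correct
recall    = TP / (TP + FN)                   # of actual positives, how many caught

print(f"Accuracy:  {accuracy:.3f}")   # 0.935
print(f"Precision: {precision:.2f}")  # 0.95
print(f"Recall:    {recall:.2f}")     # 0.92
```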

Improving Performance

If Accuracy is Low

  1. Get more data
     - More examples = better learning
     - Doubling your data often yields a noticeable accuracy gain

  2. Improve data quality
     - Check that labels are correct
     - Remove outliers
     - Balance classes

  3. Adjust training
     - More epochs
     - Lower learning rate
     - Larger model

If Model Overfits

Training accuracy: 95%
Test accuracy: 70% ❌ (overfitting)

Solutions:
- Add more training data
- Increase dropout
- Reduce model size
- Early stopping

If Model Underfits

Training accuracy: 75%
Test accuracy: 73% (both low)

Solutions:
- More epochs
- Larger model
- Higher learning rate
- Check data quality
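The two diagnoses boil down to comparing train and test accuracy. A rough sketch (the threshold values here are illustrative assumptions, not part of iovalence):

```python
def diagnose(train_acc, test_acc, gap=0.10, floor=0.80):
    """Crude heuristic: a large train/test gap suggests overfitting;
    both accuracies low suggests underfitting."""
    if train_acc - test_acc > gap:
        return "overfitting"
    if train_acc < floor and test_acc < floor:
        return "underfitting"
    return "ok"

print(diagnose(0.95, 0.70))  # overfitting
print(diagnose(0.75, 0.73))  # underfitting
```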

Visualization

Plot Confusion Matrix

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Compute the matrix from true and predicted labels.
cm = confusion_matrix(y_true, y_pred)

# annot=True writes the count in each cell; fmt='d' keeps counts as integers.
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

Plot ROC Curve

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# y_proba holds predicted probabilities for the positive class.
fpr, tpr, _ = roc_curve(y_true, y_proba)
plt.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.2f}')
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()  # needed to display the AUC label
plt.show()

Cross-Validation

For robust evaluation:

iovalence evaluate --model model.pkl \
  --test data/test.csv \
  --cross-validation 5

Splits the data into 5 folds, training on 4 and evaluating on the held-out fold each time. Averaging across the 5 runs gives a more reliable estimate than a single train/test split.
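Outside the CLI, the same idea can be sketched with scikit-learn's `cross_val_score`; the synthetic dataset and logistic-regression model below are stand-ins for your own:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for your training set.
X, y = make_classification(n_samples=200, random_state=0)

# cv=5 runs 5-fold cross-validation and returns one accuracy per fold.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```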

Comparison

Compare multiple models:

iovalence compare \
  --models model1.pkl model2.pkl model3.pkl \
  --test data/test.csv

Export Results

Save evaluation report:

iovalence evaluate --model model.pkl \
  --test data/test.csv \
  --output report.json

Next Steps


Metrics Deep Dive →