Evaluate Agent Performance

Measure and improve your agent's accuracy.

Evaluation Metrics

Classification Metrics

iovalence evaluate --model model.pkl --test data/test.csv

Output:

=== EVALUATION METRICS ===
Accuracy:  0.94
Precision: 0.95
Recall:    0.92
F1-Score:  0.93

=== CONFUSION MATRIX ===
              Predicted
           Positive Negative
Actual Pos    92      8
       Neg     5     95

=== CLASSIFICATION REPORT ===
           Precision  Recall  F1-Score
Positive      0.95    0.92     0.93
Negative      0.92    0.95     0.93
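The same metrics can be computed directly in Python with scikit-learn, which the output above mirrors. Here `y_true` and `y_pred` are small hypothetical label arrays standing in for your test set:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical test-set labels (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc  = accuracy_score(y_true, y_pred)   # (TP + TN) / total
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec  = recall_score(y_true, y_pred)     # TP / (TP + FN)
f1   = f1_score(y_true, y_pred)         # harmonic mean of prec and rec

print(f"Accuracy: {acc:.2f}  Precision: {prec:.2f}  Recall: {rec:.2f}  F1: {f1:.2f}")
```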

Regression Metrics

MAE (Mean Absolute Error):     0.45
RMSE (Root Mean Sq Error):     0.62
R² Score:                      0.89
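The regression metrics can likewise be computed with scikit-learn; the target and prediction values below are placeholders, not output from iovalence:

```python
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical regression targets and model predictions.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

mae  = mean_absolute_error(y_true, y_pred)   # mean of |error|
rmse = sqrt(mean_squared_error(y_true, y_pred))  # root of mean squared error
r2   = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(f"MAE:  {mae:.2f}")   # 0.50
print(f"RMSE: {rmse:.2f}")
print(f"R²:   {r2:.2f}")
```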

Understanding Metrics

Accuracy

Percentage of correct predictions overall.

Good: > 85%
Excellent: > 95%

Precision

Of predicted positives, how many correct?

High precision = Few false positives
Use when: False alarms are costly

Recall

Of actual positives, how many caught?

High recall = Few false negatives
Use when: Missing cases is costly

F1-Score

Balance between precision and recall.

Harmonic mean of precision and recall
Range: 0-1 (higher is better)
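The harmonic-mean definition is easy to verify by hand. Plugging in the positive-class precision and recall from the example report above:

```python
# Positive-class values from the example classification report.
precision = 0.95
recall = 0.92

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.93, matching the report
```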

Confusion Matrix

Shows true positives, false positives, etc.

              Predicted
             YES |  NO
Actual YES   TP  |  FN
       NO    FP  |  TN

TP: Correct positive prediction
FN: Missed positive (false negative)
FP: Incorrect positive (false positive)
TN: Correct negative prediction
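All of the classification metrics above fall out of these four counts. Plugging in the example confusion matrix from earlier:

```python
# Counts taken from the example confusion matrix above.
TP, FN = 92, 8
FP, TN = 5, 95

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # correct / total
precision = TP / (TP + FP)                   # of predicted positives, how many correct
recall    = TP / (TP + FN)                   # of actual positives, how many caught

print(f"Accuracy:  {accuracy:.3f}")   # 0.935
print(f"Precision: {precision:.2f}")  # 0.95
print(f"Recall:    {recall:.2f}")     # 0.92
```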

Improving Performance

If Accuracy is Low

  1. Get more data
     - More examples = better learning
     - Doubling your data often yields a noticeable accuracy gain

  2. Improve data quality
     - Check that labels are correct
     - Remove outliers
     - Balance classes

  3. Adjust training
     - More epochs
     - Lower learning rate
     - Larger model

If Model Overfits

Training accuracy: 95%
Test accuracy: 70% ❌ (overfitting)

Solutions:
- Add more training data
- Increase dropout
- Reduce model size
- Early stopping

If Model Underfits

Training accuracy: 75%
Test accuracy: 73% (both low)

Solutions:
- More epochs
- Larger model
- Higher learning rate
- Check data quality
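The two diagnoses boil down to comparing train and test accuracy. A rough sketch (the threshold values here are illustrative assumptions, not part of iovalence):

```python
def diagnose(train_acc, test_acc, gap=0.10, floor=0.80):
    """Crude heuristic: a large train/test gap suggests overfitting;
    both accuracies low suggests underfitting."""
    if train_acc - test_acc > gap:
        return "overfitting"
    if train_acc < floor and test_acc < floor:
        return "underfitting"
    return "ok"

print(diagnose(0.95, 0.70))  # overfitting
print(diagnose(0.75, 0.73))  # underfitting
```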

Visualization

Plot Confusion Matrix

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Compute the matrix from true and predicted labels.
cm = confusion_matrix(y_true, y_pred)

# annot=True writes the count in each cell; fmt='d' keeps counts as integers.
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

Plot ROC Curve

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# y_proba holds predicted probabilities for the positive class.
fpr, tpr, _ = roc_curve(y_true, y_proba)
plt.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.2f}')
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()  # needed to display the AUC label
plt.show()

Cross-Validation

For robust evaluation:

iovalence evaluate --model model.pkl \
  --test data/test.csv \
  --cross-validation 5

Splits the data into 5 folds, training on 4 and evaluating on the held-out fold each time. Averaging across the 5 runs gives a more reliable estimate than a single train/test split.
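Outside the CLI, the same idea can be sketched with scikit-learn's `cross_val_score`; the synthetic dataset and logistic-regression model below are stand-ins for your own:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for your training set.
X, y = make_classification(n_samples=200, random_state=0)

# cv=5 runs 5-fold cross-validation and returns one accuracy per fold.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f} ± {scores.std():.2f}")
```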

Comparison

Compare multiple models:

iovalence compare \
  --models model1.pkl model2.pkl model3.pkl \
  --test data/test.csv

Export Results

Save evaluation report:

iovalence evaluate --model model.pkl \
  --test data/test.csv \
  --output report.json

Next Steps


Metrics Deep Dive →