Evaluate Agent Performance¶
Measure and improve your agent's accuracy.
Evaluation Metrics¶
Classification Metrics¶
iovalence evaluate --model model.pkl --test data/test.csv
Output:
=== EVALUATION METRICS ===
Accuracy: 0.94
Precision: 0.95
Recall: 0.92
F1-Score: 0.93
=== CONFUSION MATRIX ===
                  Predicted
               Positive  Negative
Actual  Pos         92         8
        Neg          5        95
=== CLASSIFICATION REPORT ===
            Precision  Recall  F1-Score
Positive         0.95    0.92      0.93
Negative         0.92    0.95      0.93
Regression Metrics¶
MAE (Mean Absolute Error): 0.45
RMSE (Root Mean Squared Error): 0.62
R² Score: 0.89
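The regression metrics above can be reproduced with scikit-learn; this is a sketch where `y_true` and `y_pred` are placeholder arrays standing in for your model's output:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Placeholder values -- substitute your own targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors more
r2 = r2_score(y_true, y_pred)                       # 1.0 = perfect fit

print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}  R²: {r2:.2f}")
```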
Understanding Metrics¶
Accuracy¶
Percentage of correct predictions overall.
Good: > 85%
Excellent: > 95%
Precision¶
Of predicted positives, how many correct?
High precision = Few false positives
Use when: False alarms are costly
Recall¶
Of actual positives, how many caught?
High recall = Few false negatives
Use when: Missing cases is costly
F1-Score¶
Balance between precision and recall.
Harmonic mean of precision and recall
Range: 0-1 (higher is better)
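Unlike a plain average, the harmonic mean is dominated by the smaller of the two values, which is the point of F1: a model cannot score well by maximizing one metric while ignoring the other. A quick illustration in plain Python:

```python
def f1(precision, recall):
    # Harmonic mean: dominated by the smaller of the two values.
    return 2 * precision * recall / (precision + recall)

# Balanced model: F1 matches the plain average.
print(round(f1(0.90, 0.90), 2))  # 0.9

# Imbalanced model: the arithmetic mean would be 0.55, but F1 stays low.
print(round(f1(1.00, 0.10), 2))  # 0.18
```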
Confusion Matrix¶
Shows true positives, false positives, etc.
              Predicted
             YES  |  NO
            ------|------
Actual YES    TP  |  FN
       NO     FP  |  TN
TP: Correct positive prediction
FN: Missed positive (false negative)
FP: Incorrect positive (false positive)
TN: Correct negative prediction
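These four cells are all you need to derive the headline metrics. Plugging in the counts from the sample run above (TP=92, FN=8, FP=5, TN=95):

```python
# Cell counts taken from the sample confusion matrix above.
tp, fn, fp, tn = 92, 8, 5, 95

accuracy = (tp + tn) / (tp + fn + fp + tn)  # (92 + 95) / 200 = 0.935
precision = tp / (tp + fp)                  # of predicted positives, how many correct
recall = tp / (tp + fn)                     # of actual positives, how many caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.95 recall=0.92 f1=0.93
```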
Improving Performance¶
If Accuracy is Low¶
- Get more data
    - More examples = better learning
    - 2x data ≈ better accuracy
- Improve data quality
    - Check labels are correct
    - Remove outliers
- Balance classes
- Adjust training
    - More epochs
    - Lower learning rate
    - Larger model
If Model Overfits¶
Training accuracy: 95%
Test accuracy: 70% ❌ (overfitting)
Solutions:
- Add more training data
- Increase dropout
- Reduce model size
- Early stopping
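Early stopping, the last item above, can be sketched as a patience counter on validation loss: stop once the loss has not improved for a few epochs in a row. The loss values here are simulated stand-ins for a real training loop:

```python
# Simulated per-epoch validation losses -- stand-ins for real training.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.53, 0.54]

patience = 3
best_loss = float("inf")
epochs_without_improvement = 0
stopped_at = None

for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss = loss              # in real code, checkpoint the model here
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch        # stop before the model memorizes the training set
            break

print(f"stopped at epoch {stopped_at}, best val loss {best_loss}")
```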
If Model Underfits¶
Training accuracy: 75%
Test accuracy: 73% (both low)
Solutions:
- More epochs
- Larger model
- Higher learning rate
- Check data quality
Visualization¶
Plot Confusion Matrix¶
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Compute the matrix from true and predicted labels.
cm = confusion_matrix(y_true, y_pred)

# annot=True writes the count in each cell; fmt='d' keeps them as integers.
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
Plot ROC Curve¶
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# y_proba must be predicted probabilities for the positive class.
fpr, tpr, _ = roc_curve(y_true, y_proba)
plt.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.2f}')
plt.plot([0, 1], [0, 1], linestyle='--', label='Chance')  # random-guess baseline
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()  # needed for the AUC label to appear
plt.show()
Cross-Validation¶
For robust evaluation:
iovalence evaluate --model model.pkl \
--test data/test.csv \
--cross-validation 5
Trains and evaluates on 5 different data splits, giving a more reliable performance estimate than a single test set.
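If you want to cross-validate outside the CLI, scikit-learn's `cross_val_score` does the same thing; this sketch uses a toy dataset and model as stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy dataset and model -- substitute your own.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4/5 of the data, test on the held-out 1/5, five times.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```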
Comparison¶
Compare multiple models:
iovalence compare \
--models model1.pkl model2.pkl model3.pkl \
--test data/test.csv
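The same comparison can be done in Python by scoring each candidate on one shared held-out test set; the dataset and model choices below are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset and candidates -- substitute your own.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=5000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(random_state=0),
}

# Evaluate every candidate on the same held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.3f}")
```

Keeping the test split fixed across models is what makes the scores comparable.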
Export Results¶
Save evaluation report:
iovalence evaluate --model model.pkl \
--test data/test.csv \
--output report.json