Confidence Scoring
Learn how CargoLint's AI assigns confidence scores to extracted fields and how to use them.
Understanding Confidence Scores
Confidence scores measure how certain CargoLint’s AI is about extracted field values. Scores range from 0.0 (no confidence) to 1.0 (absolute confidence).
Each extracted field receives an individual confidence score based on:
- Document quality - Clarity, resolution, legibility
- Field location - Position and context in document
- Text patterns - Matching known formats and standards
- Field completeness - Availability of supporting data
Per-Field Scoring
Every extracted field includes a confidence score:
{
  "extracted_fields": {
    "invoice_number": {
      "value": "INV-2026-001",
      "confidence": 0.98
    },
    "invoice_date": {
      "value": "2026-03-02",
      "confidence": 0.94
    },
    "total_amount": {
      "value": 5250.00,
      "confidence": 0.87,
      "currency": "USD"
    },
    "buyer_phone": {
      "value": "+1-555-0123",
      "confidence": 0.62
    }
  }
}
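As a minimal sketch of how a client might consume this payload, the Python snippet below lists the fields whose confidence falls below a cutoff. The function name `low_confidence_fields` and the default cutoff of 0.70 are illustrative assumptions, not part of CargoLint's API; only the payload shape comes from the example above.

```python
import json

# Sample payload with the same shape and values as the example response.
RESPONSE = json.loads("""
{
  "extracted_fields": {
    "invoice_number": {"value": "INV-2026-001", "confidence": 0.98},
    "invoice_date": {"value": "2026-03-02", "confidence": 0.94},
    "total_amount": {"value": 5250.00, "confidence": 0.87, "currency": "USD"},
    "buyer_phone": {"value": "+1-555-0123", "confidence": 0.62}
  }
}
""")


def low_confidence_fields(response, cutoff=0.70):
    """Return (field_name, confidence) pairs below `cutoff`, lowest first.

    Hypothetical helper: the cutoff is an assumption chosen for illustration.
    """
    fields = response["extracted_fields"]
    flagged = [
        (name, field["confidence"])
        for name, field in fields.items()
        if field["confidence"] < cutoff
    ]
    return sorted(flagged, key=lambda pair: pair[1])


print(low_confidence_fields(RESPONSE))  # [('buyer_phone', 0.62)]
```

Raising the cutoff to 0.90 would also flag `total_amount`, which is one way to tune how many fields a reviewer is asked to check.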
Review Threshold
CargoLint uses a 70% (0.70) overall confidence threshold to determine whether a document requires human review. Documents scoring below this threshold are automatically routed to the review queue.
The overall confidence calculation deducts a penalty for each validation issue:
- Error-level issues (e.g., missing required fields, calculation mismatches): 0.20 penalty per issue
- Warning-level issues (e.g., format inconsistencies): 0.10 penalty per issue
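The penalty arithmetic can be sketched as follows. This is an illustration only: it assumes penalties are subtracted from a base score and that the result is clamped to the 0.0 to 1.0 range, neither of which is confirmed here. The names `overall_confidence` and `requires_review` are hypothetical.

```python
ERROR_PENALTY = 0.20     # per error-level issue
WARNING_PENALTY = 0.10   # per warning-level issue
REVIEW_THRESHOLD = 0.70  # documents below this go to the review queue


def overall_confidence(base_score, errors=0, warnings=0):
    """Subtract per-issue penalties from a base score.

    Assumption: the base score is the confidence before validation, and
    the result is clamped to [0.0, 1.0]. Rounded to two decimals to avoid
    floating-point noise.
    """
    penalty = errors * ERROR_PENALTY + warnings * WARNING_PENALTY
    return round(max(0.0, min(1.0, base_score - penalty)), 2)


def requires_review(score):
    """True when the score falls below the review threshold."""
    return score < REVIEW_THRESHOLD


score = overall_confidence(0.95, errors=1, warnings=1)
print(score, requires_review(score))  # 0.65 True
```

Under these assumptions, a document with a base score of 0.95 stays above the threshold with two warnings (0.75) but drops below it with one error and one warning (0.65).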
Confidence Thresholds by Document Type
Each document type has its own confidence thresholds, reflecting the varying complexity of extraction:
Commercial Invoice
- High confidence: > 0.85
- Review recommended: 0.65 - 0.85
- Low confidence: < 0.65
Packing List
- High confidence: > 0.80
- Review recommended: 0.60 - 0.80
- Low confidence: < 0.60
Bill of Lading
- High confidence: > 0.88
- Review recommended: 0.68 - 0.88
- Low confidence: < 0.68
Certificate of Origin
- High confidence: > 0.85
- Review recommended: 0.62 - 0.85
- Low confidence: < 0.62
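One way to apply these bands in client code is a lookup table keyed by document type. The sketch below copies the thresholds from the lists above; the dictionary keys, band labels, and the function name `confidence_band` are assumptions made for illustration.

```python
# (review_floor, high_floor) per document type, taken from the lists above.
# The keys are hypothetical identifiers, not confirmed API values.
THRESHOLDS = {
    "commercial_invoice": (0.65, 0.85),
    "packing_list": (0.60, 0.80),
    "bill_of_lading": (0.68, 0.88),
    "certificate_of_origin": (0.62, 0.85),
}


def confidence_band(doc_type, score):
    """Classify a score as 'high', 'review_recommended', or 'low'."""
    review_floor, high_floor = THRESHOLDS[doc_type]
    if score > high_floor:
        return "high"
    if score >= review_floor:
        return "review_recommended"
    return "low"


# The same score lands in different bands depending on document type.
print(confidence_band("packing_list", 0.86))    # high
print(confidence_band("bill_of_lading", 0.86))  # review_recommended
```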
How Scores Affect Routing
CargoLint automatically routes documents based on confidence scores:
Auto-Processed Workflow:
- Overall document confidence >= 0.70
- No critical validation errors
- Document marked as processed
Review Queue Workflow:
- Overall document confidence < 0.70
- Document flagged as requires_review
- Reviewer confirms or corrects each field
- Validated data saved
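The routing rule can be expressed as a small decision function. This is a sketch under assumptions: the function name `route_document` is hypothetical, and sending a document with critical validation errors to review even when its score is 0.70 or higher is an inference, since the lists above do not state what happens in that case.

```python
REVIEW_THRESHOLD = 0.70


def route_document(overall_confidence, has_critical_errors=False):
    """Return the status a document would receive.

    Assumption: a critical validation error forces review regardless of
    the score; the documented rules only cover the other cases explicitly.
    """
    if overall_confidence >= REVIEW_THRESHOLD and not has_critical_errors:
        return "processed"
    return "requires_review"


print(route_document(0.82))                            # processed
print(route_document(0.64))                            # requires_review
print(route_document(0.82, has_critical_errors=True))  # requires_review
```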
HS Code Confidence
HS code suggestions have their own confidence thresholds:
- >= 0.55 - Top suggestion is auto-selected
- >= 0.25 and < 0.55 - Suggestions are shown but require manual selection
- < 0.25 - No suggestions returned; manual classification needed
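A client that receives HS code suggestions might branch on the top score as sketched below. The input shape (a list of `(code, confidence)` tuples), the action labels, and the function name `handle_hs_suggestions` are all assumptions for illustration; the HS codes are sample data.

```python
AUTO_SELECT_MIN = 0.55  # top suggestion is auto-selected at or above this
SUGGEST_MIN = 0.25      # below this, no suggestions are returned


def handle_hs_suggestions(suggestions):
    """Decide what to do with a list of (hs_code, confidence) tuples.

    Hypothetical helper: returns a dict with an action label and the
    selected code, if any.
    """
    if not suggestions:
        return {"action": "manual_classification", "selected": None}
    top_code, top_conf = max(suggestions, key=lambda s: s[1])
    if top_conf >= AUTO_SELECT_MIN:
        return {"action": "auto_selected", "selected": top_code}
    if top_conf >= SUGGEST_MIN:
        return {"action": "manual_selection", "selected": None}
    return {"action": "manual_classification", "selected": None}


print(handle_hs_suggestions([("8471.30", 0.61), ("8471.41", 0.22)]))
# {'action': 'auto_selected', 'selected': '8471.30'}
```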
Improving Scores Over Time
User Corrections
When you correct an extracted value in the review queue, CargoLint logs the correction. These corrections help improve extraction quality for similar documents.
Interpreting Scores by Field Type
Typical confidence scores vary by field category:
Structured Fields (Numbers, Codes)
- Invoice numbers: Typically 0.90+
- Dates: Typically 0.92+
- HS codes: Typically 0.85+
- Currency amounts: Typically 0.88+
Lower confidence indicates: Handwritten data, unusual formatting, or poor document quality.
Alphanumeric Fields (Names, Addresses)
- Shipper/consignee names: Typically 0.80+
- Addresses: Typically 0.75+
Lower confidence indicates: Handwritten names, non-Latin characters, or abbreviated text.
Unstructured Fields (Descriptions, Notes)
- Line item descriptions: Typically 0.70+
- Marks and numbers: Typically 0.65+
Lower confidence indicates: Abbreviations, technical jargon, or non-standard formatting.
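These typical ranges can serve as per-field baselines, so that a score is judged against what is normal for its field type rather than against a single cutoff. The sketch below assumes hypothetical field-type keys and a helper named `below_typical`; only the numeric floors come from the lists above.

```python
# Typical lower bounds per field type, from the lists above.
# The keys are illustrative labels, not confirmed field names.
TYPICAL_FLOOR = {
    "invoice_number": 0.90,
    "date": 0.92,
    "hs_code": 0.85,
    "currency_amount": 0.88,
    "party_name": 0.80,
    "address": 0.75,
    "line_item_description": 0.70,
    "marks_and_numbers": 0.65,
}


def below_typical(field_type, confidence):
    """True when a score is under the typical floor for its field type."""
    return confidence < TYPICAL_FLOOR[field_type]


# 0.87 is unusually low for an amount, while 0.72 is normal for a description.
print(below_typical("currency_amount", 0.87))        # True
print(below_typical("line_item_description", 0.72))  # False
```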
Best Practices
- Start with conservative auto-import thresholds, then relax them as confidence improves
- Monitor low-confidence fields to identify document quality issues
- Use confidence scores to prioritize the manual review queue
- Track improvements over time as corrections feed back into extraction quality
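As a sketch of prioritizing a review queue by confidence, the snippet below sorts documents so that the least certain ones come first. The record shape (`id` and `overall_confidence` keys) and the function name `prioritize_queue` are assumptions for illustration, and the records are sample data.

```python
def prioritize_queue(documents):
    """Return documents ordered from lowest to highest overall confidence."""
    return sorted(documents, key=lambda doc: doc["overall_confidence"])


# Hypothetical review-queue entries.
queue = [
    {"id": "doc-a", "overall_confidence": 0.66},
    {"id": "doc-b", "overall_confidence": 0.41},
    {"id": "doc-c", "overall_confidence": 0.58},
]

print([doc["id"] for doc in prioritize_queue(queue)])
# ['doc-b', 'doc-c', 'doc-a']
```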