Confidence Scoring

Learn how CargoLint's AI assigns confidence scores to extracted fields and how to use them.

Understanding Confidence Scores

Confidence scores measure how certain CargoLint’s AI is about extracted field values. Scores range from 0.0 (no confidence) to 1.0 (absolute confidence).

Each extracted field receives an individual confidence score based on:

Document quality - Clarity, resolution, legibility
Field location - Position and context in document
Text patterns - Matching known formats and standards
Field completeness - Availability of supporting data

Per-Field Scoring

Every extracted field includes a confidence score:

{
  "extracted_fields": {
    "invoice_number": {
      "value": "INV-2026-001",
      "confidence": 0.98
    },
    "invoice_date": {
      "value": "2026-03-02",
      "confidence": 0.94
    },
    "total_amount": {
      "value": 5250.00,
      "confidence": 0.87,
      "currency": "USD"
    },
    "buyer_phone": {
      "value": "+1-555-0123",
      "confidence": 0.62
    }
  }
}
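As a minimal sketch of how a client might consume this structure, the Python snippet below loads the sample response shown above and lists the fields whose confidence falls below a chosen cutoff. The helper name `fields_below` and the cutoff values are illustrative assumptions, not part of CargoLint's API.

```python
import json

# Sample payload copied from the response shown above.
RESPONSE = json.loads("""
{
  "extracted_fields": {
    "invoice_number": {"value": "INV-2026-001", "confidence": 0.98},
    "invoice_date": {"value": "2026-03-02", "confidence": 0.94},
    "total_amount": {"value": 5250.00, "confidence": 0.87, "currency": "USD"},
    "buyer_phone": {"value": "+1-555-0123", "confidence": 0.62}
  }
}
""")


def fields_below(response: dict, cutoff: float) -> list[str]:
    """Return the names of extracted fields whose confidence is below cutoff.

    Hypothetical helper: the cutoff is chosen by the caller.
    """
    return [
        name
        for name, field in response["extracted_fields"].items()
        if field["confidence"] < cutoff
    ]


print(fields_below(RESPONSE, 0.70))  # ['buyer_phone']
```

With a cutoff of 0.70, only `buyer_phone` (0.62) would be flagged for a closer look.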

Review Threshold

CargoLint uses a 70% (0.70) overall confidence threshold to determine whether a document requires human review. Documents scoring below this threshold are automatically routed to the review queue.

The overall confidence calculation applies a penalty for each validation issue:

Error-level issues (e.g., missing required fields, calculation mismatches): 0.20 penalty per issue
Warning-level issues (e.g., format inconsistencies): 0.10 penalty per issue
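The penalty arithmetic can be sketched as follows. This is an illustration under stated assumptions: the documentation gives only the per-issue penalties, so the starting `base` score, the subtraction from it, and the clamp at 0.0 are assumptions rather than CargoLint's documented formula.

```python
ERROR_PENALTY = 0.20    # per error-level issue (from the docs)
WARNING_PENALTY = 0.10  # per warning-level issue (from the docs)


def penalized_confidence(base: float, errors: int, warnings: int) -> float:
    """Subtract per-issue penalties from a base score.

    Assumption: penalties are subtracted from some base score and the
    result never drops below 0.0. How the base is derived is not documented.
    """
    score = base - ERROR_PENALTY * errors - WARNING_PENALTY * warnings
    # Round to two decimals to avoid floating-point noise such as 0.6499999.
    return max(0.0, round(score, 2))


# One error and one warning on a 0.95 base: 0.95 - 0.20 - 0.10 = 0.65
print(penalized_confidence(0.95, errors=1, warnings=1))
```

Under these assumptions, a document that starts at 0.95 and has one error and one warning would end at 0.65, which is below the 0.70 review threshold.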

Confidence Thresholds by Document Type

Each document type has its own confidence thresholds, reflecting the varying complexity of extraction:

Commercial Invoice

High confidence: > 0.85
Review recommended: 0.65 - 0.85
Low confidence: < 0.65

Packing List

High confidence: > 0.80
Review recommended: 0.60 - 0.80
Low confidence: < 0.60

Bill of Lading

High confidence: > 0.88
Review recommended: 0.68 - 0.88
Low confidence: < 0.68

Certificate of Origin

High confidence: > 0.85
Review recommended: 0.62 - 0.85
Low confidence: < 0.62
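The bands above could be encoded as a lookup table. The sketch below assumes hypothetical document-type keys (for example `commercial_invoice`) and band labels. It treats both ends of the "review recommended" range as inclusive, which is how the listed ranges read.

```python
# (low_bound, high_bound) per document type, taken from the lists above.
# The dictionary keys are hypothetical identifiers chosen for this example.
BANDS = {
    "commercial_invoice": (0.65, 0.85),
    "packing_list": (0.60, 0.80),
    "bill_of_lading": (0.68, 0.88),
    "certificate_of_origin": (0.62, 0.85),
}


def confidence_band(doc_type: str, score: float) -> str:
    """Classify a score as 'high', 'review_recommended', or 'low'."""
    low, high = BANDS[doc_type]
    if score > high:
        return "high"
    if score >= low:
        return "review_recommended"
    return "low"


print(confidence_band("bill_of_lading", 0.86))  # review_recommended
print(confidence_band("packing_list", 0.86))    # high
```

The same score of 0.86 lands in different bands depending on the document type, which is the point of per-type thresholds.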

How Scores Affect Routing

CargoLint automatically routes documents based on confidence scores:

Auto-Processed Workflow:

Overall document confidence >= 0.70
No critical validation errors
Document marked as processed

Review Queue Workflow:

Overall document confidence < 0.70
Document flagged as requires_review
Reviewer confirms or corrects each field
Validated data saved
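The routing decision can be sketched as a single function. The status strings `processed` and `requires_review` follow the wording above. The function name is hypothetical, and sending a document that scores 0.70 or higher but has a critical validation error to review is an assumption, since the documentation does not spell out that case.

```python
REVIEW_THRESHOLD = 0.70  # overall confidence threshold from the docs


def route_document(overall_confidence: float, has_critical_errors: bool) -> str:
    """Return the status a document would receive under the rules above."""
    if overall_confidence >= REVIEW_THRESHOLD and not has_critical_errors:
        return "processed"
    # Assumption: anything that fails either condition goes to review.
    return "requires_review"


print(route_document(0.82, has_critical_errors=False))  # processed
print(route_document(0.65, has_critical_errors=False))  # requires_review
```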

HS Code Confidence

HS code suggestions have their own confidence thresholds:

>= 0.55 - Top suggestion is auto-selected
0.25 to < 0.55 - Suggestions are shown but require manual selection
< 0.25 - No suggestions returned; manual classification needed
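A client handling HS code suggestions might branch on these thresholds as in the sketch below. The function and action names are hypothetical. Scores between 0.54 and 0.55 are assumed to fall in the manual-selection tier.

```python
AUTO_SELECT_MIN = 0.55  # top suggestion is auto-selected at or above this
SUGGEST_MIN = 0.25      # suggestions are shown at or above this


def hs_code_action(top_confidence: float) -> str:
    """Map the top suggestion's confidence to the expected handling."""
    if top_confidence >= AUTO_SELECT_MIN:
        return "auto_select"
    if top_confidence >= SUGGEST_MIN:
        return "manual_select"
    return "manual_classification"


print(hs_code_action(0.61))  # auto_select
print(hs_code_action(0.40))  # manual_select
print(hs_code_action(0.10))  # manual_classification
```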

Improving Scores Over Time

User Corrections

When you correct an extracted value in the review queue, CargoLint logs the correction. These corrections help improve extraction quality for similar documents.

Interpreting Scores by Field Type

Typical confidence scores vary by field category:

Structured Fields (Numbers, Codes)

Invoice numbers: Typically 0.90+
Dates: Typically 0.92+
HS codes: Typically 0.85+
Currency amounts: Typically 0.88+

Lower confidence indicates: Handwritten data, unusual formatting, or poor document quality.

Alphanumeric Fields (Names, Addresses)

Shipper/consignee names: Typically 0.80+
Addresses: Typically 0.75+

Lower confidence indicates: Handwritten names, non-Latin characters, or abbreviated text.

Unstructured Fields (Descriptions, Notes)

Line item descriptions: Typically 0.70+
Marks and numbers: Typically 0.65+

Lower confidence indicates: Abbreviations, technical jargon, or non-standard formatting.
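One way to use these typical values is to flag fields that score below the usual floor for their type. The sketch below is an illustration only: the field-type keys are hypothetical, and treating the "typically X+" figures as floors is an assumption about how you might apply them, not a CargoLint rule.

```python
# Typical minimum scores per field type, taken from the lists above.
# The keys are hypothetical labels chosen for this example.
TYPICAL_FLOOR = {
    "invoice_number": 0.90,
    "date": 0.92,
    "hs_code": 0.85,
    "currency_amount": 0.88,
    "party_name": 0.80,
    "address": 0.75,
    "line_item_description": 0.70,
    "marks_and_numbers": 0.65,
}


def is_unusually_low(field_type: str, confidence: float) -> bool:
    """True when a score is below the typical floor for its field type."""
    return confidence < TYPICAL_FLOOR[field_type]


print(is_unusually_low("date", 0.88))     # True: dates are typically 0.92+
print(is_unusually_low("address", 0.88))  # False: addresses are typically 0.75+
```

A score of 0.88 is unremarkable for an address but would be worth a second look on a date field.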

Best Practices

Start with conservative auto-import thresholds and relax them as confidence improves
Monitor low-confidence fields to identify document quality issues
Use confidence scores to prioritize the manual review queue
Track improvements over time as the model learns from corrections
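As one possible way to prioritize a review queue, the sketch below sorts pending documents so the lowest overall confidence comes first. The record shape (`id`, `confidence`) and the sample values are hypothetical and used only for illustration.

```python
from operator import itemgetter

# Hypothetical queue entries; real records would come from your own system.
queue = [
    {"id": "doc-a", "confidence": 0.66},
    {"id": "doc-b", "confidence": 0.41},
    {"id": "doc-c", "confidence": 0.58},
]


def prioritize(documents: list[dict]) -> list[dict]:
    """Return documents ordered from lowest to highest confidence."""
    return sorted(documents, key=itemgetter("confidence"))


print([doc["id"] for doc in prioritize(queue)])  # ['doc-b', 'doc-c', 'doc-a']
```

Reviewing the least certain documents first is only one choice. Ordering by age or by business priority may suit some teams better.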