How Confidence Scoring Works

Understand how CargoLint's AI measures certainty about extracted data and what the scores mean.

Every field CargoLint extracts comes with a confidence score: a number between 0 and 1 that represents how certain the AI is about its answer. This page also writes scores as percentages, so 0.78 and 78% are the same value. Understanding these scores helps you know which extractions to trust and which need a closer look.

What the confidence scale means

0.9-1.0: Excellent confidence. The AI found clear, unambiguous data that matches expected patterns. These extractions rarely need review.
0.78-0.9: Good confidence. The data was found with reasonable certainty, but there are minor ambiguities or variations from the expected format. Most of these pass without review.
0.5-0.78: Moderate confidence. The data was found but with some uncertainty, possibly due to unclear text, unexpected layout, or multiple possible interpretations. These go to review.
0.0-0.5: Low confidence. The AI struggled to find or interpret the data. These always go to review.
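
As a rough illustration, the bands above can be written as a small lookup. This is a sketch, not CargoLint's implementation: the function name and the band labels are hypothetical, and because the listed ranges share their endpoints, it assumes a boundary value (0.9, 0.78, 0.5) belongs to the higher band.

```python
def confidence_band(score: float) -> str:
    """Map a 0-1 confidence score to one of the four bands listed above.

    Assumption: a score exactly on a boundary counts as the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= 0.9:
        return "excellent"
    if score >= 0.78:
        return "good"
    if score >= 0.5:
        return "moderate"
    return "low"
```

Under that assumption, confidence_band(0.78) falls in the good band and confidence_band(0.49) in the low band.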

What affects confidence scores

Document quality and clarity

Clean, high-resolution scans produce higher scores than blurry or low-contrast images.
Handwritten text typically scores lower than printed text because it’s harder for the AI to parse.
Poor lighting, shadows, or damage to the document reduces confidence.

Field location and format consistency

Fields in expected locations (e.g., invoice number near the top) score higher.
Consistent formatting (dates in the same format, amounts with the same currency symbol) increases confidence.
Unusual layouts or fields in unexpected places lower scores.

Text patterns and validation

The AI validates extracted data against patterns. For example, if an invoice total should equal the sum of line items, a mismatch reduces confidence.
Fields that match known patterns (valid dates, valid currencies, properly formatted addresses) score higher.
Incomplete or malformed data (missing ZIP codes, invalid dates) lowers confidence.
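
The invoice-total check described above might look something like the following sketch. The function name, the string inputs, and the one-cent rounding tolerance are all assumptions made for illustration; the actual validation rules are not documented here.

```python
from decimal import Decimal


def total_matches_line_items(
    total: str, line_amounts: list[str], tolerance: str = "0.01"
) -> bool:
    """Return True when the invoice total equals the sum of its line items.

    Decimal avoids binary floating-point error on currency amounts.
    The tolerance (hypothetical) absorbs rounding on individual lines.
    """
    expected = sum((Decimal(amount) for amount in line_amounts), Decimal("0"))
    return abs(Decimal(total) - expected) <= Decimal(tolerance)
```

A total of "150.00" against lines of "100.00" and "50.00" passes. The same total against "100.00" and "49.00" fails, which is the kind of mismatch that would reduce confidence.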

Character recognition

Clearly printed, machine-readable text scores higher.
Documents with unusual characters or symbols may have lower confidence on those sections.

How issues reduce the score

The overall score starts from the average of the per-field confidences. Each issue the validator finds on the document then reduces it in proportion to severity:

Error-level issues (a missing shipper or invoice total, an invalid Incoterm): reduce the score by 20%
Warning-level issues (a calculation mismatch, a missing address or Incoterm): reduce the score by 10%
Informational notes: shown for context, with no score impact

Two exceptions apply to this schedule:

An HS code issue reduces the score by just 2%, however many lines it affects. HS classification is a separate check that routes the document on its own (see below), so it barely moves the extraction-quality number.
A field the AI read with low confidence already pulled down the per-field average the score starts from, so it carries no further reduction on top.

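Reductions compound multiplicatively: each one applies to the score that remains after the previous reduction. Several issues therefore lower the score meaningfully without ever driving it to zero, so a document with many issues still ranks below one with few, which is useful when triaging a review queue.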
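
The schedule in this section can be sketched in a few lines. The function and constant names are hypothetical, and the sketch assumes the caller has already counted issues by severity and excluded the two exceptions: HS code issues are passed as a single flag, and low-confidence fields are not counted as issues at all.

```python
from statistics import mean

ERROR_FACTOR = 0.80    # each error-level issue reduces the score by 20%
WARNING_FACTOR = 0.90  # each warning-level issue reduces the score by 10%
HS_CODE_FACTOR = 0.98  # HS code issues reduce the score by 2%, applied once


def overall_score(
    field_confidences: list[float],
    errors: int = 0,
    warnings: int = 0,
    has_hs_issue: bool = False,
) -> float:
    """Start from the per-field average, then compound the reductions."""
    score = mean(field_confidences)
    score *= ERROR_FACTOR ** errors
    score *= WARNING_FACTOR ** warnings
    if has_hs_issue:
        # Applied once, however many line items are affected.
        score *= HS_CODE_FACTOR
    return score
```

For example, a document whose fields average 0.95 drops to 0.855 with one warning (0.95 × 0.9) and to roughly 0.77 with two (0.95 × 0.9 × 0.9).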

How the score routes documents

After extraction, every document lands in one of two statuses:

Pending Approval - the overall score is 78% or higher and nothing needs explicit confirmation. The document is ready for one-click approval.
Review - the document needs closer human attention before it can be approved.

The 78% threshold balances accuracy with speed, and a person gives the final sign-off on both paths before a document is marked complete.

The score is one input to that routing decision, and three conditions send a document to Review regardless of how high it scores:

HS codes awaiting confirmation. HS codes are never applied automatically - even a confident suggestion waits for a human to confirm it. Any line item without a confirmed code holds the document in Review.
Degraded scans. When the scan quality is too low for reliable reading, the document is held for a human check even if every field was extracted.
Shipment findings. Shipment consistency checks compare a document against the others in its shipment after extraction. A conflict (say, the invoice and packing list disagree on quantities) moves the document back to Review.
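
Put together, the routing rules read roughly like this sketch. The Document class and its field names are hypothetical stand-ins for whatever CargoLint tracks internally. Only the 78% threshold, the three hold conditions, and the two status names come from this page.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.78  # the 78% threshold, on the 0-1 scale


@dataclass
class Document:
    score: float
    unconfirmed_hs_codes: bool = False  # a line item lacks a confirmed HS code
    degraded_scan: bool = False         # scan quality too low to read reliably
    shipment_conflict: bool = False     # disagrees with other shipment documents


def hold_reasons(doc: Document) -> list[str]:
    """List the conditions that force Review regardless of the score."""
    reasons = []
    if doc.unconfirmed_hs_codes:
        reasons.append("HS codes awaiting confirmation")
    if doc.degraded_scan:
        reasons.append("Degraded scan")
    if doc.shipment_conflict:
        reasons.append("Shipment finding")
    return reasons


def route(doc: Document) -> str:
    """Return the status a document lands in after extraction."""
    if hold_reasons(doc) or doc.score < REVIEW_THRESHOLD:
        return "Review"
    return "Pending Approval"
```

A document scoring 0.92 with no holds routes to Pending Approval. The same score with an unconfirmed HS code routes to Review.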

Score color and document status measure different things

In the app, the score’s color follows the score alone: green at 85% or higher, amber from 78% to 84%, red below 78%. The status follows the routing rules above. That means the two can legitimately differ, and each combination has a precise meaning:

A green score in Review means the extraction read the document cleanly, but something the score does not measure needs a human: HS codes waiting for confirmation, a degraded scan, or a shipment finding. The document view shows the specific hold reason.
An amber score in Pending Approval means the extraction hit minor issues, but the score cleared the 78% threshold and no hold applies, so nothing blocks one-click approval.

Read the score as “how cleanly the AI read this document” and the status as “what needs to happen next”.
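
Because the color depends on the score alone, it can be sketched as a function that takes no status or hold information. The function name is hypothetical, and the sketch assumes a score between 84% and 85% is shown as amber, a case the color ranges above do not spell out.

```python
def score_color(score: float) -> str:
    """Pick the display color for a 0-1 overall score.

    Assumption: anything from 0.78 up to (but not including) 0.85 is amber.
    """
    if score >= 0.85:
        return "green"
    if score >= 0.78:
        return "amber"
    return "red"
```

A document scoring 0.92 is green whether it sits in Pending Approval or in Review, which is why the color and the status can legitimately differ.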

One threshold for every document type

Commercial invoices, packing lists, bills of lading, and certificates of origin all use the same 78% review threshold, and the same issue severities cost the same on every type - an 85% invoice and an 85% packing list mean the same thing.

What happens to your corrections

Every correction you make in review is logged with the document and field it applied to. HS code corrections feed back into future suggestions for your organization automatically. Extraction corrections accumulate as training data used to evaluate and improve extraction models over time.

Confidence scores describe extraction quality at processing time. Once you review and complete a document, CargoLint shows it as Reviewed instead of its original score - the human sign-off supersedes the machine’s estimate.