fix: prevent serial bleed into MODEL_NUMBER in vision output

Parser now strips any embedded field labels (e.g. "SERIAL: x") that the
LLM mistakenly appends to a field value. Prompt updated with a concrete
example showing MODEL_NUMBER as blank to reinforce leaving it empty when
no separate part code is visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Simon Kuehn 2026-05-18 10:19:17 +00:00
parent fc628df42b
commit 974bd239a5
2 changed files with 17 additions and 10 deletions

@@ -33,10 +33,15 @@ final class OllamaVisionAgent
     private function extractField(string $response, string $field): string
     {
-        if (preg_match('/^'.$field.':\s*(.+)$/m', $response, $matches)) {
-            return trim($matches[1]);
+        if (!preg_match('/^'.$field.':\s*(.*)$/m', $response, $matches)) {
+            return '';
         }
-        return '';
+        $value = trim($matches[1]);
+        // Strip any embedded field label the model mistakenly included (e.g. "SERIAL: PNV09SJZ")
+        $value = preg_replace('/\s+[A-Z_]+:.*$/i', '', $value) ?? $value;
+        return trim($value);
     }
 }
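
One edge case may be worth a regression test. In PCRE, `\s` also matches line breaks, so with the `m` flag the `\s*` after the colon can run past an empty `MODEL_NUMBER:` line and capture the next line whole. The label-stripping pattern requires leading whitespace, so it would not remove a value that consists only of `SERIAL: ...`. The following is a minimal sketch of one possible variant, not the committed code: `extractFieldSketch` is a hypothetical standalone function, and the choice of `[ \t]*`, `preg_quote`, and the `(?:^|\s+)` prefix are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical standalone variant of extractField(); names and regex
// choices here are illustrative assumptions, not the project's code.
function extractFieldSketch(string $response, string $field): string
{
    // [ \t]* consumes only spaces and tabs, so an empty value cannot
    // swallow the newline and capture the following line.
    $pattern = '/^' . preg_quote($field, '/') . ':[ \t]*(.*)$/m';
    if (!preg_match($pattern, $response, $matches)) {
        return '';
    }

    $value = trim($matches[1]);
    // Drop an embedded "LABEL: ..." tail, whether it follows a real value
    // or makes up the entire captured text.
    $value = preg_replace('/(?:^|\s+)[A-Z_]+:.*$/i', '', $value) ?? $value;

    return trim($value);
}

// Response in the format the updated prompt asks for (MODEL_NUMBER left empty).
$response = "MANUFACTURER: Lenovo\n"
    . "MODEL_NAME: ThinkBook 14 G6 IRL\n"
    . "MODEL_NUMBER:\n"
    . "SERIAL: PNV09SJZ";

// With \s* the capture group crosses the line break.
preg_match('/^MODEL_NUMBER:\s*(.*)$/m', $response, $m);
var_export($m[1]);                                         // 'SERIAL: PNV09SJZ'
echo PHP_EOL;

var_export(extractFieldSketch($response, 'MODEL_NUMBER')); // ''
echo PHP_EOL;
var_export(extractFieldSketch($response, 'SERIAL'));       // 'PNV09SJZ'
echo PHP_EOL;
```

The sketch keeps the case-insensitive label match from the commit, so a legitimate value containing something like `Rev: A` would still be truncated; whether that trade-off is acceptable depends on the labels being scanned.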

@@ -44,14 +44,16 @@ PROMPT,
 'vision_analyze' => <<<'PROMPT'
 Look at this nameplate/label photo of IT hardware.
 Extract the manufacturer, any model identifier (name or number), and serial number visible on the label.
-If the label shows both a product name (e.g. "ThinkPad T490s") and a part/product code (e.g. "20NXS0BA00"), put the product name in MODEL_NAME and the code in MODEL_NUMBER.
-If only one model field is visible (regardless of whether it looks like a name or a code), put it in MODEL_NAME and leave MODEL_NUMBER empty.
+If the label shows both a product name (e.g. "ThinkPad T490s") and a separate part/product code (e.g. "20NXS0BA00"), put the product name in MODEL_NAME and the code in MODEL_NUMBER.
+If only one model field is visible, put it in MODEL_NAME and leave MODEL_NUMBER completely empty.
+MODEL_NUMBER must never contain the serial number.
 Do not guess or add information not visible on the label.
-Respond in exactly this format (use empty string if not visible):
-MANUFACTURER: <brand name, e.g. Dell, HP, Lenovo, Medion>
-MODEL_NAME: <product name or model identifier>
-MODEL_NUMBER: <part/product code, only if separately shown>
-SERIAL: <serial number>
+Respond in exactly this format:
+MANUFACTURER: Lenovo
+MODEL_NAME: ThinkBook 14 G6 IRL
+MODEL_NUMBER:
+SERIAL: PNV09SJZ
+Use empty string (nothing after the colon) when a field is not visible.
 PROMPT,
 'json_coding' => <<<'PROMPT'