fix: prevent serial bleed into MODEL_NUMBER in vision output

Parser now strips any embedded field labels (e.g. "SERIAL: x") that the
LLM mistakenly appends to a field value. Prompt updated with a concrete
example showing MODEL_NUMBER as blank to reinforce leaving it empty when
no separate part code is visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Simon Kuehn 2026-05-18 10:19:17 +00:00
parent fc628df42b
commit 974bd239a5
2 changed files with 17 additions and 10 deletions

@@ -33,10 +33,15 @@ final class OllamaVisionAgent
     private function extractField(string $response, string $field): string
     {
-        if (preg_match('/^'.$field.':\s*(.+)$/m', $response, $matches)) {
-            return trim($matches[1]);
-        }
+        if (!preg_match('/^'.$field.':\s*(.*)$/m', $response, $matches)) {
+            return '';
+        }
+        $value = trim($matches[1]);
+        // Strip any embedded field label the model mistakenly included (e.g. "SERIAL: PNV09SJZ")
+        $value = preg_replace('/\s+[A-Z_]+:.*$/i', '', $value) ?? $value;
+        return trim($value);
     }
 }
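
The stripping step in this hunk can be exercised in isolation. The following is a minimal sketch, not the code from the commit: `extractFieldSketch` is a hypothetical free-function copy of `extractField`, and it deviates in two labeled ways. It wraps the field name in `preg_quote()`, and it uses `[ \t]*` instead of `\s*` after the colon, because `\s` also matches newlines, so an empty `MODEL_NUMBER:` line could otherwise capture the `SERIAL:` line that follows it.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone variant of extractField(), for illustration only.
function extractFieldSketch(string $response, string $field): string
{
    // Assumption: [ \t]* (not \s*) so the match cannot run past the end of the line.
    // (.*) rather than (.+) lets an empty "FIELD:" line match and yield ''.
    $pattern = '/^' . preg_quote($field, '/') . ':[ \t]*(.*)$/m';
    if (!preg_match($pattern, $response, $matches)) {
        return '';
    }
    $value = trim($matches[1]);
    // Same strip pattern as in the hunk: cut from the first " LABEL:" to the end.
    $value = preg_replace('/\s+[A-Z_]+:.*$/i', '', $value) ?? $value;
    return trim($value);
}

// A response where the model appended the serial to MODEL_NUMBER.
$bleed = "MANUFACTURER: Lenovo\n"
    . "MODEL_NAME: ThinkBook 14 G6 IRL\n"
    . "MODEL_NUMBER: 20NXS0BA00 SERIAL: PNV09SJZ\n"
    . "SERIAL: PNV09SJZ";

// A response that follows the prompt's example, with MODEL_NUMBER left blank.
$clean = "MANUFACTURER: Lenovo\n"
    . "MODEL_NAME: ThinkBook 14 G6 IRL\n"
    . "MODEL_NUMBER:\n"
    . "SERIAL: PNV09SJZ";

var_dump(extractFieldSketch($bleed, 'MODEL_NUMBER')); // string(10) "20NXS0BA00"
var_dump(extractFieldSketch($clean, 'MODEL_NUMBER')); // string(0) ""
var_dump(extractFieldSketch($clean, 'SERIAL'));       // string(8) "PNV09SJZ"
```

Because the strip pattern carries the `/i` flag, it removes anything after whitespace followed by letters and a colon, not only the four known labels, so a legitimate value containing such a sequence would be truncated as well.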

@@ -44,14 +44,16 @@ PROMPT,
 'vision_analyze' => <<<'PROMPT'
 Look at this nameplate/label photo of IT hardware.
 Extract the manufacturer, any model identifier (name or number), and serial number visible on the label.
-If the label shows both a product name (e.g. "ThinkPad T490s") and a part/product code (e.g. "20NXS0BA00"), put the product name in MODEL_NAME and the code in MODEL_NUMBER.
-If only one model field is visible (regardless of whether it looks like a name or a code), put it in MODEL_NAME and leave MODEL_NUMBER empty.
+If the label shows both a product name (e.g. "ThinkPad T490s") and a separate part/product code (e.g. "20NXS0BA00"), put the product name in MODEL_NAME and the code in MODEL_NUMBER.
+If only one model field is visible, put it in MODEL_NAME and leave MODEL_NUMBER completely empty.
+MODEL_NUMBER must never contain the serial number.
 Do not guess or add information not visible on the label.
-Respond in exactly this format (use empty string if not visible):
-MANUFACTURER: <brand name, e.g. Dell, HP, Lenovo, Medion>
-MODEL_NAME: <product name or model identifier>
-MODEL_NUMBER: <part/product code, only if separately shown>
-SERIAL: <serial number>
+Respond in exactly this format:
+MANUFACTURER: Lenovo
+MODEL_NAME: ThinkBook 14 G6 IRL
+MODEL_NUMBER:
+SERIAL: PNV09SJZ
+Use empty string (nothing after the colon) when a field is not visible.
 PROMPT,
 'json_coding' => <<<'PROMPT'