The Issue During the extraction process, we frequently encounter the finishReason: 'RECITATION' trigger, which halts the generation.
We have observed that for some of these halted responses, the model identifies that the text exists on the open web and provides a source link.Our Constraints & Attempts
-
Authorization: We have full legal permission and rights to use these documents for this purpose. We made sure that this was reflected in the prompt.
-
Prompt Engineering: We have adjusted our system instructions and prompts to explicitly frame the task as an “Optical Character Recognition (OCR)” task rather than a generation task.
-
Schema Constraints: We are utilizing output JSON schemas to enforce structural rigor, but the recitation filter persists.
The Request Given that we have the rights to this content and the “recitation” is the functional goal of this specific agent, could you please advise on the following:
-
Are there specific safety settings or parameter adjustments available via the API to relax the Recitation/Copyright filters for authorized content?
-
Are there specific prompt engineering strategies or “OCR-mode” triggers that effectively signal to the model that it is performing a mechanical extraction rather than creative generation?
-
Is there a mechanism to whitelist specific domains or content types to prevent these false positives?