Today we'll send the extracted PDF text to GPT to pull out structured fields, such as the parties involved, dates, amounts, and obligations, turning raw contracts and invoices into usable data.
Goal
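Parse the extracted text into structured JSON using GPT function calling.
Step 1: Define GPT Function Schema
Create a helper or inline array. Step 2 passes it to the API as $functions, so it has to be defined in whatever scope makes that call: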
$functions = [
[
"name" => "extract_contract_details",
"description" => "Extracts parties, dates, and values from contract or invoice text.",
"parameters" => [
"type" => "object",
"properties" => [
"party_a" => ["type" => "string"],
"party_b" => ["type" => "string"],
"start_date" => ["type" => "string"],
"end_date" => ["type" => "string"],
"total_amount" => ["type" => "string"],
],
"required" => ["party_a", "party_b"]
],
]
];
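The schema marks party_a and party_b as required, but it is still worth checking the decoded result yourself before trusting it. The following is a minimal sketch of such a check in plain PHP; the function name missingRequiredFields() and the sample values are assumptions made for illustration, not part of the tutorial's codebase or of any library.

```php
<?php

/**
 * Return the names of required fields that are absent or blank in $fields.
 * $parameters is the "parameters" part of a function definition like the one above.
 * (Hypothetical helper, named here for illustration only.)
 */
function missingRequiredFields(array $parameters, array $fields): array
{
    $missing = [];
    foreach ($parameters['required'] ?? [] as $name) {
        // Treat absent keys, non-string values, and blank strings as missing.
        if (!isset($fields[$name]) || !is_string($fields[$name]) || trim($fields[$name]) === '') {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Illustrative values only; not output from a real API call.
$parameters = [
    'type' => 'object',
    'properties' => [
        'party_a' => ['type' => 'string'],
        'party_b' => ['type' => 'string'],
        'start_date' => ['type' => 'string'],
    ],
    'required' => ['party_a', 'party_b'],
];
$example = ['party_a' => 'Acme Ltd', 'start_date' => '2024-01-01'];

print_r(missingRequiredFields($parameters, $example)); // lists party_b
```

A non-empty result could be used to flag the document for manual review rather than saving incomplete data.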
Step 2: Send Extracted Text to GPT
In a new service, DocumentAnalysisService.php:
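namespace App\Services;

use OpenAI\Laravel\Facades\OpenAI;

class DocumentAnalysisService
{
    public function extractFields(string $text): array
    {
        // $functions is the schema array from Step 1. It must be defined in this
        // scope: paste the array here or assign it from your helper.
        $response = OpenAI::chat()->create([
            'model' => 'gpt-4',
            'messages' => [
                ['role' => 'user', 'content' => "Extract structured fields from this text:\n\n" . $text],
            ],
            'functions' => $functions,
            // Force the model to call this function instead of replying in prose.
            'function_call' => ['name' => 'extract_contract_details'],
        ]);
        // The arguments arrive as a JSON-encoded string, not as an array.
        $result = $response['choices'][0]['message']['function_call']['arguments'] ?? '{}';
        // json_decode() returns null on malformed JSON, which would violate the array return type.
        return json_decode($result, true) ?? [];
    }
}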
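The arguments string is generated by the model, so it is not guaranteed to be valid JSON. A stricter decoding step might look like the sketch below, which uses PHP's built-in JSON_THROW_ON_ERROR flag; the function name decodeArguments() is an assumption for this example, and falling back to an empty array is only one possible policy.

```php
<?php

/**
 * Decode the JSON string found in function_call.arguments.
 * Returns an empty array when the string is missing, malformed,
 * or does not decode to an array.
 * (Hypothetical helper, named here for illustration only.)
 */
function decodeArguments(?string $json): array
{
    if ($json === null || $json === '') {
        return [];
    }

    try {
        // JSON_THROW_ON_ERROR turns silent null results into exceptions.
        $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        return [];
    }

    // A bare JSON string or number is valid JSON but useless as a field map.
    return is_array($decoded) ? $decoded : [];
}
```

Depending on your needs, you may prefer to log the failure or rethrow it instead of returning an empty array, so that bad extractions are visible rather than silently stored.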
Step 3: Update Controller to Save Extracted Data
Update DocumentController.php:
use App\Services\DocumentAnalysisService;
public function store(Request $request, PdfTextExtractor $extractor, DocumentAnalysisService $ai)
{
...
$text = $extractor->extract($filename);
$fields = $ai->extractFields($text);
Document::create([
'title' => $request->title,
'type' => $request->type,
'filename' => $filename,
'user_id' => auth()->id(),
'extracted_text' => $text,
// optional: store $fields as JSON in another column or separate table
]);
}
Optional: create a document_fields table to store these fields if you need to search or filter on them.
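If you go the separate-table route, the extracted fields need to be turned into one row per field. Here is a rough sketch in plain PHP; the function name fieldsToRows() and the column names (document_id, name, value) are assumptions, since the tutorial does not define the table's layout.

```php
<?php

/**
 * Turn the extracted fields into one row per field.
 * Column names (document_id, name, value) are assumptions for this sketch.
 */
function fieldsToRows(int $documentId, array $fields): array
{
    $rows = [];
    foreach ($fields as $name => $value) {
        if ($value === null || $value === '') {
            continue; // skip fields that came back empty
        }
        $rows[] = [
            'document_id' => $documentId,
            'name' => (string) $name,
            // Keep scalars as plain strings; encode anything nested as JSON.
            'value' => is_scalar($value) ? (string) $value : json_encode($value),
        ];
    }
    return $rows;
}
```

In Laravel, rows shaped like this could then be passed to the query builder's insert() method for whichever table you create.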
Summary
Today you:
- Defined a GPT function to extract key values
- Sent raw PDF text to GPT for structured data extraction
- Returned parsed JSON fields like parties and amounts
Up next (Day 4): We'll store and display the extracted fields, and optionally let admins approve or correct GPT's results before they are accepted.