Today we’ll focus on extracting text from uploaded PDF documents, preparing it for AI analysis in later steps. This is a crucial step before sending data to GPT for validation.
📦 Step 1: Install PDF Parser
We’ll use smalot/pdfparser, a standalone PHP library for parsing PDF files:
```bash
composer require smalot/pdfparser
```
🧠 Step 2: Create a Service to Handle PDF Text Extraction
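Laravel doesn’t ship a make:service command. Either create app/Services/PdfTextExtractor.php by hand or, on Laravel 11 and later, generate a plain class with:
```bash
php artisan make:class Services/PdfTextExtractor
```
Then, in app/Services/PdfTextExtractor.php: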
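```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Storage;
use Smalot\PdfParser\Parser;

class PdfTextExtractor
{
    /**
     * Extract the plain text of a PDF stored on the "public" disk
     * under documents/{filename}.
     */
    public function extract(string $filename): string
    {
        $parser = new Parser();

        // Resolve the absolute path of the stored file.
        $pdfPath = Storage::disk('public')->path("documents/{$filename}");

        // Parse the file and return the text of all pages combined.
        $pdf = $parser->parseFile($pdfPath);

        return $pdf->getText();
    }
}
```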
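Raw getText() output can contain tabs, runs of spaces, non-breaking spaces and large gaps between blocks. Also, smalot/pdfparser reads the PDF’s text layer and does no OCR, so a scanned, image-only PDF comes back empty or nearly empty. As a minimal sketch, assuming you want to tidy the text and flag empty results before saving, here are two plain-PHP helpers. The names normalizeExtractedText() and hasUsableText() and the 20-character threshold are hypothetical, not part of smalot/pdfparser or Laravel.

```php
<?php

// Hypothetical helper (not part of smalot/pdfparser or Laravel):
// tidies the raw string returned by getText() before it is stored.
function normalizeExtractedText(string $text): string
{
    // Unify Windows (\r\n) and old Mac (\r) line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Replace UTF-8 non-breaking spaces (U+00A0) with ordinary spaces.
    $text = str_replace("\u{00A0}", ' ', $text);

    // Collapse every run of spaces and tabs into a single space.
    $text = preg_replace('/[ \t]+/', ' ', $text);

    // Strip spaces at the start and end of every line.
    $text = preg_replace('/^ +| +$/m', '', $text);

    // Collapse three or more consecutive newlines into one blank line.
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    return trim($text);
}

// Hypothetical check: image-only PDFs yield little or no text.
// The 20-byte threshold is an arbitrary assumption; tune it.
function hasUsableText(string $text, int $minBytes = 20): bool
{
    // strlen() counts bytes, not characters, which is fine for a rough cutoff.
    return strlen(normalizeExtractedText($text)) >= $minBytes;
}
```

If you adopt something like this, the service could return normalizeExtractedText($pdf->getText()), and the controller could warn the user when hasUsableText() is false instead of storing an empty string silently.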
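⚙️ Step 3: Add extracted_text to Documents Table
```bash
php artisan make:migration add_extracted_text_to_documents_table
```
In the generated migration (it already imports Blueprint and Schema), add the column in up() and drop it again in down() so the migration can be rolled back:
```php
public function up()
{
    Schema::table('documents', function (Blueprint $table) {
        $table->longText('extracted_text')->nullable();
    });
}

public function down()
{
    Schema::table('documents', function (Blueprint $table) {
        $table->dropColumn('extracted_text');
    });
}
```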
Then run:
```bash
php artisan migrate
```
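Why longText() rather than text()? On MySQL/MariaDB, Laravel maps text() to TEXT and longText() to LONGTEXT, and these types have fixed byte limits (other databases, such as PostgreSQL, do not tier text columns this way). The sketch below, with a hypothetical helper name, shows the arithmetic by picking the smallest MySQL type that fits a given string.

```php
<?php

// Maximum byte lengths of MySQL's text column types (MySQL-specific).
const MYSQL_TEXT_LIMITS = [
    'TEXT'       => 65535,       // 2^16 - 1 bytes (~64 KB)
    'MEDIUMTEXT' => 16777215,    // 2^24 - 1 bytes (~16 MB)
    'LONGTEXT'   => 4294967295,  // 2^32 - 1 bytes (~4 GB)
];

// Hypothetical helper: returns the smallest MySQL text type that can
// hold $text. strlen() counts bytes, which is what the limits measure;
// a single UTF-8 character can take up to 4 bytes.
function smallestMysqlTextType(string $text): string
{
    $bytes = strlen($text);

    foreach (MYSQL_TEXT_LIMITS as $type => $maxBytes) {
        if ($bytes <= $maxBytes) {
            return $type;
        }
    }

    throw new LengthException("Text of {$bytes} bytes exceeds LONGTEXT.");
}
```

For example, assuming roughly 3,000 ASCII characters per page, a 100-page contract is about 300,000 bytes: already too large for TEXT. MEDIUMTEXT would fit it; longText() simply leaves headroom for very large documents.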
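🧪 Step 4: Modify Store Logic to Extract Text
Make sure extracted_text is mass assignable (add it to the Document model’s $fillable array if the model uses one); otherwise Document::create() silently drops the value. Then, in DocumentController.php: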
```php
// At the top of the file:
use App\Models\Document;
use App\Services\PdfTextExtractor;
use Illuminate\Http\Request;

// Inside the DocumentController class:
public function store(Request $request, PdfTextExtractor $extractor)
{
    $request->validate([
        'title' => 'required|string',
        'type' => 'required|in:contract,invoice',
        'document' => 'required|file|mimes:pdf|max:20480', // max is in kilobytes (20 MB)
    ]);

    // Save the upload to storage/app/public/documents.
    $file = $request->file('document');
    $filename = time() . '-' . $file->getClientOriginalName();
    $file->storeAs('documents', $filename, 'public');

    // Read the text layer of the stored PDF. The parser throws an
    // exception for files it cannot read (e.g. encrypted PDFs).
    $text = $extractor->extract($filename);

    Document::create([
        'title' => $request->title,
        'type' => $request->type,
        'filename' => $filename,
        'user_id' => auth()->id(),
        'extracted_text' => $text,
    ]);

    return redirect()->back()->with('success', 'Document uploaded and text extracted.');
}
```
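One caveat in the store logic: getClientOriginalName() returns whatever name the browser sent, and Laravel’s own documentation treats it as unsafe because a user can tamper with it. Two uploads of the same file within one second would also overwrite each other. Laravel’s $file->hashName() avoids both problems with a random name; if you prefer readable names, here is a minimal sketch of a hypothetical buildStoredFilename() helper that slugs the client name instead of using it verbatim.

```php
<?php

// Hypothetical helper: builds a storage-safe filename from the
// client-supplied name instead of trusting it verbatim.
function buildStoredFilename(string $originalName, int $timestamp): string
{
    // Drop any directory part and the extension the client sent.
    $base = pathinfo($originalName, PATHINFO_FILENAME);

    // Lowercase, then replace every run of characters outside [a-z0-9] with '-'.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($base));
    $slug = trim($slug, '-');

    // Fall back to a generic name if nothing usable is left.
    if ($slug === '') {
        $slug = 'document';
    }

    // Validation already required a PDF, so force the .pdf extension.
    return "{$timestamp}-{$slug}.pdf";
}
```

In the controller this would replace the original line as $filename = buildStoredFilename($file->getClientOriginalName(), time());. The timestamp still does not prevent same-second collisions, so append a random suffix or use hashName() if that matters for your uploads.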
✅ Summary
✅ Today you:
- Installed and used smalot/pdfparser to extract text from PDFs
- Created a reusable service class
- Stored extracted content alongside the document
✅ Up next (Day 3): We’ll send the extracted text to GPT for structure detection – like identifying contract parties, payment terms, dates, and clauses.