Day 2 – Extracting Text from Uploaded PDFs in Laravel #LaravelGPT #PDFParsing #LaravelPDF #SmartDocs #TextExtraction #AIValidation

Today we’ll focus on extracting text from uploaded PDF documents, preparing it for AI analysis in later steps. This is a crucial step before sending data to GPT for validation.

📦 Step 1: Install PDF Parser

We’ll use smalot/pdfparser, a standalone PHP library for parsing PDF files and extracting their text:

```bash
composer require smalot/pdfparser
```

🧠 Step 2: Create a Service to Handle PDF Text Extraction

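Laravel does not ship a `make:service` command, so either create the file by hand or, on Laravel 11 and later, generate a plain class with:

```bash
php artisan make:class Services/PdfTextExtractor
```

Then fill in app/Services/PdfTextExtractor.php: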

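```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Storage;
use Smalot\PdfParser\Parser;

class PdfTextExtractor
{
    /**
     * Extract the plain text of a PDF stored on the "public" disk
     * under documents/{$filename}.
     *
     * @throws \Exception when the file cannot be parsed (e.g. corrupt or encrypted PDFs)
     */
    public function extract(string $filename): string
    {
        $parser = new Parser();

        // Resolve the absolute path on the public disk (storage/app/public by default)
        $pdfPath = Storage::disk('public')->path("documents/{$filename}");

        // Parse the document, then return the text of all pages as one string
        $pdf = $parser->parseFile($pdfPath);

        return $pdf->getText();
    }
}
```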
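The raw string from `getText()` can contain stray tabs, runs of spaces, and long stretches of blank lines, and a scanned (image-only) PDF may come back almost empty because it has no text layer to read. Below is a minimal sketch of two plain-PHP helpers you might apply to the result of `extract()` before storing it. The function names `normalizeExtractedText` and `hasUsableText` and the 20-byte threshold are hypothetical choices for illustration, not part of smalot/pdfparser.

```php
<?php

/**
 * Hypothetical helper: tidy the raw string returned by getText().
 * Normalizes line endings, strips control characters, collapses runs of
 * spaces/tabs, trims each line, and keeps at most one blank line in a row.
 */
function normalizeExtractedText(string $text): string
{
    // Convert Windows ("\r\n") and old Mac ("\r") line endings to "\n"
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Remove ASCII control characters except tab (\x09) and newline (\x0A).
    // No /u flag, so this works byte-wise and leaves UTF-8 sequences intact.
    $text = (string) preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);

    // Collapse every run of spaces and tabs into a single space
    $text = (string) preg_replace('/[ \t]+/', ' ', $text);

    // Trim the leading and trailing space of each line
    $text = (string) preg_replace('/^ +| +$/m', '', $text);

    // Squeeze three or more newlines down to one blank line
    $text = (string) preg_replace("/\n{3,}/", "\n\n", $text);

    return trim($text);
}

/**
 * Hypothetical heuristic: image-only (scanned) PDFs have no text layer,
 * so the parser returns little or nothing. The 20-byte default is arbitrary.
 */
function hasUsableText(string $text, int $minBytes = 20): bool
{
    // Count the bytes left once all whitespace is removed
    return strlen((string) preg_replace('/\s+/', '', $text)) >= $minBytes;
}
```

Inside the service, the last line of `extract()` could then become `return normalizeExtractedText($pdf->getText());`, ideally with both helpers moved into the class as private methods. In the controller, `hasUsableText()` could decide whether to warn the user that the file looks scanned and needs OCR, which smalot/pdfparser does not perform.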

⚙️ Step 3: Add `extracted_text` to Documents Table

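```bash
php artisan make:migration add_extracted_text_to_documents_table
```

In the generated migration, fill in `up()` and add a matching `down()` so the change can be rolled back. `longText` is used because the text of a long contract can easily exceed the 64 KB limit of a MySQL `TEXT` column:

```php
public function up(): void
{
    Schema::table('documents', function (Blueprint $table) {
        $table->longText('extracted_text')->nullable(); // nullable: extraction can fail or yield nothing
    });
}

public function down(): void
{
    Schema::table('documents', function (Blueprint $table) {
        $table->dropColumn('extracted_text'); // makes the migration reversible
    });
}
```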

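Then run:

```bash
php artisan migrate
```

If your `Document` model lists its mass-assignable columns in `$fillable`, add `extracted_text` there as well; otherwise `Document::create()` in the next step will silently drop it.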

🧪 Step 4: Modify Store Logic to Extract Text

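In app/Http/Controllers/DocumentController.php, update `store()` to run the extractor right after the file is saved:

```php
// At the top of DocumentController.php
use App\Models\Document;
use App\Services\PdfTextExtractor;
use Illuminate\Http\Request;

// Inside the DocumentController class.
// Laravel's service container injects PdfTextExtractor automatically.
public function store(Request $request, PdfTextExtractor $extractor)
{
    $request->validate([
        'title' => 'required|string',
        'type' => 'required|in:contract,invoice',
        'document' => 'required|file|mimes:pdf|max:20480', // max is in kilobytes (20 MB)
    ]);

    $file = $request->file('document');

    // Prefix with a timestamp to reduce the chance of name collisions
    $filename = time() . '-' . $file->getClientOriginalName();
    $file->storeAs('documents', $filename, 'public');

    try {
        $text = $extractor->extract($filename);
    } catch (\Exception $e) {
        // Corrupt or encrypted PDFs make the parser throw; keep the upload, store no text
        $text = null;
    }

    Document::create([
        'title' => $request->title,
        'type' => $request->type,
        'filename' => $filename,
        'user_id' => auth()->id(),
        'extracted_text' => $text,
    ]);

    $message = $text === null
        ? 'Document uploaded, but its text could not be extracted.'
        : 'Document uploaded and text extracted.';

    return redirect()->back()->with('success', $message);
}
```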
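`getClientOriginalName()` returns whatever name the browser sent, which may contain spaces, parentheses, or non-ASCII characters that make storage paths and URLs awkward. Here is a minimal plain-PHP sketch of building a tidier stored name; `safePdfFilename` is a hypothetical helper, and inside a Laravel app `Illuminate\Support\Str::slug()` would be a natural alternative for the slugging step.

```php
<?php

/**
 * Hypothetical helper: build a storage-friendly filename such as
 * "1718000000-my-contract-v2.pdf" from the client-supplied name.
 */
function safePdfFilename(string $originalName, int $timestamp): string
{
    // Keep only the base name, dropping any directory part and the extension
    $base = pathinfo($originalName, PATHINFO_FILENAME);

    // Lowercase, then replace every run of characters outside [a-z0-9] with a hyphen
    $slug = (string) preg_replace('/[^a-z0-9]+/', '-', strtolower($base));
    $slug = trim($slug, '-');

    // Fall back to a generic name if nothing usable is left (e.g. "???.pdf")
    if ($slug === '') {
        $slug = 'document';
    }

    // The upload was validated as mimes:pdf, so use a lowercase .pdf extension
    return "{$timestamp}-{$slug}.pdf";
}
```

In `store()`, the filename line could then read `$filename = safePdfFilename($file->getClientOriginalName(), time());`. Keeping the timestamp prefix preserves the original approach to avoiding collisions, while the slug keeps names predictable.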

✅ Summary

✅ Today you:

Installed and used smalot/pdfparser to extract text from PDFs
Created a reusable service class
Stored extracted content alongside the document

✅ Up next (Day 3): We’ll send the extracted text to GPT for structure detection – like identifying contract parties, payment terms, dates, and clauses.