Most developers rely on ready-made sitemap packages in Laravel, but there are times when you need full control, such as:

  • Multilingual URLs
  • Filtering specific domains
  • Avoiding crawling CDN or static files
  • Removing invalid or outdated URLs
  • Building reusable crawlers for scheduled commands
  • Custom JSON-driven URLs

In this guide, we’ll walk through building a fully custom sitemap crawler in Laravel, with zero Composer packages.

⚑ This crawler was tested and implemented by the Yasoz Group development team, ensuring it works on real-world websites with multilingual and dynamic content.


🏗️ Step 1 — Create a Crawl Command

Run:

php artisan make:command GenerateSitemap

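This generates a command class at:
app/Console/Commands/GenerateSitemap.php

Set its $signature property to the name used in Step 8. The generated default is a placeholder such as command:name or app:generate-sitemap, depending on your Laravel version:

protected $signature = 'generate:sitemap';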


🌐 Step 2 — Define Your Base Settings

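Inside the command class, add these properties:

// Your site's root URL, without a trailing slash.
protected $baseUrl = 'https://www.yourwebsite.com';

// Pages already fetched, keyed by URL.
protected $visited = [];

// URLs collected for the sitemap.
protected $urls = [];

$baseUrl is the reference the filtering rules compare against to keep the crawler on your domain, and $visited stops it from fetching the same page twice.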
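Staying on your domain by checking whether a URL string starts with $baseUrl is simple, but that check would also accept a look-alike such as https://www.yourwebsite.com.evil.example/, because that string begins with the base URL too. A minimal sketch of a stricter check, assuming you want the scheme and host to match exactly (ports and paths are not compared here); isSameSite() is a hypothetical helper, not a Laravel API:

```php
<?php

/**
 * Hypothetical helper: true when $url has the same scheme and host as $baseUrl.
 * Hosts are compared case-insensitively; ports and paths are not compared.
 */
function isSameSite(string $url, string $baseUrl): bool
{
    $target = parse_url($url);
    $base = parse_url($baseUrl);

    // parse_url() returns false for seriously malformed URLs.
    if ($target === false || $base === false) {
        return false;
    }

    // Relative URLs have no scheme/host, so normalize them to absolute URLs first.
    if (!isset($target['scheme'], $target['host'], $base['scheme'], $base['host'])) {
        return false;
    }

    return strtolower($target['scheme']) === strtolower($base['scheme'])
        && strtolower($target['host']) === strtolower($base['host']);
}

$base = 'https://www.yourwebsite.com';
var_dump(isSameSite('https://www.yourwebsite.com/about', $base));        // bool(true)
var_dump(isSameSite('https://www.yourwebsite.com.evil.example/', $base)); // bool(false)
var_dump(isSameSite('https://cdn.yourwebsite.com/app.js', $base));       // bool(false)
```

Later in the crawler, a check like this could stand in for the str_starts_with() domain test.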


🔍 Step 3 — Build the Crawler Logic

We will:

  • Fetch an HTML page
  • Extract all <a href=""> links
  • Normalize URLs
  • Filter unwanted URLs
  • Recursively crawl
  • Avoid duplicates

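public function crawl($url)
{
    // Each page is fetched at most once, which also stops infinite link loops.
    if (isset($this->visited[$url])) return;
    $this->visited[$url] = true;

    // Needs allow_url_fopen; @ hides the warning PHP emits for 404s and timeouts.
    $html = @file_get_contents($url);
    if (!$html) return;

    // Record only pages that actually loaded: this includes the start page
    // and keeps broken links out of the sitemap.
    $this->urls[] = $url;

    // Capture every href="..." / href='...' value on the page.
    preg_match_all('/href=["\']([^"\']+)["\']/', $html, $matches);

    foreach ($matches[1] as $link) {
        // Decode entities such as &amp; before normalizing.
        $normalized = $this->normalizeUrl(html_entity_decode($link));

        if ($this->shouldInclude($normalized)) {
            $this->crawl($normalized);
        }
    }
}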
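Because crawl() calls itself once per new link, a large site with long chains of pages builds a correspondingly deep call stack, and nothing caps how many pages get fetched. A minimal alternative sketch, assuming a breadth-first queue and a page limit; crawlSite(), $fetch, $nextLinks and $maxPages are hypothetical names, and the fetcher is passed in as a callable so the logic can be tried without network access:

```php
<?php

/**
 * Breadth-first crawl with an explicit queue instead of recursion.
 *
 * @param callable $fetch     fn(string $url): ?string, returns the HTML or null on failure
 * @param callable $nextLinks fn(string $html): array, returns absolute, already-filtered URLs
 * @param int      $maxPages  hypothetical safety cap on the number of fetches
 * @return string[] URLs that loaded successfully, in crawl order
 */
function crawlSite(string $startUrl, callable $fetch, callable $nextLinks, int $maxPages = 500): array
{
    $queue = new SplQueue();
    $queue->enqueue($startUrl);
    $queued = [$startUrl => true]; // every URL is queued at most once
    $found = [];
    $fetches = 0;

    while (!$queue->isEmpty() && $fetches < $maxPages) {
        $url = $queue->dequeue();
        $fetches++;

        $html = $fetch($url);
        if ($html === null) {
            continue; // failed page: keep it out of the sitemap
        }
        $found[] = $url;

        foreach ($nextLinks($html) as $link) {
            if (!isset($queued[$link])) {
                $queued[$link] = true;
                $queue->enqueue($link);
            }
        }
    }

    return $found;
}

// Usage: an in-memory "site" stands in for real HTTP requests.
$pages = [
    'https://www.yourwebsite.com' => '<a href="https://www.yourwebsite.com/about">About</a>',
    'https://www.yourwebsite.com/about' => '<a href="https://www.yourwebsite.com">Home</a>'
        . '<a href="https://www.yourwebsite.com/gone">Old page</a>',
];

$fakeFetch = fn (string $url): ?string => $pages[$url] ?? null;

$extractLinks = function (string $html): array {
    preg_match_all('/href=["\']([^"\']+)["\']/', $html, $matches);
    return $matches[1];
};

print_r(crawlSite('https://www.yourwebsite.com', $fakeFetch, $extractLinks));
// Lists the home page and /about; /gone is dropped because its fetch "failed".
```

In the command, $fetch could wrap @file_get_contents() (returning null when it yields false), and $nextLinks could combine the href regex with normalizeUrl() and shouldInclude().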


✨ Step 4 — Normalize URLs

The normalizer should:

  • Convert /about → https://www.yourwebsite.com/about
  • Remove hash fragments such as #section
  • Skip empty links

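private function normalizeUrl($url)
{
    // Drop any #fragment first; it never changes which page the server returns.
    $url = trim(explode('#', $url, 2)[0]);

    // Empty and hash-only links ("#section") end up empty here.
    if ($url === '') return null;

    // Protocol-relative link (//host/path): borrow the base URL's scheme.
    if (str_starts_with($url, '//')) {
        $url = parse_url($this->baseUrl, PHP_URL_SCHEME) . ':' . $url;
    }

    // Root-relative path: /about -> https://www.yourwebsite.com/about
    if (str_starts_with($url, '/')) return $this->baseUrl . $url;

    // Absolute URL on your own domain.
    if (str_starts_with($url, $this->baseUrl)) return $url;

    // External links, mailto:, tel:, javascript: and bare relative paths are skipped.
    return null;
}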
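Even normalized, one page can still reach the sitemap under several spellings, such as /about and /about/, or with an upper-case host, and each spelling counts as a separate entry. A minimal sketch of an extra canonicalization step, assuming trailing slashes are not significant on your site (check this before adopting it); canonicalUrl() is a hypothetical helper:

```php
<?php

/**
 * Hypothetical helper: reduce an absolute URL to one canonical spelling so that
 * variants of the same page collapse into a single sitemap entry.
 * Assumes trailing slashes are not significant on this site.
 */
function canonicalUrl(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // malformed or not absolute
    }

    $scheme = strtolower($parts['scheme']);
    $host = strtolower($parts['host']);
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path = rtrim($parts['path'] ?? '', '/'); // "/about/" -> "/about", "/" -> ""
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    // Scheme, host, port, path and query are kept; the #fragment is dropped.
    return $scheme . '://' . $host . $port . $path . $query;
}

echo canonicalUrl('HTTPS://WWW.yourwebsite.com/about/#team'), PHP_EOL;
// https://www.yourwebsite.com/about
```

Running each collected URL through a helper like this before de-duplicating means the variants above produce a single sitemap entry.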


🛡️ Step 5 — Filtering Rules (SEO Friendly)

You should filter:
❌ External URLs
❌ File types (.js, .css, .png, .jpg, .svg, .webp, .pdf)
❌ cdn-cgi URLs
❌ URLs that do not begin with your base domain
❌ Hash-only URLs

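private function shouldInclude($url)
{
    // normalizeUrl() returns null for links it cannot use.
    if (!$url) return false;

    // Stay on your own domain.
    if (!str_starts_with($url, $this->baseUrl)) return false;

    // Skip static files; test only the path so "?v=2" cache-busters can't hide the extension.
    $path = (string) parse_url($url, PHP_URL_PATH);
    if (preg_match('/\.(js|css|png|jpg|jpeg|svg|gif|webp|pdf)$/i', $path)) return false;

    // Skip Cloudflare's /cdn-cgi/ endpoints such as email protection.
    if (str_contains($url, 'cdn-cgi')) return false;

    return true;
}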
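To keep whole sections out of the sitemap, such as admin areas, login pages, or retired content, a path-prefix blocklist is one option. A minimal sketch, assuming the filter only needs the URL's path; isIndexablePath() is a hypothetical helper, and the EXCLUDED_PREFIXES values are placeholders, not paths the crawler above defines:

```php
<?php

const SKIPPED_EXTENSIONS = ['js', 'css', 'png', 'jpg', 'jpeg', 'svg', 'gif', 'webp', 'pdf'];
const EXCLUDED_PREFIXES = ['/admin', '/login', '/cdn-cgi']; // placeholder sections

/**
 * Hypothetical filter: decide from the URL's path whether it belongs in the sitemap.
 */
function isIndexablePath(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if ($path === false) {
        return false; // seriously malformed URL
    }
    $path = $path ?? '/'; // "https://www.yourwebsite.com" has no path component

    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (in_array($extension, SKIPPED_EXTENSIONS, true)) {
        return false; // static asset
    }

    foreach (EXCLUDED_PREFIXES as $prefix) {
        // Match whole segments: "/admin" and "/admin/users" are excluded, "/administration" is not.
        if ($path === $prefix || str_starts_with($path, $prefix . '/')) {
            return false;
        }
    }

    return true;
}

var_dump(isIndexablePath('https://www.yourwebsite.com/blog/post-1')); // bool(true)
var_dump(isIndexablePath('https://www.yourwebsite.com/admin/users')); // bool(false)
var_dump(isIndexablePath('https://www.yourwebsite.com/logo.png?v=2')); // bool(false)
```

Matching whole path segments rather than raw substrings keeps a prefix like /admin from accidentally hiding an unrelated page such as /administration.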


📦 Step 6 — Generate the XML Sitemap

After crawling, generate the XML file:

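public function generateXml()
{
    // Remove duplicate entries before writing.
    $this->urls = array_unique($this->urls);

    // Declaring the namespace in the root tag gives a valid <urlset>;
    // children added with addChild() inherit it.
    $xml = new \SimpleXMLElement(
        '<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
    );

    foreach ($this->urls as $url) {
        $urlTag = $xml->addChild('url');
        // addChild() does not escape "&", so encode it first (e.g. ?page=2&sort=asc).
        $urlTag->addChild('loc', htmlspecialchars($url, ENT_XML1));
        $urlTag->addChild('changefreq', 'daily');
        $urlTag->addChild('priority', '0.8');
    }

    $xml->asXML(public_path('sitemap.xml'));
}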
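SimpleXML's addChild() does not escape & in the value it is given, which matters for URLs with query strings. PHP's bundled XMLWriter extension escapes element text itself, so it is one alternative. A minimal sketch, assuming the whole URL list fits in memory and stays within the sitemaps.org limit of 50,000 URLs per file (larger sites would split the list and add a sitemap index, not shown); buildSitemapXml() is a hypothetical helper, and the optional changefreq and priority tags are left out:

```php
<?php

/**
 * Hypothetical helper: build sitemap XML from a list of absolute URLs.
 * XMLWriter escapes characters such as "&" in element text automatically.
 */
function buildSitemapXml(array $urls): string
{
    $urls = array_values(array_unique($urls));
    if (count($urls) > 50000) {
        // sitemaps.org allows at most 50,000 URLs per sitemap file.
        throw new LengthException('Split the URLs across several sitemap files.');
    }

    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('urlset');
    $writer->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($urls as $url) {
        $writer->startElement('url');
        $writer->writeElement('loc', $url);
        $writer->endElement(); // </url>
    }

    $writer->endElement(); // </urlset>
    $writer->endDocument();

    return $writer->outputMemory();
}

$xml = buildSitemapXml([
    'https://www.yourwebsite.com',
    'https://www.yourwebsite.com/search?q=a&page=2',
    'https://www.yourwebsite.com', // duplicate, removed
]);

echo $xml;
```

Inside the command, generateXml() could then finish with file_put_contents(public_path('sitemap.xml'), $xml).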


🧩 Step 7 — Bring It All Together in handle()

public function handle()
{
    $this->info("Starting sitemap crawl…");

    $this->crawl($this->baseUrl);
    $this->generateXml();

    $this->info("Sitemap generated successfully!");
}


🌍 Step 8 — Run the Command

php artisan generate:sitemap


🧪 Optional: Add Dynamic JSON-Based URLs (Recommended)

If you keep extra URLs in JSON files (for example, pages that no other page links to, so the crawler cannot discover them), add a loader:

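private function loadCustomUrls()
{
    // Each file is expected to contain a JSON array of URL strings.
    $paths = glob(base_path('resources/custom-urls/*.json')) ?: [];

    foreach ($paths as $file) {
        $data = json_decode(file_get_contents($file), true);

        // Skip empty or invalid files instead of failing on foreach(null).
        if (!is_array($data)) continue;

        foreach ($data as $url) {
            $this->urls[] = $url;
        }
    }
}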

Call it at the start of handle(), before crawling:

$this->loadCustomUrls();
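The loader above expects each file to hold a flat JSON array of URL strings, for example ["https://www.yourwebsite.com/fr/contact"]. A more defensive sketch under that same assumption, which skips unreadable or malformed files and keeps only absolute http(s) URLs; loadUrlsFromJsonFiles() is a hypothetical name, and the directory is passed in instead of being hard-coded:

```php
<?php

/**
 * Hypothetical loader: read every *.json file in $dir (each expected to hold a
 * flat JSON array of URL strings) and return the valid, de-duplicated URLs.
 */
function loadUrlsFromJsonFiles(string $dir): array
{
    $urls = [];

    foreach (glob(rtrim($dir, '/') . '/*.json') ?: [] as $file) {
        $contents = file_get_contents($file);
        $data = $contents === false ? null : json_decode($contents, true);

        if (!is_array($data)) {
            continue; // unreadable file or invalid JSON: skip it
        }

        foreach ($data as $url) {
            if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
                continue; // not a string, or not a well-formed absolute URL
            }
            $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
            if ($scheme === 'http' || $scheme === 'https') {
                $urls[] = $url;
            }
        }
    }

    return array_values(array_unique($urls));
}

// Usage with a temporary directory standing in for resources/custom-urls.
$dir = sys_get_temp_dir() . '/custom-urls-demo';
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents($dir . '/pages.json', json_encode([
    'https://www.yourwebsite.com/fr/contact',
    'not a url',
    'https://www.yourwebsite.com/fr/contact',
]));
file_put_contents($dir . '/broken.json', '{oops');

print_r(loadUrlsFromJsonFiles($dir));
// Only https://www.yourwebsite.com/fr/contact, listed once.
```

Inside the command, the returned array could simply be appended to $this->urls.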


🏁 Conclusion

Building your own custom sitemap crawler in Laravel gives you full control:

✔️ You decide what gets crawled
✔️ You exclude unnecessary URLs like CDN resources & static files
✔️ You can support multi-language URLs
✔️ You can combine crawled URLs + JSON-defined URLs
✔️ You avoid heavy external packages
✔️ You can integrate it with cron, queue, or admin dashboard

⚑ This crawler has been tested and successfully implemented by the Yasoz Group development team on real-world websites. It is reliable, fully customizable, and production-ready.