Most developers rely on ready-made sitemap packages in Laravel, but some projects need full control over details such as:
- Multilingual URLs
- Filtering specific domains
- Avoiding crawling CDN or static files
- Removing invalid or outdated URLs
- Building reusable crawlers for scheduled commands
- Custom JSON-driven URLs
In this guide, we'll walk through how to build a fully custom sitemap crawler in Laravel with zero Composer packages.
This crawler was tested and implemented by the Yasoz Group development team on real-world websites with multilingual and dynamic content.
Step 1: Create a Crawl Command
Run:
php artisan make:command GenerateSitemap
This generates a command at:
app/Console/Commands/GenerateSitemap.php
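Step 2: Define Your Base Settings
Inside the command class, set the signature that Step 8 runs and add the crawler's state:
protected $signature = 'generate:sitemap';
protected $baseUrl = 'https://www.yourwebsite.com'; // no trailing slash
protected $visited = []; // URLs already fetched
protected $urls = [];    // URLs collected for the sitemap
The filtering rules in Step 5 compare every link against $baseUrl, which keeps the crawler on your own domain.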
Step 3: Build the Crawler Logic
We will:
- Fetch an HTML page
- Extract all <a href=""> links
- Normalize URLs
- Filter unwanted URLs
- Recursively crawl
- Avoid duplicates
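public function crawl($url)
{
    // Skip pages that were already fetched.
    if (isset($this->visited[$url])) {
        return;
    }
    $this->visited[$url] = true;
    // @ silences warnings for unreachable pages; a failed fetch returns false.
    $html = @file_get_contents($url);
    if (!$html) {
        return;
    }
    // Capture the href of every <a> tag (not <link> or other elements).
    preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $matches);
    foreach ($matches[1] as $link) {
        // Decode entities such as &amp; before normalizing.
        $normalized = $this->normalizeUrl(html_entity_decode($link));
        if ($this->shouldInclude($normalized)) {
            $this->urls[] = $normalized;
            $this->crawl($normalized);
        }
    }
}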
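On a large site, the recursive approach can nest very deeply, because each newly discovered page adds another call to the stack. The sketch below shows one possible alternative: the same fetch-and-extract loop driven by an explicit queue, with a cap on the number of pages. The names crawlIteratively, $fetch, $normalize, and $maxPages are illustrative and not part of the command above, and the in-memory $pages array stands in for real HTTP requests so the example runs offline.

```php
<?php
// Minimal sketch: breadth-first crawl with a queue instead of recursion.
// crawlIteratively(), $fetch, $normalize and $maxPages are hypothetical names.

/**
 * @param string   $startUrl  first page to visit
 * @param callable $fetch     function(string $url): ?string, returns HTML or null
 * @param callable $normalize function(string $href): ?string, returns an absolute URL or null
 * @param int      $maxPages  hard cap on the number of visited pages
 * @return string[] visited URLs in crawl order
 */
function crawlIteratively(string $startUrl, callable $fetch, callable $normalize, int $maxPages = 500): array
{
    $queue = [$startUrl];
    $visited = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if ($html === null || $html === '') {
            continue;
        }

        // Extract the href of every <a> tag.
        preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $matches);
        foreach ($matches[1] as $href) {
            $next = $normalize(html_entity_decode($href));
            if ($next !== null && !isset($visited[$next])) {
                $queue[] = $next;
            }
        }
    }

    return array_keys($visited);
}

// Usage with an in-memory "site" (assumed data, no network access needed).
$pages = [
    'https://example.test/'  => '<a href="/a">A</a> <a href="/b">B</a>',
    'https://example.test/a' => '<a href="/">Home</a> <a href="/b">B</a>',
    'https://example.test/b' => '<a href="/a">A</a>',
];
$fetch = fn (string $url): ?string => $pages[$url] ?? null;
$normalize = fn (string $href): ?string =>
    str_starts_with($href, '/') ? 'https://example.test' . $href : null;

$found = crawlIteratively('https://example.test/', $fetch, $normalize);
print_r($found);
```

Passing the fetcher as a callable also makes the loop easy to test, and in a real command you could swap in file_get_contents or Laravel's HTTP client.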
Step 4: Normalize URLs
The normalizer should:
- Convert /about → https://www.yourwebsite.com/about
- Remove hash fragments such as #section
- Skip empty links
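// Note: str_starts_with() requires PHP 8.0 or later.
private function normalizeUrl($url)
{
    // Drop the fragment first so /about#team and /about become one URL.
    $url = explode('#', trim($url), 2)[0];
    if ($url === '') {
        return null; // empty or hash-only link
    }
    // Root-relative path, but not a protocol-relative //host/path link.
    if (str_starts_with($url, '/') && !str_starts_with($url, '//')) {
        return $this->baseUrl . $url;
    }
    if (str_starts_with($url, $this->baseUrl)) {
        return $url;
    }
    return null; // external links, mailto:, tel:, and so on
}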
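To see how these rules behave on concrete input, here is a standalone sketch that can be run outside Laravel. The function normalizeLink and the sample links are illustrative assumptions. Unlike a plain prefix check, it accepts an absolute URL only when it equals the base or continues with a slash, so a look-alike host such as www.yourwebsite.com.evil.example is rejected.

```php
<?php
// Standalone sketch of the Step 4 rules; normalizeLink() is a hypothetical helper.
function normalizeLink(string $href, string $baseUrl): ?string
{
    // Remove the fragment; "#top" becomes an empty string.
    $href = explode('#', trim($href), 2)[0];
    if ($href === '') {
        return null;
    }
    // Root-relative path such as /about (but not //host/path).
    if (str_starts_with($href, '/') && !str_starts_with($href, '//')) {
        return $baseUrl . $href;
    }
    // Absolute URL: accept only the base itself or a path beneath it.
    if ($href === $baseUrl || str_starts_with($href, $baseUrl . '/')) {
        return $href;
    }
    return null;
}

$base = 'https://www.yourwebsite.com';
$samples = [
    '/about',
    '/about#team',
    '#top',
    '//cdn.example.com/app.js',
    'https://www.yourwebsite.com/blog#comments',
    'https://other.example/page',
    'mailto:hi@example.com',
];
foreach ($samples as $href) {
    echo str_pad($href, 45), ' => ', var_export(normalizeLink($href, $base), true), PHP_EOL;
}
```

Links without a leading slash (for example about.html) are dropped here for simplicity. Resolving them would require the URL of the page they appeared on.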
Step 5: Filtering Rules (SEO Friendly)
You should filter out:
- External URLs
- File types (.js, .css, .png, .jpg, .svg, .webp, .pdf)
- cdn-cgi URLs
- URLs that do not begin with your base domain
- Hash-only URLs
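private function shouldInclude($url)
{
    if (!$url) return false;
    // Stay on the configured site.
    if (!str_starts_with($url, $this->baseUrl)) return false;
    // Skip static assets, including versioned ones such as app.css?v=2.
    if (preg_match('/\.(js|css|png|jpg|jpeg|svg|gif|webp|pdf)(\?.*)?$/i', $url)) return false;
    // Skip /cdn-cgi/ endpoints.
    if (str_contains($url, 'cdn-cgi')) return false;
    return true;
}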
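If you prefer not to rely on string prefixes and regular expressions, the same rules can be expressed with PHP's URL helpers. The following is only a sketch: isIndexable, $allowedHost, and $blockedExtensions are names made up for this example. It compares the parsed host exactly and reads the extension from the path, so query strings do not interfere.

```php
<?php
// Sketch: host comparison via parse_url() plus an extension blocklist.
// isIndexable(), $allowedHost and $blockedExtensions are hypothetical names.
function isIndexable(?string $url, string $allowedHost): bool
{
    if ($url === null || $url === '') {
        return false;
    }
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false;
    }
    // Exact, case-insensitive host match rejects external and look-alike domains.
    if (strcasecmp($parts['host'], $allowedHost) !== 0) {
        return false;
    }
    $path = $parts['path'] ?? '/';
    if (str_contains($path, '/cdn-cgi/')) {
        return false;
    }
    $blockedExtensions = ['js', 'css', 'png', 'jpg', 'jpeg', 'svg', 'gif', 'webp', 'pdf'];
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return !in_array($extension, $blockedExtensions, true);
}

$host = 'www.yourwebsite.com';
var_dump(isIndexable('https://www.yourwebsite.com/about', $host));                      // true
var_dump(isIndexable('https://www.yourwebsite.com/css/app.css?v=3', $host));            // false
var_dump(isIndexable('https://www.yourwebsite.com.evil.example/about', $host));         // false
var_dump(isIndexable('https://www.yourwebsite.com/cdn-cgi/l/email-protection', $host)); // false
```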
Step 6: Generate the XML Sitemap
After crawling, generate the XML file:
public function generateXml()
{
    $this->urls = array_unique($this->urls);
    // Declare the sitemap namespace on the root element itself.
    $xml = new \SimpleXMLElement(
        '<?xml version="1.0" encoding="UTF-8"?>'
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
    );
    foreach ($this->urls as $url) {
        $urlTag = $xml->addChild('url');
        // addChild() does not escape "&", so escape query strings first.
        $urlTag->addChild('loc', htmlspecialchars($url, ENT_XML1 | ENT_QUOTES));
        $urlTag->addChild('changefreq', 'daily');
        $urlTag->addChild('priority', '0.8');
    }
    $xml->asXML(public_path('sitemap.xml'));
}
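As an alternative to SimpleXML, PHP's XMLWriter extension escapes text content itself, which matters for URLs that contain query strings with "&". The sketch below assumes the xmlwriter extension is enabled. buildSitemapXml is an illustrative name, and the function returns the XML as a string instead of writing to public_path('sitemap.xml').

```php
<?php
// Sketch: build the sitemap with XMLWriter, which escapes text nodes automatically.
// buildSitemapXml() is a hypothetical helper, not part of the command above.
function buildSitemapXml(array $urls): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('urlset');
    $writer->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach (array_unique($urls) as $url) {
        $writer->startElement('url');
        $writer->writeElement('loc', $url);
        $writer->writeElement('changefreq', 'daily');
        $writer->writeElement('priority', '0.8');
        $writer->endElement(); // </url>
    }

    $writer->endElement(); // </urlset>
    $writer->endDocument();

    return $writer->outputMemory();
}

$xml = buildSitemapXml([
    'https://www.yourwebsite.com/',
    'https://www.yourwebsite.com/search?q=shoes&page=2',
    'https://www.yourwebsite.com/', // duplicate, removed by array_unique()
]);
echo $xml;
```

Keep in mind that the sitemap protocol limits a single file to 50,000 URLs, so a very large crawl would need to be split across several files with a sitemap index.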
Step 7: Bring It All Together in handle()
public function handle()
{
    $this->info("Starting sitemap crawl…");
    $this->crawl($this->baseUrl);
    $this->generateXml();
    $this->info("Sitemap generated successfully!");
}
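Step 8: Run the Command
php artisan generate:sitemap
The command name comes from the $signature property of the command class, so it must be set to generate:sitemap for this call to work.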
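To regenerate the sitemap automatically, the command can be registered with Laravel's scheduler. The fragment below is a sketch for the app/Console/Kernel.php layout used by Laravel 10 and earlier. It assumes the signature generate:sitemap, and the 03:00 run time is an arbitrary choice. On Laravel 11 and later, schedules are usually defined in routes/console.php instead.

```php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule): void
    {
        // Rebuild public/sitemap.xml once a day (time chosen for illustration only).
        $schedule->command('generate:sitemap')->dailyAt('03:00');
    }
}
```

The scheduler itself still needs a single cron entry on the server that runs php artisan schedule:run every minute.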
Optional: Add Dynamic JSON-Based URLs (Recommended)
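If you store custom URLs in JSON files, with each file holding a flat array of URL strings:
private function loadCustomUrls()
{
    // glob() can return false on error, so fall back to an empty list.
    $paths = glob(base_path('resources/custom-urls/*.json')) ?: [];
    foreach ($paths as $file) {
        $data = json_decode(file_get_contents($file), true);
        // Skip files that contain invalid JSON or are not arrays.
        if (!is_array($data)) {
            continue;
        }
        foreach ($data as $url) {
            $this->urls[] = $url;
        }
    }
}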
Call it before crawling:
$this->loadCustomUrls();
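The loader can also be tried outside Laravel. The sketch below assumes each JSON file contains a flat array of URL strings. loadUrlsFromJson and the temporary directory are illustrative only. It skips files that fail to parse and ignores entries that are not strings.

```php
<?php
// Sketch: read every *.json file in a directory and keep only string entries.
// loadUrlsFromJson() and the file layout are assumptions made for this example.
function loadUrlsFromJson(string $directory): array
{
    $urls = [];
    foreach (glob($directory . '/*.json') ?: [] as $file) {
        $data = json_decode((string) file_get_contents($file), true);
        if (!is_array($data)) {
            continue; // invalid JSON or a non-array document
        }
        foreach ($data as $url) {
            if (is_string($url) && $url !== '') {
                $urls[] = $url;
            }
        }
    }
    return array_values(array_unique($urls));
}

// Usage with a throwaway directory so the example is runnable as-is.
$dir = sys_get_temp_dir() . '/custom-urls-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/landing.json', json_encode([
    'https://www.yourwebsite.com/promo',
    'https://www.yourwebsite.com/promo', // duplicate entry
    42,                                  // non-string entry, ignored
]));
file_put_contents($dir . '/broken.json', '{not valid json');

$customUrls = loadUrlsFromJson($dir);
print_r($customUrls);
```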
Conclusion
Building your own custom sitemap crawler in Laravel gives you full control:
- You decide what gets crawled
- You exclude unnecessary URLs like CDN resources & static files
- You can support multi-language URLs
- You can combine crawled URLs + JSON-defined URLs
- You avoid heavy external packages
- You can integrate it with cron, a queue, or an admin dashboard
This crawler has been tested and successfully implemented by the Yasoz Group development team on real-world websites. It is reliable, fully customizable, and production-ready.