XML Sitemap URL Limits: Complete Guide to Size and URL Maximums
XML sitemaps are limited to 50,000 URLs and 50MB (uncompressed) per file according to the official Sitemaps.org protocol and Google’s guidelines. These limits have been stable since 2016, when the maximum file size was raised from 10MB to 50MB. For websites exceeding these limits, sitemap index files let you reference multiple sitemaps, for a theoretical maximum of 2.5 billion URLs across 50,000 individual sitemap files.
This guide covers official sitemap limits, index file implementation, best practices for large sites, and common mistakes that can prevent search engines from processing your sitemaps effectively.
XML Sitemap Limits Quick Reference
| Limit Type | Maximum Value | Notes |
|---|---|---|
| URLs per sitemap | 50,000 | Includes all loc elements |
| File size (uncompressed) | 50MB | Increased from 10MB in 2016 |
| Sitemaps per index | 50,000 | Each can contain 50,000 URLs |
| Theoretical total URLs | 2.5 billion | Via index files |
| File encoding | UTF-8 | Required for all sitemaps |
| Protocol | HTTP or HTTPS | Must match robots.txt location |
These limits apply to standard XML sitemaps. Video sitemaps, news sitemaps, and image sitemaps follow the same structural limits but have additional element-specific requirements.
Understanding the 50,000 URL Limit
The 50,000 URL limit refers specifically to the number of <loc> elements in a single sitemap file. Each URL you want search engines to discover counts as one entry toward this limit, regardless of how many additional elements like <lastmod>, <changefreq>, or <priority> accompany it.
What counts toward the limit:
Only the primary URL location counts. If you use hreflang annotations for international versions of pages, the alternate language URLs do not count separately toward the 50,000 limit. Google’s John Mueller has confirmed this interpretation, clarifying that only the <loc> URL increments the counter, not the <xhtml:link> alternate URLs.
Practical example:
A sitemap entry with five language variations counts as one URL:
<url>
  <loc>https://example.com/page</loc>
  <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/page"/>
  <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/page"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page"/>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/page"/>
</url>
This single entry counts as one URL toward your 50,000 limit, not five. This distinction matters significantly for multilingual sites that might otherwise need multiple sitemaps purely due to hreflang annotations.
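As a quick sanity check, here is a hedged Python sketch that counts only the <loc> entries inside <url> elements, ignoring hreflang alternates (the function name and sample sitemap are illustrative):

```python
# Count the entries that consume the 50,000-URL quota: only <loc> children
# of <url> elements count, not the <xhtml:link> hreflang alternates.
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"

def count_sitemap_urls(xml_text: str) -> int:
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{SM}}}url/{{{SM}}}loc"))

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/page</loc>
    <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/page"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/page"/>
  </url>
</urlset>"""

print(count_sitemap_urls(sitemap))  # 1 — the hreflang links do not count
```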
When the limit becomes relevant:
Most websites never approach the 50,000 URL limit. E-commerce sites with extensive product catalogs, news publishers with large archives, and user-generated content platforms are the most likely to exceed this threshold. If your site has fewer than 50,000 indexable pages, a single sitemap file handles all your URLs.
The 50MB File Size Limit
The 50MB file size limit refers to the uncompressed sitemap content. Gzip compression is strongly recommended and widely supported, which can reduce actual file transfer sizes by 70-90% without affecting the limit calculation.
Historical context:
The original Sitemaps protocol specified a 10MB file size limit. In 2016, this was increased to 50MB to accommodate the growing size of websites and the inclusion of additional metadata like hreflang annotations. This change acknowledged that comprehensive sitemaps with full metadata often exceeded 10MB before reaching the 50,000 URL limit.
File size vs. URL count:
In practice, most sitemaps hit the URL limit before the file size limit. A basic sitemap with minimal metadata typically uses 100-200 bytes per URL entry:
| URLs | Estimated Size (Basic) | Estimated Size (With Hreflang) |
|---|---|---|
| 10,000 | 1-2 MB | 5-10 MB |
| 25,000 | 2.5-5 MB | 12-25 MB |
| 50,000 | 5-10 MB | 25-50 MB |
Sitemaps with extensive hreflang annotations, image tags, or video metadata can approach the 50MB limit before reaching 50,000 URLs. If your sitemap exceeds 50MB with fewer than 50,000 URLs, split it into multiple files based on content sections or URL patterns.
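To estimate where a sitemap lands against both limits, a rough back-of-envelope calculation using the per-entry byte figures from the table above (these are estimates, not official numbers):

```python
# Rough sizing sketch using assumed per-entry byte estimates:
# ~200 bytes for a basic entry, ~1,000 bytes for a hreflang-heavy one.
def estimated_sitemap_mb(url_count: int, bytes_per_entry: int) -> float:
    overhead = 300  # XML declaration plus <urlset> wrapper, approximate
    return (url_count * bytes_per_entry + overhead) / 1_000_000

print(round(estimated_sitemap_mb(50_000, 200), 1))    # ~10 MB: basic entries
print(round(estimated_sitemap_mb(50_000, 1_000), 1))  # ~50 MB: heavy hreflang entries
```

The second figure shows why hreflang-heavy sitemaps can brush against the 50MB ceiling right around the 50,000 URL mark.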
Compression best practices:
Always serve sitemaps with gzip compression (.xml.gz extension). This reduces bandwidth usage for both your server and search engine crawlers. All major search engines support gzip-compressed sitemaps, and the compression is transparent to limit calculations.
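A minimal Python sketch of writing a gzip-compressed sitemap; the filename and content are illustrative, and the size check runs against the uncompressed bytes because that is what the 50MB limit measures:

```python
# Write a gzip-compressed sitemap. The 50MB limit applies to the
# uncompressed bytes, so validate size before compressing.
import gzip

xml_bytes = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>\n'
)

assert len(xml_bytes) <= 50 * 1024 * 1024, "split into multiple sitemaps first"

with gzip.open("sitemap.xml.gz", "wb") as f:
    f.write(xml_bytes)

# Crawlers decompress transparently; verify the round trip locally:
with gzip.open("sitemap.xml.gz", "rb") as f:
    assert f.read() == xml_bytes
```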
Sitemap Index Files Explained
Sitemap index files allow you to reference multiple individual sitemaps from a single location, effectively bypassing the 50,000 URL limit for large websites. An index file lists the locations of your sitemap files rather than individual page URLs.
Index file structure:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-categories.xml</loc>
    <lastmod>2026-01-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-02-01</lastmod>
  </sitemap>
</sitemapindex>
Index file limits:
A single sitemap index can reference up to 50,000 individual sitemaps. Since each sitemap can contain 50,000 URLs, the theoretical maximum is 2.5 billion URLs (50,000 sitemaps multiplied by 50,000 URLs each). No website currently approaches this limit, but the architecture scales to accommodate virtually any site size.
Naming conventions:
While not technically required, organizing sitemaps by content type improves maintainability:
- sitemap-products.xml - Product pages
- sitemap-categories.xml - Category pages
- sitemap-blog.xml - Blog posts
- sitemap-pages.xml - Static pages
This organization makes it easier to identify which sitemap needs updating when specific content sections change, and allows search engines to prioritize crawling based on your update patterns.
Nested index files:
Sitemap index files cannot reference other index files. The hierarchy is limited to two levels: one index file pointing to individual sitemaps. Attempting to nest index files results in search engines ignoring the nested references.
Directory and Location Rules
Sitemaps can only reference URLs at or below their directory level in your site structure. This restriction prevents unauthorized sitemap submissions and ensures sitemaps can only affect URLs within their scope.
Valid sitemap locations:
| Sitemap Location | Can Reference |
|---|---|
| example.com/sitemap.xml | All URLs on example.com |
| example.com/blog/sitemap.xml | Only URLs starting with /blog/ |
| example.com/products/sitemap.xml | Only URLs starting with /products/ |
| subdomain.example.com/sitemap.xml | Only URLs on subdomain.example.com |
Root sitemap advantage:
Place your primary sitemap or sitemap index at the domain root (example.com/sitemap.xml) to reference any URL on the domain. This provides maximum flexibility without location restrictions.
Cross-domain considerations:
Sitemaps cannot reference URLs on different domains unless you verify ownership through Google Search Console. Even with verification, most SEO professionals maintain separate sitemaps for each domain to avoid confusion and simplify management.
Robots.txt sitemap declaration:
Declare your sitemap location in robots.txt for automatic discovery:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
The sitemap URL in robots.txt must be absolute (including protocol and domain). Multiple sitemap declarations are allowed, which is useful when you have separate sitemaps for different content types.
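Extracting Sitemap directives from a robots.txt body can be sketched in a few lines (the helper name and sample robots.txt are hypothetical; the directive name is case-insensitive):

```python
# Parse Sitemap declarations out of a robots.txt body. Each value must be
# an absolute URL, and multiple declarations are allowed.
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    urls = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

robots = """User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml"""

print(sitemap_urls_from_robots(robots))
```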
What Google Actually Uses in Sitemaps
Google has publicly stated that it ignores certain sitemap elements, focusing primarily on URL discovery rather than crawl prioritization hints.
Elements Google uses:
- <loc> - The URL itself (required)
- <lastmod> - Last modification date (used if accurate)
- <xhtml:link> - Hreflang alternate URLs
Elements Google ignores:
- <priority> - Relative importance hint (0.0-1.0)
- <changefreq> - Expected update frequency
Why priority and changefreq are ignored:
Google’s John Mueller and other representatives have confirmed that these elements are disregarded because they were consistently misused. Webmasters routinely set all pages to maximum priority (1.0) and daily changefreq, making the data unreliable. Google’s own algorithms determine crawl priority based on observed behavior and page importance signals.
The lastmod exception:
While Google uses <lastmod> dates, accuracy matters. If your sitemap shows recent lastmod dates for pages that haven’t actually changed, Google may begin ignoring your lastmod values entirely. Only update lastmod when meaningful content changes occur, not for trivial modifications like layout adjustments.
Best practice approach:
Include <loc> and accurate <lastmod> dates. Omit <priority> and <changefreq> to keep your sitemap lean and avoid the appearance of manipulation attempts. The elements add file size without providing SEO benefit.
Best Practices for Large Websites
Managing sitemaps for sites with hundreds of thousands or millions of URLs requires strategic organization beyond basic implementation.
Segment by content type:
Create separate sitemaps for distinct content categories. This approach provides several advantages:
- Easier identification of indexing issues by content type
- Ability to update specific sitemaps when only certain content changes
- Clearer crawl statistics in Search Console
- Simplified debugging when problems occur
Update frequency strategy:
Not all sitemaps need the same update schedule:
| Content Type | Update Frequency | Rationale |
|---|---|---|
| News articles | Multiple times daily | Time-sensitive content |
| Product pages | Daily | Price and availability changes |
| Category pages | Weekly | Less frequent structural changes |
| Static pages | Monthly | Rarely modified |
Dynamic sitemap generation:
For large sites, generate sitemaps dynamically from your database or CMS rather than maintaining static files. This ensures sitemaps stay synchronized with actual site content and automatically removes URLs when pages are deleted.
Pagination for very large sites:
If a content type exceeds 50,000 URLs, paginate your sitemaps:
- sitemap-products-1.xml (URLs 1-50,000)
- sitemap-products-2.xml (URLs 50,001-100,000)
- sitemap-products-3.xml (URLs 100,001-150,000)
Reference all paginated sitemaps in your sitemap index file.
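The pagination scheme above can be sketched as follows; MAX_URLS is the protocol limit, and the file names and helper functions are illustrative:

```python
# Split a large URL list into numbered sitemap files of at most 50,000
# entries each, then build an index that references all of them.
from datetime import date

MAX_URLS = 50_000  # protocol limit per sitemap file

def paginate(urls, prefix="sitemap-products", max_urls=MAX_URLS):
    files = {}
    for i in range(0, len(urls), max_urls):
        name = f"{prefix}-{i // max_urls + 1}.xml"
        entries = "\n".join(
            f"  <url><loc>{u}</loc></url>" for u in urls[i:i + max_urls]
        )
        files[name] = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>\n"
        )
    return files

def build_index(filenames, base="https://example.com/"):
    today = date.today().isoformat()
    entries = "\n".join(
        f"  <sitemap><loc>{base}{name}</loc><lastmod>{today}</lastmod></sitemap>"
        for name in filenames
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n</sitemapindex>\n"
    )

urls = [f"https://example.com/product/{n}" for n in range(120_000)]
files = paginate(urls)       # three files for 120,000 URLs
index = build_index(sorted(files))
```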
Monitor sitemap processing:
Use Google Search Console’s sitemap report to track:
- Submission status and errors
- URLs discovered vs. indexed
- Last read date by Googlebot
- Processing warnings
Regular monitoring catches issues before they significantly impact indexation.
Common Sitemap Mistakes to Avoid
Several common errors can prevent search engines from processing your sitemaps effectively or waste crawl budget on unnecessary requests.
Including non-indexable URLs:
Sitemaps should only contain URLs you want indexed. Exclude:
- URLs blocked by robots.txt
- Pages with noindex meta tags
- Redirect source URLs (include only destinations)
- Parameter variations of the same content
- Login-required pages
Including non-indexable URLs wastes crawl budget and creates conflicting signals.
Outdated lastmod dates:
Setting lastmod to the current date on every sitemap generation makes the data useless. Google may begin ignoring your lastmod values entirely if they’re consistently inaccurate. Only update lastmod when actual content changes occur.
Missing sitemap declaration:
Submit your sitemap through both methods:
- Google Search Console sitemap submission
- Robots.txt sitemap directive
This ensures discovery even if one method fails or is delayed.
HTTP/HTTPS mismatch:
If your site uses HTTPS, your sitemap must be served over HTTPS and contain HTTPS URLs. Mixing protocols creates confusion and may result in duplicate content signals.
Forgetting to update after site changes:
When you remove pages, restructure URLs, or migrate domains, update your sitemaps immediately. Stale sitemaps pointing to 404 pages or redirects waste crawl budget and delay discovery of new content.
Exceeding limits without index files:
If your sitemap exceeds 50,000 URLs or 50MB, search engines will truncate it unpredictably. Always implement sitemap index files before approaching limits rather than relying on partial processing.
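A pre-publish check against both hard limits might look like this sketch (the function name is illustrative, and the limits come from the quick-reference table above):

```python
# Validate a sitemap against the two hard limits before publishing:
# 50,000 <url> entries and 50MB of uncompressed bytes.
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def check_sitemap(xml_bytes: bytes) -> list[str]:
    problems = []
    if len(xml_bytes) > 50 * 1024 * 1024:
        problems.append("uncompressed size exceeds 50MB")
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{SM}url"))
    if url_count > 50_000:
        problems.append(f"{url_count} URLs exceeds the 50,000 limit")
    return problems

ok = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b'<url><loc>https://example.com/</loc></url></urlset>'
)
print(check_sitemap(ok))  # [] — within both limits
```

Running a check like this in your sitemap generation pipeline catches limit violations before search engines see them.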
Frequently Asked Questions
Do alternate hreflang URLs count toward the 50,000 limit?
No, only the primary <loc> URL counts toward the 50,000 URL limit. Hreflang alternate URLs specified in <xhtml:link> elements do not increment the counter. Google’s John Mueller has confirmed this interpretation, which is important for multilingual sites with many language variations per page.
What happens if my sitemap exceeds the limits?
Search engines may truncate your sitemap, processing only the first 50,000 URLs or the content before the 50MB threshold. This truncation is unpredictable and may exclude important pages. Always use sitemap index files to stay within limits rather than risking partial processing.
Should I gzip compress my sitemaps?
Yes, gzip compression is strongly recommended. All major search engines support compressed sitemaps (.xml.gz extension), and compression reduces bandwidth for both your server and search engine crawlers. The 50MB file size limit applies to uncompressed content, so compression doesn’t affect your URL capacity.
Can I submit the same URL in multiple sitemaps?
Technically yes, but it’s unnecessary and creates confusion. Each URL should appear in only one sitemap. Duplicates don’t provide additional crawl priority and make it harder to diagnose indexing issues when the same URL appears in multiple files.
How often should I update my sitemaps?
Update frequency depends on how often your content changes. News sites may update multiple times daily, while sites with stable content might update weekly or monthly. The key is accuracy: only update lastmod dates when meaningful content changes occur, and regenerate sitemaps when URLs are added or removed.
Do sitemaps guarantee indexing?
No, sitemaps help search engines discover URLs but don’t guarantee indexing. Search engines still evaluate each page for quality, uniqueness, and crawlability before deciding whether to index it. Think of sitemaps as making sure pages are found, not as a mechanism for forcing indexation.
Key Takeaways
- XML sitemaps are limited to 50,000 URLs and 50MB (uncompressed) per file, with UTF-8 encoding required for all content.
- Only primary <loc> URLs count toward limits; hreflang alternate URLs do not consume your 50,000 URL quota.
- Sitemap index files reference up to 50,000 individual sitemaps, enabling a theoretical maximum of 2.5 billion URLs for extremely large sites.
- Google ignores <priority> and <changefreq> values but does use accurate <lastmod> dates for crawl scheduling.
- Sitemaps can only reference URLs at or below their directory level; place your sitemap at the domain root for maximum flexibility.
- Always use gzip compression to reduce file transfer size without affecting the 50MB limit calculation.
Conclusion
XML sitemap limits are generous enough for the vast majority of websites, with the 50,000 URL and 50MB thresholds accommodating even large e-commerce sites and content publishers. For sites exceeding these limits, sitemap index files provide virtually unlimited scalability through their 2.5 billion URL theoretical maximum. Focus on accuracy over quantity: include only indexable URLs, maintain correct lastmod dates, and organize sitemaps logically by content type for easier management and debugging.