CyberLens documentation

Sitemap.xml

Practical guide to when a sitemap really helps discovery, how CyberLens checks it today, and when action is worth taking.

Severity: Informational
Estimated fix time: 5-15 min
Technical level: Beginner / Intermediate
Applies to: WordPress, Static Sites, CMS / E-commerce, Hosting

What it is

An XML sitemap follows the Sitemaps Protocol and lists the public URLs you want search engines to discover more easily. Each entry can carry optional metadata such as <lastmod>, and the format can be extended with image, video, or multilingual (hreflang) annotations.

In practical terms, the file must be served correctly (an HTTP 200 response containing well-formed XML) and must list the live, canonical public URLs that matter.

Technical note: a single sitemap file may list at most 50,000 URLs and must not exceed 50 MB uncompressed. A site that goes beyond either limit should split its URLs across several sitemap files and reference them from a sitemap index. For most sites, the system that generates the sitemap handles this automatically.
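The split described above can be sketched in Python with the standard library. This minimal illustration enforces only the 50,000-URL limit, not the 50 MB size limit. The child file names (sitemap-1.xml, sitemap-2.xml, ...) and the base URL are hypothetical choices:

```python
# Minimal sketch: split URLs into sitemap files of at most 50,000 entries
# and build a sitemap index that references each file.
# The child file names (sitemap-1.xml, ...) and base URL are hypothetical.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the Sitemaps Protocol


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Return a <urlset> document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")


def split_sitemaps(urls, base="https://example.com", size=MAX_URLS_PER_FILE):
    """Return ({child_sitemap_url: urlset_xml}, index_xml)."""
    files = {}
    for n, chunk in enumerate(chunk_urls(urls, size), start=1):
        files[f"{base}/sitemap-{n}.xml"] = build_urlset(chunk)
    return files, build_index(list(files))
```

Each value in files would be written to its own file. The index URL is the one to declare in robots.txt and submit to search engines.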

Sitemap and robots.txt

These are separate files with different jobs:

  • the sitemap says, “these pages exist, and here is when they changed”;
  • robots.txt says, “this path may or may not be crawled.”

Best practice is to declare the sitemap in robots.txt with a standalone line containing its full, absolute URL:

Sitemap: https://example.com/sitemap.xml
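To confirm which sitemap URLs a robots.txt file actually declares, you can use Python's standard urllib.robotparser module, which exposes them through RobotFileParser.site_maps() (Python 3.8 and later). This minimal sketch assumes the robots.txt body has already been downloaded as text:

```python
# Minimal sketch: list the Sitemap: declarations found in a robots.txt body.
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in robots.txt, or [] if there are none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []
```

An empty result corresponds to the "Sitemap not declared in robots.txt" situation described below.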

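If a URL appears in the sitemap but is also blocked by a Disallow rule in robots.txt or marked noindex, the signals contradict each other. Either remove the URL from the sitemap or lift the block, depending on whether the page should be indexed.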
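You can check the Disallow side of that contradiction with the same standard-library parser. This sketch assumes you already have the robots.txt body and the list of sitemap URLs. It does not detect noindex, which requires fetching each page. Note that urllib.robotparser uses simple prefix matching with first-match precedence and does not implement Google's * and $ wildcards or longest-match rule, so on complex rule sets its answers may differ from Googlebot's.

```python
# Minimal sketch: find sitemap URLs that robots.txt disallows.
# Approximation only: urllib.robotparser uses prefix matching, first match wins.
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="*"):
    """Return the sitemap URLs that robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]
```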

Why it matters

  • It helps discovery, especially for pages that are not strongly connected by internal links.
  • It does not guarantee indexing. Google may still ignore a URL in the sitemap if it sees that page as weak, duplicated, or otherwise not worth indexing.
  • It does not improve rankings directly. A URL being present in the sitemap does not boost its position on its own.
  • It does not replace internal linking. A sitemap cannot compensate for weak site structure.

Info: on smaller sites with clear internal navigation, crawlers can often discover pages without a sitemap. The file becomes more useful on larger or fast-growing sites, dynamic e-commerce catalogs, newer domains with few backlinks, deeper archive pages, media-heavy sites, or multilingual sites that rely on structured discovery.

How CyberLens checks it

CyberLens checks:

  • whether a sitemap is present in common locations such as /sitemap.xml or /sitemap_index.xml, and whether it is referenced in robots.txt;
  • the HTTP status returned by the sitemap URL or index URL;
  • a quick content preview when the file is available, so readers can confirm that the discovered file is the expected one.
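The checks listed above can be approximated with a short script. This is a hypothetical sketch, not the CyberLens implementation. The fetch parameter is an assumed callable that returns (status_code, body_text) for a URL. It is passed in so the logic can run with any HTTP client or a test stub.

```python
# Hypothetical sketch of a sitemap presence check; not CyberLens's actual code.
# `fetch(url)` is an assumed callable returning (status_code, body_text).
from urllib.robotparser import RobotFileParser

COMMON_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def check_sitemap(base_url, fetch, preview_chars=200):
    """Report each candidate sitemap URL, its HTTP status, and a short preview."""
    base = base_url.rstrip("/")
    status, robots_body = fetch(base + "/robots.txt")
    declared = []
    if status == 200:
        parser = RobotFileParser()
        parser.parse(robots_body.splitlines())
        declared = parser.site_maps() or []

    # Declared sitemaps first, then the common locations not already declared.
    candidates = declared + [base + p for p in COMMON_PATHS if base + p not in declared]
    results = []
    for url in candidates:
        status, body = fetch(url)
        results.append({
            "url": url,
            "status": status,
            "declared_in_robots": url in declared,
            "preview": body[:preview_chars] if status == 200 else "",
        })
    return results
```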

Future enhancements: full XML validation, deeper checks on internal URLs, contradiction checks against robots.txt or noindex, and advanced support for image or video extensions may be added later.

Possible findings

Each finding stands on its own, so readers can jump directly to the one shown in the report.

Sitemap missing

Severity: Moderate (Low on smaller sites)

No sitemap was confirmed in the common locations, and no sitemap reference was found in robots.txt. On a small, well-linked site this is rarely urgent; on larger, newer, or weakly linked sites it can make discovery of fresh content less immediate.

Sitemap not declared in robots.txt

Severity: Low / Moderate

The sitemap exists, but crawlers do not see a Sitemap: line in robots.txt. That line is a standard discovery hint, though not the only way a search engine may learn about the sitemap.

Sitemap URL unreachable (4xx / 5xx)

Severity: High / Critical

The sitemap URL or sitemap index returns an error, so crawlers cannot access the discovery list.

Tip: a missing sitemap on a small site is rarely urgent. An unreachable sitemap is worth addressing first.

Future enhancements: more granular findings for XML syntax, non-200 URLs inside the sitemap, blocked or noindex URLs, staging URLs, unreliable <lastmod> values, or ignored tags may be added later.

Suggested order of priority when more than one finding appears:

  1. Unreachable sitemap URL (4xx/5xx): restore access to the file first.
  2. Missing sitemap: generate or restore one.
  3. Sitemap not declared in robots.txt: add a Sitemap: line if the site uses a sitemap.

How to fix it

Tip: if the site does not already expose a sitemap, the quickest path is the CyberLens generator. It creates a simple sitemap.xml starting point based on the pages discovered during the scan. It is useful, but it may not include the whole site if crawl depth or plan limits restrict discovery.

WordPress

Since WordPress 5.5, the platform includes a native sitemap at /wp-sitemap.xml. If an SEO plugin handles sitemap generation, use the plugin's sitemap and make sure it is the only one declared in robots.txt, so multiple sitemap sources do not conflict. If the site depends heavily on visual content, also review the image and media sitemap settings.

Static site

Generate the sitemap during the build process so it stays aligned with the pages that are actually published. A minimal entry looks like this:

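```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-06-20</lastmod>
  </url>
</urlset>
```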
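The build step itself can stay small. This minimal Python sketch assumes the generator writes plain .html files into one output directory and that each index.html maps to its folder URL. The skip list, the base URL, and the optional lastmod_by_url mapping are hypothetical parameters to adapt to your own generator:

```python
# Minimal sketch of a static-site build step: map .html output files to
# public URLs and write sitemap.xml into the build directory.
# Layout assumptions (index.html -> folder URL, skip list) are hypothetical.
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def page_urls(build_dir, base_url, skip=("404.html",)):
    """Yield public URLs for the HTML files produced by the build."""
    root = Path(build_dir)
    for path in sorted(root.rglob("*.html")):
        rel = path.relative_to(root).as_posix()
        if rel in skip:  # error pages and similar should not be listed
            continue
        if rel == "index.html":
            rel = ""
        elif rel.endswith("/index.html"):
            rel = rel[: -len("index.html")]  # keep the trailing slash
        yield f"{base_url.rstrip('/')}/{rel}"


def write_sitemap(build_dir, base_url, lastmod_by_url=None):
    """Write sitemap.xml, adding <lastmod> only where a real date is known."""
    lastmod_by_url = lastmod_by_url or {}
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in page_urls(build_dir, base_url):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if url in lastmod_by_url:
            ET.SubElement(entry, "lastmod").text = lastmod_by_url[url]
    out = Path(build_dir) / "sitemap.xml"
    ET.ElementTree(urlset).write(out, encoding="utf-8", xml_declaration=True)
    return out
```

The sketch writes <lastmod> only when a real change date is supplied. It does not use file timestamps, because many builds rewrite every file on each run.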

CMS / E-commerce

Exclude URLs that carry tracking or session parameters, and make sure permanently removed products disappear from the sitemap promptly instead of lingering as dead destinations.
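Both rules can be applied as a filter just before the sitemap is written. In this sketch the parameter names are common examples rather than a complete or platform-specific list. The live_urls argument is a hypothetical set of URLs exported from the catalog, for instance the products that are still published:

```python
# Minimal sketch: drop URLs with tracking/session parameters and URLs that
# are no longer live. Parameter names are examples, not an exhaustive list.
from urllib.parse import urlsplit, parse_qsl

TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def is_system_url(url):
    """True if the query string contains a tracking or session parameter."""
    for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        key = name.lower()
        if key.startswith(TRACKING_PREFIXES) or key in TRACKING_PARAMS | SESSION_PARAMS:
            return True
    return False


def clean_sitemap_urls(urls, live_urls):
    """Keep live URLs without tracking/session parameters, preserving order."""
    return [u for u in urls if u in live_urls and not is_system_url(u)]
```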

Hosting / server

If the sitemap URL returns an error:

  • check file permissions on the server;
  • confirm that the URL is spelled exactly as expected, including case;
  • if the site is large, confirm that a sitemap index is in place and correctly references nested sitemap files.
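For the last point, a short script can list the child sitemaps declared in an index and report any that do not return 200. Because the URLs are used exactly as written, a case mismatch in a file name shows up as an error status such as 404. The fetch_status parameter is an assumed callable (URL to status code), so the check works with any HTTP client or a stub:

```python
# Minimal sketch: verify that every child sitemap in an index answers 200.
# `fetch_status(url)` is an assumed callable returning an HTTP status code.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the <loc> values listed in a <sitemapindex> document."""
    root = ET.fromstring(index_xml)
    if root.tag != "{%s}sitemapindex" % NS["sm"]:
        raise ValueError("not a sitemap index: root element is %r" % root.tag)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def broken_children(index_xml, fetch_status):
    """Return (url, status) pairs for child sitemaps that are not 200."""
    failures = []
    for url in child_sitemaps(index_xml):
        status = fetch_status(url)
        if status != 200:
            failures.append((url, status))
    return failures
```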

Warning: do not update <lastmod> artificially when the page has not actually changed. If the declared dates stop being trustworthy, search engines may stop treating that signal as reliable.
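One way to keep <lastmod> honest is to change it only when the page content changes. This sketch assumes a hypothetical state dictionary, persisted between builds (for example as a JSON file), that maps each URL to its last content hash and date:

```python
# Minimal sketch: bump <lastmod> only when the page content hash changes.
# `state` is a hypothetical dict persisted between builds:
#   url -> {"hash": <sha256 hex>, "lastmod": "YYYY-MM-DD"}
import hashlib
from datetime import date


def update_lastmod(state, url, content, today=None):
    """Return the lastmod for `url`, updating it only if the content changed."""
    today = today or date.today().isoformat()
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    record = state.get(url)
    if record is None or record["hash"] != digest:
        state[url] = {"hash": digest, "lastmod": today}
    return state[url]["lastmod"]
```

Hashing the full HTML also reacts to site-wide template changes, such as an edited footer. Hashing only the main content area is a reasonable refinement.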

How this appears in CyberLens

In the scan report, the sitemap finding appears with:

  • the observed HTTP status for the sitemap URL;
  • the current severity, based on whether the sitemap is missing, not declared in robots.txt, or unreachable;
  • a quick content preview when the file is available;
  • an option to generate a sitemap when the finding is “Sitemap missing”.