Beyond the content and visible portions of a website, there are a number of optimizations to be made behind the scenes. Reviewing the technical parts of your website on a regular basis ensures that it can be easily found, crawled, and indexed by the search engines. In doing so, you’ll create a solid foundation that maximizes the potential of your on-page optimizations.
This article covers a few common topics around technical SEO to give you a better understanding of what to look for when setting up redirects, encountering 404 errors, and building your sitemap.
The robots.txt file is placed at the root of your site and gives the search engine crawlers directions on which parts of your website they should and shouldn’t visit. When a crawler arrives at your site, it looks for the robots.txt file first. From there, it knows where it is allowed to go and which pages it shouldn’t visit.
This file is useful because it keeps the crawlers away from pages that shouldn’t be shown in the search results. (Strictly speaking, robots.txt blocks crawling rather than indexing; a blocked page can still end up in the index if other sites link to it.) The robots.txt file is also used to link to your website’s sitemap, which is covered in the next section.
In the case of some large sites, the search engine crawlers won’t stick around long enough to find every page, as there is a crawl budget that limits how far they will go. Excluding pages in the robots.txt frees up more time for the pages you do want to be found and crawled.
When creating a robots.txt file, add only the pages you don’t want crawled or visible in search. This may include paid search landing pages, forms that can’t be crawled, or CMS login pages. What you want to keep out of the file is your root domain or any other important pages. Also, avoid using the robots.txt to hide pages you don’t want your website visitors to find. It is a public file, and anyone who opens it can read the disallowed URLs and visit those pages directly.
Below is an example robots.txt file.
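A minimal sketch might look like this (the disallowed paths and domain are placeholders; substitute your own pages and sitemap location):

```
User-agent: *
Disallow: /landing-pages/
Disallow: /wp-admin/
Disallow: /thank-you/

Sitemap: https://www.example.com/sitemap.xml
```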
- The User-Agent line refers to the search engine crawler, and different rules can be established for different crawlers. For example, if you only wanted to block Google from seeing a page, you would use User-agent: Googlebot. To create rules for all crawlers, use the asterisk as shown above.
- To prevent pages from being crawled, the Disallow rule is used. Add each URL on its own line, using only the path portion of the URL (everything after the root domain).
- At the bottom of each robots.txt file, place a reference to your sitemap so it can be easily found. While sitemap.xml is the standard naming convention, sometimes a CMS may use a different name and adding it in makes locating it easier for the crawlers.
For more information on what a robots.txt file is and what it can do, refer to http://www.robotstxt.org/.
While the robots.txt file tells the site crawlers what to do, the sitemap tells them where to go. This file will include a list of every URL on your site, or in some cases the most important URLs. By having this list, it will be easier for the search engines to find and crawl your website. Rather than relying on the crawlers to find each page on their own, and potentially missing a few, you can prioritize and provide a complete index.
Beyond listing out the pages on your website, the sitemap.xml file also provides valuable metadata to the search engines. This metadata includes when the page was last updated, how often its content changes, and the priority of that page compared to others on your site. Adding the metadata is optional, but it helps show the search engines the freshness of your site and provides some direction on how often to come back and check for content updates.
This information is added to the sitemap in a specific format and the complete file lives at the root of your website. More information about the format of your sitemap can be found at sitemaps.org and a sample is shown below.
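A minimal sketch of the format, with two placeholder URL entries (the domain, dates, and values are illustrative only):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2017-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/about/</loc>
    <lastmod>2016-11-02</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Each url entry holds the page’s location plus the optional metadata fields: last modification date, change frequency, and relative priority.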
Just as the sitemap.xml file tells the crawlers where your pages are on a site-wide basis, setting your canonical URLs tells them where they are at the page level.
In the case of two or more pages that are duplicated or very similar, the canonical lets the search engines know which is your preferred version. So, if it is considering both for inclusion into the results, the canonical URL will be the master copy.
This is especially useful on ecommerce sites where the only difference between product pages is a color, or if your website resolves using both the www and non-www versions. Even if this isn’t the case on your site, it is still best practice to include the canonical tag on your site.
The canonical can be placed within the <head> tags of each page and below is an example of its formatting and structure.
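A sketch of the tag in place (the URL is a placeholder for your preferred version of the page):

```html
<head>
  <!-- Points search engines to the preferred version of this page -->
  <link rel="canonical" href="https://www.example.com/products/widget/" />
</head>
```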
If a page on your website is moved or replaced, a redirect can be used to send users to the correct, updated version of that page. This happens if you change the URL or if a new page is created to replace the original and you want your site visitors to only find the most up-to-date version.
Redirects are important because if you leave two similar pages on your site, there could be issues with duplicate content. While a canonical can help, there is also the chance that some site visitors find the new page and others the old, and if the information is not consistent, it can create confusion. Using redirects helps to minimize these issues.
Redirects commonly come in two forms: a 301 and a 302. These numbers refer to the response code the server provides when a redirect is called for. A 301 redirect is a permanent redirect and this means the new page will be where all visitors are sent moving forward. A 302 is a temporary redirect and indicates to the site crawlers that this redirect may be removed in the future.
In most cases, the ideal redirect to use is a 301. A 301 redirect transfers the authority from the original page to the new one, making it easier for the new page to rank in searches because it becomes associated with the backlinks and page authority of the older page. 302 redirects do not reliably transfer this authority because they are temporary, and they should only be used in cases where they will be removed in the future.
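As a sketch, here is how each type of redirect might be configured on an Apache server via an .htaccess file (the paths and domain are placeholder examples; other servers such as Nginx use equivalent directives):

```
# Permanent (301) redirect: the old page has moved for good
Redirect 301 /old-page/ https://www.example.com/new-page/

# Temporary (302) redirect: the original page will return later
Redirect 302 /sale/ https://www.example.com/holiday-sale/
```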
Sometimes pages on your website are deleted, moved, or changed without redirects or replacements put into place to handle the changes. So, when a visitor or website crawler attempts to visit that page, they receive an error. The server code for this error is a 404.
In most cases, a 404 error is harmless. Websites change and, if there isn’t a likely candidate to redirect your site visitors to from a deleted page, it is best to let them know it is no longer there with a 404 page. However, leaving them at a dead end often causes them to leave your site and look elsewhere.
To prevent this, a custom 404 error page can be set up that sends site visitors to your more popular pages or at least attempts to get them closer to what they are looking for.
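On an Apache server, for example, pointing visitors to a custom error page is a one-line directive (the file name /404.html is a placeholder; other servers have equivalent settings):

```
# Serve the custom error page whenever a 404 is returned
ErrorDocument 404 /404.html
```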
Ideally, the number of 404 errors on your website can be minimized through redirects and proper site maintenance, but as long as they do not get out of control, they can be left alone.
A great example of a custom 404 landing page can be found at moz.com. They provide SEO tools, and this page includes links to popular sections of their site, help resources, and a search bar.
With the technical side of your website covered, you can then start optimizing your on-page content. Start here with our guide to getting started with SEO in 4 steps.