Adding Robots Meta Tags To Hugo Pages

Note: There is a GitHub repository for this template.

Including a <meta name="robots" content="index, follow" /> tag in the <head></head> section of a web page is one method of indicating to some search engines how they should process the page.

The robots meta tag indicates to search engines two things:

  1. Whether this page should be indexed by the search engine.
    • A content value of “index” will tell the search engine to index the page.
    • A content value of “noindex” will tell the search engine that it should not index the page.
  2. Whether links on this page should be followed by the search engine.
    • A content value of “follow” will tell the search engine to follow the links on the page.
    • A content value of “nofollow” will tell the search engine not to follow the links on the page.

By default, search engines will index any page they find, and also follow the links on the page to index that content as well. So if you wish for the contents of a page to show up on search engine results, and it is ok for the search engine to follow the links on the page, then you don’t need to include the robots meta tag on a page.

However, if for some reason you wish to change either of these behaviors, the robots meta tag will tell well-behaved search engines how to treat the content on the page.

A <meta name="robots" content="noindex, nofollow" /> tag is the most restrictive combination, telling a search engine not to index the page and not to follow any links it discovers on the page.

Sadly, not all search engines are well-behaved, and some will ignore the directives in the robots meta tag. So if you really want to keep content out of a search engine, you’ll need to provide additional protections for the content (password protecting a page, etc.). Your best bet, though, is to never include the content on a web page to begin with. Search engines are really good at discovering and indexing content.

The robots meta tag allows the index/noindex and follow/nofollow directives to be set singly. You don’t need to include both. However, I decided to include both for clarity of intent.
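For example, a tag that sets only one directive leaves the other at the search engine’s default behavior. A minimal illustration:

```html
<!-- Only the index directive is set; link following keeps its default. -->
<meta name="robots" content="noindex" />
```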

It is also possible to define meta tags for specific search engine crawlers. I decided not to do this at this time, as it would complicate things. For my needs, a single tag for all search engines is satisfactory.
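For illustration only, a crawler-specific tag replaces the robots name with the crawler’s user agent token. The sketch below uses googlebot, the token Google documents for its crawler; the template in this post does not generate tags like this.

```html
<!-- Applies to all crawlers that honor the robots meta tag. -->
<meta name="robots" content="index, follow" />
<!-- Applies only to Google's crawler. -->
<meta name="googlebot" content="noindex" />
```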

The Robots Meta Tag In Hugo

The theme I am using for my Hugo site didn’t include a way to control the robots meta tag, nor could I find a way to do it in Hugo by other means. So I ended up creating a new partial layout template that checks whether the robots parameters are defined either in the site configuration or in the front matter of individual pages.

The basic rules for generating and populating the values for the robots meta tag are as follows:

  1. By default, no robots meta tag should be generated.
  2. A robots meta tag should be generated if the site configuration defines values for the tag, or if a page on the site defines values for the robots tag in its front matter configuration.
  3. By configuring site values for the robots meta tag, the tag should be generated for all pages.
  4. By configuring values for the robots meta tag on a page only, the tag should be generated only for that page.
  5. If a robots tag is to be generated, by default the values would be “index, follow”. They can be overridden in the site or page configuration settings.
  6. Site configuration values take precedence over the default values. Page configuration values take precedence over both the default and site configuration values.
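As a worked illustration of rules 5 and 6, suppose the site configuration sets only index and one page overrides only follow. The values below are hypothetical, and the layout follows the [params.robots] and [robots] tables used later in this post.

```toml
# Site configuration (hypothetical values): applies to every page.
[params.robots]
    index = false   # follow is not set, so the default of true applies

# Pages with no robots front matter would then get:
#   <meta name="robots" content="noindex, follow" />

# A page whose front matter contains
#   [robots]
#       follow = false
# keeps the site value for index and overrides follow, giving:
#   <meta name="robots" content="noindex, nofollow" />
```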

To implement these rules, I created the following template, named robots.html.

{{  $robotsGenerate := false -}}
{{- if or (isset $.Params.robots "index") (isset $.Params.robots "follow") }}
{{-     $robotsGenerate = true -}}
{{- else if or (isset $.Site.Params.robots "index") (isset $.Site.Params.robots "follow") -}}
{{-     $robotsGenerate = true -}}
{{- end -}}
{{- if eq $robotsGenerate true -}}
{{-     $robotsIndex := true -}}
{{-     $robotsFollow := true -}}
{{-     if isset $.Site.Params "robots" -}}
{{-         if isset $.Site.Params.robots "index" -}}
{{-             $robotsIndex = $.Site.Params.robots.index -}}
{{-         end -}}
{{-         if isset $.Site.Params.robots "follow" -}}
{{-             $robotsFollow = $.Site.Params.robots.follow -}}
{{-         end -}}
{{-     end -}}
{{-     if isset $.Params "robots" -}}
{{-         if isset $.Params.robots "index" -}}
{{-             $robotsIndex = $.Params.robots.index -}}
{{-         end -}}
{{-         if isset $.Params.robots "follow" -}}
{{-             $robotsFollow = $.Params.robots.follow -}}
{{-         end -}}
{{-     end -}}
{{-     if and (eq $robotsIndex true) (eq $robotsFollow true) }}
    <meta name="robots" content="index, follow" />
{{-     else if and (eq $robotsIndex true) (eq $robotsFollow false) }}
    <meta name="robots" content="index, nofollow" />
{{-     else if and (eq $robotsIndex false) (eq $robotsFollow true) }}
    <meta name="robots" content="noindex, follow" />
{{-     else if and (eq $robotsIndex false) (eq $robotsFollow false) }}
    <meta name="robots" content="noindex, nofollow" />
{{-     end -}}
{{- end -}}

A brief breakdown of this template functionality:

Lines 1 through 6 determine if the robots meta tag should be generated. Line 2 checks the page front matter to see if params have been set for the robots meta tag. Line 4 checks the site configuration for params.

Lines 7 through 35 are executed if the robots meta tag is to be generated.

Lines 8 and 9 set the default index and follow boolean values.

Lines 10 through 17 check the site configuration to see if new index and follow values are defined and overwrite the default values if they are.

Lines 18 through 25 check the front matter configuration of the current page to see if new index and follow values are defined and overwrite the previously determined values if they are.

Lines 26 through 34 compare each value against true and false and, depending on the combination, generate the appropriate robots meta tag. Since there are only 4 possible combinations of index and follow settings, the if/else if structure is readable and quick to process. If there were more options available, the content value of the tag would be better built by concatenating strings, as the if/else if would quickly become complicated.
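The string-building alternative could look roughly like the sketch below. It is not part of the original template. It assumes $robotsIndex and $robotsFollow have already been resolved to booleans as in lines 8 through 25, and it relies on Hugo’s slice, cond, and delimit functions.

```go-html-template
{{- /* Sketch: build the content value from a list of directives. */ -}}
{{- $directives := slice (cond $robotsIndex "index" "noindex") (cond $robotsFollow "follow" "nofollow") -}}
<meta name="robots" content="{{ delimit $directives ", " }}" />
```

Additional directives would then only need another entry in the list rather than a doubling of the if/else if branches.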

One thing to be aware of: I explicitly require the index and follow configuration values to be either true or false. If either the site or page configuration sets them to something other than true or false, none of the four comparisons will match and the robots meta tag will likely not be generated.
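One way to surface such a misconfiguration during the build, rather than silently dropping the tag, might be a type check placed just before line 26. This is a sketch and not part of the original template. It assumes Hugo’s printf and warnf functions and that the partial is called with the page as its context; the message text is illustrative.

```go-html-template
{{- /* Sketch: warn at build time if a resolved value is not a boolean. */ -}}
{{- if or (ne (printf "%T" $robotsIndex) "bool") (ne (printf "%T" $robotsFollow) "bool") -}}
{{-     warnf "robots: index and follow must be true or false (page %q)" $.RelPermalink -}}
{{- end -}}
```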

Installing The Template File

For my site, I stored the above robots.html partial template in the /layouts/partials/ directory.

root
    layouts
        partials
            robots.html

To get the new partial to work with my site, I had to modify the theme file used to generate the <head> content for the pages. For my site’s theme, that was the baseof.html template. So I copied the file to the /layouts/_default/ directory, which matched the directory structure of my theme.

root
    layouts
        _default
            baseof.html

Next, I modified my copy of the baseof.html file to include the robots.html partial template.

<!DOCTYPE html>
<html lang="{{ .Site.LanguageCode }}">
    <head>
        ... Template Head stuff ...

        {{ partial "robots.html" . }}

        ... More Template Head stuff ...
    </head>
    <body>
        ... Template Body Stuff
    </body>
</html>

At this point, the next time the site is compiled/generated the robots.html template should be used to include the robots meta tag where appropriate. To include the robots meta tag, you’ll need to set some configuration values in either the site configuration or the page front matter configuration.

The Site Configuration

The site configuration is optional. If no params.robots is defined, then by default no robots meta tag will be generated. However, if either the index or follow parameter is defined, the robots meta tag will be generated for every page on the site.

To include the robots meta tag on only select pages of the site, see the next section for the page front matter configuration settings. You do not need to add the robots parameters to the site configuration if you only wish to include the robots meta tag on some pages.

In the site configuration example below, I’m using TOML for the configuration file. If you use a different syntax, you’ll need to make modifications appropriate to your site.

[params.robots]
    index = true
    follow = true

Setting the site configuration’s robots index parameter to true tells the robots meta tag content to use the index directive. Setting the index parameter to false changes the content directive to noindex.

Likewise, setting the site configuration’s robots follow parameter to true tells the robots meta tag content to use the follow directive. Setting the follow parameter to false changes the content directive to nofollow.

If either the index or follow site configuration robots parameter is not defined, the template default value of true will be used for the missing parameter. In other words, you only need to include a parameter if you intend to set it to false. However, for clarity you should probably include both settings.
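For instance, a site configuration that sets only index might look like the sketch below (the values are illustrative). Following the template logic above, follow falls back to its default of true, so every page without its own front matter override should get content="noindex, follow".

```toml
[params.robots]
    # follow is omitted, so the template default of true applies
    index = false
```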

The next time the site is compiled/generated, every page should include a robots meta tag if the robots site configuration parameters are set. For the above example site configuration, every page should include <meta name="robots" content="index, follow" /> in the <head> section of the web page source.

The Page Front Matter Configuration

The page front matter robots parameters are optional, as the code will use the site configuration values if defined, or the default values if no site configuration values are defined. If no site configuration or page front matter configuration parameter values are defined, then the robots meta tag will not be generated.

In the example below, I’m using TOML for the front matter configuration. If you use a different configuration syntax, you’ll need to make modifications appropriate for your site.

+++
Title = "A Page That Should Not Be Indexed"
[robots]
    index = false
    follow = true
+++

You only need to set the robots parameters if you need to generate a specific robots meta tag for a page. The page front matter configuration settings will supersede the default and site configuration settings. You only need to define and set the value for the parameter you wish to override. However, if you define one of the robots parameters, it would be best to define the other one too for clarity.

Setting the page front matter’s robots index parameter to true tells the robots meta tag content to use the index directive. Setting the index parameter to false changes the content directive to noindex.

Likewise, setting the page front matter’s robots follow parameter to true tells the robots meta tag content to use the follow directive. Setting the follow parameter to false changes the content directive to nofollow.

The next time the site is compiled/generated, the page should include a robots meta tag if the appropriate robots page front matter configuration parameters are set. For the above example page front matter configuration, the page should include <meta name="robots" content="noindex, follow" /> in the <head> section of the web page source.
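If a page uses YAML front matter instead of TOML, the same settings could be written roughly as follows. This is an assumed equivalent of the TOML example above, not something taken from the original site; Hugo exposes both forms to the template as .Params.robots.

```yaml
---
title: "A Page That Should Not Be Indexed"
robots:
  index: false
  follow: true
---
```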

Conclusion

Although there is no guarantee a search engine will or will not index content on your site, setting the robots meta tag should help. The sitemap.xml file will also assist (see the Excluding Pages From The Sitemap post for a template that will allow you to exclude certain pages from Hugo-generated sitemaps). Likewise, adding a robots.txt file to your site with disallow directives should provide additional clarity to search engine crawlers.

But as stated earlier, to really keep your content from being indexed, extra steps will need to be taken to properly protect it. The best method, though, is never to include the content on a public site in the first place. If the content is there, it is likely to be indexed by a search engine at some point.

As always, test the template with your site before deploying to production. Test thoroughly if you are trying to keep search engines from indexing certain content on your site.

I hope you found this useful.

Dereck
