Multi-language site - l10n

4 minute read

Providing a website in different languages involves “internationalization” (i18n for short) and “localization” (l10n). If you search the web for these abbreviations, you’ll find a lot of useful material. By the way, the number in each abbreviation is the count of letters omitted between the word’s first and last letter.
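As a small illustration of how these abbreviations are formed, here is a minimal Liquid sketch that keeps the first and last letter of a word and puts the number of letters in between. The variable names are my own and only serve the example.

```liquid
{% comment %} Build the abbreviation: first letter, count of inner letters, last letter {% endcomment %}
{% assign words = "internationalization,localization" | split: "," %}
{% for word in words %}
  {{ word | slice: 0 }}{{ word | size | minus: 2 }}{{ word | slice: -1 }}
{% endfor %}
```

For the two words above this renders i18n and l10n.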

Motivation

Right from the start, I was asked to provide a German version of this website. For a long time I considered this too much effort. That only changed half a year ago, when several people independently asked me for it in person.

So I invested half a year of evenings to get the heavy lifting done, translating roughly 80 articles. But I still wasn’t quite done.

Jekyll extension

I looked at a couple of extensions for Jekyll dedicated to multilingual support like Juan Palares’s plugin-free solution, untra’s polyglot plugin, kurtsson’s jekyll-multiple-languages-plugin and some other solutions to publish a website in multiple languages.

So I had to pick one. The plugin-free solution was out quickly because I wanted a concise naming convention where identical posts in different languages have identical names. Plus, I wanted internal links to stay within the selected language. Something would have to keep the posts of each language separated.

After some research, I found leo3418’s wonderful series of blog posts for implementing Polyglot - and so I decided to go with that one.

Polyglot: Installation

Installing Polyglot is simple because both the documentation in the repository’s readme.md and leo3418’s related post explain these first steps in detail. They do this so well that I won’t spend another word on it here.

Restructuring the website

image: updated folder structure for this site

Polyglot offers two possibilities to adapt the site’s structure for multi-language support:

  1. Use the language abbreviation in the post’s filename, like this: 2023-11-14-jekyll-polyglot-language-support-en.md. Repeat for each translation of the article with the appropriate abbreviation.
  2. Create folders named after the language code, e.g. /en, and put all English content in there, but keep file names the same between languages.

I chose the latter option out of laziness: I didn’t want to rename existing articles. Polyglot manages to stick to the same language when navigating the site. If you, for instance, click https://schallbert.de/milling-small-parts/, Polyglot will assume you want the English version and serve it. Where this no longer works is links to sections: the section titles are translated themselves, which changes their anchors, so the section link #restructuring-the-website will be broken when called in the German version.

Thus, when linking to section titles, the post’s full path has to be given.

This is where it gets a little more complicated. Jekyll’s _config.yml is intended for one version of the website only, so we cannot just create two variants of this file and expect it to work. To still enable full translation, we have to override language-sensitive parameters like site.title or site.description externally, depending on the selected locale.

I solved this by adding l10n.yml files to the _data folder, one per language, each covering these parameters and placed in a language-code folder just like the posts. To show their content, the relevant Liquid variables in the HTML files (_includes folder) have to be changed to look there instead of in the standard config. Example: the “Follow” label. The German version of this site shows FOLGEN:, while the English one says FOLLOW:.
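A lookup of this kind could look roughly like the sketch below. It assumes one data file per language (for example _data/en/l10n.yml and _data/de/l10n.yml) and uses Polyglot’s site.active_lang variable to find the language currently being built. The keys title and follow are hypothetical names chosen for this example, not necessarily the ones used on this site.

```liquid
{% comment %} Load the strings for the language Polyglot is currently building {% endcomment %}
{% assign strings = site.data[site.active_lang].l10n %}

{% comment %} Hypothetical keys: fall back to _config.yml if a translation is missing {% endcomment %}
<title>{{ strings.title | default: site.title }}</title>
<span>{{ strings.follow }}</span>
```

The default filter keeps the page usable while a translation file is still incomplete, because the value from the standard config is shown instead of an empty string.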

Language selector

image: footer section of this website

Of course I want to offer the option to select a locale on every page. To do this, I added the language code and its flag to the footer. On click, it will reload the page in the requested locale. The following code I wrote in Liquid enables this behavior:


{% comment %} Build one link per configured language, pointing to the current page in that language {% endcomment %}
{% for lang in site.languages %}
    {% comment %} "/collection/post/" splits into ["", "collection", "post"] {% endcomment %}
    {% assign url_parts = page.url | split: "/" %}
    {% assign lang_code = lang %}

    {% assign collection_name = url_parts[1] %}
    {% assign post_url = url_parts | last %}

    {% if lang == site.default_lang %}
        {% comment %} Default language: no language prefix in the URL {% endcomment %}
        {% if collection_name == post_url %}
            {% assign lang_url = "/" | append: post_url %}
        {% else %}
            {% assign lang_url = "/" | append: collection_name | append: "/" | append: post_url %}
        {% endif %}
    {% else %}
        {% comment %} Any other language: start the URL with the language code {% endcomment %}
        {% assign lang_url = "/" | append: lang_code | append: "/" %}
        {% if collection_name == post_url %}
            {% assign lang_url = lang_url | append: post_url %}
        {% else %}
            {% assign lang_url = lang_url | append: collection_name | append: "/" | append: post_url %}
        {% endif %}
    {% endif %}
    <a href="{{ lang_url }}">{{ site.data[lang].l10n.lang_name }}</a>
{% endfor %}

Here I’m splitting the current page URL. The last part is the post_url. The algorithm assumes that the first part is the post’s collection name, e.g. posts-hardware. The subsequent logic inserts the language code into the URL if and where needed, so that the correct full URL is built depending on the collection used and the page’s navigation depth.
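To make the splitting step easier to follow, here is a small sketch that applies it to a fixed string instead of page.url. The example URL is hypothetical and only combines the collection name and post mentioned in this article.

```liquid
{% comment %} Hypothetical page URL, used for illustration only {% endcomment %}
{% assign example_url = "/posts-hardware/milling-small-parts/" %}
{% assign url_parts = example_url | split: "/" %}

{% comment %} The leading slash yields an empty first element, so index 1 holds the collection {% endcomment %}
collection: {{ url_parts[1] }}
post: {{ url_parts | last }}
parts: {{ url_parts.size }}
```

With this input, the collection should come out as posts-hardware, the post as milling-small-parts, and the number of parts as 3, because the text before the leading slash counts as an empty first element. For a top-level page such as /milling-small-parts/, the second element and the last element are the same, which is the case the selector treats as "no collection".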

Publishing

I am pretty proud of this elegant way of selecting languages. I think it all fits together nicely, and the multi-language support potentially adds a good number of readers. The big downside is the duplicated work of creating each post in two languages.

Unfortunately, the site was now no longer compatible with GitHub Pages: the automatic release would show each page twice instead of keeping them separated by language. Two options remained: creating custom GitHub Actions to publish on GitHub Pages, or moving the whole site to a server of my own. I’ll go with the latter to be on the safe side regarding data protection regulations, because in that case I can make sure the hosting provider is located in Germany.

Well, this upcoming project is too big to take on in this post.