9  Multiple languages in PX-files

9.1 Adding another language to your PX-files

Many countries have more than one official language, and even if a country has only one, it may still be valuable to add another language to the PX-web tables to increase accessibility.

For example, Finland has three languages in their Statistical Database: the country’s two official languages, Finnish and Swedish, and then English to increase accessibility for a broader audience. Likewise, Greenland also has three languages in their Statbank: Greenlandic, Danish, and English. Both use PX-files and PX-web to publish their tables.

One of the main reasons Statistics Greenland developed pxmake was to have a good tool for handling multiple languages in PX-files, and as we will see below, it achieves just that.

9.1.1 The px_languages() function

When making a PX-file using pxmake, the language keyword does not have a default. For instance, the tables we made with data from the Rwandan Labour Force Survey in the chapter Updating PX-files do not have a language set. This can easily be set to English using the function px_language(), which sets the main language of the PX-file.

library(tidyverse)
library(pxmake)

# Importing our px-file about labour force
px_en <- px("lfs_tab1.px") %>% 
  # set language to English
  px_language("en")

# Check the language
px_en %>% 
  px_language()
[1] "en"

Now we have set the language to “en” for English using the px_language() function. Note that there are two language functions in pxmake: px_language(), which we just showcased, and px_languages() for setting multiple languages.

Let’s add another language to our PX-file, so that it is in both English and French. Here we use the px_languages() function, but we still need to set English as the main language via px_language() (see footnote 1).

px("lfs_tab1.px") %>% 
  px_language("en") %>% 
  px_languages(c("en", "fr")) %>% 
  # print title to see what happened
  px_title()
# A tibble: 2 × 2
  language value                                                                
  <chr>    <chr>                                                                
1 en       Sex, province and highest attained education in Rwanda Labour Force …
2 fr       Sex, province and highest attained education in Rwanda Labour Force …

The code above added English and French as languages and printed the title. We now have two titles, and the language column shows “en” and “fr”. However, both titles are (for now) in English. The px_languages() function has only duplicated the existing titles, value labels, etc. for each language; pxmake leaves the actual translation to the language experts.

9.1.2 Translating in R

We can use the functions in pxmake to translate our PX-file directly in R. However, this process can be quite code-heavy and requires that the translators know how to code in R. As we will see below, using Excel in the translation process may in many situations be a better solution.

px("lfs_tab1.px") %>% 
  px_language("en") %>% 
  px_languages(c("en", "fr")) %>% 
  # Translating title to French
  px_title(tribble(~language, ~value,
                   "en",  "Sex, province and highest attained education in Rwanda Labour Force Survey 2023-2024",
                   "fr", "Sexe, province et niveau d'éducation le plus élevé atteint dans l'Enquête sur la main-d'œuvre du Rwanda 2023-2024")) %>% 
  # Print title to see the changes
  px_title()
# A tibble: 2 × 2
  language value                                                                
  <chr>    <chr>                                                                
1 en       Sex, province and highest attained education in Rwanda Labour Force …
2 fr       Sexe, province et niveau d'éducation le plus élevé atteint dans l'En…

Now we have translated the title directly in R (with help from a chatbot). For the title, it was relatively simple to translate directly in R, but we still have many more fields to translate, for example, our value labels.

px("lfs_tab1.px") %>% 
  px_language("en") %>% 
  px_languages(c("en", "fr")) %>% 
  px_values()
# A tibble: 36 × 4
   `variable-code` code   language value 
   <chr>           <chr>  <chr>    <chr> 
 1 LFS_year        2023   en       2023  
 2 LFS_year        2023   fr       2023  
 3 LFS_year        2024   en       2024  
 4 LFS_year        2024   fr       2024  
 5 A01             Female en       Female
 6 A01             Female fr       Female
 7 A01             Male   en       Male  
 8 A01             Male   fr       Male  
 9 A01             Total  en       Total 
10 A01             Total  fr       Total 
# ℹ 26 more rows
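
A quick way to see which entries still need a translator is to look for rows whose text is identical in both languages. The following is a minimal sketch, not part of pxmake: it works on a small hand-built tibble, `values_tbl`, which is a hypothetical stand-in with the same columns as the px_values() output above, so it can be run without the PX-file.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the output of px_values() (same column layout).
# Here "Female" is assumed to be translated already, "Male" is not.
values_tbl <- tribble(
  ~`variable-code`, ~code,    ~language, ~value,
  "A01",            "Female", "en",      "Female",
  "A01",            "Female", "fr",      "Femme",
  "A01",            "Male",   "en",      "Male",
  "A01",            "Male",   "fr",      "Male",
  "LFS_year",       "2023",   "en",      "2023",
  "LFS_year",       "2023",   "fr",      "2023"
)

# Spread the languages into one column each, then keep the rows
# where the French text is still identical to the English text
untranslated <- values_tbl %>%
  pivot_wider(names_from = language, values_from = value) %>%
  filter(en == fr)

untranslated
```

Identical text does not always mean a translation is missing: values such as years are legitimately the same in both languages, so the result is a list of candidates to review rather than a list of errors.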

We can, for instance, translate the value labels of the variable for sex (A01) into French.

px("lfs_tab1.px") %>% 
  px_language("en") %>% 
  px_languages(c("en", "fr")) %>% 
  px_values(tribble(~`variable-code`, ~code, ~language, ~value,
                    "A01", "Female", "fr", "Femme",
                    "A01", "Male", "fr", "Homme")) %>% 
  px_values()
# A tibble: 36 × 4
   `variable-code` code   language value 
   <chr>           <chr>  <chr>    <chr> 
 1 LFS_year        2023   en       2023  
 2 LFS_year        2023   fr       2023  
 3 LFS_year        2024   en       2024  
 4 LFS_year        2024   fr       2024  
 5 A01             Female en       Female
 6 A01             Female fr       Femme 
 7 A01             Male   en       Male  
 8 A01             Male   fr       Homme 
 9 A01             Total  en       Total 
10 A01             Total  fr       Total 
# ℹ 26 more rows

This translated only the values for sex into French. We still need to translate all the other variables, which would require a lot of coding.
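
If the translation has to stay in R, one way to reduce the typing is to keep the translations in a lookup table and build the French rows with a join instead of writing each row by hand. The sketch below only illustrates that idea: `values_tbl` is a hypothetical stand-in for the px_values() output, and `dictionary` is a made-up English-to-French lookup table. Neither object comes from pxmake.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for px_values() after px_languages(c("en", "fr")):
# the French rows still hold the English text
values_tbl <- tribble(
  ~`variable-code`, ~code,    ~language, ~value,
  "A01",            "Female", "en",      "Female",
  "A01",            "Female", "fr",      "Female",
  "A01",            "Male",   "en",      "Male",
  "A01",            "Male",   "fr",      "Male",
  "A01",            "Total",  "en",      "Total",
  "A01",            "Total",  "fr",      "Total",
  "LFS_year",       "2023",   "en",      "2023",
  "LFS_year",       "2023",   "fr",      "2023"
)

# Made-up dictionary supplied by the translators
dictionary <- tribble(
  ~en,      ~fr,
  "Female", "Femme",
  "Male",   "Homme",
  "Total",  "Total"
)

# Take the French rows, look up their current (English) text in the
# dictionary, and replace it with the French text. Rows without a
# dictionary entry (here the year) are dropped and stay unchanged.
french_values <- values_tbl %>%
  filter(language == "fr") %>%
  inner_join(dictionary, by = c("value" = "en")) %>%
  mutate(value = fr) %>%
  select(`variable-code`, code, language, value)

french_values
```

The resulting tibble has the same four columns as the hand-written tribble passed to px_values() above, so it could presumably be supplied in the same way. The dictionary still has to be written by someone, though, so this shortens the code without removing the translation work.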


  1. This may change in a future version, so that px_languages(c("en", "fr")) can be used on its own, with the first language as the main language.