4 PX files in R

4.1 PX-Files and the PX-File Format

4.1.1 What is the PX-file format?

The PX-file format is a specialized data format developed specifically for storing and distributing multidimensional statistical data. Originally created by Statistics Sweden (SCB) in the 1990s, the PX format has become the standard format for official statistics across most Nordic countries and is increasingly used by statistical organizations worldwide.

The name “PX” comes from the original Swedish-developed software program “PC-AXIS” to display and work with large tabular data. The format was designed to address the unique needs of statistical data distribution, where both data values and comprehensive metadata must be preserved and transmitted together.

4.1.2 Use of PX-files for dissemination of official statistics

The real power and motivation for PX-files stems from the possibility of disseminating the files through PX-web. PX-web can be used to publish statistics in a web-based database. Since January 1, 2016, it has been free for government agencies and municipalities, international NSIs, and international organizations of statistics to use free of charge.

When putting the PX-files in PX-web, it is possible to create a StatBank or Statistical Database with any imaginable official statistics, e.g., population, labor market, business, or financial statistics. Making the statistics available for the public and policy makers, so people can be informed and ultimately support better and more efficient policy making.

The PX-file and PX-web to create a StatBank are used by different national statistical institutes, e.g., Sweden, Norway, Finland, Iceland, Faroe Islands, Greenland, Latvia, Moldova, Ghana, and more. So the use has spread from the Nordic to many other countries. There is an R package pxweb that provides handy functions to work with the PX-web API. It also has a hardcoded list of 30 countries and agencies that use PX-web for disseminating statistics, but their list is not exhaustive.

4.2 PX-files in R

R can be used to create and work with PX-files. Statistics Greenland has developed a great package for working with PX-files in R, called pxmake. In its simplest form, it can take an existing data frame in R and turn it into an exported PX-file. More information can be found on the pxmake webpage. In the following documents, pxmake’s functions will be used to show how to update existing files, create new files, and change multiple files at once.

The following code showcases a basic example for generating a PX-file:

# Install pxmmake if not already installed
install.packages("pxmake")

# Load pxmake
library(pxmake)

# Dataframe we want to convert to PX-file
df <- expand.grid(age = 20:40,
                 sex = c("M", "F"),
                 urban = c("Urban", "Rural"))

# add counts 
df$value <- sample(100:1000, nrow(df), replace = TRUE)

# Creating px object in R 
px_df <- px(df)

# Saving the px file in current working directory
px_save(px_df, path = "test_px_file.px")

This is not a good PX-file yet! We need to add more metadata for it to be a proper PX-file.

4.3 Key characteristics of PX files

4.3.1 Self-documenting structure

PX files are self-documenting, meaning they contain both the statistical data and all necessary metadata within a single file. This includes variable descriptions, data sources, contact information, creation dates, and methodological notes. This comprehensive approach ensures that users receive not just numbers, but the context needed to understand and properly use the data.

4.4 File structure and components

4.4.1 Metadata section

The metadata section appears at the beginning of the file and contains descriptive information about the dataset. This includes:

Dataset identification: Title, description, and unique identifiers
Variable definitions: Names, labels, and valid values for each dimension
Time information: Time periods covered and temporal formatting
Geographic coding: Geographic classifications and hierarchies
Data source: Origin, collection method, and responsible organization
Technical details: Creation date, language, character encoding
Contact information: Who to contact for questions or additional information

4.4.2 Data section

The data section contains the actual statistical values, organized according to the dimensional structure defined in the metadata. The data is typically arranged in a specific order that corresponds to the variable hierarchy specified in the metadata section.

The documentation for the PX-file format can be found from Statistics Sweden here.

4.4.3 Metadata in `pxmake`

The pxmake package has functions to add metadata to our PX-file named after which keyword or metadata we want to create or edit. So if we want to add a title to our PX-file, we can do it with the function px_title.

px_df <- px_title(px_df, "Test PX-file")

With the power of the magrittr pipe, we can add many more metadata functions together. The following guides will exemplify this. A list of all supported keywords (and not supported keywords) can be found here.

Read the introduction from the pxmake package on how to create your first PX-file, which has a more extended example of making a proper PX-file.