Mauro Morales

software developer

Category: Explainers

  • How does a Raspberry Pi 5 boot an image?

    When the Raspberry Pi 5 is turned on, it checks which device it is configured to boot from. By default, this is the SD card, but you can change it to boot from an NVMe or USB drive while still falling back to the SD card. In my case, I’m using a USB SSD. Let’s take a look at how the disk is partitioned.
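    As an aside, the boot order itself lives in the bootloader EEPROM configuration. As a hedged example (exact values depend on your firmware version), a setting that tries the SD card first and then a USB drive, restarting the cycle if neither works, looks like this:

```shell
# EEPROM bootloader configuration (editable with: sudo rpi-eeprom-config --edit)
# Digits are read right to left: 1 = SD card, 4 = USB mass storage,
# f = restart the cycle from the beginning
BOOT_ORDER=0xf41
```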

    For this article, I will be referring to the Ubuntu 24.04 server image because its configuration is easier to understand than Raspbian’s, which relies on implicit defaults. I mounted the image as a loop device, hence the /dev/loop44 in the examples, but if you burned it to an SSD or SD card, you would get the same results from /dev/sdX or /dev/mmcblkY.

    root@zeno:~# lsblk -f /dev/loop44
    NAME       FSTYPE FSVER LABEL       UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
    loop44
    ├─loop44p1 vfat   FAT32 system-boot F526-0340                             419.3M    17% /media/mauro/system-boot
    └─loop44p2 ext4   1.0   writable    1305c13b-200a-49e8-8083-80cd01552617  781.9M    66% /media/mauro/writable

    From the labels, we can assume the system-boot partition will be the one booting the system, but how does the system know this is the case? From the documentation, I was able to find this:

    Partition numbers start at 1 and the MBR partitions are 1 to 4. Specifying partition 0 means boot from the default partition which is the first bootable FAT partition.

    Bootable partitions must be formatted as FAT12, FAT16 or FAT32 and contain a start.elf file (or config.txt file on Raspberry Pi 5) in order to be classed as bootable by the bootloader.

    Looking at the output of the previous command, only the system-boot partition has the right format, so let’s look into that one first.

    # ls -1 /media/mauro/system-boot/
    README
    bcm2710-rpi-2-b.dtb
    bcm2710-rpi-3-b-plus.dtb
    bcm2710-rpi-3-b.dtb
    bcm2710-rpi-cm3.dtb
    bcm2710-rpi-zero-2-w.dtb
    bcm2710-rpi-zero-2.dtb
    bcm2711-rpi-4-b.dtb
    bcm2711-rpi-400.dtb
    bcm2711-rpi-cm4.dtb
    bcm2711-rpi-cm4s.dtb
    bcm2712-rpi-5-b.dtb
    bcm2712-rpi-cm5-cm4io.dtb
    bcm2712-rpi-cm5-cm5io.dtb
    bcm2712d0-rpi-5-b.dtb
    boot.scr
    bootcode.bin
    cmdline.txt
    config.txt
    fixup.dat
    fixup4.dat
    fixup4cd.dat
    fixup4db.dat
    fixup4x.dat
    fixup_cd.dat
    fixup_db.dat
    fixup_x.dat
    hat_map.dtb
    initrd.img
    meta-data
    network-config
    overlays
    start.elf
    start4.elf
    start4cd.elf
    start4db.elf
    start4x.elf
    start_cd.elf
    start_db.elf
    start_x.elf
    uboot_rpi_3.bin
    uboot_rpi_4.bin
    uboot_rpi_arm64.bin
    user-data
    vmlinuz
    

    We can see the expected config.txt there. Let’s take a look at its contents.

    root@zeno:~# cat /media/mauro/system-boot/config.txt
    [all]
    kernel=vmlinuz
    cmdline=cmdline.txt
    initramfs initrd.img followkernel
    
    [pi4]
    max_framebuffers=2
    arm_boost=1
    
    [all]
    # Enable the audio output, I2C and SPI interfaces on the GPIO header. As these
    # parameters related to the base device-tree they must appear *before* any
    # other dtoverlay= specification
    dtparam=audio=on
    dtparam=i2c_arm=on
    dtparam=spi=on
    
    # Comment out the following line if the edges of the desktop appear outside
    # the edges of your display
    disable_overscan=1
    
    # If you have issues with audio, you may try uncommenting the following line
    # which forces the HDMI output into HDMI mode instead of DVI (which doesn't
    # support audio output)
    #hdmi_drive=2
    
    # Enable the serial pins
    enable_uart=1
    
    # Autoload overlays for any recognized cameras or displays that are attached
    # to the CSI/DSI ports. Please note this is for libcamera support, *not* for
    # the legacy camera stack
    camera_auto_detect=1
    display_auto_detect=1
    
    # Config settings specific to arm64
    arm_64bit=1
    dtoverlay=dwc2
    
    # Enable the KMS ("full" KMS) graphics overlay, leaving GPU memory as the
    # default (the kernel is in control of graphics memory with full KMS)
    dtoverlay=vc4-kms-v3d
    disable_fw_kms_setup=1
    
    [pi3+]
    # Use a smaller contiguous memory area, specifically on the 3A+ to avoid an
    # OOM oops on boot. The 3B+ is also affected by this section, but it shouldn't
    # cause any issues on that board
    dtoverlay=vc4-kms-v3d,cma-128
    
    [pi02]
    # The Zero 2W is another 512MB board which is occasionally affected by the same
    # OOM oops on boot.
    dtoverlay=vc4-kms-v3d,cma-128
    
    [all]
    
    [cm4]
    # Enable the USB2 outputs on the IO board (assuming your CM4 is plugged into
    # such a board)
    dtoverlay=dwc2,dr_mode=host
    
    [all]

    I’m only interested in the first four lines.

    • [all]: a filter specifying which boards the following settings apply to; in this case, all boards
    • kernel: defines the kernel file to load, in this case vmlinuz, which was present in the file listing
    • cmdline: defines the file containing the cmdline used to boot the kernel, in this case cmdline.txt, which is also there
    • initramfs: defines the initrd file to load, in this case initrd.img, also there. The followkernel argument loads the initrd into memory right after the kernel. Note that this directive, unlike all the others, doesn’t use the = assignment.
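    To make these parsing rules concrete, here’s a hypothetical Ruby sketch (the real firmware is written in C, and this toy version ignores the [board] filters) that extracts the boot settings, including the special no-= syntax of initramfs:

```ruby
# Toy parser for the boot-relevant config.txt directives (illustration only;
# it skips comments and [section] filters rather than honoring them).
def parse_boot_config(text)
  settings = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#", "[")
    if line.start_with?("initramfs ")
      # initramfs uses "initramfs <file> <address|followkernel>", not key=value
      _, file, address = line.split(" ", 3)
      settings["initramfs"] = { file: file, address: address }
    elsif line.include?("=")
      key, value = line.split("=", 2)
      settings[key] = value
    end
  end
  settings
end

config = <<~CONF
  [all]
  kernel=vmlinuz
  cmdline=cmdline.txt
  initramfs initrd.img followkernel
CONF

p parse_boot_config(config)
```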

    Now we can take a look at cmdline.txt:

    # cat /media/mauro/system-boot/cmdline.txt
    console=serial0,115200 multipath=off dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc
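
    The kernel treats this as a space-separated list of parameters, some of which are key=value pairs. A quick Ruby sketch (an illustration, not how the kernel actually parses its cmdline) shows how root=LABEL=writable points at the partition label:

```ruby
# The cmdline from the boot partition, as one string
cmdline = "console=serial0,115200 multipath=off dwc_otg.lpm_enable=0 " \
          "console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc"

# Split into key/value pairs; flags like "rootwait" get a nil value.
# Note the duplicate "console" key: later entries win in this naive to_h.
params = cmdline.split.to_h { |param| key, value = param.split("=", 2); [key, value] }

puts params["root"]        # => LABEL=writable
puts params["rootfstype"]  # => ext4
```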
    

    This tells us that the root of the system is the partition labeled writable, which matches the output of our very first command. Listing everything in writable, we find:

    root@zeno:~# ls -1 /media/mauro/writable/
    bin
    bin.usr-is-merged
    boot
    dev
    etc
    home
    lib
    lib.usr-is-merged
    lost+found
    media
    mnt
    opt
    proc
    root
    run
    sbin
    sbin.usr-is-merged
    snap
    srv
    sys
    tmp
    usr
    var

    This looks like a common root directory for an Ubuntu system, so I will not go deeper into it.

    On a PC, the bootloader comes as part of the Linux installation, but it turns out that on the Pi 5 it is already part of the EEPROM. So we can trust that it’s present and follows the instructions from config.txt.

    An important part of this process is the Device Tree (.dtb files), which is also read by the bootloader. The Device Tree describes the hardware present on the board, ensuring that the kernel knows how to interact with all connected peripherals.

    To summarize: when the Raspberry Pi 5 powers up, the EEPROM bootloader looks for the first bootable FAT partition, where it reads the config.txt file. That file tells the bootloader which kernel, initramfs, and cmdline parameters to load. After that, it’s the kernel’s job to decide how to proceed; in this case, once the kernel and initramfs are running in memory, it pivots to the system living in the writable partition. Last, and out of the scope of this article, the init system finalizes the boot.

    Raspbian

    Keep in mind that the Raspbian image doesn’t define all these details, since it uses defaults:

    The Raspberry Pi 5 firmware defaults to loading kernel_2712.img because this image contains optimizations specific to Raspberry Pi 5 (e.g. 16K page-size). If this file is not present, then the common 64-bit kernel (kernel8.img) will be loaded instead.
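    That default selection can be sketched as a simple fallback (an illustration of the documented behaviour, not the firmware’s actual code):

```ruby
# Pick the kernel the way the Pi 5 firmware is documented to:
# prefer the Pi 5-specific image, fall back to the common 64-bit one.
def default_kernel(files)
  if files.include?("kernel_2712.img")
    "kernel_2712.img"
  else
    "kernel8.img"
  end
end

puts default_kernel(["kernel8.img", "kernel_2712.img"])  # => kernel_2712.img
puts default_kernel(["kernel8.img"])                     # => kernel8.img
```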

    And I assume that if the initramfs is not defined, the firmware will also look within the directory and load either initramfs8 or initramfs_2712 by default, since those files are present in the Raspbian image.

  • A New Dawn for Secure Linux in Untrusted Environments

    Linux has become the default operating system for running web applications. However, like any system connected to the internet, it is exposed to remote attacks. While public cloud environments and private datacenters offer some security from physical tampering, edge computing presents unique challenges.

    For this article, an edge device refers to a headless computer system (without direct human interface) deployed in remote locations like coffee shops, gas stations, or warehouses.

    The Security Challenge at the Edge

    Contrary to popular belief, Linux systems lack certain critical security features found in Windows (Trusted Boot) and macOS (Startup Security). While Linux supports Secure Boot and full-disk encryption, these measures alone are insufficient for edge environments where devices are physically accessible to untrusted parties.

    The primary security goals for edge devices are:

    1. Preventing unauthorized access to data if the device is stolen.
    2. Ensuring the device does not boot if tampered with.

    Protecting Your Data with Encryption

    Encrypting the disk keeps your data safe when the device is powered off, addressing the first security goal. However, this protection is compromised if the device is tampered with, leading us to the second goal.

    Protecting Your Device from Tampering

    Understanding the Linux boot process is crucial for securing a device against tampering. Upon powering on, a modern computer runs the UEFI firmware, which hands control to a bootloader. The bootloader initiates the operating system, which then decrypts your data and starts your application.

    Secure Boot helps secure the initial stage by only allowing execution of digitally signed bootloaders. However, the problem lies in the next stage: most Linux distributions’ bootloaders do not verify the signatures of the Kernel or Initrd, nor do they measure the integrity of these components. This oversight allows potential tampering to go unnoticed.

    Measuring for Integrity

    Measuring involves calculating a hash for artifacts like the Linux Kernel. Any change in these artifacts alters the hash. Utilizing Trusted Platform Module (TPM) chips, we can establish a validation system that only proceeds with booting if the measurements match the expected values.
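    A toy illustration of the idea in Ruby (real measured boot uses the TPM’s PCR registers and firmware support, not a script like this):

```ruby
require "digest"

# "Measure" an artifact by hashing it, then gate booting on the expected value.
def measure(artifact_bytes)
  Digest::SHA256.hexdigest(artifact_bytes)
end

kernel   = "pretend this string is the kernel image"
expected = measure(kernel)

tampered = kernel + " with a malicious patch"

puts measure(kernel) == expected     # => true: measurements match, boot continues
puts measure(tampered) == expected   # => false: any change alters the hash, boot refused
```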

    Unified System Image (USI)

    One effective solution is creating a Unified System Image (USI). This combines the Kernel, cmdline parameters, and Initrd into a single, immutable image. By measuring this single image, we ensure the integrity of the entire system. There’s no need to encrypt this image since it contains no sensitive data, which resides in the encrypted area. The system configuration and valuable data remain secure, and the image is mounted read-only to prevent changes.

    For more detailed information on this process, refer to the UAPI Group’s page and Lennart Poettering’s article, “Brave New Trusted Boot World.”

    Kairos: Simplifying Trusted Boot

    Implementing a USI with Trusted Boot can be complex. Kairos aims to simplify this process. Visit our Trusted Boot Installation instructions to try it out, or delve into the Trusted Boot Architecture documentation for a deeper understanding of how Kairos enhances security in untrusted environments.

  • Reading Binary Files

    Some files in a computer system are written for humans and contain text.

    % file /etc/hosts
    /etc/hosts: ASCII text

    But many other files are made for the computer to execute, and it isn’t possible to read them using a tool like cat.

    % cat /bin/ls | head
    ����@�
          ��Z������
    
    

    This is because they are binary files:

    % file /bin/ls
    /bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
    /bin/ls (for architecture x86_64):    Mach-O 64-bit executable x86_64
    /bin/ls (for architecture arm64e):    Mach-O 64-bit executable arm64e

    However, it is possible to read them using a tool like hexdump

    % hexdump -C /bin/ls | head
    00000000  ca fe ba be 00 00 00 02  01 00 00 07 00 00 00 03  |................|
    00000010  00 00 40 00 00 01 1c c0  00 00 00 0e 01 00 00 0c  |..@.............|
    00000020  80 00 00 02 00 01 80 00  00 01 5a f0 00 00 00 0e  |..........Z.....|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    The left letter of each pair represents the high 4 bits and the right letter the low 4 bits. Not every byte represents a visible character, so I’m going to take 40, which represents the @ symbol. Split apart, the hexadecimal 4 is 0100 in binary and 0 is 0000. Merged back together, they form the binary number 01000000, or 64 in decimal. We can validate this in an ASCII table like the one below.
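    We can check this arithmetic directly in Ruby:

```ruby
byte = 0x40                       # the pair "40" from the hexdump output
puts byte.to_s(2).rjust(8, "0")   # => 01000000 (binary representation)
puts byte                         # => 64 (decimal value)
puts byte.chr                     # => @ (the ASCII character)
```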

    DEC   HEX   BIN        ASCII Symbol
    63    3F    00111111   ?
    64    40    01000000   @
    65    41    01000001   A

    Table source: https://www.ascii-code.com/
    (Diagram: 40 splits into 4 and 0, which become 0100 and 0000; joined back together they form 01000000, i.e. 64 in decimal.)

    Hexdumpje

    To understand better how this works, I wrote a basic version of hexdump. The source code can be found at https://github.com/mauromorales/hexdumpje.
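    For reference, a minimal hexdump in Ruby might look something like this (a simplified sketch, not the actual hexdumpje code):

```ruby
# Build hexdump-style lines: offset, hex bytes, and printable ASCII,
# 16 bytes per line, non-printable bytes shown as ".".
def hexdump_lines(bytes)
  bytes.each_slice(16).with_index.map do |chunk, i|
    hex   = chunk.map { |b| format("%02x", b) }.join(" ")
    ascii = chunk.map { |b| (32..126).cover?(b) ? b.chr : "." }.join
    format("%08x  %-47s  |%s|", i * 16, hex, ascii)
  end
end

puts hexdump_lines("hello, binary world!".bytes)
```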

  • Ruby On Rails: Storing JSON Directly in PostgreSQL

    Whenever we save data from one of our Rails models, each attribute is mapped one to one to a field in the database. These fields are generally of a simple type, like a string or an integer. However, it’s also possible to save an entire data object in JSON format in a field. Let’s see an example of how to do this from a Ruby on Rails application.

    For this example, let’s assume that I have a Page model where I want to save some stats. To begin, we’re going to generate a new migration that adds the stats field, defining it as type json, which by default will save an empty array:

    def change
      add_column :pages, :stats, :json, default: []
    end

    Once migrated, let’s take a deeper look at what our pages table looks like:

    \d pages
    Table "public.pages"
     Column | Type | Default
    ...
     stats  | json | '[]'::json

    Now that is interesting: unlike more common types, whose defaults might be 0 or false, the default value of this field is literally the string '[]' cast to JSON. Let’s play a little with this and cast an array with values:

    SELECT '[1, 2, 3]'::json
       json    
    -----------
     [1, 2, 3]
    (1 row)

    It turns out PostgreSQL also offers a set of functions to handle JSON data. Let’s say, for example, that I wanted to get all pages that have no pre-calculated stats. This can be done using the json_array_length function:

    SELECT *
      FROM pages
     WHERE json_array_length(stats) = 0

    This is far more performant than fetching the data, deserializing it, and loading it into a Ruby array just to calculate its length.

    OK, that’s all nice, but what about the cases when I do need to load the data into a Ruby object and then save it back? You’ll be happy to know that you don’t need to do anything else. Rails does all the heavy lifting of serializing and deserializing for you, and provides getter and setter methods so you can interact with the attribute as you normally would:

    page = Page.find(1)
    page.stats.class
    => Array
    page.stats = [1, 2, 3]
    => [1, 2, 3]
    page.save
    => true
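    Under the hood this is plain JSON serialization. Conceptually (this is a simplification, not the actual ActiveRecord code), Rails does something like:

```ruby
require "json"

stats = [1, 2, 3]

stored = JSON.generate(stats)   # what gets written to the json column
loaded = JSON.parse(stored)     # what the getter hands back to you

puts stored        # => [1,2,3]
puts loaded.class  # => Array
```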

    Throughout this example I used a very simple array, but you can of course use much more complex data objects, like you normally would with JSON. But be careful not to shoot yourself in the foot! Just because you can save a lot of data into a JSON field doesn’t mean that you should. Evaluate first whether what you really need is an additional model that relates to the one you’re working with.

    Want to know more? Check out PostgreSQL’s documentation on the JSON datatype and the functions you can use.

  • Ruby’s DATA Stream

    The STDIN and ARGF streams are commonly used in Ruby, but there’s also the less popular DATA stream. Here’s how it works, along with some examples in the wild.

    HOW TO READ FROM DATA?

    As with any other stream, you can use gets and readlines; this behaviour is defined by the IO class. There’s a caveat, though: your script needs to have a data section. To define one, use the __END__ keyword to separate code from data.

    $ cat hello_world.rb
    puts DATA.gets
    __END__
    hello world!
    
    $ ruby hello_world.rb
    hello world!

    Look at that, another way to write hello world in Ruby. Without the __END__ keyword, you’ll get the following error:

    NameError: uninitialized constant DATA

    WHEN TO USE IT?

    You could use the data section of a script if you wanted to keep the data and the code really close together, or if you wanted to do some sort of pre-processing on your sources. But to be honest, the only real benefit I can think of is performance: instead of starting a second IO operation to read a file containing the data, it gets loaded at the same time as the script.
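    For example (a hypothetical script, written here to a temporary file so the snippet can run standalone), you could keep a small dataset right below the code that consumes it:

```ruby
require "tempfile"
require "open3"

# A script whose input lives in its own DATA section.
script = <<~'RUBY'
  DATA.each_line do |line|
    name, score = line.chomp.split(",")
    puts "#{name} scored #{score}"
  end
  __END__
  alice,10
  bob,7
RUBY

file = Tempfile.new(["data_demo", ".rb"])
file.write(script)
file.close

output, _status = Open3.capture2("ruby", file.path)
puts output
```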

    EXAMPLES

    One thing I’ve learned while working with Go is to check Go’s source files for good examples. You can’t do this with Ruby to the same degree, because the sources are in C, but you can still check the parts of the sources written in Ruby, as well as the gems and tools maintained within the Ruby sources. Here are some examples:

  • Numbered Parameters in Ruby 2.7

    A new feature called “numbered parameters” will see the light of day in the Ruby 2.7 release at the end of the year. What caught my attention was not the feature itself but the mixed reception it got from the community.

    BLOCK PARAMETERS

    Whenever you open a block, you have the chance to pass a list of parameters:

    object.method { |parameter_1, parameter_2, ... parameter_n| ... }

    For example, if you were iterating over a hash to print its keys with their matching values, you’d do something like this:

    my_hash.each { |key, value| puts "#{key}: #{value}" }

    NUMBERED PARAMETERS

    With the new numbered parameters, you can save yourself some keystrokes by using @ followed by the number representing the position of the parameter you want to use, so our previous code would now look like this:

    my_hash.each { puts "#{@1}: #{@2}" }

    NO DEFAULT VARIABLE NAME

    Other languages like Kotlin use it as the default variable name within a block.

    collection.map { println(it) }

    This is not the case with this new feature.

    object.method { p @1 }

    is syntactic sugar for

    object.method { |parameter_1,| p parameter_1 }

    and not for

    object.method { |parameter| p parameter } 

    So pay attention to the dataset you are passing because you might get some unexpected behaviour like this one:

    [1, ['a', 'b'], 3, {foo: "bar"}].map { @1 }
    => [1, "a", 3, {:foo=>"bar"}]

    As you can see, 1 and 3 are taken as the first numbered parameter, as expected. For the array, however, each element becomes one of the numbered parameters, so @1 => 'a' and @2 => 'b'. The hash, on the other hand, is treated as a single object, so it doesn’t get split.

    This shouldn’t come as a surprise, since it’s the expected behaviour of doing:

    [1, ['a', 'b'], 3, {foo: "bar"}].map { |x,| x }

    but in this case we make it clear to the reader by writing |x,|. There is no plan to make it a default variable name, which is odd, because that’s exactly what was requested in the original issue.
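    You can see the difference the trailing comma makes by comparing both forms side by side:

```ruby
data = [1, ['a', 'b'], 3, {foo: "bar"}]

with_comma    = data.map { |x,| x }   # destructures arrays, like the proposed @1
without_comma = data.map { |x| x }    # takes each element whole

p with_comma     # 1, "a", 3, and the hash (arrays get destructured)
p without_comma  # every element unchanged
```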

    BACKWARDS COMPATIBILITY IS A HIGH PRIORITY

    As I already mentioned, this is what the person who opened the issue wanted, but it was not accepted in its original form because of backwards compatibility. Introducing new keywords to the Ruby language is a no-go at the moment, because Matz is not a fan of breaking developers’ old code with newer versions of Ruby.

    I appreciate that Matz takes such a strong stance on this matter. I think it’s important to update your code base to the latest version of Ruby, but the harder an update is, the less likely it is that you’ll end up doing it. If I updated to Ruby 2.7 and started seeing breaking changes everywhere in my code base, I’d put the upgrade on hold for as long as possible. Instead, the experience should be a welcoming one.

    PAIN OR GAIN?

    I don’t know how many times you pass a list of parameters to a block versus a single parameter, but I’m pretty sure that in any code base you’ll find many more instances of the latter than the former. So the question is: how valuable is this new feature?

    Nobody seems to like the fact that numbered parameters start with @ and some community members are also saying that developers could get confused thinking that the numbered parameters are instance variables.

    There is currently an open issue requesting that numbered parameters be reconsidered, because in their current state they bring more pain than value. What do you think? Do you like numbered parameters? Do you think they should be implemented in a different way? Would you rather not have them at all? There’s some informal voting happening in case you want to chip in.