Mauro Morales

software developer

Month: November 2023

  • Reading Binary Files

    Some files in a computer system are written for humans and contain text.

    % file /etc/hosts
    /etc/hosts: ASCII text

    But many other files are made for the computer to execute, and it isn’t possible to read them using a tool like cat.

    % cat /bin/ls | head
    ����@�
          ��Z������
    
    

    This is because they are binary files:

    % file /bin/ls
    /bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
    /bin/ls (for architecture x86_64):    Mach-O 64-bit executable x86_64
    /bin/ls (for architecture arm64e):    Mach-O 64-bit executable arm64e
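
    One way to see why cat prints garbage for such a file is to try decoding its bytes as text. The sketch below is only an illustration: it assumes the terminal expects UTF-8, the helper name is_valid_utf8 is hypothetical, and the hosts-style line is a made-up sample. The binary sample is the first eight bytes of /bin/ls as shown by hexdump in this post.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample of a plain-text line, similar to what /etc/hosts holds.
text_sample = b"127.0.0.1\tlocalhost\n"

# First eight bytes of /bin/ls, copied from the hexdump output in this post.
binary_sample = bytes.fromhex("ca fe ba be 00 00 00 02")

print(is_valid_utf8(text_sample))
print(is_valid_utf8(binary_sample))
```

    The text sample decodes without trouble, while the binary sample contains byte sequences that are not valid UTF-8, which a terminal typically shows as replacement characters.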

    However, it is possible to read them using a tool like hexdump:

    % hexdump -C /bin/ls | head
    00000000  ca fe ba be 00 00 00 02  01 00 00 07 00 00 00 03  |................|
    00000010  00 00 40 00 00 01 1c c0  00 00 00 0e 01 00 00 0c  |..@.............|
    00000020  80 00 00 02 00 01 80 00  00 01 5a f0 00 00 00 0e  |..........Z.....|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

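    The first column is the offset of the row within the file, in hexadecimal. The middle columns show the bytes of the file, each written as a pair of hexadecimal digits, and the column between the | characters shows the same bytes as ASCII, with a . in place of anything that is not printable. Within each pair, the left digit holds the high 4 bits of the byte and the right digit the low 4 bits. Not all bytes represent a visible character, so I’m going to take 40, which represents the @ symbol. Split apart, the hexadecimal digit 4 is 0100 in binary and 0 is 0000. Merged back together, they form the binary number 01000000, or 64 in decimal. We can validate this on an ASCII table like the one below.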

    DEC  HEX  BIN       ASCII Symbol
    63   3F   00111111  ?
    64   40   01000000  @
    65   41   01000001  A
    Table source: https://www.ascii-code.com/
    stateDiagram-v2
        40 --> 4
        40 --> 0
        4 --> 0100
        0 --> 0000
        0100 --> 01000000
        0000 --> 01000000
        01000000 --> 64
        64
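
    The same conversion can be checked with a few lines of Python. This is a minimal sketch, and split_nibbles is a hypothetical helper name chosen for illustration.

```python
def split_nibbles(byte: int) -> tuple[int, int]:
    """Return the (high, low) 4-bit halves of a single byte."""
    return byte >> 4, byte & 0x0F


byte = 0x40  # the pair "40" from the hexdump output
high, low = split_nibbles(byte)

print(f"{high:x} -> {high:04b}")  # hex digit 4 as four bits
print(f"{low:x} -> {low:04b}")    # hex digit 0 as four bits
print(f"{byte:08b}")              # both halves merged into one byte
print(byte)                       # the same value in decimal
print(chr(byte))                  # the ASCII character for that value
```

    Shifting right by 4 keeps only the high half of the byte, and masking with 0x0F keeps only the low half.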

    Hexdumpje

    To better understand how this works, I wrote a basic version of hexdump. The source code can be found at https://github.com/mauromorales/hexdumpje
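
    To give a rough idea of what such a tool involves, here is a minimal Python sketch of the hexdump -C row layout. It is not the hexdumpje code: the names format_row and dump are hypothetical, the row width is assumed to be 16 bytes, and it leaves out things the real hexdump does, such as collapsing repeated rows into a * and printing the final offset.

```python
import sys


def format_row(offset: int, chunk: bytes) -> str:
    """Format up to 16 bytes as one row in the style of `hexdump -C`."""
    # Two groups of eight hex pairs; short rows are padded to keep alignment.
    left = " ".join(f"{b:02x}" for b in chunk[:8])
    right = " ".join(f"{b:02x}" for b in chunk[8:16])
    # Printable ASCII is shown as-is, everything else as a dot.
    text = "".join(chr(b) if 32 <= b <= 126 else "." for b in chunk)
    return f"{offset:08x}  {left:<23}  {right:<23}  |{text}|"


def dump(path: str, width: int = 16) -> None:
    """Print every row of the file at `path`."""
    with open(path, "rb") as f:
        offset = 0
        while chunk := f.read(width):
            print(format_row(offset, chunk))
            offset += len(chunk)


if __name__ == "__main__" and len(sys.argv) > 1:
    dump(sys.argv[1])
```

    Saved under a name of your choice and run with a path such as /bin/ls as its only argument, it should print rows laid out like the hexdump -C output above.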