Mauro Morales

software developer

Reading Binary Files

Some files in a computer system are written for humans and contain text.

% file /etc/hosts
/etc/hosts: ASCII text

But many other files are made for the computer to execute, and it isn’t possible to read them using a tool like cat.

% cat /bin/ls | head
����@�
      ��Z������

This is because they are binary files:

% file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/ls (for architecture x86_64):    Mach-O 64-bit executable x86_64
/bin/ls (for architecture arm64e):    Mach-O 64-bit executable arm64e
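To see where the garbled characters from cat come from, here is a minimal Python sketch. It assumes the terminal decodes the stream as UTF-8 (an assumption, not something stated in this post) and uses the first 24 bytes of this particular /bin/ls, the same bytes hexdump prints further down. Bytes that are not valid UTF-8 turn into the replacement character �, and control bytes such as 00 are not drawn at all.

```python
# First 24 bytes of the /bin/ls used in this post (taken from its hex dump).
FIRST_BYTES = bytes.fromhex(
    "cafebabe00000002"
    "0100000700000003"
    "0000400000011cc0"
)

# Assumption: the terminal interprets the stream as UTF-8.
decoded = FIRST_BYTES.decode("utf-8", errors="replace")

# Drop control characters (such as NUL), which a terminal does not draw.
visible = "".join(ch for ch in decoded if ch.isprintable())
print(visible)
```

With these bytes the sketch prints ����@�, which matches the first line that cat produced.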
TODO

How is it possible to build a binary with 2 architectures? If I copy/paste this file between an Intel and an M1 Mac, it runs properly on both! 🤯
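A partial answer: a universal binary is a small container. It starts with a header that lists one entry per architecture, and each entry records where that architecture's complete Mach-O executable sits inside the file, so the system can load the one that matches the CPU. The Python sketch below decodes the first 48 bytes of this /bin/ls (the same bytes hexdump prints further down). The field layout follows the fat_header and fat_arch structs in Apple's mach-o/fat.h; the names parse_fat_header and CPU_NAMES are hypothetical and only there for illustration.

```python
import struct

# First 48 bytes of the /bin/ls used in this post (taken from its hex dump).
HEADER = bytes.fromhex(
    "cafebabe00000002"                          # magic, number of architectures
    "01000007000000030000400000011cc00000000e"  # first fat_arch entry
    "0100000c800000020001800000015af00000000e"  # second fat_arch entry
)

FAT_MAGIC = 0xCAFEBABE
# Only the two CPU types that appear in this file (illustrative, not complete).
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def parse_fat_header(data):
    """Return one dict per architecture slice listed in a fat header."""
    # All fields are big-endian 32-bit unsigned integers.
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal (fat) Mach-O header")
    slices = []
    for i in range(count):
        # Each entry is 20 bytes: cputype, cpusubtype, offset, size, align.
        cputype, _subtype, offset, size, align = struct.unpack_from(
            ">5I", data, 8 + 20 * i
        )
        slices.append({
            "cpu": CPU_NAMES.get(cputype, hex(cputype)),
            "offset": offset,        # where the slice starts in the file
            "size": size,            # length of the slice in bytes
            "align": 2 ** align,     # alignment is stored as a power of two
        })
    return slices


for entry in parse_fat_header(HEADER):
    print(entry)
```

For this file the sketch reports an x86_64 slice at offset 16384 (72896 bytes) and an arm64 slice at offset 98304 (88816 bytes). The sketch reads the subtype field but does not interpret it, which is why it says arm64 where file says arm64e.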

However, it is possible to read them using a tool like hexdump:

% hexdump -C /bin/ls | head
00000000  ca fe ba be 00 00 00 02  01 00 00 07 00 00 00 03  |................|
00000010  00 00 40 00 00 01 1c c0  00 00 00 0e 01 00 00 0c  |..@.............|
00000020  80 00 00 02 00 01 80 00  00 01 5a f0 00 00 00 0e  |..........Z.....|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
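The layout of each line above can be reproduced with a few lines of Python: an 8-digit hexadecimal offset, sixteen bytes shown as two groups of eight hex pairs, and the same bytes as ASCII between | characters, with a . standing in for anything unprintable. This is only a sketch of the output format. hexdump_lines is a hypothetical helper, not part of hexdump, and unlike the real tool it does not collapse repeated lines or print a final offset.

```python
def hexdump_lines(data, width=16):
    """Format bytes roughly the way `hexdump -C` does, one string per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        pairs = [f"{byte:02x}" for byte in chunk]
        # Two groups of eight hex pairs, padded so short last lines still align.
        left = " ".join(pairs[:8])
        right = " ".join(pairs[8:])
        # Printable ASCII is shown as-is, everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    return lines


# The first 32 bytes of the /bin/ls dumped above.
SAMPLE = bytes.fromhex(
    "cafebabe000000020100000700000003"
    "0000400000011cc00000000e0100000c"
)
for line in hexdump_lines(SAMPLE):
    print(line)
```

To try it on a real file, read the file in binary mode (for example `open(path, "rb").read(64)`) and pass the result to the helper.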

Each pair of hexadecimal digits in the middle columns is one byte. The left
digit of each pair is the high 4 bits and the right digit the low 4 bits. Not
all bytes represent a visible character (hexdump prints a . in the right-hand
column for those), so I’m going to take 40, which does. When split, 4 can be
represented as 0100 and 0 as 0000; merged back together they form the binary
number 01000000, or 64 in decimal, which happens to be the value for the
character @.
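The same steps can be written as a short Python sketch. The helper name split_byte is hypothetical; it only mirrors the arithmetic described above.

```python
def split_byte(hex_pair):
    """Split a two-digit hex string into its high and low 4-bit halves."""
    value = int(hex_pair, 16)
    high = value >> 4      # upper 4 bits (the left hex digit)
    low = value & 0x0F     # lower 4 bits (the right hex digit)
    return high, low


high, low = split_byte("40")
# Write each half as 4 binary digits and merge them back into one byte.
bits = f"{high:04b}{low:04b}"
print(f"{high:04b}", f"{low:04b}", bits, int(bits, 2), chr(int(bits, 2)))
```

This prints 0100 0000 01000000 64 @.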

DEC  HEX  BIN       ASCII Symbol
63   3F   00111111  ?
64   40   01000000  @
65   41   01000001  A
Table source: https://www.ascii-code.com/

stateDiagram-v2
    40 --> 4
    40 --> 0
    4 --> 0100
    0 --> 0000
    0100 --> 01000000
    0000 --> 01000000
    01000000 --> 64
    64