Reading Binary Files

Some files in a computer system are written for humans and contain text.

% file /etc/hosts
/etc/hosts: ASCII text

But many other files are made for the computer to execute, and it isn’t possible to read them using a tool like cat.

% cat /bin/ls | head
����@�
      ��Z������

This is because they are binary files:

% file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/ls (for architecture x86_64):    Mach-O 64-bit executable x86_64
/bin/ls (for architecture arm64e):    Mach-O 64-bit executable arm64e

However, it is possible to read them using a tool like hexdump:

% hexdump -C /bin/ls | head
00000000  ca fe ba be 00 00 00 02  01 00 00 07 00 00 00 03  |................|
00000010  00 00 40 00 00 01 1c c0  00 00 00 0e 01 00 00 0c  |..@.............|
00000020  80 00 00 02 00 01 80 00  00 01 5a f0 00 00 00 0e  |..........Z.....|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Each pair of hexadecimal digits represents one byte: the left digit encodes the high 4 bits and the right digit the low 4 bits. Not all bytes represent a visible character, so I’m going to take 40, which represents the @ symbol (it is the third byte on the second line of the output above). When split, the hexadecimal digit 4 can be represented as 0100 in binary and 0 as 0000. Merged back together, they form the binary number 01000000, or 64 in decimal. We can validate this on an ASCII table like the one below.

DEC  HEX  BIN       ASCII Symbol
63   3F   00111111  ?
64   40   01000000  @
65   41   01000001  A

Table source: https://www.ascii-code.com/

stateDiagram-v2
    40 --> 4
    40 --> 0
    4 --> 0100
    0 --> 0000
    0100 --> 01000000
    0000 --> 01000000
    01000000 --> 64
    64
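
The same split-and-merge steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name split_byte is made up for this example and is not part of hexdump.

```python
def split_byte(hex_pair):
    """Return the high and low 4-bit halves of a two-digit hex byte as binary strings.

    split_byte is a hypothetical helper used only for this illustration.
    """
    value = int(hex_pair, 16)   # "40" -> 64
    high = value >> 4           # keep the upper 4 bits
    low = value & 0x0F          # keep the lower 4 bits
    return format(high, "04b"), format(low, "04b")


high, low = split_byte("40")
merged = high + low

print(high, low)             # 0100 0000
print(merged)                # 01000000
print(int(merged, 2))        # 64
print(chr(int(merged, 2)))   # @
```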

Hexdumpje

To better understand how this works, I wrote a basic version of hexdump. The source code can be found at https://github.com/mauromorales/hexdumpje
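
To give a rough idea of what such a tool has to do, here is a minimal sketch in Python that formats bytes in the style of the hexdump -C output shown earlier. It is not the hexdumpje source, and it makes no claim about how that project is implemented. The function name hexdump_lines is invented for this example, and the sketch leaves out things the real hexdump does, such as collapsing repeated lines and printing the final offset.

```python
def hexdump_lines(data, width=16):
    """Yield lines that mimic the layout of `hexdump -C` for the given bytes.

    hexdump_lines is a hypothetical name chosen for this sketch.
    """
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = [f"{b:02x}" for b in chunk]
        # Two groups of eight bytes, each padded to 23 characters
        left = " ".join(hex_bytes[:8])
        right = " ".join(hex_bytes[8:])
        # Printable ASCII is shown as-is, everything else as a dot
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield f"{offset:08x}  {left:<23}  {right:<23}  |{text}|"


# The first 32 bytes of /bin/ls from the output shown earlier
sample = bytes.fromhex(
    "cafebabe00000002" "0100000700000003"
    "0000400000011cc0" "0000000e0100000c"
)

for line in hexdump_lines(sample):
    print(line)
```

To dump a real file, the bytes could be read with open(path, "rb").read() and passed to the same function.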
