Mauro Morales

software developer

Category: Explainers

  • Reading Binary Files

    Some files in a computer system are written for humans and contain text.

    % file /etc/hosts
    /etc/hosts: ASCII text

    But many other files are made for the computer to execute, and it isn’t possible to read them using a tool like cat.

    % cat /bin/ls | head
    ����@�
          ��Z������
    
    

    This is because they are binary files

    % file /bin/ls
    /bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
    /bin/ls (for architecture x86_64):    Mach-O 64-bit executable x86_64
    /bin/ls (for architecture arm64e):    Mach-O 64-bit executable arm64e

    However, it is possible to read them using a tool like hexdump

    hexdump -C /bin/ls | head
    00000000  ca fe ba be 00 00 00 02  01 00 00 07 00 00 00 03  |................|
    00000010  00 00 40 00 00 01 1c c0  00 00 00 0e 01 00 00 0c  |..@.............|
    00000020  80 00 00 02 00 01 80 00  00 01 5a f0 00 00 00 0e  |..........Z.....|
    00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    The left letter of each pair is the high 4 bits and the second letter the lower 4 bits. Not all bytes represent a visible character, so I’m going to take 40, which represents the @ symbol. When split, the hexadecimal 4 can be represented as 0100 in binary and 0 as 0000. Merged back together forms the binary number 01000000, or 64 in decimal. We can validate this on an ASCII table like the one below.

    DECHEXBINASCII Symbol
    633F00111111?
    644001000000@
    654101000001A
    Table source: https://www.ascii-code.com/
    stateDiagram-v2
        40 --> 4
        40 --> 0
        4 --> 0100
        0 --> 0000
        0100 --> 01000000
        0000 --> 01000000
        01000000 --> 64
        64

    Hexdumpje

    To understand better how this works, I wrote a basic version of hexdump. The source code can be found on https://github.com/mauromorales/hexdumpje

  • Ruby On Rails: Storing JSON Directly in PostgreSQL

    Whenever we save data from one of our Rails models, each attribute is mapped one to one to a field in the database. These fields are generally of a simple type, like a string or an integer. However, it’s also possible to save an entire data object in JSON format in a field. Let’s see an example of how to do this from a Ruby on Rails application.

    For this example, let’s assume that I have a Page model where I want to save some stats. To begin, we’re going to generate a new migration and add the stats field and define it as type JSON which by default will save an empty array

    def change
      add_column :pages, :stats, :json, default: []
    end

    Once migrated, let’s have a deeper look at how our pages table looks like

    \d pages
    Table "public.pages"
     Column | Type | Default
    ...
     stats  | json | '[]'::json

    Now that is interesting, unlike the more common types which can be 0 or false, the default value of this field is literally the string cast to JSON. Let’s play a little bit with this and cast an array with values

    SELECT '[1, 2, 3]'::json
       json    
    -----------
     [1, 2, 3]
    (1 row)

    Turns out PostgreSQL offers also a set of functions to handle JSON data. Let’s say for example that I wanted to get all pages that have no pre-calculated stats. This can be done using the json_array_length function

    SELECT *
      FROM people
     WHERE json_array_length(stats) = 0

    This is way more performant than, fetching the data, serializing it, and loading it to a Ruby array to then calculate its length.

    Ok, that’s all nice, what about the cases when I do need to load the data in a Ruby object and then save it back? You’ll be happy to know that you don’t need to do anything else, Rails will do all the heavy lifting of serializing and deserializing for you, and provide a getter and setter methods so you can interact with the attribute as you normally would

    page = Page.find(1)
    page.stats.class
    => Array
    page.stats = [1, 2, 3]
    => [1, 2, 3]
    page.save
    => true

    Throughout this example, I used a very simple array, but you can of course use much more complex data objects like you normally would with JSON but be careful not to shoot yourself in the foot! Just because you can save a lot of data into a JSON field doesn’t mean that you should. Evaluate first if what you need is an additional model that relates to the model you’re working with.

    Want to know more? Checkout PostgreSQL documentation on the JSON datatype and the functions you can use

  • Ruby’s DATA Stream

    The STDIN and ARGF streams are commonly used in Ruby, however there’s also the less popular DATA one. This is how it works and some examples in the wild.

    HOW TO READ FROM DATA?

    Like with any other stream you can use gets and readlines. This behaviour is defined by the IO class. However there’s a caveat, your script needs to have a data section. To define it use the __END__ to separate code from data.

    $ cat hello_world.rb
    puts DATA.gets
    __END__
    hello world!
    
    $ ruby hello_world.rb
    hello world!

    Look at that, another way to code hello world in Ruby. Without the __END__keyword, you’ll get the following error:

    NameError: uninitialized constant DATA

    WHEN TO USE IT?

    You could use the data section of the script if you wanted to keep the data and code really close, or if you wanted to do some sort of pre processing to your sources. But to be honest, the only real benefit I can think of is performance. Instead of starting a second IO operation, to read a file containing the data, it’d get loaded at the same time than the script.

    EXAMPLES

    One thing I’ve learned while working with Go, is to check Go’s source files for good examples. Even though you cannot do this with Ruby at the same degree because the sources are in C, you can still check the parts of the sources that are in Ruby and the gems and tools maintained within the Ruby sources. Here are some examples:

  • Numbered Parameters in Ruby 2.7

    A new feature called “numbered parameters” will see the light of day in the Ruby 2.7 release at the end of the year. What caught my attention was not the feature itself but the mixed reception it got from the community.

    BLOCK PARAMETERS

    Whenever you open a block you have the chance to pass a list of numbered parameters

    object.method { |parameter_1, parameter_2, ... parameter_n| ... }

    For example if you were iterating over a hash to print its keys with matching values you’d do something like this:

    my_hash.each { |key, value| puts "#{key}: #{value}" }

    NUMBERED PARAMETERS

    With the new numbered parameters you are going to be able to save yourself some keystrokes and use @ followed by the number that represents the position of the parameter that you want do use so our previous code would now look like this:

    my_hash.each { puts "#{@1}: #{@2}" }

    NO DEFAULT VARIABLE NAME

    Other languages like Kotlin use it as the default variable name within a block.

    collection.map { println(it) }

    This is not the case with this new feature.

    object.method { p @1 }

    is syntactic sugar for

    object.method { |parameter_1,| p parameter } 

    and not for

    object.method { |parameter| p parameter } 

    So pay attention to the dataset you are passing because you might get some unexpected behaviour like this one:

    [1, ['a', 'b'], 3, {foo: "bar"}].map { @1 }
    => [1, "a", 3, {:foo=>"bar"}]

    As you can see 1 and 3 are taken as the first numbered parameter as expected. Each element of the array becomes one of the numbered parameters so @1 => 'a', @2 => 'b'. And the hash is treated as a single object so it won’t get split either.

    This shouldn’t come as a surprise since it’s the expected behaviour of doing

    [1, ['a', 'b'], 3, {foo: "bar"}].map { |x,| x }

    but in this case we make it clear to the reader when we say |x,|. There is no plan to make it a default variable name which is weird because that’s exactly what was requested in the original issue.

    BACKWARDS COMPATIBILITY IS A HIGH PRIORITY

    As I already mentioned this is what the person who requested the issue wanted to have but it was not accepted in its original form because of backwards compatibility. Introducing new keywords to the Ruby language is a no-go at the moment because Matz is not a fan of breaking developers’ old code with newer versions of Ruby.

    I appreciate that Matz takes such a strong stance on this matter, I think it’s important to update your code bases to use the latest version of Ruby but the harder it is to make an update, the less likely it is that you’ll end up doing it. So if I update to Ruby 2.7 and I start seeing breaking changes everywhere in my code base I’m just going to put it on hold for as long as possible. Instead this experience should be a welcoming one.

    PAIN OR GAIN?

    I don’t know how many times you pass a list of parameters to a block versus how many times you pass a single parameter, but I’m pretty sure in every code base you can find many more instances of the latter than the former. So the question is: How valuable is this new feature?

    Nobody seems to like the fact that numbered parameters start with @ and some community members are also saying that developers could get confused thinking that the numbered parameters are instance variables.

    There is currently an open issue requesting to reconsider numbered parameters because in it’s current state it brings more pain than value. What do you think? Do you like numbered parameters? Do you think they should be implemented in a different way? Would you rather not have them at all? There’s some informal voting happening in case you want to chip in.