Unraveling the Mystery of Binary Data

As a developer, I once avoided working with binary data on the web, finding it too complex and overwhelming. However, when I started building a music app that required generating MIDI files without a server, I had to confront my fears head-on.

The Challenge: Generating MIDI Files in JavaScript

My partner and I built a music app to explore cyclic patterns found in non-Western musical traditions. To keep it simple, we opted for a static JavaScript app hosted on GitHub Pages. The final hurdle was letting users download their creations as MIDI files. We chose the jsmidgen library, a pure JavaScript implementation that takes musical notes as input and produces a MIDI file. However, its examples ran in Node.js, so we needed to adapt it for client-side use.

Debugging Binary Files: A Daunting Task

After integrating the library, I was relieved to see the app running without errors. But when I double-clicked the downloaded MIDI file, my heart sank: the file was corrupted. With no warnings or errors in the code, I was left wondering how to debug a broken MIDI file.

The Eureka Moment: Understanding Typed Arrays

As I dug deeper, I realized that the issue lay in how JavaScript stores strings and represents them as bytes. I learned that jsmidgen builds the file as a regular array of numbers, one per byte, and then converts those numbers to characters using String.fromCharCode. To see the raw bytes, I converted each character back into its original character code. This revealed that the sequence of bytes generated in the browser was identical to the one generated in Node.js. The problem wasn't with jsmidgen, but with how I was sending the data to the outside world as a file.
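The inspection step above can be sketched in TypeScript as follows. This is a minimal illustration, not jsmidgen's code: the byte values are a hand-picked sample (the "MThd" header ID plus a note-on message), and `toCharCodes` and `toHex` are hypothetical helper names.

```typescript
// Hand-picked sample bytes: the "MThd" header chunk ID, then a note-on
// status byte (0x90), a note number (0x3c = 60, middle C) and a velocity.
const midiBytes: number[] = [0x4d, 0x54, 0x68, 0x64, 0x90, 0x3c, 0x7f];

// Turn each byte into a one-character string, mirroring the
// String.fromCharCode step described above.
const asString: string = midiBytes
  .map((byte) => String.fromCharCode(byte))
  .join("");

// Recover the raw values by reading each character's code.
function toCharCodes(s: string): number[] {
  const codes: number[] = [];
  for (let i = 0; i < s.length; i++) {
    codes.push(s.charCodeAt(i));
  }
  return codes;
}

// Format the codes as a two-digit hex dump that is easy to compare.
function toHex(codes: number[]): string {
  return codes
    .map((c) => (c < 16 ? "0" : "") + c.toString(16))
    .join(" ");
}

const recovered = toCharCodes(asString);
console.log(toHex(recovered)); // "4d 54 68 64 90 3c 7f"
```

Printing the same hex dump in Node.js and in the browser and comparing the two strings is one simple way to confirm that both environments produce identical bytes.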

Forcing an Encoding: The Power of Typed Arrays

I discovered that JavaScript strings are stored as sequences of 16-bit code units using UTF-16 (or its older subset, UCS-2), so each character code occupies 2 bytes. MIDI, in contrast, stores each value in a single byte. Typed arrays let me choose the storage size explicitly instead of leaving the encoding up to the browser. My first attempt copied the character codes into a Uint16Array, which stores each value in 2 bytes. The resulting file was still corrupted, but the values themselves were now correct, just spread over 2 bytes each instead of 1. Finally, by copying the values into a Uint8Array, which stores each value in exactly 1 byte, I generated a valid MIDI file.
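A minimal TypeScript sketch of the difference between the two typed arrays, assuming a short sample byte string in place of real jsmidgen output; `toUint16` and `toUint8` are hypothetical helper names.

```typescript
// Copy each character code of a byte string into a typed array.
// Uint16Array reserves 2 bytes per element; Uint8Array reserves 1.
function toUint16(s: string): Uint16Array {
  const out = new Uint16Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i);
  }
  return out;
}

function toUint8(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    // Codes are assumed to be 0-255; larger values would wrap modulo 256.
    out[i] = s.charCodeAt(i);
  }
  return out;
}

// Sample data: "MThd" followed by a note-on status byte (0x90).
const byteString = String.fromCharCode(0x4d, 0x54, 0x68, 0x64, 0x90);

const wide = toUint16(byteString);
const narrow = toUint8(byteString);

// The underlying buffer is what would end up in the file.
console.log(wide.byteLength); // 10: every value takes 2 bytes
console.log(narrow.byteLength); // 5: one byte per value, as MIDI expects

// Viewing the 16-bit buffer byte by byte exposes the extra zero bytes.
// Typed arrays use the platform's byte order, which is little-endian on
// virtually all current hardware, so each value is followed by a 0.
console.log(new Uint8Array(wide.buffer));
```

In a browser, the Uint8Array (not the original string) is what you would pass to `new Blob([narrow], { type: "audio/midi" })` before offering it for download: typed-array parts are copied into a Blob byte for byte, whereas string parts are encoded as UTF-8.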

The Secret to Success: Understanding UTF-8 Encoding

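The key to solving the mystery lay in understanding how UTF-8 works. When a JavaScript string is written out as a file, for example by passing it to the Blob constructor, the browser encodes it as UTF-8. UTF-8 is a variable-length encoding that can represent every Unicode character (over a million code points) using 1 to 4 bytes each. Code points 0 through 127, the ASCII range, take 1 byte, while code points 128 through 2047 take 2. So every value above 127 in my MIDI data grew to 2 bytes when the string was encoded. Values from 128 to 191 (0x80 to 0xBF) picked up a leading c2 byte, so the note-on status byte 0x90, for example, was written as c2 90. That explained the mysterious c2 bytes in my corrupted MIDI file.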
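The 1-byte and 2-byte UTF-8 rules can be checked with a small hand-written encoder. This sketch uses a hypothetical helper, `utf8Bytes`, that only handles code points below U+0800, and compares it against the standard `TextEncoder`, which always produces UTF-8.

```typescript
// Encode a single code point below U+0800 as UTF-8 bytes.
function utf8Bytes(codePoint: number): number[] {
  if (codePoint <= 0x7f) {
    // 0xxxxxxx: ASCII passes through as a single byte.
    return [codePoint];
  }
  if (codePoint <= 0x7ff) {
    // 110xxxxx 10xxxxxx: the top 5 bits go into the lead byte,
    // the low 6 bits into the continuation byte.
    return [0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f)];
  }
  throw new RangeError("this sketch only handles code points below U+0800");
}

// Format bytes as a two-digit hex dump.
const hex = (bytes: ArrayLike<number>): string => {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    parts.push((bytes[i] < 16 ? "0" : "") + bytes[i].toString(16));
  }
  return parts.join(" ");
};

console.log(hex(utf8Bytes(0x3c))); // "3c" (note number 60, unchanged)
console.log(hex(utf8Bytes(0x90))); // "c2 90" (note-on byte gains a c2 prefix)
console.log(hex(utf8Bytes(0xff))); // "c3 bf" (lead byte c3, second byte changed)

// The built-in TextEncoder agrees with the hand-written rule.
const encoded = new TextEncoder().encode(String.fromCharCode(0x90));
console.log(hex(encoded)); // "c2 90"
```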

Empowering Developers: Control over Binary Data

This journey taught me the importance of understanding binary data and encoding schemes in JavaScript. With typed arrays, developers control exactly how their data is stored, byte by byte, which makes tasks like generating MIDI files entirely in the browser possible. Once these concepts click, binary formats stop being intimidating and become one more tool for building powerful web applications.
