Representing Data

Example of how the same data can be represented in an alternate way.

keywords

encoding c

2024-03-11


Here’s a C program that writes “bananas\0” (the seven letters plus a terminating null byte) into a raw binary file.

// encoding_demo.c

#include <stdio.h>

int main(void) {
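  // An array (not a pointer), so sizeof gives the full 8 bytes:
  // the 7 letters plus the terminating null byte.
  char bananas[] = "bananas";
  FILE *file = fopen("bananas", "wb");
  if (file == NULL) {
    perror("fopen");
    return 1;
  }

  fwrite(bananas, sizeof(bananas), 1, file);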
  fclose(file);

  return 0;
}

Compile the program and use xxd -b to view how the data is arranged.

$ clang -Wall -Werror -pedantic encoding_demo.c -o demo
$ ./demo
$ xxd -b bananas
00000000: 01100010 01100001 01101110 01100001 01101110 01100001  banana
00000006: 01110011 00000000                                      s.

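The first column shows the offset of each row within the file, in hexadecimal. The middle columns show each byte in binary: the first six bytes on the first row and the last two bytes on the second. The last column shows the same bytes as text. Since most of these bytes are printable, you can read “bananas” there; xxd prints a “.” for any byte that isn't printable, which here is the terminating null byte.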

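On a little-endian machine (such as x86-64), the number 32476749030646114 has the same in-memory encoding as the text “bananas\0”. Little-endian means the first byte in memory is the least significant one, so the bytes 0x62 0x61 0x6e 0x61 0x6e 0x61 0x73 0x00 form the integer 0x0073616E616E6162, which is 32476749030646114 in decimal. The following program writes that number into the same binary file, bananas, and produces exactly the same sequence of bits.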

#include <stdint.h>
#include <stdio.h>

int main(void) {
  uint64_t bananas = 32476749030646114ULL; // 0x0073616E616E6162

  FILE *file = fopen("bananas", "wb");

  fwrite(&bananas, sizeof(uint64_t), 1, file);
  fclose(file);

  return 0;
}

Similarly, a character pointer can be used to walk through the bytes of the large number, one byte at a time.

#include <stdint.h>
#include <stdio.h>

int main(void) {
  uint64_t bananas = 32476749030646114ULL;
  char *bp = (char *)&bananas;

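  // Print the first 7 bytes; the 8th byte is the null terminator.
  for (int i = 0; i < (int)sizeof(uint64_t) - 1; i++) {
    printf("%c", *bp);
    bp++;
  }
  printf("\n");

  // prints "bananas" on a little-endian machine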

  return 0;
}

So, the same data can be represented in multiple ways: the same eight bytes can be read as text or as a 64-bit integer. Why pick a seemingly random number to represent text? This is just an example, but representing data in an alternate form is sometimes useful. Compression is one area where creative representations of data are used.

No compression has been performed in the examples above, but understanding how data is arranged in memory and how it is interpreted is an important first step.