Definitions in character terminology

I would like to sum up the terminology (as far as I understand it):

mojibake
The garbled result of decoding text that was encoded with encoding X using a different encoding Y
glyph
A character / symbol identified by its visual shape
character
The digital representation of a glyph, referred to by a code point
code point
An integer (usually written in hexadecimal) representing / referring to a character
charset (short for character set)
A set of associations between code points and characters
encoding
A set of conventions for transforming a code point into a byte string (always with respect to a charset)
character string
A string whose units are characters / glyphs
byte string
A character string encoded in a specific encoding

For example, the glyph A consists of three lines and is the first letter of the Latin alphabet (which defines a set of characters). If you put two dots on top of it (Ä) and decode its UTF-8 representation as latin1 (in practice usually Windows-1252), you get the mojibake Ã„. A’s code point in ASCII-compatible charsets such as Unicode is 65 (0x41). The charset Unicode defines the association between 65 and A, and UTF-8 is one encoding of the Unicode charset. Only 1 byte is required to store 0x41 in memory (in the UTF-8 encoding), which is binary 01000001. So the byte string of A looks like this in binary: (01000001, ).
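A minimal sketch of these terms in practice (TypeScript), assuming an environment that provides the WHATWG TextEncoder / TextDecoder globals, e.g. a modern browser or a recent Node.js; the variable names are made up for illustration:

// Code point and UTF-8 byte string of "A"
const a = "A";
console.log(a.codePointAt(0));             // 65 (0x41) -- the code point
console.log(new TextEncoder().encode(a));  // Uint8Array [ 65 ] -- the byte string

// Mojibake: encode "Ä" as UTF-8, then decode those bytes with the wrong charset
const bytes = new TextEncoder().encode("Ä");                // Uint8Array [ 0xC3, 0x84 ]
console.log(new TextDecoder("windows-1252").decode(bytes)); // "Ã„"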


PETA on #python.de

[23:34] <peta> hi folks
[23:40] <__name__> hi peta
[23:40] <__name__> do you protect animals?
[23:41] <peta> yes … just having a look around here
[23:42] <peta> as far as I can see, everything is in order
[23:42] <__name__> we don't do any animal testing
[23:42] <__name__> only in the back room, with snakes
[23:42] <peta> I've heard that exotic snake species are supposedly being traded here


On the issue of float, double and long

#include <stdio.h>
// Test environment:
//    Thinkpad Lenovo x201 -- 64bit, gcc 4.5.1, Linux Fedora

int main()
{
  // [0] error: both ‘long’ and ‘float’ in declaration specifiers
  // [1] error: both ‘long long’ and ‘double’ in declaration specifiers

  int a = 1;
  long int b = 1;
  int long c = 1;
  long long int d = 1;
  int long long e = 1;

  float f = 1.0;
  //float long g = 1.0; // [0]
  //long float h = 1.0; // [0]
  //float long long i = 1.0; // [0] [0]
  //long long float j = 1.0; // [0]

  double o = 1.0;
  long double p = 1.0;
  double long q = 1.0;
  //long long double r = 1.0; // [1]
  //double long long s = 1.0; // [1]

  // Sizes observed on the 64-bit test system above; the C standard
  // only guarantees minimum sizes, so other platforms may differ.
  printf("%zu\n", sizeof(a)); // 4
  printf("%zu\n", sizeof(b)); // 8
  printf("%zu\n", sizeof(c)); // 8
  printf("%zu\n", sizeof(d)); // 8
  printf("%zu\n", sizeof(e)); // 8
  printf("%zu\n", sizeof(f)); // 4
  printf("%zu\n", sizeof(o)); // 8
  printf("%zu\n", sizeof(p)); // 16
  printf("%zu\n", sizeof(q)); // 16
  printf("%lf\n", o); // 1.000000
  return 0;
}
  1. Why is “double” not called “long float”? Replace “double” with “float long” in the example above and you will recognize some consistency in the naming.
  2. See the line before “return 0”: for printf, double is semantically a “long float” (the conversion is %lf).
  3. Why is the error for “float long long” printed twice?
  4. As everybody should know, C data types are not bound to any fixed size; the standard only guarantees minimum ranges. [wiki]
  5. According to GCC, the precision ordering is “float” < “double” < “double long”, each with a radix of 2.
  6. I don’t like C 🙁

Probably, maybe, …

The canPlayType() function doesn’t return true or false. In recognition of how complex video formats are, the function returns a string:

  • "probably" if the browser is fairly confident it can play this format
  • "maybe" if the browser thinks it might be able to play this format
  • "" (an empty string) if the browser is certain it can’t play this format

😀 Sounds like an Easter egg, but isn’t…

via Dive into HTML5: Video formats
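For reference, a quick check (TypeScript) that can be run in a browser console; just a sketch, the MIME type strings below are common examples and the exact answers depend on the browser:

const video = document.createElement("video");

// Each call returns "probably", "maybe" or "" -- never true or false
console.log(video.canPlayType('video/webm; codecs="vp8, vorbis"'));
console.log(video.canPlayType("video/mp4"));
console.log(video.canPlayType("video/x-no-such-format")); // ""

// Since "" is falsy, the string still behaves sensibly in a condition
if (video.canPlayType("video/mp4")) {
  console.log("mp4 will probably/maybe play here");
}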
