switch in amd64 assembly

Introduction

Some time ago when discussing with my brother, we concluded that switch statements can be better optimized at machine level than if/else. The discussion started because python does not support switch constructs.

Some time later I was questioning myself if this was really true. I mean the most obvious way to implement switch was a list of conditional jumps in assembly code. In this case there would be no performance advantage at all. But there is not a hashtable created by the compiler, right?!

switch in C

Okay, let's take a simple program with a switch. C, C++, Java, PHP and ActionScript use the same syntax. It's like "switch ("<condition>")" "{" ("case" <constant>":" (<expression>;)+ (break;)? )+ "}". An optional default block can be added, which is executed whenever no other case matches. break terminates the current switch; missing to use it will lead to a program executing all kinds of case blocks.

Let's start with this C code. It uses some input character as condition and has 3 case branches. Just for the record: The input character is only used here to urge the compiler to not optimize the switch away (which is the case whenever a constant is the condition).

#include <stdio.h>

int
main()
{
  char a = getchar();
  switch (a)
    {
    case 'A':
      a = 0;
      break;
    case 'B':
      a = 1;
      break;
    case 'C':
      a = 2;
      break;
    case 'D':
      a = 3;
      break;
    case 'E':
      a = 4;
      break;
    default:
      a = -1;
      break;
    }
  return a;
}
C program with a simple switch statement
image/svg+xml
State diagram of the C program

My assumption

My assumption is that this source code will get translated into the following machine pseudo code:

00  0x41 (ASCII A)
01  0x42 (ASCII B)
02  0x43 (ASCII C)
03  0x44 (ASCII D)
04  0x45 (ASCII E)

10  IF ([a]-00 == 0) GOTO 15
11  IF ([a]-01 == 0) GOTO 17
12  IF ([a]-02 == 0) GOTO 19
13  IF ([a]-03 == 0) GOTO 1B
14  IF ([a]-04 == 0) GOTO 1D
13  [a] = -1   (default case)
14  GOTO 1E
15  [a] = 0    (case A)
16  GOTO 1E
17  [a] = 1    (case B)
18  GOTO 1E
19  [a] = 2    (case C)
1A  GOTO 1E
1B  [a] = 3    (case D)
1C  GOTO 1E
1D  [a] = 4    (case E)
1E  EAX = [a]  (return a)
C program translated in pseudo machine code

In this case the machine has to test each condition separately, which simply takes Ο(n). This would be equal to a program using if else repetitions.

  if (a == 'A')
      a = 0;
  else if (a == 'B')
      a = 1;
  else if (a == 'C')
      a = 2;
  else if (a == 'D')
      a = 3;
  else if (a == 'E')
      a = 4;
  else
      a = -1;
else if can replace switch statements

elif in python

python does not have a switch statement. Guido asked the community and they did not like it [pyt01]. Like me. Default case, break and lots of control flow. It's just as low-level like for (i=0; i<42; i++) and not easy to process in our brain. Furthermore in some languages there is not only one comparison operator (weakly vs strongly typed) and the way of comparsion has to define another convention. Well, the last point is actually not a problem in python…

So we have to use if/else again. The equivalent python program looks like this:

#!/usr/bin/env python

import sys

def func(a):
    if a == 'A':
        a = 0
    elif a == 'B':
        a = 1
    elif a == 'C':
        a = 2
    elif a == 'D':
        a = 3
    elif a == 'E':
        a = 4
    else:
        a = -1
    return a

if __name__ == '__main__':
    sys.exit(func(raw_input()[0]))
Equivalent python program

Of course we are interested into the assembler generated by python. So let's have a look using the dis module [dis01].

>>> dis.dis(ifelse.func)
  6           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               1 ('A')
              6 COMPARE_OP               2 (==)
              9 POP_JUMP_IF_FALSE       21

  7          12 LOAD_CONST               2 (0)
             15 STORE_FAST               0 (a)
             18 JUMP_FORWARD            90 (to 111)

  8     >>   21 LOAD_FAST                0 (a)
             24 LOAD_CONST               3 ('B')
             27 COMPARE_OP               2 (==)
             30 POP_JUMP_IF_FALSE       42

  9          33 LOAD_CONST               4 (1)
             36 STORE_FAST               0 (a)
             39 JUMP_FORWARD            69 (to 111)

 10     >>   42 LOAD_FAST                0 (a)
             45 LOAD_CONST               5 ('C')
             48 COMPARE_OP               2 (==)
             51 POP_JUMP_IF_FALSE       63

 11          54 LOAD_CONST               6 (2)
             57 STORE_FAST               0 (a)
             60 JUMP_FORWARD            48 (to 111)

 12     >>   63 LOAD_FAST                0 (a)
             66 LOAD_CONST               7 ('D')
             69 COMPARE_OP               2 (==)
             72 POP_JUMP_IF_FALSE       84

 13          75 LOAD_CONST               8 (3)
             78 STORE_FAST               0 (a)
             81 JUMP_FORWARD            27 (to 111)

 14     >>   84 LOAD_FAST                0 (a)
             87 LOAD_CONST               9 ('E')
             90 COMPARE_OP               2 (==)
             93 POP_JUMP_IF_FALSE      105

 15          96 LOAD_CONST              10 (4)
             99 STORE_FAST               0 (a)
            102 JUMP_FORWARD             6 (to 111)

 17     >>  105 LOAD_CONST              11 (-1)
            108 STORE_FAST               0 (a)

 18     >>  111 LOAD_FAST                0 (a)
            114 RETURN_VALUE
func disassembled

Comparisons for each case. Pretty much the same like if/else in C. Now we have seen enough source code. What about the reality?

What it looks like when we choose the red pill.

Hashtables

Hashtables are important data structures. They are widely used among different software implementations. This includes programming languages like PHP [php01] and python [pyh01].

The primary reason we are mentioning hashtables here is the lookup runtime. We were talking about Ο(n) before in context of if/else. Looking up the last case statement (= worst case) takes as long as iterating over all case conditions. Hashtables using hashing functions allow us runtimes with Ο(1).

C to amd64 assembly

gcc -Wall -S switch.c

I will reduce the source code to important parts and add comments.

main:
.LFB0:
	pushq	%rbp
	movq	%rsp, %rbp
	call	getchar                 # char a = getchar();
	movb	%al, -1(%rbp)
	movsbl	-1(%rbp), %eax
	subl	$65, %eax               # [1] a = a - 65
	cmpl	$4, %eax
	ja	.L2                     # [2] if (4 > a) GOTO L2
	mov	%eax, %eax
	movq	.L8(,%rax,8), %rax      # [3] load branch address to %rax
	jmp	*%rax                   # jump to case branch
	.section	.rodata         # array of pointers to cases
.L8:
	.quad	.L3
	.quad	.L4
	.quad	.L5
	.quad	.L6
	.quad	.L7
	.text
.L3:                                    # case 'A':
	movb	$0, -1(%rbp)
	jmp	.L9
.L4:                                    # case 'B':
	movb	$1, -1(%rbp)
	jmp	.L9
.L5:                                    # case 'C':
	movb	$2, -1(%rbp)
	jmp	.L9
.L6:                                    # case 'D':
	movb	$3, -1(%rbp)
	jmp	.L9
.L7:                                    # case 'E':
	movb	$4, -1(%rbp)
	jmp	.L9
.L2:                                    # default:
	movb	$-1, -1(%rbp)
	nop
.L9:
	movsbl	-1(%rbp), %eax          # return a;
	leave
	ret
C code from above compiled to ASM

This assembler code comes with 3 neat tricks. The first one is to subtract 65 from variable a. 65 is so special only because it's the first number to be a condition to be tested (ASCII A). If you subtract 65 from a, you have to jump to the default case if a is greater than 4. The compiler recognized this situation (all numbers between 65 and 69 point to some case) and this is trick number two for me.

The third trick is also referenced in the source code by brackets and describes the way to lookup the pointer to the case codeblock by a hashtable. We have to do some arithmetic and have to explain GAS (GNU assembly syntax):

.L8(,%rax,8) means take the content of the address .L8 + (%rax * 8). And what does that mean? Well, on L8 we have the array of pointers to the cases. The margin between the pointers is 8 on a 64bit system. So case 'A' is on .L8 + 1 * 8. case 'B' is on .L8 + 2 * 8. And so on. Solving this expression returns an pointer to the first command of the case code block. So .L8 + (%rax * 8) is actually our hashing function and there is no iteration over cases. Lookup takes Ο(1).

Conclusion

So it is true: switch statements can be translated to hashtables at assembler level and therefore are more efficient than if/else constructs. If you use the else if code in C at least GCC is not able to evaluate this "hashtable would be possible" situation. As far as python is concerned, it does not support switches and therefore cannot be translated to hashtables easily.

[pyt01]
Guido van Rossum, "PEP 3103 -- A Switch/Case Statement"
http://www.python.org/dev/peps/pep-3103/ [accessed 28th Oct 2011].
[dis01]
Python Software Foundation, "Disassembler for Python bytecode — Python v2.7.2 documentation"
http://docs.python.org/library/dis.html [accessed 28th of Oct 2011].
[php01]
Zend Technologie Ltd., "Contents of /php/php-src/trunk/Zend/zend_hash.c"
http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.c?view=markup [accessed 28th of Oct 2011].
[pyh01]
Python Software Foundation, "Contents of /python/trunk/Objects/dictobject.c"
http://svn.python.org/view/python/trunk/Objects/dictobject.c?view=markup [accessed 28th of Oct 2011].