Lukas Prokop, 2016
My master thesis is about
differential cryptanalysis with SAT solvers.
→ ergo, this remains a "non-introduction".
→ However, I wrote the following library as first project for my master thesis.
In mathematics and mathematical logic, Boolean algebra is the branch of algebra in which the values of the variables are the truth values true and false, usually denoted 1 and 0 respectively. […] the main operations of Boolean algebra are the conjunction and, denoted ∧, the disjunction or, denoted ∨, and the negation not, denoted ¬.
—Wikipedia:Boolean algebra
AND | ||
arg 1 | arg 2 | result |
false | false | false |
false | true | false |
true | false | false |
true | true | true |
Compare with tools:truthtable.
OR | ||
arg 1 | arg 2 | result |
false | false | false |
false | true | true |
true | false | true |
true | true | true |
Compare with tools:truthtable.
NOT | ||
arg 1 | result | |
false | true | |
true | false |
Compare with tools:truthtable.
You can represent any boolean function as combination of AND, OR and NOT.
It is a simple exercise to generate all 8 functions for 2 input bits and 1 output bit (2^{2+1} = 8). Let k=2 be your induction base. Consider arguments A_{k-1}, one additional value b, result r and boolean function f. f(V_{k}) AND b = r or f(V_{k}) AND (NOT b) = r is an extension to f to k+1 arguments where x=y means ((NOT x) OR y) AND ((NOT y) or X)
IMPLY | ||
arg 1 | arg 2 | result |
false | false | true |
false | true | true |
true | false | false |
true | true | true |
EQUALS | ||
arg 1 | arg 2 | result |
false | false | true |
false | true | false |
true | false | false |
true | true | true |
XOR | ||
arg 1 | arg 2 | result |
false | false | false |
false | true | true |
true | false | true |
true | true | false |
NAND | ||
arg 1 | arg 2 | result |
false | false | true |
false | true | true |
true | false | true |
true | true | false |
You can even represent any boolean function as combination of NANDs.
NOT x = (x NAND x)
(x AND y) = ((x NAND y) NAND (x NAND y))
(x OR y) = ((x NAND x) NAND (y NAND y))
Every algorithm can be represented with AND, OR and NOT.
Proof is difficult, but our computer is built using transistors and stuff, right? And yes, I disregarded their physical properties. Sorry.
In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics
Finite State Machines?
Petri nets?
Process algebra?
Hoare logic?
Yeah, possibly. But let's stick to boolean algebra.
arg 1 | 0 | 1 | 1 | 0 |
arg 2 | 1 | 1 | 0 | 0 |
arg 1+2 | ? | ? | ? | ? |
arg 1 | 0 | 1 | 1 | 0 |
arg 2 | 1 | 1 | 0 | 0 |
arg 1+2 | ? | ? | ? | 0 |
arg 1 | 0 | 1 | 1 | 0 |
arg 2 | 1 | 1 | 0 | 0 |
arg 1+2 | ? | ? | 1 | 0 |
arg 1 | 0 | 1 | 1 | 0 |
arg 2 | 1 | 1 | 0 | 0 |
arg 1+2 | ? | 0 | 1 | 0 |
arg 1 | 0 | 1 | 1 | 0 |
arg 2 | 1 | 1 | 0 | 0 |
arg 1+2 | 0 | 0 | 1 | 0 |
Given argument 3 and some value smaller 8,
expect result between 3 and 11.
a | 0 | 1 | 1 | 0 |
b | 1 | 1 | 0 | 0 |
c | 1 | 1 | 0 | 0 |
r | 0 | 0 | 1 | 0 |
r_{0} = a_{0} XOR b_{0}
c_{0} = a_{0} AND b_{0}
a | 0 | 1 | 1 | 0 |
b | 1 | 1 | 0 | 0 |
c | 1 | 1 | 0 | 0 |
r | 0 | 0 | 1 | 0 |
r_{1} = a_{1} XOR b_{1} XOR c_{0}
c_{1} = (a_{1} AND b_{1}) OR (b_{1} AND c_{0}) OR (c_{0} AND a_{1})
a | 0 | 1 | 1 | 0 |
b | 1 | 1 | 0 | 0 |
c | 1 | 1 | 0 | 0 |
r | 0 | 0 | 1 | 0 |
r_{2} = a_{2} XOR b_{2} XOR c_{1}
c_{2} = (a_{2} AND b_{2}) OR (b_{2} AND c_{1}) OR (c_{1} AND a_{2})
a | 0 | 1 | 1 | 0 |
b | 1 | 1 | 0 | 0 |
c | 1 | 1 | 0 | 0 |
r | 0 | 0 | 1 | 0 |
r_{3} = a_{3} XOR b_{3} XOR c_{2}
c_{3} = (a_{3} AND b_{3}) OR (b_{3} AND c_{2}) OR (c_{2} AND a_{3})
Every boolean function can be represented as ORs of ANDs of literals
XOR of a_{0}, a_{1} and r as ANDs of ORs:
(¬a_{0} ∧ ¬a_{1} ∧ ¬r) ∨
(¬a_{0} ∧ a_{1} ∧ r) ∨
(a_{0} ∧ ¬a_{1} ∧ r) ∨
(a_{0} ∧ a_{1} ∧ ¬r)
Every boolean function can be represented as ANDs of ORs of literals
Previous XOR negated (and therefore ANDs of ORs):
(a_{0} ∨ a_{1} ∨ r) ∧ (a_{0} ∨ ¬a_{1} ∨ ¬r) ∧
(¬a_{0} ∨ a_{1} ∨ ¬r) ∧ (¬a_{0} ∨ ¬a_{1} ∨ r)
A boolean function is satisfiable iff there exists some boolean assignment for variables such that the equation evaluates to true.
#!/usr/bin/env python3 import inspect import itertools def f(a, b): return not a or b and not b argc = len(inspect.getargspec(f).args) for args in itertools.product([True, False], repeat=argc): if f(*args): print("SAT") break else: print("UNSAT")
2 arguments | <1 sec. |
---|---|
4 arguments | ? sec. |
8 arguments | ? sec. |
16 arguments | ? sec. |
32 arguments | ? sec. |
… for an UNSAT function
with CPython 3.4.3
2 arguments | <1 sec. |
---|---|
4 arguments | <1 sec. |
8 arguments | <1 sec. |
16 arguments | <1 sec. |
32 arguments | ? sec. |
… for an UNSAT function
20 arguments | <1 sec. |
---|---|
21 arguments | 1.3 sec. |
22 arguments | 2.6 sec. |
23 arguments | 5.1 sec. |
24 arguments | 10.8 sec. |
25 arguments | 22.4 sec. |
26 arguments | 40 sec. |
… for an UNSAT function
20 arguments | 0.2 sec. |
---|---|
21 arguments | 0.4 sec. |
22 arguments | 0.6 sec. |
23 arguments | 1.2 sec. |
24 arguments | 2.2 sec. |
25 arguments | 4.6 sec. |
26 arguments | 8.7 sec. |
… for an UNSAT function
with PyPy 2.5.0 compiled with GCC 4.9.2
Interpolation tells us that it will take 22 minutes to test one single 32-bit integer. It takes 66810602 days to test two 32-bit integers.
O(2^{n}) for n variables.
But wait… this was the worst case and in python.
What about:
→ this is what SAT solvers typically do.
A boolean function is satisfiable iff there exists some boolean assignment for variables such that the equation evaluates to true.
… becomes …
A boolean function is satisfiable iff an algorithm meets its specification.
import algotocnf def algo(): a = 0x25 b = 0xFF return a ^ b def main(): env = algotocnf.evaluate() env.output(algo(), name='o') env.expect_output({'o': '0xDA'}) assert env.test(), "this input/output relation is not possible" print('Finished successfully.') if __name__ == '__main__': main()
#!/usr/bin/env python3 import io import algotocnf algo = lambda a, b: (a + b, a + b + 0x2) def main(): env = algotocnf.evaluate() env.solver_config.max_models = 5 # input is one of # {'x': '0b0100', 'y': '0b1010101'}, # {'x': '0b0101', 'y': '0b1010101'}, # {'x': '0b0110', 'y': '0b1010101'}, # {'x': '0b0111', 'y': '0b1010101'} x = env.input(bits='01??', name='x') y = env.input(constant=0x9, size=4, name='y') o1, o2 = algo(x, y)
env.output(o1, name='o1') env.output(o2, name='o2') env.expect_output( {'o1': '0b1101', 'o2': '0b1111'}, {'o1': '0b1110', 'o2': '0b0000'}, {'o1': '0b1111', 'o2': '0b0001'}, {'o1': '0b0000', 'o2': '0b0010'} ) assert env.test(), "this input/output relation is not possible" print('Finished successfully.') if __name__ == '__main__': main()
def main(): env = algotocnf.evaluate() env.solver_config.max_models = 10 md4 = MD4(env) # compute rounds = md4.update([0b10000000] + [0] * 15) # actual output for i, (a, b, c, d) in enumerate(rounds): env.output(b, name='B' + str(i)) # expected output after 1 round expectation = {} for i in range(48): expectation['B' + str(i)] = md4_trace[i] env.expect_output(expectation) assert env.test(), "this input/output relation is not possible" print('Finished successfully.')
c Lingeling SAT Solver c c Version ats1o0 cfbdebf2806fb8a4b5b9d366561d56a57c2b9470 c c Copyright (C) 2010-2013 Armin Biere JKU Linz Austria. c All rights reserved. c c released Mon Apr 4 12:21:28 CEST 2016 c compiled Mon Apr 4 12:36:57 CEST 2016 c c gcc (Debian 4.9.2-10) 4.9.2 c -Wall -O3 -DNDBLSCR -DNLGLOG -DNDEBUG -DNCHKSOL -DNLGLPICOSAT c Linux cm 3.16.0-4-amd64 x86_64 c c reading input file algotocnf_21-t9_diff-desc_ocnf_s=21.cnf c no embedded options c found 'p cnf 146368 693512' header c read 146368 variables, 693512 clauses, 2344824 literals in 0.0 seconds
c c 581.268 18% analysis c 81.040 3% bump … c 0.000 0% phase c 0.000 0% rdp c ---------------------------------- c 2.464 0% probe simple 48% c 1.496 0% probe basic 29% c 1.216 0% probe tree-look 23% c ================================== c 1.540 0% preprocessing 1% c 132.340 4% inprocessing 99% c ================================== c 133.880 4% simplifying c 3087.696 96% search c ================================== c 3221.580 100% all c c 22244136 decisions, 6904.7 decisions/sec c 11915604 conflicts, 3698.7 conflicts/sec c 966541398 propagations, 0.3 megaprops/sec c 3221.6 seconds, 163.1 MB
s SATISFIABLE v -1 2 -3 4 5 -6 7 -8 9 -10 -11 -12 -13 -14 15 -16 -17 18 19 20 21 -22 -23 24 25 v -26 -27 28 29 -30 -31 32 -33 34 -35 36 37 -38 39 -40 41 -42 -43 -44 -45 -46 47 v -48 -49 50 51 52 53 -54 -55 56 57 -58 -59 60 61 -62 -63 64 -65 -66 -67 -68 -69 v -70 -71 -72 -73 -74 -75 -76 -77 -78 -79 -80 -81 -82 -83 -84 -85 -86 -87 -88 …
Every boolean variable represents one bit.
? becomes 0 or 1.
Wait… you just showed me some integer examples.
But I want everything™
The more you want…
the more you need to check …
the slower you are.
Distinction between problems becomes important.
Encoding becomes important
(which guarantees are encoded implicitly/explicitly?)
Strongly related:
My library is just one of many. Like STP or boolector.
But it addresses specifically the issues I had during my master thesis (differential encoding, flexible encoding, export of CNF and solving it on a cluster). python was the easiest to use (no concurrency required, performance-critical part is limited to SAT solver).
It was a pleasure to work in that field and with those tools. Thanks to all involved. You are mentioned in my thesis' Acknowledgements.
… so far?!