Paper: “Engineering a Sort Function”

“Engineering a Sort Function” by John L. Bentley and M. Douglas McIlroy gives a nice insight how theoretical algorithm design is applied in software engineering. I think I came across the paper, because Go’s sort function implementation is based on the described algorithm.

The paper outlines how the unstable sort algorithm has evolved over several iterations during development. It is based on Quicksort, but uses insertion sort for arrays of size ≤7. The solution to Dijkstra’s Dutch national flag problem gives the partition method for QuickSort. Pivot elements are selected with the median of 3 (array size >40) or median of 9 (array size ≤40). You can also see how cost models are applied such that asymptotic behavior becomes less important compared to the expense of particular operations. I think it is interesting, because cost models are not taught here at University of Technology, Graz, whereas asymptotic behavior is focused. However, this model fails where the comparison function can be arbitrary according to the interface and so the actual expense of comparison cannot be estimated.

In the end, I think it is a nice read for 2 hours assuming you are familiar with basic algorithm design.

“[…] This algorithm is easy to describe, and also easy to get wrong—Knuth tells horror stories about inefficient partitioning algorithms.”

“The key to performance is elegance, not battalions of special cases.”

Paper: “Engineering a Sort Function”

Using namespaces with lxml.etree

I need to admit, when I initiated and released ruledxml for the first time, I didn’t care much about namespaces. The usecase was not prevalent in my setting. This changed now.

Time to implement it properly and so I need to look at handling XML namespaces with python’s lxml.etree in more detail. In the following you can see a cheatsheet how you achieve a certain XML namespaced output with lxml.etree.Element instances.

The following table always shows a desired result following by the python source code to generate it.

<a><b/></a>
>>> a = lxml.etree.Element('a')
>>> b = lxml.etree.Element('b')
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<a><b/></a>'
<a xmlns="http://example.org/">
  <b/>
</a>
>>> a = lxml.etree.Element('a', nsmap={None: 'http://example.org/'})
>>> b = lxml.etree.Element('b')
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<a xmlns="http://example.org/"><b/></a>'
<a xmlns:ex="http://example.org/">
  <ex:b/>
</a>
>>> a = lxml.etree.Element('a',
...   nsmap={'ex': 'http://example.org/'})
>>> b = e('{http://example.org/}b')
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<a xmlns:ex="http://example.org/"><ex:b/></a>'
<ex:a xmlns:ex="http://example.org/">
  <ex:b/>
</ex:a>
>>> a = lxml.etree.Element('{http://example.org/}a',
...   nsmap={'ex': 'http://example.org/'})
>>> b = e('{http://example.org/}b')
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<ex:a xmlns:ex="http://example.org/"><ex:b/></ex:a>'
<ex:a xmlns:ex="http://example.org/" ex:attr="value">
  <ex:b/>
</ex:a>
>>> a = lxml.etree.Element('{http://example.org/}a',
...   nsmap={'ex': 'http://example.org/'})
>>> a.attrib['{http://example.org/}attr'] = 'value'
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<ex:a xmlns:ex="http://example.org/"
   ex:attr="value"><ex:b/></ex:a>'
<ex:a xmlns:ex="http://example.org/example"
      xmlns:org="http://example.org/organization"
      ex:attr="value">
  <org:b ex:example="value1" org:example="value2"/>
</ex:a>
>>> a = lxml.etree.Element('{http://example.org/example}a',
...   nsmap={'ex': 'http://example.org/example',
...          'org': 'http://example.org/organization'})
>>> a.attrib['{http://example.org/example}attr'] = 'value'
>>> b = lxml.etree.Element('{http://example.org/organization}b')
>>> b.attrib['{http://example.org/example}example'] = 'value1'
>>> b.attrib['{http://example.org/organization}example'] = 'value2'
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<ex:a xmlns:ex="http://example.org/example"
        xmlns:org="http://example.org/organization"
        ex:attr="value">
    <org:b ex:example="value1" org:example="value2"/></ex:a>'
<a xmlns:ex="http://example.org/" xml:lang="ja">
  <ex:b/>
</a>
>>> a = lxml.etree.Element('a',
...   nsmap={'ex': 'http://example.org/'})
>>> a.attrib["{http://www.w3.org/XML/1998/namespace}lang"] = 'ja'
>>> b = lxml.etree.Element('{http://example.org/}b')
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<a xmlns:ex="http://example.org/" xml:lang="ja"><ex:b/></a>'
<a xmlns:ex="http://example.org/" xml:lang="ja">
  <ex:b xml:lang="en"/>
</a>
>>> a = lxml.etree.Element('a',
...   nsmap={'ex': 'http://example.org/'})
>>> a.attrib["{http://www.w3.org/XML/1998/namespace}lang"] = 'ja'
>>> b = lxml.etree.Element('{http://example.org/}b')
>>> b.attrib["{http://www.w3.org/XML/1998/namespace}lang"] = 'en'
>>> a.append(b)
>>> lxml.etree.tostring(a)
b'<a xmlns:ex="http://example.org/" xml:lang="ja"><ex:b
  xml:lang="en"/></a>'
Using namespaces with lxml.etree

Buchtipp: “Darum nerven Japaner”


Darum nerven Japaner Buchcover

Fig. 1. “Darum nerven Japaner – Der ungeschminkte Wahnsinn des japanischen Alltags”

Der Autor Christoph Neumann erklärt erst im Nachwort wie er dazu kam dieses Buch zu schreiben. ここがヘンだよ日本人 (laut dem Autor mit “Die spinnen, die Japaner!” frei nach Asterix & Obelix zu übersetzen) ist eine populäre japanische TV-Sendung, in der er auftrat. Dort diskutieren die Anwesenden die Gewohnheiten der Japaner und hinterfragen deren Sinnhaftigkeit. Als Spin-Off dieser Auftritte schrieb er das Buch, welches zu einem der bekanntesten seines Genres wurde.

Als Kenntnisse greift er damit auf seine Erfahrungen als Deutscher mit längerem Auslandsaufenthalt in Japan zurück. Er illustriert japanische Eigenheiten anhand seiner Erfahrungen und schreckt nicht davor zurück sie mit harten Worten in krassen Gegensatz zu europäischen Gewohnheiten zu setzen. Die diskutierten Themen illustrieren sich in den 19 Kapiteln des Buchs:

  1. Vorwort: Dürfen Japaner nerven?
  2. Regeln: Das Volk will belehrt werden
  3. Schuhe: Das elfte Gebot: “Du sollst deine Schuhe ausziehen”
  4. Essen: Die mit dem Bauch denken…
  5. Schwimmbad: Japan im Schnelldurchschwimm
  6. Radfahren: Radfahren schwer gemacht
  7. Mafia: Verbrecher, die keine sind
  8. Warnungen: Achten Sie darauf, darauf zu achten!
  9. Verhütung: Pille killen und Föten töten
  10. Körpersprache: Das Gegenteil von Anmut
  11. Yamanote: Mehr als nur eine S-Bahn-Linie
  12. Spass: Kein Spass an der Freud
  13. Schlafen: Bett? Nein, danke!
  14. Diebstahl: Mein Geld, dein Geld – Geld ist für uns alle da
  15. Gleichheit: Reich wie Scheich und dennoch gleich?
  16. Fremdsprachen: Englisch hassen lernen
  17. Urlaub: Die Suche nach dem Vertrauten in der Fremde
  18. Müll: Parolen statt Mülleimer
  19. Big Brothers: Ist Gott eine japanische Firma?
  20. Flirt: Die Sau rauslassen, aber ordentlich

Ein netter gedanklicher 2-Tages Ausflug in den japanischen Alltag!

Buchtipp: “Darum nerven Japaner”

Time management, an analysis

This semester I consciously decided to reduce my number of courses I take. As far as academic work is concerned I need to focus on finishing the first and second semester and finish my master thesis. I reduced the amount of work in associations, I do not take some law course (oh, that’s been some while!) and I don’t take any courses of higher semester.

I wanted to discuss the amount of time I want to invest into something. I fixed holes in my schedule and organized remaining time. I have several categories and every category as a minimum amount of time I invest per week. Good opportunity to document that for future reference:

  • I consider 16 hours per day productive. So I have 8 hours of sleep per day.
  • I invest 42.5 hours per week in attending lectures, tutorials, meetings, visiting the Aikido dojo and quality time.
  • 69.5 hours are left. How do I use them?

There are 10 categories:

    project:
    programs I want to release, watch tutorials/videos, publishing content
    university:
    old courses I need to finish (exams, submissions)
    A2:
    course Analysis 2 and everything related
    LA2:
    course Linear Algebra 2 and everything related
    FOM:
    course Foundations of Mathematics
    master thesis:
    releasing subprojects for my master thesis, writing thesis document
    JAP:
    course Japanese Language and everything related
    日本語:
    learning Japanese on my own (especially Aikido & math)
    日本:
    exchange student year organization
    GLT16:
    upcoming Grazer Linuxtage – graphics work

And how much time do I invest into each category at minimum?
This sums up to 34 hours and I have 69.5 hours available, so this is flexible on purpose.
But it gives a relative measure for future reference:

project 3 hours per week
university 4 hours per week
A2 5 hours per week
LA2 4 hours per week
FOM 1 hours per week
master thesis 8 hours per week
JAP 3 hours per week
日本語 3 hours per week
日本 2 hours per week
GLT16 1 hours per week
Time management, an analysis

Boston Key Party: Graph 3-coloring with SAT solvers

The Boston Key Party was held between 4th and 6th of March 2016. A few colleagues of mine attended this event and asked for help with one particular problem:

Given a graph with 683 vertices and 1253 edges. Assign one of three colors to each vertex. No edge must connect two vertices of the same color.

Graph 3-coloring is a classic example of an NP-complete problem. So a SAT solver might be suitable for it as far as more efficient algorithms are not known.

Let’s solve this problem with an approach similar to an implementation I use in my master thesis: Generate the CNF using python, let a SAT solver do the hard work and parse the solver’s result back to python and print the assignment of colors to vertices.

The boolean logic is really simple: Every vertex gets assigned three boolean variables. We add the following constraints:

  • Only one of three variables can be true.
  • For every edge (s, t) we look up the corresponding colors for s and t: r1,g1,b1 and r2,g2,b2. We claim not both r1 and r2 can be true. Not both g1 and g2 can be true. Not both b1 and b2 can be true.

In code it looks like this:

def oneof3(w, a, b, c):
    w.add_clause(a, b, c)
    w.add_clause(-a, -b, c)
    w.add_clause(-a, b, -c)
    w.add_clause(a, -b, -c)
    w.add_clause(-a, -b, -c)

def notboth(w, a, b):
    w.add_clause(-a, -b)

def add_constraints(graph, w):
    # constraints for colors of one vertex
    for v, (red, green, blue) in graph.vertices.items():
        oneof3(w, red, green, blue)

    # constraint for edges
    for (src, dst) in graph.edges:
        r1, g1, b1 = graph.vertices[src]
        r2, g2, b2 = graph.vertices[dst]
        notboth(w, r1, r2)
        notboth(w, g1, g2)
        notboth(w, b1, b2)

The final source code looked like this (you need to adjust the path to the SAT solver executable; you should use lingeling):

#!/usr/bin/env python3

import re
import os
import sys
import logging
import tempfile
import itertools
import subprocess


# the output format is expected to be compatible with lingeling
SATSOLVER_EXECUTABLE = '/opt/lingeling-bal-2293bef-151109/lingeling'
SATSOLVER_TIMEOUT = None  # or specify an integer in seconds
VERTICES = 683


class Graph:
    def __init__(self, nb_vertices):
        self.edges = set()
        self.nb_vertices = nb_vertices
        self.vertices = {k: (3 * k + 1, 3 * k + 2, 3 * k + 3)
                         for k in range(nb_vertices)}

    def read_edges(self, filepath: str):
        """Read edges from a file containing '{src} {dest}' lines"""
        with open(filepath) as fd:
            for line in fd:
                if line.startswith('#'):
                    continue
                src, dst = [int(v) for v in line.split()]
                assert 0 <= src < self.nb_vertices \
                    and 0 <= dst < self.nb_vertices, "vertex number invalid"
                self.edges.add((src, dst))

    def get_assignment(self, model: set):
        """Given a set of positive or negative boolean variables,
        return a dictionary assigning 1, 2 or 3 to every vertex
        """
        ass = {}
        for v, (r, g, b) in self.vertices.items():
            assert (r in model) ^ (g in model) ^ (b in model)
            assert (r not in model) | (g not in model) | (b not in model)
            ass[v] = {r in model: 1, g in model: 2, b in model: 3}[True]
        return ass


# storage/writer for CNF data

class CnfStorage:
    def __init__(self, fd):
        self.fd = fd
        self.clauses = set()
        self.nbvars = 0

    def add_clause(self, *ints):
        self.nbvars = max(self.nbvars, max([abs(v) for v in ints]))
        self.clauses.add(tuple(ints))

    def finish(self):
        self.fd.write('p cnf {} {}\n'.format(self.nbvars, len(self.clauses)))
        for clause in self.clauses:
            self.fd.write(' '.join([str(i) for i in clause]) + ' 0\n')
            self.fd.flush()


# CNF primitives
#   Related: http://lukas-prokop.at/tools/truthtable/

def oneof3(w, a, b, c):
    w.add_clause(a, b, c)
    w.add_clause(-a, -b, c)
    w.add_clause(-a, b, -c)
    w.add_clause(a, -b, -c)
    w.add_clause(-a, -b, -c)


def notboth(w, a, b):
    w.add_clause(-a, -b)


def add_constraints(graph, w):
    # constraints for colors of one vertex
    for v, (red, green, blue) in graph.vertices.items():
        oneof3(w, red, green, blue)

    # constraint for edges
    for (src, dst) in graph.edges:
        r1, g1, b1 = graph.vertices[src]
        r2, g2, b2 = graph.vertices[dst]
        notboth(w, r1, r2)
        notboth(w, g1, g2)
        notboth(w, b1, b2)


# run the SAT solver

def run_solver(filepath):
    """Run SAT solver as subprocess. Return stdout"""
    proc = subprocess.Popen(
        [SATSOLVER_EXECUTABLE, filepath],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    try:
        outs, errs = proc.communicate(timeout=SATSOLVER_TIMEOUT)
        out = outs.decode('ascii')
    except subprocess.TimeoutExpired:
        proc.kill()
        outs, errs = proc.communicate()
        out = errs.decode('ascii')

    if proc.returncode == 20:
        raise RuntimeError('SAT solver returned UNSATisfiability')
    if proc.returncode != 10:
        errmsg = 'SAT solver returned error code {}: {}'
        raise RuntimeError(errmsg.format(proc.returncode, errs))
    return out


def interpret_output(out):
    """Read a model (set of positive/negative integers) from stdout"""
    model = []
    logging.debug(out)

    something = False
    for line in out.splitlines():
        if line.startswith("v "):
            matches = map(int, re.findall(" (-?\d+)", line))
            something = True
            for match in matches:
                if match == 0:
                    # trailing zero
                    continue
                assert -match not in model
                model.append(match)

    if not something:
        raise RuntimeError("Model empty - most likely the SAT solver " +
                           "output is incompatible with lingeling")

    return tuple(model)


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: ./graph-3-coloring.py <edges.txt>')
        sys.exit(1)

    fd, filepath = tempfile.mkstemp(prefix='graph-3-coloring')

    try:
        # generate graph
        graph = Graph(VERTICES)
        graph.read_edges(sys.argv[1])
        logging.info("{} read".format(sys.argv[1]))

        # store CNF
        w = CnfStorage(os.fdopen(fd, mode='w', encoding='ascii'))
        add_constraints(graph, w)
        w.finish()
        logging.info("Finished generating CNF file")

        # run SAT solver
        out = run_solver(filepath)
        model = interpret_output(out)
        assignment = graph.get_assignment(model)

        # print result
        for vertex, color in assignment.items():
            print('{}: {}'.format(vertex, color))
    finally:
        os.unlink(filepath)

And how long did it take to run it? 0.9 seconds 😎
Thank you lingeling!

Boston Key Party: Graph 3-coloring with SAT solvers

Project nihon ni ryuugakusei
(日本に留学生)

Starting with October 2016 I am going to Japan for 1 year with my girlfriend. Both of us got granted a student’s exchange year at the same place.

Setting
  • I will attend classes as a math undergraduate.
  • She attends classes as a math graduate student.
  • I just took my first Japanese classes this February.
  • She has more experience with the Japanese language (5 courses).
  • Our studies will be financed by the University of Graz and University of Kōbe. The funds cover the difference in living costs between Austria and Japan (housing will be more expensive in Japan). Thanks to all those institutions!
Schedule until then
  • Find appropriate classes I can attend at Kōbe University and ask for future recognition
  • I will try to finish my first and (coming) second semester in Mathematics. And finish my computer science master’s degree at TU Graz.
  • I will try to get my second course in Japanese during the summer term.
  • I will apply for a summer job for 2 months (July & August).
  • With September things get real. I will no longer live in my flat in Graz.
  • On 3rd of October we will have our first regular day at Kōbe University [math department].
  • I will return to Austria in August 2017 at the earliest (exams take place in July and August in Japan). It seems financially dull [to me] to come back to Austria at any point in time during this year.
Related work until then
  • Suspend participation in local associations (GLT, Aikikai Graz, not sure about LaTeX@Graz)
  • Reduce the amount of physical stuff I have here in Graz, store it somewhere and move out of the flat
  • Lots of planning with her – thanks sweety!
Being in Japan
  • My main focus one year abroad will be on Japanese writing, Japanese language, math and typesetting.
  • Japanese classes will be very appropriate as they are designed for exchange students with various Japanese skills. But math classes will give me a hard time. All appropriate classes for me are in Japanese.
  • I need to do at least 30 ECTS. The usual amount of ECTS per year at universities is 60 ECTS.
  • I will try to attend Aikidō classes there. Getting into a real Dōjō is supposedly difficult. But it should be easy to get into the Kōbe university Aikido club.
  • We will use the awesome train network in Japan to get around.
  • We will keep you informed about information channels, so you can reach us. But we need to organize that in the coming months.

So again: Where is Kōbe? We are close to Ōsaka, to the west (a bit to the south) of Tokyō. It takes us 3 hours with Shinkansen to get to Tokyō. And for those of you who actually cared to read that much, I have that one for you:

The Austrian flag at the Kōbe exchange programme website

Figure 1. The Austrian flag at the Kōbe exchange programme website

This is truly a great opportunity for us!
Thanks for the support we received 🙂

Project nihon ni ryuugakusei
(日本に留学生)

Paper: “Mastering the game of Go with deep neural networks and tree search”

Most recently, a paper “Mastering the game of Go with deep neural networks and tree search” was published in Nature. It describes a neural network architecture which for the first time won against a 2-dan professional Go player (Fan Hui) without handicaps achieving the title of the strongest Go engine existing. Between 5th and 15th of March 2016 a match between the engine – namely AlphaGo – and Lee Se-dol is scheduled. AlphaGo is pushed by Google.

It was difficult to me to follow the formal details lacking experience in machine learning. Here I would just like to cite the most interesting quotes:

[…] These games may be solved by recursively computing the optimal value function in a search tree containing approximately bd possible sequences of moves, where b is the game’s breadth (number of legal moves per position) and d is its depth (game length). In large games, such as chess (b ≈ 35, d ≈ 80) and especially Go (b ≈ 250, d ≈ 150), exhaustive search is infeasible, but the effective search space can be reduced by two general principles. First, […] position evaluation […] Second […] sampling actions from a policy p(a|s) that is a probability distribution over possible moves a in position s.

[…] evaluating positions using a value network, and sampling actions using a policy network.

We trained a 13-layer policy network […] from 30 million positions from the KGS Go Server.

We also tested against the strongest open-source Go program, Pachi.

The final version of AlphaGo used 40 search threads, 48 CPUs, and 8 GPUs. We also implemented a distributed version of AlphaGo that exploited multiple machines, 40 search threads, 1202 CPUs and 176 GPUs. […] The distributed version of AlphaGo was significantly stronger, winning 75% of games against single-machine AlphaGo and 100% of its games against other programs.

A policy p(a|s) is a probability distribution over legal actions a ∈ (s). A value function is the expected outcome if all actions for both players are selected according to policy p, that is, vP(s) = [zt|st = s, at…T ~ p].

Our rollout policy pΠ(a|s) contains less handcrafted knowledge than state-of-the-art Go programs.

Time controls for formal games were 1h main time plus three periods of 30s byoyomi.

The neural network architecture itself is described in the Methods Appendix. The (formal and informal) game results are provided as Data sheet. And in Methods/Evaluation a closing parenthesis is missing in the denominator of the logistic function 😉

Paper: “Mastering the game of Go with deep neural networks and tree search”

I computed M74207281

On 7th of January 2016, GIMPS found Mersenne prime number M74207281. GIMPS is a project combining efforts of a large crowd of volunteers contributing computing power and a university server combining the result – maintained by Curtis Cooper. If a number can be found satisfying the conditions of a Mersenne prime number, the number will be checked with primality tests; according to the article it took 31 days to do so.

M74207281 is specified 2274 207 281-1 and a friend of mine at GoGraz wondered how to compute this number in decimal representation. I suggested square and multiply, but wasn’t sure how long it might take.

So I implemented the exp-by-squaring-iterative algorithm on Wikipedia (didn’t thoroughly check whether it is the most efficient algorithm for this kind of problem):

package main

import (
	"log"
	"math/big"
	"os"
	"strconv"
)

var zero *big.Int
var one *big.Int
var two *big.Int
var hundred *big.Int

func init() {
	// initialize global values
	zero = big.NewInt(0)
	one = big.NewInt(1)
	two = big.NewInt(2)
	hundred = big.NewInt(100)
}

// ExpBySquaringIterative exponentiates by squaring meaning it computes x^n
// in logarithmic time in size of n
func ExpBySquaringIterative(x, n *big.Int) *big.Int {
	switch n.Cmp(zero) {
	case -1:
		x.Div(one, x)
		n.Neg(n)
	case 0:
		c := big.NewInt(1)
		return c
	}

	y := big.NewInt(1)
	for n.Cmp(one) > 0 {
		x2 := big.NewInt(0)
		x2.Set(x)

		nm := big.NewInt(0)
		if n.Bit(0) == 0 {
			x.Mul(x2, x2)
			n.Div(n, two)
		} else {
			y.Mul(x2, y)
			x.Mul(x2, x2)
			nm.Sub(n, one)
			n.Div(nm, two)
		}
	}
	return x.Mul(x, y)
}

func main() {
	if len(os.Args) == 1 {
		log.Println("usage: go run num_2n1.go <exponent>")
		log.Println("  M74207281 is given by exponent 74207281")
		os.Exit(1)
	}

	expstr := os.Args[1]
	exponent, err := strconv.Atoi(expstr)
	if err != nil {
		log.Fatal(err)
	}

	log.Println("TBD   ExpBySquaringIterative")
	result := ExpBySquaringIterative(big.NewInt(2), big.NewInt(int64(exponent)))
	result.Sub(result, one)
	log.Println("DONE  ExpBySquaringIterative")
	log.Println("TBD   computing decimal representation")
	log.Printf("2^%d - 1 = ", exponent)
	log.Println(result)
	log.Println("DONE  computing decimal representation")
}

And how long does it take to compute this number?

Computation of M74207281

Figure 1. Computation of M74207281

Was there anything interesting about it?

  • The decimal representation uses 22.4 MB memory.
  • It took 1 hour 27 minutes to compute the decimal representation. But it only takes 4 seconds to actually compute the number in binary. By far most of the hours are spent on computing the decimal string representation.
  • The builtin function Exp (you need to set m to nil) takes 1 hour 17 minutes; so it takes a little bit less time.
  • For reference, the first digits are 300376418084… and the last digits are …073391086436351 as can be seen on lcns2’s page.
I computed M74207281

When is Tutte’s theorem in K₅ violated?

Theorem

A graph, G = (V, E), has a perfect matching if and only if for every subset U of V, the subgraph induced by V − U has at most |U| connected components with an odd number of vertices.
Tutte’s theorem

Illustration
tutte_theorem_examples
Figure 1. Tutte theorem examples

Consider the two examples given in Figure 1. G is a bipartite graph and we consider two possible subsets U the induced subgraph of G-U is given on the right side.

  1. For U₁ the number of odd components in G-U is 1 and the size of U is given with 3. Because 1 ≤ 3, the Tutte theorem holds.
  2. For U₂ the number of odd components in G-U is also 1 and the size of U is also given with 3. Again, the Tutte theorem holds.

For any U, this condition will hold. Underneath the perfect matching is given (vertex 1 is associated to vertex 2, etc).

Problem setting

planar graph K5

Figure 2. Planar graph K5 and a reduced variant

Consider graph K₅. A perfect matching clearly cannot exist for K₅, because a perfect matching can only exist if the number of vertices is even (as every vertex is associated to another vertex). Because the Tutte theorem provides a necessary and sufficient condition for the existence of a perfect matching, it should yield

For some subset U of V, the subgraph induced by V − U
has less than |U| connected components with an odd number of vertices.

Question

What is U for K₅? The solution is equivalent for its incomplete variant, which was originally given to me.

Solution

I wasn’t able to come up with a solution. So I wrote a program to try all combinations:

#!/usr/bin/env python3

import itertools

def component_sizes(G):
    """component sizes in an undirected graph"""
    comps = list(set([v]) for v in G[0])
    for e in G[1]:
        s, t = e[0], e[1]
        idx_s, idx_t = -1, -1
        for i, comp in enumerate(comps):
            if s in comp:
                idx_s = i
            if t in comp:
                idx_t = i
        if idx_s == idx_t:
            continue
        else:
            for item in comps[idx_t]:
                comps[idx_s].add(item)
            comps.remove(comps[idx_t])
    return tuple(sorted(len(comp) for comp in comps))

assert component_sizes(([1], [])) == (1,)
assert component_sizes(([1, 2], [])) == (1, 1)
assert component_sizes(([1, 2], [(1, 2)])) == (2,)
assert component_sizes(([1, 2, 3], [(1, 2)])) == (1, 2)

def subgraph(G, excl_vs):
    """Subgraph induces by excluding vertices"""
    red_V = tuple(filter(lambda v: v not in excl_vs, G[0]))
    red_E = tuple(filter(lambda e: e[0] not in excl_vs and e[1] not in excl_vs, G[1]))
    return (red_V, red_E)

assert subgraph(([1], []), [1]) == (tuple(), tuple())
assert subgraph(([1, 2], [(1, 2)]), [1]) == ((2,), tuple())
assert subgraph(([1, 2, 3], [(1, 2), (2, 3), (3, 1)]), [1]) == ((2, 3), ((2, 3),))

if __name__ == '__main__':
    def excl_vs_selection(G):
        for size in range(len(G[0])):
            for sel in itertools.combinations(G[0], r=size):
                yield set(sel)

    K5 = ((1, 2, 3, 4, 5), list(itertools.combinations(range(1,6), r=2)))
    reduced_K5 = ((1, 2, 3, 4, 5), [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])

    G = K5
    for excl_vs in excl_vs_selection(G):
        sub = subgraph(G, excl_vs)
        sizes = component_sizes(sub)
        odd_sized_comps = len(list(map(lambda v: v % 2 == 0, sizes)))
        if odd_sized_comps > len(excl_vs):
            print(excl_vs)
            #raise ValueError("Tutte theorem violated with exclusion vertices {}".format(excl_vs))

The program yields set(). Hence the only solution is the empty set.

Consider U = {}. Then the induced graph is K₅ itself. The number of connected components in K₅ is 1 (all pairs of vertices are reachable from each other) and this component has an odd number of vertices (namely 5). The size of U is 0. And 1 > 0, so the Tutte theorem is violated and no perfect matching exists indeed.

When is Tutte’s theorem in K₅ violated?