Motivation
Plagiarism in student submissions is frequently a problem. Some trivial algorithm to detect “variable renaming” in student submissions would be characterizing variables by the number of references within a program. Consider the following rust program:
let a = 42;
let b = 32 + a;
let c = 2 * (a - b) + a.pow(2);
This program should be determined to be equivalent to the following program:
let c = 42;
let b = 32 + c;
let a = 2 * (c - b) + c.pow(2);
Apparently, the first variable is used 4 times in total. The second variable is used twice and the third variable is used once. In this particular problem, the counts are distinctive. But does this work for larger programs?
Graph-related preliminaries
-
An [undirected] graph is a tuple of vertices (V) and edges (E)
-
Vertices are any distinctive elements and edges are sets of two different vertices
-
e.g. ({v₀, v₁, v₂}, {{v₀, v₁}, {v₁, v₂}}) is an undirected graph
-
The degree of a vertex is defined as the number of edges containing the given vertex
-
e.g. In ({v₀, v₁, v₂}, {{v₀, v₁}, {v₁, v₂}}), deg(v₁) = 2 but deg(v₀) = 1.
-
A degree sequence is the monotonically decreasing sequence of degrees of vertices of a graph.
-
e.g. ({v₀, v₁, v₂}, {{v₀, v₁}, {v₁, v₂}}) has degree sequence (2, 1, 1)
-
An isomorphism between graph G₀ and G₁ is given if we can rename the vertices of G₀ such that G₁ is given
-
e.g. ({v₀, v₁, v₂}, {{v₀, v₁}, {v₁, v₂}}) is isomorphic to ({v₀, v₁, v₂}, {{v₀, v₂}, {v₁, v₂}})
-
e.g. ({v₀, v₁, v₂}, {{v₀, v₁}}) is not isomorphic to ({v₀, v₁, v₂}, {{v₀, v₂}, {v₁, v₂}})
Claim
Two graphs are isomorphic if and only if their degree sequences are equal.
Result
This counterexample shows that two graphs can have the same degree sequence, but are non-isomorphic. The common degree sequence is (3, 2, 2, 2, 2, 1, 1, 1). One violating property of isomorphism is recognizable if you consider the neighbors of the vertex with degree 3. On the top, we have one neighbor of degree 1. On the bottom, we have two neighbors of degree 1. This difference yields non-isomorphism. Trivially, this example can be found on Wikipedia.
On the other hand: If a graph is isomorphic, we can apply the relabelling to retrieve the same graph. Obviously the degree sequence of two equivalent graphs are the same.
In conclusion: isomorphism ⇒ equal degree sequences. equal degree sequences ⤃ isomorphism.
Variable renaming checks are impossible?
Does this mean perfect variable renaming checks are impossible? No, we only considered one characterization of a graph (its degree sequence). I thought of this one, because it can be applied so universally. But there are many other properties as well.
Maybe “variable renaming” is actually characterized by graph isomorphism already. In this case, we are looking for detection of isomorphism between two graphs which is a well-known problem. And at this point, I have not thought about the original problem, I want to solve, hard enough.