Introduction
The traveling salesman problem (TSP) can be described as follows: given a list of cities and the distances between each pair of cities, find the shortest route possible that visits each city exactly once then returns to the origin city. Specifically, given an undirected weighted graph , with an ordered pair of nodes set and an edge set where is equipped with spatial structure. This means that each edge between nodes will have different weights and each node will have its coordinates, we want to find a simple cycle that visits every node exactly once while having the smallest cost.
We will utilize GCNN (Graph Convolutional Neural Network), a particular kind of GNN, together with imitation learning to solve TSP in an interesting and inspiring way. In particular, we focus on the generalization ability of models trained on small-sized problem instances.
Preliminary
We direct reader who is interested in technical detailed to the paper for the preliminary and technical part of this project. The following is just a very brief summary.
Integer Linear Programming Formulation of TSP
We first formulate TSP in terms of Integer Linear Programming. Given an undirected weighted group , we label the nodes with numbers and define
where is a variable which can be viewed as a compact representation of all variables , . Furthermore, we denote the weight on edge by , then for a particular TSP problem instance, we can formulate the problem as follows.
This is the Miller-Tucker-Zemlin formulation\cite{MTZ-formulation}. Note that in our case, since we are solving TSP exactly, all variables are integers. This type of integer linear programming is sometimes known as pure integer programming.
Branch and Bound
We then solve TSP as the above ILP formulation by branch and bound. It's possible to do branch and bound sufficiently by choosing a good branch strategy, and since branching and bound involves performing a series of branching strategy, so we model this as Markov Decision Process (MDP) in its nature.
Now, our goal is clear: We want to learn from an expert (in this case, a SOTA modern solver ), which is typically called imitating learning.
Pipeline
Our learning pipeline is as follows. We first create some random TSP instances and turn them into ILP, then we use imitation learning to learn how to choose the branching target at each branching. Our GNN model produces a set of actions with the probability corresponding to each possible action (in our case, which variable to branch). We then use Cross-Entropy Loss to compare our prediction to the result produced by and complete one iteration.

Graph Convolutional Neural Network (GCNN)
One may wonder where does the GNN involve in our methodology, is it used to model the topology of the nodes of a particular TSP instances?
The answer is no. The GCNN is our model which learn how to perform branching given the state of the problem (e.g., given the current state of the explored recursion tree of the branch and bound algorithm). Intuitively, in the pipeline graph above,
- Top-left corresponds to TSP instances (red dots corresponding to actual cities in TSP problem).
- Bottom-left corresponds to our model (black dots corresponding to a node in our GCNN).
Now, it should clear that how we utilize GNN to help us to solve this TSP problem: We use GCNN to learn a strong branching strategy and use it to do branching whenever needed.
Experimental Result
We look at the walltime needed for the model trained on TSP10/TSP15 and tested on TSP25 for 100 instances (ordered by the walltime of ).


If we zoom-in to the first 80 and last 20 instances, we have the following.




Discussion
Here we list some selected discussion. Again, please refer to the paper for completeness.
Generalization Ability
We observe that our TSP10 and TSP15 imitation models outperform the solver on baseline test instances, and successfully generalizes to TSP15, TSP20, and TSP25. They perform significantly better on average than in difficult-to-solve TSPs as compared to easier instances. They also perform better in cases of larger test instances like TSP20 and TSP25 as compared to TSP10 and TSP15. This might be due to an inherent subset structure between TSP10 and TSP20 instances, and similarly TSP15 and TSP25 instances which might not be the case for smaller test sizes. Unlike other problems, when we formulate TSP as an ILP, the problem size is growing quadratically. In other words, when we look at the model performance, the generalization ability from TSP10 to TSP25 is not a , but rather a generalization in our formulation. By adapting this methodology on a more sophisticated algorithm which formulates TSP linearly, the generalization ability should remain, and the performance will be even better in terms of TSP sizes.
Bottlenecks and Future Work
There is a huge performance difference between our proposed model (also ) and the SOTA TSP solver, . Since the proposed model's backbone is branch and bound algorithm, by formulating TSP into an ILP, we lost some useful problem structures which can be further exploited by algorithms used in . But the existence of a similar pattern of growth in solving time for more difficult instances of larger TSP sizes even for and is promising, as our imitation model applied to these solvers should lead to similar time improvements. A major bottleneck is that SOTA solvers like , or , are often licensed, hence not open-sourced. This results in the difficulty of utilizing a stronger baseline and learn from which to get a further improvement.
Conclusion
Finding exact solutions of combinatorial optimization problems as fast as possible is a challenging avenue in modern theoretical CS. Our proposed method is a step toward this goal via machine learning. For nearly all exact optimization solving algorithms, there is some kind of exhaustion going on which usually involves decisions-making when executing the algorithm. For example, the cutting plane algorithm also involves decisions-making on variables when it needs to choose a variable to cut. We see that by using our model to replace several such algorithms, we can speed up the inference time while still retaining a high-quality decision strategy. Furthermore, our experimental results show that the model can effectively learn such strategies while using less time when inference, which is a promising strategy when applied to other such algorithms.