## Data mining in Mathematics: A human example

Computers beat humans at Computation, Enumeration and even (though most people don’t realise this) Deduction. However, a central feature of most mathematics is relating to the literature, which in practise is gathering together results (and concepts) to be used for deduction. However, computers at present do not search the (human)  literature. In the hope of eventually teaching them, here are the steps by which I proved a lemma.

The  lemma is a statement of the following nature.

Lemma: For words a, b, A and B in the fundamental group of a surface satisfying some conditions, if ab=AB then a=A and b=B.

Here are the steps in my discovery of a proof:

1. The starting point was a result of Chas-Krongold, which was a special case in a more restricted context, plus knowing from Moira Chas that she had proved (but not published) a result similar to my lemma but in a more restricted context.
2. The Chas-Krongold work used what is called small cancellation theory. It is well known that a method that proves the same results in more generality (at the cost of weaker estimates) is $\latex\delta$-hyperbolicity, which is applicable in our context.
3. To use word-hyperbolicity, I tried to construct quasi-geodesics associated to ab and AB. This is what we can try to associate to elements.
4. The construction of the quasi-geodesics was the most obvious one. The issue was to prove it was a quasi-geodesic. Here I used a non-trivial theorem – local quasi-geodesics are quasi-geodesics.
5. To verify that what we obtained was a local quasi-geodesic, one used some geometry and another lemma – a lower bound on angles.
6. To get the lower bound on angles involved using another known trick from the literature – consider the commutator and show that it is small.
7. Now that we have quasi-geodesics, we use one of the main theorems  concerning $\latex\delta$-hyperbolicity, that quasi-geodesics are close to geodesics.
8. Now we use some geometry parallel to the combinatorics of Chas-Krongold and the trick of considering commutators to finish the proof.

The deductions are not difficult at any point. The main obstacle to automation (say of a computer as a collaborator) is to be able to make the right analogies and dig up the literature. Indeed the analogy is also not that obscure – early papers on $\latex\delta$-hyperbolicity do show that the results include those of small-cancellation theory.

The main barrier is understanding natural language. In the sequel I speculate on a scheme to try this.

## Stable and Unstable Curvatures: Lessons from Lohkamp

Many years ago I spent several months obsessed with a marvellous theorem of Lohkamp, without any new theorems of my own resulting. I have learnt both mathematical and moral lessons from this, which I explore in this posting (and possible sequels).

Theorem (Lohkamp): Let $(M,g)$ be a Riemannian manifold whose dimension is at least three. Then there is a Riemannian metric $g'$ on $M$, which can be taken to be arbitrarily close to $g$ in the $C^0$ metric, so that $(M,g')$ has negative Ricci curvature.

Three questions come to mind:

1. Why does this need dimension at least three?
2. The corresponding result for positive Ricci curvature is false. Why the asymmetry?
3. For what other curvature conditions can we prove such a result?

The third question amounts to asking what curvature properties are natural properties of the underlying metric space, or the underlying metric measure space. Both positive and negative sectional curvatures, more generally lower and upper bounds on sectional curvatures, can be expressed in terms of comparison triangles. This makes them metric properties. More recently it has been shown that lower bounds on Ricci curvature can be expressed in terms of so called optimal transport on metric measure spaces.

On the other hand, Lohkamp’s theorem shows that negative Ricci curvature is not an interesting natural property of metric measure spaces. More precisely, the set of Riemannian metrics with negative Ricci curvature is dense in the space of all Riemannian metrics – in particular there are no topological restrictions. In general, we would like to know which properties are dense. In particular there are no topological restrictions.

One can, to some extent, answer the first two questions relatively easily, but the answers are illuminating. I return to these in sequels. For now, I will just end with a moral from my previous failure.

Moral: We cannot avoid hard analysis. My previous attempts all amounted to attempting to reduce the problem to understanding the dominant term. However one (presumably) cannot avoid the stubborn persistence of a pair of comparable terms of opposite signs, one only slightly larger than the other. In this case, and many others that also are hard to intuitively understand, qualitative properties are determined by quantitative behaviour of the underlying system.

## Gromov’s entropy for sheafs

The entropy $h(S)$ of a set $S$ is the logarithm of the number of elements in $S$. Gromov’s remarkable insight (which he elaborated on in a course I attended a decade ago) by taking the logarithm we can ensure additivity: $h(S\times T)=h(S)+h(T)$. Perhaps one should also observe a more obvious property, monotonicity: $S\subset T\implies h(S)\leq h(T)$.

Gromov observed that in place of the logarithm of the cardinality, one can use any other quantity that is additive (and monotonic). Indeed there is another familiar additive (and monotonic) quantity: dimension. By using dimension in this way, Gromov introduced the idea of mean dimension of the space of holomorphic maps, which is analogous to the density of entropy.

Here I will comment on building on Gromov’s idea in a couple of different ways.  The starting point in both cases is that there is an obvious connection to Sheafs.

Sheafs and Subadditivity: Suppose we have a sheaf on a topological space $X$. This  means that we associate to each open set $U\subset latex X$ an object – here a set $S(U)$, perhaps with the structure of a vector space. If $V\subset U$, there is a corresponding restriction map from $S(U)$ to $S(V)$. Further, given elements $s_U\in S(U)$ and $s_V\in latex S(V)$ whose restrictions to $U\cap V$ agree, there is a unique element in $S(U\cup V)$ whose restrictions to $U$ and $V$ are $s_U$ and $s_V$. The obvious examples of sheafs are continuous (smooth, holomorphic) functions on $U$. Gromov considers sheafs of holomorphic functions with appropirate bounds to ensure finite-dimensionality.

Suppose we are given a sheaf and an appropriate notion of entropy $h(S(U)$ for the sets $S(U)$ with the given structure – for example, logarithm of the cardinality or dimension of vector spaces. We can then associate to open sets $U$ the function $H(U)=h(S(U))$. Additivity and monotonicity then give: $\min(H(U),H(V)) \leq(H(U\cup V)\leq H(U)+H(V)$.

Independence and Conditional Entropy: If $U\cap V=\phi$, then we see that $H(U\cup V)=H(U)+H(V)$, so the sets (or rather the systems with these sets as domains) are independent. We can take this to be  the definition of independence. More generally, we can define the conditional entropy: $H(U\vert V)=H(U\cap V)-H(V)$

This satisfies $0\leq H(U\vert V)\leq H(U)$, with equality in the second inequality characterising independence.

Domains of holomorphy:

We can consider the sheaf of holomorphic functions satisfying some appropriate conditions ensuring finite dimensionality: $latex\frac{\partial f}{\partial z_i}\leq Cf$ seem promising. We can then consider the function $H(U)=dim(s(U)$. We get associated conditional entropies.

If $U$ is contained in the holomorphic convex hull of $V$, then $H(U\vert V)=0$. Thus, we get a nice refinement of notions like domains of holomorphy.

Graphs and Colourings:

Given a countable graph, we consider its set of vertices as a discrete set. We can associate to a set $U$ of vertices the set of colourings of the vertices in $U$ using red, blue and green, so that adjacent vertices have different colours. This gives a sheaf, and the entropy may give insights into 3-colourings.

## Should computers enjoy solving puzzles?

People enjoy solving puzzles. In real life, solving puzzles is useful, for instance recognising a camouflaged predator.  But this is not a sufficient reason to be rewarded (emotionally) for solving the puzzle, rather than just for the outcome.

In a complex world, with a favourable outcome depending on many factors (including chance),  learning only from ultimate successes and failures is limited. It is better to reward actions that lead to success – solving puzzles, hard work, learning, etc. It is even better to develop ones judgement and taste, i.e., to know what to reward by how much.

While computers today have many wonderful reasoning powers – being the best chess players and mastermind quizzers, they seem to lack the taste and judgement that will lead to deeper discovery. But one can argue that this is because we forgot to put these qualities in them. While automated reasoning functions do have reward functions, and even compete for survival,  computers are not set up to decide, or even fine tune, what deserves a reward. A good way to introduce such taste may be to mimic the interplay between our own cognitive and emotional systems, and their relation to the world outside. Many computers may simply become tetris addicts, but perhaps a few deep thinkers will emerge.

Posted in Uncategorized | 1 Comment

## Hello world!

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!

Posted in Uncategorized | 1 Comment