The statistics of pattern matching
The whole basis of our interpretation
of Gobekli Tepe rests on the interpretation of Pillar 43, and our
interpretation of Pillar 43 rests entirely on our statistical analysis. So,
this is the most important aspect of our work. However, the statistical
analysis of the patterns on Pillar 43 is a little complicated. Unless you are
familiar with handling the statistics of configurations of objects, as I am, and with pattern matching, you might not
understand the justification we provided in our Fox paper. Therefore, rather
than rushing straight into the statistical analysis of Pillar 43, it will
probably be useful to set out some key ideas. To do this I’ll take you through
a series of examples that contain the core ideas, gradually increasing their
complexity until we reach a level sufficient to analyse Pillar 43. You might
want to think of this as a training exercise. But hopefully you’ll find it
interesting and not too cumbersome.
Training
Let’s start with a simple example of
pattern matching. Consider a pattern formed from three squares, where each
square is connected to at least one other by one of its sides (a bit like the
game Carcassonne). The squares must line up perfectly as though on a square
grid, and diagonal connections via corners don’t count. How many distinct
patterns are there, taking into account that rotations and reflections of the
whole pattern are allowed? The answer is just 2 (see Figure 1). The only
distinct patterns are three squares in a row, and three squares forming a
corner. Therefore, if someone randomly selects the same pattern of three
connected squares as you, after allowing for rotations and reflections, you
should not be at all surprised. For the same problem, but with four squares,
there are only 5 distinct patterns (see Figure 1). For five squares, there are
13 distinct patterns. Now it would be a little, but not especially, lucky if
someone chose the same pattern of 5 connected squares at random as you
did. This problem is a good one to start with because there is a definite
number of distinct patterns that can be recognised absolutely. Another way of saying
this is that there is no noise in these patterns – they can be identified
without error, by a human or suitably equipped computer. Even if a pattern is
selected that is rotated by multiples of 90 degrees or is reflected
horizontally or vertically, it can still be classified among the 13 distinct
patterns. Another nice thing about this problem is that the chance of selecting
a specific pattern is not necessarily uniform, i.e. it is not necessarily true
that each distinct pattern has the same chance of being selected at random. It
depends on the rules of construction. For example, consider these two different
rules. According to one game, a pattern is selected at random from among the 13
distinct patterns – we call this selecting a pattern with ‘uniform’
probability. The probability of selecting any specific pattern is clearly 1 in
13 in this case. Now consider another game where patterns are constructed by
building a pattern from scratch, where each new square, up to the maximum of 5,
is added by randomly choosing (with uniform probability) which edge of the
already placed squares to join it to. In this case, the probability of ending
up with 5 squares in a straight line is different to the probability of ending
up with a capital ‘T’ shape.
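The second game can be worked through exactly. The sketch below assumes one reading of the rule, namely that every exposed edge of the squares already placed is equally likely to receive the next square. The function names canonical and growth_distribution are illustrative only and do not come from any paper.

```python
from collections import defaultdict
from fractions import Fraction


def canonical(cells):
    """Canonical form of a shape, treating rotations and reflections as the same."""
    variants = []
    pts = list(cells)
    for _ in range(4):
        pts = [(y, -x) for x, y in pts]                  # rotate by 90 degrees
        for shape in (pts, [(-x, y) for x, y in pts]):   # the shape and its mirror image
            min_x = min(x for x, _ in shape)
            min_y = min(y for _, y in shape)
            variants.append(tuple(sorted((x - min_x, y - min_y) for x, y in shape)))
    return min(variants)


def growth_distribution(n):
    """Exact probability of each distinct n-square pattern under edge-by-edge growth."""
    dist = {canonical([(0, 0)]): Fraction(1)}
    for _ in range(n - 1):
        nxt = defaultdict(Fraction)
        for shape, p in dist.items():
            occupied = set(shape)
            # assumption: each exposed edge is an equally likely place to attach a square
            targets = [(x + dx, y + dy)
                       for x, y in shape
                       for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if (x + dx, y + dy) not in occupied]
            for t in targets:
                nxt[canonical(occupied | {t})] += p / len(targets)
        dist = dict(nxt)
    return dist


dist = growth_distribution(5)
line = canonical([(i, 0) for i in range(5)])
t_shape = canonical([(0, 2), (1, 2), (2, 2), (1, 1), (1, 0)])
print("straight line:", dist[line])
print("capital T:    ", dist[t_shape])
```

Under this reading of the rule the capital 'T' turns out to be several times more likely than the straight line, because there are more routes by which it can be built.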
Figure 1. The square puzzle described
in the text for 3 and 4 squares.
There are some nice similarities with
the patterns at Gobekli Tepe here. First, there are around 13 distinct animal
patterns at Gobekli Tepe currently easily visible, although that number will
likely grow as excavations continue. And second, some of the animal patterns at
Gobekli Tepe have been carved ‘upside-down’ or reflected ‘left-right’. Finally,
if we look at all the animal patterns carved on the broad sides of Gobekli
Tepe’s pillars, we see there are not the same number of each. For example,
there are more fox patterns than lion patterns currently excavated. We can,
therefore, if we want, make up different rules for selecting animal patterns at
random from those that appear at Gobekli Tepe. In one game, we can select them
with uniform probability, i.e. each has a probability of 1 in 13 of being
selected. In another game, we can select them according to their relative
abundance at Gobekli Tepe. So, in this case, we would have a higher probability
of selecting a fox at random than a lion. That is, if twice as many fox as lion
symbols have been excavated at Gobekli Tepe, we can ensure, if we wish, that a
fox has twice the probability of being selected as a lion. We won’t go into the
details of this here, but it is important to realise that this type of game can
be played in principle.
Okay, now let’s make the game a
little more complex. Sticking with our patterns made from 5 squares joined to
each other along at least one flat edge, let’s now consider selecting 7 of
these 5-square patterns at random and placing them on a line. There is no
problem with selecting the same pattern more than once, or with selecting a
rotated or reflected version of a pattern. Let us for the moment consider that
each pattern is selected randomly with uniform probability from the 13 distinct
patterns, i.e. they all have the same chance of being selected. We can ask, how
many distinct sequences are there, supposing we read left-to-right along our
line? The answer is simply 13^7, or about 63 million, so the chance of selecting
any one particular sequence is (1/13)^7, or about 1 in 63 million.
Therefore, if you were asked to make a 7-pattern sequence of these 5-square patterns
at random, you should be very surprised indeed to select the same sequence as
someone else. Of course, if 63 million people play this game once each, or if
you play it 63 million times with a (very patient and understanding) friend,
you will then expect, perhaps, to make the same sequence once. But for just two
people playing the same game just once, the probability of selecting the same
sequence is extremely low. In fact, if you did choose the same sequence as
someone else, you would be entitled to suspect foul play.
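The arithmetic here is easy to check. The short Python sketch below takes the 13 patterns and 7 positions from the text; the variable names, and the choice of 13^7 plays to stand for "63 million", are illustrative assumptions.

```python
from fractions import Fraction

PATTERNS = 13   # distinct 5-square patterns, as counted in the text
LENGTH = 7      # patterns placed along the line

sequences = PATTERNS ** LENGTH        # distinct left-to-right sequences
p_match = Fraction(1, sequences)      # chance that two independent players agree

plays = sequences                     # assumption: roughly 63 million plays
expected_matches = plays * p_match    # average number of matching plays
p_at_least_one = 1 - (1 - float(p_match)) ** plays

print(sequences, expected_matches, round(p_at_least_one, 3))
```

Note that an average of one match over that many plays is not a guarantee: the chance of at least one match is only about 63%.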
Now, consider Figure 2. What does it
say? You should find you can read this pattern quite quickly – it shouldn’t
take much more than, say, 10 seconds to realise it reads ‘science’. Now the
fact that you can read this pattern is very interesting. A computer might also
be able to read it, if it had some particularly sophisticated artificial
intelligence (AI) software programmed to read arbitrary writing, but this
software would need to be extremely complex. You, however, can do it easily. This
just shows how powerful and effective the human brain is at pattern
recognition.
Figure 2. What does it say?
Let’s think about how your brain, or
some sophisticated software, might attempt to read this complex pattern by
considering the necessary sequence of logical steps. First, it has to realise
the pattern is made from separate shapes. Next, it has to realise that each of
the large shapes is a letter, while the small shapes can be ignored. Next, it
has to match the shapes against the known alphabet. To do this it has to be
able to ‘look past’ or reject the noise, which in this case is in the form of
an unusual font. It also has to be able to rotate patterns, and be able to
‘read’ a circular arrangement of letters, and allow for some letters to be
somewhat out of perfect circular alignment. It then has to match the
arrangement of these letters against known words. And all of this is achieved
in seconds by a human brain.
Now, this game is no different to our
previous game of 7 x 5-square patterns on a line except for two complications.
First, the sequence is circular, or nearly circular, with no pre-defined
beginning or end. Second, there is some ‘noise’ in this pattern, with noise
occurring in the form of distracting small patterns separate from the letters,
and the letters themselves containing significant ‘noise’ in the form of an
unusual font. The issue of noise is especially important for our problem. Our
human brains are easily able to distinguish the ‘signal’ from the ‘noise’ in
these letters. In fact, we do this all the time whenever we read anything. You
are doing it right now. We can read many different fonts, of any size and
colour, on any background. We can do this even if words are misspelled, or if
some letters are back-to-front or upside-down, or if the word is ‘bent’ into a shape.
The reason we are extremely good at pattern matching, much better than any
existing computer software, is that it is important for our survival. Without
pattern matching, we would not have evolved much further than pond-life. We
need pattern matching not just for reading, but also for navigating our way
around a room, recognising faces, understanding danger etc. We are
exceptionally good at it. To do this, we need to be able to match patterns with
known templates by rejecting noise. But there is a limit to what we can do. The
more noisy a signal becomes, the harder it becomes to decode it. Too much
noise, and the signal is lost. Artificial intelligence vision systems have the
same problem. If there is too much noise, or if a pattern is too complex, then
AI cannot decode what it is seeing.
Now, on pillar 43 we have a
combination of difficult pattern matching problems. First, we don’t know in
advance what the animal symbols represent. Therefore, we don’t know if they
actually represent wild animals, or if they perhaps somehow represent fruit, or
letters of a word, or even constellations. And second, given a specific system
of symbolic representation (i.e. wild animals, or fruit, or letters, or
constellations), each symbol might contain a lot of noise when compared to a
specific example of that general type. You might think it is safe to assume the
animal symbols do actually represent wild animals, since the signal-to-noise
ratio would then be very high. We can then easily match each animal symbol to a
wild animal – the lion symbol represents a wild lion, and so on. But where
would this get us? And what is the probability of being able to do this by pure
chance? Of course, the probability of being able to do this by pure chance is
precisely 1, since all the animal symbols are indeed of wild animals.
Therefore, in a scientific sense, we cannot confirm that this is the correct
decoding of Pillar 43. It might be. It might not be. Essentially, matching the
animal symbols to wild animals is too easy. We learn nothing from doing this.
Instead, what we should do is
consider a variety of different systems of symbolic representation (fruit,
letters, constellations etc) until we find a system of representation for which
the match between all the animal symbols and their represented objects is very unlikely to occur by pure chance. At
that point, if the match is significant in a scientific sense, say with a chance of one in a
million of occurring by pure chance, then we can be satisfied that we have
almost certainly found the correct system of representation, be it fruit,
letters or constellations. We can then try to decode the ‘meaning’ of Pillar 43
with our new understanding of what the symbols almost certainly represent.
However, before tackling Pillar 43,
let’s consider another ‘training’ problem which will include all the elements
or complications we need to be able to decode Pillar 43. In particular, this
training problem will show how we can handle the issue of ‘signal-to-noise’ in
pattern matching, in an approximate sense, and how we can handle the problem of
the specific spatial ordering of the patterns, i.e. their distinct
configuration on the pillar.
So, let’s consider a football team of
13 players in a gym hall, consisting of 8 on-field players and 5 subs. Let us suppose
there is a training session where the 8 on-field players are standing around
the coach during a break, listening to her instructions. Now, suppose each of
the 8 on-field players is standing inside a small circle containing a
caricature of a face. These caricatures have been painted on the gym floor by a
professional painter. Got the picture? Basically, there are 8 people standing
on 8 caricatures, and 5 other people are out of the game. The caricatures are
all good likenesses of one of the players, but they are not photographic. In
other words, the caricatures are noisy – they accentuate some aspects of a
person’s face and have a ‘cartoonish’ appearance. Nevertheless, as you are
expert at recognising faces you can easily tell them apart, and match them to
the players. Now, you look at the caricatures and realise that the 8 on-field
players are all standing on what appears to be their own caricature. What would
you think had happened? Would you think that this is just luck, or instead
would you think that the coach had purposefully selected these players and
asked them to stand in ‘their’ circles? You would naturally think the latter,
and you would be right to do so. You might just do a ‘double-check’ of the
caricatures to make sure you had not misinterpreted any, but once satisfied
with that, you would rightly assume the players had been asked to stand in
their circles. The reason you would naturally come to this conclusion, is that
the probability of this occurring by pure chance, assuming that any player has
an equal chance of being chosen, is 1/13 x 1/12 x 1/11 x 1/10 x 1/9 x 1/8 x 1/7
x 1/6 = 5!/13! where we have used the ‘factorial’ notation (!). This is very
nearly a chance of 1 in 52 million. This is a tiny chance, and in scientific
terms is usually considered sufficiently small to assume a hypothesis is
correct. Now, in this problem, we clearly assumed there is only one ‘instance’
of each person, i.e. we cannot have the same person in more than one circle.
And we also assumed any player had an equal chance of being chosen from the
pool of 13. But, this does not reflect the patterns at Gobekli Tepe, because
there are several pillars at Gobekli Tepe where the same animal pattern is used
more than once on a pillar, and we also know that some patterns occur more
frequently than others. Therefore, for consistency with Gobekli Tepe, we should
allow a person to appear in more than one circle (even though this is
physically impossible, it is appropriate for this analogy), and likewise we
should allow the same caricature to appear more than once. The probability for
finding each player standing on their own caricature is now (1/13)^8,
i.e. around 1 in 816 million, assuming a uniform probability for choosing each
player. Even more unlikely to occur by pure chance.
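Both figures can be reproduced with exact fractions. This is a minimal Python sketch; the constant and variable names are illustrative.

```python
from fractions import Fraction
from math import factorial

POOL, CIRCLES = 13, 8   # players in the squad, caricatures around the coach

# Each player used at most once: 1/13 x 1/12 x ... x 1/6 = 5!/13!
p_no_repeats = Fraction(factorial(POOL - CIRCLES), factorial(POOL))

# Players (and caricatures) allowed to appear more than once: (1/13)^8
p_repeats = Fraction(1, POOL) ** CIRCLES

print("no repeats: 1 in", 1 / p_no_repeats)
print("repeats:    1 in", 1 / p_repeats)
```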
But to help us with the problem of
identifying the patterns on Pillar 43 we need to add yet another level of
difficulty. So far with our training game, we have assumed that players and
their caricatures can be identified perfectly. That is, although the caricatures
are not photo-realistic and effectively include some noise, there is not enough
noise to be in any doubt about associating a player with a caricature. This is
not true of the patterns on the Vulture Stone. Therefore, we need to make the
game more difficult. We can do this by making the caricatures more noisy and/or
making some of the players’ faces more similar. So, let’s assume that a few of
the players are sufficiently similar, or their caricatures are sufficiently
poorly drawn, that it is hard to tell which is definitely associated with
which. Also, we can make caricatures more noisy by adding, perhaps, the
occasional beard, or eye-patch, or hat. Sounds crazy perhaps - but in terms of
facial recognition, adding these extra features is effectively adding noise to
make facial recognition more difficult, and that suits our purpose here. Now,
you might think that if any of our caricature – player associations become
difficult to establish with absolute confidence, that we can no longer decide
whether the coach asked the players to stand on their caricatures, i.e. that we
must simply give up. But this is wrong. We can still estimate the chance for any
specific set of player placements, and provided this result is statistically
significant, we can still make a fair decision about whether we think the players
were asked to stand on their caricatures or not.
Let’s start with just one difficult
case – let us suppose that there are two players, Neil and Nigel, that both
look like one of the caricatures, but that in all the other cases we can be
quite sure, despite any disguises, of which caricature represents each player.
And let us suppose that this particularly difficult caricature occurs only once
in the circle around the coach. In this case, there are two different
combinations of groups of players that could be chosen that would provide a
good fit to the set of caricatures. One group would involve Neil, while the
other group would be identical except Neil is swapped for Nigel. This means
there are two possible combinations, out of the total that can occur at random,
that provide a ‘perfect’ fit. This means our chance of getting a perfect fit
has doubled, i.e. it is now only one in 408 million if players are chosen at
random with uniform probability. So, if either of these two combinations of
players occurs, we can safely assume they were chosen deliberately and asked to
stand on their caricatures.
Now let us suppose that the noise
level increases until it becomes difficult to recognise any of the players’ caricatures
with absolute confidence. In this case, one way of proceeding would be to rank
each player against each caricature in terms of how good a likeness they are.
The best fit scores 1 while the worst fit scores 13. We would then have 8 lists
of 13 numbers. For example, suppose after doing this that we found that the
players all scored 1 for the caricatures they are stood on. In this case, the
probability of this happening is 1 in 816 million, again. If instead, they all
scored 2, there are 2^8 = 256 different combinations of players that
are at least as good a fit as the one actually stood there. The probability
of a selection of players this good occurring by pure chance is then (2/13)^8
= 256/816 million = about 1 in 3.2 million. This is still very significant, and so we
should conclude that the players have almost certainly been chosen
deliberately to stand on caricatures that they resemble quite well.
Moving on, if instead, they all
scored 3, then the probability of this happening by pure chance is (3/13)^8
= 6561/816 million = 1 in 124 thousand. From a scientific perspective, we
should conclude from this that the players have very likely been chosen
deliberately to stand on caricatures that they resemble quite well, but some
doubt is starting to creep in.
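The ranking procedure amounts to multiplying the ranks together and dividing by the total number of line-ups. The Python sketch below uses a helper called chance_from_ranks, a name chosen only for illustration, and covers the cases discussed so far, including the Neil and Nigel case, where one circle has rank 2 and the rest rank 1.

```python
from fractions import Fraction
from math import prod


def chance_from_ranks(ranks, pool=13):
    """Chance that a random line-up fits at least as well as the one observed."""
    # a rank of r in a circle means r of the `pool` players fit that caricature
    # at least as well as the player standing there
    good_lineups = prod(ranks)
    return Fraction(good_lineups, pool ** len(ranks))


cases = {
    "all ranked 1": [1] * 8,
    "one tie for 1st": [2] + [1] * 7,
    "all ranked 2": [2] * 8,
    "all ranked 3": [3] * 8,
}
for label, ranks in cases.items():
    p = chance_from_ranks(ranks)
    print(f"{label}: about 1 in {float(1 / p):,.0f}")
```

Mixed rankings, such as [1, 1, 2, 1, 3, 1, 1, 2], are handled in exactly the same way.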
Now let’s add a final level of
difficulty that will take us to the level needed to analyse Pillar 43. Let us
suppose that the coach is still surrounded by 8 circles, as before, in which
the caricatures are drawn, and the circles are all evenly spaced around the
coach, but the players are not all standing directly inside their respective
circles. Perhaps they have been chatting to each other, or have got bored of
the coach’s instructions, and have moved slightly outside their respective
circles. Statistically, this doesn’t change anything. Provided they have not
swapped positions the statistics are the same as before.
We could take this ‘positional’
correlation a notch higher and add another level of difficulty where the
circles with their inscribed caricatures are not evenly spaced around the coach.
This more accurately reflects the situation on Pillar 43. So, let’s suppose the
circles are clustered into two groups, with 4 on one side of the coach and 4 on
the other. We also see that the players are standing in two groups according to
the grouping of their respective caricatures. Now this situation is more
statistically significant than before. Let’s see how we can take account of
this ‘positional’ correlation.
To analyse this situation, we need to
work out how likely it is that two groups of players like this could form at
random. There are several ways this problem can be tackled exactly, but we want
a very simple way of analysing this situation, that is nonetheless quite
accurate. In other words, we want a simple approximation – like a
‘back-of-the-envelope’ calculation. So, let’s phrase the problem in simple
terms – let’s simplify it in such a way that the statistical estimate
‘overestimates’ the true probability. We can then provide an ‘upper bound’ to
the probability, i.e. we can say the probability is less than a certain
level, which is often good enough to make a scientific declaration.
So, by looking at the size of the circles
containing caricatures, we see that we can fit a maximum of 12 of them into the
circle around the coach without overlapping either each other or the coach. By
doing this we are ‘discretising’ the space in which caricatures, and hence
players, can be placed. This slightly reduces the freedom to place caricatures
and hence players anywhere, and therefore any probability we find for this
discretised space will be an overestimate of the true statistics, for which
there is slightly more freedom. We can now ask the question, what is the
probability of placing the caricatures into two opposing groups in the 12
circles. Let’s suppose only one caricature can occupy each space, and that a
group is defined by caricatures being neighbours. Moreover, to be ‘opposing’
groups we need the two groups to be separated on each side by a gap of two empty circles – see Figure
3. The probability that 8 caricatures, and therefore the 8 corresponding
players, can form 2 opposing groups is then easy to calculate, assuming
caricatures are placed at random with uniform probability. The first caricature
we place defines where all the following caricatures can be placed. If we
define this first circle chosen as circle 1, with the remaining circles
labelled 2 to 12 clockwise, then the remaining 7 caricatures can be placed in
circles 2-4 and 7-10. The probability of placing the remaining caricatures in
just these circles is simply 7/11 x 6/10 x 5/9 x 4/8 x 3/7 x 2/6 x 1/5 = 7! x
4! / 11! = 1 in 330. But this calculation assumes the first caricature placed
always begins a group in a clockwise direction. As there are four different
positions the first caricature placed could end up in within its group, we need
to multiply this by 4. We therefore find a total probability of 4 in 330, or 1
in 82.5, that the 8 players are arranged into two opposing groups as defined.
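This figure can be cross-checked by brute force. The Python sketch below lists every way of filling 8 of the 12 circles and counts those that form two groups of 4 separated by gaps of 2 empty circles on each side. It assumes every such choice of circles is equally likely, and the helper circular_runs is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

CIRCLES, CARICATURES = 12, 8


def circular_runs(occupied, n):
    """Lengths of the runs of occupied circles and of empty circles around the ring."""
    flags = [i in occupied for i in range(n)]
    # start at the first circle of a group, i.e. just after a gap
    start = next(i for i in range(n) if flags[i] and not flags[i - 1])
    runs = {True: [], False: []}
    length = 0
    for step in range(n):
        cur = flags[(start + step) % n]
        nxt = flags[(start + step + 1) % n]
        length += 1
        if cur != nxt:              # the current run ends here
            runs[cur].append(length)
            length = 0
    return runs[True], runs[False]


layouts = [set(c) for c in combinations(range(CIRCLES), CARICATURES)]
favourable = sum(circular_runs(s, CIRCLES) == ([4, 4], [2, 2]) for s in layouts)
total = len(layouts)
chance = Fraction(favourable, total)

# the step-by-step estimate from the text: 4 x 7! x 4! / 11!
estimate = 4 * Fraction(factorial(7) * factorial(4), factorial(11))
print(favourable, total, chance, estimate)
```

The enumeration and the step-by-step product agree, which is reassuring given how easy it is to miscount arrangements on a ring.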
When we combine this result of around
one in 82.5, which is an overestimate, with our earlier result based on the
pattern matching, of around 1 in 408 million (supposing all player – caricature
matches are ranked 1st, except for one which is ranked equal 1st
with one other player), we get a final figure of about 1 in 34 billion. This is
extremely small, and therefore extremely significant. This means, if you found
that the 8 caricatures were placed into two groups in a circle on either side
of the coach, and that each player was stood on, or very near, a caricature
that looked more like them than any other, except for one caricature for which
two players match equally well, and there was room for 12 caricatures in total
around the coach, then the probability of this happening by pure chance is less
than 1 in 34 billion, i.e. a tiny, tiny chance. You would therefore be entitled
to be quite sure that this arrangement was no accident, and that the coach has
almost certainly asked these 8 players to stand on their caricatures.
Figure 3. By dividing space up into
separate regions, or blocks, we can estimate the probability of specific
arrangements of patterns. Left: two groups of 4 players around the coach in the
training problem. Right: Pillar 43 divided into 8 regions.
Statistics of the Vulture Stone
We are now, finally, in a position to
analyse Pillar 43. There is no fundamental difference between the ‘player –
coach’ training system just described and the pattern analysis of Pillar 43. We
have already seen that assuming the animal symbols represent wild animals is
too easy, as it has probability 1 of success. We therefore must seek other systems of representation. Remember, if
no system is found that is statistically significant, then we are stuck – we
can’t know for sure what the animal symbols mean beyond representing wild
animals. But as soon as we find one system of representation that is
statistically significant, then we can stop our analysis there and assume that
this is indeed the system chosen by the people of Gobekli Tepe, as the pattern
match is very unlikely to have occurred by chance.
Now, we can immediately identify the
scorpion on Pillar 43 with the Scorpius constellation. We then also notice the
circle above the vulture/eagle’s wing might indicate the vulture/eagle
represents the summer solstice constellation, with the 3 small animals along
the top panel representing the other solstice and equinox constellations, so
that Pillar 43 might represent a date using precession of the equinoxes. We can
ask, what is the probability for these animal symbols to look like their
associated constellations, and all be placed in approximately the correct
positions?
First, we assume that any animal
symbol could have been chosen with equal probability to appear at any position
on Pillar 43. Actually, this assumption is not quite correct – most of the
animal symbols on this pillar occur only rarely at Gobekli Tepe. Therefore, by
making this assumption we are likely overestimating the probability of choosing
the set that actually occurs – as this set consists mainly of ‘rare’ animal
symbols.
Next, according to the above
procedure outlined for the football team training example, we need to rank the
animal symbols against each constellation presumed by our hypothesis to be
represented by them. There are 7 constellations and 13 animal symbols to choose
from. I am not counting the scorpion-Scorpius association here, as that was
used to locate this specific position in the sky. Basically, we are taking the
scorpion – Scorpius association as given, and asking about the probability for
all the other animal symbols to appear in their respective positions relative
to it. As excavations continue more animal symbols will likely be found – for
example a wild ass seems to be present at Gobekli Tepe, but its symbol is obscured
and unclear. By not including these, we are again overestimating the
probability of a pattern match for Pillar 43, which is fine for our purposes.
This ranking appears in my papers and
in my book. This is my perceived
ranking. I will discuss the possibility of alternative rankings later. As shown
in the training example, we simply multiply these rankings together to obtain
an overall score for the observed configuration of animal symbols, resulting in
a score of 2 in my case. This tells us that the number of different animal
configurations I consider to be a good fit to the hypothesized constellations
is just 2. The total number of configurations, regardless of their score, is 13^7
= 63 million. Therefore, the probability of choosing a configuration of animal
symbols that matches the constellations hypothesized, given all the possible
configurations, is simply 2/63 million, or about 1 in 31.5 million, according
to my ranking.
However, we must take account of the
positional ordering of the three small animal symbols next to the ‘handbags’ at
the top of the pillar. These animal symbols would be in the correct order
whether they were ordered left-to-right or right-to-left, since we do not know
in advance in which direction they were written. So, we need to double our
chance of ‘success by pure chance’. We are therefore at around
1 in 16 million.
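As a quick check of these two steps, here is a minimal Python sketch. The score of 2 is simply the value quoted above for my ranking, and the variable names are illustrative.

```python
from fractions import Fraction

SYMBOLS, CONSTELLATIONS = 13, 7
score = 2                 # product of the rankings, as quoted in the text
reading_directions = 2    # left-to-right or right-to-left along the top panel

p_pattern = Fraction(score, SYMBOLS ** CONSTELLATIONS)
p_with_direction = p_pattern * reading_directions

print(f"pattern match alone:    about 1 in {float(1 / p_pattern):,.0f}")
print(f"either reading direction: about 1 in {float(1 / p_with_direction):,.0f}")
```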
Next, we need to consider the
probability of the spatial match, i.e. the positional correlation between the
animal symbols and the constellations on the main panel. To do this we need to
divide the main panel of Pillar 43 into several ‘regions’ within which only 1
animal pattern can appear. Following our player – coach example above, and
given the size of the animal patterns on Pillar 43, I decide to divide this
part of the pillar into 8 regions surrounding the scorpion. Therefore, each
region defines an arc of 45 degrees around the scorpion. I assume that the four
animal symbols around the scorpion could have appeared in any of these 8
regions, providing their clockwise order is fixed. As it is, they appear to be
in almost exactly the correct spatial position to match the positions of the
constellations, except that the bending bird with down-wriggling fish (which we
match to Ophiuchus) is about 45 degrees (i.e. one region) out of place. It
should be in region 3, not 2, in Figure 3. We ask, what is the probability of
this good relative positioning occurring by pure chance, keeping the clockwise
order of the animal symbols fixed? In other words, what is the probability that
when randomly placing the remaining three animal symbols, at most 1 would be
out-of-place by 1 region?
We can estimate this as follows. The
first animal symbol we place (the eagle/vulture, say) defines where all the
following animal symbols can be placed. If we define this first region chosen
as region 1, with the remaining regions labelled 2 to 8 clockwise, then the
remaining 3 animal symbols can be placed, in clockwise order, in regions 2, 5, 6,
or in 3, 5, 6, or in 4, 5, 6, or in 3, 4, 6, or in 3, 5, 7. Any of these 5
situations could be deemed to be as good, or better, than the one that actually
occurs on Pillar 43. The total number of configurations available without
changing the clockwise order of the animal symbols is 5 + (4 x 2) + (3 x 3) +
(2 x 4) + 5 = 35. Therefore, the good orientational ordering of these 4 animal
symbols on the main part of Pillar 43 around the scorpion has a chance of
around 5 in 35 of occurring by pure chance, or 1 in 7.
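The count of 5 out of 35 can be confirmed by listing every placement. The Python sketch below assumes the ideal regions for the three remaining symbols are 3, 5 and 6, which is the reading implied by the list of acceptable placements above; the names TARGET and displacement are illustrative.

```python
from fractions import Fraction
from itertools import combinations

TARGET = (3, 5, 6)   # assumed ideal regions for the three remaining symbols

# every way to put three symbols in regions 2..8 while keeping their clockwise order
configs = list(combinations(range(2, 9), 3))


def displacement(config):
    """Total number of regions by which the symbols miss their target."""
    return sum(abs(a - b) for a, b in zip(config, TARGET))


# at most one symbol out of place, and by no more than one region
good = [c for c in configs if displacement(c) <= 1]
chance = Fraction(len(good), len(configs))

print(good)
print(len(good), "of", len(configs), "=", chance)
```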
When we combine this estimate with
our earlier estimate for the pattern matches of 1 in 16 million, we obtain a
combined figure of around 1 in 112 million. Therefore, we should conclude our
hypothesis is almost certainly true, because the probability of obtaining this
excellent match between Pillar 43 and the constellations suggested by pure
chance is extremely remote – less than 1 in 100 million.
Now, this is the estimated probability
based on my perceived ranking of the
animal symbols against each constellation. But you might see things a little differently. To refute my conclusion,
you will need to have a very different view to me about the ranking of the
animal symbols. Essentially, when multiplied together, your rankings will need
to come to around 200, and not just 2 as they do for me, to begin to doubt my
conclusion.
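To see how the final figure depends on the ranking, the whole estimate can be written as a single function of the score. This Python sketch combines the three factors used above (score out of 13^7, two reading directions, and 5 in 35 for position); the function name combined_chance is illustrative, and because it uses exact values rather than rounded ones the result for a score of 2 comes out near 1 in 110 million.

```python
from fractions import Fraction


def combined_chance(score, symbols=13, constellations=7):
    """Overall chance of a match at least this good, for a given product of rankings."""
    p_pattern = Fraction(score, symbols ** constellations)   # pattern matching
    p_direction = 2                                          # either reading direction
    p_position = Fraction(5, 35)                             # spatial arrangement
    return p_pattern * p_direction * p_position


for score in (2, 20, 200):
    p = combined_chance(score)
    print(f"score {score}: about 1 in {float(1 / p):,.0f}")
```

With a score of around 200 the figure drops to roughly 1 in a million.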
Note, the only way to refute my
conclusion is to perform this ranking exercise. To claim my conclusion is wrong without even attempting it is irrational. That is, it is
irrational to agree, more-or-less, with my ranking and yet criticise my
conclusions. Moreover, claims of refutation based on looking at other pillars are
also invalid. In any case, following my ‘Palaeolithic cave art’ paper, the
animal symbols are proven to represent constellations. So, this is a moot
point.