Prehistory Decoded

The statistics of pattern matching

The whole basis of our interpretation of Gobekli Tepe rests on the interpretation of Pillar 43, and our interpretation of Pillar 43 rests entirely on our statistical analysis. So, this is the most important aspect of our work. However, the statistical analysis of the patterns on Pillar 43 is a little complicated. Unless you are familiar with handling the statistics of configurations of objects, as I am, and with pattern matching, you might not understand the justification we provided in our Fox paper. Therefore, rather than rushing straight into the statistical analysis of Pillar 43, it will probably be useful to set out some key ideas. To do this I’ll take you through a series of examples that contain the core ideas, gradually increasing their complexity until we reach a level sufficient to analyse Pillar 43. You might want to think of this as a training exercise. But hopefully you’ll find it interesting and not too cumbersome.

Training

Let’s start with a simple example of pattern matching. Consider a pattern formed from three squares, where each square is connected to at least one other by one of its sides (a bit like the game Carcassonne). The squares must line up perfectly as though on a square grid, and diagonal connections via corners don’t count. How many distinct patterns are there, given that rotations and reflections of the whole pattern count as the same pattern? The answer is just 2 (see Figure 1). The only distinct patterns are three squares in a row, and three squares forming a corner. Therefore, if someone randomly selects the same pattern of three connected squares as you, after allowing for rotations and reflections, you should not be at all surprised. For the same problem, but with four squares, there are only 5 distinct patterns (see Figure 1). For five squares, there are 12 distinct patterns. Now it would be a little, but not especially, lucky if someone chose the same pattern of 5 connected squares at random as you did.

This problem is a good one to start with because there is a definite number of distinct patterns, each of which can be recognised absolutely. Another way of saying this is that there is no noise in these patterns – they can be identified without error, by a human or suitably equipped computer. Even if a pattern is selected that is rotated by multiples of 90 degrees or is reflected horizontally or vertically, it can still be classified among the 12 distinct patterns.

Another nice thing about this problem is that the chance of selecting a specific pattern is not necessarily uniform, i.e. it is not necessarily true that each distinct pattern has the same chance of being selected at random. It depends on the rules of construction. For example, consider these two different rules. According to one game, a pattern is selected at random from among the 12 distinct patterns – we call this selecting a pattern with ‘uniform’ probability. The probability of selecting any specific pattern is clearly 1 in 12 in this case. Now consider another game where patterns are constructed by building a pattern from scratch, where each new square, up to the maximum of 5, is added by randomly choosing (with uniform probability) which edge of the already placed squares to join it to. In this case, the probability of ending up with 5 squares in a straight line is different to the probability of ending up with a capital ‘T’ shape.

Figure 1. The square puzzle described in the text for 3 and 4 squares.

There are some nice similarities with the patterns at Gobekli Tepe here. First, the number of distinct patterns is similar: there are around 13 distinct animal patterns at Gobekli Tepe currently easily visible, although that number will likely grow as excavations continue. Second, some of the animal patterns at Gobekli Tepe have been carved ‘upside-down’ or reflected ‘left-right’. Finally, if we look at all the animal patterns carved on the broad sides of Gobekli Tepe’s pillars, we see there are not the same number of each. For example, there are more fox patterns than lion patterns currently excavated. We can, therefore, if we want, make up different rules for selecting animal patterns at random from those that appear at Gobekli Tepe. In one game, we can select them with uniform probability, i.e. each has a probability of 1 in 13 of being selected. In another game, we can select them according to their relative abundance at Gobekli Tepe. So, in this case, we would have a higher probability of selecting a fox at random than a lion. That is, if twice as many fox as lion symbols have been excavated at Gobekli Tepe, we can ensure, if we wish, that a fox has twice the probability of being selected as a lion. We won’t go into the details of this here, but it is important to realise that this type of game can be played in principle.

Okay, now let’s make the game a little more complex. Sticking with our patterns made from 5 squares joined to each other along at least one flat edge, let’s now consider selecting 7 of these 5-square patterns at random and placing them on a line. There is no problem with selecting the same pattern more than once, or with selecting a rotated or reflected version of a pattern. Let us for the moment consider that each pattern is selected randomly with uniform probability from the 12 distinct patterns, i.e. they all have the same chance of being selected. We can ask, how many distinct sequences are there, supposing we read left-to-right along our line? The answer is simply 12⁷, or about 36 million, so the probability of selecting any one specific sequence is (1/12)⁷, or about 1 in 36 million. Therefore, if you were asked to make a 7-pattern sequence of these 5-squares at random, you should be very surprised indeed to select the same sequence as someone else. Of course, if 36 million people play this game once each, or if you play it 36 million times with a (very patient and understanding) friend, you will then expect, perhaps, to make the same sequence once. But for just two people playing the same game just once, the probability of selecting the same sequence is extremely low. In fact, if you did choose the same sequence as someone else, you would be entitled to suspect foul play.
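The second construction rule described above, where a pattern is grown one square at a time by picking a free edge at random, can be checked with a short calculation. The following is only a minimal sketch under my reading of that rule: every free edge of the squares already placed is equally likely, and a gap touched by two squares therefore counts twice. The function names `canonical` and `growth_distribution` are made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def canonical(cells):
    """Return one representative of a pattern under rotation, reflection and shifting."""
    variants = []
    pts = list(cells)
    for _ in range(4):
        pts = [(y, -x) for x, y in pts]                  # rotate by 90 degrees
        for shape in (pts, [(-x, y) for x, y in pts]):   # the rotation and its mirror image
            min_x = min(x for x, _ in shape)
            min_y = min(y for _, y in shape)
            variants.append(tuple(sorted((x - min_x, y - min_y) for x, y in shape)))
    return min(variants)

def growth_distribution(n):
    """Exact probability of each distinct n-square pattern under the growth rule."""
    states = {frozenset([(0, 0)]): Fraction(1)}
    for _ in range(n - 1):
        nxt = defaultdict(Fraction)
        for cells, p in states.items():
            # One entry per free edge, so a gap touched by two squares appears twice.
            free = [(x + dx, y + dy) for x, y in cells for dx, dy in NEIGHBOURS
                    if (x + dx, y + dy) not in cells]
            for cell in free:
                nxt[cells | {cell}] += p / len(free)
        states = nxt
    dist = defaultdict(Fraction)
    for cells, p in states.items():
        dist[canonical(cells)] += p                      # merge rotated/reflected copies
    return dict(dist)

dist = growth_distribution(5)
line = canonical([(i, 0) for i in range(5)])
tee = canonical([(0, 0), (1, 0), (2, 0), (1, 1), (1, 2)])
print("P(straight line) =", dist[line])   # 1/60
print("P(capital T)     =", dist[tee])    # 7/120
```

Under this rule the ‘T’ comes out several times more likely than the straight line, so the two games really do assign different probabilities to the same pattern.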

Now, consider Figure 2. What does it say? You should find you can read this pattern quite quickly – it shouldn’t take much more than, say, 10 seconds to realise it reads ‘science’. Now the fact that you can read this pattern is very interesting. A computer might also be able to read it, if it had some particularly sophisticated artificial intelligence (AI) software programmed to read arbitrary writing. But this software would need to be extremely complex. But you can do it easily. This just shows how powerful and effective the human brain is at pattern recognition.

Figure 2. What does it say?

Let’s think about how your brain, or some sophisticated software, might attempt to read this complex pattern by considering the necessary sequence of logical steps. First, it has to realise the pattern is made from separate shapes. Next, it has to realise that each of the large shapes is a letter, while the small shapes can be ignored. Next, it has to match the shapes against the known alphabet. To do this it has to be able to ‘look past’ or reject the noise, which in this case is in the form of an unusual font. It also has to be able to rotate patterns, and be able to ‘read’ a circular arrangement of letters, and allow for some letters to be somewhat out of perfect circular alignment. It then has to match the arrangement of these letters against known words. And all of this is achieved in seconds by a human brain.

Now, this game is no different to our previous game of 7 x 5-square patterns on a line except for two complications. First, the sequence is circular, or nearly circular, with no pre-defined beginning or end. Second, there is some ‘noise’ in this pattern, with noise occurring in the form of distracting small patterns separate from the letters, and the letters themselves containing significant ‘noise’ in the form of an unusual font. The issue of noise is especially important for our problem. Our human brains are easily able to distinguish the ‘signal’ from the ‘noise’ in these letters. In fact, we do this all the time whenever we read anything. You are doing it right now. We can read many different fonts, of any size and colour, on any background. We can do this even if words are misspelled, or if some letters are back-to-front or upside-down, or if the word is ‘bent’ into a shape. The reason we are extremely good at pattern matching, much better than any existing computer software, is that it is important for our survival. Without pattern matching, we would not have evolved much further than pond-life. We need pattern matching not just for reading, but also for navigating our way around a room, recognising faces, understanding danger etc. We are exceptionally good at it. To do this, we need to be able to match patterns with known templates by rejecting noise. But there is a limit to what we can do. The more noisy a signal becomes, the harder it becomes to decode it. Too much noise, and the signal is lost. Artificial intelligence vision systems have the same problem. If there is too much noise, or if a pattern is too complex, then AI cannot decode what it is seeing.

Now, on Pillar 43 we have a combination of difficult pattern matching problems. First, we don’t know in advance what the animal symbols represent. Therefore, we don’t know if they actually represent wild animals, or if they perhaps somehow represent fruit, or letters of a word, or even constellations. And second, given a specific system of symbolic representation (i.e. wild animals, or fruit, or letters, or constellations), each symbol might contain a lot of noise when compared to a specific example of that general type. You might think it is safe to assume the animal symbols do actually represent wild animals, since the signal-to-noise ratio would then be very high. We can then easily match each animal symbol to a wild animal – the lion symbol represents a wild lion, and so on. But where would this get us? And what is the probability of being able to do this by pure chance? Of course, the probability of being able to do this by pure chance is precisely 1, since all the animal symbols are indeed of wild animals. Therefore, in a scientific sense, we cannot confirm that this is the correct decoding of Pillar 43. It might be. It might not be. Essentially, matching the animal symbols to wild animals is too easy. We learn nothing from doing this.

Instead, what we should do is consider a variety of different systems of symbolic representation (fruit, letters, constellations etc) until we find a system of representation for which the match between all the animal symbols and their represented objects is very unlikely to occur by pure chance. At that point, if the match is significant in a scientific sense, say one in a million of occurring by pure chance, then we can be satisfied that we have almost certainly found the correct system of representation, be it fruit, letters or constellations. We can then try to decode the ‘meaning’ of Pillar 43 with our new understanding of what the symbols almost certainly represent.

However, before tackling Pillar 43, let’s consider another ‘training’ problem which will include all the elements or complications we need to be able to decode Pillar 43. In particular, this training problem will show how we can handle the issue of ‘signal-to-noise’ in pattern matching, in an approximate sense, and how we can handle the problem of the specific spatial ordering of the patterns, i.e. their distinct configuration on the pillar.

So, let’s consider a football team of 13 players in a gym hall, consisting of 8 on-field players and 5 subs. Let us suppose there is a training session where the 8 on-field players are standing around the coach during a break, listening to her instructions. Now, suppose each of the 8 on-field players is standing inside a small circle containing a caricature of a face. These caricatures have been painted on the gym floor by a professional painter. Got the picture? Basically, there are 8 people standing on 8 caricatures, and 5 other people are out of the game. The caricatures are all good likenesses of one of the players, but they are not photographic. In other words, the caricatures are noisy – they accentuate some aspects of a person’s face and have a ‘cartoonish’ appearance. Nevertheless, as you are expert at recognising faces you can easily tell them apart, and match them to the players. Now, you look at the caricatures and realise that the 8 on-field players are all standing on what appears to be their own caricature. What would you think had happened? Would you think that this is just luck, or instead would you think that the coach had purposefully selected these players and asked them to stand in ‘their’ circles? You would naturally think the latter, and you would be right to do so. You might just do a ‘double-check’ of the caricatures to make sure you had not misinterpreted any, but once satisfied with that, you would rightly assume the players had been asked to stand in their circles. The reason you would naturally come to this conclusion, is that the probability of this occurring by pure chance, assuming that any player has an equal chance of being chosen, is 1/13 x 1/12 x 1/11 x 1/10 x 1/9 x 1/8 x 1/7 x 1/6 = 5!/13! where we have used the ‘factorial’ notation (!). This is very nearly a chance of 1 in 52 million. This is a tiny chance, and in scientific terms is usually considered sufficiently small to assume a hypothesis is correct. 
Now, in this problem, we clearly assumed there is only one ‘instance’ of each person, i.e. we cannot have the same person in more than one circle. And we also assumed any player had an equal chance of being chosen from the pool of 13. But, this does not reflect the patterns at Gobekli Tepe, because there are several pillars at Gobekli Tepe where the same animal pattern is used more than once on a pillar, and we also know that some patterns occur more frequently than others. Therefore, for consistency with Gobekli Tepe, we should allow a person to appear in more than one circle (even though this is physically impossible, it is appropriate for this analogy), and likewise we should allow the same caricature to appear more than once. The probability for finding each player standing on their own caricature is now (1/13)⁸, i.e. around 1 in 816 million, assuming a uniform probability for choosing each player. Even more unlikely to occur by pure chance.
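The two figures quoted above (roughly 1 in 52 million without repeats, roughly 1 in 816 million with repeats) can be reproduced with exact fractions. This is just a check of the arithmetic; the constant names `POOL` and `CIRCLES` are labels chosen for this sketch.

```python
from fractions import Fraction
from math import factorial

POOL = 13      # players in the squad
CIRCLES = 8    # caricatures around the coach

# Each player used at most once: 1/13 x 1/12 x ... x 1/6 = 5!/13!
without_repeats = Fraction(factorial(POOL - CIRCLES), factorial(POOL))

# Repeats allowed: every circle is an independent 1-in-13 draw
with_repeats = Fraction(1, POOL) ** CIRCLES

print(f"no repeats:      1 in {without_repeats.denominator:,}")   # 1 in 51,891,840
print(f"repeats allowed: 1 in {with_repeats.denominator:,}")      # 1 in 815,730,721
```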

But to help us with the problem of identifying the patterns on Pillar 43 we need to add yet another level of difficulty. So far with our training game, we have assumed that players and their caricatures can be identified perfectly. That is, although the caricatures are not photo-realistic and effectively include some noise, there is not enough noise to be in any doubt about associating a player with a caricature. This is not true of the patterns on the Vulture Stone. Therefore, we need to make the game more difficult. We can do this by making the caricatures more noisy and/or making some of the players’ faces more similar. So, let’s assume that a few of the players are sufficiently similar, or their caricatures are sufficiently poorly drawn, that it is hard to tell which is definitely associated with which. Also, we can make caricatures more noisy by adding, perhaps, the occasional beard, or eye-patch, or hat. This sounds crazy perhaps, but in terms of facial recognition, adding these extra features is effectively adding noise to make facial recognition more difficult, and that suits our purpose here. Now, you might think that if any of our caricature – player associations become difficult to establish with absolute confidence, we can no longer decide whether the coach asked the players to stand on their caricatures, i.e. that we must simply give up. But this is wrong. We can still estimate the chance for any specific set of player placements, and provided this result is statistically significant, we can still make a fair decision about whether we think the players were asked to stand on their caricatures or not.

Let’s start with just one difficult case – let us suppose that there are two players, Neil and Nigel, that both look like one of the caricatures, but that in all the other cases we can be quite sure, despite any disguises, of which caricature represents each player. And let us suppose that this particularly difficult caricature occurs only once in the circle around the coach. In this case, there are two different combinations of groups of players that could be chosen that would provide a good fit to the set of caricatures. One group would involve Neil, while the other group would be identical except Neil is swapped for Nigel. This means there are two possible combinations, out of the total that can occur at random, that provide a ‘perfect’ fit. This means our chance of getting a perfect fit has doubled, i.e. it is now only one in 408 million if players are chosen at random with uniform probability. So, if either of these two combinations of players occurs, we can safely assume they were chosen deliberately and asked to stand on their caricatures.

Now let us suppose that the noise level increases until it becomes difficult to recognise any of the players’ caricatures with absolute confidence. In this case, one way of proceeding would be to rank each player against each caricature in terms of how good a likeness they are. The best fit scores 1 while the worst fit scores 13. We would then have 8 lists of 13 numbers. For example, suppose after doing this that we found that the players all scored 1 for the caricatures they are stood on. In this case, the probability of this happening is 1 in 816 million, again. If instead, they all scored 2, there are 2⁸ = 256 different combinations of players that are at least as good a fit as the one actually stood there. The probability of this selection of players occurring by pure chance is then (2/13)⁸ = 256/816 million, or about 1 in 3 million. This is still very significant, and so we should conclude that the players have almost certainly been chosen deliberately to stand on caricatures that they resemble quite well.

Moving on, if instead, they all scored 3, then the probability of this happening by pure chance is (3/13)⁸ = 6561/816 million = 1 in 124 thousand. From a scientific perspective, we should conclude from this that the players have very likely been chosen deliberately to stand on caricatures that they resemble quite well, but some doubt is starting to creep in.
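The ranking argument of the last few paragraphs boils down to multiplying the ranks together and dividing by the total number of line-ups. The sketch below assumes, as the text does, a uniform choice from 13 players for each of the 8 circles; the helper name `chance_of_fit` and its `pool` parameter are invented for this example.

```python
from fractions import Fraction
from math import prod

def chance_of_fit(ranks, pool=13):
    """Probability that a random line-up fits at least as well as the observed one."""
    return Fraction(prod(ranks), pool ** len(ranks))

cases = {
    "all ranked 1":            [1] * 8,
    "one tie (Neil or Nigel)": [2] + [1] * 7,
    "all ranked 2":            [2] * 8,
    "all ranked 3":            [3] * 8,
}
for label, ranks in cases.items():
    p = chance_of_fit(ranks)
    print(f"{label}: about 1 in {round(1 / p):,}")
```

Running it gives roughly 1 in 816 million, 1 in 408 million, 1 in 3.2 million and 1 in 124 thousand, in line with the figures in the text.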

Now let’s add a final level of difficulty that will take us to the level needed to analyse Pillar 43. Let us suppose that the coach is still surrounded by 8 circles, as before, in which the caricatures are drawn, and the circles are all evenly spaced around the coach, but the players are not all standing directly inside their respective circles. Perhaps they have been chatting to each other, or have got bored of the coach’s instructions, and have moved slightly outside their respective circles. Statistically, this doesn’t change anything. Provided they have not swapped positions the statistics are the same as before.

We could take this ‘positional’ correlation a notch higher and add another level of difficulty where the circles with their inscribed caricatures are not evenly spaced around the coach. This more accurately reflects the situation on Pillar 43. So, let’s suppose the circles are clustered into two groups, with 4 on one side of the coach and 4 on the other. We also see that the players are standing in two groups according to the grouping of their respective caricatures. Now this situation is more statistically significant than before. Let’s see how we can take account of this ‘positional’ correlation.

To analyse this situation, we need to work out how likely it is that two groups of players like this could form at random. There are several ways this problem can be tackled exactly, but we want a very simple way of analysing this situation, that is nonetheless quite accurate. In other words, we want a simple approximation – like a ‘back-of-the-envelope’ calculation. So, let’s phrase the problem in simple terms – let’s simplify it in such a way that the statistical estimate ‘overestimates’ the true probability. We can then provide an ‘upper bound’ on the probability, i.e. we can say the probability is less than a certain level, which is often good enough to make a scientific declaration.

So, by looking at the size of the circles containing caricatures, we see that we can fit a maximum of 12 of them into the circle around the coach without overlapping either each other or the coach. By doing this we are ‘discretising’ the space in which caricatures, and hence players, can be placed. This slightly reduces the freedom to place caricatures and hence players anywhere, and therefore any probability we find for this discretised space will be an overestimate of the true statistics, for which there is slightly more freedom. We can now ask the question, what is the probability of placing the caricatures into two opposing groups in the 12 circles? Let’s suppose only one caricature can occupy each space, and that a group is defined by caricatures being neighbours. Moreover, to be ‘opposing’ groups we need the two groups of 4 to be separated by two empty groups of 2 circles each – see Figure 3. The probability that 8 caricatures, and therefore the 8 corresponding players, can form 2 opposing groups is then easy to calculate, assuming caricatures are placed at random with uniform probability. The first caricature we place defines where all the following caricatures can be placed. If we define this first circle chosen as circle 1, with the remaining circles labelled 2 to 12 clockwise, and suppose circle 1 begins its group, then the remaining 7 caricatures must be placed in circles 2-4 and 7-10. The probability of placing the remaining caricatures in just these circles is simply 7/11 x 6/10 x 5/9 x 4/8 x 3/7 x 2/6 x 1/5 = 7! x 4! / 11! = 1 in 330. But this calculation assumes the first caricature placed always begins a group in a clockwise direction. As there are four different positions the first caricature placed could end up in within its group, we need to multiply this by 4. We therefore find a total probability of 4 in 330, or 1 in 82.5, that the 8 players are arranged into two opposing groups as defined.
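The figure of 1 in 82.5 can be cross-checked by brute force: list every way of choosing 8 of the 12 circles and count those that form two groups of 4 with a gap of 2 empty circles on each side. This is only a sketch of that check; circles are numbered 0 to 11 here, and the name `is_two_opposing_groups` is made up for the illustration.

```python
from fractions import Fraction
from itertools import combinations

SLOTS, PLACED = 12, 8

def is_two_opposing_groups(occupied):
    """True if the occupied circles form two runs of 4 separated by gaps of 2."""
    for start in range(SLOTS):
        target = {(start + k) % SLOTS for k in (0, 1, 2, 3, 6, 7, 8, 9)}
        if target == occupied:
            return True
    return False

layouts = [set(c) for c in combinations(range(SLOTS), PLACED)]
hits = sum(is_two_opposing_groups(layout) for layout in layouts)
p = Fraction(hits, len(layouts))

print(hits, "of", len(layouts), "layouts")   # 6 of 495 layouts
print("1 in", float(1 / p))                  # 1 in 82.5
```

Counting unordered layouts (6 out of 495) gives the same 1 in 82.5 as the step-by-step placement argument.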

When we combine this result of around 1 in 82.5, which is an overestimate, with our earlier result based on the pattern matching, of around 1 in 408 million (supposing all player – caricature matches are ranked 1st, except for one which is ranked equal 1st with one other player), we get a final figure of about 1 in 34 billion. This is extremely small, and therefore extremely significant. This means, if you found that the 8 caricatures were placed into two groups in a circle on either side of the coach, and that each player was stood on, or very near, a caricature that looked more like them than any other, except for one caricature for which two players match equally well, and there was room for 12 caricatures in total around the coach, then the probability of this happening by pure chance is less than 1 in 34 billion, i.e. a tiny, tiny chance. You would therefore be entitled to be quite sure that this arrangement was no accident, and that the coach has almost certainly asked these 8 players to stand on their caricatures.

Figure 3. By dividing space up into separate regions, or blocks, we can estimate the probability of specific arrangements of patterns. Left: two groups of 4 players around the coach in the training problem. Right: Pillar 43 divided into 8 regions.

Statistics of the Vulture Stone

We are now, finally, in a position to analyse Pillar 43. There is no fundamental difference between the ‘player – coach’ training system just described and the pattern analysis of Pillar 43. We have already seen that assuming the animal symbols represent wild animals is too easy, as it has probability 1 of success. We therefore must seek other systems of representation. Remember, if no system is found that is statistically significant, then we are stuck – we can’t know for sure what the animal symbols mean beyond representing wild animals. But as soon as we find one system of representation that is statistically significant, then we can stop our analysis there and assume that this is indeed the system chosen by the people of Gobekli Tepe, as the pattern match is very unlikely to have occurred by chance.

Now, we can immediately identify the scorpion on Pillar 43 with the Scorpius constellation. We then also notice the circle above the vulture/eagle’s wing might indicate the vulture/eagle represents the summer solstice constellation, with the 3 small animals along the top panel representing the other solstice and equinox constellations, so that Pillar 43 might represent a date using precession of the equinoxes. We can ask, what is the probability for these animal symbols to look like their associated constellations, and all be placed in approximately the correct positions?

First, we assume that any animal symbol could have been chosen with equal probability to appear at any position on Pillar 43. Actually, this assumption is not quite correct – most of the animal symbols on this pillar occur only rarely at Gobekli Tepe. Therefore, by making this assumption we are likely overestimating the probability of choosing the set that actually occurs – as this set consists mainly of ‘rare’ animal symbols.

Next, according to the above procedure outlined for the football team training example, we need to rank the animal symbols against each constellation presumed by our hypothesis to be represented by them. There are 7 constellations and 13 animal symbols to choose from. I am not counting the scorpion-Scorpius association here, as that was used to locate this specific position in the sky. Basically, we are taking the scorpion – Scorpius association as given, and asking about the probability for all the other animal symbols to appear in their respective positions relative to it. As excavations continue more animal symbols will likely be found – for example a wild ass seems to be present at Gobekli Tepe, but its symbol is obscured and unclear. By not including these, we are again overestimating the probability of a pattern match for Pillar 43, which is fine for our purposes.

This ranking appears in my papers and in my book. This is my perceived ranking. I will discuss the possibility of alternative rankings later. As shown in the training example, we simply multiply these rankings together to obtain an overall score for the observed configuration of animal symbols, resulting in a score of 2 in my case. This tells us that the number of different animal configurations I consider to be a good fit to the hypothesized constellations is just 2. The total number of configurations, regardless of their score, is 13⁷ = 63 million. Therefore, the probability of choosing a configuration of animal symbols that matches the constellations hypothesized, given all the possible configurations, is simply 2/63 million, or about 1 in 31.5 million, according to my ranking.

However, we must take account of the positional ordering of the three small animal symbols next to the ‘handbags’ at the top of the pillar. These animal symbols would be in the correct order whether they were ordered left-to-right or right-to-left, since we do not know in advance in which direction they were written. So, we need to double our chance of ‘success by pure chance’. We are therefore at around 1 in 16 million.

Next, we need to consider the probability of the spatial match, i.e. the positional correlation between the animal symbols and the constellations on the main panel. To do this we need to divide the main panel of Pillar 43 into several ‘regions’ within which only 1 animal pattern can appear. Following our player – coach example above, and given the size of the animal patterns on Pillar 43, I decide to divide this part of the pillar into 8 regions surrounding the scorpion. Therefore, each region defines an arc of 45 degrees around the scorpion. I assume that the four animal symbols around the scorpion could have appeared in any of these 8 regions, provided their clockwise order is fixed. As it is, they appear to be in almost exactly the correct spatial position to match the positions of the constellations, except that the bending bird with down-wriggling fish (which we match to Ophiuchus) is about 45 degrees (i.e. one region) out of place. It should be in region 3, not 2, in Figure 3. We ask, what is the probability of this good relative positioning occurring by pure chance, keeping the clockwise order of the animal symbols fixed? In other words, once the first animal symbol is placed, what is the probability that, when randomly placing the remaining three animal symbols, at most 1 would be out of place, and by no more than 1 region?

We can estimate this as follows. The first animal symbol we place (the eagle/vulture, say) defines where all the following animal symbols can be placed. If we define this first region chosen as region 1, with the remaining regions labelled 2 to 8 clockwise, then the remaining 3 animal symbols can be placed, in clockwise order, in regions 2, 5, 6, or in 3, 5, 6, or in 4, 5, 6, or in 3, 4, 6, or in 3, 5, 7. Any of these 5 situations could be deemed to be as good, or better, than the one that actually occurs on Pillar 43. The total number of configurations available without changing the clockwise order of the animal symbols is 5 + (4 x 2) + (3 x 3) + (2 x 4) + 5 = 35. Therefore, the good orientational ordering of these 4 animal symbols on the main part of Pillar 43 around the scorpion has a chance of around 5 in 35 of occurring by pure chance, or 1 in 7.
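The count of 5 good placements out of 35 can be confirmed by listing every placement. The sketch below takes regions 3, 5, 6 as the exact match, which is my reading of the statement that the bird should be in region 3 rather than 2. The names `IDEAL` and `misplacement` are invented for this example.

```python
from fractions import Fraction
from itertools import combinations

REGIONS = range(2, 9)   # regions 2..8; region 1 holds the first symbol placed
IDEAL = (3, 5, 6)       # assumed exact match to the constellation positions

def misplacement(layout):
    """Total number of regions by which a layout differs from IDEAL."""
    return sum(abs(a - b) for a, b in zip(layout, IDEAL))

# combinations() yields increasing triples, so the clockwise order stays fixed.
layouts = list(combinations(REGIONS, 3))
good = [layout for layout in layouts if misplacement(layout) <= 1]

print(good)                                                  # the 5 placements listed in the text
print(len(layouts), Fraction(len(good), len(layouts)))       # 35 1/7
```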

When we combine this estimate with our earlier estimate for the pattern matches of 1 in 16 million, we obtain a combined figure of around 1 in 112 million. Therefore, we should conclude our hypothesis is almost certainly true, because the probability of obtaining this excellent match between Pillar 43 and the constellations suggested by pure chance is extremely remote – less than 1 in 100 million.

Now, this is the estimated probability based on my perceived ranking of the animal symbols against each constellation. But you might see things a little differently. To refute my conclusion, you will need to have a very different view to me about the ranking of the animal symbols. Essentially, when multiplied together, your rankings will need to come to around 200, and not just 2 as they do for me, to begin to doubt my conclusion.
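To see how the final figure responds to a different ranking, the whole chain of estimates can be put into one small function. This is a sketch using only the numbers quoted above (13 symbols, 7 constellations, a factor of 2 for reading direction, and 5 in 35 for position); the function name `combined_chance` is hypothetical.

```python
from fractions import Fraction

SYMBOLS = 13          # animal symbols to choose from
CONSTELLATIONS = 7    # constellations being matched (Scorpius is taken as given)

def combined_chance(rank_product):
    """Chance of a match at least this good, for a given product of rankings."""
    p_symbols = Fraction(rank_product, SYMBOLS ** CONSTELLATIONS)
    p_symbols *= 2                 # top row may read left-to-right or right-to-left
    p_position = Fraction(5, 35)   # spatial match on the main panel
    return p_symbols * p_position

for product in (2, 200):
    print(f"rank product {product}: about 1 in {round(1 / combined_chance(product)):,}")
```

With a rank product of 2 the unrounded arithmetic gives about 1 in 110 million, close to the rounded 1 in 112 million quoted above. With a rank product of 200 it gives about 1 in 1.1 million.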

Note, the only way to refute my conclusion is to perform this ranking exercise. To not even attempt, and still claim my conclusion is wrong, is irrational. That is, it is irrational to agree, more-or-less, with my ranking and yet criticise my conclusions. Moreover, claims of refutation based on looking at other pillars are also invalid. In any case, following my ‘Palaeolithic cave art’ paper, the animal symbols are proven to represent constellations. So, this is a moot point.
