Can We Prove That a Large System is Self-Organizing?

In my 2014 article about large systems I wrote that "what makes a system large is our inability to observe everything within the system". Large systems theory has been one of my personal thought experiments for a long time because I have long wondered how much of a system you would have to map before you could understand (more-or-less correctly) how it worked. This was a consequence of having wiped out a few large data files on computer systems back in the days when it was neither easy nor simple to create a backup for your data. When you have to write software that examines remnants of obliterated disk files you find yourself wondering if there is any way to guess at what comes next because otherwise you spend all night piecing together 1s and 0s in an attempt to figure out what it was that you just partially erased.

Another incentive to understand large systems came when I had to merge databases that were created at different times by different people. There are so many ways to spell an address like "10 Downing Street" that I thought I would never be able to normalize all the data I worked with. At what point are you done writing rules to transform non-normalized data into a standard, normalized format? How many more iterations of exception data do you have to screen before you can say you have performed enough transformations on a large set of merged tables?

Never being able to see the end of the task taught me that all the systems theory I learned in school was not very useful. We learned how to work with well-defined, closed systems but our algorithms and flowcharts and virtual machines were never applied to states in which the boundaries were unknown. In my 2014 article mentioned above I redefined "system" to mean something that has scope and boundaries. By scope I mean "the identifiable properties of all the components of a system" and by boundaries I mean "limits which determine whether things are in the system or not".

When you are normalizing data you use your rule sets to define the scope of the data, and anything that doesn't comply with the rules falls outside the boundaries of your (desired) system. "Street" and "st" may fall within the boundaries of a system consisting of mailing addresses, but "str" doesn't match any of the standard rules, so it falls outside those boundaries. The rule that defines the scope might be as simple as "includes 'street' or 'st' surrounded by space characters". But a rule might also be "a maximum of 15 uniquely named street records for this city". If you come across a 16th uniquely named street record for the city you know something is outside the boundaries of the system, and in this case the scope is not defined well enough to tell you what.
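To make the rule idea concrete, here is a minimal sketch in Python. The rule set and function name are illustrative only, not any particular normalization library; the point is that "str" matches no rule and therefore falls outside the system's boundaries:

```python
import re

# Illustrative rule set: each pattern maps a spelling variant to the
# normalized form. "str" is deliberately absent, so it falls outside
# the boundaries of this (desired) system.
RULES = [
    (re.compile(r"\bstreet\b", re.IGNORECASE), "Street"),
    (re.compile(r"\bst\b\.?", re.IGNORECASE), "Street"),
]

def normalize(address):
    """Apply each rule in order; return (normalized, in_scope)."""
    normalized = address
    matched = False
    for pattern, replacement in RULES:
        if pattern.search(normalized):
            normalized = pattern.sub(replacement, normalized)
            matched = True
    return normalized, matched

print(normalize("10 Downing st"))   # -> ('10 Downing Street', True)
print(normalize("10 Downing str"))  # -> ('10 Downing str', False)
```

Note that `\bst\b` does not match inside "str" because there is no word boundary between "t" and "r", which is exactly the boundary behavior described above.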

If we assume that a large system is coherent (there is no corruption or obscuration of the data contained in the system) then we should be able to infer or learn what the scope of the system is as well as its boundaries. You have properly identified all the boundaries of a system when you have categorized all of its components according to its scope. Except that my definition of a large system says that "a large system has unexpectedly large scope and boundaries". In other words, if I cannot accurately predict what the components of the system will be or how many of them there are, the system is too large to be measured.

This is problematic if I am writing an algorithm to learn about the system. The algorithm may be something simple, such as a count of all the components that match certain criteria. Or it may be something complicated, such as dividing the system's components into LEFT and RIGHT (or UP and DOWN) things without the benefit of clear labels that designate LEFT/RIGHT or UP/DOWN.

You cannot predict what you will find in a large system, and so you cannot measure the large system based on what you already know about it. To get around this problem we resort to sampling. Sampling makes it easy for us to extrapolate what the large system might look like but it represents only one of many different possibilities. You might miss a huge radical deformation in your data. For example, suppose you have the capacity to process 100 million data rows but you are trying to analyze 10 billion rows of data. You will try to "randomly" select 100 million data rows to represent the 10 billion.

By sampling only 1% of the data you create huge gaps in visibility that might hide significant peaks and valleys (according to whatever arbitrary measurements you are using for your analysis). We can arbitrarily slide our visibility gauge from 1% up to 99% but at no time can we ever be sure that something odd is not happening in that hidden data. Our expectations will almost always fail, given a large enough data set.
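The arithmetic behind those visibility gaps is sobering. A sketch with made-up numbers, scaled down to a million rows so it runs quickly: a uniform 1% sample misses an anomalous cluster of 50 rows more often than not, since each anomalous row is skipped with probability 0.99.

```python
import random

random.seed(7)

# 1,000,000 "rows" with a rare spike of 50 anomalous rows hidden inside.
N, SPIKE = 1_000_000, 50
spike_positions = set(random.sample(range(N), SPIKE))

def sample_sees_spike(fraction):
    """Draw a uniform random sample; report whether any spike row landed in it."""
    k = int(N * fraction)
    picked = random.sample(range(N), k)
    return any(i in spike_positions for i in picked)

# Analytically, a 1% sample misses all 50 spike rows with probability
# (1 - 0.01) ** 50, roughly 0.605 -- better than even odds of a blind spot.
print((1 - 0.01) ** 50)
print(sample_sees_spike(0.01))
```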

We can state the problem in a different way: suppose you compute a probability distribution that says you have a 1-in-500 chance of some specific event happening. Given a data set of 10 million events, where in your set does that first 1-in-500 event occur? If exactly 20,000 of the 10 million events are hits, the first one can fall anywhere from position 1 to 10,000,000 - 19,999 = 9,980,001, so there are 9,980,001 potential places in your data set where that first event occurs. This is the fallacy of using probability distributions to illustrate any point. Call it the Scratch-Off Card Paradox: you know that there are a certain number of winners in a scratch-off ticket lottery game but you don't know which tickets are the winners. Why do you keep getting losers?

Probability distributions don't predict where and when you find a winning ticket. I cringe every time I read an article or paper that tries to explain some concept with a probability distribution. You know all the scores in a real distribution, and so whatever your distribution describes is not a large system.

You cannot compute a probability distribution for a large system. That's a real problem if you're trying to estimate, say, how many planets in the Milky Way galaxy probably have Earth-like life on them. All the formulas produce numbers but we have no way of knowing how close to the mark any of the formulas may be.

But you can measure parts of a large system. This is an important principle that we keep coming back to. You cannot see the entire large system but you can see parts of it. In my 2014 article I also argued that the context of a large system is commutative. A commutative context allows us to evaluate a large system without having to measure the entire system. Assuming you can take a snapshot of about 1% of a large system, you can call that a context. By capturing and comparing adjacent 1% contexts you gradually build up a picture of the large system. But let us assume that no matter how many of these 1% contexts we capture we will never see the entire 100% of the system.
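One way to make "commutative context" concrete: if each 1% snapshot is reduced to a summary that can be merged with any other summary, the order of capture stops mattering. A sketch using (count, sum) pairs as a stand-in for whatever measurements the contexts actually carry:

```python
from functools import reduce

def summarize(context):
    """Reduce a context (one ~1% snapshot) to a mergeable summary."""
    return (len(context), sum(context))

def merge(a, b):
    """Combine two summaries. The merge is commutative and associative,
    so the order in which contexts are captured does not matter."""
    return (a[0] + b[0], a[1] + b[1])

contexts = [[1, 2], [3, 4, 5], [10]]
summaries = [summarize(c) for c in contexts]

forward = reduce(merge, summaries)
backward = reduce(merge, reversed(summaries))
print(forward, backward)  # -> (6, 25) (6, 25)
```

Any merge order yields the same picture of the system so far, which is what lets the contexts accumulate into a partial map without ever requiring the whole.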

Can we learn enough about the large system from changes and similarities between contexts to infer the (meta) properties of the system itself? If so then even though we cannot measure the entire system we can say that we have measured enough of the system to understand it "well enough".

A self-organizing system is a closed system. Regardless of whether it is large (immeasurable) or small (measurable), a system that is closed contains a finite set of components and properties of components. We can easily determine if a small system is self-organizing by showing that it is closed. A closed system is not acted upon by an outside force.

The closed system is self-organizing because all changes or transformations that occur within the system are produced as a result of the system's components and their properties. Think of atoms grouping together to form molecules, molecules grouping together to form chains, and chains grouping together to form things like cells or grains of minerals. Everything comes together in some way because of the inherent properties of the system. Hence, the system is self-organizing.

Any system that is not self-organizing must be acted upon by something outside the system. This external action might inject new components into the system (as with morphing sets) or it might inject new properties into the system.

If we can measure the system and determine that nothing external is acting upon the system then we know that it is closed and self-organizing. It is a miniature universe contained in itself and thus is rather boring.

On the other hand, if we cannot measure the system then we cannot determine that nothing external is acting upon it. Or can we? Can we see enough of the large system to predict that the rest of the system will behave exactly the same way as the part of the large system that we observe?

Put another way, can we validate sampling by increasing the number of samples we take? That is a common enough question in statistics. In a quality control process where the sampling is validated against benchmarks we are dealing with a small system.

What we want to do is infer from limited data whether the system is self-organizing. A self-organizing system should behave in a homogeneous fashion across all of its components regardless of how much of the system has been measured. But how many samples do you need to take before you can say that you have proven homogeneity?

What makes a large system so challenging is that no matter how many samples you take you never know how many more potential samples you could take. Your samples should all perform within acceptable limits according to the scope you have inferred from your samples. So what do you do when you encounter exceptions?

Conjecture: A self-organizing large system is homogeneous. The problem with homogeneity, however, is that if your sample sizes are too small you may not account for all local variations in composition and properties. When measuring a large system we want to define a scope that sufficiently describes all the components of the system without having to measure the entire system.

In the commutative model we have a formula MAXIMUM = (A, B, C), where MAXIMUM less A = (B, C). We know we're dealing with a closed system if we can infer A from MAXIMUM less (B, C). So I think to show that a large system is homogeneous you have to reach a point in your measurements where you can predict what the next sample should look like (including tolerances for variations in your scores). As long as the next sample meets your expectations you are okay. You just don't know when or if a sample will come up that breaks the scope of your system definition.
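The "predict the next sample, with tolerances" test can be sketched as a running-statistics check. The three-standard-deviation tolerance and the warm-up length here are arbitrary choices for illustration, not part of the model:

```python
import statistics

def scope_check(samples, tolerance=3.0, warmup=5):
    """Yield (sample, in_scope) pairs; a sample breaks scope when it
    falls more than `tolerance` standard deviations from the running mean."""
    seen = []
    for s in samples:
        if len(seen) < warmup:
            yield s, True  # not enough history to predict yet
        else:
            mean = statistics.fmean(seen)
            stdev = statistics.stdev(seen) or 1e-9  # guard against zero spread
            yield s, abs(s - mean) <= tolerance * stdev
        seen.append(s)

stream = [10, 11, 9, 10, 12, 10, 11, 95, 10]
for sample, ok in scope_check(stream):
    if not ok:
        print("scope break at sample", sample)  # -> scope break at sample 95
```

Every in-scope sample tightens the prediction; the one that breaks scope is exactly the event you cannot schedule in advance.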

These samples are what I called "Localities" in my 2014 article. A single locality within a large system might behave differently from all other localities in the system, but if you can predict a locality that matches the next sample you process then your assumption that you are dealing with a closed system remains intact (even though it may be incorrect).

At some point you should be able to map enough of a large system to define a scope that predicts all of its localities. Of course, there is always the possibility that you have to map the entire large system (at which point it becomes a small system, because it is measurable). So let us say that some large systems can be predictable enough that you could map only a fraction of them and predict what all the localities look like.

If this is true then I don't think we can prove that a large system is self-organizing. That is an important point. If you cannot show that a large system is self-organizing then you have to allow for the possibility that something outside the system is acting upon it. Your inability to prove self-organization doesn't prove that there is an external force.

We need a method for proving self-organization in large systems, even if only for a subset of large systems. This is a bit like trying to prove Fermat's Last Theorem using only integer arithmetic: you can prove some cases but not all. It may be that large systems, being unmappable, all act like virtual open systems. If even one sample cannot be predicted by what we have learned about the large system then it has to be treated as if it is not homogeneous, and therefore as if it is not closed.

And what is the harm in that? Well, maybe nothing. But it would certainly be nice to know that if we collect enough samples we'll reach a point where we can say with certainty that we know how the whole system behaves. Otherwise there is no way to eliminate reasonable doubt about any theory that attempts to explain how the large system works.

And now that I have written all that out I hope it's not completely trivial but I'll probably come back to this in the future.
