A Critical Empirical Analysis of My Bitmass Theory
as a
First Order Attempt to Achieve
an Objective Measure of Complexity
S.Roof
May, 2003
ABSTRACT:
[ This paper takes a look at the problem of defining complexity in a mathematical and consistent way. A concept of 'bitmass' will be introduced. This idea may prove to be a significant window into the differences between entropy, information, randomness, and complexity. The original Bitmass Theory as presented in 1993 can be found at: http://home.earthlink.net/~sroof/Abraxas/sar/redund/redundan.htm
I have since discarded my original terminology as being subject to the same problems as other terms associated with entropy and information. Therefore I will be using a neutral variable which I call 'KBITS'. In this paper I present several graphs of empirical KBITS data that clarify the ideas presented. My conclusions are both disappointing and rewarding. The theory sort of shoots itself down, but while going down in flames it clarifies some of the mysteries of information theory.]
addendum: A recent theoretical and experimental breakthrough of mine has enabled me to accurately define randomness and detect it within any bit string. My next paper will present this delightful discovery. The paradigm shift needed for the next paper was a result of my empirical analyses begun while working on this project. My discovery shows that randomness does indeed have deep structure. Here, I will only hint at the idea with this tease:
"The question is not how can we know the un-knowable? The question, rather, is how can we structure our expectations?"
Introduction
Widespread use of the terms entropy, information, chaos, randomness, and complexity has led to much confusion about their meanings. This is especially true in contexts which allow subtle shifts in perspective to emerge. It is the author's belief that several factors contribute to this confusion. One problem is that these terms have both subjective and objective components.
Information as originally defined by Shannon and Weaver depends on an interested observer who attaches meanings to symbols that he receives from some source. Shannon, however, defined the mathematical measure of information to exclude concepts of meaning as he was only interested in how fast a channel could transmit the symbols or messages via some suitable coding scheme. Information in this context was a measure of the reduction in uncertainty about the content of messages as they are received. This definition is abstract and doesn't exactly specify information as residing in one place or another.
It is tempting of course to say that the information resides in the source, in the observer, or more frequently in the coding scheme embedded in the channel. This tendency to attach a physical reference to information can cause problems of interpretation. Despite this, most of us are used to handling information as 'hard copy' or representations embedded in some physical medium, but we also often view physical objects themselves as having an objective measure of information via their essential structure. We must not forget, though, that physical information still must be qualified by some process of perception that identifies what constitutes a meaningful and relevant distinction. We can never remove this subjective component, but once the criteria for representation are established we can be clearer about the meaning. The coding scheme is the very thing that establishes the criteria for what constitutes a symbol or message. Seen thus, information always has an active preparatory process which precedes it even if it is only implicit.
Another problem with useage involves the original formulation by Shannon in terms of probabilities. There are two problems with Probability Theory. One is that probabilities are rarely available prior to useage. They usually emerge after the fact by statistical analysis. The other problem with probabilities is the tendency to think that they exist objectively as causal tendencies of nature. Bayesian Probability Theory, however, argues that probability is only a measure of an observer's uncertainty about an outcome. This type of probability then changes as the observer gains more information. This Bayesian association of probability with subjective information is the more desirable view in this author's opinion.
It would be nice, though, if we could remove probability terms from our constructs. That in itself would restore some objectivity to the discussion. Lets look at that very possibility!
An Alternative View
Shannon's original formulation1 for information as a measure of the reduction in uncertainty is:
Uncertainty H = The negative sum over i of p-sub-i times log2 of p-sub-i
where p-sub-i is the probability of occurence of i'th symbol out of n independent symbols.
What if we simply go as a blind observer into a situation and work with a certainty only of the total number of symbols? Our 'best case' scenario is to simply assume an equal likelihood for each symbol. The probability p for each symbol then reduces to 1/n. Then log2(1/n) = log2(n^-1) = -log2(n).
The 'H' sum then reduces to -n(-1/n)log2(n) = log2(n)
Thus we see that Shannon's 'Uncertainty H' is merely the log2 of the number of possible symbols i.e. the number of ways a system can be arranged if all arrangements are considered equal. Shannon of course was very interested in variations in probability as he was primarily concerned with the logistics of communication and language exhibits strong stochastic and Markov properties. We do not have that immediate concern in this paper. We want only to present some general ideas. This avoidance of probability issues will free us to draw conclusions that are unbiased and not dependant on exterior contingencies. The attitude is sort of like a 'man from mars' mentality. We want to look at information sources like a newborn babe with no apriori knowledge. We want to just let the patterns soak in.
Now let's look at coding schemes. The universal code these days is binary and to a large extent ascii. Ascii codes are 8bits long per symbol. Those 8bits can code for 256 different symbols (2^8 = 256). Therefore, for any peek at a random ascii code, the reduction in uncertainty measure is H = log2(256) - log2(1.0) = log2(2^8) = 8 bits. The (log2(1.0) = 0) term emphasizes that once we have peeked at the code it is 100% certain and H reduces to 0. This is tautological but emphasizes that the 256 symbols are just all the possible arrangements of 8 bits. This establishes the limits on what can be done with 8 bits of code!
If we were a Shannon engineer we could do an empirical frequency analysis of ascii codes in samples of communications and come up with a nice probability distribution to compute our H values. Such a distribution is basically a histogram that partitions the space of possiblities into discrete ranges or boxes which we can use to classify the different probabilities. If we are working with the maximum uncertainty of outcomes and assume no prior knowledge of probabilities, though, is there still a way to partition the possibility space? Turns out there is!
It is important at this point to mention a distinction between pattern and value. Our approach will look at the patterns in bit string codes, not the numerical value of the codes. An obvious source of pattern is any clumping of '1' or 'on' bits. This is where I introduce the concept of 'bitmass'. Bitmass, as I coin the term, refers to the number of 'one' or 'on' bits. The bit string '0011', for instance, has a bitmass of 2 and ellicits an obvious asymmetric pattern.
We can use bitmass to partition the space of possiblities into discrete boxes. In a box of bitmass 4, for example, we will put only those strings that have four bits set to 'on'. We could call this partition or class of patterns an ensemble. The question then remains - Can we classify all the possible bit strings of a given length in terms of bitmass. Yes. Once we do this we essentially have a histogram of sorts that is implicit and does not in any way refer to exterior probabilities. To be frank, it is a classification scheme that partitions possiblity space into conceptual ensembles. All ideas of information, uncertainty, entropy etc. have certain implicit classifiers in the same fashion. The arbitrariness of the scheme is not often mentioned though. We're making it explicit here.
Consider 4-bit strings. There are 16 possiblities. Can we classify them by bitmass? Sure:
bitmass = 0: 0000 - 1 arrangement
bitmass = 1: 1000, 0100, 0010, 0001 - 4 arrangements
bitmass = 2: 1100, 0110, 0011, 1010, 0101, 1001 - 6 arrangements
bimass = 3: 0111, 1011, 1101, 1110 - 4 arrangements
bitmass = 4: 1111 - 1 arrangement
There are thus 5 ensembles in our distribution and 1+4+6+4+1 = 16 total possibilities for 4 bit codes.
Bitmass classifications are always mirror imaged and symmetric about the halfway point. Assuming that any given arrangement is equally likely, we still notice that all patterns are not created equal! It is not that we have deduced any sort of frequency of occurence but we do have an objective measure of uncertainty if bitmass is a given constraint within a system. What kinds of systems have such constraints?
A prime example of such a constraint is physical systems! That is why I coined the term 'bitmass' to begin with. Let us consider a very simple mass constrained system. Consider a box that has 4 partitions in its bottom so that if we drop marbles into it, the marbles must land in one of the partitions. The partitions constrain the system and allow one and only one marble to inhabit a partition at a time. We can describe the possible states of this system exactly by the above analysis for a 4 bit binary code. If we constrain the mass of the system by dropping a specified number of marbles into the box and then shake it up, we essentially have equalized the probabilities (in our minds at least) for the class denoted by that particular mass of marbles. There are 5 state classes for this system. Looking through the window of bitmass classification there are only 5 distinct types of pattern. Each has a different degree of uncertainty associated with it though. The 'mass = 4 marbles' state has only one possible situation. It has a uncertainty of 0. If we drop four marbles in, we know for certain that they can lie in only one arrangement because there are only 4 partitions or available positions! The situation is different for the other classes. They are less certain. The class m=2 state has 6 possiblities.
The 'H' measure for m=2 is given by log2(6) = 2.58
If we do not know the number of marbles in the box we have 4 bits of uncertainty. If we know that we dropped in 2 marbles our uncertainty is now 2.58 bits. If we wish a relative measure that goes from 0.0 to 1.0, we can simply divide by the maximum uncertainty possible for this system i.e
relative H = 2.58/4 = 0.64624
We could loosely say that we are 64% uncertain about which way those two marbles landed in the bottom of the box. From now on, when I use the term KBITS I will mean the relative measure of H over some class of bitmass constraint. It turns out that there is a simple formula for calculating the number of possible arrangements of a bitmass system of 'b' bits and bitmass 'm'. The number of arrangements is given by bCm which is the mathematical form called the combination of b things taken m at a time.
It is calculated by taking factorials: bCm = b!/m!(b-m)!
Therefore KBITS(b,m) = relative H = log2(bCm) / b
The KBITS or relative uncertainty of a 4bit-mass2 state is:
KBITS(4,2) = log2( 4C2 )/ 4 = log2( 4x3x2x1/(2x1)(2x1) )/4 = log2(6)/4 = 0.64624
Physical Information
The physical system discussed above had two types of physical partitions - one was of mass and the other was spatial. Time was considered static or separated into isolated events. The principle of exclusion required by nature asserts that no two masses can occupy the same spot at the same time. This is true with classical physics but breaks down when we consider quantum states. In general principal, though, we may always describe physical systems by partitioning in mass, space, and time. Even if mass is not a consideration, there are still property and entity relationships (temperature, color, whatever) that classify or partition the possible states into distinct groupings. All objective (empirical) measurements of physical systems involve some level of discrete granularity or resolution which forms the minimal partitions within the system description. Even quantum physics must conform to this discrete experimental partitioning when measurements are made. The exact state of a system may be unmeasureable, but some type of partition class will be available. It will always have some KBITS measure of relative uncertainty associated with it. The constraint may not be mass, but it will appear in some other guise. This is how our minds work when dealing with symbolic information. We mentalize classes of symbol, possible arrangements, and uncertainty about what a particular outcome or symbolic state might be in the future.
A little thought should convince one that ultimately one should always be able to reduce all of his perceptual or empirical knowledge about physical systems to binary strings.It is a base 2 coding system and forms a universal coding scheme for all types of systems including real world physical stuff.
It is important to emphasize at this point that this coding of physical information is descriptive and thus subjective. It has an arbitrary component i.e. what particular coding scheme and classification scheme are chosen as the basis for representation. This arbitrary nature of all information is captured by the idea of a 'model'. All models only represent the reality. Empirical models inevitably involve discrete partitions.
The alternative view presented in this paper thus seems to correspond very strongly with the idea of 'physical information'. It also corresponds with Kolmogorov-Chaitin (KC) information theory as it approaches binary strings from an arbitrary perspective. The 'information measure' being calculated though is taken straight from Shannon-Weaver (SW) Information Theory. My approach to 'physical information' strikes me as a sort of middle ground between KC and SW formulations.
What is entropy? Entropy originated as a term in thermodynamics. It is thus concerned with physical information. Entropy S = -k log W where k is Boltzmann's constant and W is a term based on the number of ways the system can be configured. Well, this is the same form as SW's H term given some physical constants of nature. It corresponds even more directly with my formulation because Boltzman derived the entropy of an ideal gas in a box - i.e. a fixed number of molecules that cant each occupy the same position at the same time - they can only shift arrangements. My formulation for KBITS thus accurately (ignoring the physical constants) conforms to entropy of a bitB-massM system. The bits are simply the number of spatial partitions available to the molecules defined in the particular model. There is possiblity for overlap but if we partition to a small enough size, this cant happen. Theoretical treatment can still be given for larger partitions but this paper wont go that far.
The point is that KBITS is a good entropic measure, but keep in mind that entropy is always relative to the model and the model is relative to the coding system. That is why I choose KBITS as a term that can be willy nilly applied to bitstrings or even physical systems without offending too many people. It is also a good idea to keep in mind that SW information measures the reduction of uncertainty after some measurement, observation, or event. Information thus is also relative to some before/after sort of process. It is dual in nature. It makes no real sense to speak of SW information as an absolute measure contained by some entity.
Models are constructed from classification schemes. Binary codes are the ultimate bricks from which classification schemes are constructed. Ensembles produced by these classification schemes constrain and define the context for meaning. They enable us to quantify our uncertainty. Mental models thus are conceptually biased, subjective and arbitrary to some extent. This is not to say that objective interpretations cannot be shared by like minds however. Communication would not be possible if this were not so.
Therefore we may loosely say that Entropy ->Uncertainty -> Information about uncertainty -> KBITS. Each particular formulation is relative to its own model, but at some level of cognition they all resonate.
What about terms like chaos, order, randomness, complexity? These terms involve a whole other classification.
The basic difference between the above two groups of concepts is that chaos, order, randomness, and complexity refer to phenomena while entropy, uncertainty, and information are tools which we use to measure or classify those phenomena.
We will save their discussion till later though. Now I want to discuss my empirical data.
Analysis
When I had my bitmass classification idea back in 1993, I was stymied by the immense combinatorial explosion of bCm into megasized numbers for all but the simplest cases. Recently I realized that the logarithms of factorials are ridiculously easy to compute though. There was nothing to stop me from doing real-world analysis of bitstrings i.e. files.
The first thing that needed to be done was to plot theoretical curves for various bit-mass ensembles. I wanted to be able to compare them, so some normalization proceedure was needed. Getting KBITS as a relative measure was the ticket.
KBITS = log2( bCm )/b where b = number of bits in a string and m is the bitmass or number of 'on' bits in the string. It's value can only vary between 0.0 and 1.0. When KBITS = 1.0 we have all bits set to 'one' and the 'absolute' measure is simply equal to the number of bits in the string.
Please recall that we will be observing under the bitmass constraint i.e. a given string of bimass 'm' will be considered to be a member of a class or ensemble of patterns with bitmass m. When we get the KBITS value for a string we thus are making a statement about our uncertainty of what the actual pattern is relative to measurements we make. Our KBITS value is an entropic measure of a class for which we have measured only the bitmass and bit length of the string. Any information we gather is thus relative to this distinction we have made.
Next I needed to normalize bitmass as well. This is easily done as bitmass density or BMD.
BMD = bitmass/ #bits = m/b
When I actually began making plots, I was both disappointed and astounded. Below is a series of KBIT curves for various lengths of bit strings.

I had not anticipated that the curves would rapidly converge to a limit curve which forms an outer envelope or boundary. I found this result a bit shocking, as I thought they might continue to escalate. Of course, the normalization is what pulls everything back to scale, but still, the rapidity of convergence was disturbing. I reached the basic limit curve at about 1024 bits. This represents a 1024/8 = 128 byte file. Really puny by todays standards!
Even more weird was the fact that the limit curve bumps right up to but never touches the relative entropic value of 1.0 at center! To touch the KBITS = 1.0 value, the central partition (KBITS for BMD = 1/2) would have to have all the bits available to the entire string, but by definition, we are only considering BMD = half the available bits as 'on'. How can one tiny partition have darn near all of the entropic uncertainty? For 1000 bits, the bitmass = 500 partition requires almost 1000 bits to encode all the possible arrangements. Keep in mind, that for a 1000 bit string, there are 1001 separate bitmass classifications or partitions. How could 1/1001 of the possible classes require almost 1000 bits to descibe? Very strange indeed! Increasing the length of the binary string only emphasized the paradox more brutally.This was so astounding, I had to do another test.
The central partition is always 1/(b+1) of all the possible bitmass partitions. The relative uncertainty or number of bits required to encode a list of all possible arrangements in that partition is given by:
KBITS = log2( bCb/2 )/b where we have simply let m = b/2 since it is always the peak partition. Below is my graph of the maximal relative entropy for m = b/2 partition as a function of the length of the binary string.

This curve was even more disturbing. It is logarithmic of course, but it is also very very steep. The first plot of KBITS vs BMD demonstrated that darn near all computer files of any useful length at all are already wedged right on the limit curve. The second curve, however, demonstrated that one is facing a kind of overwhelming fact of mathematical nature. If the plot is done with a larger domain of bit lengths, it looks like a step function! There is always a small residual difference between KBITS and the bit length itself but it becomes vanishingly small in a relative sense.
During my research, I toyed with the idea of using bitmass in some sort of compression scheme. When doing compression algorithms there is always a price to pay in the coding scheme. One starts with a classification model, but a certain number of bits still have to be set aside for the description of the model via a coding scheme. In the above plot, it became apparant that the residual number of bits left after subtracting out the classification bits (KBITS) would never suffice to do compression. It was starting to look like all files are inevitably overwhelmingly random from the viewpoint of bitmass!
There was some interesting conclusions here. First I eventually realized that even in the m = b/2 partition, there is still lots of room for regularites in the arrangements. This relieved my tension about the randomness being inevitable. But something seemed inevitable. What was it?
Well, it seems I have independently rediscovered Bernoulli's Golden Theorem of Probability2 .The theorem as applied to a coin toss experiment simply states that the extended outcome of a large number of trials rapidly converges to a 50/50 balance between heads and tails - i.e. BMD = 0.5. My plot demonstrates this as a mathematical fact that has nothing to do with outside probabilities. It is the result of considering all arrangements equally likely. As the number of trials (bit length) increases, the number of possible arrangements in the m=b/2 partition quickly outpaces all the other possible outcomes. It is asymtotic and the rate of convergence is fast.
What other conclusions can we draw? Well, in physical systems there seems to be an optimal mix between spatial constraint and mass. When the two are balanced 50/50 there is maximum entropy and maximum degrees of freedom. Even in a maximum entropy state,though, the state may represent a high degree of order or pattern. An ordered state is just as likely as a more random state. The possible states within m = b/2 can be so immense though, that any one order has an extremely low probability of actualizing. I still haven't precisely defined those terms so my next analysis began looking at this problem.
Randomness and Compression
The central tenet of Kolmogorov-Chaitin Information Theory is that compressibility is a measure of the randomness of a bit string. Although there is of yet no precise mathematical definition of random, one can still say that if a file is perfectly random, you can't compress it. Compression algorithms exploit regularities or redundancies. Any asymmetry in the data can be exploited also. Basically if a suitable algorithmic expression can describe the data, there is a hook into compressing it.
If there is such a thing as randomness, then surely it must reside somewhere under the peak of the KBITS curve. Any bit string that is a member of an ensemble to the left or right of center must have an asymmetry of bits. It appeared to me that the next step would be to plot compression ratios for various file sizes and types to get an idea of where they fall with respect to the KBITS curve. The plot below does that for a handful of files selected off my computer. I used the reciprocal of measured compression ratios. The value plotted is the after/before size ratio. This keeps the values between 0.0 and 1.0 mostly. A ratio of 1.0 indicates no compression. I used WinZip to compress the files.

As you can see, different file types compress differently. I was interested to note that they tend to clump in certain regions.
The .BMP files (in blue) obviously compressed most. That is because images generally have lots of redundancy. The BMP files I used did not have internal compression in effect. The JPG files (yellow points), in contrast, have their own compression builtin, so they are in effect random and cannot be compressed again. Notice that they clump right around the peak of the limit curve.
EXE and TXT files fall somewhere in between. There is usually about 50% redundancy in in the English language according to Shannon.
Notice that all the files lie pretty much along the central BMD = 0.5 axis. This is not unexpected according to my analysis given earlier. Specifically they lie just to the short side of 0.5. I believe this has to do with the way coding is normally done. Ascii codes for the letters of the alphabet lie in a certain range below value 128 and for EXE files, it is natural when allocating variables for the space to be larger on the top side of what range you want to represent. Delimiters and escape codes are also usually relegated to the low valued terms. While I stated that we are only looking at bit patterns and not values, it is still useful to realize that a given maximum binary value puts a cap on the possible bitmass of an ensemble, so pattern and value are not totally unrelated.
Randomness vs. Order
I still had a problem with the randomness issue. My scheme does not predict randomness per se. Let's look at a specific example and see why. Consider an 8bit-mass4 ensemble of patterns. There are 8C4 = 70 possible arrangements. A few examples are: 00001111, 10000111, 11001100, 10101010 etc.
Recall that 8C4 is the central partition with BMD = 4/8 = 0.5. For this reason that class has the most members and represents maximum KBITS for an 8bit system. The first example in the list above demonstrates that some patterns have quite a bit of order or redundancy. There is a balance point in this class as shown in the last example 10101010. Every central KBIT ensemble for any bit system always has its own internal central balance point also. At that point the 50/50 bit ratio is dispersed equally as a simple periodic alternation of on/off bits. Thus we see that for any KBIT ensemble, there is another symmetry which I call clumping symmetry. On either end are the maximum clumping of the 50% 'on' bits to the left and similarly to the right. At the center symmetry point, there is an equally balanced periodic pattern. Segregation on the ends and equal dispersion in the middle seems to characterize KBIT ensembles. That universal order amongst KBIT ensembles puts a strange constraint on randomness. Which patterns in an ensemble can be considered random if the central ones and the ends are obviously always highly ordered?
I am forced to conclude that randomness, if it is to be a viable concept, must occur in a limited halo of patterns around the central point within an ensemble. Let's be clear here. Although empirical evidence aptly confirms this fact, it is nonetheless a strict fact of combinatoric math. This presents a paradox for defining randomness! The central periodic balance point of equal dispersion of pattern violates one of the foundational ideas of randomness. Randomness cannot tolerate periodic patterns or patterns that show asymmetric grouping in the long haul. That is a regularity that can be exploited. Random patterns thus have to cluster very strongly about the central bCb/2 ensemble and they also have to cluster tightly about the central balance point within that ensemble itself. To not do so would imply an asymmetry which can be exploited. Asymmetry implies that not all sub-patterns are represented equally. The central pattern itself, though, is pure periodic order, the opposite of random!
Thus we see that, conceptually, our idea of randomness includes an insidious attraction to symmetry, but perfect symmetry is denied to it. This reminds me of the 'moth to a flame' idea. The concept has an inevitable conclusion which causes it to self-destruct upon arrival. The problem is, we know in fact that certain files do not compress well. They appear to be random, so where the heck does randomness lie?
One problem with our concept is that we tend to think of unlimited lengths of strings. This is a conceptual generalization that aids thinking outside the box. When we think of pattern, though, our mind jumps to narrow specific examples. Most people can only store about 7 or 8 digits in short term memory. We seem to have a perceptual window on patterns that seems to be about one bytes (8bits) worth of information. In a way, one can say that when we think 'random' our mind considers an unlimited number of bits. When we think 'pattern' we consider relatively small groupings. The catch is that whenever we make any finite constraint on the number of bits, the dilema produced by combinatorial fact rears its head and randomness must concede to some degree of order.
One reason why we dont often notice this dilema is that we are creatures caught in time. The uncertainty of future events for even a 1 bit system like a coin toss makes the idea of randomness attractive. For any finite experiment we perform, however, there will always be some regularities or deviations from randomness. Recall, though, that Bernoulli's Theorem points out that the odds for a 0.5 BMD increase over time. The final bit string will represent one of the total number for that ensemble of possible patterns.
A Quantitative Mix of Order Amidst Randomness
I decided that my KBIT approach could not directly deal with the randomness issue. Was there a way to test it more directly? I started a new experiment. I wrote an algorithm to randomly generate files of a given bitmass. The idea was simple. Test my random generator for a value less than the target BMD. If the condition was satisfied, write a 'one' bit. If not, write a 'zero' bit. The resulting files would have on average the requisite bitmass density, yet those 'on' bits would still be randomly distributed. I generated 11 different files having an average BMD ranging from 0.0 up 1.0 in 0.1 increments. I also empirically measured the BMD after generation to confirm that the algorithm did indeed accomplish its mission. The results on average were within a percentage point of the target values.
After generating the files, I compressed them with WinZip and plotted the inverse compressibility vs. BMD for each file. The results are below.
[ side note: The PRNG algorithm I used was derived from Stephen Wolfram's cellular automata rule #30. He claims it survives all known tests for randomness. My results confirm that it indeed gave Winzip a lot of trouble.]

The compressibility of the eleven files is represented by the blue points. The red curve is the standard limit curve for KBIT ensembles. The results of this experiment were impressive. The random mix of controlled order mirrors the limit curve of possible KBIT ensembles. This suggests that the KBIT limit curve is in fact, the curve for an ideal compression engine that requires no overhead. I had expected WinZip to do better on these files, but the results indicate that it has a certain inefficiency that pushes the compressed sizes above the ideal. The 5 points at top center in fact show that WinZip hit a wall it couldn't deal with. All the compressed files were larger than the original. To be fair to WinZip, I should say that all the files were only 1000 bytes (8000 bits) long. It is more efficient when dealing with larger file sizes, but the results were still quite similar and not significant enough to warrant multiple graphs.
We should recall at this point that the empirical results for useful real world files show a different pattern. This suggests that useful information has a range of patterns that mix order with randomness in more significant ways. This leads to the idea of complexity.
Complexity
I believe one is eventually forced to concede that random files can potentially contain the most information, but to be useful information there has to exist a decoder that can decompress the file by adding useful redundancies back in to enable human perceptual classifications. This is a backwards way of looking at the issue but it highlights the fact that randomness in itself is the opposite of information. 100% uncertainty, if you will. There's that apparent paradox again.
Another way to look at it is so see that perceptual classifications are a form of order that is arbitrarily imposed. It is a pro-active endeavor. The order we see is dependant on the constraints we choose. By building algorithmic ordering engines we can temporarilly squeeze out those redundancies and achieve economical information packaging. We still have to carry our algorithmic tool kit with us to unpack those packages though.
Kolmogorov Complexity is the size of the ideal minimal compressed code for a string. In a sense it is an ideal limit just as my KBIT curve is. KC complexity, though, is not computable like my model is. In addition it relies on an external engine (Universal Turing Machine or algorithm). Of course my model is strongly flawed also because it paints with too broad a brush and lumps ordered bits right in with the random ones in the same ensemble. In the process of making this discovery, though, I think I have highlighted that the terms uncertainty, entropy, and information measure fall into one class of mental concept that is distinct from order, complexity, randomness, and chaos. There is an interplay between these two classes of idea that causes a lot of confusion. The idea of redundancy or structure appears to be the bridge between these two classes of terminology. As such, I think it might provide a quantitative theoretical hook.
Is there a way my method might be considered as a first order approximation to complexity? Can it be successively refined? I think so but, I haven't begun that phase of investigation yet. Here are my preliminary thoughts though.
Since even the central bCb/2 ensemble exhibits patterns with clumping redundancies, it would be useful to somehow come up with a classification scheme that sub-classes the redundancy patterns within a particular bitmass class. At this point I immediately hit a conceptual wall.
Finding an algorithm to explicate the combinatorial patterns seemed impossible for me. It would probably be devilishly recursive. Giving up on a straight theoretical approach, I began experimenting with various brute force statistical measures. A couple of weeks were spent writing code to do variance testing for kbits, sliding-window kbits, clumping frequencies etc. The result always revealed a pattern similar to the above. There were obvious correlations but the results were confusing.
Finally I achieved a breakthrough! The results were so astonishing, that my research took an abrupt change in orientation. The paradigm shift enabled me to work out a most wonderfully simple and elegant approach to randomness. This revelation will be expounded and demonstrated with real world data in my next paper.
@@@
References:
1
page 14 of:
"The Mathematical Theory of Communication" by
Claude E. Shannon and Warren Weaver. University of Illinois
Press. Copyright 1949.
2
page 54 of:
"Grammatical Man" by Jetremy Campbell. Simon
& Schuster, Inc. Copyright 1982. Bernoulli died in 1705. His
book Ars Conjectandi was published in 1713.