Redundancy:
A Fresh Look at the Underpinnings of Information Theory
by Swinton A. Roof
Thanksgiving Weekend 11-26-93
Introduction
During my attempts to clarify some of the ideas of Semiotics, I managed to discover a few mathematical ideas and theorems which I think might allow a more formal approach to some of my in the past most vexing and annoying thoughts. The following words, to my dismay seemed to have no single objective definition - INFORMATION, REDUNDANCY, ORDER, CHAOS. In fact, every discusion, whether in book or conversation, if extended seemed to revolve around and contradict itself.
Finally I read the book "Grammatical Man" by Jeremy Campbell. Campbell tackles this very problem and confirmed my worst suspicions that there might be no end to that game. Thinking about the distinction between open and closed systems, however, led to an insight which for me was a small crack in this 'tough nut'. This paper attempts to elucidate a tentative theory of the relationship between redundancy and uniqueness as they apply to simple 'patterns' or 'blocks of information'. The ideas expressed herein are my own invention but I wish to credit Deric Morris and friends at MA for giving me enough positive reinforcement to continue my efforts.
My basic premis starts at the same point as that of Shannon & Weaver in conventional information theory. From there I rapidly diverge to what I feel might be a new way of looking at information. Shannon defined information from a communications standpoint because he was working at Bell Telephone Laboratories. His primary concern was how to encode messages in the most efficient and least-noisy way for transmission through some channel. His premis was to define messages in terms of the total possible messages a given block of info might represent. From there, he explored the effects of adding various forms of redundancy and probability-coding on the overall transmission rate or bandwidth. Shannon's approach, however, looks at information independant of any semantic content. Shannon was not concerned with the actual number of 'on-bits' or 'off-bits' in a message. He was only concerned with the total number of positions or bits required to express that message.
This is where my approach diverges. I will introduce a concept which I shall call bit-mass. Bitmass is nothing other than the number of positive 'on-bits' within a block of binary encoded data. This distinction is arbitrary and we could have specified 'off' bits, but if one does a comparison with particles of physical mass in a closed space, 'on' bits are more natural. From there, I will look at messages, not as units of communication, but as individual patterns which have measures of redundancy and uniqueness within themselves. This is why my paper describes itself as looking at the 'underpinnings'. The origin of my approach lies within Probability & Set Theory, but its expansion is based on simple discrete computational techniques. It is my belief that I may extend this tentative one-dimensional theory into higher dimensions, and that most if not all important concepts about the physical world can be abstractly modeled within a binary- computational framework, given enough dimensions and bits to 'play with'. I also believe that this 'finite symbol analysis' can be extended to cover continous-information-channels as Shannon did. Looking to the future, one might eventually have a fractal model of information theory with great explanatory power.
Terminology
Assume a standard random-variable non-stochastic probability model and discrete one-dimensional binary-valued sample space. This means that each of the individual points within some set of possible examples is assumed to be independant and having equal likelihood in itself. The sample space is assumed to be one dimension or sequence of two-valued positions. That is, each point can have one and only one of two possible values. A unique feature of such measurable or 'metric' space is that no two masses can occupy the same point at the same time. This is what I call the 'exclusion principle'. A simple example or model would be a chain of boxes, each capable of containing one marble or not. Another example would be a simple string of binary ones and zeros. ex. 00110111010111000.
OK.
Binary Mass ( bitmass ): The unit value of one symbol in a two symboled system. The simplest and most familiar is to let '1' equal one and the symbol '0' equal zero unit mass. The bitmass of any set of binary symbols is simply the count or sum of the unit '1' or 'ON' symbols.
Binary Sample Space 'S': The set of all possible binary sequences
Binary Event or Possiblity Space 'P': A collection of sub-sets of 'S'
Binary Event 'E': A sub-set of 'P' which has a given mass 'm' and spatial or informational-mmeasure 'n'. An event space 'P' is considered to be a union of dis-joint non-overlapping events E. An event may be considered as an instantiation of one disjoint sub-set of P. Probalistically, E is the result of a trial or experiment and has some probability p(n)/P(N), where N = summation of all n.
Information or Space-Measure: The total number of binary symbols, 'ON' and 'OFF', specifying an event. This is the information defined in Shannon's Theory.
State: Any collection of Events having the same mass and measure. Generally I will use this term to imply the full combinatorial collection containing all possible events with equivalent mass and measure. In the closed,bounded case 'state' will be considered synonymous with the word 'system'. An open system, however, will be considered as having or existing in one or more 'states'.
Theory
First I wish to consider some more ideas as a preface. As I said in the introduction, the consideration of open and closed systems led to an insight. This insight was that if you consider a block of information as closed or having a fixed mass count, then there are simply some event messages that cannot be represented by that paticular 'state'. Given say, a common computer 'byte' of information with the additional specification that only mass3-measure8-events be considered we have an interesting restriction of parameters.
Imagine that you have three marbles which can land in a row of eight boxes. The total number of ways or patterns one can produce is given by the mathematical combination of 8 things taken three at time, which computes as 8*7*6/1*2*3 = 56. This is actually a distortion of the usual meaning of the nCr function. nCr combinations do not care about positional order, whereas we do care. Also combinations consider n different things whereas all of our n are equivalent. What has transpired is an inversion of a position-valued interpretation of probability. I seem to do this a lot.
What I am mainly interested in here is that I get the right answer even if the interpretation is unconventional. Ok, this means that the above mass3-measure8 state represents a relative proportion of the total 256 possibilities of 8 bits in possibility space. This proportion is 56/256 = 0.21875. When one computes the proportions for all other possible measure8-states, one discovers that they add up to a relative total of unity or 256 total possible states or patterns.
Now, another crucial insight was that the end points or boundary conditions reveal something interesting. Consider the states where bitmass is zero and eight. Either of these states can be represented by one and only one pattern or event. Eight zeros can only be arranged one way. Eight ones can only be arranged one way. Eight bit redundancy! The maximum allowable redundancy in a measure8-possibility-space! Is not this some signature of what 'frst-order' redundancy means. The insight revealed that redundacy is mirror-imaged separate and complimentary. Uniqueness on the other hand seemed to be a unified ( although still symmetrical ) island in between these complimentary redundancies!
ADDENDUM: This paper was written in 1993. Upon re-reading it, I discovered there is still much room for confusion here. The above paragraph and succeeding arguments consider the terms redundancy and uniqueness from the perpsective of the bits themselves. By considering the pattern as a whole in relation to the total possiblity space, however, one can sustain the opposite viewpoint. Let me explain. The case of 0000 is a string of redundant zeros. The bits are highly redundant and minimally unique. Now consider the whole pattern. 0000 as a pattern has only one case so it is very unique, as is 1111. There are four possible patterns having one bit 'on', however, so the total pattern in that case will be less unique. The nominal bitwise view says 0000 is redundant, whereas the pattern or 'state' view says 0000 is unique. Having observed the above interpretational dilema, I decided to only point it out and not rewrite this paper. One should keep this in mind, however. Perhaps a more accurate terminology or mathematical treatment would precisely delineate this position. The only affect this has is to reverse the interpretive graphs. It is important however to remember the distinction.
Listing of Theorems which will be demonstrated in this paper.
1) Completeness Theorem
P(n) = log2( sum from m=0 to n of nCm )
2) Uniqueness Theorem
U(n,m) = log2( nCm )
3) Redundancy Theorem
R(n,m) = n - U(n,m)
4) Complexity Theorem
Given sample space of measure n and bitmass m1 to m2. A complexity set RUR may exist such that for m2 > m1 > 0 if R(n,m1) = U(n,m1) and R(n,m2) = U(n,m2) then URU is the sample-space set for all m > m1 and m < m2
5) Conservation Theorem
Within a closed and bounded bitmass system, information shall, by definition, be conserved.
i.e. P(n) = U(n,m) + R(n,m) = constant = n = I
Any system wherein the bitmass is allowed to change will be considered open. Any change in event sample-space will constitute breaking the condition of 'boundedness'.
I apologize for the remaining images of handwritten math.














-sar 2000