The many worlds of probability, reality and cognition
This page holds a copy of an article posted Nov. 17, 2013. On Nov. 25, 2013, extensive amplifications of Part V and Part VI were made. This latest version includes minor editorial and coding changes. Some newer footnotes will be found at the bottom of this essay, rather than at the bottom of a section. Use of control f should aid the reader greatly.
On the intersection of philosophy and modern science
By Paul Conant
Table of Contents
Please use the "control f" function or equivalent to reach desired spots in this article.
Part I
Where we are going
Types of probability
Types of randomness
Types of ignorance
Insufficient reason
Purpose of probability numbers
Overview of key developments on probability
Part II
Law of large numbers
Probability distributions
Concerning Bayesianism
Expert opinion
More on Laplace's rule
On Markov chains
A note on complexity
Part III
The empirico-inductive concept
More on induction
On causality
Randomness versus causality
Is this knowledge?
How safe is the urn of nature model?
Part IV
Equiprobability and the empirico-inductive framework
Equiprobability and propensity
Popper and propensity
Indeterminism in Popper's sense
Part V
What exactly is entropy?
More on entropy
Part VI
Noumena I: Spacetime and its discontents
Noumena II: Quantum weirdness
The importance of brain teasers
A note on telepathy
In the name of Science
Part I
Where we are going
Controversy is a common thread running through the history of studies of probability assessments and statistical inference. It is my contention that so many opinions and characterizations exist because the concept roughly known as "probability" touches on the enigma of existence, with its attendant ambiguities (1).
A large and interesting literature has arisen concerning these controversies. I have sampled some of it but an exhaustive survey this is not. Nor is any formal program presented here. Rather, the idea is to try to come to grips with the assumptions on which rest the various forms of probabilistic thinking (2).
The extent of the disagreement and the many attempts to plug the holes make it obvious that there is no consensus on the meaning of "probability," though students are generally taught some sort of synthesis, which may give an impression of consensus. And it is true that, in general, working scientists and statisticians adopt a pragmatic attitude, satisfying themselves with the vague thought that the axioms of Richard Von Mises [see below; use Control f with mises ax] and Andrey Kolmogorov and the work of Bruno De Finetti or of Emile Borel cover any bothersome meta-questions, which are seen as essentially trivial and irrelevant to the work at hand. That is to say, they tend to foster a shared assumption that science is solely based on testable ideas. Still, a major tool of their work, probabilistic statistics, rests upon untestable assumptions.
Kolmogorov's axioms
http://mathworld.wolfram.com/KolmogorovsAxioms.html
De Finetti's views on probability
https://en.wikipedia.org/wiki/Bruno_de_Finetti
On Borel's contributions
http://www.informationphilosopher.com/solutions/scientists/borel/
Not only do I intend to talk about these assumptions, but also to enter the no-go zone of "metaphysics." Though practicing scientists may prefer to avoid the "forbidden fruit" of ontology and epistemology found in this zone, they certainly will lack an important understanding of what they are doing if they decline to enter. The existence of the raging controversies tends to underscore the point that there is something more that needs understanding than is found in typical probability and statistics books.
Further, I intend to argue that the statistical conception of the world of appearances is only valid under certain conditions and that an unseen "noumenal" world is of great significance and implies a nonlinearity, in the asymmetric n-body sense, that current probability models cannot account for.
The notion of a noumenal world of course has a long history. Recall Plato's cave parable and the arguments of Kant and the idealists. Plato's noumenal world refers to what is knowable to the enlightened, whereas Kant's refers to what cannot be known at all, though the philosopher argues for a divine order. My version does not address Plato's vision or necessarily imply Kant's spiritual realm -- though that notion cannot on logical grounds be excluded -- but rather suggests that modern physics sometimes yields ways to infer aspects of this hidden world.
In addition, I suggest that though a number of exhaustive formalizations of "probability theory" have been proffered, people tend to pilfer a few attractive concepts but otherwise don't take such formalizations very seriously -- though perhaps that assessment does not apply to pure logicians. Similarly, I wonder whether talk of such things as "the topology of a field" adds much to an understanding of probability and its role in science (3). Certainly, few scientists bother with such background considerations.
In the end, we find that the value of a probabilistic method is itself probabilistic. If one is satisfied that the success rate accords with experience, one tends to accept the method. The more so if a group corroborates that assessment.
The usual axioms of probability found in standard statistics textbooks are axioms for a reason: There is no assurance that reality will in fact operate "probabilistically," which is to say we cannot be sure that the definition of randomness we use won't somehow be undermined.
Standard axioms
http://mathworld.wolfram.com/ProbabilityAxioms.html
This is not a trivial matter. How, for example, do we propose to use probability to cope with "backward running time" scenarios that occur in modern physics? Yes, we may have at hand a means of assigning, say, probability amplitudes, but if the cosmos doesn't always work according to our standard assumptions, then we have to question whether what some call a "universe of chance" is sufficient as a model not only of the cosmos at large, but of the "near reality" of our everyday existence (4).
And, as is so often the case in such discussions, a number of definitions are entangled, and hence sometimes we simply have to get the gist (5) of a discussion until certain terms are clarified, assuming they are.
Though we will discuss the normal curve and touch lightly on other distributions, the reader needn't worry that he or she will be subjected to much in the way of intricacies of mathematical statistics. All methods of inferential statistics rest on assumptions concerning probability and randomness, and they will be our main areas of concern.
Types of probability
Rudolf Carnap (6), in an attempt to resolve the controversy between Keynesian subjectivists and Neyman-Pearson frequentists, offered two types of probability: probability1, giving degrees of confidence or "weight of evidence"; and probability2, giving "relative frequency in the long run." In my view, Carnap's two forms are insufficient.
In my classification, we have:
Probability1: Classical, as in proportion of black to white balls in an urn.
Probability2: Frequentist, as in trials of coin flips.
Probability3: Bayesian (sometimes called the probability of causes), as in determining the probability that an event happened, given an initial probability of some other event.
Probability4: Degree of confidence, as in expert opinion. This category is often subsumed under Probability3.
Probability5: "Objective" Bayesian degree of confidence, in which an expert opinion goes hand in hand with relevant frequency ratios -- whether the relative frequency forms part of the initial estimate or whether it arrives in the form of new information.
Probability6: "Subjective" Bayesian degree of confidence, as espoused by De Finetti later in life, whereby not only does probability, in some physical sense, not exist, but degree of belief is essentially a matter of individual perception.
Probability7: Ordinary a priori probability, often termed propensity, associated, for example, with gambling systems. The biases are built into the game based on ordinary frequency logic and, possibly, based on the advance testing of equipment.
Probability8: The propensity of Karl Popper, which he saw as a fundamental physical property that has as much right to exist as --though distinct from -- a force.
Probability9: Standard quantum propensity, in which experimentally determined a priori frequencies for particle detection have been bolstered by the highly accurate quantum formalism.
Probability10: Information theory probability, which is "ordinary" probability; still, subtleties enter into the elucidation of information entropy as distinguished from physical entropy. In terms of ordinary propensity, information theory accounts for the structural constraints, which might be termed advance information. These constraints reduce the new information, sometimes called surprisal value: I' = I - Ic, where I' is the new information, I the total information and Ic the structural information. (A small numerical sketch of this relation follows.)
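As an illustration of that relation, here is a minimal Python sketch; the eight-symbol alphabet and the one-bit structural constraint are assumed purely for the example, not taken from any particular source.

from math import log2

# Total information carried by one symbol drawn uniformly from an 8-symbol alphabet.
I_total = log2(8)          # 3 bits

# Assume a structural constraint known in advance (say, a parity-like rule)
# that effectively settles one binary distinction before the symbol arrives.
I_structural = 1.0         # 1 bit of "advance information"

# New information (surprisal) actually delivered: I' = I - Ic.
I_new = I_total - I_structural
print(I_new)               # 2.0 bits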
We will review these concepts to a greater or lesser degree as we proceed. Others have come up with different categorizations. From Alan Hajek
Hajek on interpretations of probability
http://plato.stanford.edu/entries/probability-interpret/
we have three main concepts in probability:
1. Quasi-logical: "meant to measure evidential support relations." As in: "In light of the relevant seismological and geological data, it is probable that California will experience a major earthquake this decade."
2. Degree of confidence: As in: "It is probable that it will rain in Canberra."
3. An objective concept: As in: "A particular radium atom will probably decay within 10,000 years."
Ian Hacking wrote that chance began to be tamed in the 19th century when a lot of empirical data were published, primarily by government agencies.
"The published facts about deviancies [variation], and the consequent development of the social sciences, led to the erosion of determinism, so that by the end of the century C.S. Peirce could say we live in a universe of chance."
Hacking saw probability as having two aspects. "It is connected with the degrees of belief warranted by the evidence, and it is connected with the tendencies, displayed by some chance devices, to produce stable relative frequencies" (7).
Another type of probability might be called "nonlinear probability," but I hesitate to list this as a specific type of probability because the concept essentially falls under the rubric of conditional probability.
By "nonlinear" probability I mean a chain of conditional probabilities that includes a feedback process. If we look at any feedback control system, we see that the output is partly dependent on itself. Many such systems, though not all, are expressed by nonlinear differential equations.
So the probability of a molecule being at point X is influenced by the probabilities of all other molecules. Now, by assumption of randomness of many force vectors, the probabilities in a flowing stream tend to cancel, leaving a constant. Yet, in a negative feedback system the constants must be different for the main output and the backward-flowing control stream. So we see that in some sense probabilities are "influencing themselves." In a positive feedback loop, the "self-referencing" spirals toward some physical upper bound and again we see that probabilities, in a manner of speaking, are self-conditional.
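A toy simulation may make the feedback point concrete. The gain values and the cutoff bound below are arbitrary assumptions chosen only to illustrate how an output that is fed back into itself behaves under negative versus positive feedback.

import random

def feedback_path(gain, steps=1000, bound=100.0):
    # The output x is repeatedly fed back into itself along with a small random shock,
    # so its distribution at any step is conditioned on its own earlier values.
    x = 1.0
    for _ in range(steps):
        x = gain * x + random.gauss(0, 0.1)
        x = max(-bound, min(x, bound))   # stand-in for a physical upper/lower bound
    return x

random.seed(0)
# Negative feedback (|gain| < 1): the shocks largely cancel and x hovers near zero.
print(sum(abs(feedback_path(0.5)) for _ in range(200)) / 200)
# Positive feedback (gain > 1): x spirals out to the bound on nearly every run.
print(sum(abs(feedback_path(1.1)) for _ in range(200)) / 200)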
The feedback system under consideration is the human mind's interaction with the noumenal world, this interaction producing the phenomenal world. For more detail, see sections on the noumenal world (Part VI, see sidebar) and my paper
Toward a Signal Model of Perception
https://cosmosis101.blogspot.com/2017/06/toward-signal-model-of-perception.html
Types of randomness
When discussing probability, we need to think about the complementary concept of randomness, which is an assumption necessary for independence of events.
My categories of randomness:
Randomness1: Insufficient computing power leaves much unpredictable. This is seen in nonlinear differential equations, chaos theory and in cases where small truncation differences yield widely diverging trajectories (Lorenz's butterfly effect). In computer lingo, such calculations are known as "hard" computations whose computational work increases exponentially (though it has not been proved that exponential computation can never be reduced to a polynomial-work algorithm).
Randomness2: Kolmogorov/Chaitin randomness, which is closely related to randomness1. The complexity is measured by how close to 1 the ratio of the algorithm's information plus its input information to the output information comes; the nearer that ratio is to 1, the more random the output. (If we reduce algorithms to Turing machines, then the algorithmic and the input information are strung together in a single binary string.) A compression sketch illustrating this idea follows the list below.
Chaitin-Kolmogorov complexity
https://en.wikipedia.org/wiki/Algorithmic_information_theory
Randomness3: Randomness associated with probability9 seen in quantum effects. Randomness3 is what most would regard as intrinsic randomness. Within the constraints of the Heisenberg Uncertainty Principle one cannot, in principle, predict exactly the occurrence of two related properties of a quantum detection event.
Randomness4: Randomness associated with Probability8, the propensity of Popper. It appears to be a mix of randomness3 and randomness1.
Randomness5: The imposition of "willful ignorance" in order to guard against observer bias in a frequency-based experiment.
Randomness6: Most real numbers are not computable. They are inferred in ZFC set theory, inhabiting their own noumenal world. They are so wild that one must consider them to be utterly random. One way to notionally write such a string would be to tie the selection of each subsequent digit to a quantum detector. One could never be sure that such a string would not actually randomly find itself among the computables, though it can be claimed that there is a probability 1 (virtual certainty) that it would in fact be among the non-computables. Such a binary string would also, with probability 1, contain infinities of every possible finite substring. These sorts of probability claims are open to question as to whether they represent true knowledge, though for Platonically inclined mathematicians, they are quite acceptable. See my post
A fractal that heuristically suggests an ordering of the reals
https://madmathzera.blogspot.com/2017/09/the-binary-tree-representation-of-reals.html
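Here is the compression sketch promised under randomness2. It uses off-the-shelf zlib compression as a rough, imperfect stand-in for Kolmogorov/Chaitin complexity (which is not computable); the string lengths and seed are arbitrary.

import random
import zlib

def compressed_fraction(bits: str) -> float:
    data = bits.encode()
    return len(zlib.compress(data, 9)) / len(data)

random.seed(0)
periodic = "01" * 5000                                        # highly ordered
pseudo = "".join(random.choice("01") for _ in range(10000))   # pseudorandom

# The periodic string has a short description and compresses to a tiny fraction of
# its length; the pseudorandom string resists compression much below the roughly
# 1-bit-per-symbol floor (about 1/8 of its byte length, since each '0'/'1' occupies a byte).
print(compressed_fraction(periodic), compressed_fraction(pseudo))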
Benoit Mandelbrot advocated three states of randomness: mild, wild and slow.
By mild, he meant randomness that accords easily with the bell-curved normal distribution. By wild, he meant curves that fluctuate sharply at aperiodic intervals. By slow, he meant a curve that looks relatively smooth (and accords well with the normal distribution) but steadily progresses toward a crisis point that he describes as equivalent to a physical phase shift, which then goes into the wild state.
One can see, for example, that in the iterated logistic map, as the growth parameter increases toward 4, we go from simple periodicity to intermittent finite intervals of chaos alternating with periodicity; at 4, the crisis point, and for all higher parameter values, orbits escape the logistic graph. The Feigenbaum constant is a measure of the tendency toward chaos (trapped aperiodic orbits), and the accumulation point of the period-doubling cascade that it describes might itself be viewed as a crisis point.
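A minimal sketch of the logistic map x -> rx(1 - x), with a handful of assumed parameter values, shows the march from periodicity toward chaos as r approaches 4.

def logistic_orbit(r, x0=0.2, transient=1000, keep=8):
    # Iterate x -> r*x*(1-x), discard a transient, and return the values then visited.
    x = x0
    for _ in range(transient):
        x = r * x * (1 - x)
    visited = []
    for _ in range(keep):
        x = r * x * (1 - x)
        visited.append(round(x, 4))
    return visited

for r in (2.8, 3.2, 3.5, 3.9):   # fixed point, period 2, period 4, chaotic band
    print(r, logistic_orbit(r))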
Another way to think of this is in terms of shot noise. Shot noise may increase as we change variables. So the graph of the information stream will show disjoint spikes with amplitudes that indicate the spikes can't be part of the intended message; the spikes may gradually increase in number, until we get to a crisis point, from whence there is more noise than message. We also have the transition from laminar flow to turbulence under various constraints. The transition can be of "short" or "long" duration, where we have a mixture of turbulent vortices with essentially laminar flow.
Mandelbrot wished to express his concepts in terms of fractals, which is another way of saying power laws. Logarithmic and exponential curves generally have points near the origin which, when subjectively considered, seem to mark a distinction between routine change and bizarre change. Depending on what is being measured, that distinction might occur at 210 or 210.871, or in other words arbitrarily. Or some objective measure can mark the crisis point, such as when noise equals message in terms of bits.
Wikipedia refines Mandelbrot's grading thus:
1. Proper mild randomness: short-run portioning is even for N = 2, e.g. the normal distribution.
2. Borderline mild randomness: short-run portioning is concentrated for N = 2, but eventually becomes even as N grows, e.g. the exponential distribution with λ = 1.
3. Slow randomness with finite delocalized moments: scale factor increases faster than q but no faster than q^w, with w less than 1.
4. Slow randomness with finite and localized moments: scale factor increases faster than any power of q, but remains finite, e.g. the lognormal distribution.
5. Pre-wild randomness: scale factor becomes infinite for q greater than 2, e.g. the Pareto distribution with α = 2.5.
6. Wild randomness: infinite second moment, but finite moment of some positive order, e.g. the Pareto distribution with α = 1.5.
7. Extreme randomness: all moments are infinite, e.g. the Pareto distribution with α = 1.
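The difference between the tamer and wilder grades can be seen in a short simulation contrasting the two Pareto cases above; the sample sizes and seed are arbitrary, and random.paretovariate draws from a Pareto distribution with minimum value 1.

import random

random.seed(1)
for alpha in (2.5, 1.5):
    xs = [random.paretovariate(alpha) for _ in range(200_000)]
    # Running estimate of the second moment at increasing sample sizes.
    checkpoints = [1_000, 10_000, 100_000, 200_000]
    running = [round(sum(x * x for x in xs[:n]) / n, 2) for n in checkpoints]
    print(alpha, running)

# For alpha = 2.5 the second moment is finite (alpha/(alpha - 2) = 5) and the running
# estimate wanders in that neighborhood; for alpha = 1.5 the true second moment is
# infinite, so the estimate is dominated by a few enormous draws and does not settle.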
I have made no attempt to make the numbering of my categories for probability correspond with that for randomness. The types of probability I present do not carry one and only one type of randomness. How they relate to randomness is a supple issue to be discussed as we go along.
In this respect, we shall also examine the issues of ignorance of a deterministic output versus ignorance of an indeterministic (quantum) output.
Types of ignorance
Is your ignorance of what outcome will occur in an experiment utterly subjective, or are there physical causes for the ignorance, as in the propensity notion(s)? Each part of that question assumes a strict demarcation between mind and external environment in an experiment, a simplifying assumption in which feedback is neglected (but can it be, really?).
Much of the difficulty in discerning the "meaning of probability" arose with the development of quantum mechanics, which, as Jan von Plato notes, "posed an old question anew, but made it more difficult than ever before to dismiss it: is probability not also to be viewed as ontic, i.e., as a feature of reality, rather than exclusively as epistemic, i.e., as a feature characterizing our state of knowledge?" (8)
The scenario below gives a glimpse of some of the issues which will be followed up further along in this essay.
Insufficient reason
A modern "insufficient reason" (8a) scenario:
Alice tells you that she will forward an email from Bob, Christine or Dan within the next five minutes. In terms of prediction, your knowledge can then be summarized as P(X) = 1/3, where X is B, C or D. Whether Alice has randomized the order is unknown to you and so to you any potential permutation is just as good as any other. The state of your knowledge is encapsulated by the number 1/3. In effect, you are assuming that Alice has used a randomization procedure that rules out "common" permutations, such as BCD, though you can argue that you are assuming nothing. This holds on the 0th trial.
In this sort of scenario, it seems justifiable to employ the concept of equiprobability, which is a word reflecting minimum knowledge. We needn't worry about what Alice is or isn't doing when we aren't looking. We needn't worry about a hidden influence yielding a specific bias. All irrelevant in such a case (and here I am ignoring certain issues in physics that are addressed in the noumena sections, Part VI, below, and in Toward a Signal Model of Perception).
We have here done an exercise in classical probability and can see how the principle of insufficient reason (called by John Maynard Keynes "the principle of indifference") is taken for granted as a means of describing partial, and, one might say, subjective knowledge. We can say that in this scenario, randomness4 is operative.
Even so, once one trial occurs, we may wish to do the test again. It is then that "maximum entropy" or well-shuffling may be called for. Of course, before the second trial, you have no idea whether Alice has changed the order. If you have no idea whether she has changed the permutation, then you may wish to look for a pattern that discloses a "tendency" toward nonrandom shuffling. This is where Pierre-Simon Laplace steps in with his controversial rule of succession, which is intended as a means of determining whether an observed string is nonrandom; there are of course other, modern tests for nonrandomness.
If, however, Alice tells you that each trial is to be independent, then you are accepting her word that an effective randomization procedure is at work. We now enter the realm of the frequentist. Nonparametric tests or perhaps the rule of succession can test the credibility of her assertion. It is here -- in the frequentist doctrine -- where the "law of large numbers" enters the picture. So here we suggest randomness1 and perhaps randomness2; randomness3 concerning quantum effects also applies, but is generally neglected.
We should add that there is some possibility that, despite Alice's choice of permutation, the emails will arrive in an order different from her choice. There are also the possibilities that either the sender's address and name are garbled or that no message arrives at all. Now these last possibilities concern physical actions over which neither sender nor receiver has much control. So one might argue that the subjective "principle of insufficient reason" doesn't apply here. On the other hand, in the main scenario, we can agree that not only is there insufficient reason to assign anything but 1/3 to any outcome, but that also our knowledge is so limited that we don't even know whether there is a bias toward a "common" permutation, such as BCD.
Thus, application of the principle of insufficient reason in a frequentist scenario requires some care. In fact, it has been vigorously argued that this principle and its associated subjectivism entails flawed reasoning, and that only the frequentist doctrine is correct for science.
We can think of the classical probability idea as the 0th trial of a frequency scenario, in which no degree of expertise is required to obtain the number 1/3 as reflecting the chance that you will guess right as to the first email.
Purpose of probability numbers
Though it is possible, and has been done, to sever probability conceptions from their roots in the decision-making process, most of us have little patience with such abstractions, though perhaps a logician might find such an effort worthwhile. Yet, we start from the position that the purpose of a probability assignment is to determine a preferred course of action. So there may be two courses of action, and we wish to know which, on the available evidence, is the better. Hence what is wanted is an equality or an inequality. We estimate that we are better off following Course A (such as crediting some statement as plausible) than we are if we follow course B (perhaps we have little faith in a second statement's plausibility). We are thus ordering or ranking the two proposed courses of action, and plan to make a decision based on the ranking.
This ranking of proposed actions is often termed the probability of a particular outcome, such as success or failure. The ranking may be made by an expert, giving her degrees of confidence, or it may be made by recourse to the proportions of classical probability, or the frequency ratios of repeated trials of an "equivalent" experiment. Still, probability is some process of ranking, or prioritizing, potential courses of action. (We even have meta-analyses, in which the degrees of several experts are averaged.) Even in textbook drills, this purpose of ranking is implied. Many think that it is "obvious" that the frequency method has a built-in objectivity and that observer bias occurs when insufficient care has been taken to screen it out. Hence, as it is seemingly possible to screen out observer bias, what remains of the experiment must be objective. And yet that claim is open to criticism, not least of which is how a frequency ratio is defined and established.
In this respect, we need to be aware of the concept of risk. Gerd Gigerenzer in his Calculated Risks (9) presents a number of cases in which medical professionals make bad decisions based on what Gigerenzer sees as misleading statements of risk.
He cites a 1995 Scotland coronary prevention study press release, which claimed that:
"People with high cholesterol can rapidly reduce" their risk of death by 22% by taking a specific drug.
Of 1,000 people with high cholesterol who took the drug over a five-year period, 32 died.
Of 1,000 people who took a placebo over the five-year period, 41 died.
There are three ways, says Gigerenzer, to present the benefit:
1. Absolute risk reduction: (41 - 32)/1,000 = 0.9%.
2. Relative risk reduction: (absolute risk reduction)/(deaths among the untreated) = 9/41 = 22%.
3. Number needed to treat (NNT): the number of people who must undergo a treatment in order for it to save one life. In this case, 9/1,000 is roughly 1/111, meaning about 111 people must be treated to save one life.
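The arithmetic behind the three presentations can be laid out in a few lines, using the figures quoted above.

# Deaths per 1,000 people over the five-year study period, from the figures above.
treated_deaths, control_deaths, group_size = 32, 41, 1000
lives_saved = control_deaths - treated_deaths                 # 9

absolute_risk_reduction = lives_saved / group_size            # 0.009  -> 0.9%
relative_risk_reduction = lives_saved / control_deaths        # 0.2195 -> about 22%
number_needed_to_treat = group_size / lives_saved             # about 111 people

print(absolute_risk_reduction, relative_risk_reduction, round(number_needed_to_treat))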
When such reasoning was used to encourage younger women to get breast X-rays, the result was an imposition of excessive anxiety, along with radiation risk, on women without a sufficient reason, Gigerenzer writes.
John Allen Paulos on the mammogram controversy
http://www.nytimes.com/2009/12/13/magazine/13Fob-wwln-t.html?_r=1&emc=tnt&tntemail1=y#
New knowledge may affect one's estimate of a probability. But how is this new knowledge rated? Suppose you and another are to flip a coin over some benefit. The other provides a coin, and you initially estimate your chance of winning at 1/2, but then another person blurts out: "The coin is loaded."
You may now have doubts about the validity of your estimate, as well as doubt about whether, if the claim is true, the coin is mildly or strongly biased. So, whatever you do, you will use some process of estimation that may be quite appropriate but that might not be easily quantifiable.
And so one may say that the purpose of a probability ranking is to provide a "subjective" means of deciding on a course of action in "objective reality."
Eminent minds have favored the subjectivist viewpoint. For example, Frank P. Ramsey (10) proposed that probability theory represents a "logic of personal beliefs" and notes: "The degree of a belief is just like a time interval [in relativity theory]; it has no precise meaning unless we specify more exactly how it is to be measured."
In addressing this problem, Ramsey cites Mohs' scale of hardness, in which 10 is arbitrarily assigned to a diamond, etc. Using a psychological perspective, Ramsey rates degrees of belief by scale of intensity of feeling, while granting that no one feels strongly about things he takes for granted [unless challenged, we add]. And, though critical of Keynes's A Treatise on Probability, Ramsey observed that all logicians -- Keynes included -- supported the degrees of belief viewpoint whereas statisticians in his time generally supported a frequency theory outlook.
Yet, as Popper points out, "Probability in physics cannot be merely an indication of 'degrees of belief,' for some degrees lead to physically wrong results, whereas others do not" (11).
Overview of key developments on probability
We begin with the track of probability as related to decisions to be made in wagering during finite games of chance. Classical probability says we calculate an estimate based on proportions. The assumption is that the urn's content is "well mixed" or the card deck "well shuffled."
What the classical probabilist is saying is that we can do something with this information, even though the information of the system is incomplete. The classical probabilist had to assume maximum information entropy with proper shuffling, even though this concept had not been developed. Similarly, the "law of large numbers" is implicit. Otherwise, why keep playing a gambling game?
We show in the discussion on entropy below that maximum entropy means loss of all memory or information that would show how to find an output value in say, less than n steps, where n is the number of steps (or, better, bits) in the shuffling algorithm. We might say this maximum Kolmogorov-Chaitin entropy amounts to de facto deterministic irreversibility.
That is to say, the memory loss or maximum entropy is equivalent to effective "randomization" of card order. Memory loss is implicit, though not specified in Shannon information theory -- though one can of course assert that digital memory systems are subject to the Second Law of Thermodynamics.
But even if the deck is highly ordered when presented to an observer, as long as the observer doesn't know that, his initial probability estimate, if he is to make a bet, must assume well-shuffling, as he has no reason to suspect a specific permutation.
Despite the fact that the "law of large numbers" is implicit in classical thinking, there is no explicit statement of it prior to Jacob Bernoulli and some may well claim that classical probability is not a "frequentist" theory.
Yet it is only a short leap from the classical to the frequentist conception. Consider the urn model, with say 3 white balls and 2 black balls. One may have repeated draws, with replacement, from one urn. Or one may have one simultaneous draw from five urns that each have either a black or white ball. In the first case, we have a serial, or frequentist, outlook. In the second, we have a simple proportion as described by the classical outlook.
The probability of two heads in a row is 1/4, as shown by the table.
HH
TH
HT
TT
Now suppose we have an urn in which we place two balls and specify that there may be 0 to 2 black balls and 0 to 2 white balls. This is the same as having 4 urns, with these contents:
BB
BW
WB
WW
One then is presented with an urn and asked the probability it holds 2 black balls. The answer is 1/4.
Though that result is trivial, the comparison underscores how classical and frequentist probability are intertwined.
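A small sketch makes the intertwining concrete: the same number, 1/4, emerges from a one-shot enumeration of equally likely cases (the classical, urn-style reading) and from a long run of repeated two-flip trials (the frequentist reading). The trial count and seed are arbitrary.

import random
from itertools import product

# Classical reading: enumerate the equally likely cases once.
cases = list(product("HT", repeat=2))                  # HH, HT, TH, TT
classical = sum(c == ("H", "H") for c in cases) / len(cases)

# Frequentist reading: repeat the two-flip experiment many times.
random.seed(0)
trials = 100_000
hits = sum(random.choice("HT") == "H" and random.choice("HT") == "H"
           for _ in range(trials))

print(classical, hits / trials)                        # 0.25 and roughly 0.25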
So one might argue that the law of large numbers is the mapping of classical probabilities onto a time interval. But if so, then classical probability sees the possibility of maximum entropy as axiomatic (not that early probabilists necessarily thought in those terms). In classical terms, one can see that maximum entropy is equivalent to the principle of insufficient reason, which to most of us seems quite plausible. In other words, if I hide the various combinations of balls in four urns and don't let you see me doing the mixing, then your knowledge is incomplete, but sufficient to know that you have one chance in four of being right.
But, one quickly adds, what does that information mean? What can you do with that sort of information as you get along in life? It is here that, if you believe in blind chance, you turn to the "law" of large numbers. You are confident that if you are given the opportunity to choose over many trials, your guess will turn out to have been right about 25% of the time.
So one can say that, at least intuitively, it seems reasonable that many trials, with replacement, will tend to verify the classically derived probability, which becomes the asymptotic limiting value associated with the law of large numbers. Inherent in the intuition behind this "law" is the notion that hidden influences are random -- if we mean by random that over many trials, these influences tend to cancel each other out, analogous to the fact that Earth is essentially neutrally charged because atoms are, over the large, randomly oriented with respect to one another, meaning that the ionic charges virtually cancel out. (Notice the circularity problem in that description.)
Nevertheless, one could say that we may decide on, say, 10 trials of flipping a coin, and assume we'd be right on about 50% of guesses -- as we have as yet insufficient reason to believe the coin is biased. Now consider 10 urns, each of which contains a coin that is either head up or tail up. So your knowledge is encapsulated by the maximum ignorance in this scenario. If asked to bet on the first draw, the best you can do is say you have a 50% chance of being right (as discussed a bit more below). Though the time-dependent coin-flip frequency scenario yields outcome ratios that are equivalent to the time-independent urn classical scenario, in the first scenario we ordinarily require cancellation of most unseen influences or causes while in the second scenario only the guesser's lack of complete knowledge is at issue.
(We should here acknowledge that though classical probability originated with analysis of games of chance, in fact those games almost always implied frequency ratios and the law of large numbers.)
At any rate, the notion that, on average, hidden force vectors in a probabilistic scenario cancel out might seem valid if one holds to conservation laws, which imply symmetries, but I am uncertain on this point; it is possible that Noether's theorem applies.
So it ought to be apparent that Bernoulli's frequency ideas were based on the principle of insufficient reason, or "subjective" ignorance. The purpose of the early probabilists was to extract information from a deterministic system, some of whose determinants were unknown. In that period, their work was frowned upon because of the belief that the drawing of lots should be reserved for specific moral purposes, such as the settling of an argument or discernment of God's will on a specific course of action. Gambling, though a popular pastime, was considered sinful because God's will was being ignored by the gamers.
Such a view reflects the Mosaic edict that Israel take no census, which means this: To take a census is to count one's military strength, which implies an estimate of the probability of winning a battle or a war. But the Israelites were to trust wholly in their God for victory, and not go into battle without his authorization. Once given the word to proceed, they were to entertain no doubt. This view is contrary to the custom of modern man, who, despite religious beliefs, assesses outcomes probabilistically, though usually without precise quantification.
To many modern ears, the idea that probabilities are based on a philosophy that ejects divine providence from certain situations sounds quite strange. And yet, if there is a god, should we expect that such a being must leave certain things to chance? Does blind chance exist, or is that one of the many illusions to which we humans are prone? (I caution that I am not attempting to prove the existence of a deity or to give a probability to that supposition.)
In classical and early frequentist approaches, the "maximum entropy" or well-mixing concept was implicitly assumed. And yet, as Ludwig Boltzmann and Claude Shannon showed, one can think of degrees of entropy that are amenable to calculation.
Von Mises has been called the inventor of modern frequentism, which he tried to put on a firm footing by making axioms of the law of large numbers and of the existence of randomness, by which he meant that, over time, no one could "beat the odds" in a properly arranged gambling system.
The Von Mises axioms
1. The axiom of convergence: "As a sequence of trials is extended, the proportion of favorable outcomes tends toward a definite mathematical limit."
2. The axiom of randomness: "The limiting value of the relative frequency must be the same for all possible infinite sub-sequences of trials chosen solely by a rule of place selection within the sequence (i.e., the outcomes must be randomly distributed among the trials)."
Alonzo Church on the random sequences of Von Mises
http://www.socsci.uci.edu/~bskyrms/bio/readings/church_randomness.pdf
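The randomness axiom can be illustrated with a toy "collective." A pseudorandom generator stands in for the ideal sequence, and two place-selection rules that never peek at the outcome being selected are applied; the sequence length and seed are arbitrary.

import random

random.seed(2)
seq = [random.randint(0, 1) for _ in range(200_000)]   # stand-in for a collective

def rel_freq(s):
    return sum(s) / len(s)

every_third = seq[::3]                                  # selection by position only
after_a_one = [seq[i + 1] for i in range(len(seq) - 1) if seq[i] == 1]  # selection by prior outcomes

# All three relative frequencies come out near 0.5: no admissible place-selection
# rule "beats the odds," which is what the randomness axiom demands.
print(round(rel_freq(seq), 3), round(rel_freq(every_third), 3), round(rel_freq(after_a_one), 3))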
Of course, the immediate objection is that declaring axioms does not necessarily mean that reality agrees. Our collective experience is that reality does often seem to be in accord with Von Mises's axioms. And yet, one cannot say that science rests on a testable foundation, even if nearly all scientists accept these axioms. In fact, it is possible that these axioms are not fully in accord with reality and only work within limited spheres. A case that illustrates this point is Euclid's parallel postulate, which may not hold at the cosmic scale. In fact, the counterintuitive possibilities for Riemann space demonstrate that axioms agreed to by "sensible" scientists of the Newtonian mold are not necessarily the "concrete fact" they were held to be.
Consider the old syllogism:
1. All men are mortal.
2. Socrates is a man.
3. Hence, Socrates is mortal.
It is assumed that all men are mortal. But suppose in fact 92% of men are mortal. Then conclusion 3 is similarly uncertain, although rather probable.
Following in David Hume's track, we must concede to having no way to prove statement 1, as there might be some exception that we don't know about. When we say that "all men are mortal," we are relying on our overwhelming shared experience, with scientists proceeding on the assumption that statement 1 is self-evidently true.
If we bring the study of biology into the analysis, we might say that the probability that statement 1 holds is buttressed by both observational and theoretical work. So we would assign the system of members of the human species a propensity of virtually 1 as to mortality. That is to say, we take into account the systemic and observational evidence in assigning an a priori probability.
And yet, though a frequency model is implicit in statement 1, we cannot altogether rule out an exception, not having the power of perfect prediction. Thus, we are compelled to accept a degree of confidence or degree of belief. How is this to be arrived at? A rough approximation might be to posit a frequency of less than 1 in 7 billion, but that would say that the destiny of everyone alive on Earth today is known. We might match a week's worth of death records over some wide population against a week's worth of birth records in order to justify a statement about the probability of mortality. But that isn't much of a gain. We might as well simply say the probability of universal mortality is held to be so close to 1 as to be accepted as 1.
The difficulties with the relative frequency notion of probability are well summarized by Hermann Weyl (13). Weyl noted that the earlier parts of Jacob Bernoulli's Ars Conjectandi were sprinkled with words connoting subjective ideas, such as "hope" and "expectation." Still, in the fourth part of that book, Bernoulli introduces the seemingly objective "law of large numbers," which he established with a mathematical proof. Yet, says Weyl, the logical basis for that law has remained murky ever since.
Wikipedia article on 'Ars Conjectandi'
https://en.wikipedia.org/wiki/Ars_Conjectandi
Yes, it is true that Laplace emphasized the aspect of probability with the classical quantitative definition: the quotient of the number of favorable cases over the number of all possible cases, says Weyl. "Yet this definition presupposes explicitly that the different cases are equally possible. Thus, it contains as an aprioristic basis a quantitative comparison of possibilities."
The conundrum of objectivity is underscored by the successful use of statistical inference in the hard and soft sciences, in the insurance business and in industry in general, Weyl points out.
Yet, if probability theory only concerns relative frequencies, we run into a major problem, Weyl argues. Should we not base this frequency interpretation directly on trial series inherent in the law of large numbers? We might say the limiting value is reached as the number of trials increases "indefinitely." But even so it is hard to avoid the fact that we are introducing "the impossible fiction of an infinity of trials having actually been conducted." He adds, "Moreover, one thereby transcends the content of the probability statement. Inasmuch as agreement between relative frequency and probability p is predicted for such a trial series with 'a probability approaching certainty indefinitely,' it is asserted that every series of trials conducted under the same conditions will lead to the same frequency value."
The problem, as Weyl sees it, is that if one favors "strict causality," then the methods of statistical inference must find a "proper foundation in the reduction to strict law" but, this ideal seems to run into the limit of partial acausality at the quantum level. Weyl thought that perhaps physics could put statistical inference on a firm footing, giving the physical example of equidistribution of gas molecules, based on the notion that forces among molecules are safe to ignore in that they tend to cancel out. But here the assumption behind this specimen of the law of large numbers has a physical basis -- namely Newtonian physics, which, in our terms, provides the propensity information that favors the equiprobabilities inherent in equidistribution.
Yet, I do not concede that this example proves anything. Really, the kinetic gas theories of Maxwell, Boltzmann and Gibbs presuppose Newtonian mechanics, but are based on the rough-and-ready relative-frequency empirico-inductive perception apparatus used by human beings and other mammals.
How does one talk about frequencies for infinite sets? In classical mechanics, a molecule's net force vector might point in any direction and so the probability of any specific direction equals zero, leading Weyl to remark that in such continuous cases one can understand why measure theory was developed.
On measure theory
https://en.wikipedia.org/wiki/Measure_(mathematics)
Two remarks:
1. In fact, the force vector's possible directions are limited by Planck's constant, meaning we have a large population of discrete probabilities that can very often be treated as an infinite set.
2. Philosophically, one may agree with Newton and construe an infinitesimal as a discrete unit that exists in a different realm than that of the reals. We see a strong echo of this viewpoint in Cantor's cardinal numbers representing different orders of infinity.
An important development around the turn of the 19th century was the emergence of probabilistic methods of dealing with error in observation and measurement. How does one construct a "good fit" curve from observations that contain seemingly random errors? By "random" an observer means that he has insufficient information to pinpoint the source of the error, or that its source isn't worth the bother of determining. (The word "random" need not be used only in this sense.)
The binomial probability formula is simply a way of expressing possible proportions using combinatorial methods; it is a logical tool for both classical and frequentist calculations.
Now this formula (function) can be mapped onto a Cartesian grid. What it is saying is that finite probabilities are highest for sets with the highest finite numbers of elements, or permutations. As a simple example, consider a coin-toss experiment. Five flips yield j heads and k tails, where j + k = 5 and j runs from 0 to 5.
This gives the binomial result:
5C5 = 1, 5C4 = 5, 5C3 = 10, 5C2 = 10, 5C1 = 5, 5C0 = 1.
You can, obviously, visualize a symmetrical graph with two center bars of 10 units high flanked on both sides by bars of diminishing height.
Now if p = 1/2 and q = 1/2 (the probability of occurrence and the probability of non-occurrence, respectively), we get the symmetrical graph:
1/32, 5/32, 10/32, 10/32, 5/32, 1/32
We see here that there are 10 permutations with 3 heads and 2 tails and 10 with 3 tails and 2 heads in which chance of success or failure is equal. So, if you are asked to bet on the number of heads turning up in 5 tosses, you should -- assuming some form of randomness -- choose either 3 or 2.
Clearly, sticking with the binomial case, there is no reason not to let the number of notional tosses go to infinity, in which case every specific probability reduces to zero. Letting the binomial graph go to infinity gives us the Gaussian normal curve. The normal curve is useful because calculational methods have been worked out that make it more convenient than binomial (or multinomial) probability calculation. And, it turns out that as n increases in the binomial case, probabilities that arise from situations where replacement is logically required are nevertheless well approximated by probabilities arising with the no-replacement assumption.
So binomial probabilities are quite well represented by the Gaussian curve when n is large enough. Note that, implicitly, we are assuming "well mixing" or maximum entropy.
So the difference between the mean and the next unit shrinks with n.
50C25/51C25 = 26/51 = 0.509803922
and if we let n run to infinity, that ratio goes exactly to 0.5.
So it made sense, as a useful calculational tool, to use the Gaussian curve, where n runs to infinity.
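The two claims just made -- that the central ratio tends to 1/2 and that the Gaussian curve approximates the binomial ever better as n grows -- can be checked directly; the particular n values below are arbitrary.

from math import comb, exp, pi, sqrt

# The ratio cited above: C(50,25)/C(51,25) = 26/51.
print(comb(50, 25) / comb(51, 25))                      # 0.50980...

def binomial_p(n, k, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_approx(n, k, p=0.5):
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return exp(-((k - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# The exact binomial point probability and its Gaussian approximation converge as n grows.
for n in (10, 100, 1000):
    k = n // 2
    print(n, round(binomial_p(n, k), 5), round(normal_approx(n, k), 5))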
Yet one should beware carelessly assuming that such a distribution is some form of "objective" representation of reality. As long as no one is able to fully define the word "random" in whatever aspect, then no one can say that the normal curve serves as a viable approximate representation of some arena of reality. Obviously, however, that distribution has proved to be immensely productive in certain areas of science and industry -- though one should not fail to appreciate its history of misuse. At any rate, a great advantage of the normal curve is that it so well represents the binomial distribution.
Certainly, we have here an elegant simplification, based on the assumption of well-mixing or maximum entropy. As long as we use the normal distribution to approximate the possibilities for a finite experiment, that simplification will be accepted by many as reasonable. But if the bell curve is meant to represent some urn containing an infinitude of potential events, then the concept of normal distribution becomes problematic.
In other words, any finite cluster of, say, heads, can come up an infinity of times. We can say our probability of witnessing such a cluster is low, but how do we ensure well mixing to make sure that that belief holds? If we return to the urn model, how could we ensure maximally entropic shuffling of an infinitude of black and possibly white balls? We have no recourse but to appeal to an unverifiable Platonic ideal or perhaps to say that the principle of insufficient reason is, from the observer's perspective, tantamount to well mixing. (Curiously, the axiom of choice of Zermelo-Fraenkel set theory enters the picture here, whereby one axiomatically is able to obtain certain subsets of an infinitude.)
Keynes takes aim at the principle of indifference (or, in our terms, zero propensity information) in this passage:
"If, to take an example, we have no information whatever as to the area or population of the countries of the world, a man is as likely to be an inhabitant of Great Britain as of France, there being no reason to prefer one alternative to the other.
"He is also as likely to be an inhabitant of Ireland as of France. And on the same principle he is as likely to be an inhabitant of the British Isles as of France. And yet these conclusions are plainly inconsistent. For our first two propositions together yield the conclusion that he is twice as likely to be an inhabitant of the British Isles as of France. Unless we argue, as I do not think we can, that the knowledge that the British Isles are composed of Great Britain and Ireland is a ground for supposing that a man is more likely to inhabit them than France, there is no way out of the contradiction. It is not plausible to maintain, when we are considering the relative populations of different areas, that the number of names of subdivisions which
are within our knowledge, is, in the absence of any evidence as to their size, a piece of relevant evidence.
"At any rate, many other similar examples could be invented, which would require a special explanation in each case; for the above is an instance of a perfectly general difficulty. The possible alternatives may be a, b, c, and d, and there may be no means of discriminating between them; but equally there may be no means of discriminating between (a or b), c, and d" (14).
Modern probability texts avoid this difficulty by appeal to set theory. One must properly define sets before probabilities can be assigned.
Two points:
1. For most purposes, no one would gain knowledge via applying probability rankings to Keynes's scenario. Even so, that doesn't mean no situation will ever arise in which it is worthwhile to apply probabilistic methods, though of course the vagueness of the sets makes probability estimates equally vague.
2. If we apply set theory, we are either using naive set theory, where assumptions are unstated, or axiomatic set theory, which rests on unprovable assertions. In the case of standard ZFC set theory, Goedel's incompleteness theorems mean that the formalism is either incomplete or inconsistent. Further, by Goedel's second theorem, ZFC cannot establish its own consistency.
Randomness4 arises when willful ignorance is imposed as a means of obtaining a classical form of probability, or of having insufficient reason to regard events as other than equiprobable.
Consider those exit polls that include late sampling, which are the only exit polls where it can be assumed that the sample set yields a quantity close to the ratio for the entire number of votes cast.
This is so, it is generally believed, because if the pollsters are doing their jobs properly, the pollster's selection of every nth person leaving a polling station screens out any tendency to select people who are "my kind."
In fact, the exit poll issue underscores an existential conundrum: suppose the exit poll ratio for candidate A is within some specified margin of error for a count of the entire vote. That is to say, with a fairly high number of ballots there are very likely to be occasional ballot-count errors, which, if random, will tend to cancel. But the level of confidence in the accuracy of the count may be only, say, 95%. If the exit poll has a low margin of error in the counting of votes -- perhaps the pollsters write down responses with 99% accuracy -- then one may find that the exit poll's accuracy is better than the accuracy of the entire ballot count.
A recount may only slightly improve the accuracy of the entire ballot count. Or it may not provably increase its accuracy at all, if the race is especially tight and the difference is within the margin of error for ballot counting.
A better idea might be to have several different exit polls conducted simultaneously with an average of results taken (the averaging might be weighted if some exit pollsters have a less reliable track record than others).
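A rough sketch of the sampling arithmetic involved; the poll result, sample size and number of polls below are assumed purely for illustration.

from math import sqrt

def margin_of_error(p, n, z=1.96):
    # Normal-approximation 95% margin of error for a sample proportion.
    return z * sqrt(p * (1 - p) / n)

p_hat, n = 0.51, 2000                                  # assumed exit-poll result and sample size
print(round(margin_of_error(p_hat, n), 3))             # about +/- 0.022 (2.2 points)

# Averaging k comparable, independent exit polls acts like one poll k times as large,
# shrinking the margin by a factor of sqrt(k).
k = 4
print(round(margin_of_error(p_hat, n * k), 3))         # about +/- 0.011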
So as we see -- even without the theorems of Kurt Goedel and Alan Turing and without appeal to quantum phenomena -- some statements may be undecidable. It may be impossible to get definitive proof that candidate A won the election, though in most cases recounts and sufficient attention to sources of bias could drastically reduce the error probabilities. But even then, one can't be certain that in a very tight race the outcome wasn't rigged.
It is important to understand that when randomness4 is deployed, the assumption is that influences that would favor a bias tend to cancel out. In the case of an exit poll, it is assumed that voters tend to arrive at and leave the polling station randomly (or at least pseudorandomly). The minor forces affecting their order of exiting tend to cancel, it is believed, permitting confidence in a sample based on every nth voter's disclosure of her vote.
In another important development in probability thinking, Karl Popper in the mid-20th century proposed the propensity idea as a means of overcoming the issue of "subjectivity," especially in the arena of quantum mechanics. This idea says that physical systems have elementary propensities or tendencies to yield a particular proposition about some property. In his thinking, propensity is no more "occult" a notion than the notion of force. The propensity can be deduced because it is an a priori (a term Popper disdains) probability that is fundamental to the system. The propensity is as elementary a property as is the spin of an electron; it can't be further reduced or described in terms of undetected vibrations (though he didn't quite say that).
The propensity probability can be approximated via repeated trials, but applies immediately on the first trial. By this, Popper avoids the area of hidden variables and in effect quantizes probability, though he doesn't admit to having done so. What he meant to do was minimize quantum weirdness so as to save appearances, or that is, "external" reality.
Popper wasn't always clear as to what he meant by realism, but it is safe to assume he wanted the laws of physics to hold whether or not he was sleeping. Even so, he was forced to concede that it might be necessary to put up with David Bohm's interpretation of quantum behaviors, which purports to save realism only by sacrificing locality, and agreeing to "spooky action at a distance."
The notion of propensity may sometimes merge with standard ideas of statistical inference. Consider this passage from J.D. Stranathan's history of experimental physics:
"In the case of the Abraham
theory the difference between the observed and calculated values of m/mo are all positive; and these differences grow rather large at the higher velocities. On the Lorentz theory the differences are about as often positive as negative; the sum of the positive errors is nearly equal to the sum of negative ones. Furthermore, there is no indication that the error increases at the higher velocities. These facts indicate that the errors are not inherent in the theory; the Lorentz theory describes accurately the observed variation" (15).
Hendrik A. Lorentz was first to propose relativistic mass change for the electron only, an idea generalized by Einstein to apply to any sort of mass. Max Abraham clung to a non-relativistic ether theory.
1. I do not regard myself as an expert statistician. When I wish to obtain some statistical result, I consult my textbooks. Neither do I rate myself as extremely facile with probability calculations, though I sometimes enjoy challenging questions (and have been known to make embarrassing flubs).
2. At times we will use the word "probability" as short for "probabilistic thinking."
3. That is not to say that topologies can have no effect on probability calculations. For example, topologies associated with general relativity may or may not be relevant to probability calculations when it comes to such concepts as the four-dimensional spacetime block.
4. A term attributed to the 19th century logician and philosopher C.S. Peirce.
5. The convenient word "gist" has an echo of some noumenal realm, as it stems from the word "geist," as in "spirit" or "ghost."
6. Logical Foundations of Probability by Rudolf Carnap (University of Chicago, 1950).
7. From the 2006 introduction to The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference by Ian Hacking (Cambridge, 1975, 2006).
8. Jan von Plato in The Probability Revolution Vol. 2: ideas in the sciences; Kruger, Gigerenzer, and Morgan, editors (MIT Press, 1987).
8a. A philosopher's play on the prior philosophical idea of "principle of sufficient reason."
9. Calculated Risks: How to know when numbers deceive you by Gerd Gigerenzer (Simon and Schuster, 2002).
10. "Truth and Probability" (1926) by Frank P. Ramsey, appearing in The Foundations of Mathematics and Other Logical Essays.
11. The Logic of Scientific Discovery by Karl Popper. Published as Logik der Forschung in 1935; English version published by Hutchinson in 1959, Kluwer Academic Publishers 2001.
13. The Philosophy of Mathematics and Science by Hermann Weyl (Princeton University Press, 1949, 2009).
14. A Treatise on Probability by J.M. Keynes (Macmillan, 1921).
15. The 'Particles' of Modern Physics by J.D. Stranathan (Blakiston, 1942).
Part II
Law of large numbers
In my view, a basic idea behind the "law of large numbers" is that minor influences tend to cancel one another out as the number of trials increases toward infinity. We might consider these influences to be small force vectors that can have a butterfly effect as to which path is taken. If we map these small vectors onto a sine wave graph, we can see heuristically how the little bumps above the axis tend to be canceled by the little bumps below the axis, for partially destructive interference. We can also see how small forces so mapped occasionally superpose in a constructive way, where, if the amplitude is sufficient, a "tipping point" is reached and the coin falls head side up.
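The cancellation intuition can be checked numerically: sum a large number of small, unrelated pushes, drawn here from an arbitrary symmetric distribution, and the net effect per trial is almost nil.

import random

random.seed(3)
n = 100_000
bumps = [random.uniform(-1, 1) for _ in range(n)]   # many tiny, unrelated pushes

total = sum(bumps)
print(round(total, 1))        # grows only on the order of sqrt(n): a few hundred at most
print(round(total / n, 5))    # the average push is nearly zero -- near-cancellation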
In fact, two forms -- the weak and the strong -- of this law have been elucidated. This distinction however doesn't address the fundamental issues that have been raised.
On the law of large numbers
http://www.mhhe.com/engcs/electrical/papoulis/graphics/ppt/lectr13a.pdf
The strong law
http://mathworld.wolfram.com/StrongLawofLargeNumbers.html
The weak law
http://mathworld.wolfram.com/WeakLawofLargeNumbers.html
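To make the heuristic concrete, here is a minimal simulation sketch in Python (my own illustration; the coin is assumed fair and the seed and sample sizes are arbitrary). The relative frequency of heads drifts toward 1/2 as the number of tosses grows:

# Law of large numbers illustrated: the relative frequency of heads
# in n tosses of a fair coin tends toward 0.5 as n grows.
import random

def head_frequency(n, seed=None):
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, round(head_frequency(n, seed=42), 4))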
Keynes raises, from a logician's perspective, strong objections to the law of large numbers, though he considers them minor from the physicist's point of view. His solution is to eschew explaining reproducible regularities in terms of accumulations of accidents. "...for I do not assert the identity of the physical and the mathematical concept of probability at all; on the contrary, I deny it" (15).
This amounts to tossing out the logical objections to the "law," and accepting that "law" on an ad hoc or axiomatic basis. Yet, he makes an attempt at a formalistic resolution.
Neither is that law always valid, he says. "The rule that extreme probabilities have to be neglected ... agrees with the demand for scientific objectivity." That is, there is the "obvious objection" that even an enormous improbability always remains a probability, however small, and that consequently even the most "impossible" processes -- i.e., those which we propose to neglect -- will someday happen. And that someday could be today.
Keynes, to make his point, cites some extraordinarily improbable distributions of gas molecules in the Maxwell's demon thought experiment. "Even if a physicist happened to observe such a process, he would be quite unable to reproduce it, and therefore would never be able to decide what had really happened in this case, and whether he had made an observational mistake" (16).
Citing Arthur Eddington's statement to the effect that some potential events in nature are impossible, while other things don't happen because of their remote probability, Keynes says that he prefers to avoid non-testable assertions about whether extremely improbable things in fact occur. Yet, he observes that Eddington's assertion agrees well with how the physicist applies probability theory (17).
I note that if a probability is so remote as to be untestable via experiment, then, as E.T. Jaynes says, a frequentist model is not necessarily hard and fast. It can only be assumed that the probability assignments are adequate guides for some sort of decision-making. Testing is out of the question for extreme cases.
So, I suggest that Keynes here is saying that the scientific basis for probability theory is intuition.
A problem with Keynes's skepticism regarding highly improbable events is that without them, the notion of randomness loses some of its power.
The mathematics of the chaos and catastrophe theories makes this clear. In the case of a "catastrophe" model of a continuously evolving dynamical system, sudden discrete jumps to a new state are inevitable, though it may not be so easy to say when such a transition will occur.
Concerning catastrophe theory
http://www.physics.drexel.edu/~bob/PHYS750_NLD/Catastrophe_theory.pdf
Nonlinear dynamics and chaos theory
http://www2.bren.ucsb.edu/~kendall/pubs_old/2001ELS.pdf
We also must beware applying the urn of nature scenario. An urn has one of a set of ratios of white to black balls. But, a nonlinear dynamic system is problematic for modeling by an urn. Probabilities apply well to uniform, which is to say, for practical purposes, periodic systems. One might possibly justify Laplace's rule of succession on this basis (for derivation of the rule of succession see More on Laplace's rule, using your control f function). Still, quasi-periodic systems may well give a false sense of security, perhaps masking sudden jolts into atypical, possibly chaotic, behavior. Wasn't everyone carrying on as usual, "marrying and giving in marriage," when in 2004 a tsunami killed 230,000 people in 14 countries bordering the Indian Ocean?
So we must be very cautious about how we use probabilities concerning emergence of high-information systems. Here is why: A sufficiently rich mix of chemical compounds may well form a negative feedback dynamical system. It would then be tempting to apply a normal probability distribution to such a system, and that distribution very well may yield reasonable results for a while. But, if the dynamical system is nonlinear -- which most are -- the system could reach a threshold, akin to a chaos point, at which it crosses over into a positive feedback system or into a substantially different negative feedback system.
The closer the system draws to that tipping point, the less the normal distribution applies. In chaotic systems, normal probabilities, if applicable, must be applied with great finesse. Hence to say that thus and such an outcome is highly improbable based on the previous state of the system is to misunderstand how nonlinearities can work. In other words, a Markov process (see below) is often inappropriate for predicting "highly improbable" events, though it may do as a good enough approximation in many nonlinear scenarios.
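As a hedged sketch of the kind of threshold behavior meant here, consider the textbook logistic map x -> r*x*(1 - x); the parameter values below are illustrative only, not a model of any particular chemical system:

# Logistic map: for smaller r the orbit settles into a short periodic cycle;
# past roughly r = 3.57 the orbit becomes erratic (chaotic).
def orbit(r, x0=0.2, skip=500, keep=8):
    x = x0
    for _ in range(skip):              # discard the transient
        x = r * x * (1 - x)
    out = []
    for _ in range(keep):
        x = r * x * (1 - x)
        out.append(round(x, 4))
    return out

for r in (2.9, 3.2, 3.5, 3.9):         # fixed point, period 2, period 4, chaos
    print(r, orbit(r))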
It is noteworthy that Keynes thought that the work of Pafnuty Chebyshev and Andrey Markov should replace Laplace's rule, implying that he thought a Markov process adequate for most probabilistic systems (18). Certainly he could not have known much of what came to be known as chaos theory and nonlinear dynamics.
Another issue is the fact that an emergent property may not be obvious until it emerges (echoes of David Bohm's "implicate order"). Consider the Moebius band. Locally, the surface is two-sided, such that a vector orthogonal to the surface has a mirror vector pointing in the opposite direction. Yet, at the global scale, the surface is one-sided and a mirror vector is actually pointing out from the same surface as its partner is.
If a paper model of a Moebius strip were partially shown through a small window and an observer were asked if she thought the paper was two-sided, she would reply: "Of course." Yet, at least at a certain scale at which thickness is ignored, the paper strip has one side.
What we have in the case of catastrophe and chaos events is often called pseudorandomness, or effectively incalculable randomness. In the Moebius band case, we have a difficulty on the part of the observer of conceptualizing emergent properties, an effect also found in Ramsey order.
We can suggest the notion of unfoldment of information, thus: We have a relation R representing some algorithm.
Let us suppose an equivalence relation such that
(i) aR_ref a <--> aR_ref a (reflexivity).
(ii) aR_sym b <--> bR_sym a (symmetry).
(iii) (aR_tran b and bR_tran c) --> aR_tran c (transitivity).
The redundancy, or structural information, is associated with R. So aRa corresponds to 0 Shannon information in the output. The reflexivity condition is part of the structural information for R, but this redundancy is irrelevant for R_ref. The structural information is relevant in the latter two cases. In those cases, if we do not know the structure or redundancy in R, we say the information is enfolded. Once we have discovered some algorithm for R, then we say the information has been revealed and is close to zero, but not quite zero, as we may not have advance knowledge concerning the variables.
Some would argue that what scientists mean by order is well summarized by an order relation aRb, such that A X B is symmetric and transitive but not reflexive. Even so, I have yet to convince myself on this point.
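The three conditions above can be checked mechanically for any finite relation. The following sketch is my own toy illustration (the set and relation are arbitrary), not part of the argument itself:

# Check reflexivity, symmetry and transitivity of a relation R on a set A.
def is_reflexive(A, R):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}   # a small equivalence relation
print(is_reflexive(A, R), is_symmetric(R), is_transitive(R))   # True True True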
Ramsey order
John Allen Paulos points out an important result of network theory that guarantees that some sort of order will emerge. Ramsey proved a "strange theorem," stating that if one has a sufficiently large set of geometric points and every pair of them is connected by either a red line or a green line (but not by both), then no matter how one paints the lines, there will always be a large subset of the original set with a special property. Either every pair of the subset's members will be connected by a red line or every pair of the subset's members will be connected by a green line.
"If, for example, you want to be certain of having at least three points all connected by red lines or at least three points all connected by green lines, you will need at least six points," says Paulos.
"For you to be certain that you will have four points, every pair of which is connected by a red line, or four points, every pair of which is connected by a green line, you will need 18 points, and for you to be certain that there will be five points with this property, you will need -- it's not known exactly -- between 43 and 55. With enough points, you will inevitably find unicolored islands of order as big as you want, no matter how you color the lines," he notes.
Paulos on emergent order
http://abcnews.go.com/Technology/WhosCounting/story?id=4357170&page=1
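Paulos's six-point figure can be verified by brute force: every red/green coloring of the 15 lines joining six points contains a one-color triangle, while five points do not suffice. A short exhaustive check (my own sketch, not drawn from Paulos's column):

# Exhaustive check of the Ramsey number R(3,3) = 6.
from itertools import combinations

def has_mono_triangle(n, coloring, edges):
    color = dict(zip(edges, coloring))
    for tri in combinations(range(n), 3):
        if len({color[e] for e in combinations(tri, 2)}) == 1:
            return True
    return False

for n in (5, 6):
    edges = list(combinations(range(n), 2))
    always = all(has_mono_triangle(n, f"{k:0{len(edges)}b}", edges)
                 for k in range(2 ** len(edges)))
    print(n, "every coloring contains a one-color triangle:", always)   # 5: False, 6: True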
In other words, no matter what type or level of randomness is at work, "order" must emerge from such networking. Hence one might run across a counterintuitive subset and think its existence highly improbable, that the subsystem can't have fallen together randomly. So again, we must beware the idea that "highly improbable" events are effectively nonexistent. Yes, if one is applying probabilities at a near-zero propensity, and using some Bayesian insufficient reason rationale, then such an emergent event would be counted as virtually impossible. But, with more knowledge of the system dynamics, we must parse our probabilistic questions more finely.
On the other hand, intrinsic fundamental randomness is often considered doubtful except in the arena of quantum mechanics -- although quantum weirdness does indeed scale up into the "macro" world (see noumena sections in Part VI, link in sidebar). Keynes of course knew nothing about quantum issues at the time he wrote Treatise.
Kolmogorov used his axioms to try to avoid Keynesian difficulties concerning highly improbable events.
Kolmogorov's 1933 book (19) gives these two conditions:
A. One can be practically certain that if C is repeated a large number of times, the relative frequency of E will differ very little from the probability of E. [He axiomatizes the law of large numbers.]
B. If P(E) is very small, one can be practically certain that when C is carried out only once, the event E will not occur at all.
But, if faint chances of occurrences are ruled out beyond some limit, doesn't this really go to the heart of the meaning of randomness?
Kolmogorov's 'Foundations' in English
http://www.mathematik.com/Kolmogorov/index.html
And, if as Keynes believed, randomness is not all that random, we lose the basic idea of independence of like events, and we bump into the issue of what is meant by a "regularity" (discussed elsewhere).
Statisticians of the 19th century, of course, brought the concept of regularity into relief. Their empirical methods disclosed various recurrent patterns, which then became fodder for the methods of statistical inference. In those years, scientists such as William Stanley Jevons began to introduce probabilistic methods. It has been argued that Jevons used probability in terms of determining whether events result from certain causes as opposed to simple coincidences, and via the method of the least squares. The first approach, notes the Stanford Encyclopedia of Philosophy, "entails the application of the 'inverse method' in induction: if many observations suggest regularity, then it becomes highly improbable that these result from mere coincidence."
Encyclopedia entry on Jevons
http://plato.stanford.edu/entries/william-jevons/
Jevons also employed the method of least squares to try to detect regularities in price fluctuations, the encyclopedia says.
Statistical regularities, in my way of thinking, are a reflection of how the human mind organizes the perceived world, or world of phenomena. The brain is programmed to find regularities (patterns) and to rank them -- for the most part in an autonomic fashion -- as an empirico-inductivist-frequentist mechanism for coping.
Yet, don't statistical regularities imply an objective randomness that implies a reality larger than self? My take is that the concept of intrinsic randomness serves as an idealization: it answers our desire for a mathematical, formalistic representation of the phenomenal world and, in particular, our desire to predict properties of macro-states by using the partial, or averaged, information we have of the micro-states -- as when the macro-state information of a threshold line for species extinction stands in for the not-very-accessible information about the many micro-states of survival and mortality of individuals.
Averaging, however, does not imply intrinsic randomness. On the other hand, the typical physical assumption that events are distinct and do not interact except via known physical laws implies independence of events, which in turn implies effectively random influences. In my estimation, this sort of randomness is a corollary of the reductionist, isolationist and simplificationist method of typical science, an approach that can be highly productive, as when Claude Shannon ignored the philosophical ramifications of the meaning of information.
The noted probability theorist Mark Kac gives an interesting example of the relationship of a deterministic algorithm and randomness (35).
Consider the consecutive positive integers {1, 2, ..., n} -- say with n = 10^4 -- and, corresponding to each integer m in this range, consider the number f(m) of the integer's distinct prime factors.
Hence, f(1) = 0, f(2) = f(3) = f(5) = 1, f(4) = f(2^2) = 1, f(6) = f(2*3) = 2, f(60) = f(2^2*3*5) = 3, and so forth.
Kac assigns these outputs to a histogram of the number of prime divisors, using ln (ln n) and adjusting suitably the size of the base interval. He obtains an excellent approximation -- which improves as n rises -- to the normal curve. The statistics of the number of prime factors is, Kac wrote, indistinguishable from the statistics of the sizes of peas or the statistics of displacement in Brownian motion. And yet, the algorithm is fully deterministic, meaning from his perspective that there is neither chance nor randomness.
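A rough way to see Kac's point is to tabulate f(m) for all m up to 10^4 and compare the mean and spread of the values with the ln(ln n) benchmark of the Erdos-Kac theorem. The sketch below does this; convergence is slow, so the agreement at 10^4 is only approximate:

# Count distinct prime factors f(m) for m <= N and compare with ln(ln N).
import math
from collections import Counter

N = 10_000
omega = [0] * (N + 1)
for p in range(2, N + 1):
    if omega[p] == 0:                      # p is prime: no smaller prime marked it
        for multiple in range(p, N + 1, p):
            omega[multiple] += 1

values = omega[2:]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
print("observed mean:", round(mean, 3), " observed variance:", round(var, 3))
print("ln(ln N):     ", round(math.log(math.log(N)), 3))
print("histogram of f(m):", sorted(Counter(values).items()))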
We note that in classical dynamical systems, there is also no intrinsic randomness, and that probabilities are purportedly determined by activities below the threshold of convenient observation. And yet the fact that prime factors follow the normal curve is remarkable and deserving of further attention. There should, one would think, be a relationship between this fact and Riemann's conjecture.
Interestingly, primes fall in the holes of the sieve of Eratosthenes, implying that they do not follow any algebraic formula (apart from a very special case that does not apply to the general run of algebraic formulas). Is it so surprising that primes occur "erratically" when one sees that they are "anti-algebraic"? In general, non-algebraic algorithms produce outputs that are difficult to pin down exactly for some future state. Hence, probabilistic methods are called for. In that case, one hopes that some probability distribution/density will fit well enough.
But, the fact that the normal curve is the correct distribution is noteworthy, as is the fact that the samples of prime factors follow the central limit theorem.
My take is that the primes do not fall in an equally probable pattern, a fact that is quite noticeable for low n. Yet, as n increases, the dependence tends to weaken. So at 10^4 the dependence among prime factors is barely detectable, making their occurrences effectively independent events. In other words, the deterministic linkages among primes tend to cancel or smear out, in a manner similar to sub-threshold physical variables tending to cancel.
In a discussion of the Buffon's needle problem and Bertrand's paradox, Kac wishes to show that if probability theory is made sufficiently rigorous, the layperson's concerns about its underlying value can be answered. He believes that sufficient rigor will rid us of the "plague of paradoxes" entailed by the different possible answers to the so-called paradoxes.
Yet, 50 years after Kac's article, Bertrand's paradox still stimulates controversy. The problem is often thought to be resolved by specification of the proper method of setting up an experiment. That is to say, the conceptual probability is not divorced from the mechanics of an actual experiment, at least in this case.
And because actual Buffon needle trials can be used to arrive at acceptable values of pi, we have evidence that the usual method of computing the Buffon probabilities is correct, and further that the notion of equal probabilities is for this problem a valid assumption, though Kac argued that only a firm mathematical foundation would validate that assumption.
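Such a trial is easy to simulate. The sketch below assumes the needle length equals the line spacing, so the crossing probability is 2/pi and pi can be recovered from the observed frequency (sample size and seed are arbitrary):

# Buffon's needle: estimate pi from the frequency of needle-line crossings.
import math, random

def estimate_pi(n, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        center = rng.uniform(0, 0.5)           # distance from needle center to nearest line (spacing = 1)
        angle = rng.uniform(0, math.pi / 2)    # acute angle between needle and lines
        if center <= 0.5 * math.sin(angle):    # needle of length 1 crosses a line
            hits += 1
    return 2 * n / hits

print(estimate_pi(1_000_000))                  # hovers near 3.14 for large n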
A useful short discussion of Bertrand's paradox is found here:
Wikipedia article on Bertrand's paradox
https://en.wikipedia.org/wiki/Bertrand_paradox_%28probability%29
At any rate, whether the universe ( = both the phenomenal and noumenal worlds) follows the randomness assumptions above is open to debate. Note that the ancients had a law of gravity (though not articulated as such). Their empirical observations told them that an object falls to the ground if not supported by other objects. The frequency ratio was so high that any exception would have been regarded as supernatural. These inductive observations led to the algorithmic assessments of Galileo and Newton. These algorithmic representations are very successful, in limited cases, at prediction. These representations are deductive systems. Plug in the numbers, compute, and, in many cases, out come the predictive answers. And yet the highly successful systems of Newton and Einstein cannot be used, logically, as a means of excluding physical counterexamples. Induction supports the deductive systems, and cannot be dispensed with.
A statement such as "The next throw of a die showing 5 dots has a probability of 1/6" is somewhat inadequate because probabilities, says Popper, cannot be ascribed to a single occurrence of an event, but only to infinite sequences of occurrences (i.e., back to the law of large numbers). He says this because he is saying that any bias in the die can only be logically ruled out by an infinity of trials (20). Contrast that with Weyl's belief (21) that symmetry can provide the basis for an expectation of zero bias (see Part IV) and with my suggestion that below a certain threshold, background vibrations may make no difference.
One can see the harbinger of the "law of large numbers" in the urn model's classical probability: If an urn contains say 3 white balls and 2 black, then our ability to predict the outcome is given by the numbers 2/5 and 3/5. But why is that so? Answer: it is assumed that if one conducts enough experiments, with replacement, guesses for black or white will asymptotically approach the ratios 2/5 and 3/5. Yet, why do we consider that assumption reasonable in "real life" and without bothering with formalities? We are accepting the notion that the huge aggregate set of minor force vectors, or "causes," tends to be neutral. There are two things to say about this:
1. This sort of randomness excludes the operation of a God or superior being. At one time, the study of probabilities with respect to games of chance was frowned upon on grounds that it was blasphemous to ignore God's influence or to assume that that influence does not exist (Bernoulli was prudently circumspect on this issue). We understand that at this point, many react: "Aha! Now you are bringing in religion!" But the point here is that the conjecture that there is no divine influence is an article of faith among some scientifically minded persons. This idea of course gained tremendous momentum from Darwin's work.
2. Results of modern science profoundly challenge what might be called a "linear perspective" that permits "regularities" and the "cancelation of minor causes." As we show in the noumena sections (Part VI, see sidebar), strange results of both relativity theory and quantum mechanics make the concept of time very peculiar indeed, meaning that causality is stood on its head.
Keynes tells his readers that Siméon Denis Poisson brought forth the concept of the "law of large numbers" that had been used by Bernoulli and other early probabilists. "It is not clear how far Poisson's result [the law of large numbers as he extended it] is due to a priori reasoning, and how far it is a natural law based on experience; but it is represented as displaying a certain harmony between natural law and the a priori reasoning of probabilities."
The Belgian statistician Adolph Quetelet, says Keynes, did a great deal to explain the use of statistical methods. Quetelet "belongs to the long line of brilliant writers, not yet extinct, who have prevented probability from becoming, in the scientific salon, perfectly respectable. There is still about it for scientists a smack of astrology, of alchemy" (21a). It is difficult to exorcise this suspicion because, in essence, the law of large numbers rests on an unprovable assumption, though one that tends to accord with experience.
This is not to say that various people have not proved the weak and strong forms once assumptions are granted, as in the case of Borel, who was a major contributor to measure theory, which he and others have used in their work on probability. Yet, we do not accept that because a topological framework exists that encompasses probability ideas, it follows that the critical issues have gone away.
On Adolph Quetelet
http://mnstats.morris.umn.edu/introstat/history/w98/Quetelet.html
Keynes makes a good point about Poisson's apparent idea that if one does enough sampling and analysis, "regularities" will appear in various sets. Even so, notes Keynes, one should beware the idea that "because the statistics are numerous, the observed degree of frequency is therefore stable."
Keynes's insight can be appreciated with respect to iterative feedback functions. Those which tend to stability (where the iterations are finitely periodic) may be thought of in engineering terms as displaying negative feedback. Those that are chaotic (or pre-chaotic with spurts of instability followed by spurts of stability) are analogous to positive feedback systems. So, here we can see that if a "large" sample is drawn from a pre-chaotic system's spurt of stability, a wrong conclusion will be drawn about the system's regularity. And again we see that zero or near-zero propensity information, coupled with the assumption that samples represent the population (which is not to say that samples are not normally distributed), can yield results that are way off base.
Probability distributions
If we don't have at hand a set of potential ratios, how does one find the probability of a probability? If we assume that the success-failure model is binomial, then of course we can apply the normal distribution of probabilities. With an infinite distribution, we don't get the probability of a probability, of course, though we would if we used the more precise binomial distribution with n finite. But, we see that in practice, the "correct" probability distribution is often arrived at inductively, after sufficient observations. The Poisson distribution is suited to rare events; the exponential distribution to radioactive decay. In the latter case, it might be argued that along with induction is the deductive method associated with the rules of quantum mechanics.
Clearly, there is an infinitude of probability distributions. But in the physical world we tend to use a very few: among them, the uniform, the normal and the exponential. So a non-trivial question is: what is the distribution of these distributions, if any? That is, can one rationally assign a probability that a particular element of that set is reflective of reality? Some would argue that here is the point of the Bayesians. Their methods, they say, give the best ranking of initial probabilities, which, by implication suggest the most suitable distribution.
R.A. Fisher devised the maximum likelihood method for determining the probability distribution that best fits the data, a method he saw as superior to the inverse methods of Bayesianism (see below). But, in Harold Jeffreys's view, the maximum likelihood is a measure of the sample alone; to make an inference concerning the whole class, we combine the likelihood with an assessment of prior belief using Bayes's theorem (22).
Jeffreys took maximum likelihood to be a variation of inverse probability with the assumption of uniform priors.
In Jae Myung on maximum likelihood
http://people.physics.anu.edu.au/~tas110/Teaching/Lectures/L3/Material/Myung03.pdf
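As a toy illustration of the maximum likelihood idea (my own example, not Fisher's or Myung's), consider estimating a coin's bias from observed tosses by scanning candidate values of p for the one that makes the data most probable:

# Maximum likelihood for a Bernoulli parameter by a simple grid scan.
import math

def log_likelihood(p, heads, tails):
    return heads * math.log(p) + tails * math.log(1 - p)

heads, tails = 7, 3
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=lambda p: log_likelihood(p, heads, tails))
print(best)   # 0.7, the sample frequency, as the analytic maximum predicts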
For many sorts of data, there is the phenomenon known as Benford's law, in which digit probabilities are not distributed normally but logarithmically. Not all data sets conform to this distribution. For example, if one takes data from plants that manufacture beer in liters and then converts those data to gallons, one wouldn't expect that the distribution of digits remains the same in both cases. True, but there is a surprise.
In 1996, Theodore Hill, upon offering a proof of Benford's law, said that if distributions are taken at random and random samples are taken from each of these distributions, the significant digit frequencies of the combined samples would converge to the logarithmic distribution, such that probabilities favor the lower digits in a base 10 system. Hill refers to this effect as "random samples from random distributions." As Julian Havil observed, "In a sense, Benford's Law is the distribution of distributions!" (23).
MathWorld: Benford's law
http://mathworld.wolfram.com/BenfordsLaw.html
Hill's derivation of Benford's law
http://www.gatsby.ucl.ac.uk/~turner/TeaTalks/BenfordsLaw/stat-der.pdf
Though this effect is quite interesting, it is not evident to me how one would go about applying it in order to discover a distribution beyond the logarithmic. Nevertheless, the logarithmic distribution does seem to be what emerges from the general set of finite data. Even so, Hill's proof appears to show that low digit bias is an objective artifact of the "world of data" that we commonly access. The value of this distribution is shown by its use as an efficient computer coding tool.
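As a rough illustration of Hill's "random samples from random distributions" effect, the sketch below pools draws from a grab bag of distributions and tallies first significant digits against the logarithmic prediction log10(1 + 1/d). The particular mix of distributions is my own arbitrary choice, and the fit is only approximate:

# Benford's law: pooled samples from assorted distributions tend toward
# leading-digit frequencies of log10(1 + 1/d).
import math, random
from collections import Counter

rng = random.Random(7)
generators = [lambda: rng.expovariate(1 / 3000),
              lambda: rng.lognormvariate(5, 2),
              lambda: rng.uniform(1, 1e6) * rng.uniform(1, 1e3),
              lambda: rng.paretovariate(1.5)]

counts = Counter()
for _ in range(200_000):
    x = rng.choice(generators)()
    counts[f"{x:e}"[0]] += 1                   # first significant digit of x

total = sum(counts.values())
for d in "123456789":
    print(d, round(counts[d] / total, 3), "vs", round(math.log10(1 + 1 / int(d)), 3))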
It is not excessive to say that Benford's law, and its proof, encapsulates very well the whole of the statistical inference mode of reasoning. And yet plainly Benford's law does not mean that "fluke" events don't occur. And who knows what brings about flukes? As I argue in Toward, the mind of the observer can have a significant impact on the outcome.
Another point to take into consideration is the fact that all forms of probability logic bring us to the self-referencing conundrums of Bertrand Russell and Kurt Goedel. These are often dismissed as trivial. And yet, if a sufficiently rich system cannot be both complete and consistent, then we know that there is an enforced gap in knowledge. So we may think we have found a sublime truth in Benford's law, and yet we must face the fact that this law, and probabilistic and mathematical reasoning in general, cannot account for all things dreamt of, or undreamt of, in one's philosophy.
Concerning Bayesianism
The purpose of this paper is not to rehash the many convolutions of Bayesian controversies, but rather to spotlight a few issues that may cause the reader to re-evaluate her conception of a "probabilistic universe." (The topic will recur beyond this section.)
"Bayesianism" is a term that has come to cover a lot of ground. Bayesian statistical methods these days employ strong computational power to achieve results barely dreamt of in the pre-cyber era.
Yet, two concepts run through the heart of Bayesianism: Bayes's formula for conditional probability and the principle of insufficient reason or some equivalent. Arguments concern whether "reasonable" initial probabilities are a good basis for calculation and whether expert opinion is a valid basis for an initial probability. Other arguments concern whether we are only measuring a mental state or whether probabilities have some inherent physical basis external to the mind. Further, there has been disagreement over whether Bayesian statistical inference for testing hypotheses is well-grounded in logic and whether the calculated results are meaningful.
The clash is important because Bayesian methods tend to be employed by economists and epidemiologists and so affect broad government policies.
"The personal element is recognized by all statisticians," observes David Howie. "For Bayesians, it is declared on the choice of prior probabilities; for Fisherians in the construction of statistical model; for the Neyman-Pearson school in the selection of competing hypotheses. The social science texts, however, portrayed statistics as a purely impersonal and objective method for the design of experiments and the representation of knowledge" (24).
Similarly, Gerd Gigerenzer argues that a conspiracy by those who control social science journals has brought about the "illusion of a mechanized inference process." Statistics textbooks for social science students have, under publisher pressure, tended to omit or play down not only the personality conflicts among pioneering statisticians but also the differences in reasoning, Gigerenzer says. Such textbooks presented a hybrid of the methods of R.A. Fisher and of Jerzy Neyman and Egon Pearson, without alerting students as to the schisms among the trailblazers, or even, in most cases, mentioning their names. The result, says Gigerenzer, is that: "Statistics is treated as abstract truth, the monolithic logic of inductive inference."
Gigerenzer on 'mindless statistics'
http://library.mpib-berlin.mpg.de/ft/gg/GG_Mindless_2004.pdf
In the last decade, a chapter on Bayesian methods has become de rigueur for statistics texts. Even so, it remains true that students are given the impression that statistical inferences are pretty much cut and dried, though authors often do stress the importance of asking the right questions when setting up a method of attack on a problem.
The frequency view, as espoused by Von Mises, was later carried forward by Popper (25), who eventually replaced it with his propensity theory (26), which is also anti-Bayesian in character.
A useful explanation of modern Bayesian reasoning is given by Michael Lavine:
What is Bayesian statistics and why everything else is wrong
http://www.math.umass.edu/~lavine/whatisbayes.pdf
The German tank problem gives an interesting example of a Bayesian analysis.
The German tank problem
http://en.wikipedia.org/wiki/German_tank_problem
In the late 19th century, Charles S. Peirce denounced the Bayesian view and argued that frequency ratios are the basis of scientific probability.
C.S. Peirce on probability
http://plato.stanford.edu/entries/peirce/#prob
Expert opinion
One justification of Bayesian methods is the use of a "reasonable" initial probability arrived at by the opinion of an expert or experts. Nate Silver points out for example that scouts did better at predicting who would make a strong ball player than did his strictly statistical method, prompting him to advocate a combination of the subjective expert opinion along with standard methods of statistical inference.
"If prospect A is hitting .300 with twenty home runs and works at a soup kitchen during his off days, and prospect B is hitting .300 with twenty home runs but hits up night clubs during his free time, there is probably no way to quantify this distribution," Silver writes. "But you'd sure as hell want to take it into account."
Silver notes that the arithmetic mean of several experts tends to yield more accurate predictions than the predictions of any single expert (27).
Obviously, quantification of expert opinion is merely a convenience. Such an expert is essentially using probability inequalities, as in p(x) < p(y) < p(z) or p(x) < [1 - p(x)].
Sometimes when I go to a doctor, the nurse asks me to rate pain on a scale of 1 to 10. I am the expert, and yet I have difficulty with this question most of the time. But if I am shown a set of stick figure faces, with various expressions, I can always find the one that suits my subjective feeling. Though we are not specifically talking of probabilities, we are talking about the information inherent in inequalities and how that information need not always be quantified.
Similarly, I suggest that experts do not use fine-grained degrees of confidence, but generally stick with a simple ranking system, such as {1/100, 1/4, 1/3, 1/2, 2/3, 3/4, 99/100}. It is important to realize that a ranking system can be mapped onto a circle, thus giving a system of pseudo-percentages. This is the custom. But the numbers, not representing frequencies, cannot be said to represent percentages. An exception is the case of an expert who has a strong feel for the frequencies and uses her opinion as an adequate approximation of some actual frequency.
Often, what Bayesians do is to use an expert opinion for the initial probability and then apply the Bayesian formula to come up with frequency probabilities. Some of course argue that if we plug in a pseudo-frequency and use the Bayesian formula (including some integral forms) for an output, then all one has is a pseudo-frequency masquerading as a frequency. Still, it is possible to think about this situation differently. One hazards a guess as to the initial frequency -- perhaps based on expert opinion -- and then looks at whether the output frequency ratio is reasonable. That is to say, a Bayesian might argue that he is testing various initial values to see which yields an output that accords with observed facts.
One needn't always use the Bayesian formula to use this sort of reasoning.
Consider the probability of the word "transpire" in an example of what some would take as Bayesian reasoning. I am fairly sure it is possible, with much labor, to come up with empirical frequencies of that word that could be easily applied. But, from experience, I feel very confident in saying that far fewer than 1 in 10 books of the type I ordinarily read have had that word appear in the past. I also feel confident that a typical reader of books will agree with that assessment. So in that case, it is perfectly reasonable to plug in the value 0.1 in doing a combinatorial probability calculation for a recently read string of books. If, of the last 15 books I have read, 10 have contained the word "transpire," we have 15C10 x (1/10)^10 x (9/10)^5 = 1.77 x 10^(-7). In other words, the probability of such a string of books occurring randomly is much less than 1 in 10 million.
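The arithmetic is easy to check directly; here is a quick sketch using the stated 1-in-10 estimate (the "at least 10 of 15" tail is shown as well, since that is arguably the more natural quantity):

# Chance of the word appearing in 10 of 15 books if the per-book chance is 0.1.
from math import comb

p = 0.1
exactly_10 = comb(15, 10) * p**10 * (1 - p)**5
at_least_10 = sum(comb(15, k) * p**k * (1 - p)**(15 - k) for k in range(10, 16))
print(f"{exactly_10:.3e}", f"{at_least_10:.3e}")   # about 1.77e-07 and slightly more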
This sort of "Bayesian" inference is especially useful when we wish to establish an upper bound of probability, which, as in the "transpire" case, may be all we need.
One may also argue for a "weight of evidence" model, which may or may not incorporate Bayes's theorem. Basically, the underlying idea is that new knowledge affects the probability of some outcome. Of course, this holds only if the knowledge is relevant, which requires "reasonableness" in specific cases, where a great deal of background information is necessary. But this doesn't mean that the investigator's experience won't be a reliable means of evaluating the information and arriving at a new probability, arguments of Fisher, Popper and others notwithstanding.
A "weight of evidence" approach of course is nothing but induction, and requires quite a bit of "subjective" expert opinion.
On this point, Keynes, in his Treatise, writes: "Take, for instance, the intricate network of arguments upon which the conclusions of The Origin of Species are founded: How impossible it would be to transform them into a shape in which they would be seen to rest upon statistical frequencies!" (28)
Mendelism and the statistical population genetics pioneered by J.B. Haldane, Sewall Wright and Fisher were still in the early stages when Keynes wrote this. And yet, Keynes's point is well taken. The expert opinion of Darwin the biologist was on the whole amply justified (29) when frequency-based methods based on discrete alleles became available (superseding much of the work of Francis Galton).
Three pioneers of the 'modern synthesis'
http://evolution.berkeley.edu/evolibrary/article/history_19
About Francis Galton
https://www.intelltheory.com/galton.shtml
Keynes notes that Darwin's lack of statistical or mathematical knowledge is notable and, in fact, a better use of frequencies would have helped him. Even so, Darwin did use frequencies informally. In fact, he was using his expert opinion as a student of biology to arrive at frequencies -- though not numerical ones, but rather rule-of-thumb inequalities of the type familiar to non-mathematical persons. From this empirico-inductive method, Darwin established various propositions, to which he gave informal credibility rankings. From these, he used standard logical implication, but again informally.
One must agree here with Popper's insight that the key idea comes first: Darwin's notion of natural selection was based on the template of artificial selection for traits in domestic animals, although he did not divine the driving force --eventually dubbed "survival of the fittest" -- behind natural selection until coming across a 1798 essay by Thomas Malthus.
Essay on the Principle of Population
http://www.ucmp.berkeley.edu/history/malthus.html
Keynes argues that the frequency of some observation and its probability should not be considered to be identical. (This led Carnap to define two forms of probability, though unlike Keynes, he was only interested in frequentist probability.) One may well agree that a frequency gives a number. Yet there must be some way of connecting it to degrees of belief that one ought to have. On the other hand, who actually has a degree of belief of 0.03791? Such a number is only valuable if it helps the observer to discriminate among inequalities, as in p(y) << p(x) < p(z).
One further point: The ordinary human mind-body system usually learns through an empirico-inductive frequency-payoff method, as I describe in Toward. So it makes sense that a true expert would have assimilated much knowledge into her autonomic systems, analogous to algorithms used in computing pattern detection and "auto-complete" systems. Hence one might argue that, at least in some cases, there is strong reason to view the "subjective" opinion as a good measuring rod. Of course, then we must ask, How reliable is the expert? And it would seem a frequency analysis of her predictions would be the way to go.
Studies of polygraph and fingerprint examiners have shown that in neither of those fields does there seem to be much in the way of corroboration that these forensic tools have any scientific value. At the very least, such studies show that the abilities of experts vary widely (30). This is an appropriate place to bring up the matter of the "prosecutor's fallacy," which I describe here:
The prosecutor's fallacy
http://kryptograff.blogspot.com/2007/07/probability-and-prosecutor-there-are.html
Here we run into the issue of false positives. A test can have an accuracy of 99 percent, and yet the probability that a particular positive result is a true match can be very low. Take an example given by mathematician John Allen Paulos. Suppose a terrorist profile program is 99 percent accurate and let's say that 1 in a million Americans is a terrorist. That makes 300 terrorists. The program would be expected to catch 297 of those terrorists. Yet, the program has an error rate of 1 percent. One percent of 300 million Americans is 3 million people. So a data-mining operation would turn up some 3 million "suspects" who fit the terrorist profile but are innocent nonetheless. So the probability that a positive result identifies a real terrorist is 297 divided by roughly 3 million, or about one in 10,000 -- a very low likelihood.
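The arithmetic behind that example can be laid out explicitly, using the figures as stated (a sketch; the accuracy figure is treated as both the detection rate and the false-positive rate, as in the example):

# False positives swamp true positives when the base rate is tiny.
population = 300_000_000
base_rate = 1 / 1_000_000        # one terrorist per million people
accuracy = 0.99

terrorists = population * base_rate                              # 300
true_positives = accuracy * terrorists                           # 297
false_positives = (1 - accuracy) * (population - terrorists)     # about 3 million
p_real = true_positives / (true_positives + false_positives)
print(round(p_real, 6), "or about 1 in", round(1 / p_real))      # about 1 in 10,000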
But data mining isn't the only issue. Consider biometric markers, such as a set of facial features, fingerprints or DNA patterns. The same rule applies. It may be that if a person was involved in a specific crime or other event, the biometric "print" will finger him or her with 99 percent accuracy. Yet context is all important. If that's all the cops have got, it isn't much. Without other information, the odds are still tens of thousands to one that the police or the Border Patrol have the wrong person.
The practicality of so-called Bayesian reasoning was demonstrated by Enrico Fermi, who would ask his students to estimate how many piano tuners were working in Chicago. Certainly, one should be able to come up with plausible ballpark estimates based on subjective knowledge.
Conant on Enrico Fermi and a 9/11 plausibility test
http://znewz1.blogspot.com/2006/11/enrico-fermi-and-911-plausibility-test.html
I have also used the Poisson distribution for a Bayesian-style approach to the probability that wrongful executions have occurred in the United States.
Fatal flaws
http://znewz1.blogspot.com/2007/06/fatal-flaws.html
Some of my assumptions in those discussions are open to debate, of course.
More on Laplace's rule
The physicist Harold Jeffreys agrees with Keynes that the rule of succession isn't plausible without modification, that is, via some initial probability. In fact, the Laplacian result of (m+1)/(m+2) gives, after a single success, a probability of 2/3 that the next trial will succeed by this route -- which, for some experimental situations, Jeffreys regards as too low, rather than too high!
CTK Math Network derives Laplace's rule
https://www.cut-the-knot.org/Probability/RuleOfSuccession.shtml#
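A hedged numerical check of the rule: with a uniform prior over the unknown chance of success, the posterior probability that the next trial succeeds, after m successes in m trials, works out to (m+1)/(m+2). The sketch below confirms this by crude numerical integration rather than symbolically:

# Laplace's rule of succession checked numerically: with a uniform prior on p,
# P(next success | m successes in m trials) = (m+1)/(m+2).
def rule_of_succession(m, steps=100_000):
    dp = 1.0 / steps
    numer = sum(((i + 0.5) * dp) ** (m + 1) * dp for i in range(steps))   # integral of p^(m+1)
    denom = sum(((i + 0.5) * dp) ** m * dp for i in range(steps))         # integral of p^m
    return numer / denom

for m in (1, 2, 10):
    print(m, round(rule_of_succession(m), 4), "vs", round((m + 1) / (m + 2), 4))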
I find it interesting that economist Jevons's use of the Laplacian formula echoes the doomsday argument of Gott. Jevons observed that "if we suppose the sun to have risen demonstratively" one billion times, the probability that it will rise again, on the ground of this knowledge merely, is
(10^9 + 1)/(10^9 + 2)
Yet, notes Jevons, the probability it will rise a billion years hence is
(10^9 + 1)/(2*10^9 + 2)
or very close to 1/2.
Though one might agree with Jevons that this formula is a logical outcome of the empirico-inductivist method in science, it is the logic of a system taken to an extreme where, I suggest, it loses value. That is to say, the magnification of our measuring device is too big. A question of that sort is outside the scope of the tool. Of course, Jevons and his peers knew nothing of the paradoxes of Cantor and Russell, or of Goedel's remarkable results. But if the tool of probability theory -- whichever theory we're talking about -- is of doubtful value in the extreme cases, then a red flag should go up not only cautioning us that beyond certain boundaries "there be dragons," but also warning us that the foundations of existence may not really be explicable in terms of so-called blind chance.
In fact, Jevons does echo Keynes's view that extreme cases yield worthless quantifications, saying: "Inferences pushed far beyond their data soon lose a considerable probability." Yet, we should note that the whole idea of the Laplacean rule is to arrive at probabilities when there is very little data available. I suggest that not only Jevons, but Keynes and other probability theorists, might have benefited from more awareness of set theory. In other words, we have sets of primitive observations that are built up in the formation of the human social mind and from there, culture and science build sets of relations from these primitive sets.
So here we see the need to discriminate between a predictive algorithm based upon higher sets of relations (propensities of systems), versus a predictive algorithm that emulates the human mind's process of assessing predictability based on repetition, at first with close to zero system information (the newborn). And a third scenario is the use of probabilistic assessment in imperfectly predictive higher-level algorithms.
"We ought to always be applying the inverse method of probabilities so as to take into account all additional information," argues Jevons. This may or may not be true. If a system's propensities are very well established, it may be that variations from the mean should be regarded as observational errors and not indicative of a system malfunction.
"Events when closely scrutinized will hardly ever prove to be quite independent, and the slightest preponderance one way or the other is some evidence of connexion, and in the absence of better evidence should be taken into account," Jevons says (31).
First of all, two events of the same type are often beyond close scrutiny. But, what I think Jevons is really driving at is that when little is known about a dynamical system, the updating of probabilities with new information is a means of arriving at the system's propensities (biases). In other words, we have a rough method of assigning a preliminary information value to that system (we are forming a set of primitives), which can be used as a stopgap until such time as a predictive algorithm based on higher sets is agreed upon, even if that algorithm also requires probabilities for predictive power. Presumably, the predictive power is superior because the propensities have now been well established.
So we can say that the inverse method, and the rule of succession, is in essence a mathematical systemization of an intuitive process, which, however, tends to be also fine-gauged. By extension, much of the "scientific method" follows such a process, where the job of investigators is to progressively screen out "mere correlation" as well as to improve system predictability.
Or that is to say, a set based on primitive observations is "mere correlation" and so, as Pearson argues, this means that the edifice of science is built upon correlation, not cause. As Pearson points out, the notion of cause is very slippery, which is why he prefers the concept of correlation (32). Yet, he also had very little engagement with set theory. I would say that what we often carelessly regard as "causes" are to be found in the mathematics of sets.
~(A ∩ B) may be thought of as the cause of ~A ∪ ~B.
Of course, I have left out the time elements, as I am only giving a simple example. What I mean is that sometimes the relations among higher-order sets correspond to "laws" and "causes."
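The set identity behind that example can be checked mechanically on a toy universe (a sketch only; the sets are arbitrary):

# De Morgan on a small universe: the complement of (A ∩ B) equals (~A) ∪ (~B).
U = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

lhs = U - (A & B)
rhs = (U - A) | (U - B)
print(lhs == rhs, sorted(lhs))   # True [0, 1, 2, 5, 6, 7, 8, 9]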
On Markov chains
Conditional probability of course takes on various forms when it is applied. Consider a Markov chain, which is considered far more "legitimate" than Laplace's rule.
Grinstead and Snell gives this example: The Land of Oz is a fine place but the weather isn't very good. Ozmonians never have two nice days in a row. "If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is change from snow or rain, only half of the time is this a change to a nice day."
With this information, a Markov chain can be obtained and a matrix of "transition probabilities" written.
Grinstead and Snell gives this theorem: Let P be the transition matrix of a Markov chain, and let u be the probability vector which represents the starting distribution. Then the probability that the chain is in state S_i after n steps is the ith entry in the vector u^(n) = uP^n (33).
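A quick sketch of the Oz example, with states ordered (rain, nice, snow) and the transition matrix read off from the description above; iterating u^(n) = uP^n shows the weather probabilities settling toward a fixed long-run distribution:

# Land of Oz weather chain: row i gives tomorrow's probabilities given state i.
P = [[0.50, 0.25, 0.25],    # after rain
     [0.50, 0.00, 0.50],    # after a nice day
     [0.25, 0.25, 0.50]]    # after snow

def step(u, P):
    return [sum(u[i] * P[i][j] for i in range(len(u))) for j in range(len(P[0]))]

u = [0.0, 1.0, 0.0]          # start on a nice day
for n in range(1, 7):
    u = step(u, P)
    print(n, [round(x, 4) for x in u])   # drifts toward (0.4, 0.2, 0.4)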
Grinstead and Snell chapter on Markov chains
http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf
Wolfram MathWorld on Markov chain
http://mathworld.wolfram.com/MarkovChain.html
At least with a Markov process, the idea is to deploy non-zero propensity information, which is determined at some specified state of the system. Nevertheless, there is a question here as to what type of randomness is applicable. Where does one draw the line between subjective and objective in such a case? That depends on one's reality superstructure, as discussed later.
At any rate, it seems fair to say that what Bayesian algorithms, such as the rule of succession, tend to do is to justify via quantification our predisposition to "believe in" an event after multiple occurrences, a Darwinian trait we share with other mammals. Still, it should be understood that one is asserting one's psychological process in a way that "seems reasonable" but is at root faith-based and may be in error. More knowledge of physics may increase or decrease one's confidence, but intuitive assumptions remain faith-based.
It can be shown via logical methods that, as n rises, the number of opportunities for a Goldbach pair -- two primes summing to n -- rises by approximately n/2. So one might argue that the higher an arbitrary n, the less likely we are to find a counterexample. And computer checks verify this point.
Or one can use Laplace's rule of succession to show that the probability that the proposition holds for n is given by (n+1)/(n+2). In both cases, at infinity, we have probability 1, or "virtual certainty," that Goldbach's conjecture is true, and yet it might not be, unless we mean that the proposition is practically true because it is assumed that an exception occurs only occasionally. And yet, there remains the possibility that above some n, the behavior of the primes changes (there being so few). So we must beware the idea that the probabilities are even meaningful over infinity.
At any rate, the confidence of mathematicians that the conjecture is true doesn't necessarily rise as n is pushed by ever more powerful computing. That's because no one has shown why no counterexample can occur. Now, one is entitled to act as though the conjecture is true. For example, one might include it in some practical software program.
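Such computer checks are easy to reproduce in miniature. The sketch below counts Goldbach representations for a few even n; no counterexample appears in this range, which of course proves nothing about all n (the bound of 100,000 is arbitrary):

# Count the ways each even n can be written as a sum of two primes.
def primes_upto(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return {i for i, flag in enumerate(sieve) if flag}

P = primes_upto(100_000)
for n in (100, 1_000, 10_000, 100_000):
    pairs = sum(1 for p in P if p <= n // 2 and (n - p) in P)
    print(n, pairs)    # the counts grow as n grows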
A scientific method in the case of attacking public key cryptography is to use independent probabilities concerning primes as a way of escaping a great deal of factorization. One acts as though certain factorization conjectures are true, and that cuts the work involved. When such tests are applied several times, the probability of insufficient factorization drops considerably, meaning that a percentage of "uncrackable" factorizations will fall to this method.
As Keynes shrewdly observed, a superstition may well be the result of the empirical method of assigning a non-numerical probability based on some correlations. For example, when iron plows were first introduced into Poland, that development was followed by a succession of bad harvests, whereupon many farmers revived the use of wooden plowshares. In other words, they acted on the basis of a hypothesis that at the time seemed reasonable.
They also had a different theory of cause and effect than do we today, though even today correlation is frequently taken for causation. This follows from the mammalian psychosoma program that adopts the "survival oriented" theory that when an event often brings a positive or negative feeling, that event is the cause of the mammal's feeling of well-being.
Keynes notes the "common sense" that there is a "real existence of other people" may require an a priori assumption, an assumption that I would say implies the existence of a cognized, if denied, noumenal world. So the empirical, or inductive, notion that the real existence of a human being is "well established" we might say is circular.
Unlike many writers on the philosophy of science, Popper (34) rejected induction as a method of science. "And although I believe that in the history of science it is always the theory and not the experiment, always the idea and not the observation, which opens up the way to new knowledge, I also believe that it is the experiment which saves us from following a track that leads nowhere, which helps us out of the rut, and which challenges us to find a new way."
(Popper makes a good point that there are "diminishing returns of learning by induction," because lim [m,n --> ∞] (m/n) = 1. In other words, as more evidence piles up, its value decreases with the number of confirmations.)
Bertrand Russell, who was more interested in Bayesian estimation than either the classical or frequency outlooks, was cautious about inductive knowledge. He backed the approach "advocated by Keynes" to the effect that, in Russell's words, "inductions do not make conclusions probable unless certain conditions are fulfilled, and that experience alone can never prove that these conditions are fulfilled." [z1]
A note on complexity
As it is to me inconceivable that a probabilistic scenario doesn't involve some dynamic system, it is evident that we construct a theory -- which in some disciplines is a mathematically based algorithm or set of algorithms for making predictions. The system with which we are working has initial value information and algorithmic program information. This information is non-zero and tends to yield propensities, or initial biases. Even so, the assumptions or primitive notions in the theory either derive from a subsidiary formalism or are found by empirical means; these primitives derive from experiential -- and hence unprovable -- frequency ratios.
I prefer to view simplicity of a theory as a "small" statement (which may be nested inside a much larger statement). From the algorithmic perspective, we might say that number of parameters is equivalent to number of input values, or, better, that the simplicity corresponds to the information in the algorithm design and input. Simplicity and complexity may be regarded as two ends of some spectrum of binary string lengths.
Another way to view complexity is similar to the Chaitin algorithmic information ratio, but distinct. In this case, we look at the Shannon redundancy versus the Shannon total information.
So the complexity of a signal -- which could be the mathematical representation of a physical system -- would then not be found in the maximum information entailed by equiprobability of every symbol. The structure in the mathematical representation implies constraints -- or conditional probabilities for symbols. So then maximum structure is found when symbol A strictly implies symbol B in a binary system, which is tantamount to saying A = B, giving the uninteresting string: AA...A.
Maximum structure then violates our intuitive idea of complexity. So what do we mean by complexity in this sense?
A point that arises in such discussions concerns entropy (the tendency toward decrease of order) and the related idea of information, which is sometimes thought of as the surprisal value of a digit string. Sometimes a pattern such as AA...A is considered to have low information because we can easily calculate the nth value (assuming we are using some algorithm to obtain the string). So the Chaitin-Kolmogorov complexity is low, or, that is, the information is low. On the other hand a string that by some measure is effectively random is considered here to be highly informative because the observer has almost no chance of knowing the string in detail in advance.
Leon Brillouin in Science and Information Theory gives a thorough and penetrating discussion of information theory and physical entropy. Physical entropy he regards as a special case under the heading of information theory (32aa).
Shannon's idea of maximum entropy for a bit string means that it has no redundancy, and so potentially carries the maximum amount of new information. This concept oddly ties together maximally random with maximally informative. It might help to think of the bit string as a carrier of information. Yet, because we screen out the consumer, there is no practical difference between the "actual" information value and the "potential" information value, which is why no one bothers with the "carrier" concept.
Still, we can also take the opposite tack. Using runs testing, most digit strings (multi-value strings can often be transformed, for test purposes, to bi-value strings) are found under the bulge in the runs test bell curve and represent probable randomness. So it is unsurprising to encounter such a string. It is far more surprising to come across a string with far "too few" or far "too many" runs. These highly ordered strings would then, from this perspective, be considered to have high information value because possibly indicative of a non-random organizing principle.
This distinction may help address Stephen Wolfram's attempt to cope with "highly complex" automata (32a). By these, he means those with irregular, randomlike structures running through periodic "backgrounds" (sometimes called "ether"). If a sufficiently long runs test were done on such automata, we would obtain, I suggest, z scores in the high but not outlandish range. The z score would give a gauge of complexity.
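For concreteness, here is a sketch of the two-valued runs test: count the runs, compare with the number expected under randomness, and report a z score. The formulas are the standard Wald-Wolfowitz ones; the example strings are purely illustrative:

# Runs test: z score for the number of runs in a binary string.
def runs_z(s):
    n1, n2 = s.count("1"), s.count("0")
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mu) / var ** 0.5

print(runs_z("0110100110010111"))   # middling z: consistent with randomness
print(runs_z("0101010101010101"))   # far too many runs: large positive z
print(runs_z("0000000011111111"))   # far too few runs: large negative z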
We might distinguish complicatedness from complexity by saying that a random-like permutation of our grammatical symbols is merely complicated, but a grammatical permutation, taking into account transmission error, is complex.
In this respect, we might also construe complexity as a measure of coding efficiency.
So we know that "complexity" is a worthwhile concept, to be distinguished -- at times -- from "complicatedness." We would say that something that is maximally complicated has the quality of binomial "randomness;" it resides with the largest sets of combinations found in the 68% zone.
I suggest that we may as well define maximally complex to mean a constraint set yielding 50% redundancy in Shannon information. That is to say, I' = I - Ic, where I' is the new information, I is the maximum information that occurs when all symbols are equiprobable (zero structural or propensity information), and Ic is the information carried by the constraints, or structure, themselves.
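A minimal sketch of that criterion (the redundancy function and example strings are mine; it looks only at first-order symbol frequencies, not at inter-symbol structure):

    # Redundancy = 1 - H_observed / H_max, with H_max = log2(k) for a k-symbol
    # alphabet. On the proposal above, "maximally complex" would mean a
    # constraint set that drives redundancy toward 0.5.

    from collections import Counter
    from math import log2

    def redundancy(s, alphabet_size=2):
        counts = Counter(s)
        n = len(s)
        h = -sum((c / n) * log2(c / n) for c in counts.values())
        return 1.0 - h / log2(alphabet_size)

    print(redundancy("01" * 32))        # 0.0 -- balanced symbol frequencies
    print(redundancy("0001" * 16))      # about 0.19
    print(redundancy("00000001" * 8))   # about 0.46 -- near the 50% mark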
Consider two specific primes that are multiplied to form a composite. The names of the primes, together with the multiplication algorithm, may be given an advance information value Ic. Alice, who is doing the computation, has this "information," but doesn't know what the data stream will look like when the composite is computed. But she would be able to estimate the stream's approximate length and might know that certain substrings are very likely, or certain. In other words, she has enough advance information to devise conditional probabilities for the characters.
Bob encounters the data string and wishes to decipher it. He lacks part of Ic: the names of the primes. So there is more information in the string for him than for Alice. He learns more once he deciphers it than does she, who needn't decipher.
In this respect we see that for him the characters are closer to equiprobability, or maximum Shannon entropy, than for Alice. For him, the amount of information is strongly correlated with the algorithmic work involved. After finding the square root, his best guaranteed method is to sieve out the primes up to that point (the sieve of Eratosthenes) and test them as divisors. This is considered a "hard" computing problem, as the work increases exponentially with the number of digits in the composite.
On the other hand, if Alice wants to compute p_n x p_m, her work increases close to linearly.
A string with maximum Shannon entropy means that the work of decipherment is very close to k^n, where k is the base of the number system and n the string length.
We see then that algorithmic information and standard Shannon information are closely related by the concept of computing work.
Another way to view complexity is via autocorrelation. So an autocorrelation coefficient near 1 or -1 can be construed to imply high "order." As Wikipedia notes, autocorrelation, also known as serial correlation, is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.
Multidimensional autocorrelation can also be used as a gauge of complexity. Yet, it would seem that any multidimensional signal could be mapped onto a two-dimensional signal graph. (I concede I should look into this further at some point.) But we see that the correlation coefficient, whether auto or not, handles randomness in a way that is closely related to the normal curve. Hence, the correlation coefficient for something highly complex would fall somewhere near 1 or -1, but not too close, because, in general, extreme order is rather uncomplicated.
One can see that the autocorrelation coefficient is a reflection of Shannon's redundancy quantity. (I daresay there is an expression equating or nearly equating the two.)
When checking the randomness of a signal, the autocorrelation lag time is usually put at 1, according to the National Institute of Standards and Technology, which relates the following:
Given measurements, Y1, Y2, ..., YN at time X1, X2, ..., XN, the lag k autocorrelation function is defined as
r_k = [ Σ_{i=1}^{N-k} (Y_i - Y')(Y_{i+k} - Y') ] / [ Σ_{i=1}^{N} (Y_i - Y')^2 ]

with Y' representing the mean of the Y values.
Although the time variable, X, is not used in the formula for autocorrelation, the assumption is that the observations are equi-spaced.
Wikipedia article on autocorrelation
http://en.wikipedia.org/wiki/Autocorrelation
NIST article on autocorrelation
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm
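A short Python sketch of the lag-k calculation just quoted (nothing here beyond the NIST formula itself):

    # Lag-k autocorrelation: the numerator sums (Y_i - Ybar)(Y_{i+k} - Ybar)
    # over i = 1..N-k; the denominator is the total sum of squared deviations.

    def autocorr(y, k=1):
        n = len(y)
        ybar = sum(y) / n
        num = sum((y[i] - ybar) * (y[i + k] - ybar) for i in range(n - k))
        den = sum((yi - ybar) ** 2 for yi in y)
        return num / den

    print(autocorr([0, 1] * 50))        # near -1: strict alternation, high "order"
    print(autocorr(list(range(100))))   # near +1: monotone trend, high "order"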
In another vein, consider the cross (Cartesian) product A X B of phenomena in A related to phenomena in B, such that a is a member of A, b is a member of B, and aRb means "a followed by b," with R an equivalence relation: reflexive, symmetric and transitive.
One algorithm may obtain a smaller subset of A X B than does another. The superior algorithm fetches the larger subset, with the caveat that an "inferior" algorithm may be preferred because its degree of informational complexity is lower than that of the "superior" algorithm.
One might say that algorithm X has more "explanatory power" than algorithm Y if X obtains a larger subset of A X B than does Y and, depending on one's inclination, if X also entails "substantially less" work than does Y.
The method of science works much like the technique of bringing out a logic proof via several approximations. Insight can occur once an approximation is completed and the learner is then prepared for the next approximation or the final proof.
This is analogous to deciphering a lengthy message. One may have hard information, or be required to speculate, about a part of the cipher. One then progresses -- hopefully -- as the new information helps unravel the next stage. That is to say, the information in the structure (or, to use Shannon's term, in the redundancy) is crucial to the decipherment. Which is to say that a Bayesian style of thinking is operative. New information alters probabilities assigned certain substrings.
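A toy Python sketch of that Bayesian updating, with two rival readings of a disputed substring (the prior and likelihood numbers are invented purely for illustration):

    # Bayes update: posterior proportional to prior x likelihood of the new clue.

    priors = {"reading_X": 0.5, "reading_Y": 0.5}       # before the new clue
    likelihood = {"reading_X": 0.8, "reading_Y": 0.2}   # P(clue | reading), assumed

    evidence = sum(priors[h] * likelihood[h] for h in priors)
    posteriors = {h: priors[h] * likelihood[h] / evidence for h in priors}
    print(posteriors)   # {'reading_X': 0.8, 'reading_Y': 0.2}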
Decipherment of a coded or noisy message is a pretty good way of illustrating why a theory might be considered valid. Once part of the "message" has been analyzed as having a fair probability of meaning X, the scientist ("decoder") uses that provisional information, along with any external information at hand, to make progress in reading the message. Once a nearly complete message/theory is revealed, the scientist/decoder and her associates believe they have cracked the "code" based on the internal consistency of their findings (the message).
In the case of science in general, however, no one knows how long the message is, or what would assuredly constitute "noise" in the signal (perhaps, a priori wrong ideas?). So the process is much fuzzier than the code cracker's task.
Interestingly, Alan Turing and his colleagues used Bayesian conditional probabilities as part of their decipherment program, establishing that such methods, whatever the logical objections, work quite well in some situations. Still, though the code-cracking analogy is quite useful, it seems doubtful that one could use some general method of assigning probabilities -- whether of the Turing or Shannon variety -- to scientific theories, other than possibly to toy models.
Scientists usually prefer to abstain from metaphysics, but their method clearly begs the question: "If the universe is the signal, what is the transmitter?" or "Can the message transmit itself?" Another fair question is: "If the universe is the message, can part of the message (we humans) read the message fully?"
We have a problem of representation when we pose the question: "Can science find an algorithm that, in principle, simulates the entire universe?" The answer is that no Turing machine can model the entire universe.
Conant on Hilbert's sixth problem
https://cosmosis101.blogspot.com/2017/07/on-hilberts-sixth-problem-and-boolean.html
15. A Treatise on Probability by J.M. Keynes (Macmillan, 1921).
16. Treatise, Keynes.
17. Treatise, Keynes.
18. Treatise, Keynes.
19. Grundbegriffe der Wahrscheinlichkeitsrechnung by Andrey Kolmogorov (Julius Springer, 1933). Foundations of the Theory of Probability (in English) was published by Chelsea Publishing Co. in 1950.
20. Popper's propensity theory is discussed in his Postscript to The Logic of Scientific Discovery, which was published in three volumes as:
a. Realism and the Aim of Science, Postscript Volume I (Routledge, 1985; Hutchinson, 1983).
b. The Open Universe, Postscript Volume II (Routledge, 1988; Hutchinson, 1982).
c. Quantum Theory and the Schism in Physics, Postscript Volume III (Routledge, 1989; Hutchinson, 1982).
21. Symmetry by Hermann Weyl (Princeton, 1952).
21a. Treatise, Keynes.
22. Theory of Probability by Harold Jeffreys (Oxford/Clarendon, Third edition, 1961; originally published in 1939).
23. Gamma: Exploring Euler's Constant by Julian Havil (Princeton, 2003).
24. Interpreting Probability: controversies and developments of the early 20th century by David Howie (Cambridge 2002).
25. The Logic of Scientific Discovery by Karl Popper. Published as Logik der Forschung in 1935; English version published by Hutchinson in 1959.
26. Postscript, Popper.
27. The Signal and the Noise: Why So Many Predictions Fail But Some Don't by Nate Silver (Penguin 2012).
In his book, Silver passively accepts the official yarn about the events of 9/11, and makes an egregious error in the logic behind his statistical discussion of terrorist acts.
Amazing blunder drowns out 'Signal'
http://conantcensorshipissue.blogspot.com/2013/04/amazing-blunder-drowns-out-signal-on.html
For a kinder review, see John Allen Paulos in 2012 at the Washington Post.
28. Treatise, Keynes.
29. Treatise, Keynes.
30. Calculated Risks: How to know when numbers deceive you by Gerd Gigerenzer (Simon and Schuster 2002).
31. The Principles of Science (Vol. I) by William Stanley Jevons (Routledge/Thoemmes Press, 1996 reprint of 1874 ms).
32. The Grammar of Science by Karl Pearson (Meridian 1957 reprint of 1911 revised edition).
32aa. Science and Information Theory, Second Edition, by Leon Brillouin (Dover 2013 reprint of Academic Press 1962 edition; first edition, 1956).
32a. A New Kind of Science by Stephen Wolfram (Wolfram Media, 2002).
33. Charles M. Grinstead and Laurie J. Snell Introduction to Probability, Second Edition (American Mathematical Society 1997).
34. Logic, Popper.
35. Probability by Mark Kac in "The Mathematical Sciences -- A collection of essays" (MIT 1969). The essay appeared originally in Scientific American, Vol. 211, No. 3, 1964.
Part III
The empirico-inductive concept
On induction and Bayesian inference, John Maynard Keynes wrote:
"To take an example, Pure Induction can be usefully employed to strengthen an argument if, after a certain number of instances have been examined, we have, from some other source, a finite probability in favour of the generalisation, and, assuming the generalisation is false, a finite uncertainty as to its conclusion being satisfied by the next hitherto unexamined instance which satisfies its premise."
He goes on to say that pure induction "can be used to support the generalisation that the sun will rise every morning for the next million years, provided that with the experience we have actually had there are finite probabilities, however small, derived from some other source, first, in favour of the generalisation, and, second, in favour of the sun's not rising to-morrow assuming the generalisation to be false," adding: "Given these finite probabilities, obtained otherwise, however small, then the probability can be strengthened and can tend to increase towards certainty by the mere multiplication of instances provided that these instances are so far distinct that they are not inferable one from another" (35).
Keynes's book is highly critical of a theory of scientific induction presented by a fellow economist, William Stanley Jevons, that uses Laplace's rule of succession as its basis. Keynes argues here that it is legitimate to update probabilities as new information arrives, but that -- paraphrasing him in our terms -- the propensity information and its attendant initial probabilities cannot be zero. In the last sentence in the quotation above, Keynes is giving the condition of independence, a condition that is often taken for granted, but that rests on assumptions about randomness that we will take a further look at as we proceed. Related to that is the assumption that two events have enough in common to be considered identical. Even so, this assumption must either be accepted as a primitive, or must be based on concepts used by physicists. We will question this viewpoint at a later point in our discussion. Other Bayesian probabilists, such as Harold Jeffreys, agree with Keynes on this.
One might say that the empirico-inductive approach, at its most unvarnished, assumes a zero or near-zero information value for the system's propensity. But such an approach yields experimental information that can then be used in probability calculations. A Bayesian algorithm for updating probabilities based on new information -- such as Laplace's rule of succession -- might or might not be gingerly accepted for a specific case, such as one of those posed by J. Richard Gott III.
Article on Gott
http://en.wikipedia.org/wiki/J._Richard_Gott
It depends on whether one accepts near-zero information for the propensity. How does one infer a degree of dependence without any knowledge of the propensity? If it is done with the rule of succession, we should be skeptical. In the case of sun risings, for all we know, the solar system -- remember, we are assuming essentially no propensity information -- is chaotic and only happens to be going through an interval of periodicity or quasi-periodicity (when trajectories cycle toward limit points -- as in attractors -- ever more closely). Maybe tomorrow a total eclipse will occur as the moon shifts relative position, or some large body arrives between the earth and sun. This seems preposterous, but only because in fact we are aware that the propensity information is non-zero; that is to say, we know something about gravity and have observed quite a lot about solar system dynamics.
But Gott argues that we are entitled to employ the Copernican principle in some low-information scenarios. The "information" in this principle says that there is no preferred orientation in space and time for human beings. With that in mind, the "doomsday argument" follows, whereby the human race is "most likely" to be about halfway through its course of existence, taking into account the current world population. We note that the doomsday argument has commonalities with Pascal's wager on the existence of God. Recall that Pascal assigned a probability of 1/2 for existence and against existence based on the presumption of utter ignorance. Yet, how is this probability arrived at? There are no known frequencies available. Even if we use a uniform continuous distribution from 0 to 1, the prior isn't necessarily found there, system information being ruled out and expert opinion unwelcome. That is to say, how do we know that an initial probability exists at all?
The doomsday argument
http://en.wikipedia.org/wiki/Doomsday_argument
With respect to the doomsday scenario, there are frequencies, in terms of average lifespan of an arbitrary species (about a million years), but these are not taken into account. The fact is that the doomsday scenario's system "information" that we occupy no privileged spacetime position is, being a principle, an assumption taken as an article of faith. If we accept that article, then we may say that we are less likely to be living near the beginning or ending of the timeline of our species, just as when one arrives at a bus stop without a schedule, one expects that one probably won't have to wait the maximum possible time interval, or something near the minimum, between buses. Similarly, we hazard this guess based on the idea that there is a low-level of interaction between one's brain and the bus's appearance, or based on the idea that some greater power isn't "behind the scenes" controlling the appearance of buses. If one says such ideas are patently absurd, read on, especially the sections on noumena (Part VI, see sidebar).
Another point: an inference in the next-bus scenario is that we could actually conduct an experiment in which Bus 8102 is scheduled to arrive at Main and 7th streets at 10 minutes after the hour and a set of random integers in [0,60] is printed out; the experimenter then puts his "arrival time" as that randomly selected number of minutes past the hour; these numbers are matched against when the 8102 actually arrives, the results compiled and the arithmetic mean taken. Over a sufficient number of tests, the law of large numbers suggests that the average waiting time is a half hour. Because we know already that we can anticipate that result, we don't find it necessary to actually run the trials. So, is that Bayesianism or idealistic frequentism, using imaginary trials?
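The imagined trials are easy enough to simulate in Python (a sketch; the bus number and the 10-minutes-past schedule are just the scenario's props):

    # Rider arrives at a uniformly random minute in [0, 60); the bus comes at
    # 10 minutes past each hour. The mean wait tends toward 30 minutes.

    import random

    def wait_minutes(arrival, bus_minute=10, headway=60):
        return (bus_minute - arrival) % headway

    trials = [wait_minutes(random.uniform(0, 60)) for _ in range(100000)]
    print(sum(trials) / len(trials))   # close to 30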
In the case of the plausibility that we are about halfway through the lifespan of our species, it is hard to imagine even a fictional frequency scenario. Suppose we have somehow managed to obtain the entire population of Homo sapiens sapiens who have ever lived or who ever will live. From that finite set, a person is chosen randomly, or as close to randomly as we can get. What is the probability our choice will come from an early period in species history? There is no difference between that probability and the probability our choice came from about the halfway mark. Of course, we hasten to concede that such reasoning doesn't yield much actionable knowledge. What can anyone do with a probability-based assessment that the human species will be extinct in X number of years, if X exceeds one's anticipated lifetime?
Concerning imaginary trials, E.T. Jaynes (36), a physicist and Bayesian crusader, chided standard statistics practitioners for extolling objectivity while using fictional frequencies and trials. A case in point, I suggest, is the probability in a coin toss experiment that the first head will show up on an odd-numbered flip. The probability is obtained by summing all possibilities to infinity, giving an infinite series limit of 2/3. That is, Σ a_i = Σ (1/2)^(2i-1) = 1/2 + 1/8 + 1/32 + ... = 2/3 as the number of terms goes infinite. That probability isn't reached after an infinitude of tosses, however. It applies immediately. And one would expect that a series of experiments would tend toward the 2/3 limit. Even so, such a set of experiments is rarely done. The sum is found by use of the plus sign to imply the logical relation "exclusive or." The idea is that experiments with coins have been done, and so independence has been well enough established to permit us to make such a calculation without actually doing experiments to see whether the law of large numbers will validate the 2/3 result. That is to say, we say that the 2/3 result logically follows if the basic concept of independence has been established for coin tossing in general.
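For what it is worth, the rarely run experiment takes only a few lines to imitate (a sketch):

    # First head on an odd-numbered flip: analytically 1/2 + 1/8 + 1/32 + ...
    # = (1/2) / (1 - 1/4) = 2/3. The simulation should hover near 0.667.

    import random

    def first_head_is_odd():
        flip = 1
        while random.random() < 0.5:   # tails; flip again
            flip += 1
        return flip % 2 == 1

    n = 100000
    print(sum(first_head_is_odd() for _ in range(n)) / n)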
Concerning the doomsday argument, we note that in the last few hundred years the human population has been increasing exponentially. Prior to that, however, its numbers went up and down in accord with Malthusian population dynamics as indicated by the logistic differential equation, which sets a threshold below which a population almost certainly goes extinct. That harsh Malthusian reality becomes ever more likely as the global population pushes the limits of sustainability. So this tends to rule out some normally distributed population over time -- whereby the current population is near the middle and low population tails are found in past and future -- simply because the population at some point may well return from its current exponential growth curve to a jaggedly charted curve reminiscent of stock market charts.
The bus stop scenario can serve to illustrate our expectation that random events tend to cancel each other out on either side of a mean. In other words, we expect the randomly chosen numbers below the mean of 30 to show low correlation with the randomly chosen numbers above the mean, but nevertheless we think that their average will be 30. This is tantamount to shifting the interval [0,60] to [-30,30], so that 0 represents the old median of 30; we suspect that if we add all the re-centered numbers the sum will be close to zero. Why do "randoms" tend to cancel? Elsewhere, we look into "hidden variables" views.
Of course, our various conceptions of randomness and our belief in use of such randomness as a basis for predictions would be greatly undermined by even one strong counterexample, such as an individual with a strong "gift" of telekinesis who is able to influence the computer's number selection. At this point, I am definitely not proposing that such a person exists. Still, if one did come to the attention of those whose mooring posts are the tenets of probability theory and inferential statistics, that person (or the researcher reporting on the matter) would come under withering fire because so many people have a strong vested emotional interest in the assumptions of probability theory. They would worry that the researcher is a dupe of the Creationists and Intelligent Design people. But let us not tarry too long here.
More on induction
Jevons saw the scientific method as based on induction and used Laplace's rule of succession as a "proof" of scientific induction. Yet neither he, nor Pearson, who also cited it favorably, included a mathematical explanation of Laplace's rule, unlike Keynes, who analyzed Laplace's rule and also offered an amended form of it. Jevons, Pearson and Keynes all favored forms of Bayesian reasoning, often called "the inverse method."
Jevons and probability
http://plato.stanford.edu/entries/william-jevons/
On causality and probability, Jevons wrote: "If an event can be produced by any one of a certain number of different causes, the probabilities of the existence of these causes as inferred from the event, are proportional to the probabilities of the event as derived from these causes," adding: "In other words, the most probable cause of an event which has happened is that which would most probably lead to the event supposing the cause to exist; but all other possible causes are also taken into account with probabilities proportional to the probability that the event would have happened if the cause existed" (37).
Jevons then uses standard ball and urn conditional probability examples.
A point of dispute here is the word "cause." In fact, in the urn and ball case, we might consider it loose usage to say that it is most probable that urn A is the cause of the outcome.
Still, it is fair to say that urn A's internal composition is the most probable relevant predecessor to the outcome. "Causes" are the hidden sub-vectors of the net force vector, which reaches the quantum level, where causation is a problematic idea.
The problem of causation deeply occupied both Pearson and Fisher, who were more favorably disposed to the concept of correlation as opposed to causation. We can see here that their areas of expertise would tend to promote such a view; that is to say that they, not being physicists, would tend to favor positing low propensity information. Philosophically, they were closer to the urn of nature concept of probability than they might have cared to admit.
And that brings us back to the point that a probability method is a tool for guidance in decision-making or possibly in apprehending truth, though this second item is where much ambiguity arises.
One must fill in the blanks for a particular situation. One must use logical reasoning, and perhaps statistical methods, to go from mere correlation to causation, with the understanding that the problem of cause and effect is a notorious philosophical conundrum.
Cause-effect is in many respects a perceptual affair. If one steps "outside" the spacetime block (see section on spacetime), where is cause-effect?
Also, consider the driver who operates his vehicle while under the influence of alcohol and becomes involved in an auto accident. He is held to be negligent as if by some act of will, whereby his decision to drink is said to have "caused" the accident. First, if a person's free-will is illusory, as seems to be at least partly true if not altogether true, then how do we say his decision caused anything? Second, some might term the decision to drink and drive the "proximate" cause of the accident. Yet, there are many other influences (causes?) that sum to the larger "cause." The interlinked rows of dominos started falling sometime long ago -- if one thinks in a purely mechanistic computer-like model. How does one separate out causes? We address this issue from various perspectives as we go along.
Brain studies tend to confirm the point that much that passes for free will is illusory. And yet, this very fact seems to argue in favor of a need for a core "animating spirit," or amaterial entity: something that is deeper than the world of phenomena that includes somatic functions; such a liberated pilot spirit would, it seems to me, require a higher order spirit to bring about such a liberation. I use the word "spirit" in the sense of amaterial unknown entity and do not propose a mystical or religious definition; however, the fact that the concept has held for centuries suggests that many have come intuitively to the conclusion that "something is in there."
I realize that here I have indulged in "non-scientific speculation" but I argue that "computer logic" leads us to this means of answering paradoxes. That is to say, we have a Goedelian argument that points to a "higher frame." But in computer logic, the frames "go all the way up" to infinity. We need something outside, or that is greater than and fundamentally different from, the spacetime block with which to have a bond.
Pearson in The Grammar of Science (38) makes the point that a randomized sequence means that we cannot infer anything from the pattern. But, if we detect a pattern, we can then write an algorithm for its continuation. So we can think of the program as the cause, and it may or may not give a probability 1 as to some number's existence at future step n.
I extend Pearson's idea here; he says the analogy should not be pressed too far but I think it makes a very strong point; and we can see that once we have an algorithm, we have basic system information. The longer the recognizable sequence of numbers, the higher the probability we assign it for non-randomness; see my discussion on that:
A note on periodicity and probability
http://kryptograff5.blogspot.com/2013/08/draft-1-please-let-me-know-of-errors.html
Now when we have what we suspect is a pattern, but have no certain algorithm, then we may find ways to assign probabilities to various conjectures as to the underlying algorithm.
A scientific theory serves as a provisional algorithm: plug in the input values and obtain predictable results (within tolerances).
If we see someone write the series
1,2,4,8,16,32
we infer that he is doubling every previous integer. There is no reason to say that this is definitely the case (for the moment disregarding what we know of human learning and psychology), but with a "high degree of probability," we expect the next number to be 64.
How does one calculate this probability?
The fact that the series climbs monotonically would seem to provide a floor probability at any rate, so that a nonparametric test would give us a useful value. Even so, what we have is a continuum. Correlation corresponds to moderate probability that A will follow B, causation to high probability of same. In modern times, we generally expect something like 99.99% probability to permit us to use the term "cause." But even here, we must be ready to scrap our assumption of direct causation if a better theory strongly suggests that "A causes B" is too much of a simplification.
For example, a prosecutor may have an apparently air-tight case against a husband in a wife's murder, but one can't completely rule out a scenario whereby CIA assassins arrived by black helicopter and did the woman in for some obscure reason. One may say that the most probable explanation is that the husband did it, but full certainty is rare, if it exists at all, in the world of material phenomena.
And of course the issue of causation is complicated by issues in general relativity -- though some argue that these can be adequately addressed -- and quantum mechanics, where the problem of causation becomes enigmatic.
Popper argued that in "physics the use of the expression 'causal explanation' is restricted as a rule to the special case in which universal laws have the form of laws of 'action by contact'; or more precisely, of 'action at a vanishing distance' expressed by differential equations" (39) [Popper's emphasis].
The "principle of causality," he says, is the assertion that any event that is amenable to explanation can be deductively explained. He says that such a principle, in the "synthetic" sense, is not falsifiable. So he takes a neutral attitude with respect to this point. This relates to our assertion that theoretic systems have mathematical relations that can be viewed as cause and effect relations.
Popper sees causality in terms of universal individual concepts, an echo of what I mean by sets of primitives. Taking up Popper's discussion of "dogginess," I would say that one approach is to consider the ideal dog as an abstraction of many dogs that have been identified, whereby that ideal can be represented by a matrix with n unique entries. Whether a particular object or property qualifies as being associated with a dogginess matrix depends on whether that object's or property's matrix is sufficiently close to the agreed universal dogginess matrix. In fact, I posit that perception of "reality" works in part according to such a system, which has something in common with the neural networks of computing fame.
(These days, of course, the ideal dog matrix can be made to correspond to the DNA sequences common to all canines.)
But, in the case of direct perception, how does the mind/brain know what the template, or matrix ideal, of a dog is? Clearly, the dogginess matrix is compiled from experience, with new instances of dogs checked against the previous matrix, which may well then be updated.
A person's "ideal dog matrix" is built up over time, of course, as the brain integrates various percepts. Once such a matrix has become "hardened," a person may find it virtually impossible to ignore that matrix and discover a new pattern. We see this tendency especially with respect to cultural stereotypes.
Still, in the learning process, a new encounter with a dog or representation of a dog may yield only a provisional change in the dogginess matrix. Even if we take into account all the subtle clues of doggy behavior, we nevertheless may be relating to something vital, in the sense of nonphysical or immaterial, that conveys something about dogginess that cannot be measured. On the other hand, if one looks at a still or video photograph of a dog, nearly everyone other than perhaps an autistic person or primitive tribesman unaccustomed to photos, agrees that he has seen a dog. And photos are these days nothing but digital representations of binary strings that the brain interprets in a digital manner, just as though it is using a neural matrix template.
Nevertheless, that last point does not rule out the possibility that, when a live dog is present, we relate to a "something" within, or behind, consciousness that is nonphysical (in the usual sense). In other words, the argument that consciousness is an epiphenomenon of the phenomenal world cannot rule out something "deeper" associated with a noumenal world.
The concept of intuition must also be considered when talking of the empirico-inductive method. (See discussion on types of intuition in the "Noumenal world" section of Part VI; link in sidebar.)
More on causality
David Hume's argument that one cannot prove an airtight relation between cause and effect in the natural world is to me self-evident. In his words:
"Matters of fact, which are the second objects of human reason, are not ascertained in the same manner [as are mathematical proofs]; nor is our evidence of their truth, however great, of a like nature with the foregoing. The contrary of every matter of fact is still possible, because it can never imply a contradiction, and is conceived by the mind with the same facility and distinctness, as if ever so conformable to reality. That the sun will not rise tomorrow is no less intelligible a proposition, and implies no more contradiction, than the affirmation, that it will rise. We should in vain, therefore, attempt to demonstrate its falsehood. Were it demonstratively false, it would imply a contradiction, and could never be distinctly conceived by the mind..."
To summarize Hume:
I see the sun rise and form the habit of expecting the sun to rise every morning. I refine this expectation into the judgment that "the sun rises every morning."
This judgment cannot be a truth of logic because it is conceivable that the sun might not rise. This judgment cannot be conclusively established empirically because one cannot observe future risings or not-risings of the sun.
Hence, I have no rational grounds for my belief, but custom tells me that its truthfulness is probable. Custom is the great guide of life.
We see immediately that the scientific use of the inductive method itself rests on the use of frequency ratios, which themselves rest on unprovable assumptions. Hence a cloud is cast over the whole notion of causality.
This point is made by Volodya Vovk: "... any attempt to base probability theory on frequency immediately encounters the usual vicious cycle. For example, the frequentist interpretation of an assertion such as Pr(E) = 0.6 is something like: we can be practically (say, 99.9%) certain that in 100,000 trials the relative frequency of success will be within 0.02 of 0.6. But how do we interpret 99.9%? If again using frequency interpretation, we have infinite regress: probability is interpreted in terms of frequency and probability, the latter probability is interpreted in terms of frequency and probability, etc" (40).
Infinite regress and truth
We have arrived at a statement of a probability assignment, as in: Statement S: "The probability of Proposition X being true is y."
We then have:
Statement T: "The probability that Statement S is true is z."
Now what is the probability of T being true? And we can keep doing this ad infinitum.
Is this in fact conditional probability? Not in the standard sense, though I suppose we could argue for that also.
Statement S is arrived at in a less reliable manner than Statement T, presumably, so that such a secondary statement can perhaps be justified.
This shows that, at some point, we simply have to take some ancillary statement on faith.
Pascal's wager and truth
"From nothing, nothing" is what one critic has to say about assigning equal probabilities in the face of complete ignorance.
Let's take the case of an observer who claims that she has no idea of the truthfulness or falseness of the statement "God exists."
As far as she is concerned, the statement and its negation are equally likely. Yes, it may be academic to assign a probability of 1/2 to each statement. And many will object that there are no relevant frequencies; there is no way to check numerous universes to see how many have a supreme deity.
And yet, we do have a population (or sample space), that being the set of two statements {p, ~p}. Absent any other knowledge, it may seem pointless to talk of a probability. Yet, if one is convinced that one is utterly ignorant, one can still take actions:
1. Flip a coin and, depending on the result, act as though God exists, or act as though God does not exist.
2. Decide that a consequence of being wrong about existence of a Supreme Being is so great that there is nothing to lose and a lot to gain to act as though God exists (Pascal's solution).
3. Seek more evidence, so as to bring one closer to certainty as to whether p or ~p holds.
In fact, the whole point of truth estimates is to empower individuals to make profitable decisions. So when we have a set of equiprobable outcomes, this measures our maximum ignorance. It is not always relevant whether a large number of trials has established this set and its associated ratios.
That is to say, one can agree that the use of mathematical quantifications in a situation such as Pascal's wager is pointless, and yields no real knowledge. But that fact doesn't mean one cannot use a form of "probabilistic reasoning" to help with a decision. Whether such reasoning is fundamentally wise is another question altogether, as will become apparent in the sections on noumena (Part VI, see sidebar).
There have been attempts to cope with Hume's "problem of induction" and other challenges to doctrines of science. For example, Laplace addressed Hume's sun-rising conundrum with the "rule of succession," which is based on Bayes's theorem. Laplace's attempt, along with such scenarios as "the doomsday argument," may have merit as thought experiments, but cannot answer Hume's basic point: We gain notions of reality or realities by repetition of "similar" experiences; if we wish, we could use frequency ratios in this respect. But there is no formal way to test the truthfulness of a statement or representation of reality.
Is this knowledge?
"Science cannot demonstrate that a cataclysm will not engulf the universe tomorrow, but it can prove that past experience, so far from providing a shred of evidence in favour of any such occurrence, does, even in the light our ignorance of any necessity in the sequence of our perceptions, give an overwhelming probability against such a cataclysm." -- Karl Pearson (41)
I would argue that no real knowledge is gained from such an assessment. Everyone, unless believing some occasional prophet, will perforce act as though such an assessment is correct. What else is there to do? Decision-making is not enhanced. Even so, we suggest that, even in the 21st century, physics is not so deeply understood as to preclude such an event. Who knows, for example, how dark matter and dark energy really behave at fundamental levels? So there isn't enough known to have effective predictive algorithms on that scale. The scope is too great for the analytical tool. The truth is that the propensities of the system at that level are unknown.
This is essentially the same thought experiment posed by Hume, and "answered" by Laplace. Laplace's rule is derived via the application of the continuous form of Bayes's theorem, based on the assumption of a uniform probability distribution. In other words, all events are construed to have equal probability, based on the idea that there is virtually no system information (propensity), so that all we have to go on is equal ignorance. In effect, one is finding the probability of a probability with the idea that the possible events are contained in nature's urn. With the urn picture in mind, one then is trying to obtain the probability of a specific proportion. (More on this later.)
S.L. Zabell on the rule of succession
http://www.ece.uvic.ca/~bctill/papers/mocap/Zabell_1989.pdf
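That derivation can be checked numerically (a Python sketch; the crude Riemann sum stands in for the exact Beta-function integral):

    # Rule of succession from a uniform prior on the unknown proportion p:
    # after s successes in n trials,
    #   P(next success) = Int p^(s+1) (1-p)^(n-s) dp / Int p^s (1-p)^(n-s) dp
    #                   = (s + 1) / (n + 2).

    def next_success_prob(s, n, steps=100000):
        dp = 1.0 / steps
        ps = [(i + 0.5) * dp for i in range(steps)]
        num = sum(p ** (s + 1) * (1 - p) ** (n - s) for p in ps) * dp
        den = sum(p ** s * (1 - p) ** (n - s) for p in ps) * dp
        return num / den

    print(next_success_prob(9, 9))     # ~10/11, the Bode's law example below
    print(next_success_prob(20, 20))   # ~21/22, the 20-white-balls urn below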
To recapitulate, in Pearson's day, as in ours, there is insufficient information to plug in initial values to warrant assigning a probability -- though we can grant that Laplace's rule of succession may have some merit as an inductive process, in which that process is meant to arrive at the "most realistic" initial system information (translated as an a priori propensity).
Randomness versus causality
Consider, says Jevons, a deck of cards dealt out in numerical order (with suits ranked numerically). We immediately suspect nonrandomness. I note that we should suspect nonrandomness for any specific order posited in advance. Even so, in the case of progressive arithmetic order, we realize that this is a common choice of order among humans. If the deck proceeded to produce four 13-card straight flushes in a row, one would surely suspect design.
But why? This follows from the fact that the number of "combinations" is far higher than the number of "interesting" orderings.
Here again we have an example of the psychological basis of probability theory. If we took any order of cards and asked the question, what is the probability it will be dealt that way, we get the same result: (52!)^-1, with 52! =~ 8 x 10^67.
Now suppose we didn't "call" that outcome in advance, but just happened upon a deck that had been recently dealt out after good shuffling. What is our basis for suspecting that the result implies a nonrandom activity, such as a card sharp's maneuvering, is at work? In this case a nonparametric test, such as a runs test or sign test, may well apply and be strongly indicative. Or, we may try to compile a set of "interesting" orderings, such as given by various card games along with other common ordering conventions. Such a set, even if quite "large," will still be tiny with respect to the complete set of permutations. That is, the ratio Subset P' / Set P will yield a very small probability.
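For scale, and to illustrate the nonparametric idea, a short Python sketch (the red/black mapping of the deck is my own choice of bi-value transformation):

    # Any one pre-specified deck order has probability 1/52!; a runs count on
    # the red/black sequence of a dealt deck gives a quick nonrandomness check.

    import math, random

    print(math.factorial(52))   # about 8.07 x 10^67

    deck = [color for color in (0, 1) for _ in range(26)]   # 26 red, 26 black
    random.shuffle(deck)
    runs = 1 + sum(1 for a, b in zip(deck, deck[1:]) if a != b)
    print(runs)   # expect about 2*26*26/52 + 1 = 27 for a fair shuffle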
Be that as it may, Jevons had no nonparametric test at hand (unless one considers Laplace's rule to be such) but even so argued that if one sees a deck dealt out in arithmetical order, then one is entitled to reason that chance did not produce it. This is a simple example of the inverse method.
Jevons points out that, whether math is used or not, scientists tend to reason by inverse probability. He cites the example of the then recently noted flint flakes, many found with evidence of more than one stroke of one stone against another. Without resort to statistical analysis, one can see why scientists would conclude that the flakes were the product of human endeavor.
In fact, we might note that the usual corpus of standard statistical models does indeed aim to sift out an inverse probability of sorts in a great many cases, notwithstanding the dispute with the Bayesian front.
The following examples of the inverse method given by Jevons (42) are of the sort that Keynes disdained:
1. All larger planets travel in the same direction around the sun; what is the probability that, if a new planet exterior to Neptune's orbit is discovered, it will follow suit? In fact, Pluto, discovered after Jevons's book was published (and since demoted from planetary status), also travels in the same direction around the sun as the major planets.
2. All known elemental gases, excepting chlorine, are colorless; what is the probability that, if some new elemental gas is discovered, it will be colorless? And here we see the relevance of a system's initial information. Jevons wrote well before the electronic theory of chemistry was worked out. (And obviously, we have run the gamut of stable elements, so the question is irrelevant from a practical standpoint.)
3. Bode's law of planetary distances from the sun showed, in Jevons's day, close agreement between observed distances and those calculated from a specific mathematical expression for every planet except Neptune, provided the asteroid belt was also included. So, Jevons reasoned that the probability that the next planet beyond Neptune would conform to this law is -- using Laplace's rule -- 10/11. As it happens, Pluto was found to have fairly good agreement with Bode's law. Some experts believe that gravitational pumping from the larger planets has swept out certain orbital zones, leaving harmonic distances favorable to long-lasting orbiting of matter.
The fact that Laplace's rule "worked" in terms of giving a "high" ranking to the plausibility of a Bode distance for the next massive body to be found may be happenstance. That is to say, it is plausible that a physical resonance effect is at work, which for some reason was violated in Neptune's case. It was then reasonable to conjecture that these resonances are not usually violated and that one may, in this case, assign an expert opinion probability of maybe 80 percent that the next "planet" would have a Bode distance. It then turns out that Laplace's rule also gives a high value: 90.9 percent. But in the first case, the expert is using her knowledge of physics for a "rough" ranking. In the second case, no knowledge of physics is assumed, but a definitive number is given, as if it is somehow implying something more than what the expert has suggested, when in fact it is an even rougher means of ranking than is the expert's.
Now one may argue that probability questions should not be posed in such cases as Jevons mentions. Yet if we remember to regard answers as tools for guidance, they could possibly bear some fruit. Yet, an astronomer might well scorn such questions because he has a fund of possibly relevant knowledge. And yet, though it is hard to imagine, suppose life or death depends on a correct prediction. Then, if one takes no guidance from a supernatural higher power, one might wish to use a "rationally" objective probability quantity.
Induction, the inverse method and the "urn of nature" may sound old-fashioned, but though the names may change, much of scientific activity proceeds from these concepts.
How safe is the urn of nature model?
Suppose there is an urn holding an unknown but large number of white and black balls in unknown proportion. If one were to draw 20 white balls in a row, "common sense" would tell us that the ratio of black balls to white ones is low (and Laplace's rule would give us a 21/22 probability that the next ball is white). I note that "common sense" here serves as an empirical method. But we have two issues: the assumption is that the balls are well mixed, which is to say the urn composition is presumed homogeneous; if the number of balls is high, homogeneity needn't preclude a cluster of balls that yields 20 white draws.
We need here a quantitative way to measure homogeneity; and this is where modern statistical methods might help, given enough input information. In our scenario, however, the input information is insufficient to justify the assumption of a 0.5 ratio. Still, a runs test is suggestive of nonrandomness in the sense of a non-0.5 ratio.
Another issue with respect to induction is an inability to speak with certainty about the future (as in, will the ball drop if I let go of it?). This in fact is known as "the problem of induction," notably raised by Hume.
To summarize some points previously made, induction is a process of generalizing from the observed regularities. Now these regularities may be gauged by applying simple frequencies or by rule of succession reasoning, in which we have only inference or a propensity of the system, or by deductive reasoning, whereby we set up an algorithm that, when applied, accounts for a range of phenomena. Here the propensity is given a nonzero information value. Still, as said, such deductive mechanisms are dependent on some other information -- "primitive" or perhaps axiomatic propensities -- as in the regularities of gravity. Newton and Einstein provide nonzero framework information (propensities) leading to deductions about gravity in specific cases. Still, the deductive system of Newton's gravitational equation depends on the axiomatic probability 1 that gravity varies by inverse square of the distance from a ponderable object's center of mass, as has, for non-relativistic magnitudes, been verified extensively with very occasional anomalies ignored as measurement outliers.
Popper takes note of the typical scientist's "metaphysical faith of the existence of regularities in the world (a faith which I share and without which practical action is hardly conceivable)" (43).
Measures of future uncertainty, such as the Gaussian distribution, "satisfy our ingrained desire to 'simplify' by squeezing into one single number matters that are too rich to be described by it. In addition, they cater to psychological biases and our tendency to understate uncertainty in order to provide an illusion of understanding the world," observed Benoit Mandelbrot and Nassim Taleb (the link to this quotation was taken down by the Financial Times).
Some outliers are just too big to handle with a normal curve. For example, if a 300-pound man's weight is added to the weights of 100 other persons, he isn't likely to have a substantial effect on the mean. But if Bill Gates's net income is added to the incomes of 100 other persons, the mean will be meaningless. Similarly, the Fukushima event, using Gaussian methods, was extraordinarily improbable. But the world isn't necessarily as Gaussian as we would like to believe. As said previously, one way to approach the idea of regularity is via pattern recognition matrices. If a sufficient number of entries in two matrices are identical, the two "events" so represented are construed as identical or similar to varying degrees between 1 and 0. But of course, we are immediately brought to the concept of perception, and so we may say that Popper has a metaphysical faith in the reality-sensing and -construction process of the minds of most people, not including some severe schizophrenics. (See Toward.)
Imagine a situation in which regularities are disrupted by sudden jumps in the mind's reality constructor. Life under such conditions might be unbearable and require neurological attention. On the other hand, sometimes "miracles" or "works of wonder," are attested to, implying that some perceive occasional violation of the humdrum of regularities, whether this is a result of a psychosomatic process, wishful/magical thinking, or some sort of intersection with a noumenal world (See Part VI).
The theme of "regularities" coincides with what Gott calls the Copernican principle, which I interpret as implying a metaphysical faith that the rules of nature are everywhere the same (except perhaps in parallel universes).
It is important to face Hume's point that scientific ideologies of various sorts rest upon unprovable assumptions. For example, the Copernican principle, which Gott interprets as meaning that a human observer occupies no special time or place in the cosmos, is a generalization of the Copernican-Galilean model of the solar system. Interestingly, by the way, the Copernican principle contrasts with the anthropic cosmological principle (discussed later).
Einstein's belief in what must be considered a form of Laplacian realism is put in sharp relief with this assertion:
“The only justification for our concepts and system of concepts is that they serve to represent the complex of our experiences; beyond this they have no legitimacy. I am convinced that the philosophers have had a harmful effect upon the progress of scientific thinking in removing certain fundamental concepts from the domain of empiricism, where they are under our control, to the intangible heights of the a priori. For even if it should appear that the universe of ideas cannot be deduced from experience by logical means, but is, in a sense, a creation of the human mind, without which no science is possible, nevertheless this universe of ideas is just as little independent of the nature of our experiences as clothes are of the form of the human body” (44).
The problem of induction is an obstacle for Einstein. Scientific inquiry requires that it be ignored. Yet, one might say that this irrational rationalism led to a quagmire that he was unable to see his way past, despite being a friend and colleague of the physicist-turned-logician Kurt Goedel, who had strong reservations about what is sometimes termed Einstein's naive realism.
Another take on this subject is to make a formally valid statement: A implies B, which is to say, "If A holds, then so does B." So if one encounters A as being true or as "the case," then he can be sure that B is also the case. But, at some point in his chain of reasoning, there is no predecessor to A. So then A must be established by induction or accepted as axiomatic (often both) and not by deduction. A is not subject to proof within the system. Of course this is an elementary observation, but those who "believe in Science" need to be reminded that scientific method is subject to certain weaknesses inherent in our plane of existence.
So we tend to say that though theories cannot be proved true, there is a level of confidence that comes with how many sorts of phenomena and how many special cases are successfully predicted by a theory (essentially, in the "hard" sciences, via an algorithm or set of algorithms).
But the fact that some theories are quite successful over a range of phenomena does not mean that they have probabilistically ruled out a noumenal world. It does not follow that successful theories of phenomena (and they are not fully successful) demonstrate that a noumenal world is highly improbable. In fact, David Bohm's struggles with quantum theory led him to argue that the world of phenomena must be supplemented by a noumenal world (he did not use that term) that permits bilocality via an "implicate," or unseen, order.
The noumenal world is reflected by our attempts to contend with randomness. The concept of pure randomness is, I would say, an ideal derived from our formalizations of probability reasoning. Consider a notionally constructible binary string, in which each unit is selected by a quantum detector. For example, suppose the algorithm clock is set for 1-second intervals: a detection of a cosmic ray during an interval is recorded as a 1, whereas no detection during the interval receives a 0.
If this algorithm runs to infinity, we have a probability 1 of every possible finite substring appearing an infinity of times (ignoring the presumed change in cosmic ray distribution tens of billions of years hence). This follows from 1 - (1-p)^n --> 1 as n goes infinite. So, for example, Fred Hoyle noticed that if the cosmos is infinite in size, we would expect an infinity of Fred Hoyles spread across the cosmos.
But, unless you are quite unusual, such a possibility doesn't accord with your concept of reality, does it? You have an inner sense here that we are "playing with numbers." And yet, in the Many Worlds interpretation of quantum physics, there is either an infinite or some monstrously large set of cosmoses in which versions of Fred Hoyle are found many times over -- and remarkably, this multiplicity scenario was developed in order to affirm causality and be rid of the notion of intrinsic randomness.
The Many Worlds "multiverse" is one explanation of what I term the noumenal world. But this interpretation has its problems, as I discuss in
Toward.
Yet it is hard not to make common cause with the Many Worlds defenders, arguing that randomness run amok does not seem an appropriate representation of the universe.
35. A Treatise on Probability by J.M. Keynes (Macmillan, 1921).
36. E.T. Jaynes: Papers on probability, statistics and statistical physics, R.D. Rosenkrantz, editor (D. Reidel, 1983).
37. The Principles of Science (Vol. I) by William Stanley Jevons (Routledge/Thoemmes Press, 1996 reprint of 1874 ms).
38. The Grammar of Science by Karl Pearson (Meridian 1957 reprint of 1911 revised edition).
39. The Logic of Scientific Discovery by Karl Popper. Published as Logik der Forschung in 1935; English version published by Hutchinson in 1959.
40. Vovk's paper, "Kolmogorov complexity conception of probability," appears in Probability Theory, Philosophy, Recent History and Relations to Science, edited by Vincent F. Hendricks, Stig Andur Pedersen, and Klaus Frovin Jorgensen.
41. Grammar, Pearson.
42. Principles of Science, Jevons.
Part IV
Equiprobability and the empirico-inductive framework
We return to the scenario in which system information is zero. In other words, there is no algorithm known to the observer for obtaining systemic information that can be used to assign an initial probability.
Let us do a thought experiment in which a ball is to be drawn from an urn containing n black and/or white balls. We have no idea of the proportion of white to black balls. So, based on ignorance, every ratio is equiprobable. Let n = 2.
We have of course
BB WW BW WB
By symmetry, we then have for a 2-unit (white or black) test, a probability of 1/2 for a first draw of black from an urn containing between 0 and 2 black balls and 0 and 2 white balls. This symmetry, under the condition, holds for any number of balls in the urn, meaning it holds as n approaches infinity.
One might assume the urn has a denumerable infinitude of balls if one has zero knowledge of how many balls are inside the urn. If n is infinite, a "long" run of k black balls cannot affect the probability of the (k+1)th draw, as we know from (lim n --> inf) k/n = 0. Note that we are not assuming no bias in how the balls are arranged, but rather that we simply have no knowledge in that area.
(Similar reasoning applies to multinomial examples.)
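For those who like to check such symmetry arguments by brute force, here is a minimal enumeration sketch in Python; the uniform prior over compositions is precisely the assumption under discussion, not a fact about any particular urn.

from fractions import Fraction

def first_draw_black(n):
    # Assume each composition (k black, n - k white), k = 0..n, is equally likely,
    # and that a draw from a given composition yields black with probability k/n.
    return sum(Fraction(k, n) for k in range(n + 1)) / (n + 1)

for n in (2, 5, 100):
    print(n, first_draw_black(n))    # prints 1/2 in every case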
Does this scenario justify the concept of independence? It may be so used, but this description is not airtight. Just because, based on ignorance, we regard any ratio as equiprobable, that does not entitle us to assume that every ratio is in fact equiprobable. This presentation, however, suffices for a case of zero input information where we are dealing with potential ratios.
An important point here is that in this case we can find some rational basis for determining the "probability of a probability." But, this holds when we have the knowledge that one ratio exists for k colors of balls. Yet, if we don't know how many colors of balls are potentially available, we must then sum the rationals. Even if we exclude the pure integers, the series sum exceeds that of the harmonic series from n = 3 on and so the sum of all ratios is unbounded, or, that is to say, undefined, or one might say that one ratio in an infinitude carries probability 0.
So empirico-inductive methods imply a Bayesian way of thinking, as with approaches such as the rule of succession or, possibly, nonparametric tests. Though these approaches may help us answer the question of the probability of an observer encountering a "long" run, we can never be certain a counterexample won't occur. That is to say, we believe it is quite probable that a probability method tends to establish "objective" facts that are to be used within some scientific discipline. We feel this confidence because it has been our experience that what we conceive of as identical events often show patterns, such as a periodicity of 1 (as with a string of sun-risings).
Now an empirico-inductive method generally assumes effective independence. Again, utter intrinsic physical randomness is not assumed, but rather ignorance of physical relations, excepting the underlying idea that the collective mass of "causes" behind any event is mostly neutral and can be ignored. So then we must also account for probability of non-occurrence of a long run, so that on the assumption of independence, we assert that the probability of a long run not occurring within n steps is about (1 - p)^n.
For example, consider the number 1321 (assuming equiprobability of digit selection based on some randomization procedure). Taking logs, we find that the run 1321 has a probability of 1/2 of occurrence within a string of length 6932.
Using the independence assumption in order to test for independence is what the runs test does, but such is not the approach of the rule of succession, though the rule of succession might be said to have anticipated nonparametric tests, such as the runs test.
The runs test uses n/2 as the mean for runs in a randomized experiment. And we have shown elsewhere that as n increases, a string of two or more identical runs (a periodic run) has a probability approaching that of the probability for a sole permutation, which, in base 2, is 2^-n, as I discuss in my paper
Note on the probability of periodicity
http://kryptograff5.blogspot.com/2013/08/draft-1-please-let-me-know-of-errors.html
We may also ask the question: how many runs are there of length m or greater that fall upon a string of length n? And one might use this information to obtain a probability of a run of length m. For example, for n = 6932, any specified string of length 4 has a probability of about 1/2 of appearing somewhere in the overall string.
For a specified substring of length 7, we require a string of about 6,931,472 digits for a probability of 1/2 of encountering it.
Modern computing methods greatly increase our capacity for such calculations, of course.
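Indeed, the lengths quoted above can be recovered in a few lines. The sketch below, in Python, uses the usual simplification that positions are treated as independent (overlaps between windows are ignored), an approximation I am adding only for illustration.

import math

def length_for_half(m):
    # Length n at which a specified m-digit decimal pattern has probability
    # 1/2 of appearing at least once: solve 1 - (1 - 10**-m)**n = 1/2 for n.
    p = 10.0 ** -m
    return math.log(0.5) / math.log(1.0 - p)

for m in (4, 7):
    print(m, round(length_for_half(m)))   # about 6.93 thousand and 6.93 million, respectively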
If we stick with an urn model, we are suggesting zero bias, or, at any rate, zero input knowledge, excluding the knowledge of how many elements (possible colors) are in use. We can then match the outcome to the potential probabilities and argue that the string 1321 is reasonable to expect within a string of length 6932, but that it is not reasonable to bet on the string 1321578 turning up within 6932 draws.
In other words, we are not arguing for complete axiomatic justification of an empirico-inductive approach, but we are suggesting that such an approach can be justified if we think it acceptable to assume lack of bias in selection of balls from an urn.
The above tends to underscore my point earlier that probabilities are best viewed in terms of logically explicable -- by some line of reasoning -- inequalities that assist in the decision-making process.
Of course, the standard idea is to say that the probability of the number 1321 appearing over the first four trials is 10^-4. Implicit here is the simplifying assumption of equiprobability, an assumption that serves well in many cases. What we have tried to do, however, is posit some credible reason for such an assumption. We may suspect bias if we called the number 1321578 in advance and it turned up as a substring over 6932 draws. The urn model often permits a fair amount of confidence that the draws ought to be unbiased. In my opinion, that model is superior to the coin-toss model, which can always be criticized on specific physical grounds, such as the virtual impossibility of positioning the center of mass exactly at the coin's geometric centroid.
Note that in our scenario positing independence, we begin with the assumption of nearly complete ignorance; we have no propensity information if the urn model is used to represent a typical dynamical system. Again, the propensity information in the urn system is that we have no reason to think that the urn's contents are not at maximum entropy, or homogeneity. By specifying the number 1321 before doing 6932 draws, we are saying that such a test is of little value in estimating bias, as the probability is 1/2. Yet, if we test other longer numbers, with the cutoff at substring length 7, we find that calling the number 1321578 provides a fairly good test for bias. Determining the kind of bias would require further refinements.
As noted, of course, we may find it extraordinarily unlikely that a physical experiment really has zero bias; what we are usually after is negligible bias. In other words, if one flips a coin and finds an experimental bias of 0.5 +/- 10^-10, in most cases we would very likely ignore the bias and accept the coin as fair.
Equiprobability and propensity
In his book Symmetry, Weyl wrote, "We may be sure that in casting dice which are perfect cubes, each side has the same chance, 1/6. Because of symmetry sometimes we are thus able to make predictions a priori on account of the symmetry of the special cases, while the general case, as for instance the law of equilibrium for scales with arms of different lengths [found by Archimedes], can only be settled by experience or by physical principles ultimately based on experience. As far as I see, all a priori statements in physics have their origins in symmetry" (44).
By symmetry in the die case, Weyl means that the center of mass and the geometric centroid are the same and that all sides are congruent. Hence, he believes it obvious that equiprobability follows. Yet he has neglected here to talk about the extreme sensitivity to initial conditions that yields effectively random outcomes, based on force and position of the throw, along with background fluctuations. These variations in the evolving net force vector are held to be either undetectable or not worth bothering about. But, at any rate, I interpret his remarks to mean that symmetry implies here equiprobability on account of the assumption that the laws of physics are invariant under spatial translation. Specifically, a Newtonian system of forces is implied.
In the perfect die scenario, we would say that the system's propensity information is essentially that the laws of physics remain the same under spatial translation. The propensity of the system is summarized as zero bias or equiprobability of six outcomes.
Consider a perfectly equiprobable gambling machine.
1. It is impossible or, if possible, extremely difficult to physically test the claim of true equiprobability, as it is notionally possible for bias to be out of range of the computational power -- although we grant that one might find that bias beyond a certain decimal can't occur because the background perturbations would never be strong enough.
2. If one uses enough resources (not infinite), one comes asymptotically close to an error-free message, though one which moves very slowly. But, if the output is random (true noise), there is no signal. Yet an anti-signal contains information (if we have enough bits for analysis). If the redundancy is too low and we have no reason to suspect encryption, then we have the information that the bit stream is noise. (As Shannon demonstrated, language requires sufficient structure, or redundancy -- implying conditional probabilities among letters -- to be able to make crossword puzzles.)
So if we determine that a bit stream is pure noise, can we then be sure that the bit samples are normally, or randomly, distributed? No, because one can think of a "pseudo-message" in which an identifiable pattern, such as a period, is sent that does not contain sufficient redundancy at the relevant scale. On the other hand, if the scale of the bit string is large enough, the redundancy inherent in the period will show. But there are other possibilities that might escape redundancy testing, though one parametric test or another is liable to detect the presence of apparent structure. (A toy illustration of redundancy estimation follows this list.)
3. We are also entitled to wonder whether considerations of entropy rule out such a device.
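As promised above, here is a toy illustration in Python. The two streams and the first-order (single-symbol) entropy estimate are my own simplifications; a first-order estimate misses most of the conditional structure among letters that Shannon emphasized, so this is only a crude proxy for redundancy.

import math, random
from collections import Counter

def entropy_per_symbol(s):
    # First-order Shannon entropy estimate, in bits per symbol.
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

text = "the quick brown fox jumps over the lazy dog " * 200
text_bits = "".join(format(ord(ch), "08b") for ch in text)
noise_bits = "".join(random.choice("01") for _ in range(len(text_bits)))

print("English-like text, per character:", round(entropy_per_symbol(text), 3))
print("same text as bits               :", round(entropy_per_symbol(text_bits), 3))
print("random bits                     :", round(entropy_per_symbol(noise_bits), 3))
# The structured stream falls short of the maximum (log2 of the alphabet size);
# the random bit stream sits at about 1 bit per bit.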
Popper cites Alfred Landé's concern with the "law of large numbers," in which Landé shows that the idea that hidden "errors" cancel is a statistical result derived from something that can be accounted for deterministically (45).
So one might say that there exists a huge number of "minor" hidden force vectors relative to the experiment in the environment and that these tend to cancel -- over time -- summing to some constant, such as 0. For purposes of the experiment, the hidden net force vectors tend to be of the same order of magnitude (the available orders of magnitude would be normally distributed).
Popper's point is: Why are the vectors distributed relatively smoothly, whether normally or uniformly?
The answer: That is our empirical experience. And the paradigm of normal distribution of hidden vectors may only be partly accurate, as I try to show in my paper Toward.
Of course, the phenomena represented by dynamical systems would not have been systematized had there not been some provisional input information, and so it is difficult to say what "equiprobable" really means. There are a number of reasons for questioning the concept of equiprobability. Among them is the fact that a dynamical system would rarely be free of at least a slight bias. But even when we talk about bias, we are forced to define that word as the complement of equiprobable.
Another issue is what is meant by repeatable event. The whole of probability theory and inferential statistics is grounded on the notions of repeatable and yet distinguishable events -- what Pearson called "regularities" in nature. But as I show in Toward, and as others have shown, regularities are a product of the interface of the brain-mind and the "external" environment. (The work of Parmenides and his adherent Zeno does not pose mere intellectual brain-teasers. And what constitutes the ship of Theseus?)
The concept of equiprobability essentially axiomatically accepts that some sets are composed of events that do not influence one another. Yet, if events are, in ways not fully understood, products of mental activity, it is quite bold to assume such an axiom. Even so, one might accept that, under certain circumstances, it may be possible to neglect apparently very slight influences attributable to the fact that one event may be construed as an "effect" of the entire set of "causes" inherent in the world, including a previous event under consideration. There are so many such "causes" that, we suspect, they strongly tend to cancel out, just as the net electric charge of our planet is close to neutral because positive and negative charges are distributed nearly evenly. (As you see, it is pretty much impossible to avoid circularities.)
One idea might be to employ an urn model, whereby we avoid the issues of causation and instead focus on what the observer doesn't know. If, let's say, all we know is that the urn contains only black or white balls, then, as shown above, we may encapsulate that knowledge with the ratio 1/2. This however doesn't tell us anything about the actual proportion, other than what we don't know about it. So perhaps in this case we may accept the idea of equiprobability.
Yet if we do know the proportion -- say 2 black balls and 3 white -- then we like to assert that the chance of drawing a black ball on the first draw is 2/5. But, even in this case, we mean that our ignorance can be encapsulated with the number 2/5 (without reference to the "law of large numbers"). We don't actually know what the "true" probability is in terms of propensity. It's possible that the balls have been so arranged that it is highly likely a white ball will come out first.
So in this example, an assurance that the game is fair gives the player some information about the system; to wit, he knows that homogeneous mixing has occurred, or "well shuffling," or "maximum entropy." That is to say, the urn has been "well shaken" to assure that the balls are "randomly" distributed. The idea is to reduce a component of the propensity information to the minimum.
Now we are saying that the observer's ignorance is in accord with "maximum" or equilibrium entropy of the system. (We have much more to say about entropy in Part V.)
Now suppose we posit an urn with a denumerably infinite number of balls. Such a model might be deployed to approximate a physical system. But, as n gets large or infinite, how can we be sure of maximum entropy, or well mixing? What if the mixing is only strong at a certain scale, below which clusters over a specific size are highly common, perhaps more so than small runs? We can't really be sure. So we are compelled to fall back to observer ignorance; that is, if one is sufficiently ignorant, then one may as well conjecture well mixing at all scales because there is insufficient reason not to do so.
Thus far in our analysis, the idea of equiprobability is difficult to apprehend, other than as a way of evaluating observer ignorance.
If we disregard randomness3 -- standard quantum randomness -- and randomness4, associated with Popper's version of propensity, we then have nothing but "insufficient reason," which is to say partial observer ignorance, as a basis of probability determinations.
Consider a perfectly balanced plank on a fulcrum. Disregarding background vibrations, if we add a weight x to the left end of the plank, what is the probability the plank will tilt down?
In Newtonian mechanics, the probability is 1 that the left end descends, even if the tilt is so tiny as to be beyond the means of detection. The plank's balance requires two net vectors of equal force. The tiniest change in one force vector destabilizes the balancing act. So, in this respect, we have an easily visualizable "tipping point" such that even one infinitesimal bit of extra mass is tantamount to the "cause" of the destabilization.
So if we assign a probability to the plank sinking on the left or the right, we can only be speaking of observer ignorance as to which end will receive an unseen addition to its force vector. In such a Laplacian clockwork, there is no intrinsic randomness in "objective" reality. This notion is different metaphysically, but similar conceptually, to the older idea that "there are no mere coincidences" because God controls every action.
We cannot, of course, dispense with randomness3 (unless we are willing to accept Popperian randomness4). Infinitesimals of force below the limit set by Planck's constant generally have no effect on the "macro" system. And, when attempting to deal with single quanta of energy, we then run into intrinsic randomness. Where and how will that quantum interact with the balance system? The answer requires a calculation associated with the intrinsic randomness inherent in Heisenberg's uncertainty principle.
Popperian a priori propensity (probability8) requires its own category of randomness (randomness4). From what I can gather, randomness4 is a mixture of randomness1, randomness2 and randomness3.
Popper, I believe, wanted to eat his cake and have it too. He wanted to get rid of the anomalies of randomness3 while coming up with another, not very well justified, form of intrinsic randomness.
How are probability amplitudes conceptualized in physics? Without getting into the details of calculation, we can get a sense of this by thinking of the single-photon-at-a-time double slit experiment. If one knows where a photon is detected, that information yields very little information as to where the next photon will be detected. We do know the probability is near zero for detection to occur at what will become a diffraction line, where we have the effect of waves destructively interfering. After enough photons have been fired, one at a time, a diffraction pattern builds up that looks exactly like the interference pattern of a wave going through both slits simultaneously.
So our empirical experience gives this result. With this evidence at hand, Werner Heisenberg, Erwin Schroedinger, Paul Dirac and others devised the quantum formalism that gives, in Max Born's words, a "probability wave" conceptualization. So this formalism gives what I term quantum propensity, which is the system information used to calculate outcomes or potential outcomes. Once the propensities are known and corroborated with an effective calculation method, then frequency ratios come into play. In the case of a Geiger counter's detection of radioactive decay events, an exponential distribution of probabilities works best.
We also have ordinary propensity -- probability7 -- associated with the biases built into a gambling machine or system. Generally, this sort of propensity is closely associated with randomness1 and randomness2. Even so, modern electronic gambling systems could very well incorporate quantum fluctuations into the game design.
To summarize, equiprobability in classical physics requires observer ignorance of two or more possible outcomes, in which hidden force vectors play a role. Still, in quantum physics, a Stern-Gerlach device can be set up such that the probability of detecting spin up is 1/2 as is the probability of detecting spin down. No further information is available -- even in principle -- according to the standard view.
Popper and propensity
Resetting a penny-dropping machine changes the measure of the possibilities. We might have a slight shift to the left that favors the possibility of a head. In other words, such an "objective probability" is the propensity inherent in the system that comes with the first trial -- and so does not depend on frequency or subjective classical reasoning. (Popper eschews the term "a priori" because of the variety of ways it is used and, I suspect, because it is associated with subjective Bayesian reasoning.)
Popper asserts, "Thus, relative frequencies can be considered as the result, or the outward expression, or the appearance, of a hidden and not directly observable physical disposition or tendency or propensity; and a hypothesis concerning the strength of this physical disposition or tendency may be tested by statistical tests, that is to say, by observations of relative frequencies."
He adds, "I now regard the frequency interpretation as an attempt to do without the hidden physical reality..." (46).
Propensities are singular insofar as they are inherent in the experimental setup which is assumed to be the same for each experiment. (Thus we obtain independence, or freedom from aftereffects, for the elements of the sequence.)
"Force," observes Popper, is an "occult" term used to account for a relation between phenomena. So, he argues, why can't we have another occult term? Even if we agree with him, this seems to be exchanging one occult idea (quantum weirdness) for another (propensity). Popper wants his propensity to represent a physical property that implies that any trial will tend to be verified by the law of large numbers. Or, one might say he means spooky action at a micro-level.
The propensity idea is by no means universally accepted. At the very least, it has been argued, a propensity conceptualization is not sufficient to encompass all of probability.
Mauricio Suarez and others on propensities
http://www.academia.edu/2975215/Four_Theses_on_Probabilities_Causes_Propensities
Humphrey's paradox and propensity
http://www.jstor.org/stable/20117533
Indeterminism in Popper's sense
It is not terribly surprising to learn that the "publish or perish" mantra is seen as a factor in the observer bias that seems evident in a number of statistical studies. Examination of those studies has found that their results cannot be corroborated. The suspect here is not outright scientific fraud, but rather a tendency to wish to present "significant" results. Few journals are much interested in studies that yield null results (even though null -- or statistically insignificant -- results are often considered to be important).
I would however point out that another suspect is rarely named: the fact that probabilities may be a consequence of feedback processes. For example, it has been shown that a user's environment affects drug and alcohol tolerance -- which is to say that the actual probabilities of the addiction syndrome don't stay fixed. And further, this implies that any person's perceptions and entanglement with the environment are malleable. So it becomes a question as to whether perceived "regularities" on which probability ideas are based are, in fact, all that regular.
Alcoholics and Pavlov's dogs have much in common
http://cdp.sagepub.com/content/14/6/296.abstract
See also
http://www.apa.org/monitor/sep02/drug.aspx
And also
http://www.apa.org/monitor/mar04/pavlovian.aspx
Another point: non-replicability is a charge leveled against statistico-probabilistic studies favoring paranormal phenomena. It seems plausible however that such phenomena are in general not amenable to experiments of many trials in that we do not understand the noumenal interactions that may be involved. In fact, the same can be said for studies of what are assumed to be non-paranormal phenomena. The ordinary and the paranormal, however, may be different ends of a spectrum, in which "concrete reality" is far more illusory than we tend to realize. (Part VI, see below; also see Toward on this blog.)
Popper, who spent a lifetime trying to reduce quantum weirdness to something amenable to classical reasoning, came to the belief that "the doctrine of indeterminism is true and that determinism is completely baseless." He objected to Heisenberg's use of the word "uncertainty" as reflecting subjective knowledge, when what occurs, Popper thought, is the scattering of particles in accord with a propensity that cannot be further reduced. (It is not clear whether Popper was aware of experiments showing single particles detected over time as an interference pattern.)
What does he mean by indeterministic? Intrinsically random?
Take the creation of a new work, Popper suggests, such as Mozart's G minor symphony; it cannot be predicted in all its detail by the careful study of Mozart's brain. Yet, some neuroscientists very likely do believe that the day will come when MRI imagery and other technology will in fact be able to make such a prediction. Still, I am skeptical, because it seems to me that the flash of creative insight is not purely computational, that some noumenal effect is involved that is more basic than computationalism. This is a position strongly advocated by Roger Penrose in The Emperor's New Mind (47).
What Popper is getting at, when talking of indeterminism, is the issue of effective computational irreversibility, along with the problem of chaos. He wants to equate the doctrine of determinism with an idealistic form of universal predictability and hence kill off the ideal form of universal determinism.
But, is he saying that we cannot, even in principle, track all the domino chains of "causation" and hence we are left with "indeterminism"? Or, does he mean that sometimes links in the domino chains are missing and yet life goes on? The former, I think.
My thought is that he is saying that we can only model the world with an infinite sequence of approximations, analogous to what we do when computing a representation of an irrational number. So there is no subjective universal domino theory. Yet, we are left to wonder: Does his theory allow for missing dominoes and, if so, would not that notion require something beyond his "propensity"?
"Causality has to be distinguished from determinism," Popper wrote, adding that he finds "it crucially important to distinguish between the determined past and the open future." Evidently he favored
de facto free will, a form of vitalism, that was to be somehow explained by his concept of indeterminism.
"The determinist doctrine -- according to which the future is also completely determined by what has happened, wantonly destroys a fundamental asymmetry in the structure of experience" such as the fact that one never observes waves propagating back toward the point at which a stone is dropped into a pool. Popper's stance here is related to his idea that though the past is closed, the future is open. Despite propensity concepts, this idea doesn't seem properly justified. But, if we look at past and future in terms of superpositions of possible realities, an idea abhorrent to Popper, then one might agree that in some sense the future, and the universe, is "open" (48). (See Part VI; also see
Toward.)
Popper, however, seems to have been talking about limits on knowledge imposed by the light cones of special relativity, as well as those imposed by Alan Turing's result in the halting problem.
Scientific determinism, said Popper, is the "doctrine that the structure of the world is such that any event can be rationally predicted, with any desired degree of precision, if we are given a sufficiently precise description of past events, together with all the laws of nature." In other words, the Laplacian clockwork universe model, a mechanistic theory which I characterize as the domino theory.
Though Popper's Open Universe was published in 1982, most of it was written in the early 1950s, and so we wouldn't expect Popper to have been abreast of the developments in chaos theory spurred by computerized attempts to solve previously intractable problems.
But at one point, Popper makes note of a theorem by Jacques Hadamard from 1898 discussing extreme sensitivity to initial conditions, implying that, though not up to date on chaos theory, Popper had at least some inkling of that issue. In fact, after all the locution, this -- plus his propensity idea -- seems to be what he meant by indeterminism. "For, as Hadamard points out, no finite degree of precision of the initial conditions" will permit us to learn whether an n-body system is stable in Laplace's sense, Popper writes (48).
Hadamard and chaos theory
http://www.wolframscience.com/reference/notes/971c
Popper not only faulted Heisenberg, he also took on Einstein, writing that "Einstein's famous objection to the dice-playing God is undoubtedly based on his view that probability is a stopgap based on a lack of knowledge, or human fallibility; in other words, ... his belief in the subjective interpretation of probability theory, a view that is clearly linked with determinism" (50).
Correct. In a mechanistic view, or "domino theory," there is no intrinsic randomness; external and internal realities are "really" separate and so what you don't know is knowable by someone with more knowledge of the system, perhaps some notional super-intellect in the case of the cosmos at large. So then deploying propensities as a means of talking about indeterminism seems to be an attempt to say that the universe is non-mechanistic (or acausal). Yet, by giving a name to a scenario -- "propensity" -- have we actually somehow restored "realism" to its "rightful place"?
A related point concerning prediction possibilities: Kolmogorov-Chaitin information, which is related to chaos theory results, says that one can have a fully deterministic system and still only be able to compute some output value or "final state" with as many steps or as much work as the actual readout. And, in the discussion of entropy below, I bring up a high-computation algorithm for a simple result that is meaningful in terms of what it is possible to know.
I interject here that there must be some upper limit on practical computational power, because we know that the Busy Beaver function grows beyond any bound that a computer could ever calculate.
John Carlos Baez on Busy Beaver
https://johncarlosbaez.wordpress.com/2016/05/21/the-busy-beaver-game/
At any rate, it is in principle true -- excepting the case of the busy beaver function as described by Chaitin -- that with enough input information, one can obtain an exact readout for a fully deterministic system, assuming all forces and other determinants are exactly described. Still, in the chaotic (asymmetric) three-body gravitational problem, we may find that our computing power must rise asymptotically toward infinity (a vertical asymptote) as we push our prediction point into the future. So then, how does a mortal simulate reality here?
At some point, we must begin approximating values, which very often gives rise to Lorenz's "butterfly effect;" two input values differing only, say, in the fifth decimal place, may after an interval of "closeness" produce wildly different trajectories. Hence, we expect that many attempts at computation of Lorenz-style trajectories will fail to come anywhere near the true trajectory after some initial interval.
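The sensitivity is easy to exhibit numerically. Below is a minimal Python sketch using a fixed-step fourth-order Runge-Kutta integration of the Lorenz equations with the textbook parameters sigma = 10, rho = 28, beta = 8/3; the step size and the perturbation in the fifth decimal place are arbitrary choices for illustration.

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, s, dt):
    k1 = f(s)
    k2 = f(tuple(v + dt/2 * k for v, k in zip(s, k1)))
    k3 = f(tuple(v + dt/2 * k for v, k in zip(s, k2)))
    k4 = f(tuple(v + dt * k for v, k in zip(s, k3)))
    return tuple(v + dt/6 * (a + 2*b + 2*c + d)
                 for v, a, b, c, d in zip(s, k1, k2, k3, k4))

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.00001)          # differs only in the fifth decimal place
dt, steps = 0.01, 5000
for i in range(1, steps + 1):
    a, b = rk4_step(lorenz, a, dt), rk4_step(lorenz, b, dt)
    if i % 1000 == 0:
        gap = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        print("t =", round(i * dt, 1), " separation =", round(gap, 6))
# The two trajectories stay close for a while, then diverge to a separation
# comparable to the size of the attractor itself.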
Note: there are three intertwined definitions of "butterfly effect":
1. As Wolfram MathWorld puts it: "Due to nonlinearities in weather processes, a butterfly flapping its wings in Tahiti can, in theory, produce a tornado in Kansas. This strong dependence of outcomes on very slightly differing initial conditions is a hallmark of the mathematical behavior known as chaos."
Weather system 'butterfly effect'
http://mathworld.wolfram.com/ButterflyEffect.html
2. The Lorenz attractor, derived from application of Edward N. Lorenz's set of differential equations that he devised for modeling weather systems, has the appearance of a butterfly (51).
On the Lorenz attractor
http://www.wolframalpha.com/input/?i=Lorenz+attractor
3. The "butterfly catastrophe" is named for the appearance of its graph. Such a catastrophe is produced by the following equation:
F(x, u, v, w, t) = x^6 + ux^4 + vx^3 + wx^2 + tx.
The word "catastrophe" is meant to convey the idea of a sudden, discrete jump from one dynamical state to another, akin to a phase shift. The use of the term "unfoldment parameters" echoes Bohm's concept of implicate order. We see directly that, in this sense, the dynamical system's post-catastrophe state is covertly implicit in its pre-catastrophe state. Note the nonlinearity of the equation.
The 'butterfly catastrophe'
http://mathworld.wolfram.com/ButterflyCatastrophe.html
So even if we were to grant a domino theory of nature, unless one possesses godlike powers, it is in principle impossible to calculate every trajectory to some indefinite point in a finite time period.
Further, and importantly, we learn something about so-called "causes." On the one hand, a fine "tipping point" such as when a coin bounces and lands head or tail up or on an edge, shows that a large set of small force vectors does indeed nearly cancel, but not quite. That "not quite" is the remaining net force vector. Yet, we may also see that the tipping point net force vector is composed of a large set of "coherent" sub-vectors. That is to say, "causes" may pile up on the brink of a significant event. The word "coherent" is appropriate. If we map the set of constituent vectors onto a Fourier sine wave map, it is evident that a number of small waves (force vectors) have cohered into a wave form of sufficient amplitude to "tip the balance."
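A minimal numerical sketch (Python) of that last point: treating each small force contribution as a unit sine component at a common frequency -- a simplifying assumption added here for illustration -- random phases produce a resultant that grows only like the square root of N, while coherent (aligned) phases produce a resultant that grows like N.

import math, random

def resultant_amplitude(phases):
    # Modulus of the sum of unit phasors; this is the amplitude of the
    # combined sinusoid when all components share one frequency.
    re = sum(math.cos(p) for p in phases)
    im = sum(math.sin(p) for p in phases)
    return math.hypot(re, im)

N = 10000
random_phases = [random.uniform(0.0, 2.0 * math.pi) for _ in range(N)]
aligned_phases = [0.0] * N

print("random phases :", round(resultant_amplitude(random_phases)))   # on the order of sqrt(N) = 100
print("aligned phases:", round(resultant_amplitude(aligned_phases)))  # exactly N = 10000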
44. Symmetry by Hermann Weyl (Princeton, 1952).
45. New Foundations of Quantum Mechanics by Alfred Landé (Cambridge University Press, 1965). Cited by Popper in Quantum Theory and the Schism in Physics (Postscript to the Logic of Scientific Discovery, Vol. III; Routledge, 1989. Hutchinson, 1982).
46. Popper's evolving view on probability shows up in material added to the English language edition of The Logic of Scientific Discovery (Hutchinson, 1959). The original was published as Logik der Forschung in 1935.
47. The Emperor's New Mind: Concerning Computers, Minds, and the Laws of Physics by Roger Penrose (Oxford University Press, 1989).
48. Logic, Popper.
50. The Open Universe (Postscript to the Logic of Scientific Discovery, Vol. II) by Karl Popper (Routledge, 1988. Hutchinson, 1982).
51. The Essence of Chaos by Edward N. Lorenz (University of Washington, 1996).
Part V
What exactly is entropy?
Entropy is a big bone of contention among physicists and probability theorists. Consider: does nature provide automatic "objective" shuffling processes or do we have here an artifact of human observational limitations? (52)
Can information be lost? From the viewpoint of an engineer, Shannon information is conserved: I' = I - I_c, where I is the total information, I' the new information and I_c the information in the constraints, or structural information, or propensity information.
When reflecting on that question, perhaps it would be helpful to look beyond the usual idea of Shannon information in a transitory data stream and include the ideas of storage and retrieval, being careful to grant that those concepts can easily be accommodated in standard information theory. But what of the difficulty in retrieving some bit string? Using a Kolmogorov-Chaitin sort of measure, we have the ratio of input bits to output bits, meaning that we regard maximum entropy, or equilibrium entropy, in this case as occurring when the ratio is near 1. We may or may not get effective reversibility.
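One crude way to play with that ratio is to use a general-purpose compressor as a stand-in for the Kolmogorov-Chaitin measure; compression length is only an upper-bound proxy, and the choice of zlib and of the two test inputs below are my own assumptions for illustration.

import os, zlib

structured = b"0123456789" * 10000        # highly redundant input
noisy = os.urandom(100000)                # effectively incompressible input

for label, data in (("structured", structured), ("noisy", noisy)):
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(label, "compressed/original ratio:", round(ratio, 3))
# The structured stream compresses to a tiny fraction of its length;
# the noisy stream's ratio sits near (or even slightly above) 1,
# which is the regime described here as equilibrium entropy.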
A composite number contains the information implying constituent primes. But the information needed to multiply the primes is much less than that needed, usually, to find the primes in the composite. "Operational information" is lost once two primes are encoded as a composite. That is to say, what one person encodes as a composite in all but trivial cases no one else can decode with as little computation as went into the multiplication.
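A toy comparison in Python makes the asymmetry vivid; the particular primes are arbitrary, trial division is only the crudest factoring method, and real-world figures differ, so this is an illustration of the principle rather than a statement about cryptographic practice.

def trial_factor(n):
    # Naive trial division on an odd composite: return the smallest prime
    # factor together with the number of divisions attempted.
    d, tries = 3, 0
    while d * d <= n:
        tries += 1
        if n % d == 0:
            return d, tries
        d += 2
    return n, tries

p, q = 104729, 1299709                 # the 10,000th and 100,000th primes
composite = p * q                      # a single multiplication encodes them
factor, tries = trial_factor(composite)
print("composite:", composite)
print("recovered factor", factor, "after", tries, "trial divisions")
# One multiplication going in; tens of thousands of divisions coming back out.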
It is often convenient "for practical purposes" to think in terms of a classical mechanistic dynamic system, which is an approximation of a system in nature. But, we must also acknowledge that another way to look at the ability to retrieve information stems from the Heisenberg Uncertainty Principle. In a typical ensemble there is, at the quantum level, an inherent unpredictability of a specific path. So, one obviously can't reverse a trajectory which isn't precisely known. Again we discover that at the quantum level, the "arrow of time" is blurred in both "directions." Once a detection is made, we may think we know the incoming trajectory; yet, Richard Feynman devised his path integral formalism specifically in order to account for the large (infinite?) number of possible trajectories.
As soon as we measure a particle, the HUP gives a measure of information for one property and a measure of entropy -- that is, of uncertainty -- for the conjugate property. (One may accept this point as a "principle of correspondence.") Before measurement, the entropy ( = the uncertainty) is given by the HUP relation.
Before proceeding further, let us pause to consider an important insight given by Bruce Hood.
Hood's statement for Edge
http://edge.org/response-detail/11275
"As a scientist dealing with complex behavioral and cognitive processes, my deep and elegant explanation comes not from psychology (which is rarely elegant) but from the mathematics of physics. For my money, Fourier's theorem has all the simplicity and yet more power than other familiar explanations in science. Stated simply, any complex pattern, whether in time or space, can be described as a series of overlapping sine waves of multiple frequencies and various amplitudes" (53aa).
Hood's observation neatly links the observer's brain to the physics of "external" nature, as the brain must have some way to filter out the signal from the noise. And, further, even the noise is a product of filtration from what we characterize as a phenomenal input (see Toward).
One might think of entropy as the decreasing probability of observing a net force vector composed of coherent sub-vectors. Map a force vector onto a wave graph and consider the possible decomposition unit waves. At the unit level, a few wave forms may completely interfere constructively or destructively, but most are out of phase with each other, especially when frequencies vary. The higher the overall amplitude of the composite waveform, the less likely it is that the sub-waveforms are all precisely in phase, or that half are precisely out of phase (destructively interfering) with the other half.
We are using a proportion definition of probability here, such that over a particular interval, the set of mixed waveforms (that don't interfere coherently) is much larger than the set of completely coherent waveforms. In fact, we may regard the population of waveforms that fits on an interval to be composed of sample waveforms, where the sample is a sub-form composed of more sub-forms and eventually composed of unit forms. By the Central Limit Theorem, we know the set of samples is normally distributed. So it then becomes evident that the coherent waveforms are represented by normal curve tails for constructive and destructive coherence and the central region under the curve represents "incoherent" waveforms. Mixed waveforms have degrees of coherence, measured by their standard deviations.
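A rough numerical check of that normality claim, in Python; building each aggregate from unit-amplitude cosines with random phases, and sampling the composite at a single instant, are my own simplifications.

import math, random

def aggregate_value(n_components):
    # Value, at one fixed instant, of a sum of unit-amplitude cosines with random phases.
    return sum(math.cos(random.uniform(0.0, 2.0 * math.pi)) for _ in range(n_components))

N, trials = 400, 5000
samples = [aggregate_value(N) for _ in range(trials)]
mean = sum(samples) / trials
sd = (sum((s - mean) ** 2 for s in samples) / trials) ** 0.5
within = sum(abs(s - mean) <= sd for s in samples) / trials

print("mean ~", round(mean, 2), " sd ~", round(sd, 2),
      " (sqrt(N/2) =", round(math.sqrt(N / 2), 2), ")")
print("fraction within one sd:", round(within, 3), " (normal curve: about 0.683)")
# Large coherent aggregates show up only in the far tails, as argued above.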
Knowing that waveform coherence is normally distributed, we then say that equilibrium, or maximum, entropy occurs at the mean of waveform coherence.
In fact, this isn't quite right, because the normal curve strictly speaking represents infinity. Yet, as sub waveforms are added, and assuming that none has an outrageously high amplitude, the irregularities smooth out and the amplitude goes to zero, a peculiar form of destructive interference. Of course, the energy does not go to zero although in such an ideal scenario it does go to infinity.
Now if the number of unit waveforms is constant, then, no matter how irregular the composite waveform, there exists at least one composite waveform that must be periodic. Hence, one might claim that a truly closed system perforce violates the principle of entropy (a point that has been a source of much discussion). We can see here an echo of Ramsey order -- discussed elsewhere -- which is an important consideration when thinking about entropy.
So when talking of entropy -- and of probability in general -- we are really saying that the observer is most likely stationed near the mean of the probability distribution. Do you see the Bayesian bogeyman hiding in the shadows?
It seems to me that there is no need to account for the information, entropy or energy represented by Maxwell's demon. What we have is a useful thought experiment that establishes that there is a small, but finite probability that dissipation could spontaneously reverse, as we can rarely be sure of the macro-propensity of a system based on its micro-states. After all, even in disorderly systems, occasionally the wave forms representing the net force vector cohere.
The job of Maxwell's imp, it will be recalled, was to open a door between two gas-filled containers whenever he spotted a swifter molecule. In this way, Maxwell said, the creature would "without expenditure of work raise the temperature of B and lower that of A in contradiction to the second law of thermodynamics" (54a).
Plainly, Maxwell was deliberately disregarding the work done by the demon.
This seems an opportune point to think about the concept of work, which is measured in units of energy. Let us consider work in terms of a spherical container filled with a gas at equilibrium. The total work, in terms of pressure on the container wall, is 0. The potential work is also 0. Suppose our demon magically vaporizes the container wall without influencing any gas molecules. The net force vector (found by summing all the micro-force vectors) is, with very high probability, nearly 0. Hence, the potential power is nearly 0. (Of course, if one takes an incremental wedge of the expanding gas, one can argue that that wedge contains conditions necessary for work on that scale. In other words, a number of molecules are moving collectively by inverse square and so can exchange kinetic energy with other molecules and push them outward. Regarded in this way, a bomb performs work on surrounding objects, even though at the instant of explosion the total work, both actual and potential, is -- in an idealized scenario -- zero.)
But if a valve is opened, a set of molecules rushes to the region of lower pressure and the net force vector for this exiting gas, if contained in a pipe, is associated with non-zero power and is able to do work (such as push a piston). That is to say, the probability is fantastically high that the net vector is not zero or close to it, assuming sufficient initial pressure. Hence work, whether potential or actualized, requires a non-zero net force vector, which is tantamount to saying that its waveform representation shows a great deal of constructive interference (coherence).
In classical terms, dispersion means that for most ensembles of gas molecules, the initial arrangement of the micro-states (when we take a notional snapshot) is most probably asymmetric by the simple proportion that the set of symmetric micro-states to the set of asymmetric micro-states is minute. Now an asymmetric ensemble of micro-states, in general, takes far, far longer to return to the snapshot condition than is the case for a symmetric ensemble of micro-states. (Think of an asymmetric versus a perfectly symmetric break of a rack of pool balls on a frictionless surface.) Hence, the probability that an observer will see a violation of the second law -- even in the fictional classical case -- is remote.
The interesting thing about using a chaos analogy for entropy is that there is usable information in the chaos. In other words, the attractor, (i.e., the set of limit points) gives information analogous to the information provided by, say, the extinction threshold of a continuous logistic population equation.
Now we must expect statistical fluctuations that give order without employing a chaos model. That is to say, over infinity we expect any randomly generated run to recur an infinitude of times. But in that case, we are assuming absolute randomness (and hence, independence). There are cosmological models that posit an infinity of Paul Conants but in general the cosmos and all its subsystems are regarded as finite, and the cosmos is regarded as closed (though this property is also a bone of contention as physicists posit existence of a super-universe containing numerous cosmoses).
At any rate, one might argue that in a very large system, coherent waveforms (representing various levels of complexity) are "quite likely" to occur sometime or somewhere.
What does all this say about the presumed low entropy near the big bang versus maximum entropy in the far future (the heat death of the universe)? I would say it is conceivable that the true topology of the universe will show that "beginning" and "end" are human illusions. This point is already evident when modeling the cosmos as a four-dimensional spacetime block. The deck-of-cards model of entropy is OK at a certain range, but seems unlikely to apply at the cosmic level.
At any rate, classical ideas imply maximum entropy in the mixing process, but also imply that a closed system holding a finite number of particles is periodic, though a period may be huge.
In the case of the infinitely roomy urn, we are compelled to agree on a level of granularity in order to assess the probability of a particular cluster (run). So it is difficult to utterly justify "entropy" here.
Another point of controversy is the ergodic hypothesis, explained by Jan Von Plato thus: "The ergodic (quasi-ergodic) hypothesis is the assumption that a mechanical system goes through all states (comes arbitrarily close to all states) in its evolution. As a consequence of Liouville's theorem, this amounts to giving the same average kinetic energies to all degrees of freedom of the system (equipartition of energies). In the case of specific heats, this is contradicted, so that the ergodic hypothesis has to be given up in many cases" (54).
The ergodic hypothesis has been deemed important to the physics of gas molecules and hence entropy, though Jaynes argued that Shannon's maximum entropy sufficed for Jaynes's Bayesian calculations and methods (55).
The ergodic hypothesis
https://en.wikipedia.org/wiki/Ergodic_hypothesis
I note that the ergodic assumption is violated at the quantum level in the sense of "borrowing" of large amounts of energy, for conservation violation, which is presumably reconciled when longer time intervals are regarded (the "borrowings" are "paid off").
So an alternative notion is that the ergodic principle is from time to time violated. Though probabilities can still be used in such cases, we then have an issue of circularity: it would seem that physical probability assumptions rest on the ergodic hypothesis, which in turn rests on probability assumptions. In addition, the ergodic conjecture fails to take proper account of quantum indeterminism, which is important in the gas scenario.
Jaynes argues that a Laplacian principle of insufficient reason should be replaced by his definition of maximum entropy.
"The max-entropy distribution may be asserted for the positive reason that it is uniquely determined as the one which is maximally non-commital with regard to measuring information, instead of the negative one that there was no reason to think otherwise," Jaynes writes, adding: "Thus, the concept of entropy supplies the missing criterion of choice, which Laplace needed to remove the apparent arbitrariness of the principle of insufficient reason..."
Jaynes notes that thermodynamic and Shannon entropy are identical except for Boltzmann's constant, and suggests that Boltzmann's constant be made equal to 1, in line with Jaynes's program to make entropy "the primitive concept with which we work, more fundamental even than energy." This idea is reminiscent of Popper making propensity a primitive property on an equal footing with force.
In the classical mechanics conception, says Jaynes, "the expression 'irreversible process' represents a semantic confusion; it is not the physical process that is irreversible, but rather our ability to follow it. The second law of thermodynamics then becomes merely the statement that although our information as to the state of the system may be lost in a variety of ways, the only way in which it can be gained is by carrying out of further measurements."
Brillouin asserts that an argument of Joseph Loschmidt and another of Ernst Zermelo and Henri Poincare regarding reversibility do not apply in the physical world because of what we now call "butterfly effect" unpredictability and the impossibility of exact measurement (55aa).
As said, quantum uncertainty is an important consideration for entropy. Jaynes comments: "Is there any operational meaning to the statement that the probabilities of quantum mechanics are objective?" (56)
In the case of the urn with a finite number of possible draws, maximum entropy is equivalent to maximum homogenization, which corresponds to the classical probability of cases/possibilities, where that proportion is all the knowledge available. What the observer is then doing when calculating the probability of a specific property on a draw is stating the fact that that number encapsulates all the information available.
A point that bears repetition: The observer is forced to assume "maximum mixing" even though he doesn't know this has actually been done. Perhaps all or most of the black balls have been layered atop most of the white balls. As long as he doesn't know that, he must assume maximum mixing, which essentially means that if all combinations are considered, particular runs are unlikely using various parametric tests or by the consideration that binomial curve logic applies.
Even if he knows there are 99 black balls and 1 white ball, he cannot be sure that the white ball hasn't been strategically placed to greatly increase his chance of drawing it. But he can do nothing but assume this is not the case, at least for the first draw.
So, again I repeat that, if possible, he would like to hear that there is no such bias, that the mixture is "fair." By fair, in this example, is meant maximum entropy: the observer has been reassured that it is objectively true that no maneuver has been employed to introduce bias and that measures have been taken to "randomize" the order of balls. Perhaps the urn has been given a good shake and the balls bounce around in such a way that the results of chaos theory apply. It becomes effectively impossible to predict order. In this case, the lone white ball might end up atop the black balls; but it would happen pseudorandomly. So we regard maximum entropy as occurring when more shaking does not substantially lengthen any calculation that might backtrack the trajectories of the bouncing balls. Once the Kolmogorov-Chaitin limit has been essentially reached, we are at entropy equilibrium.
Though it is true that at some point in the far future continued shaking yields a return to the initial order, this fact carries the caveat that the shake net force vector must be a constant, an impossibility in the real world.
Now let us consider what constitutes good mixing for an ordered 52-card deck. That is to say, to "most likely" get rid of the residuals of order, while conceding that order is a subjective concept.
Rather than talk of repeated shuffles, we suggest that a random number generator (perhaps tied to a Geiger counter) chooses positions from 1 through 52. If a number shows up more than once, subsequent instances are ignored, a process that is terminated once the 51st card has been placed, the last card falling into the remaining slot. We see that we have maximally randomized the card order and so have maximum entropy from the standpoint of an observer who is presented with the deck and turns the first card over. Assuming draws without replacement, the maximum entropy ( = minimum information) changes on each draw, of course.
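A sketch of that procedure in Python; random.randrange stands in for the Geiger-counter-driven source, which is the only substitution made here.

import random

def rng_shuffle(deck):
    # Draw positions at random, ignoring repeats; after 51 cards have been
    # placed, the last remaining card falls into the final slot.
    order, seen = [], set()
    while len(order) < len(deck) - 1:
        i = random.randrange(len(deck))   # stand-in for a hardware random source
        if i not in seen:
            seen.add(i)
            order.append(deck[i])
    leftover = next(j for j in range(len(deck)) if j not in seen)
    order.append(deck[leftover])
    return order

deck = list(range(1, 53))                 # 52 cards, initially in order
print(rng_shuffle(deck)[:10])             # first ten cards of the shuffled order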
On the first draw, his chance of picking the ace of spades is 1/52. This chance he posits on the notion of fairness, or maximum entropy. This assumption is distinct from the principle of indifference, where he may be presented with a deck and asked to estimate the probability of an ace of spades. From his perspective, he may say 1/52 because he has no reason to believe the deck has been stacked; but this assumption is not the same as being told that the deck has been well shuffled.
At any rate, once we have shuffled the deck with a random number generator, we gain nothing by using our random number generator to reshuffle it. The deck order is maximally unpredictable in case 1, or at maximum, or equilibrium entropy.
Nevertheless, it may be agreed that given enough shuffles our randomization algorithm yields a probability 1 of a return to the initial order (which in this case is not the same as certainty).
The number of shuffles that gives a probability of 0.99 of returning to the original permutation is, according to my calculation on WolframAlpha, on the order of 10^64. In other words, in the abstract sense, maximum entropy has a complementary property that does indeed imply a probability near 1 of return to the original order in a finite period of time.
So one can take entropy as a measure of disorder, with an assumption of no bias in the mixing process, or uncertainty, where such an assumption cannot be made. Yet, one might conclude that we do not have a complete measure for dispersion or scattering. On the other hand, as far as I know, nothing better is available.
Consider manual shuffling of cards by a neutral dealer. Because of our observation and memory limitations, information available to us is lost. So we have the case of observer-centric entropy.
Consider two cards, face up. They are turned face down and their positions swapped once. An attentive observer will not lose track.
If we now go to three cards, we have 3!, or 6 permutations of card order. If the dealer goes through the same permutations in the same order repeatedly, then the observer can simply note the period and check to see whether the shuffle ends with remainder 0, 1 or 2, do a quick mental calculation, and correctly predict the cards before they are turned over.
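To make "note the period" concrete, here is a small Python sketch; the particular 3-card shuffle used is an arbitrary choice.

def apply_shuffle(perm, arrangement):
    # perm[i] is the position to which the card currently at position i moves.
    out = [None] * len(arrangement)
    for i, card in enumerate(arrangement):
        out[perm[i]] = card
    return out

perm = [1, 2, 0]                       # a fixed 3-card shuffle, repeated every time
start = ["A", "B", "C"]
state, k = start[:], 0
while True:
    state = apply_shuffle(perm, state)
    k += 1
    if state == start:
        break
print("the shuffle's period is", k)    # 3 for this permutation; the observer
                                       # then predicts by counting shuffles mod 3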
But suppose the permutation changes with each shuffle. The observer now finds it harder to obtain the period, if any. That is to say, it may be that the initial permutation returns after k shuffles. But what if the dealer -- perhaps it is a computer program -- is using an algorithm which gives permutation A for the first shuffle, B for the second ... F for the sixth, and then reorders the shuffles as FABCDE, and again EFABCD and so on.
The work of following the shuffles becomes harder. Of course, this isn't the only routine. Perhaps at some point, the cycle is changed to EBCDA, followed by AEBCD, followed by DEBCA followed by ADEBC and so on.
With 52 cards, of course, keeping track becomes exponentially more difficult. In this sense, maximum or equilibrium entropy occurs when the observer cannot predict the position in the deck of an arbitrary card. In other words, he has become maximally ignorant of the position of the card other than knowing it must be in there somewhere.
Some deterministic algorithms require output information that is nearly as large as the input information (including the information describing the algorithm). When this information ratio is near 1, the implication is that we are maximally ignorant in this respect. There are no calculational shortcuts available. (Even if we include the potential for quantum computing, the issue of minimum computational complexity is simply "pushed back" and not resolved.)
Such an algorithm in the card-shuffling case might, for example, form meta-permutations and meta-meta-permutations and so on. That is to say, we have 6 permutations of 3 cards. So we shuffle those 6 permutations before ordering ("shuffling") the cards. But there are 6! (720) permutations of the previous set; there are 720! permutations of that set, and so on. In order to determine which order to place the 3 cards in, we first perform these "meta-orderings." So it will take the computer quite some time and work to arrive at the final order. In that case, an observer, whose mind cannot follow such calculations, must content herself with knowing that an arbitrary card is in one of three positions.
In other words, we have maximum or equilibrium entropy. Of course, 720! is the fantastic quantity 2.6 x 10^1746 (according to WolframAlpha), an absurd level of difficulty, but it illustrates the principle that we need not be restricted in computational complexity in determining something as straightforward as the order of 3 cards. Here we exceed Chaitin complexity (while noting that it is usual to describe the minimum complexity of a computation by the shortest program for arriving at it).
This amounts to saying that the entropy is related to the amount of work necessary to uncover the algorithm. Clearly, the amount of work could greatly exceed the work that went into the algorithm's computation.
Note that the loss of information in this respect depends on the observer. If we posit some sort of AI sentience that can overview the computer's immense computation (or do the computation without recourse to an earthly machine), in that case the information is presumably not lost.
In the case of gas molecules, we apply reasoning similar to that used in card-shuffling. That is to say, given sufficient time, it is certain that a "classical" gas in a perfect container will reach a state in which nearly all molecules are in one corner and the remainder of the container holds a vacuum. On the other hand, the total energy of the molecules must remain constant. As there is no perfect container, the constant total kinetic energy of the molecules will gradually diminish as wall molecules transmit some kinetic energy to the exterior environment ("the entropy of the universe goes up").
So is the universe a perfect container? As it evidently has no walls, we cannot be sure that the various kinetic energies do or do not return to some original state, especially in light of the fact that the Big Bang, if it "occurred," cannot be tracked back into the Planck time interval.
And what of the intrinsic randomness inherent in quantum measurements, which makes it impossible to track back all trajectories, as each macro-trajectory is composed of many quantum "trajectories." The macro-trajectory is assumed to be what one gets with the decoherence of the quantum trajectories, but it would only take one especially anomalous "decoherence" to throw off all our macro-trajectory calculations. In addition, no one has yet come up with a satisfactory answer to the Schroedinger's cat thought experiment (see Noumena II, Part VI).
Maximum entropy can be viewed in terms of a binomial, success/failure scenario in which, as we have shown, the mean simply represents the largest set of permutations in a binomial distribution.
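A small Python sketch makes the point concrete: for a run of success/failure trials (n = 20 here is an arbitrary choice), the number of distinct arrangements peaks at the mean, which is why the mean corresponds to maximum entropy.

from math import comb

n = 20                                   # an arbitrary number of success/failure trials
arrangements = [comb(n, k) for k in range(n + 1)]
peak = arrangements.index(max(arrangements))
print("arrangements for k successes:", arrangements)
print("largest set of arrangements at k =", peak, "(the mean, n/2)")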
We should recognize that:
1. If we know the distribution, we have at hand system information that we can use to guide how we think about our samples.
2. The fact that the set of finite samples of a population is normally distributed gives us meta-information about the set of populations (and systems). (For a proof, see Appendix.)
So this fact permits us to give more credibility to samples close to the mean than those in the tails. Still, our unstated assumption is that the cognitive process does not much affect the credibility (= probability area) of a specific outcome.
At any rate, these two points help buttress the case for induction, but do not nail that case down by any means.
More on entropy
One may of course look at a natural system and see that entropy is increased, as when an egg breaks upon falling to the floor. But how about when a snowflake melts in the sun and then its puddle refreezes after dusk? Has the organization gone down by much? Crystals come apart in a phase transition and then others are formed in another phase shift. Yes, over sufficient time the planet's heat will nose toward absolute zero, perhaps leaving for some eons a huge mass of "orderly" crystals. A great deal depends on how precisely we define our system -- and that very act of definition requires the mind of an observer.
Roger Penrose says people tend to argue that information is "lost" to an observer once it gets past a black hole's event horizon (58). But, I note, it never gets past the event horizon with respect to the observer. So, in order to make the case that it is lost, one needs a super-observer who can somehow manage to see the information (particles) cross the event horizon.
It is certainly so that those physicists who believe in a single, discrete, objective cosmic reality, accept as a matter of course the idea that Shannon information can, in theory, precisely describe the entire content of the cosmos at present, in the past and in the future -- even if such a project is technically beyond human power. By this, they are able to screen out as irrelevant the need for an observer to interpret the information. And, those who believe in objective information tend to be in the corner of those who believe in "objective" probabilities, or, that is, physical propensities. Yet, we should be quick to acknowledge a range of thought among physicists on notions of probability.
In 2005, Stephen Hawking revived a long-simmering argument about black holes and entropy.
"I'm sorry to disappoint science fiction fans, but if information is preserved, there is no possibility of using black holes to travel to other universes. If you jump into a black hole, your mass energy will be returned to our universe but in a mangled form, which contains the information about what you were like but in a state where it can not be easily recognized. It is like burning an encyclopedia. Information is not lost, if one keeps the smoke and the ashes. But it is difficult to read. In practice, it would be too difficult to re-build a macroscopic object like an encyclopedia that fell inside a black hole from information in the radiation, but the information preserving result is important for microscopic processes involving virtual black holes."
I suppose Hawking means by this that the particular bit string represented by the encyclopedia is encoded 1-to-1 by the bit string representing some specific ejectum from a radiating black hole. Ordinarily, we would say that the Shannon information in the encyclopedia represents some specific bit string with some probability ratio, whereas the ejected material is described by a bit string with far less redundancy than that of the encyclopedia. High organization equals high redundancy (structure) in Shannon's terms, implying a lower probability than that of a maximum-entropy bit string, which has no redundancy. So in that case, we would say that information is not conserved. Yet, it may also be argued that redundancy is a result of an observer's subjective values, his means of analyzing some system. So if the only information we are interested in with regard to the encyclopedia is mass-energy content, then it holds that information is conserved. Otherwise, not.
Another view is that of Kip S. Thorne, an expert on general relativity (56a), who believes black holes provide the possibility of wormholes connecting different points in Einstein spacetime (more on this in Noumena I in Part VI).
Hawking's updated black hole view
http://www.nature.com/news/2004/040712/full/news040712-12.html
Hawking paper on information loss in black holes
http://arxiv.org/pdf/hepth/0507171.pdf
So then, what does it mean to preserve information in Hawking's scenario? Preservation would imply that there is some Turing machine that can be fed the scrambled data corresponding to the physical process and reconstruct the original "signal." Yet, this can't be the case.
Quantum indeterminism prevents it. If one cannot know in principle which trajectory a particle will take, then neither can one know in principle which trajectory it has taken. Those "trajectories" are part of the superposed information available to an observer. The Heisenberg uncertainty principle ensures that some information is indeed lost, or at least hidden in superposed states. So, thinking of entropy in terms of irreversibility, we have not so much irreversibility as inability, in principle, to calculate the precise state of the particles when they were organized as an encyclopedia.
"When I hear of Schroedinger's cat, I reach for my gun," is a quotation attributed to Hawking. In other words, as an "objective realist," he does not fully accept information superposition with respect to an observer. Hence, he is able to claim that information doesn't require memory. (It is noteworthy that Hawking began his career as a specialist in relativity theory, rather than quantum theory.) Of course, if one were to find a way out of the sticky superposition wicket, then one might argue that the information isn't lost, whether the black hole slowly evaporates via Hawking radiation, or whether the particularized energy is transmitted to another point in spacetime via "tunneling." If bilocality of quantum information is required by quantum physics anyway, why shouldn't bilocality be posited for black hole scenarios?
So, assuming the superposition concept, does information superposition mean that, objectively, the particles won't one day in the far distant future reassemble as an encyclopedia? If the cosmos can be represented as a perfect container, the answer is yes with probability 1 (but not absolute certainty), given enough time. But this is the sort of probability assessment we have disdained as an attempt to use a probability tool at an inappropriate scale.
Consider just one "macro" particle in a vacuum inside a container with perfectly elastic walls in a zero-gravity field. If we fire this particle with force F at angle theta, we could calculate its number of bounces and corresponding angles as far into the future as we like. If we are given the force and angle data for any particular collision with a wall, we can in principle calculate as far backward in time as we like, including the points when and where the particle was introduced and with what force.
But even if we fire a quantum level particle at angle theta with force F, there is no guarantee that it will bounce with a classically derived angle. It interacts with a quantum particle in the wall and shoots toward an opposing wall's position in accord with a quantum probability amplitude. Assuming no quantization of space with regard to position, we then have a continuous number of points covered by the probability of where the next "bounce" will occur. And this continuity -- if it holds -- reinforces probability zero of reversibility.
At any rate, informational entropy is guaranteed by the HUP, with equilibrium corresponding to "formal" incalculability of the reverse process.
Claude Shannon did not specify an observer for his form of entropy. Nevertheless, one is implicit because he was spotlighting signal versus noise. A signal without an observer is just so much background clutter. So we have a legitimate issue of information as a subjective phenomenon -- even though, from an engineering standpoint, it was a brilliant idea to ignore the property of cognition.
Leon Brillouin defines Shannon information as abstract "free information" and the data associated with physical entropy as "bound information." Shannon entropy is a term he eschews, reserving the term "entropy" for thermodynamic systems. Brillouin is then able to come up with inequalities in which the two types of information intersect. For example the "negentropy" of a physical system plus Shannon information must be greater than or equal to zero.
In fact, Brillouin finds that the smallest possible amount of negentropy required in an observation equals k ln 2, which is about 0.7 k, or roughly 10^-16 erg per kelvin in cgs units. In other words, he says, one bit of information cannot be obtained for less than that negentropy value.
In his system, information can be changed into negentropy, and vice versa. If the transformation is reversible, there is no loss.
Any experiment that yields information about a physical system produces on average an increase in the entropy of the system or its surroundings. This average increase is greater than or equal to the amount of information obtained. In other words, information must always be paid for by negentropy.
My take is that he means that negentropy corresponds to the system's advance (propensity) information. He then feels able to dispose of Maxwell's imp by saying the critter changes negentropy into information and back into negentropy.
In a close parallel to algorithmic information's output-to-input ratio, Brillouin defines the efficiency of an experimental method of observation as the ratio of the information obtained, ΔI, to the cost in negentropy, |ΔN|, which is the entropy increase ΔS accompanying the observation: ΔN = -ΔS.
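As a rough numerical sketch of these two quantities (the ΔI and ΔS figures below are hypothetical placeholders, chosen only to show the ratio, and both are expressed in bits):

from math import log

k_cgs = 1.38e-16                    # Boltzmann's constant in erg per kelvin (cgs)
min_negentropy = k_cgs * log(2)     # Brillouin's lower bound for acquiring one bit
print(f"k ln 2 = {min_negentropy:.2e} erg/K (about 0.7 k)")

# Brillouin-style efficiency: information obtained over negentropy spent.
delta_I = 1.0                       # bits of information obtained (assumed)
delta_S = 3.0                       # accompanying entropy increase (assumed)
print("efficiency =", delta_I / abs(-delta_S))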
We note that work is defined in energy units and so there is no distinction, other than conceptually, between work and any other form of energy transformation. So I suggest that negentropy is a means of measuring a property associated with a capacity for work.
Entropy is inescapable, Brillouin says, because -- even without the Heisenberg uncertainty relation -- exact measurement of physical processes is impossible. Points may exist on a Cartesian plane, but they can never be precisely located, even with interferometers or other precision methods. If that is so, the same holds for measurement of time intervals. Hence there is always a degree of uncertainty, which is tantamount to intrinsic disorder.
Even so, Brillouin, in an echo of Popper's intrinsic propensity, writes that "the negentropy principle of information is actually a new principle, and cannot be reduced to quantum and uncertainty relations."
To Brillouin, free information is encoded in pure thought. He gives this scenario:
A person possesses free information in his mind.
He tells a friend about it in English, requiring a physical process. So the information has been transformed from free to bound, via sound waves and-or electromagnetic waves. If there were errors in his mind's coding of the transmission, some free information will be lost.
Further, distortion and thermal noise in the communication channel will result in loss of some bound information.
The friend is hard of hearing, and he misses a few words. Bound information is lost, his hearing organs being part of the physical process. Yet, once this pared information is in the friend's mind, it is now free information.
After a while, the friend forgets some of the information, representing a loss of free information.
It seems evident that Brillouin suspects a mind/body dichotomy. Otherwise, the information held in the mind would be associated with a physical system, the mind being modeled as a software program deployed in the brain's hardware (56b).
I am guessing that Brillouin's purpose is to find some way to discriminate between the principle of insufficient reason and what he takes to be the objective reality of dissipation processes described in terms of probability distributions (or densities).
Brillouin distinguishes between absolute information and "distributive" (relative) information. Absolute information would exist platonically in The Book, to borrow Paul Erdos's whimsical term for the transcendental place where mathematical theorems are kept. Relative information must be potentially of use, even though the thoughts of the consumers are disregarded.
In my view, absolute information is conveyed via ideal channels, meaning the physical entropy of the channel is disregarded. Relative information may well be construed to travel an ideal channel, with the proviso that that channel have an idealized physical component, to wit: a lower bound for physical entropy, which Brillouin in his thorough analysis provides.
Let us consider the Shannon information in the observer's brain.
We regard the brain's operating system as a circuit that carries information. We have the information in the circuit parts and the information being carried in the circuit at some specific time. Considering first the latter case, it is apparent that the operating system is a feedback control system (with quite a number of feedback loops), which, at least notionally, we might model as a software program with many feedback loops. Because of the feedback loops, the model requires that the program be treated as both transducer and receiver. So we can think of the brain's "software" as represented by a composite, iterative function, f(x_(n-1)) = x_n.
We may address the complexity of the data stream by relating this concept to the added redundancy given by each feedback loop, or in computer terms, by each logic gate. Each logic gate changes the probabilities of characters in the data stream. Hence, the redundancy is associated with the set of logic gates.
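A toy sketch, and nothing more than that, of what is meant by treating the same function as transducer and receiver: each pass feeds on the previous pass's output. The gain and bias numbers are arbitrary, and no claim is made that real neural circuitry reduces to anything this simple.

def step(state):
    # One pass of a toy feedback loop: the output of the previous pass
    # becomes the input of the next, so the same function plays the roles
    # of transducer and receiver.
    gain, bias = 0.9, 0.05          # arbitrary illustrative parameters
    return gain * state + bias

x = 1.0                              # arbitrary starting state
for _ in range(10):
    x = step(x)                      # x_n = f(x_(n-1))
print("state after 10 iterations:", x)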
Now it may be that much of the data stream remains in an internal closed, or almost closed, loop (which might help us get a bit of insight into the need for sleep of sentient animals). Still, in the time interval [t_a, t_b], f(x_a) serves to represent the transducer and f(x_b) the receiver, with the channel represented here by the algorithm for the function. We must synchronize the system, of course, setting t_unit to correspond to the state in which f(x_a) has done its work, yielding x_(a+1).
By such constraints, we are able to view the brain and the brain's mind as modeled by a software program running on a hardware system that is not representable as a universal Turing machine, but must be viewed as a special TM with a "long" description number.
Yet, as Penrose and others have said, it is not certain that the computing model applies fully to the conscious mind, though it does seem to fit nicely with learned and instinctive autonomic behaviors, including -- at least in part -- unconscious emotional motivations.
What prevents perfect error correction in machine and in nature?
By error in nature we mean a sentient being misreading the input data. The "erroneous" reading of data requires an assessment by some observer. If we model a brain as a Turing machine -- in particular as a TM modeling a feedback control system -- we see that the TM, no matter how seemingly complex, can't make calculational errors in terms of judgments, as a TM is fully deterministic and doesn't have room for the mystical process of judgment (unless you mean calculation of probabilities). Even if it miscalculates 2 + 3, it is not making a judgment error. Rather the feedback loop process is at a stage where it can't calculate the sum and resorts to a "guess." But, the guess occurs without choice. The guess is simply a consequence of a particular calculation at that stage.
So we can't at this juncture really discuss error made in what might be called the noumenal realm, and so "error" in machine and in nature, if excluding this noumenal realm, mean the same thing: the exact signal or data stream transmitted is not received.
In that case, consider the Shannon ideal of some perfect error correction code. Suppose the ideal turns out to require a convergent infinite series, as it well could. Truncation could reduce error vastly, but the Shannon existence theorem would in that case hold only in the infinite limit.
So, if, at root, we require information to have the property of observability, then black hole entropy might be undefined, as is the proposed black hole singularity.
One can say that the observable entropy of a black hole is low, whereas its observable order is high. But, on the other hand, the concept of entropy is easier to accept for a system involving many particles. Consider an ordinary light bulb's chaotic (low "order") emission of photons versus a laser's focused beam (high "order"). Both systems respond to the second law, though the second case is "farther from equilibrium" than the first. But what of a one-photon-at-a-time double slit experiment? We still have entropy, but must find it in the experimental machinery itself.
One might say that a black hole, from the outside, contains too few components for one to be able to say that it has a high information value.
In this respect, Brillouin uses Boltzmann's entropy formula where W has been replaced by P, for Planck's "complexions," or quantized possible orderings, for an insulated system containing two bodies of different heat in contact with each other. The total number of complexions for the composite system is P = P_1(E_1) x P_2(E_2). Using basic physical reasoning, he arrives at the well-known result that the most probable distribution of energies corresponds to the two bodies having the same temperature.
But then he notes: "The crucial point in this reasoning is the assumption that P_1(E_1) and P_2(E_2) are continuous functions with regular derivatives. This will be true only for large material systems, containing an enormous number of atoms, when the energies E_1 and E_2 are extremely large compared to the quantum discontinuities of energy." That is to say, infinity assumptions closely approximate the physical system when n is large enough (56bb). Of course, without considering Hawking radiation, we see that the low entropy of a black hole is associated with very few numbers.
Yet entropy is often thought of as a physical quantity. Though it is a quantity related to the physical world, its statistical nature is crucial. One might think of entropy heuristically in terms of chaos theory. In that area of study, it is possible to obtain information about the behavior of the system -- say the iterative logistic equation -- when we can predict transformations analogous to phase shifts at specific points (we jump from one period to another). Even so, Feigenbaum's constant assures that the system's behavior becomes chaotic: the "phase shifts" occur pseudorandomly. Aside from the attractor information, the system has close to zero predictive information for values past the chaos threshold.
On Feigenbaum's constant
http://mathworld.wolfram.com/FeigenbaumConstant.html
So here we see that the tendency to "disorder" rises to a point of "maximum entropy" measurable in terms of Feigenbaum's constant. In the case of the iterative logistic equation, period doubling begins once the control parameter passes 3, and chaos looms as the parameter approaches the accumulation point near 3.57. Below that threshold, predictability occurs to varying degree. Past it, we have chaos, which we may regard as maximum entropy.
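The period-doubling route to chaos is easy to exhibit numerically. The sketch below iterates the logistic map for a few illustrative parameter values and counts how many distinct values the long-run orbit keeps visiting; the count doubles and then explodes near the accumulation point.

def distinct_long_run_values(r, x0=0.5, burn=500, sample=1000):
    # Iterate x -> r*x*(1-x); after discarding transients, count the
    # distinct (rounded) values the orbit keeps visiting.
    x = x0
    for _ in range(burn):
        x = r * x * (1 - x)
    visited = set()
    for _ in range(sample):
        x = r * x * (1 - x)
        visited.add(round(x, 4))
    return len(visited)

# Period doubling on the way to chaos near r ~ 3.57 (Feigenbaum's route).
for r in (2.8, 3.2, 3.5, 3.55, 3.57, 3.9):
    print(r, "->", distinct_long_run_values(r), "distinct long-run values")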
With respect to chaos and nonlinear dynamics, Kolmogorov-Sinai entropy enters the picture.
K-S entropy is associated with the probabilities of paths of a chaotic system. This entropy is similar to Shannon entropy but the probabilities on the k^n branching paths are in general not equal at the nth set of branches. In his discussion of K-S entropy, Garnett P. Williams (60) puts K-S entropy (61) in terms of the number of possible phase space routes, and he writes:
H Δt = Σ P_si log (1/P_si), with the sum running from i = 1 to N, the number of phase space routes; s_i denotes the sequence probability associated with a specific path.
"K-S entropy represents a rate," notes Williams. "The distinctive or indicative feature about a dynamical system isn't entropy by itself but rather the entropy rate," which is simply the above expression divided by time.
Williams points out that K-S entropy is a limiting value; for discrete systems of observations, two limits are operative: the number that occurs by taking t to infinity in the expression above and letting the data box size (formed by overlaying an nxm grid over the system's output graph) go to zero.
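As a crude stand-in for the idea (not Williams's own computation), one can take a set of path probabilities at one branching depth, form the Shannon-style sum, and divide by the time step; equal path probabilities give the largest rate.

from math import log2

def entropy_rate(path_probs, dt=1.0):
    # Shannon-style entropy of a set of path probabilities, divided by the
    # time step -- a crude stand-in for an entropy rate.
    h = sum(p * log2(1.0 / p) for p in path_probs if p > 0)
    return h / dt

# Hypothetical path probabilities at one branching depth (they sum to 1).
print(entropy_rate([0.5, 0.25, 0.125, 0.125]))     # 1.75 bits per time step
print(entropy_rate([0.25, 0.25, 0.25, 0.25]))      # 2.0 bits per time step (the maximum)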
He underscores three interpretations of K-S entropy:
Average amount of uncertainty in predicting the next n events.
Average rate at which the accuracy of a prediction decays as prediction time increases.
Average rate at which information about the system is lost.
This last interpretation is important when discussing what we mean by physical information (absolute versus relative). Also note that the K-S entropy reflects the Shannon redundancy of the system. In other words, even a chaotic system contains structural information.
Now suppose we regard any system as describable as a Turing machine, regarding the output tape's content as a signal.
Questions:
A. What is the ratio of the information in the algorithm to the information in the final output tape?
B. Can we confidently check the output and discern the algorithm (how does this question relate to entropy)?
With respect to B, if we only have the output and no other information, it is impossible to know for certain that it is a readout of a particular algorithm. For example, suppose we encounter the run 0101010101. We have no way of knowing that the next digit, if any, will be a zero. Perhaps a different sequence will be appended, such as 001100110011, or anything.
We can of course use statistical methods to estimate the probability that the next digit is a zero, and so we may think of entropy as related to the general success of our probabilistic method. The fuzzier the estimation, the greater the "entropy."
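As an illustration of how fuzzy such estimates can be, the sketch below applies Laplace's rule of succession to the run 0101010101, first ignoring and then using the last symbol. The point is only that different modeling assumptions give different probabilities for the next digit, not that either estimate is "correct."

def laplace_next(symbols, target):
    # Laplace's rule of succession: (successes + 1) / (trials + 2).
    return (symbols.count(target) + 1) / (len(symbols) + 2)

def markov_next(symbols, target):
    # Estimate P(next = target) given only the last symbol, using the
    # observed one-step transitions plus Laplace smoothing.
    last = symbols[-1]
    follows = [b for a, b in zip(symbols, symbols[1:]) if a == last]
    return (follows.count(target) + 1) / (len(follows) + 2)

run = "0101010101"
print("unconditional estimate of P(next = 0):", laplace_next(run, "0"))   # 0.5
print("estimate conditioned on the last symbol:", markov_next(run, "0"))  # about 0.83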
Recall that if we consider the binomial probability distribution, a rise in entropy accords with the central region of the bell curve. All this says is that systems about which our knowledge is less exact fall toward the center. Considering that the binomial curve converges with the Gaussian normal curve -- which is good for much of energetics -- it seems that there is indeed a close relationship between physical entropy and Shannon entropy.
With respect to A, perhaps more significant is the point that if we have a Universal Turing Machine and start a program going for each consecutive integer (description number), most will crash: go into a closed loop before printing the first digit (which includes freezing on algorithm step 1) or run forever in a non-loop without printing the first digit.
So the overwhelming majority of integers do not express viable Turing machines, if we are interested, that is, in a computable number. We see immediately that order of this sort is rare. But of course we have here only a faint echo of the concept of entropy. At any rate, we can say that if a TM is nested (its output in the substring [m,n] is a TM description number), we can consider the probability of such a nested TM thus: consider the ratio of the substring length to the output string length; the closer that ratio is to 1, the more orderly or improbable we may evaluate it to be, and as that ratio goes to 0 we can think of this as being "entropy-like."
Also, see my
Note on Wolfram's principle of computational equivalence
http://paulpages.blogspot.com/2013/04/thoughts-on-wolframs-principle-of.html
And the more nested TMs we encounter, each with "low entropy," the more we tend to suspect nonrandom causation.
When we say nonrandom, what we really seem to mean is that the event is not consistent with an accepted probability distribution and so we wonder whether it is more consistent with a different distribution. Still, we might be more interested in the fact that a randomly chosen TM's output string is highly likely to have either a very long period, or to be aperiodic (with aperiodic strings swamping all other outputs). Additionally, not only is the output information likely to be high (in the strict sense), but the input information is also likely to be high with respect to what humans, even assisted by computers, normally grapple with.
As for an arbitrary member of the set of TMs that yields computables, the algorithmic information (in the Chaitin-Kolmogorov sense) is encapsulated by that TM's complexity, or ratio of input bits (which include specific initial values along with the bit string describing the specific program) to output bits. In this respect, maximally complex TMs are found at the mean of the normal curve, corresponding to equilibrium entropy. This follows from the fact that there are far more aperiodic outputs than periodic ones. An aperiodic output of bit length n requires m steps such that at the infinite limit, m = n.
It is straightforward that in this case TMs of maximum complexity accord with maximum entropy found at the normal curve mean.
Note, however, that here maximum complexity is not meant to suggest some arbitrary string having probability 2^-n. An arbitrary string of that length, singled out from the set of possible combinations, can be construed as "maximally complex," as when talking of a lifeform describable by a specific bit string of length n, versus the set of bit strings of that length.
Where we really come to grips with entropy is in signal-to-noise ratio, which is obvious, but what we want to do is consider any TM as a transducer and its output as a signal, thus apparently covering all (except for the cosmos itself) physical systems. What we find is that no computer hardware can print a software output with 100% infallibility. The reason is that quantum fluctuations in the circuit occur within constrained randomness, meaning occasionally a 1 is misrepresented as a 0, and this can't be prevented. As Shannon proved with his noisy-channel coding theorem, employment of error correction codes can drastically reduce the transmission of errors, but, I suggest, only in the ideal case can we have a noise-free channel.
Shannon's groundbreaking paper
http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
So, no matter how well designed a system, we find that quantum effects virtually guarantee noise in a sufficiently long signal. If the signal is reprocessed (and possibly compiled) so as to serve as an input value, that noise may well be amplified in the next output, or, that is, the next output waveform will have its own noise plus the original noise. We see here that an increase in entropy (noise) is strictly related to the rules of quantum mechanics, but is also related to what is defined as a system. (The relationship of noise to dissipated heat is apparent here.)
Yet, Shannon's noisy-channel coding theorem means that in the theoretical limit, information can be conserved. No one has come up with a perfectly efficient method of transmission, although even simple error-correction systems exceed 90 percent efficiency. Yet, the physical entropy of the transmitter and receiver guarantees that over time transmission is not noiseless.
If we use an electronic or optical cable for the channel, we have these sorts of noise that affect the data stream:
Impulse noise, which is noticeable for low intensity signals, at a scale where quantum fluctuations become important.
Shot noise, which tends to be Poisson distributed, is often construed as a sum of impulse noises.
Thermal noise (Johnson-Nyquist noise) resulting from excited atoms in the channel exchanging energy quanta with the electronic or photonic current.
1/f noise is not well understood, but has the characteristic that it follows a power law distribution, which can be obtained by analysis of the normal curve.
Crosstalk occurs when signals of two channels (or more) interfere, sending one or more unwanted sub-signals down one or both channels. The undesired data stream may be decodable by the receiver (you overhear a part of another conversation, for example).
Gaussian or white noise is normally distributed when the data stream is analyzed. Gaussian noise tends to be composed of a number of uncorrelated or weakly correlated sources, as in shot noise, thermal noise, black body noise (from warm objects such as the earth) and celestial noise, as in cosmic rays and solar particles. The lack of correlation accounts for the normal curve sort of randomness.
In the case of 1/f noise, we can approach this issue by considering noise to be an anti-signal, where the maximum entropy Gaussian noise defines a complete anti-signal. (We note in passing that an anti-signal corresponds to the condition of maximum Shannon information.) So a 1/f anti-signal is composed of units with conditional probabilities. From this, we argue that in general a 1/f anti-signal has anti-information ~I_(1/f) less than ~I_white. This gives us ~I_white - I_c = ~I_(1/f). That is to say, the 1/f anti-signal contains structural (or propensity) information that reduces the noise.
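A rough numerical sketch of the contrast: white noise generated directly, and a 1/f-like noise made by reshaping white noise in the frequency domain (the sample length and shaping method are arbitrary choices). Fitting the log power spectrum shows the flat spectrum of the white anti-signal against the falling spectrum, that is, the residual structure, of the 1/f case.

import numpy as np

rng = np.random.default_rng(0)
n = 4096

white = rng.normal(size=n)                     # uncorrelated Gaussian noise

# Shape a second white sequence in the frequency domain so power falls off as 1/f.
spectrum = np.fft.rfft(rng.normal(size=n))
freqs = np.fft.rfftfreq(n)
freqs[0] = freqs[1]                            # avoid dividing by zero at DC
pink = np.fft.irfft(spectrum / np.sqrt(freqs), n=n)

def spectral_slope(x):
    # Fit log power against log frequency; white noise ~ 0, 1/f noise ~ -1.
    power = np.abs(np.fft.rfft(x))[1:] ** 2
    f = np.fft.rfftfreq(len(x))[1:]
    return np.polyfit(np.log(f), np.log(power), 1)[0]

print("white-noise spectral slope:", round(spectral_slope(white), 2))
print("1/f-noise spectral slope:  ", round(spectral_slope(pink), 2))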
Also affecting the signal:
Attenuation, or decreasing power (or amplitude of a specific frequency) along the channel. This is a consequence of cumulative effects of such things as thermal noise.
The signal's wave packet is composed of different wavelengths, which correspond to different frequencies. These frequencies over time get out of synchrony with one another. Envelope delay distortion is a function of the amount of delay among frequency components.
Phase jitter occurs when the signal components are not in phase. If viewed on an oscilloscope, the signal appears to wiggle horizontally.
Amplitude jitter occurs when the amplitude varies.
Nonlinear distortion is seen when the harmonics of the fundamental signal frequency are not equally attenuated.
Transients occur when the signal's amplitude abruptly changes. Impulse noise is often the agent behind transients (56c).
So we can perhaps see Brillouin's point that there is for virtually all practical systems a lower bound of physical entropy that cannot be avoided, which decreases the signal information. We see that the physical entropy of the channel corresponds well with Shannon's entropy -- if we are not considering long-term entropy. Suppose some error-correction process is found that reduces signal loss to zero, in accord with Shannon's noisy-channel coding theorem. The device that does the error correcting must itself experience entropy ("degradation" of energy) over time, as must the channel, the transducer and the receiver.
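A simple repetition code, nowhere near the efficiency Shannon's theorem allows, already shows the pattern: errors are cut drastically but not to zero. The bit-flip probability and message length below are arbitrary.

import random

random.seed(1)

def noisy(bits, p):
    # Flip each bit independently with probability p (a crude channel model).
    return [b ^ (random.random() < p) for b in bits]

def encode(bits):                    # 3-fold repetition code
    return [b for b in bits for _ in range(3)]

def decode(bits):                    # majority vote on each triple
    return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

msg = [random.randint(0, 1) for _ in range(100000)]
p = 0.01                             # arbitrary bit-flip probability
raw_errors = sum(a != b for a, b in zip(msg, noisy(msg, p)))
coded_errors = sum(a != b for a, b in zip(msg, decode(noisy(encode(msg), p))))
print("uncoded errors:", raw_errors)                 # on the order of 1,000
print("repetition-coded errors:", coded_errors)      # far fewer, but not zero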
Scholarpedia article on 1/f noise
http://www.scholarpedia.org/article/1/f_noise
Shot noise
https://en.wikipedia.org/wiki/Shot_noise
This brings us to the assumption that the information "in" or representing some block of space or spacetime can be said to exist platonically, meaning that not only do we filter out the observer but we also filter out the transmitter, the receiver and the information channel. This seems to be what some physicists have in mind when they speak of the information of a sector of space or the entire cosmos. But this assumption then requires us to ignore the physical entropy of the equipment. Even if we use an arbitrary physical model, we should, I would say, still include a lower limit for the physical entropy. In other words, even idealized information should be understood to be subject to idealized decay.
Another non-trivial point. We have much argument over whether the entropy of the universe was "very high" near the Big Bang, before quantum fluctuations pushed the system into asymmetries, hence increasing entropy. Yet, if we consider the cosmos to be a closed system, then the total entropy is constant, meaning that the current asymmetries will smooth out to nearly symmetrical. Of course, if space expands forever, then we must wonder whether, for entropy calculation, the cosmos system is closed. Supposing it is a higher-dimensional manifold that is topologically cohesive (closed), then it is hard to say what entropy means, because in an (n greater than 3 + t) spacetime block, the distances between points "transcend" space and time. Now as entropy is essentially the calculation of a large effect from the average of many tiny vectors representing points in space and time, we face the question of whether our probabilistic measuring rod is realistic at the cosmic scale. And, if that is the case, we must beware assuming that probabilistic reasoning is both consistent and complete (which, by Goedel, we already know it can't be).
Brillouin shows that Boltzmann's equation (using Planck's energy quanta rather than work energy) can be obtained by the assumption of continuity, which permits the taking of derivatives, which then establishes that the most probable final state of two blocks in thermal communication is equality of temperature for both blocks. The continuity assumption requires "large material systems, containing an enormous number of atoms, when the energies E1 and E2 are extremely large compared to the quantum discontinuities of energy" (57).
That is to say, the approximation that deploys an infinitude only works when the number of micro-states is very large. So then, when we're in the vicinity of a black hole or the Big Bang, the entropy is considered to be very low. There are few, if any, identifiable particles (excluding those outside the scope of investigation). Hence, there is a serious question as to whether the continuity approximation applies, and whether entropy -- in the physical sense -- is properly defined at this level.
A common view is that, as a direct implication of the Second Law, there had to be extreme organization present at the Big Bang.
Consider the analogy of a bomb before, or at the instant, of detonation. The bomb is considered to have high organization, or information, being representable as a bit string that passes various tests for nonrandomness, whereas if a snapshot is taken of the explosion at some time after initiation and it is expressed as a bit string, that string would almost certainly show apparent randomness. How is the bomb more ordered than the explosion? Upon seeing the bomb's string representation we might say we have reason to believe the string is part of a small subset of TM output tapes, as opposed to the blast particles, the representative string appearing to be part of a larger subset of TM tapes.
Or, we might say that the string representing the blast is found within a standard deviation of the normal curve mean and that the string representing the bomb is found in a tail and so might be indicative of some unknown probability distribution (assuming we wish to avoid the implication of design).
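One crude way to express "found in a tail" is a z-score on the count of 1s against the binomial(n, 1/2) expectation. The two strings below are made-up examples; a string deep in a tail is the kind that tempts us to posit a different distribution (or design).

from math import sqrt

def z_score_of_ones(bits):
    # How many standard deviations the count of 1s sits from the mean
    # of the binomial(n, 1/2) distribution.
    n, ones = len(bits), bits.count("1")
    return (ones - n / 2) / sqrt(n / 4)

print(z_score_of_ones("1100100111010110"))   # near 0: looks typical
print(z_score_of_ones("1111111111111110"))   # deep in a tail: looks special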
In Cycles, Penrose talks about scanning cosmic background radiation to see whether there is a "significant deviation from 'Gaussian behavior'." (58) So here the idea is that if we don't see a normal distribution, perhaps we may infer "higher order" -- or, really, a different probability distribution that we construe as special.
Cosmic-scale entropy poses difficulties:
A. If the cosmos cannot be modeled as a TM (which it cannot), then information and entropy become undefined at cosmic scale.
B. Nonrandom information implies either intelligence or an undetermined secondary probability distribution. We note that the Second Law came about as a result of study of intelligently designed machines. At any rate, with respect to the Big Bang, how proper is it to apply probabilities (which is what we do when we assign an entropy value) to an "event" about which we know next to nothing?
If the universe were fully deterministic (a Laplacian clockwork), had a sufficiently long life and were closed, every event would repeat, and so at some point entropy, as seen from here and now, would have to begin to decrease (which is the same as saying the system's equilibrium entropy is a constant). Still it is the case that, as said, quantum fluctuations ensure that exact deterministic replicability is virtually impossible. This is very easily seen in the Schroedinger cat thought experiment (discussed in Noumena II, Part VI).
On the arrow of time. There is no arrow without consciousness. Newton's absolute time, flowing equably like a river, is a shorthand means of describing the human interface with nature -- a simplification that expediently neglects the brain's role.
And of course, what shall dark matter and dark energy do to entropy? One can speculate, but we must await developments in theoretical physics.
Interestingly, like the Big Bang theory, the out-of-fashion steady state theories predicted that distant galaxies should be accelerating away from us.
Three features of the steady state universe are found in currently accepted observational facts: the constant density of the universe, its seemingly flat, or nearly flat, geometry, and its accelerating expansion (explained by steady staters as a consequence of the creation of new matter) (59). (By "seemingly flat," we mean that gigantic triangles in deep space do not vary from 180 degrees within measurement error.)
Summarizing:
1. Our concept of informational entropy (noise) increases toward the past and the future because of limits of memory and records -- though "absolute realists" can argue that the effect of observer, though present, can be canceled.
2. Entropy also increases toward the past because of the quantum measurement problem. We can't be sure what "the past" is "really." In a quantum system, phase space may after all be aperiodic so that it is undecidable whether a low entropy state is returned to.
3. It is questionable whether the entropy concept is applicable to the whole universe. I suspect that Goedel's incompleteness theorem and Russell's paradox apply.
Appendix to Part V
Statement:
The set of finite samples of a population is normally distributed.
Proof:
A finite population is described by a set of n elements. If we regard a sample as a subset of the population, then the set of all samples is the power set of the population set. The power set always has 2^n elements, and each subset can be represented as a bit string of length n (any integer-base representation can be reduced to base 2). Whatever the distribution of the population, the power set always displays a symmetric binomial distribution of subset sizes between 0 and n. (We can see this symmetry by the fact that every bit string has a complement, as in 01 and 10.) But, it has been proved that the binomial distribution converges to the normal distribution as n goes to infinity.
Hence, for n sufficiently large (generally put at 30 or greater), the power set of the population set is approximately normally distributed, which is to say that the finite set of samples of an arbitrary population is approximately normally distributed.
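The symmetry claimed in the proof can be checked directly on a small case; here the population size 10 is an arbitrary choice.

from itertools import combinations

population = list(range(10))                  # a small population of n = 10 elements
counts = [sum(1 for _ in combinations(population, k))
          for k in range(len(population) + 1)]

print(counts)                                 # 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1
print("total subsets:", sum(counts), "which is 2^10 =", 2 ** 10)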
Note that if the population is infinite, the set of all samples corresponds to the power set of all positive integers, which is given as the Cantorian cardinal number 2^N, which is equal to the cardinal R for the set of real numbers. In other words, members of an infinite set of samples are one-to-one and onto with the set of real numbers. Yet, most reals are noncomputable, and so most samples are inaccessible.
By the Church-Turing thesis, it is believed that any computation can, in theory, be done by a Turing machine [very basic computer program]. Of course, any output can be represented as a bit string, which represents a unique number. Hence, there is at least one Turing machine for every possible bit string output (inclusive of all the computable irrationals). Each TM is a program the elements of which are assigned integers. As the design elements are in a specific order, these integers are strung together to form the unique description number (DN) of an arbitrary TM. (The uniqueness of the description number can be assured, for example, by using a base 2 representation with two symbols, say r,s, reserving t for the space between base 2 integers and then regarding the string (as in, rrstrssts) as a unique base 3 number; it is also possible in theory to use order-preserving Goedel numbers.)
We see that the set of computables, each computable being tagged with a unique finite integer, is bijective with an infinite subset of the integers. It is known that such a subset is bijective with the entire set N. (Still, a computable may correspond to more than one DN, especially with respect to irrationals, but this does not affect our result.) Now it is not clear that the description bit string for TM_X has a legal mirror image. A design that works in the "forward" direction may not work in the "backward" direction. That is to say, the mirror image bit string might not represent a TM that is able to begin issuing a printout tape in finite time.
Even so, it is clear that if TM_A prints out a bit string, then TM_A can be modified into TM_B to, at every step, swap a u for a v and a v for a u. A single sub-program -- call it tm_x -- should work for all effective TMs, as it simply modifies the output of each effective TM. In other words, tm_x is simply fed the output for an effective TM, meaning DN_x is simply appended to DN_X. We will say that TM_B has DN_X,x. (To assure that a number exists, we place a binary point at the head of any string.)
(We needn't concern ourselves with the issue that TM_A's output can be recovered via a TM with DN_X,x,x.)
The length of DN_x is a constant, and its ratio to the lengths of the DNs of effective TMs goes to 0. And so as n gets larger, the difference between DN_X,x and DN_X becomes vanishingly small. So that for an "effective" DN_X of length n, there is an "effective" DN_X,x of length m such that n and m are approximately equal. This means that we have a nearly symmetrical binomial graph, with the set of DN_X's on one side of the median in descending order by height and the set of DN_X,x's on the other side of the median in descending order.
Hence, the set of DNs for computables has close to a binomial distribution when n is finite and a normal distribution when n is (denumerably) infinite. This shows that the set of descriptions of accessible samples of a denumerably infinite set, whatever its density function, is normally distributed.
Caveat i: This result requires acceptance of DN_X,x, while ignoring DN_X,x...x.
Caveat ii: If we ignore the infinitude of zeros that can be placed to the right of a base k rational found between 0 and 1, then each rational has a finite bit representation (assuming there is a number corresponding to an instruction that says "this cycle repeats indefinitely so the machine halts here"). This subset is bijective with the integers. We also have the subset of computable irrationals, for which a TM prints out a base k digit at every step n and the "halt" number is never computed. This subset is also bijective with the integers. So in this sense, we cannot say that the set of computables, which is the set of samples, is normally distributed (we cannot define a symmetric binomial curve with this data).
Yet, if we permit only samples of finite size, then the set of samples is normally distributed.
52. A previous Conant discussion on entropy:
http://paulpages.blogspot.com/2013/10/drunk-and-disorderly-rise-and-fall-of.html
53aa. Bruce Hood is director of Bristol Cognitive Development Centre, University of Bristol, and author of The Self Illusion: how the social brain creates identity (Oxford, 2012).
54. Jan von Plato in The Probability Revolution Vol 2: ideas in the sciences. Kruger, Gigerenzer, and Morgan, editors (MIT Press 1987).
54a. Referenced in a Wikipedia account and in a number of other internet pages.
55. E.T. Jaynes: Papers on probability, statistics and statistical physics, R.D. Rosenkrantz editor (D. Reidel 1983).
55aa. Science and Information Theory, Second Edition, by Leon Brillouin (Dover 2013 reprint of Academic Press 1962 edition; first edition, 1956).
56. Jaynes, Papers
56a. See Gravitation by Charles W. Misner, Kip S. Thorne and John Archibald Wheeler (W.H. Freeman, 1973). The exhaustive tome on general relativity is sometimes known as "the phone book."
56bb. Science and Information Theory, Brillouin.
56b. Science and Information Theory, Brillouin.
56c. This set of noise definitions was in part taken from Internetworking: A Guide to Network Communications by Mark A. Miller (M&T Books, 1991).
57. Science and Information Theory, Brillouin.
58. Cycles of Time: An extraordinary new view of the universe by Roger Penrose (The Bodley Head, 2010).
59. Conflict in the Cosmos: Fred Hoyle's life in science by Simon Mitton (Joseph Henry Press, 2005).
60. Chaos Theory Tamed by Garnett P. Williams (Joseph Henry Press, 1997).
61. Alternative names for K-S entropy, given by Williams, are source entropy, measure-theoretic entropy, metric-invariant entropy and metric entropy.
Part VI
Noumena I: Spacetime and its discontents
Newton with his belief in absolute space and time considers motion a proof of the creation of the world out of God's arbitrary will, for otherwise it would be inexplicable why matter moves in this [relative to a fixed background frame of reference] rather than any other direction. -- Hermann Weyl (60).
Weyl, a mathematician with a strong comprehension of physics, had quite a lot to say about spacetime. For example, he argued that Mach's principle, as adopted by Einstein, was inconsistent with general relativity.
Background on Weyl
http://plato.stanford.edu/entries/weyl/#LawMotMacPriWeyCosPos
Weyl's book Symmetry online
https://archive.org/details/Symmetry_482
See also my paper,
Einstein, Sommerfeld and the Twin Paradox
https://cosmosis101.blogspot.com/2017/10/einstein-sommerfeld-and-twin-paradox.html
Einstein had hoped to deal only with "observable facts," in accord with Mach's empiricist (and logical positivist) program, and hence to reduce spacetime motions to relative actions among bodies, but Weyl found that such a level of reduction left logical holes in general relativity. One cannot, I suggest, escape the background frame, even if it is not a strictly Newtonian background frame. Sometimes this frame is construed to be a four-dimensional spacetime block.
So how would one describe the "activity" of a four-dimensional spacetime block? Something must be going on, we think, yet, from our perspective looking "out," that something "transcends" space and time.
Popper, in his defense of phenomenal realism, objected that the spacetime block interpretation of relativity theory implies that time and motion are somehow frozen, or not quite real. While not directly challenging relativity theory, he objected to such a manifold and ruled it out as not in accord with reality as he thought reality ought to be. But, we hasten to make note of Popper's trenchant criticism of the logical positivism of most scientists.
My thought: "Laws" of nature, such as Newton's law of universal gravitation, are often thought of in a causative sense, as in "the apple drops at 9.81 meters per second squared by cause of the law of gravity."
Actually, the law describes a range of phenomena that are found to be predictable via mathematical formulas. We have a set of observable relations "governed" by the equations. If something has mass or momentum, we predict that it will follow a calculated trajectory. But, as Newton himself knew, he had an algorithm for representing actions in nature; he had not got at the world beneath superficial appearances. How does gravity exercise action at a distance? If you say, via curved spacetime fields, one may ask, how does spacetime "know" to curve?
We may say that gravity is a principal cause of the effect of a rock falling. But, in truth, no one knows what gravity is. "Gravity" is a word used to represent behavior of certain phenomena, and that behavior is predictable and calculable, though such predictability remains open to Hume's criticism.
On this line, it should be noted that Einstein at first resisted basing what became his General Theory of Relativity on geometrical (or, topological) representations of space and time. He thought that physical insight should accompany his field equations, but eventually he settled on spacetime curvature as insight enough. His competition with David Hilbert may well have spurred him to drop that proviso. Of course, we all know of his inability to accept the lack of "realism" implied by quantum mechanics, which got the mathematics right but dispensed with certain givens of phenomenal realism. To this end, we note that he once said that he favored the idea that general relativity's mathematics gave correct answers without accepting the notion that space and time were "really" curved.
Newton had the same problem: There was, to him, an unsatisfactory physical intuition for action at a distance. Some argue that this difficulty has been resolved through the use of "fields," which act as media for wave motion. The electromagnetic field is invoked as a replacement for the ether that Einstein ejected from physics as a useless concept. Still, Einstein saw that the field model was intuitively unsatisfactory.
As demonstrated by the "philosophical" issues raised by quantum theory, the problem is the quantization of energies needed to account for chains of causation. When the energy reaches the quantum level, there are "gaps" in the chain. Hence the issue of causation can't easily be dismissed as a problem of "metaphysics" but is in truth a very important area of discussion on what constitutes "good physics."
One can easily visualize pushing an object, but it is impossible to visualize pulling an object. In everyday experience, when one "pulls," one is in fact pushing. Yet, at the particle level, push and pull are complementary properties associated with charge sign. This fact is now sufficiently familiar as not to seem occult or noumenal. Action at a distance doesn't seem so mysterious, especially if we invoke fields, which are easy enough to describe mathematically, but does anyone really know what a field is? The idea that gravitation is a macro-effect from the actions of gravitons may one day enhance our understanding of nature. But that doesn't mean we really know what's going on at the graviton level.
Gott (61), for example, is representative of numerous physicists who see time as implying many strange possibilities. And Goedel had already argued in the 1940s that time must not exist at all, implying that it is some sort of illusion. Goedel had found a solution to Einstein's field equations of general relativity for a rotating universe in which closed time loops exist, meaning a rocket might travel far enough to find itself in its past. Einstein shrugged off this finding of his good friend, arguing that it does not represent physical reality. But Goedel countered that if such a solution exists at all, then time cannot be what we take it to be and doesn't actually exist (62).
These days, physicists are quite willing to entertain multiple dimension theories of cosmology, as in the many dimensions of string theory and M theory.
We have Penrose's cyclic theory of the cosmos (63), which differs from previous Big Bang-Big Crunch cyclic models. Another idea comes from Paul J. Steinhardt, who proposes an "ekpyrotic universe" model. He writes that his model is based on the idea that our hot big bang universe was formed from the collision of two three-dimensional worlds moving along a hidden, extra dimension. "The two three-dimensional worlds collide and 'stick,' the kinetic energy in the collision is converted to quarks, electrons, photons, etc., that are confined to move along three dimensions. The resulting temperature is finite, so the hot big bang phase begins without a singularity."
Steinhardt on the ekpyrotic universe
http://wwwphy.princeton.edu/~steinh/npr/
The real point here is that spacetime, whatever it is, is rather strange stuff. If space and time "in the extremes" hold strange properties, should we not be cautious about assigning probabilities based on absolute Newtonian space and equably flowing time? It is not necessarily a safe assumption that what is important "in the extremes" has no relevance locally.
And yet, here we are, experiencing "time," or something. The difficulty of coming to grips with the meaning of time suggests that beyond the phenomenal world of appearances is a noumenal world that operates along the lines of Bohm's implicate order, or -- to use his metaphor -- of a "holographic universe."
But time is even more mind-twisting in the arena of quantum phenomena (as discussed in Noumena II, below).
The "anthropic cosmological principle" has been a continuing vexation for cosmologists (64). Why is it that the universe seems to be so acutely fine-tuned to permit and encourage human life? One answer is that perhaps we are in a multiverse, or collection of noninteracting or weakly interacting cosmoses. The apparent miniscule probability that the laws and constants are so well suited for the appearance of humans might be answered by increasing the number and variety of cosmoses and hence increasing the distribution of probabilities for cosmic constants.
The apparent improbability of life is not the only reason physicists have for multiverse conjectures. But our concern here is that physicists have used probabilistic reasoning on a question of the existence of life. This sort of reasoning is strongly reminiscent of Pascal's wager and I would argue that the question is too great for the method of probability analysis. The propensity information is far too iffy, if not altogether zero. Yet, that doesn't mean the problem is without merit. To me, it shows that probability logic cannot be applied universally and that it is perforce incomplete. It is not only technically incomplete in Goedel's sense, it is incomplete because it fundamentally rests on the unknowable.
Paul Davies, in the Guardian newspaper, wrote: "The multiverse comes with a lot of baggage, such as an overarching space and time to host all those bangs, a universe-generating mechanism to trigger them, physical fields to populate the universes with material stuff, and a selection of forces to make things happen. Cosmologists embrace these features by envisaging sweeping 'meta-laws' that pervade the multiverse and spawn specific bylaws on a universe-by-universe basis. The meta-laws themselves remain unexplained -- eternal, immutable transcendent entities that just happen to exist and must simply be accepted as given. In that respect the meta-laws have a similar status to an unexplained transcendent god." Davies concludes, "Although cosmology has advanced enormously since the time of Laplace, the situation remains the same: there is no compelling need for a supernatural being or prime mover to start the universe off. But when it comes to the laws that explain the big bang, we are in murkier waters."
Davies on the multiverse
http://www.theguardian.com/commentisfree/belief/2010/sep/04/stephen-hawking-big-bang-gap
Noumena II: Quantum weirdness
The double-slit experiment
The weird results of quantum experiments have been known since the 1920s and are what led Werner Heisenberg to his breakthrough mathematical systematization of quantum mechanics.
An example of quantum weirdness is the double-slit experiment, which can be performed with various elementary particles. Consider the case of photons, in which the intensity of the beam is reduced to the point that only one photon at a time is fired at the screen with the slits. In the case where only one slit is open, the photo-plate detector on the other side of the screen will record basically one spot where the photons that make it through the slit arrive in what one takes to be a straight line from source to detector.
But when both slits are open, the photons are detected at many different places on the plate. The positions are not fully predictable, and so are random within constraints. After a sufficient number of detections, the trained observer notices a pattern: the spots are building up the diffraction pattern one would expect if a wave passing through the two slits separated into two subwaves that then interfered, showing regions of constructive and destructive wave interference. Yet the components of these "waves" are the isolated detection events. From this effect, Max Born described the action of the particles in terms of probability amplitudes -- that is, waves of probability.
This is weird because there seems to be no way, in terms of classical causality, for the individual detection events to signal to the incoming photons where they ought to land. It also hints that the concept of time isn't what we typically take it to be. In other words, one might interpret this result to mean that once the pattern is noticed, one cannot ascribe separate time units to each photon (which seems to be what Popper, influenced by Landé, was advocating). Rather, it might be argued that after the fact the experiment must be construed as an irreducible whole. This bizarre result occurs whether the particles are fired off at the velocity of light or well below it.
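To make Born's idea a little more concrete, here is a minimal numerical sketch of the idealized two-path rule: each slit contributes a complex amplitude at a point on the plate, and the detection probability goes as the squared magnitude of the summed amplitudes, which yields fringes, whereas simply adding the two one-slit probabilities yields none. The wavelength, slit separation and screen distance are arbitrary illustrative values, not parameters of any actual apparatus.

# Idealized two-slit sketch: detection probability from summed complex amplitudes.
# All lengths are in meters and are arbitrary illustrative choices.
import numpy as np

wavelength = 500e-9          # assumed photon wavelength
slit_separation = 50e-6      # assumed distance between the two slits
screen_distance = 1.0        # assumed slit-to-plate distance
k = 2 * np.pi / wavelength   # wave number

x = np.linspace(-0.02, 0.02, 9)   # positions on the photographic plate
r1 = np.hypot(screen_distance, x - slit_separation / 2)   # path length from slit 1
r2 = np.hypot(screen_distance, x + slit_separation / 2)   # path length from slit 2

psi1 = np.exp(1j * k * r1)   # amplitude contributed by slit 1 (unit strength)
psi2 = np.exp(1j * k * r2)   # amplitude contributed by slit 2

both_slits = np.abs(psi1 + psi2) ** 2                    # quantum rule: add amplitudes, then square
one_plus_one = np.abs(psi1) ** 2 + np.abs(psi2) ** 2     # "classical" rule: add probabilities

for xi, q, c in zip(x, both_slits, one_plus_one):
    print(f"x = {xi:+.4f} m   amplitudes-then-square: {q:.3f}   probabilities-added: {c:.3f}")

The first column of output oscillates between bright and dark regions; the second is flat, which is the contrast the single-photon experiments make vivid.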
Schroedinger's cat
In 1935, Erwin Schroedinger proposed what has come to be known as the Schroedinger's cat thought experiment in an attempt to refute the idea that a quantum property in many experimental cases cannot be predicted precisely, but can only be known probabilistically prior to measurement. Exactly what does one mean by measurement? In the last analysis, isn't a measurement an activity of the observer's brain?
To underscore how ludicrous he thought the probability amplitude idea is, Schroedinger gave this scenario: Suppose we place a cat in a box that contains a poison gas pellet rigged to a Geiger counter that measures radioactive decay. The radioactive substance has some suitable half-life, meaning there is some probability that a decay is detected within a given time interval, and some probability that it is not.
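As an aside on the probabilistic setup, the chance that the counter has registered a decay by time t follows the familiar exponential law, P(detection by t) = 1 - exp(-lambda*t), with lambda fixed by the half-life. A small sketch, assuming an illustrative one-hour half-life:

# Probability that at least one decay has been detected by time t,
# for a substance with an assumed half-life of 1 hour (illustrative only).
import math

half_life = 1.0                       # hours (assumed)
decay_rate = math.log(2) / half_life  # lambda in P(no decay by t) = exp(-lambda * t)

for t in [0.25, 0.5, 1.0, 2.0]:
    p_detect = 1 - math.exp(-decay_rate * t)
    print(f"after {t:4.2f} h: P(detector has fired) = {p_detect:.3f}")

At exactly one half-life the probability is one half, which is the knife-edge Schroedinger exploited.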
Now, in the standard view of quantum theory there is no routine causation that can be accessed that gives an exact time that the detection will occur. So then, the property (in this case, the time of the measurement) does not exist prior to detection but exists in some sort of limbo in which the quantum possibilities -- detection at time interval x and non-detection at time interval x -- are conceived of as a wave of probabilities, with the potential outcomes superposed on each other.
So then, demanded Schroedinger, does not this logically require that the cat is neither alive nor dead prior to the elapse of the specified time interval?! Of course, once we open the box, the "wave function collapses" and the cat's condition -- dead or alive -- tells us whether the quantum event has been detected. The cat's condition is just as much of a detection event as a photo-plate showing a bright spot.
Does this not then mean that history must be observer-centric? No one was able to find a way out of this dilemma, despite many attempts (see Toward). Einstein conceded that such a model was consistent, but rejected it on philosophical grounds. You don't really suppose the moon is not there when you aren't looking, he said.
The EPR scenario
In fact, also in 1935, Einstein and two coauthors unveiled another attack on quantum weirdness, known as the Einstein-Podolsky-Rosen (EPR) thought experiment, in which the authors pointed out that quantum theory implies what Einstein called "spooky action at a distance" -- an influence that would outrun c, the velocity of light in a vacuum, which is an anchor of his theory of relativity. Later, John Bell found a way to frame a statistical test of whether measured correlations would uphold the "spooky" quantum predictions. Experiments by Alain Aspect in the 1980s, and by others since, have confirmed, to the satisfaction of most experts, that quantum "teleportation" occurs.
So we may regard a particle as carrying a potential for some property or state that is only revealed upon detection. That is, the experiment "collapses the wave function" in accordance with the property of interest. Curiously, it is possible to "entangle" two particles of the same type at some source. The quantum equations require that each particle carries the complement property of the other particle -- even though one cannot in a proper experiment predict which property will be detected first.
Bohm's version of EPR is easy to follow: An electron has a property called "spin." Just as a screw may rotate left or right, so an electron's spin is given as "up" or "down," which is where it will be detected in a Stern-Gerlach device. There are only two possibilities, because the electron's rotational motion is quantized into halves -- as if the rotation jumps immediately to its mirror position without any transition, just as the Bohr electron has specific discontinuous "shells" around a nucleus.
Concerning electron spin
http://hyperphysics.phy-astr.gsu.edu/hbase/spin.html
So if we entangle electrons at a source and send them in different directions, then quantum theory declares that if we detect spin "up" at detector A, detector B must read spin "down."
In that case, as Einstein and his coauthors pointed out, doesn't that mean that the detection at A required a signal to reach B faster than the velocity of light?
For decades, EPR remained a thought experiment only. A difficulty was that detectors and their related measuring equipment tend to be slightly cranky, giving false positives and false negatives. Error-correction methods might have reduced that problem, but it wasn't until Bell introduced his statistical inequalities that the possibility arose of conducting actual tests of correlation.
In the early 1980s Aspect arranged photon experiments that tested Bell's inequalities and made the sensational discovery that the correlations showed Einstein to be wrong: detection of one property strongly implied that the "co-particle" would be detected with the complement property. (We should tip our hats to both Schroedinger and Einstein for the acuity of their thought experiments.) Further, Aspect did experiments in which the detectors were arranged so that any signal from one particle to the other would have had to exceed the velocity of light. Even so, the spooky results held.
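For readers who want a feel for the statistics at stake in such tests, here is a minimal sketch of the textbook singlet-state correlation, E(a,b) = -cos(a - b), evaluated at the standard CHSH angles. Any local hidden-variable account must keep the CHSH combination within 2, while the quantum prediction reaches 2*sqrt(2); an excess of this kind is what the Aspect-type experiments reported. This is the idealized arithmetic only, not a simulation of the actual optical experiments (which used photon polarization rather than electron spin).

# CHSH arithmetic for an idealized spin-singlet pair.
# E(a, b) is the quantum-predicted correlation of spin measurements
# along directions a and b; local hidden variables force |S| <= 2.
import math

def E(a, b):
    return -math.cos(a - b)   # textbook singlet correlation

# Standard CHSH measurement angles (radians) -- the usual optimal choice.
a, a2 = 0.0, math.pi / 2
b, b2 = math.pi / 4, 3 * math.pi / 4

S = E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)
print(f"CHSH value S = {S:.3f}  (|S| = {abs(S):.3f}; classical bound 2, quantum maximum {2 * math.sqrt(2):.3f})")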
This property of entanglement is being introduced into computer security regimens: if, say, the NSA or another party is looking at the data stream, entangled particles can be used to tip off the sender that the stream is being observed.
Hidden variables
John Von Neumann, contradicting Einstein, published a proof that quantum theory was complete in Heisenberg's sense and that "hidden variables" could not be used to devise causal machinery to explain quantum weirdness. Intuitively, one can apprehend this by noting that if one thinks of causes as undetected force vectors, then Planck's constant means that there is a minimum on the amount of force (defined in terms of energy) that can register in any detection or observation. If we think of causes in terms of rows of dominoes fanning out and at points interacting, we see there is nothing smaller than the "Planck domino." So there are bound to be gaps in what we think of as "real world causation."
Popper objected to Von Neumann's claim on grounds that after it was made, discoveries occurred in the physics of the nucleus that required "new" variables. Yet if hidden variables are taken to mean the forces of quantum chromodynamics and the other field theories, these have no direct influence on the behaviors of quantum mechanics (now subsumed within quantum field theory). Also, these other theories are likewise subject to quantum weirdness, so if we play this game, we end up with a level where the "variables" run out.
We should note that by "hidden variable," Von Neumann evidently had in mind the materialist viewpoint of scientists like Bohm, whose materialism led him to reject the minimalist ideas of the "Copenhagen interpretation" whereby what one could not in principle observe simply doesn't count. Instead, Bohm sought what might be called a pseudo-materialist reality in which hidden variables are operative if one concedes the bilocality inherent in entanglement. In fact, I tend to agree with Bohm's view of some hidden order, as summarized by his "holographic universe" metaphor. On the other hand, I do not agree that he succeeded in his ambition to draw a sharp boundary between the "real external world" and subjective perception.
Bohm quotes John Archibald Wheeler:
"No phenomenon is a phenomenon until it is an observed phenomenon" so that "the universe does not exist 'out there' independently of all acts of observation. It is in some strange sense a participatory universe. The present choice of mode of observation... should influence what we see about the past... the past is undefined and undefinable without the observation" (65) [Wheeler's point is a major theme of my paper
Toward.]
"We can agree with Wheeler that no phenomenon is a phenomenon until it is observed, as by definition, a phenomenon is what appears. Therefore it evidently cannot be a phenomenon unless it is the content of an observation," Bohm says, adding, "The key point in an ontological interpretation such as ours is to ask the question as to whether there is an underlying reality that exists independently of observation" (66).
Bohm argues that a "many minds" interpretation of quantum effects removes "many of the difficulties with the interpretation of [Hugh] Everett and [Bryce] DeWitt (67), but requires making a theory of mind basically to account for the phenomena of physics. At present we have no foundations for such a theory..." He goes on to find fault with this idea.
And yet, Bohm sees that "ultimately our overall world view is neither absolutely deterministic nor absolutely indeterministic," adding: "Rather it implies that these two extremes are abstractions which constitute different views or aspects of the overall set of appearances" (68).
So perhaps the thesis of determinism and the antithesis of indeterminism resolve in the synthesis of the noumenal world. In fact, Bohm says observables have no fundamental significance and prefers an entity dubbed a "be-able," again showing that his "implicate order" has something in common with our "noumenal world." And yet our conceptualization is at root more radical than is his.
One specialist in relativity theory, Kip S. Thorne (69), has expressed a different take. Is it possible that the spacetime continuum, or spacetime block, is multiply connected? After all, if, as relativity holds, spacetime is expressed by a Riemannian geometry, then naive Euclidean space is not operative, except over regions small enough that curvature is negligible. So in that case, it shouldn't be all that surprising that spacetime might have "holes" connecting one region to another. Where would such wormholes be most plausible? In black holes, Thorne says. By this, the possibility of a "naked singularity" is addressed. The singularity is the point at which Einstein's field equations cease to be operative; the presumed infinitely dense point at the center of mass doesn't exist because the wormhole ensures that the singularity never occurs; it smooths out spacetime (70).
One can see an analog of this by considering a sphere, which is the surface of a ball. A wormhole would be analogous to a straight-line tunnel connecting Berlin and London by bypassing the curvature of the Earth. So on this analogy, one can think of such tunnels connecting different regions of spacetime. The geodesic -- analogous to a great circle on a sphere -- yields the shortest distance between points in Einstein spacetime. But if we posit a manifold, or cosmic framework, of at least five dimensions then one finds shortcuts, topologically, connecting distinct points on the spacetime "surface." Does this accord with physical reality? The answer is not yet in.
Such wormholes could connect different points in time without connecting different regions of space, thereby setting up a time travel scenario, though Thorne is quoted as arguing that his equation precludes time travel paradoxes.
Thorne's ideas on black holes and wormholes
https://en.wikipedia.org/wiki/Kip_Thorne
The standard many-worlds conjecture is an interpretation of quantum mechanics that asserts that a universal wave function represents objective phenomenal reality. So there is no intrinsically random "collapse of the wave function" when a detection occurs. The idea is to be rid of the Schroedinger cat scenario by requiring that in one world the cat is alive and in another it is dead. The observer's world is determined by whether he detects cat dead or cat alive. These worlds are continually unfolding.
The key point here is the attempt to return to a fully deterministic universe, a modern Laplacian clockwork model. Yet, as the observer is unable to foretell which world he will end up in, his ignorance (stemming from randomness1 and randomness2) is tantamount to intrinsic quantum randomness (randomness3).
In fact, I wonder how much of a gain there is in saying Schroedinger's cat was alive in one world and dead in another prior to observation as opposed to saying the cat was in two superposed states relative to the observer.
On the other hand it seems likely that Hawking favors the notion of a universal wave function because it implies that information represents hard, "external" reality. But even so, the information exists in superposed states as far as a human observer is concerned.
At present, there is no means of calculating which of Everett's worlds the cat's observer will find himself in. He can only apply the usual quantum probability methods.
Time-bending implications
What few have understood about Aspect's verification of quantum results is that time itself is subject to quantum weirdness.
A logical result of the entanglement finding is this scenario:
We have two detectors: A, which is two meters from the source and B, which is one meter distant. You are positioned at detector A and cannot observe B. Detector A goes off and registers, say, spin "down." You know immediately that Detector B must read spin "up" (assuming no equipment-generated error). That is to say, from your position, the detector at B went off before your detector at A. If you like, you may in principle greatly increase the scale of the distances to the detectors. It makes no difference. B seems to have received a signal before you even looked at A. It's as if time is going backward with respect to B, as far as you are concerned.
Now it is true that a great many physicists agree with Einstein in disdaining such scenarios, and assume that the picture is incomplete. Still, incomplete or not, the fact is that the observer's sense of time is stood on its head. And this logical implication is validated by Aspect's results.
Well, you may respond that it might be that the "information" is transmitted instantaneously. In other words, the instant the detector nearer the source goes off, the result is transmitted at infinite velocity to the farther detector. In that case, however, we are left agape. Instantaneous motion isn't really motion, it is translation without motion, as when Jesus translated a boatful of followers across the Sea of Galilee, with no sense of time passing.
On the other hand, translation is a good word to describe quantum jumps, whereby a quantum particle climbs the ladder of energy levels with no motion between rungs -- as if each quantum state were a still frame in a cinema film. And there seems to be no limit, in principle, to the length of some quantum jumps. What happens between two successive quantum states? It appears that time, as well as time's complement, motion, is suspended: successive quantum states are a succession of frozen frames in a "higher manifold" reel. We have the weird effect that the quantum particle "moves" along some finite length as a succession of frozen states, in a weird update of Zeno's paradoxes of motion. But then, what the deuce happened to time and motion?
So if a quantum translation occurred between the two particles such that the nearer detector instantly alerted the farther detector, we may at first think we have preserved the arrow of time. But, on the other hand, we preserve that arrow at the expense of accepting the annihilation of space and time between two "events," implying that these events are linked by some noumenal tunnel that is undetectable in the phenomenal world.
So preserving the arrow of time doesn't get rid of quantum weirdness, though it does annihilate time.
Let us extend our experimentally doable scenario with a thought experiment reminiscent of Schroedinger's cat. Suppose you have an assistant stationed at detector B, at X kilometers from the source. You are at X/2 kilometers from the source. Your assistant is to record the detection as soon as the detector goes off, but is to wait for your call to report the result. As soon as you look at A, you know his property will be the complement of yours. So was he in a superposed state with respect to you? Obnoxious as many find this, the logical outcome, based on the Aspect experiments and quantum rules, is yes.
True, you cannot in relativity theory receive the information from your assistant faster than C, thus presenting the illusion of time linearity. And yet, I suggest, neither time nor our memories are what we suppose them to be.
The amplituhedron
When big particle accelerators were introduced, it was found that Richard Feynman's diagrams, though conceptually useful, were woefully inadequate for calculating actual particle interactions. As a result, physicists have introduced a remarkable calculational tool called the "amplituhedron." This is a topological object that exists in higher-dimensional space. Particles are assumed to follow the rules in this object, and not the rules of mechanistic or pseudo-mechanistic and continuous Newtonian and Einsteinian spacetime.
Specifically, it was found that the scattering amplitude equals the volume of this object. The details of a particular scattering process dictate the dimensionality and facets of the corresponding amplituhedron.
It has been suggested that the amplituhedron, or a similar geometric object, could help resolve the perplexing lack of commensurability of particle theory and relativity theory by removing two deeply rooted principles of physics: locality and unitarity.
“Both are hard-wired in the usual way we think about things,” according to Nima Arkani-Hamed, a professor of physics at the Institute for Advanced Study in Princeton. “Both are suspect.”
Locality is the notion that particles can interact only from adjoining positions in space and time. And unitarity holds that the probabilities of all possible outcomes of a quantum mechanical interaction must add up to one. The concepts are the central pillars of quantum field theory in its original form, but in certain situations involving gravity, both break down, suggesting neither is a fundamental aspect of nature.
At this point I interject that an axiom of nearly all probability theories is that the probabilities of the outcome set must sum to 1. So if, at a fundamental, noumenal level, this axiom does not hold, what does this bode for the whole concept of probability? At the very least, we sense some sort of nonlinearity here. (At this point we must acknowledge that quantum physicists have for decades used negative probabilities with respect to the situation before the "collapse of the wave function," but the requirement that the whole add up to unity is preserved.)
Mark Burgin on negative probabilities
http://arxiv.org/ftp/arxiv/papers/1008/1008.1287.pdf
Wikipedia article on negative probabilities
https://en.wikipedia.org/wiki/Negative_probability
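As a toy illustration of how "negative probabilities" can be entertained while the total still comes to 1, consider the following made-up quasi-probability table over two two-valued variables: one cell is negative, yet every marginal is non-negative and the grand total is unity. The numbers are invented for illustration and are not drawn from any particular quantum calculation.

# Toy quasi-probability table: one negative cell, non-negative marginals, total = 1.
# The cell values are invented for illustration; they are not from a physical model.
quasi = {
    ("x=0", "y=0"):  0.6,
    ("x=0", "y=1"): -0.1,   # the "negative probability" cell
    ("x=1", "y=0"):  0.3,
    ("x=1", "y=1"):  0.2,
}

total = sum(quasi.values())
marg_x0 = quasi[("x=0", "y=0")] + quasi[("x=0", "y=1")]
marg_x1 = quasi[("x=1", "y=0")] + quasi[("x=1", "y=1")]
marg_y0 = quasi[("x=0", "y=0")] + quasi[("x=1", "y=0")]
marg_y1 = quasi[("x=0", "y=1")] + quasi[("x=1", "y=1")]

print(f"grand total: {total:.3f}")   # 1.000 -- unity preserved
print(f"marginals: {marg_x0:.3f} {marg_x1:.3f} {marg_y0:.3f} {marg_y1:.3f}")   # all non-negative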
According to the article linked below, scientists have also found a "master amplituhedron" with an infinite number of facets, analogous to a circle in 2D regarded as a polygon with infinitely many sides. This amplituhedron's volume represents, in theory, the total amplitude of all physical processes. Lower-dimensional amplituhedra, which correspond to interactions between finite numbers of particles, are conceived of as existing on the faces of this master structure.
“They are very powerful calculational techniques, but they are also incredibly suggestive,” said one scientist. “They suggest that thinking in terms of space-time was not the right way of going about this.”
“We can’t rely on the usual familiar quantum mechanical spacetime pictures of describing physics,” said Arkani-Hamed. “We have to learn new ways of talking about it. This work is a baby step in that direction.”
So it indeed looks as though time and space are in fact some sort of illusion.
In my estimate, the amplituhedron is a means of detecting something of the noumenal world that is beyond the world of appearances or phenomena. Quantum weirdness implies that interactions are occurring in a way and place that do not obey our typical perceptual conceits. It's as if, in our usual perceptual state, we are encountering the "shadows" of "projections" from another "manifold."
Simons Foundation article on the amplituhedron
https://www.simonsfoundation.org/quanta/20130917-a-jewel-at-the-heart-of-quantum-physics/
The spacetime block of relativity theory likewise suggests that there is a realm that transcends ordinary energy, time and motion.
Zeno's paradox returns
Motion is in the eye of the beholder.
When an object is lifted to height n, it has a specific potential energy definable in terms of Planck's energy constant. Hence, only the potential energies associated with multiples of Planck's constant are permitted. In that case, only heights associated with those potential energies are permitted. When the object is released and falls, its kinetic energy increases with the acceleration. But the rule that only multiples of Planck's constant are permitted means that there is a finite number of transition heights before the object hits the ground. So what happens between quantum height y and quantum height y - 1?
No doubt Zeno would be delighted with the answer:
The macro-object can't cross these quantum "barriers" via what we think of as motion. The macro-object makes a set of quantum jumps across each "barrier," exactly like electrons in an atom jumping from one orbital probability shell to another.
Here we have a clear-cut instance of the "macro-world" deceiving us, when in fact "motion" must occur in quantum jumps. This is important for not only what it says about motion, but also because it shows that the macro-world is highly -- not minimally -- interactive with the "quantum world." Or, that is, that both are highly interactive with some noumenal world that can only be apprehended indirectly.
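To put rough numbers on why such jumps escape everyday notice, one can borrow the textbook "quantum bouncer" (a mass bouncing on a hard floor under gravity), used here only as a stand-in for the height-quantization argument above, not as a derivation of it. Its allowed energies are E_n = (hbar^2 m g^2 / 2)^(1/3) a_n, with a_n the magnitudes of the zeros of the Airy function. For a neutron the lowest levels are roughly a picoelectronvolt apart, an effect actually observed with ultracold neutrons; for a one-kilogram ball the corresponding "height steps" come out far smaller than an atomic nucleus, which is why macroscopic motion looks perfectly continuous.

# "Quantum bouncer": allowed energies of a mass bouncing on a floor under gravity.
# E_n = (hbar^2 * m * g^2 / 2)**(1/3) * a_n, with a_n the magnitudes of the
# zeros of the Airy function Ai. This standard textbook result is used here
# as a stand-in for the height-quantization argument in the text.
HBAR = 1.054571817e-34   # J s
G = 9.80665              # m / s^2
EV = 1.602176634e-19     # J per electronvolt
AIRY_ZEROS = [2.33811, 4.08795, 5.52056]   # first three |a_n| (tabulated values)

def bouncer_levels(mass_kg):
    scale = (HBAR**2 * mass_kg * G**2 / 2.0) ** (1.0 / 3.0)   # energy scale in joules
    return [scale * a for a in AIRY_ZEROS]

for label, m in [("neutron", 1.67492749804e-27), ("1 kg ball", 1.0)]:
    levels = bouncer_levels(m)
    spacing = levels[1] - levels[0]
    height_step = spacing / (m * G)    # height difference corresponding to that energy gap
    print(f"{label}: E1 = {levels[0] / EV:.3e} eV, "
          f"E2 - E1 = {spacing / EV:.3e} eV, height step ~ {height_step:.3e} m")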
Even in classical physics, notes Popper in his attack on the Copenhagen interpretation, if acceleration is measured too finely, one finds one gets an indeterminate value, as in a = 0/0 (71).
Even on a cosmic scale, quantum weirdness is logically required.
Cosmic weirdness
Suppose we had a theory of everything (ToE) algorithm. Then at proper time t_a we will be able to get a snapshot of the ToE waveform -- obtained from the evolving net ToE vector -- from t_a to t_b. It is pointless to decompose the waveform below the threshold set by Planck's constant. So the discrete superpositions of the ToE, which might be used to describe the evolution of the cosmos, cannot be reduced to some continuum level. If they could be reduced infinitely, then the cosmic waveform would in effect represent a single history. But, the fact that the waveform is composed of quantum harmonics means that more than one history (and future) is in superposition.
In this respect, we see that quantum theory requires "many universes," though not necessarily in the sense of Hugh Everett or of those who posit "bubble" universes.
Many will object that what we have is simply an interpretation of the meaning of quantum theory. But, I reply that once hidden variables are eliminated, and granted the success of the Aspect experiments, quantum weirdness logically follows from quantum theory.
EPR, action at a distance, special relativity and the fuzziness of motion and time and of the cosmos itself, all suggest that our reality process only reflects but cannot represent noumenal reality. In other words, what we visualize and see is not what's actually behind what we visualize and see (what we perceive is not, as Kant says, the thing-in-itself). Quantum theory gives us some insight into how noumena are mapped into three- and four-dimensional phenomena, but much remains uncharted.
So if phenomenon A correlates with phenomenon B, we may be able to find some algorithm that predicts this and other outcomes with a probability near 1. But if A and B are phenomena with a relation determined in a noumenal "world," then what is to prevent all sorts of oddities that make no sense to phenomenalist physicists? Answer: If so, it might be difficult to plumb such a world, just as a shadow only discloses some information about the object between the projection and the light source.
Physicists are, I would say, somewhat more likely to accept a nonlinearity in causality than are scientists in general. For example, Brian Josephson, a Nobel laureate in physics, favors a radical overhaul of physical knowledge by taking into account such peculiarities as outlined by John A. Wheeler, who proposes a "participatory universe." Josephson believes C.S. Peirce's semiotics combined with a new approach to biology may help resolve the impasses of physics, such as the evident incommensurability of the standard model of particle physics with the general theory of relativity.
Josephson on a 'participatory universe'
http://arxiv.org/pdf/1108.4860v4.pdf
And Max Tegmark argues that the cosmos has virtually zero algorithmic information content, despite the assumption that "an accurate description of the state of the universe appears to require a mind-bogglingly large and perhaps even infinite amount of information, even if we restrict our attention to a small subsystem such as a rabbit."
But, he says that if the Schroedinger equation is universally valid, then "decoherence together with the standard chaotic behavior of certain non-linear systems will make the universe appear extremely complex to any self-aware subsets that happen to inhabit it now, even if it was in a quite simple state shortly after the big bang."
Tegmark's home page
http://space.mit.edu/home/tegmark/home.html
Roger Penrose has long been interested in the "huge gap" in the understanding of physics posed by the Schroedinger's cat scenario. He sees this issue as strongly suggestive of a quantum influence in consciousness -- consciousness being crucial to the collapse of Schroedinger's wave function.
He and Stuart Hameroff, an anesthesiologist, propose that microtubules in the brain are where the relevant quantum activities occur (a notion that has gained little professional acceptance).
Even though Penrose is attempting to expunge the problem of superposed realities from physics with his novel proposal, the point to notice here is that he argues that the quantum enigma is indicative of something beyond current physical knowledge that must be taken into account. The conscious mind, he claims, is not at root doing routine calculations. That chore is handled by the unconscious autonomic systems, he says.
In our terms, he is pointing to the existence of a noumenal world that does not operate in the routine "cause-effect" mode of a calculational model.
The 'Orch OR' model for consciousness
http://www.quantumconsciousness.org/sites/default/files/1998%20Hameroff%20Quantum%20Computation%20in%20Brain%20Microtubules%20The%20Penrose%20Hameroff%20Orch%20OR%20model%20of%20consciousness%20-%20Royal%20Society_0.pdf
Penrose talk on quantum activity in consciousness
https://www.youtube.com/watch?v=3WXTX0IUaOg
On the other hand, there has always been a strong belief in non-illusional reality among physicists. We have Einstein and Popper as notable examples. Popper was greatly influenced by Alfred Landé, whose strong opposition to the Copenhagen interpretation is spelled out in books published well before Aspect's experiments had confirmed bilocality to the satisfaction of most physicists (72).
Yet, the approach of many probability theorists has been to ignore these sorts of implications. Carnap's attitude is typical. In his Logical Foundations of Probability (73), Carnap mentions a discussion by James Jeans of the probability waves of quantum mechanics, which Jeans characterizes as "waves of knowledge," implying "a pronounced step in the direction of mentalism" (74).
But Carnap breezes right past Jeans's point, an omission that I hazard to guess calls into question the logical foundation of Carnap's whole system -- though I note that I have not attempted to plow through the dense forest of mathematical logic symbols in Carnap's book.
I have tried to address some of the issues in need of exploration in my paper Toward, which discusses the reality construction process and its implications. The paper you are reading is intended as a companion to that paper:
Toward a Signal Model of Perception
https://cosmosis101.blogspot.com/2017/06/toward-signal-model-of-perception.html
We have two opposing tendencies: On the one hand, our experiments require that detections occur according to a frequency-style "probability wave," in which the probability of a detection is constrained by the square of the wave amplitude. If numerous trials are done, the law of large numbers will come into effect in, say, the correlation found in an Aspect experiment. So our sense is that quantum probabilities are intrinsic, and that quantum randomness is fundamental. That is to say, quantum propensities verify an "objective external reality."
On the other hand, the logical implication of such randomness -- as demonstrated in the double-slit and Aspect experiments -- is that what we call reality must be more subjective than usually supposed, that various histories (and hence futures) are superpositions of potential outcomes and do not actualize until observation in the form of cognitive focus (which may be, for all we know, partly unconscious). So one's mental state and train of thought must influence -- after the manner of the science fiction film The Matrix -- one's perceived world. This is the asymmetric three-sink vector field problem. Where will the fixed point (as in center of mass or gravity) be found?
So then assignment of probabilities may seem to make sense, but only if one neglects the influence of one's mind on the perceived outcomes. As long as you stay in your assumed "world," probability calculations may work well enough -- as you inhabit a world that "goes in that direction" (where many people micromanage "the future" in terms of probabilities).
This logical outcome of course is what Popper and many others have objected to. But, despite a great many attempts to counter this line of thought, none have succeeded (as I argue in Toward). At the end of his career, Popper, faced with the Aspect results, was reduced to a statement of faith: even if bilocality holds, his belief in classical realism would not be shaken.
He points to the horror of Hiroshima and Nagasaki and the "real suffering" of the victims as a moral reason to uphold his objectivism. In other words, he was arguing that we must not trivialize their pain by saying our perception of what happened is a consequence of some sort of illusion (75).
Einstein, it seems clear, implicitly believed in a phenomenal world only, though his own progress with relativity required the ditching of such seemingly necessary phenomena as the ether for mediating light waves. In his more mature period, he conceded that the spacetime continuum devised by himself and Minkowski was in effect an ether. My estimate is that Einstein mildly conceded a noumenal world, but resisted a strong dependence on such a concept. Bohm, who favored a form of "realism," settled on a noumenal world with the analogies of a holographic universe and of the "implicate" order shown by an ink blob that unravels when spun in a viscous fluid and is pretty much exactly restored to its original state when the spin is reversed. Phenomena are observed because of some noumenal relation.
So we say there is some physical process going on that we can only capture in part. By probability wave, we mean that we may use a wave model to represent what we can know about the unfolding of the process. The probability wave on the one hand implies an objective reality but on the other a reality unfolding in a feedback loop within one's brain-mind.
Waves imply change. But whatever is going on in some noumenal realm is partly predictable in terms of probability of observed properties. That is to say, the probability wave function is a means of partly predicting event observations but we cannot say it models the noumenal process precisely or at all.
As Jeans put it:
"Heisenberg attacked the enigma of the physical universe by giving up the main enigma -- the nature of the physical universe -- as insoluble, and concentrating on the minor puzzle of co-ordinating our observations of the universe. Thus it is not surprising that the wave picture which finally emerged should prove to be concerned solely with our knowledge of the universe as obtained from our observations (76)."
This pragmatic idea of ignoring the noumenal, or as some might prefer, sub-phenomenal, world has been largely adopted by practicing scientists, who also adopt a background assumption that discussion of interpretation is philosophy and hence outside science. They accept Popper's view that there exists a line dividing science from meta-science and similarly his view that interpretations are not falsifiable. And yet, a counterexample to that belief is the fact that Einstein's interpretation barring bilocality was falsified, in the minds of most physicists, by the experiments of Aspect.
The importance of brain teasers
Consider the Monty Hall problem.
The scenario's opener: The contestant is shown three curtains and told that behind one is a new car and behind each of the others is an old boot. She is told to choose one of the three curtains, which we will label, from left to right, A, B and C.
She chooses B.
Monty opens curtain A and reveals a boot. He then surprises her with the question: "Do you want to stick with your choice of B, or do you want to switch to curtain C?"
The problem is: Should she switch?
The counterintuitive answer, according to numerous probabilists, is yes. When the problem, and its answer, first appeared in the press there were howls of protest, including from mathematicians and statisticians.
Here is the reasoning: When she chose B she had a 1/3 chance of winning a car. Hence her choice of B had a 2/3 chance of being wrong. Once Monty opened curtain A, her choice still carried a 2/3 probability of error. Hence a switch to C gives her a 2/3 probability of being right!
Various experiments were done and it was found that a decision to switch tended to "win a car" in two out of three trials.
A few points:
The contestant starts out with complete observer ignorance. She has no idea whether a "common" permutation is in effect and so she might as well assume a randomization process has established the permutation.
Once Monty opens curtain A, the information available to her increases and this affects the probabilities in an unanticipated way. The typical reaction is to say that whether one switches or not is immaterial because the odds are now 50/50. It seems quite bothersome that her mental state can affect the probabilities. After all, when she chose B, she wasn't in the standard view actually making anything happen. So why should the disclosure of the boot at A make any difference as to what actually happens? Hence the thought that we have a new trial which ought to be independent from the previous, making a probability of 1/2.
Yet, the new information creates conditions for application of conditional probability. This experiment then tends to underscore that Bayesian reasoning has "real world" applications. (There exist Monty Hall proofs that use the Bayesian formula, by the way.)
The initial possible permutations, in terms of car or boot, are:
bcb
bbc
cbb
where bcb means, for example, boot behind curtain A, car behind curtain B, boot behind curtain C.
By raising curtain A, the contestant has the information that two orderings remain, bcb and bbc, leading to the thought that the probability of guessing correctly is 1/2. But before Monty asks her if she wishes to switch, she has made an estimate based upon the initial information. In that case, her probability of guessing wrong is 2/3. If she switches, her probability of winning the car becomes 2/3.
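For the skeptical reader, the two-thirds result is easy to check by simulation. The sketch below plays a large number of rounds under the standard assumptions: the car is placed uniformly at random, and the host always opens an unchosen curtain that hides a boot.

# Monte Carlo check of the Monty Hall switch-vs-stay probabilities,
# under the standard assumptions: car placed uniformly at random;
# host always opens an unchosen curtain hiding a boot.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)          # curtain hiding the car
        choice = random.randrange(3)       # contestant's initial pick
        # Host opens a curtain that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != choice and d != car)
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == car)
    return wins / trials

print("stay  :", round(play(switch=False), 3))   # about 1/3
print("switch:", round(play(switch=True), 3))    # about 2/3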
Part of the perplexity stems from the types of randomness and probability at hand. A modern American tends to relate probabilities to assumed random forces in the external world, rather than only to mental state (the principle of insufficient reason).
And yet, if we grant that the six permutations entail six superposed histories, we then must consider a negative feedback control process that might affect the probabilities, as suggested above and described in Toward. The brain's method of "selecting" and "constructing" reality may help explain why a few people are consistently well above or well below the mean in such low-skill games of chance.
This point of course raises a serious difficulty: solipsism. I have addressed that issue, however inadequately, in Toward.
There are a number of other probability brain teasers, with attendant controversy over proper methods and quantifications. A significant issue in these controversies is the usefulness of the probabilistic process in making decisions. As Paul Samuelson observed, the St. Petersburg paradox is a poser that would never happen in actuality because no sane person would make such an offer.
Samuelson on paradoxes
http://www.jstor.org/stable/2722712
Keynes argued similarly that simple expectation value is not always valid, as when the risk is unacceptable no matter how great the payoff.
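To recall why the St. Petersburg offer is such a poser: a fair coin is tossed until it first shows heads, and the payoff is 2^k if that happens on toss k, so the expected value 1 + 1 + 1 + ... diverges. A quick simulation shows why Samuelson's "no sane person" remark bites; the typical payout is modest even though the theoretical mean is unbounded. (The payoff convention here is the common textbook one.)

# St. Petersburg game: expected value diverges, yet typical payouts are modest.
import random
import statistics

def one_game():
    k = 1
    while random.random() < 0.5:   # keep tossing while the coin comes up tails
        k += 1
    return 2 ** k                  # payoff doubles with each toss survived

payouts = [one_game() for _ in range(100_000)]
print("median payout:", statistics.median(payouts))        # typically 2 or 4
print("mean payout  :", round(statistics.mean(payouts), 1))
# The mean keeps creeping up with more trials; in the limit it is infinite.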
In the case of the Sleeping Beauty problem,
http://www.u.arizona.edu/~thorgan/papers/other/Beauty.htm
we could notionally run a series of trials to find the limiting relative frequency. But such an experiment is likely to encounter ethics barriers and, even more likely, to be seen as too frivolous for the time and expense.
These and other posers are legitimate questions for those inclined to logical analysis. But note that such scenarios all assume a "linear" background randomness. Such an assumption may serve in many instances, but what of potential exceptions? For example, Sleeping Beauty, from her orientation, may have several superposed "histories" upon awakening. Which history "happens" is, from the experimenter's orientation, partly guided by quantum probabilities. So to ask for the "linear" probability solution to the Sleeping Beauty problem is to ignore the "reality wave" probabilities that affect any solution.
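For what it is worth, the "notional series of trials" mentioned above is easy to mimic in software under one common linear reading, in which each awakening is counted as a trial (the so-called "thirder" bookkeeping). The sketch shows only what a limiting relative frequency would look like under that assumption; it does not settle the dispute, and it ignores the superposed-histories caveat just raised.

# Sleeping Beauty, "linear" frequency bookkeeping: toss a fair coin;
# heads -> one awakening, tails -> two awakenings. Count, over all
# awakenings, how often the coin shows heads ("thirder" accounting).
import random

heads_awakenings = 0
total_awakenings = 0
for _ in range(100_000):
    heads = random.random() < 0.5
    awakenings = 1 if heads else 2
    total_awakenings += awakenings
    heads_awakenings += 1 if heads else 0

print("fraction of awakenings with the coin showing heads:",
      round(heads_awakenings / total_awakenings, 3))   # about 1/3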
The St. Petersburg paradox
https://en.wikipedia.org/wiki/St._Petersburg_paradox
Also see
http://books.google.com/books?id=vNvXkFUbfM8C&pg=PA267&lpg=PA267&dq=robert+martin+st+petersburg+paradox+dictionary&source=bl&ots=x5NVz53Ggc&sig=JrlvGjnw5tcX9SEbgaA1CbKGW9U&hl=en&sa=X&ei=OdzmUdWXItK24AOxqoHQAQ&ved=0CDYQ6AEwAQ#v=onepage&q=robert%20martin%20st%20petersburg%20paradox%20dictionary&f=false
A noumenal world
Ludwig Wittgenstein's Tractatus does something very similar to what Goedel proved rigorously, and reflects the paradoxes of Bertrand Russell and Georg Cantor. In other words, philosophy is described by statements, or propositions, but cannot get at "the problems of life." I.e., philosophy uses a mechanistic structure that cannot apprehend directly what others have called the noumenal world. Hence, the propositions used in Tractatus are themselves nonsense. Again, the self-referencing dilemma.
Interestingly, later on Wittgenstein was unable to follow Goedel and dropped work on the philosophy of mathematics.
The phenomenal versus the noumenal is reflected in experiments in which participants at a console are urged to inflict pain on a subject allegedly connected to an electroshock device, but who is in fact an actor simulating pain responses. Here we see Hannah Arendt's "banality of evil" among those who obey the experimenter's commands to inflict greater and greater pain. Those who passively obey criminal orders are responding to the social cues suggested by the situation, and are rather easily persuaded when they see others carrying out obnoxious deeds under "legal" circumstances. They accept rationalizations because, essentially, they see their immediate interest to lie in conforming to settings controlled by some authority. In some cases, they may also have a psychological need to express the primitive predator within (which for most people is expressed during sporting events or entertainment of other sorts). These persons are close to the phenomenal world accepted by Darwinists.
Yet there are those who resist criminal orders or cajoling, whatever the setting. Is this only the Freudian superego that has been programmed to bar such behavior (the internalization of parental strictures)? If so, one would suspect that such inhibitions would be weakened over time by consistent exposure to the actions of the herd. Yet, there are those who do not respond well to the blandishments that come from herd leaders and who strenuously resist being pushed into criminal (even if "legalized") behavior. Very often such persons cite religious convictions. Still, as shown by the horrific history of religious warfare, it is possible for a person to have a set of religious ideas that do not work against the stampede effect.
However that may be, such peculiar individuals point to an interior moral compass not found in others, or if some others do possess that quality, it has been greatly repressed. The idea that such a moral compass is a consequence of random physical forces is, by today's standards, plausible. But another possibility is connection with a noumenal world, which hides the source of the resistance to banal evil.
Types of intuition
Consider Type 1 intuition:
Let us think about mathematical intuition, in which we have what is really an informed guess as to a particular mathematical truth.
Such intuition is wrong often enough that the word "counterintuitive" is common among mathematicians. Such informed guesswork is based on one's experience with similar sets of relations. So such intuition can improve with experience, thus the respect given to experts.
There is also intuition based on subtle clues, which may tip one off to imminent danger.
Then we have someone who is vexed by a scientific (or other) problem and, no matter how much he spins his mental wheels, is unable to solve it. But while asleep or in a reverie, he suddenly grasps an answer.
The following scenario is plausible:
He has the intuition, based on experience, that the problem is solvable (though he may be wrong about this). In many cases he has quieted the left-brain analytic function in order to permit the right brain to make associations at a more "primitive" level. It is noteworthy that the left brain will call to consciousness the precise left-right, or time sequence, order of a telephone number. When fatigue dims the analytic function, the right brain will call to consciousness the digits, but generally not in left-right order. So one can see how some set of ideas might similarly be placed in an unexpected arrangement by the right brain, leading possibly to a new insight recognized by the integrated mind.
The analytic functions are in some people "closer to consciousness," being centered in the frontal lobes which represent the most recent major adaptation of the human species. The more basic associative functions are often regarded as an expression of an earlier, in an evolution sense, segment of the brain, and so further into the unconscious. This is the region expressed by artists of all sorts.
Mental relaxation means curtailing the analytic function so as to let the associative region express itself. When one is dreaming, it may be said that one's analytic function and executive control is almost shut down -- though the dream censorship shows that the mental monitor is still active.
So we might say that, at times, what is meant by intuition is that the brain's executive function is refereeing the analytic and associative processes and integrating them into an insight that may prove fruitful.
We regard this form of intuition as belonging to the phenomenal, and not the noumenal world -- though the noumenal world's influence is felt, I daresay, at all times.
[Another view of intuition is that of Henri Bergson.
Discussion of Bergson's ideas
http://plato.stanford.edu/entries/bergson/
]
But Type 2 intuition is the direct knowledge of something without recourse to the phenomenal world associated with the senses (of which there are many, and not five). This form of communication (though who's to say what is doing the communicating) bypasses or transcends the phenomenal world, as when an individual turns about upon being gazed at from a distance. I realize that such a phenomenon doesn't get much support in the available literature; still, I have on many occasions looked at people from behind from a distance -- say while on public transport -- and noticed them turning about and scanning the middle distance, often with a quizzical look on their faces. Of course such an effect can be "explained," but it seems quite apparent that the person -- more often a woman than a man -- is not turning around for identifiable reasons.
The person who turns about may not even be conscious of what prompted her. Part of her brain has "intuitive" or direct knowledge of another's presence. One can view this effect in terms of a Darwinistic survival advantage. That is to say, one may say that conscious life forms interact with an unknown world, which, by its nature is immaterial and apparently not subject to the laws of physics as they apply in the phenomenal world.
Sometimes, and perhaps always, Type 1 intuition would seem to have at its core Type 2 intuition.
This is to say there is something other than digital and analog reasoning, whether unconscious or not. Hence, one would not expect an artificial intelligence program, no matter how advanced, to have Type 2 intuition (77).
Of course, it is to be expected that some will disparage such ideas as constituting a "revival of vitalism." Even so, the anti-vitalist must do more than wave hands against "paranormal" events; he must make serious attempts to exclude the likelihood of what I term noumenal effects.
A confusion here is the claim that "because" vitalism seemingly can't be tested or falsified in a Popperian sense, the idea is hence unscientific and must be ignored by scientifically minded people. True, there has been an ongoing battle of statistics with respect to "psychic phenomena" between the yea-sayers and the naysayers. But I wonder about attempts to use repeated trials, because it seems unlikely that the independence criterion will work. Here is a case where Bayesian methods might be more appropriate.
In the Newtonian-Laplacian era, prior to the quantum mechanics watershed, the concept of randomness was tied to the belief in a fully deterministic cosmos, in which humans are players on a cosmic stage. The Laplacian clockwork model of the cosmos forbids intrinsic randomness among the cogs, wheels and pulleys. The only thing that might, from a human viewpoint, be construed as random would have been the actions of the élan vital. Of course the devout did not consider the vital spirit to be random at all, but rather saw it as stemming from a direct influence of God.
Curiously, in the minds of some, a clockwork cosmos seemed to imply a need for God. Otherwise, there would be no free will. Humans would be reduced to delusional automatons.
So uncertainty, in the clockwork model, was viewed as simply a lack of sufficient knowledge for making a prediction. In the famous conceit of Laplace, it was thought that a grand robot would be able to calculate every trajectory in the entire universe to any extent forward or backward in time. It was lack of computing power, not inherent randomness that was thought to be behind the uncertainty in gambling systems.
In the early 20th century, R.A. Fisher introduced random selection as a means of minimizing bias. Or, a better way to express this is that he sought ways to screen out unwanted extraneous biases. The methods chosen for filtering out bias were then seen as means of ensuring randomness, and this perspective is still in common use. So one might then define randomness as a consequence of low (in the ideal case zero) bias in sampling. In the 1930s, however, some prominent probabilists and statisticians, influenced by the new quantum theory, accepted the notion of intrinsic background randomness, leading them to dispense with the idea that probability measures a degree of belief. They thought there was an objective discipline of probability that did not require "subjectivism." To them, quantum mechanics justified the idea that a properly calculated probability result yields a "concrete" truth that is true regardless of the observer.
Physicists however do not tend to see quantum logic as an easy way to dispose of subjectivism. In fact, a number take quite the opposite tack, acknowledging a strong logical case for a "spooky" interface between subject and object. Such a noumenal world -- where space and time are "transcended" -- should indeed interact with the phenomenal world in "weird" ways, reminiscent of the incident in the science fiction film The Matrix when the hero observes a cat move oddly, as in a quick film rewind and replay. The example may be silly, but the concept is not.
Now, as a great many reports of "paranormal" events are subjective first-person accounts, it is easy to dismiss them all under the rubric "anecdotal." Clearly, many scientists want nothing to do with the paranormal because it attracts so many starry-eyed "true believers" who have very little scientific background. Such notoriety can be a career destroyer for a young academic.
Bruce Hood, who sits on the board of The Skeptic magazine, is a psychologist who takes a neuroscience view of cognition. To Hood, the fact that the "self" is an integrated composite of physical functions implies that consciousness is an epiphenomenon. Hence, religion, faith and assorted superstitions are delusions; there is no self in need of being saved and no evidence of a soul, which is viewed as paranormal nonsense (83).
Hood on 'the self illusion'
http://www.psychologytoday.com/blog/the-self-illusion/201205/what-is-the-self-illusion
While I agree that phenomenal reality, including in part the reality of self, is interwoven with the perception-cognition apparatus, my point is that if we look closely enough, we apprehend something beyond our usual set of conceits and conceptions. The observer has much to do with forming phenomenal reality, and to me this of itself points to a component of cognition that is non-phenomenal or, as we say, noumenal.
Hence, it is not unreasonable after all to think in terms of a noumenal world in which transactions occur that are beyond our immediate ken. It is safe to say that for quite some time a great many men of high caliber knew from "self-evidence" that the world was flat. And yet there were clues, such as sailing ship masts sinking below the horizon, that suggested a revolutionary way of conceiving of the world, one that at first makes no sense: if the world is round, why don't people fall off on the underside?; if this round world is spinning, why isn't everyone hurled off?
So I would say that for the flat-earthers, the reality of a round world was hidden, part of an "implicate order" in need of unfolding.
Writing prior to the development of thermonuclear bombs, J.D. Stranathan gives this account of the discovery of deuterium (84):
F.W. Aston in 1927 had obtained a value 1.00778 +/- 0.00015 for the atomic weight of hydrogen, which barely differed from the accepted chemical value of 1.00777 +/- 0.00002. The figures were so close that no isotope seemed necessary.
But, the discovery of the two heavier isotopes of oxygen forced a reconsideration because their existence meant that the physically derived and chemically derived scales of atomic weight were slightly, but importantly, different. This meant that Aston's value, when converted to the chemical scale, was 1.00750, and this was appreciably smaller than the chemically determined atomic weight. The alleged close agreement was adjudged to be false.
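The scale conversion involved here is simple arithmetic: the physical scale set O-16 equal to 16 exactly, the chemical scale set the natural oxygen mix equal to 16, and the ratio between the two scales was about 1.000275. That conversion factor is the commonly cited historical value, assumed here rather than taken from Stranathan's text; dividing Aston's physical-scale figure by it reproduces the 1.00750 quoted above.

# Converting Aston's 1927 physical-scale value to the chemical scale.
# The ratio ~1.000275 is the commonly cited historical conversion factor
# between the physical (O-16 = 16) and chemical (natural oxygen = 16) scales;
# it is an assumption here, not a figure from Stranathan.
PHYSICAL_TO_CHEMICAL = 1.000275

aston_1927_physical = 1.00778
chemical_value = 1.00777

converted = aston_1927_physical / PHYSICAL_TO_CHEMICAL
print(f"Aston's 1927 value on the chemical scale: {converted:.5f}")    # ~1.00750
print(f"discrepancy vs. chemical determination  : {chemical_value - converted:.5f}")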
That discrepancy spurred Harold C. Urey, Ferdinand G. Brickwedde and George M. Murphy to hunt for deuterium, which they found and which became a key component in the development of the atomic bomb.
But, this discrepancy turned out to have been the result of a small experimental error. It was shown that the 1927 mass spectrograph value was slightly low, in spite of having been carefully confirmed by Kenneth T. Bainbridge. When the new spectrograph value of 1.0081 was converted to the chemical scale, there was no longer a substantive disagreement. Hence, there was no implication of the existence of deuterium.
Though the chemical and physical scales were revealed to have been slightly different, that revelation, without the 1927 error, would have yielded no reason to expend a great deal of effort searching for heavy hydrogen.
Had heavy water been unknown, would allied scientists have been fearful of German development of atomic fission weapons (British commandos wrecked Germany's heavy water production in occupied Norway) and have spurred the British and American governments into action?
Even had the Manhattan Project been inevitable, it is conceivable that, at the outset of World War II, the existence of heavy water would have remained unknown and might have remained unknown for years to come, thus obviating postwar fulfillment of Edward Teller's dream of a fusion bomb. By the time of deuterium's inevitable (?) discovery, the pressure for development of thermonuclear weapons might well have subsided.
In other words, looking back, the alleged probability of the discovery of heavy water was minuscule, and one is tempted to wonder about some noumenal influence that fated humanity with this almost apocalyptic power.
At the least, we have the butterfly effect on steroids.
At any rate, the idea here is not to idolize paranormal phenomena, but rather to urge that there is no sound epistemological reason to justify the "Darwinistic" (or, perhaps, Dawkinsistic) edict of ruling out any noumenal world (or worlds) and the related prohibition of consideration of any interaction between phenomenal and noumenal worlds.
In fact, our attempt to get a feel for the noumenal world is somewhat analogous to the work of Sigmund Freud and others in examining the unconscious world of the mind in order to find better explanations of superficially cognized behaviors. (Yet I hasten to add that, though Carl Jung's brilliance and his concern with what I term the noumenal world cannot be gainsaid, I find that he has often wandered too far from the beaten path even for my tastes.)
A note on telepathy
There is a great deal of material that might be explored on "paranormal" phenomena and their relation to a noumenal world. But, we will simply give one psychologist's thoughts on one "noumenal" subject. Freud was quite open-minded about the possibility of extra-normal thought transference.
In New Introductory Lectures on Psycho-Analysis, he writes: "One is led to the suspicion that this is the original, archaic method of communication between individuals and in the course of phylogenetic evolution it has been replaced with the better method of giving information via signals which are picked up by the sense organs."
He relates a report of Dorothy Burlingham, a psychoanalyst and "trustworthy witness." (She and colleague Anna Freud did pioneering work in child psychology.)
A mother and child were in analysis together. One day the mother spoke during analysis of a gold coin that had played a particular part in one of her childhood experiences. On her return home, the woman's son, who was about 10, came to her room and gave her a gold coin, which he asked her to keep for him. Astonished, she asked him where he had got it. It turned out that it had been given him as a birthday present a few months previously. Yet there was no obvious reason why he had chosen that time to bring her the coin.
Freud sees this report as potential evidence of telepathy. One might also suspect it as an instance of Jungian "synchronicity" or of the reality construction process discussed in Toward.
At any rate, a few weeks later the woman, on her analyst's instructions, sat down to write an account of the gold coin incident. Just then her child approached her and asked for his coin back, as he wanted to show it during his analysis session.
Freud argues that there is no need for science to fear telepathy (though his collaborator, Ernest Jones, certainly seems to have feared the ridicule the subject might bring); Freud, who never renounced his atheism, remained open-minded not only about telepathy, but about the possibility of other extra-normal phenomena.
From our perspective, we argue that reports of "paranormal" communication and other such phenomena tip us off to an interaction with a noumenal world that is the reality behind appearances -- appearances being phenomena generally accepted as ordinary, whether or not unusual.
See my post:
Freud and telepathy
http://randompaulr.blogspot.com/2013/10/freud-on-telepathy.html
Freud, of course, was no mathematician and could only give what seemed to him a reasonable assessment of what was going on. Keynes's view was similar to Freud's. He was willing to accept the possibility of telepathy but rejected the "logical limbo" of explaining that and other "psychic phenomena" with other-worldly spirits.
Many scientists, of course, are implacably opposed to the possibility of telepathy in any form, and there has been considerable controversy over the validity of statistical studies for and against such an effect.
On the other hand, the Nobelist Brian Josephson has taken on the "scientific system" and upheld the existence of telepathy, seeing it as a consequence of quantum effects.
Josephson's page of psychic phenomena links
http://www.tcm.phy.cam.ac.uk/~bdj10/psi.html
In the name of Science
The tension between Bayesian reasoning and the intrinsic background randomness imputed to quantum physics perforce implies Wheeler's "participatory universe" in which perception and "background reality" (the stage on which we act, with the props) merge to an extent far greater than has previously been suspected in the halls of academia -- despite herculean efforts to exorcise this demon from the Realm of Science. In other words, we find that determinism and indeterminism are inextricably entangled at the point where consciousness meets "reality."
Nevertheless, one cannot avoid the self-referencing issue. In fact, if we suspend the continuity assumptions of space and time, which quantum theory strongly advises that we should, we arrive at infinite regress. But even with continuity assumptions, one can see infinite regress in, say, an asymmetric three-sink vector field. Where is the zero point at time T_x? In a two-sink field, the symmetry guarantees that the null point can be exactly determined. But in a three-sink field that is not symmetric, one always faces something analogous to quantum uncertainty -- and that fact also points to problems of infinite regress.
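To make the two-sink versus three-sink contrast concrete, here is a minimal numerical sketch in Python. It is my own illustration under assumed conventions (each point sink pulls with a strength proportional to its weight and falling off with distance; the function names and particular sink positions are arbitrary): for two equal sinks, symmetry hands us the null point at the midpoint, whereas for three unequal sinks we can only hunt it down numerically from a guess.

import numpy as np
from scipy.optimize import fsolve

def field(x, sinks, weights):
    # Velocity at point x: each sink pulls toward itself with strength w / distance.
    v = np.zeros(2)
    for s, w in zip(sinks, weights):
        d = s - x
        v += w * d / (np.dot(d, d) + 1e-12)  # tiny term avoids division by zero at a sink
    return v

# Two equal sinks: the null point is the midpoint, no search required.
two_sinks = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]
print(field(np.array([0.0, 0.0]), two_sinks, [1.0, 1.0]))  # ~ (0, 0)

# Three unequal sinks: no symmetry to exploit; solve field(x) = 0 numerically
# from a guess inside the triangle of sinks (convergence depends on the guess).
three_sinks = [np.array([-1.0, 0.0]), np.array([1.0, 0.2]), np.array([0.3, 1.5])]
weights = [1.0, 0.7, 1.3]
null_pt = fsolve(lambda x: field(x, three_sinks, weights), x0=np.array([0.1, 0.4]))
print(null_pt, field(null_pt, three_sinks, weights))  # residual ~ 0 within solver tolerance

The point of the sketch is only that the asymmetric case yields no closed-form answer; one settles for an approximation whose quality depends on where one starts looking.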
We can think of this in terms of a nonlinear feedback control system. Some such systems maintain an easily understood homeostasis. The thermostat is a case in point. But others need not follow such a simple path to homeostasis. A particular input value may yield a highly unpredictable output within the constraint of homeostasis. In such systems, we tend to find thresholds and properties to be the best we can do in the way of useful information. Probabilities may help us in estimating properties, as we find in the behavior of idealized gas systems.
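A toy contrast, sketched below in Python with made-up parameters of my own (nothing here is drawn from any control theory text): a bang-bang thermostat settles into a narrow, predictable band, while a nonlinear feedback rule such as the logistic map stays within fixed bounds, a kind of homeostasis, yet responds to nearly identical inputs with wildly different outputs. About the best one can extract is a bulk property, such as the fraction of time spent above a threshold.

def thermostat(temp, setpoint=20.0, steps=50):
    # Bang-bang control: heat when below the setpoint, cool when above.
    history = []
    for _ in range(steps):
        temp += 0.5 if temp < setpoint else -0.5
        history.append(temp)
    return history

def nonlinear_feedback(x, r=3.9, steps=50):
    # Logistic map: output stays bounded in (0, 1) but is chaotic for r = 3.9.
    history = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        history.append(x)
    return history

print(thermostat(15.0)[-5:])            # hugs the setpoint
print(nonlinear_feedback(0.2000)[-5:])  # bounded but effectively unpredictable
print(nonlinear_feedback(0.2001)[-5:])  # tiny change of input, very different output

# A bulk "property" is still estimable: the long-run fraction of time above a threshold.
traj = nonlinear_feedback(0.2, steps=100000)
print(sum(t > 0.5 for t in traj) / len(traj))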
Even so, these probabilities cannot really be frequency based, except in the classical sense based on the binomial distribution. Trials can't be done. E.T. Jaynes thought that the Shannon approach of simply expropriating what I call the classical approach sufficed for molecular physics. Yet, I add that when Einstein used probabilities to establish that Brownian motion conformed to the behavior of jostling atoms, he was not only implicitly using the classical approach, but also what we might call a propensity approach in which the presumed probabilities were assigned in accordance with system start-up information, which in this case was given by Newtonian and Maxwellian mechanics.
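As a rough illustration of that classical, start-up-information style of assignment (my own sketch, not Einstein's calculation): give each molecular kick an equal chance of being +1 or -1, as the binomial distribution presumes, and the combinatorics alone predict a root-mean-square displacement growing as the square root of the number of steps, the scaling on which the Brownian-motion argument turns; no frequency trials on the actual molecular system are involved.

import numpy as np

rng = np.random.default_rng(0)

def rms_displacement(n_steps, n_walkers=100000):
    # The number of +1 kicks is binomial(n, 1/2); net displacement is 2k - n.
    k = rng.binomial(n_steps, 0.5, size=n_walkers)
    x = 2 * k - n_steps
    return np.sqrt(np.mean(x.astype(float) ** 2))

for n in (100, 400, 1600):
    print(n, rms_displacement(n), np.sqrt(n))  # simulated versus predicted sqrt(n)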
The above considerations suggest that it is a mistake to assume that human affairs are correctly portrayed in terms of intrinsic randomness played out in some background framework that is disentangled from the observer's consciousness.
In fact, we may see some kind of malleable interconnectedness that transcends the phenomenal world.
This also suggests that linear probability reckoning works well enough within limits. We use the word linear to mean that the influences among events are small enough so as to be negligible, permitting us the criterion of independence. (Even conditional probabilities rest on an assumption of independence at some point.) The limits are not so easily defined, as we have no system of nonlinear differential equations to represent the sharing of "reality" among minds or the balance between the brain's reality construction versus "external" reality.
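A small simulation (my own, with arbitrary numbers) shows both what this "linear" reckoning buys and where it quits: with no coupling between two events, the product rule P(A and B) = P(A)P(B) holds to within sampling noise; introduce even a modest interaction and the independence assumption, along with the arithmetic built on it, fails.

import random

def joint_vs_product(coupling, trials=200000):
    # B leans toward A's outcome by 'coupling'; coupling = 0 means true independence.
    hits_a = hits_b = hits_ab = 0
    for _ in range(trials):
        a = random.random() < 0.5
        p_b = 0.5 + (coupling if a else -coupling)
        b = random.random() < p_b
        hits_a += a
        hits_b += b
        hits_ab += a and b
    pa, pb, pab = hits_a / trials, hits_b / trials, hits_ab / trials
    return pab, pa * pb    # compare P(A and B) with P(A)P(B)

print(joint_vs_product(0.0))   # the two numbers agree, within sampling noise
print(joint_vs_product(0.2))   # they no longer agree: independence has failed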
Certainly in the extremes, probability assessments do not seem terribly satisfactory within a well-wrought metaphysical system, and should not be so used, even though "linear" phenomenal randomness is viewed as a component of the Creed of Science, being a basic assumption of many a latter day atheist, whether or not scientifically trained.
"Everyone knows" that some phenomena are considered to be phantasms of the mind, whether they be optical or auditory illusions or delusions caused by temporary or permanent brain impairment, and that, otherwise, these phenomena are objective, meaning that there is wide agreement that such phenomena exist independently of any observer, especially if such phenomena have been tested and verified by an accepted scientific process. Still, the underlying assumptions are much fuzzier than the philosophical advocates of "hard science" would have us believe.
So this suggests there exists some holistic "uber force," or organizing principle (the logos of Parmenides and the Gospel of John). Certainly we would not expect an atheist to believe this uber force is conscious, though he or she might, like Einstein, accept the existence of such an entity in Spinoza's pan-natural sense. On the other hand, neither Einstein nor other disciples of Spinoza had a logical basis for rejecting the possibility that this uber force is conscious (and willing to intervene in human affairs). This uber force must transcend the laws of physics of this universe (and any clone-like cosmoses "out there"). Here is deep mystery; "dark energy" is a term that comes to mind.
I have not formalized the claim for such an uber force. Still, we do have Goedel's ontological proof of God's existence (sketched after the links below), though I am not sure such a method is valid. (However that may be, Kant seems to have taken care of that one.) An immediate thought is that the concept of "positive" requires a subjective interpretation. On the other hand, we have shown that the human brain/mind is a major player in the construction of so-called "concrete" phenomenal reality.
Goedel's ontological proof of God's existence
http://math.stackexchange.com/questions/248548/godels-ontological-proof-how-does-it-work
Asserted existence of god theorem
https://en.wikipedia.org/wiki/G%C3%B6del's_ontological_proof
Formalization, mechanization and automation of Gödel's proof of god's existence
http://arxiv.org/abs/1308.4526
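For orientation, here is the standard Scott-style transcription of Goedel's argument in higher-order modal logic, essentially as treated in the formalization paper linked above (P is the undefined "positive" predicate); a transcription, not an endorsement:

\begin{align*}
&\text{A1: } P(\lnot\varphi) \leftrightarrow \lnot P(\varphi)\\
&\text{A2: } \bigl(P(\varphi) \land \Box\,\forall x\,[\varphi(x) \to \psi(x)]\bigr) \to P(\psi)\\
&\text{T1: } P(\varphi) \to \Diamond\,\exists x\,\varphi(x)\\
&\text{D1: } G(x) \;\equiv\; \forall\varphi\,[P(\varphi) \to \varphi(x)]\\
&\text{A3: } P(G) \quad\text{whence}\quad \Diamond\,\exists x\,G(x)\\
&\text{A4: } P(\varphi) \to \Box\,P(\varphi)\\
&\text{D2: } \varphi \,\mathrm{ess}\, x \;\equiv\; \varphi(x) \land \forall\psi\,\bigl[\psi(x) \to \Box\,\forall y\,(\varphi(y)\to\psi(y))\bigr]\\
&\text{T2: } G(x) \to G \,\mathrm{ess}\, x\\
&\text{D3: } \mathrm{NE}(x) \;\equiv\; \forall\varphi\,[\varphi \,\mathrm{ess}\, x \to \Box\,\exists y\,\varphi(y)]\\
&\text{A5: } P(\mathrm{NE})\\
&\text{T3: } \Box\,\exists x\,G(x)
\end{align*}

Everything hangs on the primitive P, which is exactly where the worry above about a subjective reading of "positive" bites, and which my friend's BMW example below pokes at.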
In a private communication, a mathematician friend responded thus:
"For example, BMW is a good car. BMW produces nitrous oxide pollution. Therefore nitrous oxide pollution is good."
My friend later added: "But maybe the point of the ontological proof is not 'good' but 'perfect.' God is supposed to be perfect. A perfect car would not pollute."
Again, the property of goodness requires more attention; though a doubting Thomas, I am not entirely unpersuaded by Goedel's offering.
In this respect, we may ponder Tegmark's mathematical universe hypothesis, which he takes to imply that all computable mathematical structures exist.
Tegmark's mathematical universe paper
http://arxiv.org/pdf/gr-qc/9704009v2.pdf
Tegmark's mathematical universe hypothesis has been stated thus: Our external physical reality is a mathematical structure. That is to say, the physical universe is mathematics in a well-defined sense. So in worlds "complex enough to contain self-aware substructures," these entities "will subjectively perceive themselves as existing in a physically 'real' world." The hypothesis suggests that worlds corresponding to different sets of initial conditions, physical constants, or altogether different equations may be considered equally real. Tegmark elaborates his conjecture into the computable universe hypothesis, which posits that all computable mathematical structures exist.
Here I note my paper:
On Hilbert's sixth problem
https://cosmosis101.blogspot.com/2017/07/on-hilberts-sixth-problem-and-boolean.html
which argues against the notion that the entire cosmos can be modeled as a Boolean circuit or Turing machine.
An amusing aside:
1. Assuming the energy resources of the universe are finite, there is a greatest expressible integer.
2. (The greatest expressible number) + 1.
3. Therefore God exists.
Arthur Eddington once observed that biologists were more likely to be in the camp of strict materialists than physicists. Noteworthy examples are Sigmund Freud, who considered himself a biologist, and J.B.S. Haldane, a pioneer in population genetics. Both came of age during the first wave of the Darwinian revolution of the 19th century, a paradigm that captured many minds as a model that successfully screened out God, just as, it was thought, the clockwork cosmos of Laplace had disposed of the need for the God hypothesis. Freud and Haldane were convinced atheists, and it is safe to say that Freud's view of extraordinary communication was thoroughly materialist. Haldane's severe materialism can be seen in the context of his long-term involvement in Soviet Communism.
Though physicists often remained reticent about their views on God, those who understood the issues of quantum theory were inclined toward some underlying transcendence. This situation remains as true today as it was in the 1930s, as we see with the effectively phenomenalist/materialist world view of the biologist Richard Dawkins, who is conducting a Lennonist crusade against belief in God.
A previous generation witnessed Bertrand Russell, the logician, in the role of atheist crusader. Russell, with his colleague Alfred North Whitehead, had tried in their Principia Mathematica to show that formal knowledge could be rendered complete and consistent. One can see that such an achievement would have bolstered the cause of atheism. If, at least in principle, the universe can be "tamed" by human knowledge, then one can explain every step of every process without worrying about God, or some transcendental entity. God, like the ether, would have been consigned to the rubbish heap of history, a single "real" history and not one of many potentials determined in part by present observation.
Of course, in 1931 Goedel proved this goal an illusion, using Principia Mathematica to demonstrate his proof. Goedel's incompleteness theorem caught the inrushing tide of the quantum revolution, which brought the traditional notion of scientific external reality into question. The revolution had in part been touched off by experimental confirmation of Louis de Broglie's proposed matter waves, an idea that made use of Einstein's energy/matter relation. So the doctrine of materialism was not only technically in question; because matter waves obeyed probability amplitudes, the very existence of matter had become a very strange puzzle, a situation that continues today.
Even before this second quantum revolution, the astrophysicist Arthur Eddington had used poetic imagery to put into perspective Einstein's spacetime weirdness (78).
Perhaps to move
His laughter at their quaint opinions wide
Hereafter, when they come to model Heaven
And calculate the stars, how they will wield
The mighty frame, how build, unbuild, contrive
To save appearances.
-- John Milton, Paradise Lost
Quantum weirdness only strengthened Eddington's belief in a noumenal realm.
"A defence of the mystic might run something like this: We have acknowledged that the entities of physics can from their very nature form only a partial aspect of the reality. How are we to deal with the other part?" Not with the view that "the whole of consciousness is reflected in the dance of electrons in the brain" and that "quasi-metrical aspects" suffice.
Eddington, in countering what he said was Russell's charge that he had tried to "prove" the distinctive beliefs of religion, takes aim at loose usage of the word "reality," warning that it is possible to employ that word as a talisman providing "magic comfort." And, we add, cognitive dissonance with internal assumptions and rationalizations usually provokes defensive reactions.
"We all know that there are regions of the human spirit untrammeled by the world of physics" and that are associated with an "Inner Light proceeding from a greater power than ours."
Another English astrophysicist, James Jeans, also inclined toward some noumenal presence (79).
Jeans writes that the surprising results of the theories of relativistic and quantum physics lead to "the general recognition that we are not yet in contact with ultimate reality." We are only able to see the shadows of that reality. Adopting John Locke's assertion that "the real essence of substances is unknowable," Jeans argues that scientific inquiry can only study the laws of the changes of substances, which "produce the phenomena of the external world."
In a chapter entitled "In the Mind of Some Eternal Spirit," Jeans writes: "The essential fact is simply that all the pictures which science now draws of nature, and which alone seem capable of according with observational fact, are mathematical pictures." Or, I would say, the typical human brain/mind's empirically derived expectations of physical reality are inapplicable. In a word, the pictures we use for our existence are physically false, delusional, though that is not to say that the delusional thinking imparted via the cultic group mind and by the essentials of the brain/mind system (whatever they are) is easily dispensed with, or even safe to dispense with absent something reliably superior.
In a play on the old epigram that the cosmos had been designed by a "Great Architect," Jeans writes of the cosmos "appearing to have been designed by a pure mathematician."
He adds: "Our remote ancestors tried to interpret nature in terms of anthropomorphic concepts and failed. The efforts of our nearer ancestors to interpret nature on engineering lines proved equally inadequate. Nature refused to accommodate herself to either of these man-made moulds. On the other hand, our efforts to interpret nature in terms of the concepts of pure mathematics have, so far, proved brilliantly successful."
Further, "To my mind the laws which nature obeys are less suggestive of those which a machine obeys in its motion than those which a musician obeys in writing a fugue, or a poet in composing a sonnet."
Remarks:
We are unsure whether Jeans believed in a God who intervenes in human affairs. Often, the denial of "anthropomorphism" includes denial of the basis of Christianity. But what does he mean when he says that, contrary to Kant's idea that the "mathematical universe" was a consequence of wearing mathematical eyeglasses, "the mathematics enters the universe from above instead of from below"? He seems to mean that the mathematics of physics corresponds to an objective reality capable of being discerned by the human mind in terms of mathematics.
Though he saw the "engineering" paradigm as fundamentally flawed, that message has failed to make much headway in the rank and file of scientific activity, where full determinism is still accepted as a practical article of faith, based on the incorrect assumption that physical indeterminism is only relevant in very limited areas.
Jeans's enthusiasm about the mathematics is, in another book, tempered when he criticizes Werner Heisenberg for focusing on the lesser problem of the mathematics of quantum relations while ignoring the greater problem of observer influence.
Even so, Heisenberg's views were not all that out of tune with Jeans's. In the context of a discussion on the interpretation of quantum theory, Heisenberg uses the term "central order" as akin to the concept of a personal God or to the inner flame of a person's soul. Without this "central order" humanity would be in straits far more dire than posed by the ordeal of atomic war or mass concentration camps. Heisenberg also equates the Platonic world of ideas and ideals with the theme "God is spirit." Yet, he urges that the languages of science and of religion be kept distinct, so as not to weaken either mode of understanding (7).
Materialism, writes Heisenberg, is a concept that at root is found wanting. "For the smallest units of matter are, in fact, not physical objects in the ordinary sense of the word; they are forms, structures or -- in Plato's sense -- Ideas, which can be unambiguously spoken of in the language of mathematics."
Heisenberg relates the paradox of Parmenides: "Only being is; non-being is not. But if only being is, there cannot be anything outside this being that articulates it or could bring about changes. Hence being will have to be conceived of as eternal, uniform, and unlimited in space and time. The changes we experience can thus be only an illusion."
Erwin Schroedinger was initially unnerved that his wave mechanics could not resolve the "quantum jump" problem, but his concept of reality evolved.
Schroedinger did not care for the idea he attributes to another quantum pioneer, Pascual Jordan, that quantum indeterminacy is at the basis of free will, an idea echoed in some ways by Penrose. If free will steps in to "fill the gap of indeterminacy," writes Schroedinger, the quantum statistics will change, thus disrupting the laws of nature.
In the same article, Schroedinger talks about how scientific inquiry can't cope very well, if at all, with what we have called noumena:
"The scientific picture of the real world around me is very deficient. It gives a lot of factual information, puts all our experience in a magnificently consistent order, but it is ghastly silent about all and sundry that is really near to our heart, that really matters to us. It cannot tell us a word about red and blue, bitter and sweet, physical pain and physical delight; it knows nothing of beautiful and ugly, good or bad, God and eternity. Science sometimes pretends to answer questions in these domains, but the answers are very often so silly that we are not inclined to take them seriously."
Further, "The scientific world-picture vouchsafes a very complete understanding of all that happens -- it makes it just a little too understandable. It allows you to imagine the total display as that of a mechanical clockwork which, for all that science knows, could go on just the same as it does, without there being consciousness, will, endeavour, pain and delight and responsibility connected with it -- though they actually are."
Hence, "this is the reason why the scientific worldview contains of itself no ethical values, no aesthetical values, not a word about our own ultimate scope or destination, and no God, if you please." (79)
Elsewhere, he argues that Science is repeatedly buffeted by the unjust reproach of atheism. When we use the clockwork model of the cosmos, "we have used the greatly simplifying device of cutting our own personality out, removing it; hence it is gone, it has evaporated, it is ostensibly not needed.” (79)
My thought is that such a method of depersonalization has strong advantages if the scope of inquiry is limited, as in Shannon's depersonalized information. Depersonalizing information for specific purposes does not imply, of course, that no persons are needed for information to exist.
"No personal god," says Schroedinger, "can form part of a world model that has only become accessible at the cost of removing everything personal from it. We know, when God is experienced, this is an event as real as an immediate sense perception or as one’s own personality. Like them he must be missing in the space-time picture."
Though Schroedinger does not think that physics is a good vehicle for religion, that fact does not make him irreligious, even if he incurs blame from those who believe that "God is spirit.”
And yet Schroedinger favored Eastern philosophy: “Looking and thinking in that manner you may suddenly come to see, in a flash, the profound rightness of the basic conviction in Vedanta: it is not possible that this unity of knowledge, feeling and choice which you call your own should have sprung into being from nothingness at a given moment not so long ago; rather this knowledge, feeling, and choice are essentially eternal and unchangeable and numerically one in all men, nay in all sensitive beings.”
Schroedinger upheld "the doctrine of the Upanishads" of the "unification of minds or consciousnesses" despite an illusion of multiplicity (79).
In a similar vein, Wolfgang Pauli, another quantum pioneer, relates quantum weirdness to the human means of perception:
"For I suspect that the alchemistical attempt at a unitary psychophysical language miscarried only because it was related to a visible concrete reality. But in physics today we have an invisible reality (of atomic objects) in which the observer intervenes with a certain freedom (and is thereby confronted with the alternatives of "choice" or "sacrifice"); in the psychology of the unconscious we have processes which cannot always be unambiguously ascribed to a particular subject. The attempt at a psychophysical monism seems to me now essentially more promising, given the relevant unitary language (unknown as yet, and neutral in regard to the psychophysical antithesis) would relate to a deeper invisible reality. We should then have found a mode of expression for the unity of all being, transcending the causality of classical physics as a form of correspondence (Bohr); a unity of which the psychophysical interrelation, and the coincidence of
a priori instinctive forms of ideation with external perceptions, are special cases. On such a view, traditional ontology and metaphysics become the sacrifice, but the choice falls on the unity of being." (79)
Pauli appears here to be sympathetic with the notion of Jungian archetype, along with possibly something like the Jungian collective unconscious. Though he relates quantum weirdness to the human means of perception, one cannot be sure of any strong support of Jung's synchronicity theory.
Pauli, says his friend Heisenberg, was very fussy about clear thinking in physics and arrived at the idea of a psychophysical interrelation only after painstaking reflection. Even so, it should be noted that Pauli had been a patient of Jung and remained on good terms with Jung for many years.
Einstein made strict physical causality an article of faith, an outlook that underlies his Spinoza-style atheism. That view, however, did not make him irreligious, he argues. Despite church-state persecution of innovative thinkers, "I maintain that the cosmic religious feeling is the strongest and noblest motive for scientific research." (79)
Yet he makes plain his belief in strict causality, which meant there is no need for a personal God to interfere in what is already a done deal. Consider the "phenomenological complex" of a weather system, he says. The complex is so large that in most cases of prediction "scientific method fails us." Yet "no one doubts that we are confronted with a causal connection whose causal components are in the main known to us."
Leon Brillouin, on the other hand, says bluntly that because exact predictability is virtually impossible, a statement such as Einstein's was an assertion of faith that was not the proper province of science (81). Curious that Brillouin uses the logical positivist viewpoint to banish full causality from the province of science, just as Einstein used the same philosophical viewpoint to cast the luminiferous ether into outer darkness.
Einstein, of course, was swept up in the Darwinistic paradigm of his time, which, I believe, is reflected in his point that early Jewish religion was a means of dealing with fear, but that it evolved into something evincing a sense of morality as civilization advanced. He believed that spiritual savants through the ages tended to have a Buddhist-style outlook, in which a personal, anthropomorphic God is not operative. Einstein did on occasion, however, refer to a "central order" in the cosmos, though he plainly did not have in mind Bohm's implicate order, which accepts quantum bilocalism.
Einstein: “I see on the one hand the totality of sense-experiences, and, on the other, the totality of the concepts and propositions laid down in books. The relations between concepts and propositions among themselves and each other are of a logical nature, and the business of logical thinking is strictly limited to the achievement of the connection between concepts and propositions among each other according to firmly laid down rules, which are the concern of logic. The concepts and propositions get 'meaning,' viz., 'content,' only through their connection with sense-experiences. The connection of the latter with the former is purely intuitive, not itself of a logical nature. The degree of certainty with which this relation, viz., intuitive connection, can be undertaken, and nothing else, differentiates empty fantasy from scientific 'truth'." (82)
The idea that abstract concepts draw meaning from the content and context of sense experiences is a core belief of many. But is it true? It certainly is an unprovable, heuristic allegation. What about the possibility that meaning is imparted via and from a noumenal realm? If you say, this is a non-falsifiable, non-scientific speculation, you must concede the same holds for Einstein's belief that meaning arises via the sensory apparatus. We note further that "meaning" and "consciousness" are intertwined concepts. Whence consciousness? There can never be a scientific answer to that question in the Einsteinian philosophy. At least, the concept of a noumenal world, or Bohmian implicate order, leaves room for an answer.
In buttressing his defense of strict causality, Einstein lamented the "harmful effect upon the progress of scientific thinking in removing certain fundamental concepts from the domain of empiricism, where they are under our control, to the intangible heights of the a priori." (81) Logic be damned, is my take on this remark.
Louis de Broglie, another pioneer of quantum physics, at first accepted matter-wave duality, but later was excited by David Bohm's idea of what might be termed "saving most appearances" by conceding quantum bilocalism.
What, de Broglie asks, is the "mysterious attraction acting on certain men that urges them to dedicate their time and labours to works from which they themselves often hardly profit?" Here we see the dual nature of man, he says. Certain people aim to escape the world of routine by aiming toward the ideal. Yet, this isn't quite enough to explain the spirit of scientific inquiry. Even when scientific discoveries are given a utilitarian value, one can still sense the presence of an "ontological order."
We are nowhere near a theory of everything, de Broglie says. Yet "it is not impossible that the advances of science will bring new data capable" of clarifying "great problems of philosophy." Already, he writes, new ideas about space and time, the various aspects of quantum weirdness and "the profound realities which conceal themselves behind natural appearances" provide plenty of philosophical fodder (79).
Scientific inquiry yields technology that "enlarges" the body by amplifying the power of brawn and perhaps brain. But, such vast amplification has resulted in massive misery as well as widespread social improvements. "Our enlarged body clamours for an addition to the spirit," says de Broglie, quoting Henri Bergson.
The man who ignited the quantum revolution, Max Planck, warned that because science can never solve the ultimate riddle of nature, science "cannot really take the place of religion" (79). If one excludes "nihilistic" religion, "there cannot be any real opposition between religion and science," he writes. "The greatest thinkers of all ages were deeply religious" even if quiet about their religious thoughts.
"Anybody who has been seriously engaged in scientific work of any kind realizes that over the entrance to the temple of science are written the words: Ye must have faith."
This last sentiment seems boringly conventional. But how would a person, even a strict determinist, proceed without a strong conviction that the goal is achievable? As Eddington jokes, "Verily it is easier for a camel to pass through the eye of a needle than for a scientific man to pass through a door. And whether the door be barn door or church door it might be wiser that he should consent to be an ordinary man and walk in rather than wait till all the difficulties involved in a really scientific ingress are resolved."
Would, for example, the atheist Alan Turing have achieved so much had he not believed that his initial ideas would bear fruit? Would an AI program that passes the Turing test encounter an idea that provokes it to highly focused effort in anticipation of some reward, such as the satisfaction of solving a problem? Can qualia, or some equivalent, be written into a computer program?
In The Grammar of Science, Pearson has an annoying way of personifying Science, almost as if it is some monolithic God. I realize that Pearson is using a common type of metaphorical shorthand, but nevertheless he gives the impression that he and Science are One, a criticism that is applicable to various thinkers today.
Let us digress for a bit and consider what is meant by the word "science."
To a first approximation, one might say the word encompasses the publications and findings of people who interrogate nature in a "rational" manner.
In other words, the scientific methods attempt to establish relations among various phenomena that do not contradict currently accepted major theories. The scientific investigator has much in common with the police detective. He or she often uses a process of elimination, coupled with the sketch of a narrative provided by various leads or "facts." The next step is to fill out the narrative to the degree necessary to establish the "truth" of a particular finding. It is often the case that more than one narrative is possible. The narratives that are crowned with the title "theory" (in the scientific sense of the word) are those that seem to be most internally consistent and that also are not dissonant with the background framework "reality" (and if a narrative is dissonant, it will encounter strong resistance).
So "science" is a word used to describe the activities of a certain group of people who interrogate nature in accord with certain group norms -- norms concerning process and norms concerning philosophy or metaphysics (denial of such interests nevertheless implies a metaphysical belief set).
One idea of Popper's that is widely accepted among scientists is that if a statement is not potentially falsifiable via advances in experimental technology or clever experiment design, then that statement is not scientific and not a proper focus of scientists (though light-hearted speculations may be tolerated).
Hence, the entire scientific enterprise is not scientific, as its underlying assumptions, which are metaphysical, cannot be falsified by experimental or logical means.
At any rate, many will agree that "Science" as a monolithic entity does not exist. Science does not do anything. Science does not prove anything. The word is really a convenient handle that by itself cannot properly summarize a complex set of human activities. Scientists of course all know that there is no great being named Science. And yet when various thinkers, including scientists, employ this anthropomorphism, they often tend to give this being certain qualities, such as rationality, as distinct from, say, irrational Religion, a straw man that is also an anthropomorphism for a wide range of human activities.
In other words, we should beware scientists propagating their faith in the god Science. They may say they don't mean to do that, but, mean it or not, that is in fact what quite a few of them do.
60. Symmetry by Hermann Weyl (Princeton, 1952).
61. Time Travel in Einstein's Universe by J. Richard Gott III (Houghton Mifflin, 2001).
62. Kurt Goedel in Albert Einstein: Philosopher-Scientist, edited by Paul Arthur Schilpp (Library of Living Philosophers, 1949).
63. Cycles of Time: An extraordinary new view of the universe by Roger Penrose (The Bodley Head, 2010).
64. The Anthropic Cosmological Principle by John D. Barrow and Frank J. Tipler (Oxford, 1988).
65. Wheeler quoted in The Undivided Universe: An Ontological Interpretation of Quantum Theory by David Bohm, Basil James Hiley (Routledge, Chapman & Hall, Incorporated, 1993). The quotation is from Wheeler in Mathematical Foundations of Quantum Mechanics, A.R. Marlow, editor (Academic Press, 1978).
66. Undivided Universe, Bohm.
67. Bohm (see above) is referring to The Many-Worlds Interpretation of Quantum Mechanics by B.S. DeWitt and N. Graham (Princeton University Press 1973).
68. Undivided Universe, Bohm.
69. Gravitation by Charles W. Misner, Kip S. Thorne and John Archibald Wheeler (W.H. Freeman, 1973).
70. Black Holes and Time Warps: Einstein's Outrageous Legacy by Kip Thorne (W.W. Norton, 1994).
71. The Open Universe (Postscript Volume II) by Karl Popper (Routledge, 1988. Hutchinson, 1982).
72. New Foundations of Quantum Mechanics by Alfred Landé (Cambridge University Press, 1965). Cited by Popper in Schism.
73. Logical Foundations of Probability by Rudolf Carnap (University of Chicago, 1950).
74. Physics and Philosophy by James Jeans (Cambridge, Macmillan, 1943).
75. Quantum Theory and the Schism in Physics (Postscript Vol. III) by Karl Popper (Routledge, 1989. Hutchinson, 1982).
76. The New Background of Science by James Jeans (Cambridge, 1933, 1934).
77. B. Alan Wallace, a Buddhist scholar, tackles the disconnect between the scientific method and consciousness in this video from the year 2000.
B. Alan Wallace on science and consciousness
http://www.youtube.com/watch?v=N0IotYndKfg
78. Space, Time and Gravitation: An Outline of the General Relativity Theory by Arthur Eddington (Cambridge 1920, Harper and Row reprint, 1959).
79. Taken from excerpts of the cited scientist's writings found in Quantum Questions: Mystical Writings of the World's Great Physicists, edited by Ken Wilber (Shambhala Publications, 1984). Wilber says the book's intent is not to marshal scientific backing for a New Age agenda.
80. From "Autobiographical Notes" appearing in Albert Einstein: Philosopher-Scientist, Paul Arthur Schilpp, editor (Library of Living Philosophers 1949).
81. Science and Information Theory, Second Edition, by Leon Brillouin (Dover 2013 reprint of Academic Press 1962 edition; first edition, 1956).
82. The Meaning of Relativity by Albert Einstein (fifth edition, Princeton, 1956).
83. The Self Illusion: how the social brain creates identity by Bruce Hood (Oxford, 2012).
84. The 'Particles' of Modern Physics by J.D. Stranathan (Blakiston, 1942).
z1. Human Knowledge -- Its Scope and Limits by Bertrand Russell (Simon and Schuster 1948).