Re: probability vs. statistics (was Re: Model of induction)

Posted by Robert Wall on
URL: http://friam.383.s1.nabble.com/Model-of-induction-tp7588431p7588458.html

Hi Glen, 

I feel a bit like Nick says he feels when immersed in the stream of such erudite responses to each of your seemingly related, but thread-separated questions.  As always, though, when reading the posted responses in this forum, I learn a lot from the various and remarkable ways questions can be interpreted based on individual experiences.  Perhaps this props up the idea of social constructivism more than Platonism.  So, if you can bear with me, my response here is more of a summary of my takeaways from the variety of responses to your two respective questions, with my own interpretations thrown in and based on my own experiences. 

Taking each question separately ...

Imagine a thousand computers, each generating a list of random numbers.  Now imagine that for some small quantity of these computers, the numbers generated are in a normal (Poisson?) distribution with mean mu and standard deviation s.  Now, the problem is how to detect these non-random computers and estimate the values of mu and s.

Nick's question seems to be about how to detect non-random event generators among independent streams of reportedly random processes.  This is not really difficult to do and doesn't require any assumptions about the underlying probability distribution other than that, under the null hypothesis of randomness, each number in the stream is as likely as any other [i.e., uniformly distributed over the probability space] and that the probabilities over all possible outcomes sum to unity.  The outcome itself is the very definition of a random variable ... a non-deterministic event--an observation--mapped to a number line or a categorical bin.  A random variable has both mathematical and philosophical properties, as we have heard in this thread.

For Nick's question, I think that Roger has provided the most practical answer with Marsaglia's Diehard battery of tests for randomness.  In my professional life, I used these tests to prepare, for example, a QC procedure for ensuring that our hashing algorithms remained random allocators after each new build of our software suite.  For example, a simple test called the "poker test," based on the Chi-squared distribution, could be used to satisfy Nick's question, with the power of the test (i.e., the probability of correctly rejecting the null hypothesis of randomness when it is false, and thus of catching truly non-random generators) increasing with larger sample sizes ... longer runs.
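In case it helps, here is a minimal sketch of that poker test in Python.  This is my own illustration, not Marsaglia's code: the theoretical hand probabilities follow Knuth's version of the test (count distinct digits per hand of five), and the choice to pool the two rarest categories and to test a 100,000-digit stream are my assumptions.

```python
import numpy as np
from scipy.stats import chisquare

# Theoretical P(r distinct digits in a hand of 5 drawn from 0-9),
# with the tiny r = 1 category pooled into r = 2.
PROBS = {2: 0.0136, 3: 0.1800, 4: 0.5040, 5: 0.3024}

def poker_test(digits, alpha=0.05):
    # Group the digit stream into hands of 5, dropping any remainder.
    hands = np.asarray(digits[: len(digits) // 5 * 5]).reshape(-1, 5)
    distinct = np.array([len(set(hand)) for hand in hands])
    distinct = np.clip(distinct, 2, 5)          # pool r = 1 with r = 2
    observed = np.array([(distinct == r).sum() for r in (2, 3, 4, 5)])
    expected = len(hands) * np.array([PROBS[r] for r in (2, 3, 4, 5)])
    stat, p = chisquare(observed, expected)     # Chi-squared goodness of fit
    return stat, p, p < alpha                   # True -> reject "random"

rng = np.random.default_rng()
digits = rng.integers(0, 10, size=100_000)      # the stream under test
stat, p, reject = poker_test(digits)
print(f"chi2 = {stat:.2f}, p = {p:.3f}, reject randomness: {reject}")
```

A truly non-random allocator shows up as a grossly lopsided distinct-digit count and a vanishing p-value; and, as noted above, longer runs give the test more power.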

So, does anyone here have an opinion on the ontological status of probability and/or statistics?  Am I demonstrating my ignorance by suggesting the "events" we study in probability are not (identical to) the events we experience in space & time?

At the risk of exposing my own ignorance, I'll also say your question has to do with the ontological status of any random "event" when treated in an estimation experiment or likelihood computation; that is, are proposed probability events or measured statistical events real?

For example--examples are always good for clarifying a question--does the likelihood of a lung-cancer event given a history of smoking point to some reality that will actually occur, with a certain amount of uncertainty?  In a population of smokers, yes.  For an individual smoker, no.  In the language of probability and statistics, we say that in a population of smokers we expect this reality to be observed with a certain amount of certainty (probability).  To be sure, such tests would likely involve several levels of contingencies to tame troublesome confounding variables (e.g., age, length of time smoking, smoking rate).  I don't want to get into multivariate statistics, though.

Obviously, time is involved here, but it doesn't have to be (e.g., the probability of drawing four aces in a trial of five random draws). An event is an observation in, say, a nonparametric Fisher exact test of significance against the null hypothesis that, say, smoking and lung cancer are independent, which we can make contingent on, say, the number of years of smoking. Epidemiological studies can be very complex, so maybe not the best of examples ...
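To make that concrete, here is a hedged sketch using SciPy's Fisher exact test on a 2x2 contingency table; the counts are fabricated purely for illustration.

```python
from scipy.stats import fisher_exact

# Made-up 2x2 contingency table (illustration only):
#                 lung cancer   no lung cancer
# smokers              40             160
# non-smokers          10             390
table = [[40, 160], [10, 390]]

odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4g}")
# A small p argues against the null hypothesis that smoking status and
# lung-cancer incidence are independent in this (fabricated) sample.
```

Note that each cell is a count of observed events; nothing in the test cares about when those observations were made.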

So, since probability and statistics both deal with the idea of an event--as your "opponent" insists--events are just observations that the outcome of interest [e.g., four of a kind] occurred; so I would say that epistemologically they are real experiences with a potential (probability) based on either controlled randomized experiments or observational experience.  But is a potential ontologically real?  🤔

Asking whether those events come with ontologically real probabilistic properties is perhaps a different question.  It gets into worldview notions of determinism and randomness.  We tend to say that if a human cannot predict the event in advance, it is random ... enough.  If it can be predicted, say, from known initial conditions, then applying probability theory is misplaced.  Still, there are chaotic, non-random events that are not practically predictable ... they seem random ... enough.  Santa Fe science writer George Johnson gets into this in his book Fire in the Mind.

I would just close with another comment, this time regarding Roger's recounting of Marsaglia's report on the issues with pseudo-random number generators.  RANDU was used on mainframes for years but was subsequently found to be seriously flawed (its consecutive triples famously fall on just 15 planes in 3-space).  If I remember correctly, the rand() function used in C applications was also found to be deficient.  Likely, this is why we need a battery of randomness tests to be sure.  But there has been a great deal of research in this area, and things have improved dramatically.
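As an aside, RANDU's defect is easy to exhibit.  Here is a small sketch of my own, using the standard definition of RANDU (x' = 65539 * x mod 2^31): because 65539^2 mod 2^31 = 6*65539 - 9, every output is an exact linear combination of the previous two, which is exactly why its triples land on those few planes.

```python
# Demonstrate RANDU's well-known flaw: x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31)
# holds for every consecutive triple it produces.

def randu(seed, n):
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31      # RANDU: x' = 65539 * x mod 2^31
        out.append(x)
    return out

xs = randu(seed=1, n=1000)           # seed must be odd for RANDU
assert all(
    (6 * xs[k + 1] - 9 * xs[k]) % 2**31 == xs[k + 2]
    for k in range(len(xs) - 2)
)
print("every RANDU triple satisfies x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31)")
```

No battery of tests is even needed for this one; the linear dependence is baked into the multiplier.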

There are even so-called true random number generators that "tap" into off-computer and decidedly random-event sources like atmospheric noise [or even quantum-level events].  But even here, some folks whose worldview sees the universe as deterministic would say that these generators are not truly random either.  Chaotic, yes.  But not random.  I say, likely random enough.

Finally, I would say that we can use number generators that are random enough for our own purposes.  In fact, for running simulation models, say, to compare competing alternatives for decision support, we need to use pseudo-random number generators precisely because they are repeatable: replaying the same seeds across alternatives (common random numbers) yields a sizable reduction in the (random) variance of the results.  This tends to sharpen our tests of significance when comparing the resulting output statistics as well.
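Here is a toy sketch of that variance-reduction trick; the "overflow cost" model and all its numbers are made-up stand-ins for a real simulation, and only the seeding discipline is the point.

```python
import numpy as np

def cost(capacity, rng):
    # Hypothetical toy model: cost is the mean overflow of random demand.
    demand = rng.exponential(scale=10.0, size=1000)
    return np.mean(np.maximum(demand - capacity, 0))

diffs_crn, diffs_indep = [], []
for rep in range(200):
    # Common random numbers: both alternatives replay the same stream.
    diffs_crn.append(cost(12, np.random.default_rng(rep))
                     - cost(10, np.random.default_rng(rep)))
    # Independent streams: each alternative gets its own randomness.
    diffs_indep.append(cost(12, np.random.default_rng(2 * rep))
                       - cost(10, np.random.default_rng(2 * rep + 1)))

print("var(diff), common seeds:     ", np.var(diffs_crn))
print("var(diff), independent seeds:", np.var(diffs_indep))
```

The variance of the pairwise difference collapses under common seeds, which is what sharpens the comparison between alternatives; a "true" random source cannot be replayed this way.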

Kind of a fun topic.  Hope this adds a little of its own sharpness to the discussion and doesn't just add variance. 🤔 If y'all deem not, I will expect some change from my $0.02. 🤐

Cheers,

Robert W.

On Tue, Dec 13, 2016 at 3:42 PM, Grant Holland <[hidden email]> wrote:

Glen,

On closer reading of the issue you are interested in, and upon re-consulting the sources I was thinking of (Bunge and Popper), I can see that neither directly addresses the question of whether time must be involved in order for probability theory to come into play. Nevertheless, I think you may be interested in these two sources anyway.

The works that I've been reading from these two folks are Causality and Modern Science by Mario Bunge and The Logic of Scientific Discovery by Karl Popper. Bunge takes (positive) probability to be essentially the complement of causation, so his book ends up being very much about probability. Popper has an eighty-page section on probability that is well worth reading from a philosophy-of-science perspective. I recommend both of these sources.

While I'm at it, let me add my two cents' worth to the question concerning the difference between probability and statistics. In my view, probability theory should be defined as "the study of probability spaces". It's not often defined that way - usually something about "random variables" appears in the definition. But the subject of probability spaces is more inclusive, so I prefer it.

Secondly, it's reasonable to say that a probability space defines "events" (at least in the finite case) as subsets of the sample space (with a few more specifications). Nothing in this definition requires that "the event must occur in the future". But it seems that many people (students) insist that it has to - or else they can't seem to wrap their minds around it. I usually just let them believe that "the event has to be in the future" and let it go at that. But there is nothing in the definition of an event in a probability space that requires anything about time.
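For what it's worth, a tiny sketch of exactly that point - a finite probability space over two dice, my own example - where an event is nothing but a subset and time appears nowhere:

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))    # sample space: two fair dice

def prob(event):
    # Uniform probability measure on subsets of omega.
    return Fraction(len(event & omega), len(omega))

doubles = {w for w in omega if w[0] == w[1]}   # an event is just a subset
print(prob(doubles))                           # 1/6
```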

I regard the discipline of statistics (of the Fisher/Neyman type) as the study of a particular class of problems pertaining to probability distributions and joint distributions: for example, tests of hypotheses, analysis of variance, and other problems. Statistics makes some very specific assumptions that probability theory does not always make, such as that there is an underlying theoretical distribution exhibiting "parameters" against which are compared "sample distributions" exhibiting corresponding "statistics". Moreover, the sweet spot of statistics, as I see it, is the moment and central-moment functionals that, essentially, measure the chance variation of random variables.
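A quick sketch of that parameter-versus-statistic distinction; the N(5, 2) population, the sample size, and the seed are arbitrary choices of mine:

```python
import numpy as np
from scipy import stats

mu, sigma = 5.0, 2.0                            # parameters (theoretical)
sample = np.random.default_rng(42).normal(mu, sigma, size=100)

xbar, s = sample.mean(), sample.std(ddof=1)     # statistics (observed)
# One-sample t-test: compare the sample statistic to the hypothesized parameter.
t_stat, p = stats.ttest_1samp(sample, popmean=mu)
print(f"xbar = {xbar:.3f}, s = {s:.3f}, t = {t_stat:.3f}, p = {p:.3f}")
```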

I admit that some folks would say that probability theory is no more inclusive than I have just described statistics as being. But I think it is. Admittedly, what I have just said is more along the lines of "what it is to me" - a statement of preference rather than an ontic argument that "this is what it is".

As long as we're all having a good time...

Grant

On 12/13/16 12:03 PM, glen ☣ wrote:
Yes, definitely.  I intend to bring up deterministic stochasticity >8^D the next time I see him.  So a discussion of it in the context of QM would be helpful.

On 12/13/2016 10:54 AM, Grant Holland wrote:
This topic was well developed in the last century. The probabilists argued the issues thoroughly. But I find what the philosophers of science have to say about the subject a little more pertinent to what you are asking, since your discussion seems to be somewhat ontological. In particular, I'm thinking of Peirce, Popper, and especially Mario Bunge. The latter two had to account for quantum theory, so they are a little more pertinent - and interesting. I can give you more specific references if you are interested.



============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/ by Dr. Strangelove

