Wednesday, July 29, 2015

Keynes, von Neumann — value and belief as a complete set. Keynes' "subjective probability school": Keynes proposed treating any proposition as an event; for example, "it will rain tomorrow" and "there is life on Saturn" are both events, and the degree of belief people assign to such events is their probability, independent of any random experiment. This is usually called subjective probability.

Eigenstate: under a certain class of interactions, the microscopic state itself is left unchanged by the interaction - Sina Blog

blog.sina.com.cn/s/blog_a582cd40010164gr.html
June 24, 2012 - Experiments show that for certain other states (non-eigenstates), interaction with the macroscopic world will turn the state into one of many eigenstates, but exactly which one cannot be determined in advance; the change ...

phymath999: interaction with the macroscopic world will turn it into one of many eigenstates ...

phymath999.blogspot.com/2014/05/blog-post_2258.html
May 3, 2014 - ... the interaction; mechanical quantities E, x, P, etc. denote the macroscopic output we obtain. ... Experiments show that for certain other states (non-eigenstates), interaction with the macroscopic world will turn the state into one of many eigen ...

[PDF] Starting from the work of von Neumann and others in the 1930s, quantum mechanics gradually became ... - Wuli (Physics)

www.wuli.ac.cn/fileup/PDF/19760509.pdf
May 9, 1976 - But for certain other states (non-eigenstates), interaction with the macroscopic world will turn the state into one of many eigenstates; exactly which one cannot be determined in advance, but the probability of becoming a given ...

"lorenz對稱"

Symmetry means that when a system is in one state, the probability of observing it to be in another state does not depend on the observer's location in spacetime or state of motion (that is, on the choice of coordinate frame and inertial frame). From this, Wigner deduced that the states seen by different observers are related by a unitary or antiunitary operator. In particular, the states observed at different times are also related by a unitary operator.
  
The basic postulate of special relativity: every spacetime coordinate transformation that preserves flatness must leave the spacetime interval between any two events invariant.
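In standard notation (my own gloss on the note above, not part of the original), with metric signature (-,+,+,+) the postulate reads:

```latex
% Spacetime interval between two events:
\Delta s^2 = -\,c^2\,\Delta t^2 + \Delta x^2 + \Delta y^2 + \Delta z^2 .
% A flat-spacetime coordinate change x'^\mu = \Lambda^\mu{}_\nu x^\nu + a^\mu
% preserves \Delta s^2 for all event pairs exactly when \Lambda satisfies
\Lambda^{\mathsf T}\,\eta\,\Lambda = \eta , \qquad \eta = \operatorname{diag}(-1,1,1,1),
% i.e. when it is a Poincare (inhomogeneous Lorentz) transformation.
```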


As Hayek said: "What has always turned a country into a hell on earth is precisely people's attempt to turn it into a heaven."


A discussion of the three schools of axiomatization of probability theory
2521 reads, 2011-02-19 22:51 | Category: relative information theory | Section: paper exchange | Keywords: probability theory
I came across the following related information online:
At present there are three schools of axiomatization of probability theory.
The "subjective probability school" of 1921, represented by J. M. Keynes. Keynes proposed treating any proposition as an event; for example, "it will rain tomorrow" and "there is life on Saturn" are both events, and the degree of belief people assign to such events is their probability, independent of any random experiment. This is usually called subjective probability.
The "objective probability school" of 1928, represented by von Mises. Von Mises defined the probability of an event as the limit of the frequency with which the event occurs; as an axiomatic foundation, the existence of this limit must itself be taken as the first axiom. This is usually called objective probability.
The measure-theoretic axiomatic system of probability of 1933, established by Kolmogorov.
At present, the vast majority of textbooks adopt Kolmogorov's axiomatic system of probability.
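For reference (my own summary in standard measure-theoretic notation, not part of the quoted text), Kolmogorov's axioms say that a probability measure P on a sigma-algebra F of subsets of a sample space Ω satisfies:

```latex
\begin{aligned}
&\text{(1)}\quad P(A) \ge 0 \quad \text{for every } A \in \mathcal{F};\\
&\text{(2)}\quad P(\Omega) = 1;\\
&\text{(3)}\quad P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)
 \quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{F}.
\end{aligned}
```

These axioms constrain how probabilities fit together; on their own they do not assign a numerical value to any particular event, which is essentially the point raised below.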
My view is that Keynes' position is somewhat absolute. Although the probabilities of such events are not determined by random experiments, the probabilities themselves do to some extent influence (though they do not determine) the outcomes of random experiments. So one should say that random experiments reflect the intrinsic probabilities of these events, assuming those probabilities are definite.
The objective probability school goes to the opposite extreme and places too much weight on the results of random experiments. The outcome of a random experiment is not completely determined by the probability of the event; it can go astray. Moreover, we cannot hope to perform infinitely many trials; sometimes it is hard even to repeat a trial once under identical conditions, let alone many times.
As for Kolmogorov's axiomatic system of probability, I have only consulted the summaries found online, but it seems to me that those few axioms alone cannot yield the probability value of any event. Of course, perhaps I am simply ill-informed; criticism and corrections are welcome.
 
As for random experiments, my view is that their outcomes are influenced both by the probability of the event and by the contingency of randomness itself, so it is very hard to separate the two; in particular, we do not know how large the latter influence was, since it is itself unknown. In addition, the relative influence of the two on the experimental result also depends on factors such as the number of trials.
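As a small illustration of this point (my own sketch, not from the original post), one can simulate how the observed frequency of an event with a fixed underlying probability approaches that probability only gradually, with chance still visible at any finite number of trials:

```python
import random

def relative_frequency(p, n, seed=0):
    """Simulate n Bernoulli(p) trials and return the observed relative frequency."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

# Underlying ("intrinsic") probability of the event.
p = 0.75

# The gap between frequency and probability shrinks only slowly with n,
# and for small n a single run can be far from p.
for n in (10, 100, 1_000, 10_000, 100_000):
    freq = relative_frequency(p, n)
    print(f"n = {n:>6}: frequency = {freq:.4f}, |frequency - p| = {abs(freq - p):.4f}")
```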

Measuring Information: Shannon versus Popper


Extracted and adapted for the Web from "Value and Belief", a PhD thesis accepted by the University of Bristol, 2003.
Topics: information theory, inductive logic, epistemology, logical probability

 

Abstract


Philosophers have a notion of the epistemic "strength" or "boldness" of a proposition, or rather its information content, and perhaps have an idea from Popper or Wittgenstein that it can be measured using probability. This short note explains the advantage of the Shannon information measure used in information science, in terms of logical consistency and with a minimum of formalism.

The issue of how to quantify information has come up frequently in the literature on inductive logic (e.g. Hempel & Oppenheim (1948), Carnap and Bar-Hillel (1953)). What is agreed is that information content is a quantity attaching to propositions. When you receive the message "Supper's ready", we say that strictly the information content attaches not to that utterance but to the proposition that you have received that utterance. As such, information content can be represented as a mathematical function over sentences of a logical language, much like probability or utility functions. The common theme between different proposed measures is the principle, found in Popper and in Wittgenstein, that a proposition is informative according to how many possible situations it excludes. Popper and others have insisted that the information content of H is measured by 1-P(H), where P(H) is the logical probability of H. This means that the information content is just the ratio of possibilities excluded by H to all logical possibilities. This measure meets a basic requirement of a measure of information: namely that if B is a proposition whose probability given A is less than one, then A&B is more informative than A, because it is true in fewer situations. AvB, on the other hand, has less content.
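In symbols (my own restatement of the requirement just described), writing cont(H) = 1 - P(H) for Popper's measure:

```latex
\operatorname{cont}(H) \;=\; 1 - P(H),
\qquad
\operatorname{cont}(A \lor B) \;\le\; \operatorname{cont}(A) \;\le\; \operatorname{cont}(A \land B)
\quad\text{since}\quad
P(A \lor B) \;\ge\; P(A) \;\ge\; P(A \land B).
```

The second inequality is strict exactly when P(A) > 0 and P(B | A) < 1, which is the condition used above.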

However, the question of how to measure information has been decisively solved by Shannon (Shannon and Weaver (1949)) in a paper that is crucial to what is now called information technology. To show what is at stake, I will explain how Shannon derived his measure and then show why Popper's measure is unacceptable.

Shannon based his measure of information on requirements of logical consistency. Indeed his work is very similar to the Cox proof of Bayesian probability. Like Cox, Shannon set out consistency requirements on a type of formal system as mathematical constraints on a function, then showed that the functions satisfying these constraints differ only trivially from each other, and hence that there is really only one consistent measure.

To illustrate what is meant by a consistency constraint in this context, imagine that you receive two successive messages through the same channel each consisting of one letter of the alphabet. Imagine separately that you receive a single message consisting of two letters of the alphabet. It should be clear that these are different descriptions of the same situation, hence any truly propositional measure should give them the same value. Put another way, measures of information content should give the same value to "You receive 'A' followed by 'B'" as to "You receive 'AB'."

At the moment, we are concerned with measuring the information content of the message 'AB', not in the sense of how much it tells us about a particular issue, but in the sense of how much information would be required to unambiguously transmit the message down a hypothetical communication channel. This intrinsic complexity or information content is referred to in the theory as its self-information, whereas the extent to which a message is informative about whether or not H is called its cross-information on H.

With Popper, let us take 1-P(H) to measure information content, where each letter is taken as equally probable. In the first situation, the information content of the first message, whichever letter it turns out to be, is 25/26. Since there are two individual messages, the total information received is 50/26. In the second situation, the total number of possible messages (two-letter sequences) is 676. Whatever message you receive will logically exclude 675 of these messages, so the total information received is 675/676. Thus we have reached two entirely different values depending on how a particular message was described, and this serves to illustrate the problem with using a non-Shannonian measure.

Shannon's measure itself uses logarithms. The information content of a particular message A, called its surprisal, is -logP(A). It does not matter which base we use for the logarithm so long as we are consistent: this is the sense in which there are different mathematically allowable measures, but they differ so trivially that we can consider them to be one measure. When base two is used, the resulting unit of information is called a 'bit' (short for "binary digit"), a bit being the maximum amount of information that can be conveyed by the answer to a yes/no question.
In the above example, each one-letter message has a surprisal of -log₂(1/26) ≈ 4.7 bits, and a two-letter message has a surprisal of -log₂(1/676) ≈ 9.4 bits. Hence we see that the additivity requirement (that the content of two one-letter messages is the same as that of the one two-letter message) is satisfied.
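As a quick numerical check (a sketch assuming 26 equiprobable letters, as in the example above):

```python
import math

N_LETTERS = 26  # equiprobable letters, as in the example

def popper_content(p):
    """Popper-style content: 1 - P(H)."""
    return 1 - p

def surprisal_bits(p):
    """Shannon surprisal in bits: -log2 P(H)."""
    return -math.log2(p)

p_one_letter = 1 / N_LETTERS        # probability of a given one-letter message
p_two_letters = 1 / N_LETTERS**2    # probability of a given two-letter message

# Popper's measure gives different totals for the two descriptions of the same situation.
print(2 * popper_content(p_one_letter))   # ~1.923 (two one-letter messages)
print(popper_content(p_two_letters))      # ~0.9985 (one two-letter message)

# Shannon's surprisal gives the same total either way.
print(2 * surprisal_bits(p_one_letter))   # ~9.40 bits
print(surprisal_bits(p_two_letters))      # ~9.40 bits
```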

Like probability and utility, information content is a propositional measure which obeys the expectation principle. If we do not know what a particular message is, but only that it is the answer to a question whose possible answers are A1, A2, A3, ..., An, then the information content is the expectation of the information content over all possible messages, in other words the sum of -P(Ai) log P(Ai).
An information source or communications channel can be thought of as a question with one of a (possibly very large) set of possible answers.

This defines a crucial term in information theory: entropy. Calculating the expected information content for the set of possible answers to an inquiry gives us the entropy for that inquiry, which can informally be regarded as a measure of uncertainty attached to it. If a subject is irrevocably certain about an issue, in that one answer is given probability one while all others have probability zero, then the entropy is zero. When we have a finite set of mutually exclusive hypotheses with no information to discriminate between them, then entropy is at its maximum when all are given the same probability.
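A minimal sketch of this entropy calculation in code (the helper name entropy_bits is my own):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: sum of p * log2(1/p), with p = 0 terms dropped."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy_bits([1.0, 0.0, 0.0]))           # 0.0  -- certainty gives zero entropy
print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0  -- uniform over 4 answers: the maximum
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))      # ~1.36 -- below the uniform maximum
```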

Information versus Probability


Since information content measures are simply descriptions of probability functions, it may seem that we do not gain anything by talking of information content that can not be expressed in terms of probability. However, information theory gives us a perspective on inferential tasks that we can miss if we talk entirely in terms of probability. To illustrate this I will consider a standard example. In a particular city, it rains three days out of four. The local weatherman makes a forecast for rain half the time, no rain the rest of the time. His predictions are such that he correctly predicts rain half the time, correctly predicts no rain a quarter of the time and incorrectly predicts no rain the remaining quarter of the time. This can be expressed in the following table of probabilities.

                   Rain    No Rain
Rain forecast      50%     0%
No Rain forecast   25%     25%
Here is the problem. Someone who predicts rain for every day will be right 75% of the time. Someone who accepts the weatherman's forecast also has a probability of 75% of being right on any given day. So why is the weatherman any use? The answer is that we do not simply have to accept the weatherman's forecast but use it as an information source. In other words, rather than taking the "No Rain" forecast uncritically, we can conditionalise on it to get a new probability of rain (in this case 50%).

We can evaluate how informative the forecaster is about the weather by measuring the reduction in entropy: a perfectly reliable forecaster would reduce the entropy to zero. The entropy resulting from consulting the weather forecaster is zero if the forecast is for rain and one bit if the forecast is no rain. Since these are equally likely, the overall entropy is half a bit. If we do not consult the weatherman, then given just the 75% chance of rain on any one day, the entropy is 0.811 bits. So the benefit of this forecaster is a 0.311-bit reduction in entropy.
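Here is the same calculation as a code sketch (joint probabilities taken from the table above; the names are my own):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits, with zero-probability terms dropped."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Joint probabilities from the table: P(forecast, weather).
joint = {
    ("rain forecast", "rain"): 0.50,
    ("rain forecast", "dry"):  0.00,
    ("dry forecast",  "rain"): 0.25,
    ("dry forecast",  "dry"):  0.25,
}

# Entropy about the weather with no forecast: P(rain) = 0.75.
prior_entropy = entropy_bits([0.75, 0.25])            # ~0.811 bits

# Entropy about the weather after each forecast, weighted by P(forecast).
posterior_entropy = 0.0
for forecast in ("rain forecast", "dry forecast"):
    p_forecast = joint[(forecast, "rain")] + joint[(forecast, "dry")]
    conditional = [joint[(forecast, w)] / p_forecast for w in ("rain", "dry")]
    posterior_entropy += p_forecast * entropy_bits(conditional)

print(prior_entropy)                      # ~0.811
print(posterior_entropy)                  # 0.5
print(prior_entropy - posterior_entropy)  # ~0.311 bits gained from the forecast
```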

By measuring the information content of the predictions in this way, we have a basis for comparison of weather forecasters (or other predictors) which is more meaningful than merely taking the probability of them being correct.

References


Carnap, R. and Y. Bar-Hillel, 1953. "An outline of a theory of semantic information." British Journal for the Philosophy of Science, 4: 147-157.

Hempel, C. G. and P. Oppenheim, 1948. "Studies in the logic of explanation." Philosophy of Science, 15: 135-175.

Shannon, C. E. and W. Weaver, 1949. The mathematical theory of communication. Urbana, Illinois: University of Illinois Press.



A bit of my own understanding from reading

2014-10-18 23:06:18, from: 七星之城
A review of The Quantum Theory of Fields, Volume I: Foundations (rated 5 stars)

