Johnjoe McFadden, posted 2014-11-29 11:11


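Yet in his 1944 book What Is Life?, Schrödinger wrote that some of life's most fundamental building blocks must, like unseen radioactive atoms, be quantum entities with counterintuitive properties. Indeed, Schrödinger argued that life differs from non-life precisely because it inhabits the middle ground between the quantum world and the classical world, a region we might call the "quantum edge".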

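Let us start with a few relatively peripheral examples, such as the sense of smell. The conventional theory of olfaction holds that odour molecules are detected by olfactory receptors in the nose through a lock-and-key mechanism: the odour molecule binds to a pocket in the receptor and triggers a response, like a key turning a lock. It is a pleasing and highly intuitive theory, but it fails to explain certain observations: for example, molecules with similar shapes often smell different, and vice versa. A revised theory suggests that the receptors may instead respond to molecular vibrations. In 1996 this idea was given a quantum-level explanation when the biophysicist Luca Turin proposed that vibrations might promote the quantum tunnelling of electrons, which opens the olfactory "lock". The quantum theory of smell may sound strange, but supporting evidence has recently emerged: fruit flies can distinguish odour molecules that have exactly the same shape but contain different isotopes of the same element, which is hard to explain without quantum mechanics. Or consider this: we know that some birds and other animals navigate by sensing the Earth's very weak magnetic field, but how they do so has long been a mystery. It is hard to imagine how such a weak field could generate a signal inside an animal's body. A study of the European robin raised a deeper puzzle: this bird's navigation system depends on light and, unlike a conventional compass, it detects not the direction of the magnetic field lines but their angle relative to the Earth's surface. No one knew why.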
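Then, in the 1970s, the German chemist Klaus Schulten discovered that the pairs of particles produced by some chemical reactions remain connected through a peculiar quantum property: quantum entanglement. Entanglement lets distant particles stay instantaneously connected however far apart they are; even if they were flung to opposite ends of the galaxy, they would remain mysteriously correlated. Entanglement is so bizarre that Albert Einstein himself, the man who gave us black holes and warped space-time, called it "spooky action at a distance". Yet hundreds of experiments have shown that it is real.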
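Schulten found that entangled particle pairs can be extremely sensitive to the strength and direction of a magnetic field. He proposed that the mysterious avian compass might make use of quantum-entangled particles. Almost no one accepted the idea, but in 2000 Schulten and his student Thorsten Ritz wrote an influential paper showing how light could drive an entanglement-based compass in a bird's eye. In 2004 Ritz teamed up with the renowned ornithologists Wolfgang and Roswitha Wiltschko, a husband-and-wife team, and they found compelling experimental evidence that the European robin does indeed rely on Einstein's "spooky action" during its annual migration across the globe.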
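Take enzymes. They are the workhorses of the living world, speeding up chemical reactions so that processes that would otherwise take thousands of years are completed in seconds. Enzymes often accelerate reactions by a factor of trillions, but how they do so has long been a mystery. Now, however, Judith Klinman of the University of California, Berkeley, Nigel Scrutton of the University of Manchester and others have found that enzymes have a remarkable quantum trick: tunnelling. Put simply, enzymes promote a process in which electrons and protons vanish from one place in a biomolecule and at the same instant reappear in another, without passing through any of the space in between, which is a kind of teleportation. This is all very fundamental stuff. Every biomolecule in every cell of every living thing on the planet was made by enzymes. Enzymes have a better claim than any other component (even DNA, since some cells survive without it) to be called the essential ingredient of life. And they keep us alive by dipping into the quantum world.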
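We can push the argument a step further. Photosynthesis is the most important biochemical reaction on Earth. It turns light, air, water and a few minerals into grass, trees, grain and, ultimately, those of us who eat plants or plant-eaters. It begins with the capture of light energy by chlorophyll molecules. That light energy is converted into electrical energy, which is then delivered to a biochemical factory called the reaction centre, where it is used to fix carbon dioxide and turn it into plant matter. This energy transport has long fascinated researchers because it can be so efficient: close to 100%. How does a green leaf transport energy better than our most advanced technology?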
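At the University of California, Berkeley, Graham Fleming's laboratory has spent more than a decade studying the efficiency of photosynthesis using femtosecond spectroscopy. In essence, the group fires extremely short laser pulses at photosynthetic complexes to trace the route by which photons reach the photosynthetic reaction centre. Back in 2007 the group studied the FMO complex in bacteria, in which the photon's energy has to pass through a cluster of chlorophyll molecules. It was once thought that the photon energy hops from one chlorophyll molecule to the next like a charged particle, much as Schrödinger's cat might cross a stream by hopping from one stepping stone to another. But this explanation does not quite make sense. A photon has no sense of direction, so most of the light energy should wander off in the wrong direction and end up falling into the "stream". Yet in plants and photosynthetic bacteria, almost all of the light energy reaches the reaction centre.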
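If that is still not enough for you, let us finish with the mechanism of evolution itself. Schrödinger suggested that mutation might involve a kind of quantum jump. In their classic DNA paper, Watson and Crick proposed that mutation might involve the "tautomerisation" of nucleotide bases, a process thought to involve quantum tunnelling. In 1999 Jim Al-Khalili and I suggested that proton tunnelling might explain a peculiar type of mutation, so-called "adaptive mutation", which seems to occur more frequently when it benefits the individual. Our paper at the time was purely theoretical, but we are now trying to find experimental evidence for proton tunnelling in DNA. So watch this space.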
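Erwin Schrödinger (1887–1961) was an Austrian theoretical physicist. He studied physics at the University of Vienna from 1906 to 1910. After receiving his doctorate in 1910, he worked at the university's Second Institute of Physics. From 1921 to 1927 he was professor of mathematical physics at the University of Zurich in Switzerland, and in 1927 he succeeded Planck as professor of theoretical physics at the University of Berlin. Outraged by the Nazi regime, he moved to Oxford, England, in 1933. In 1939 he moved to Ireland, where he worked at the Dublin Institute for Advanced Studies for 17 years before returning to Austria in 1956. He died of illness in January 1961 in the mountain village of Alpbach, Austria.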
Measuring Information: Shannon versus Popper
Philosophers have a notion of the epistemic "strength" or "boldness" of a proposition, or rather its information content, and perhaps have an idea from Popper or Wittgenstein that it can be measured using probability. This short note explains the advantage of the Shannon information measure used in information science, in terms of logical consistency and with a minimum of formalism.
The issue of how to quantify information has come up frequently in the literature on inductive logic (e.g. Hempel & Oppenheim (1948), Carnap and Bar-Hillel (1952)). What is agreed is that information content is a quantity attaching to propositions. When you receive the message "Supper's ready", we say that strictly the information content attaches not to that utterance but to the proposition that you have received that utterance. As such, information content can be represented as a mathematical function over sentences of a logical language, much like probability or utility functions. The common theme between different proposed measures is the principle, found in Popper and in Wittgenstein, that a proposition is informative according to how many possible situations it excludes. Popper and others have insisted that the information content of H is measured by 1-P(H), where P(H) is the logical probability of H. This means that the information content is just the ratio of possibilities excluded by H to all logical possibilities. This measure meets a basic requirement of a measure of information: namely that if B is a proposition which has a non-negligible probability given A, then A&B is more informative than A, because it is true in fewer situations. The disjunction AvB, on the other hand, has less content than A, because it is true in more situations.
However, the question of how to measure information has been decisively solved by Shannon (Shannon and Weaver (1949)) in a paper that is crucial to what is now called information technology. To show what is at stake, I will explain how Shannon derived his measure and then show why Popper's measure is unacceptable.
Shannon based his measure of information on requirements of logical consistency. Indeed his work is very similar to the Cox proof of Bayesian probability. Like Cox, Shannon set out consistency requirements on a type of formal system as mathematical constraints on a function, then showed that the functions satisfying these constraints differ only trivially from each other, and hence that there is really only one consistent measure.
To illustrate what is meant by a consistency constraint in this context, imagine that you receive two successive messages through the same channel each consisting of one letter of the alphabet. Imagine separately that you receive a single message consisting of two letters of the alphabet. It should be clear that these are different descriptions of the same situation, hence any truly propositional measure should give them the same value. Put another way, measures of information content should give the same value to "You receive 'A' followed by 'B'" as to "You receive 'AB'."
At the moment, we are concerned with measuring the information content of the message 'AB', not in the sense of how much it tells us about a particular issue, but in the sense of how much information would be required to unambiguously transmit the message down a hypothetical communication channel. This intrinsic complexity or information content is referred to in the theory as its self-information, whereas the extent to which a message is informative about whether or not H is called its cross-information on H.
With Popper, let us take 1-P(H) to measure information content, where each letter is taken as equally probable. In the first situation, the information content of the first message, whichever letter it turns out to be, is 25/26. Since there are two individual messages, the total information received is 50/26. In the second situation, the total number of possible messages (two-letter sequences) is 676. Whatever message you receive will logically exclude 675 of these messages, so the total information received is 675/676. Thus we have reached two entirely different values depending on how a particular message was described, and this serves to illustrate the problem with using a non-Shannonian measure.
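The arithmetic above can be checked with a minimal sketch. It assumes, as the example does, 26 equally probable letters; the function name `popper_content` is a hypothetical label for the 1-P(H) measure, not a term from the literature.

```python
from fractions import Fraction

ALPHABET_SIZE = 26  # assumption from the example: letters are equally probable


def popper_content(probability):
    """Content measure 1 - P(H) attributed to Popper in the text."""
    return 1 - probability


# Description 1: two successive one-letter messages, contents added together.
one_letter = popper_content(Fraction(1, ALPHABET_SIZE))
two_separate_letters = one_letter + one_letter

# Description 2: a single two-letter message out of 26 * 26 = 676 possibilities.
one_two_letter_message = popper_content(Fraction(1, ALPHABET_SIZE ** 2))

print(two_separate_letters)    # 25/13, i.e. 50/26
print(one_two_letter_message)  # 675/676
```

Exact fractions are used here so that the mismatch between the two descriptions cannot be blamed on floating-point rounding.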
Shannon's measure itself uses logarithms. The information content of a particular message A, called its surprisal, is -logP(A). It does not matter which base we use for the logarithm so long as we are consistent: this is the sense in which there are different mathematically allowable measures, but they differ so trivially that we can consider them to be one measure. When base two is used, the resulting unit of information is called a 'bit' (short for "binary digit"), a bit being the maximum amount of information that can be conveyed by the answer to a yes/no question.
In the above example, each one-letter message has a surprisal of -log2(1/26) = 4.7 bits, and a two-letter message has a surprisal of -log2(1/676) = 9.4 bits. Hence we see that the additivity requirement (that the content of two one-letter messages is the same as that of the one two-letter message) is satisfied.
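As a rough check of these figures, the following sketch computes the surprisal in bits under the same assumption of 26 equally probable letters. The name `surprisal_bits` is illustrative only.

```python
import math


def surprisal_bits(probability):
    """Surprisal -log2 P(A), measured in bits."""
    return -math.log2(probability)


one_letter = surprisal_bits(1 / 26)     # a single letter
two_letters = surprisal_bits(1 / 676)   # one two-letter message (26 * 26 options)

print(round(one_letter, 1))   # 4.7
print(round(two_letters, 1))  # 9.4

# Additivity: two one-letter messages carry the same content as one two-letter message.
print(math.isclose(one_letter + one_letter, two_letters))  # True
```

Additivity holds because the logarithm turns the product of the two letter probabilities into a sum of surprisals.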
Like probability and utility, information content is a propositional measure which obeys the expectation principle. If we do not know what a particular message is, but know that it is the answer to a question whose possible answers are A1, A2, A3,..., An, then the information content is the expectation of the information content over all possible messages, in other words the sum of -P(Ai)logP(Ai) over all i.
An information source or communications channel can be thought of as a question with one of a (possibly very large) set of possible answers.
This defines a crucial term in information theory: entropy. Calculating the expected information content for the set of possible answers to an inquiry gives us the entropy for that inquiry, which can informally be regarded as a measure of uncertainty attached to it. If a subject is irrevocably certain about an issue, in that one answer is given probability one while all others have probability zero, then the entropy is zero. When we have a finite set of mutually exclusive hypotheses with no information to discriminate between them, then entropy is at its maximum when all are given the same probability.
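The two limiting cases just described can be illustrated with a short sketch. The four-answer distributions below are invented purely for illustration, and terms with zero probability are taken to contribute nothing to the sum, which is the usual convention.

```python
import math


def entropy_bits(probabilities):
    """Expected surprisal, sum of -P(Ai) * log2 P(Ai), in bits.

    Answers with probability zero are skipped (conventionally 0 * log 0 = 0).
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


certain = [1.0, 0.0, 0.0, 0.0]      # one answer held with certainty
uniform = [0.25, 0.25, 0.25, 0.25]  # nothing to discriminate between answers
skewed = [0.7, 0.1, 0.1, 0.1]       # hypothetical intermediate case

print(entropy_bits(certain))  # 0.0
print(entropy_bits(uniform))  # 2.0
print(entropy_bits(skewed) < entropy_bits(uniform))  # True
```

With four equally probable answers the entropy is 2 bits, which matches the two yes/no questions needed to single out one answer.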
Since information content measures are simply descriptions of probability functions, it may seem that we do not gain anything by talking of information content that cannot be expressed in terms of probability. However, information theory gives us a perspective on inferential tasks that we can miss if we talk entirely in terms of probability. To illustrate this I will consider a standard example. In a particular city, it rains three days out of four. The local weatherman makes a forecast for rain half the time, no rain the rest of the time. His predictions are such that he correctly predicts rain half the time, correctly predicts no rain a quarter of the time and incorrectly predicts no rain the remaining quarter of the time. This can be expressed in the following table of probabilities.

                      Rain    No rain
Forecast: rain        1/2     0
Forecast: no rain     1/4     1/4
Here is the problem. Someone who predicts rain for every day will be right 75% of the time. Someone who accepts the weatherman's forecast also has a probability of 75% of being right on any given day. So why is the weatherman any use? The answer is that we do not simply have to accept the weatherman's forecast but use it as an information source. In other words, rather than taking the "No Rain" forecast uncritically, we can conditionalise on it to get a new probability of rain (in this case 50%).
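These figures can be reproduced with a small sketch. The joint probabilities are read off the description of the weatherman in the example; the dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Joint probabilities P(forecast, weather), taken from the example in the text.
joint = {
    ("forecast rain", "rain"): Fraction(1, 2),
    ("forecast rain", "no rain"): Fraction(0),
    ("forecast no rain", "rain"): Fraction(1, 4),
    ("forecast no rain", "no rain"): Fraction(1, 4),
}

# Strategy 1: predict rain every day; right whenever it actually rains.
always_rain_accuracy = sum(p for (_, w), p in joint.items() if w == "rain")

# Strategy 2: accept the forecast uncritically; right when forecast matches weather.
trust_forecast_accuracy = (
    joint[("forecast rain", "rain")] + joint[("forecast no rain", "no rain")]
)

# Conditionalise on a "no rain" forecast to get the updated probability of rain.
p_no_rain_forecast = sum(p for (f, _), p in joint.items() if f == "forecast no rain")
p_rain_given_no_rain_forecast = (
    joint[("forecast no rain", "rain")] / p_no_rain_forecast
)

print(always_rain_accuracy)           # 3/4
print(trust_forecast_accuracy)        # 3/4
print(p_rain_given_no_rain_forecast)  # 1/2
```

Both strategies are right three days out of four, yet the forecast still shifts the probability of rain from 3/4 to 1/2, which is the sense in which it carries information.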
We can evaluate how informative the forecaster is about the weather by measuring the reduction in entropy: a perfectly reliable forecaster would reduce the entropy to zero. The entropy resulting from consulting the weather forecaster is zero if the forecast is for rain and one bit if the forecast is no rain. Since these are equally likely, the overall entropy is half a bit. If we do not consult the weatherman, then given just the 75% chance of rain on any one day, the entropy is 0.811 bits. So the benefit of this forecaster is a 0.311 bit reduction in entropy.
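The entropy reduction can be worked through numerically. This is a sketch under the joint probabilities implied by the weatherman example; the helper `entropy_bits` and the variable names are illustrative, not part of any standard library.

```python
import math


def entropy_bits(probabilities):
    """Entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


FORECASTS = ("forecast rain", "forecast no rain")
WEATHER = ("rain", "no rain")

# Joint probabilities P(forecast, weather) from the example in the text.
joint = {
    ("forecast rain", "rain"): 0.5,
    ("forecast rain", "no rain"): 0.0,
    ("forecast no rain", "rain"): 0.25,
    ("forecast no rain", "no rain"): 0.25,
}

# Entropy about the weather before consulting the forecaster.
p_rain = sum(joint[(f, "rain")] for f in FORECASTS)
prior_entropy = entropy_bits([p_rain, 1 - p_rain])

# Expected entropy after hearing the forecast, weighted by how often each occurs.
posterior_entropy = 0.0
for f in FORECASTS:
    p_forecast = sum(joint[(f, w)] for w in WEATHER)
    conditional = [joint[(f, w)] / p_forecast for w in WEATHER]
    posterior_entropy += p_forecast * entropy_bits(conditional)

reduction = prior_entropy - posterior_entropy

print(round(prior_entropy, 3))      # 0.811
print(round(posterior_entropy, 3))  # 0.5
print(round(reduction, 3))          # 0.311
```

The same loop could be applied to a different forecaster's joint table, which gives a like-for-like comparison in bits.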
By measuring the information content of the predictions in this way, we have a basis for comparison of weather forecasters (or other predictors) which is more meaningful than merely taking the probability of them being correct.
