Tuesday, March 22, 2016

Regression primer: Unbiased Estimation, Y = α + Xβ + ε



https://www-sop.inria.fr/asclepios/events/MFCA11/Proceedings/MFCA11_3_1

2 Multiple Linear Regression

Before formulating geodesic regression on general manifolds, we begin by reviewing multiple linear regression in $\mathbb{R}^n$. Here we are interested in the relationship between a non-random independent variable $X \in \mathbb{R}$ and a random dependent variable $Y$ taking values in $\mathbb{R}^n$. A multiple linear model of this relationship is given by

$$Y = \alpha + X\beta + \epsilon, \qquad (1)$$

where $\alpha \in \mathbb{R}^n$ is an unobservable intercept parameter, $\beta \in \mathbb{R}^n$ is an unobservable slope parameter, and $\epsilon$ is an $\mathbb{R}^n$-valued, unobservable random variable representing the error. Geometrically, this is the equation of a one-dimensional line through $\mathbb{R}^n$ (plus noise), parameterized by the scalar variable $X$. For the purposes of generalizing to the manifold case, it is useful to think of $\alpha$ as the starting point of the line and $\beta$ as a velocity vector.

Given realizations of the above model, i.e., data $(x_i, y_i) \in \mathbb{R} \times \mathbb{R}^n$, for $i = 1, \ldots, N$, the least squares estimates $\hat{\alpha}, \hat{\beta}$ for the intercept and slope are computed by solving the minimization problem

$$(\hat{\alpha}, \hat{\beta}) = \arg\min_{(\alpha, \beta)} \sum_{i=1}^{N} \lVert y_i - \alpha - x_i \beta \rVert^2. \qquad (2)$$

This equation can be solved analytically, yielding

$$\hat{\beta} = \frac{\frac{1}{N}\sum x_i y_i - \bar{x}\bar{y}}{\frac{1}{N}\sum x_i^2 - \bar{x}^2}, \qquad \hat{\alpha} = \bar{y} - \bar{x}\hat{\beta},$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$, respectively. If the errors in the model are drawn from distributions with zero mean and finite variance, then these estimators are unbiased and consistent.

[Fig. 1. Schematic of the geodesic regression model: the geodesic $f(x) = \mathrm{Exp}(p, xv)$ on a manifold $M$, with intercept point $p$, velocity vector $v$, and observed points $y_i$.]
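As a quick sanity check (a minimal sketch of my own, not from the paper), the closed-form estimates above can be verified in R against the built-in lm() fit. The sample size, the choice n = 2, and the parameter values below are arbitrary assumptions:

set.seed(1)
N <- 100
x <- runif(N)                                 # scalar independent variable x_i
alpha <- c(1, -2)                             # true intercept in R^2 (arbitrary)
beta  <- c(3, 0.5)                            # true slope in R^2 (arbitrary)
eps <- matrix(rnorm(2 * N, sd = 0.1), N, 2)   # zero-mean, finite-variance errors
y <- matrix(alpha, N, 2, byrow = TRUE) + outer(x, beta) + eps

# Closed-form least squares estimates from equation (2), coordinate-wise:
xbar <- mean(x)
ybar <- colMeans(y)
beta.hat  <- (colMeans(x * y) - xbar * ybar) / (mean(x^2) - xbar^2)
alpha.hat <- ybar - xbar * beta.hat

# lm() fits each coordinate of y separately and should agree:
coef(lm(y ~ x))   # rows: (Intercept), x; columns: the two coordinates of y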


http://math.arizona.edu/~jwatkins/N_unbiased.pdf


Topic 14: Unbiased Estimation

14.1 Introduction

In creating a parameter estimator, a fundamental question is whether or not the estimator differs from the parameter in a systematic manner. Let's examine this by looking at the computation of the mean and the variance of 16 flips of a fair coin. Give this task to 10 individuals and ask them to report the number of heads. We can simulate this in R as follows:

> (x <- rbinom(10, 16, 0.5))
[1] 8 5 9 7 7 9 7 8 8 10

Our estimate is obtained by taking these 10 answers and averaging them. Intuitively we anticipate an answer around 8. For these 10 observations, we find, in this case, that

> sum(x)/10
[1] 7.8

The result is a bit below 8. Is this systematic? To assess this, we appeal to the ideas behind Monte Carlo and perform 1000 simulations of the example above.

> meanx <- rep(0, 1000)
> for (i in 1:1000) { meanx[i] <- mean(rbinom(10, 16, 0.5)) }
> mean(meanx)
[1] 8.0049

From this, we surmise that the estimate of the sample mean $\bar{x}$ neither systematically overestimates nor underestimates the distributional mean. From our knowledge of the binomial distribution, we know that the mean is $\mu = np = 16 \cdot 0.5 = 8$. In addition, the sample mean $\bar{X}$ also has mean

$$E\bar{X} = \frac{1}{10}(8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8) = \frac{80}{10} = 8,$$

verifying that we have no systematic error. The phrase that we use is that the sample mean $\bar{X}$ is an unbiased estimator of the distributional mean $\mu$. Here is the precise definition.

Definition 14.1. For observations $X = (X_1, X_2, \ldots, X_n)$ based on a distribution having parameter value $\theta$, and for $d(X)$ an estimator for $h(\theta)$, the bias is the mean of the difference $d(X) - h(\theta)$, i.e.,

$$b_d(\theta) = E_\theta d(X) - h(\theta). \qquad (14.1)$$

If $b_d(\theta) = 0$ for all values of the parameter, then $d(X)$ is called an unbiased estimator. Any estimator that is not unbiased is called biased.
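To make Definition 14.1 concrete, here is a small Monte Carlo sketch of my own (not from the notes), using the same setup: each sample is $X_1, \ldots, X_{10} \sim \mathrm{Binomial}(16, 0.5)$, so $\mu = 8$ and $\sigma^2 = np(1-p) = 4$. It estimates $b_d(\theta) = E_\theta d(X) - h(\theta)$ by simulation for three estimators; the variable names are mine.

set.seed(2)
M <- 10000                                   # number of Monte Carlo replicates

# Bias of the sample mean as an estimator of mu = 8 (should be ~ 0):
d.mean <- replicate(M, mean(rbinom(10, 16, 0.5)))
mean(d.mean) - 8

# The plug-in variance estimator (dividing by n = 10) has expectation
# ((n - 1)/n) * sigma^2 = 3.6, so its bias is -0.4:
v.plugin <- replicate(M, { x <- rbinom(10, 16, 0.5); mean((x - mean(x))^2) })
mean(v.plugin) - 4                           # approximately -0.4

# R's var() divides by n - 1 and is an unbiased estimator of sigma^2:
v.unbiased <- replicate(M, var(rbinom(10, 16, 0.5)))
mean(v.unbiased) - 4                         # approximately 0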
