EECE5644 Assignment 2

Question 1 (20%)

The probability density function (pdf) for a 2-dimensional real-valued random vector $\mathbf{X}$ is

$$p(\mathbf{x}) = P(L=0)\,p(\mathbf{x}|L=0) + P(L=1)\,p(\mathbf{x}|L=1).$$

Here $L$ is the true class label that indicates which class-label-conditioned pdf generates the data. The class priors are $P(L=0) = 0.6$ and $P(L=1) = 0.4$. The class-conditional pdfs are $p(\mathbf{x}|L=0) = w_{01}\,g(\mathbf{x}|\mathbf{m}_{01},\mathbf{C}_{01}) + w_{02}\,g(\mathbf{x}|\mathbf{m}_{02},\mathbf{C}_{02})$ and $p(\mathbf{x}|L=1) = w_{11}\,g(\mathbf{x}|\mathbf{m}_{11},\mathbf{C}_{11}) + w_{12}\,g(\mathbf{x}|\mathbf{m}_{12},\mathbf{C}_{12})$, where $g(\mathbf{x}|\mathbf{m},\mathbf{C})$ is a multivariate Gaussian pdf with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$. The parameters of the class-conditional Gaussian pdfs are $w_{i1} = w_{i2} = 1/2$ for $i \in \{0,1\}$, and

$$\mathbf{m}_{01} = \begin{bmatrix} -0.9 \\ -1.1 \end{bmatrix}, \quad \mathbf{m}_{02} = \begin{bmatrix} 0.8 \\ 0.75 \end{bmatrix}, \quad \mathbf{m}_{11} = \begin{bmatrix} -1.1 \\ 0.9 \end{bmatrix}, \quad \mathbf{m}_{12} = \begin{bmatrix} 0.9 \\ -0.75 \end{bmatrix}, \quad \mathbf{C}_{ij} = \begin{bmatrix} 0.75 & 0 \\ 0 & 1.25 \end{bmatrix}$$

for all $(i,j)$ pairs. For numerical results requested below, generate the following independent datasets, each consisting of iid samples from the specified data distribution; in each dataset, make sure to include the true class label for each sample.

• $D^{20}_{\text{train}}$ consists of 20 samples and their labels for training;
• $D^{200}_{\text{train}}$ consists of 200 samples and their labels for training;
• $D^{2000}_{\text{train}}$ consists of 2000 samples and their labels for training;
• $D^{10K}_{\text{validate}}$ consists of 10000 samples and their labels for validation.

Part 1: (6%) Determine the theoretically optimal classifier that achieves minimum probability of error using knowledge of the true pdf. Specify the classifier mathematically and implement it; then apply it to all samples in $D^{10K}_{\text{validate}}$. From the decision results and true labels for this validation set, estimate and plot the ROC curve for a corresponding discriminant score for this classifier, and on the ROC curve indicate, with a special marker, the location of the min-P(error) classifier. Also report an estimate of the min-P(error) achievable, based on counts of decision-truth label pairs on $D^{10K}_{\text{validate}}$.
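A minimal Python/NumPy sketch of Part 1 follows, under stated assumptions: the function names, the random seed, and the use of NumPy are illustrative choices, and plotting is left out (the `fpr`, `tpr`, and `map_point` values could be handed to any plotting library). It relies on the standard result that, for 0-1 loss, the minimum-P(error) rule is the MAP rule: decide $L=1$ when $\ln p(\mathbf{x}|L=1) - \ln p(\mathbf{x}|L=0) > \ln\frac{P(L=0)}{P(L=1)} = \ln 1.5$. The log-likelihood ratio is therefore used as the discriminant score whose threshold is swept to trace the ROC curve.

```python
import numpy as np

# Question 1 parameters: class priors, two equal-weight Gaussian components per
# class, and the shared diagonal covariance C_ij = diag(0.75, 1.25).
PRIORS = np.array([0.6, 0.4])
MEANS = {
    0: np.array([[-0.9, -1.1], [0.8, 0.75]]),
    1: np.array([[-1.1, 0.9], [0.9, -0.75]]),
}
COMPONENT_WEIGHTS = (0.5, 0.5)
COV_DIAG = np.array([0.75, 1.25])


def gaussian_pdf(x, mean):
    """g(x | mean, diag(COV_DIAG)) evaluated for every row of x."""
    diff = x - mean
    exponent = -0.5 * np.sum(diff**2 / COV_DIAG, axis=1)
    return np.exp(exponent) / (2 * np.pi * np.sqrt(np.prod(COV_DIAG)))


def class_conditional_pdf(x, label):
    """p(x | L = label) as a two-component Gaussian mixture."""
    return sum(w * gaussian_pdf(x, m) for w, m in zip(COMPONENT_WEIGHTS, MEANS[label]))


def generate_dataset(n, rng):
    """Draw n iid (x, label) pairs: label first, then mixture component, then x."""
    labels = (rng.random(n) >= PRIORS[0]).astype(int)  # P(L = 1) = 0.4
    components = rng.integers(0, 2, size=n)            # each component w.p. 1/2
    means = np.array([MEANS[l][c] for l, c in zip(labels, components)])
    x = means + rng.standard_normal((n, 2)) * np.sqrt(COV_DIAG)
    return x, labels


def discriminant_score(x):
    """Log-likelihood ratio ln p(x | L=1) - ln p(x | L=0)."""
    return np.log(class_conditional_pdf(x, 1)) - np.log(class_conditional_pdf(x, 0))


def roc_curve(scores, labels):
    """(P(D=1 | L=0), P(D=1 | L=1)) for thresholds swept across all scores."""
    thresholds = np.concatenate(([-np.inf], np.sort(scores), [np.inf]))
    s0, s1 = np.sort(scores[labels == 0]), np.sort(scores[labels == 1])
    # Fraction of scores strictly above t = 1 - (count <= t) / total.
    fpr = 1 - np.searchsorted(s0, thresholds, side="right") / len(s0)
    tpr = 1 - np.searchsorted(s1, thresholds, side="right") / len(s1)
    return fpr, tpr


rng = np.random.default_rng(0)  # seed chosen arbitrarily
x_val, l_val = generate_dataset(10_000, rng)
scores = discriminant_score(x_val)

# Min-P(error) (MAP) rule: decide L = 1 when the score exceeds ln(P(L=0) / P(L=1)).
threshold = np.log(PRIORS[0] / PRIORS[1])
decisions = (scores > threshold).astype(int)
p_error = np.mean(decisions != l_val)

fpr, tpr = roc_curve(scores, l_val)
# Operating point of the min-P(error) rule, to be drawn with a special marker.
map_point = (np.mean(decisions[l_val == 0]), np.mean(decisions[l_val == 1]))
print(f"Estimated min-P(error) on the validation set: {p_error:.4f}")
```

The same `generate_dataset` call with `n` set to 20, 200, and 2000 would produce the three training sets.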
Optional: As a supplementary visualization, generate a plot of the decision boundary of this classification rule overlaid on the validation dataset. This establishes an aspirational performance level on this data for the following approximations.

Part 2: (12%) (a) Using maximum likelihood parameter estimation, train three separate logistic-linear-function-based approximations of the class label posterior functions given a sample. For each approximation, use one of the three training datasets $D^{20}_{\text{train}}$, $D^{200}_{\text{train}}$, $D^{2000}_{\text{train}}$. When optimizing the parameters, specify the optimization problem as minimization of the negative log-likelihood of the training dataset, and use your favorite numerical optimization approach, such as gradient descent or Matlab’s fminsearch. Determine how to use these class-label-posterior approximations to classify a sample in order to approximate the minimum-P(error) classification rule; apply these three approximations of the class label posterior function to the samples in $D^{10K}_{\text{validate}}$, and estimate the probability of error that these three classification rules will attain (using counts of decisions on the validation set). Optional: As a supplementary visualization, generate plots of the decision boundaries of these trained classifiers superimposed on their respective training datasets and the validation dataset.

(b) Repeat the process described in Part 2(a) using a logistic-quadratic-function-based approximation of the class label posterior functions given a sample.

Discussion: (2%) How does the performance of the classifiers trained in this part compare across different numbers of training samples and function forms? How do they compare to the theoretically optimal classifier from Part 1? Briefly discuss results and insights.
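One way to set up Part 2 is sketched below, assuming NumPy and fixed-step gradient descent; the step size and iteration count are arbitrary illustrative values, not tuned ones. The feature maps follow Note 1, and the classifier thresholds $h(\mathbf{x},\mathbf{w})$, read as an approximation of $P(L=1|\mathbf{x})$, at $1/2$, which mirrors the MAP rule under the learned posterior. The toy dataset is hypothetical and only exercises the code; it is not drawn from the Question 1 distribution.

```python
import numpy as np


def z_linear(x):
    """z(x) = [1, x1, x2]^T for every row of x (logistic-linear model)."""
    return np.column_stack([np.ones(len(x)), x])


def z_quadratic(x):
    """z(x) = [1, x1, x2, x1^2, x1*x2, x2^2]^T (logistic-quadratic model)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1**2, x1 * x2, x2**2])


def sigmoid(a):
    """h = 1 / (1 + e^{-a}); a is clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(a, -500, 500)))


def negative_log_likelihood(w, Z, labels):
    """-sum_n [l_n ln h_n + (1 - l_n) ln(1 - h_n)] with h_n = h(x_n, w)."""
    h = np.clip(sigmoid(Z @ w), 1e-12, 1 - 1e-12)
    return -np.sum(labels * np.log(h) + (1 - labels) * np.log(1 - h))


def fit_logistic(Z, labels, step=0.1, iters=5000):
    """Fixed-step gradient descent on NLL / N (scaling by 1/N keeps the same minimizer)."""
    w = np.zeros(Z.shape[1])
    for _ in range(iters):
        grad = Z.T @ (sigmoid(Z @ w) - labels) / len(labels)
        w -= step * grad
    return w


def classify(w, Z):
    """Decide L = 1 when h(x, w), the approximate P(L = 1 | x), exceeds 1/2."""
    return (sigmoid(Z @ w) > 0.5).astype(int)


# Hypothetical toy data (NOT the Question 1 distribution): L = 1 when x1 + x2 > 0.
rng = np.random.default_rng(1)
x_toy = rng.standard_normal((200, 2))
l_toy = (x_toy.sum(axis=1) > 0).astype(int)

Z_lin, Z_quad = z_linear(x_toy), z_quadratic(x_toy)
w_lin = fit_logistic(Z_lin, l_toy)
w_quad = fit_logistic(Z_quad, l_toy)
err_lin = np.mean(classify(w_lin, Z_lin) != l_toy)
err_quad = np.mean(classify(w_quad, Z_quad) != l_toy)
print("linear training error:", err_lin, "quadratic training error:", err_quad)
```

With $D^{20}_{\text{train}}$ the classes may be linearly separable, in which case the negative log-likelihood has no finite minimizer and the weights keep growing; the fixed iteration count then acts as an implicit stopping rule.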
Note 1: With $\mathbf{x}$ representing the input sample vector and $\mathbf{w}$ denoting the model parameter vector, logistic-linear-function refers to $h(\mathbf{x},\mathbf{w}) = 1/(1+e^{-\mathbf{w}^T\mathbf{z}(\mathbf{x})})$, where $\mathbf{z}(\mathbf{x}) = [1, \mathbf{x}^T]^T$; and logistic-quadratic-function refers to $h(\mathbf{x},\mathbf{w}) = 1/(1+e^{-\mathbf{w}^T\mathbf{z}(\mathbf{x})})$, where $\mathbf{z}(\mathbf{x}) = [1, x_1, x_2, x_1^2, x_1x_2, x_2^2]^T$.

Question 2 (20%)

Assume that scalar-real $y$ and two-dimensional real vector $\mathbf{x}$ are related to each other according to $y = c(\mathbf{x},\mathbf{w}) + v$, where $c(\cdot,\mathbf{w})$ is a cubic polynomial in $\mathbf{x}$ with coefficients $\mathbf{w}$ and $v$ is a Gaussian random scalar with mean zero and variance $\sigma^2$. Given a dataset $D = \{(\mathbf{x}_1, y_1),\ldots,(\mathbf{x}_N, y_N)\}$ with $N$ samples of $(\mathbf{x}, y)$ pairs, and assuming that these samples are independent and identically distributed according to the model, derive two estimators for $\mathbf{w}$ as functions of these data samples, using the maximum-likelihood (ML) and maximum-a-posteriori (MAP) parameter estimation approaches. For the MAP estimator, assume that $\mathbf{w}$ has a zero-mean Gaussian prior with covariance matrix $\gamma\mathbf{I}$.

Having derived the estimator expressions, implement them in code and apply them to the dataset generated by the attached Matlab script. Using the training dataset, obtain the ML estimator and the MAP estimator for a variety of $\gamma$ values ranging from $10^{-m}$ to $10^{n}$. Evaluate each trained model by calculating the average squared error between the $y$ values in the validation samples and the model estimates of these values using $c(\cdot,\mathbf{w}_{\text{trained}})$. How does your MAP-trained model perform on the validation set as $\gamma$ is varied? How is the MAP estimate related to the ML estimate? Describe your experiments, and visualize and quantify your analyses (e.g. average squared error on the validation dataset as a function of the hyperparameter $\gamma$) with data from these experiments.

Note: Point split will be 20% for ML and 20% for MAP estimator results and discussion.
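Writing $\boldsymbol{\Phi}$ for the $N \times 10$ matrix whose rows hold the ten monomials of $\mathbf{x}_n$ up to total degree 3, the Gaussian noise model gives the familiar closed forms $\hat{\mathbf{w}}_{ML} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{y}$ and $\hat{\mathbf{w}}_{MAP} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \frac{\sigma^2}{\gamma}\mathbf{I})^{-1}\boldsymbol{\Phi}^T\mathbf{y}$. The sketch below implements both in NumPy. Because the attached Matlab script is not reproduced here, it uses hypothetical synthetic data with an assumed known $\sigma^2$, and the sweep from $10^{-4}$ to $10^{4}$ (i.e. $m = n = 4$) is only an illustrative choice.

```python
import numpy as np


def cubic_features(x):
    """All monomials of (x1, x2) up to total degree 3: 10 columns, constant first."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([
        np.ones(len(x)), x1, x2,
        x1**2, x1 * x2, x2**2,
        x1**3, x1**2 * x2, x1 * x2**2, x2**3,
    ])


def ml_estimate(Phi, y):
    """w_ML minimizes ||y - Phi w||^2 (least squares / normal equations)."""
    return np.linalg.lstsq(Phi, y, rcond=None)[0]


def map_estimate(Phi, y, sigma2, gamma):
    """w_MAP = (Phi^T Phi + (sigma^2 / gamma) I)^{-1} Phi^T y for prior N(0, gamma I)."""
    A = Phi.T @ Phi + (sigma2 / gamma) * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)


def average_squared_error(w, Phi, y):
    """Mean of (y_n - c(x_n, w))^2 over the given samples."""
    return np.mean((y - Phi @ w) ** 2)


# Hypothetical stand-in for the data from the attached Matlab script.
rng = np.random.default_rng(2)
sigma2 = 0.5                      # assumed known noise variance
w_true = rng.standard_normal(10)
x_train = rng.uniform(-2, 2, size=(100, 2))
x_val = rng.uniform(-2, 2, size=(1000, 2))
Phi_train, Phi_val = cubic_features(x_train), cubic_features(x_val)
y_train = Phi_train @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=100)
y_val = Phi_val @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=1000)

w_ml = ml_estimate(Phi_train, y_train)
gammas = np.logspace(-4, 4, 17)   # 10^-m .. 10^n with m = n = 4 (illustrative)
val_error = [
    average_squared_error(map_estimate(Phi_train, y_train, sigma2, g), Phi_val, y_val)
    for g in gammas
]
```

As $\gamma \to \infty$ the term $\frac{\sigma^2}{\gamma}\mathbf{I}$ vanishes and the MAP estimate approaches the ML estimate; as $\gamma \to 0$ it shrinks toward the prior mean $\mathbf{0}$. Plotting `val_error` against `gammas` on a log axis shows where between these extremes the validation error is lowest.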
Question 3 (20%)

A vehicle at true position $[x_T, y_T]^T$ in 2-dimensional space is to be localized using distance (range) measurements to $K$ reference (landmark) coordinates $\{[x_1, y_1]^T, \ldots, [x_i, y_i]^T, \ldots, [x_K, y_K]^T\}$. These range measurements are $r_i = d_{Ti} + n_i$ for $i \in \{1,\ldots,K\}$, where $d_{Ti} = \|[x_T, y_T]^T - [x_i, y_i]^T\|$ is the true distance between the vehicle and the $i$th reference point, and $n_i$ is zero-mean Gaussian measurement noise with known variance $\sigma_i^2$. The noise in each measurement is independent of the others. Assume that we have the following prior knowledge regarding the position of the vehicle:

$$p\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = (2\pi\sigma_x\sigma_y)^{-1} \exp\left(-\frac{1}{2}\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix}^{-1}\begin{bmatrix} x \\ y \end{bmatrix}\right) \tag{1}$$

where $[x, y]^T$ indicates a candidate position under consideration.

Express the optimization problem that needs to be solved to determine the MAP estimate of the vehicle position. Simplify the objective function so that the exponentials and the additive/multiplicative terms that do not impact the determination of the MAP estimate $[x_{MAP}, y_{MAP}]^T$ are removed from the objective function, for computational savings when evaluating the objective.

Implement the following as computer code: Set the true vehicle location to be inside the circle with unit radius centered at the origin. For each $K \in \{1,2,3,4\}$, repeat the following. Place $K$ evenly spaced landmarks on a circle with unit radius centered at the origin. Set the measurement noise standard deviation to 0.3 for all range measurements. Generate $K$ range measurements according to the model specified above (if a range measurement turns out to be negative, reject it and resample; all range measurements need to be nonnegative). Plot the equilevel contours of the MAP estimation objective for the range of horizontal and vertical coordinates from $-2$ to $2$; superimpose the true location of the vehicle on these equilevel contours (e.g. use a + mark), as well as the landmark locations (e.g. use an o mark for each one).
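Taking $-2\ln$ of the posterior and dropping the terms that do not depend on $[x, y]^T$ leaves the objective $J(x,y) = \sum_{i=1}^{K}\frac{(r_i - d_i(x,y))^2}{\sigma_i^2} + \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}$, where $d_i(x,y) = \|[x, y]^T - [x_i, y_i]^T\|$, and $[x_{MAP}, y_{MAP}]^T = \arg\min_{x,y} J(x,y)$. The NumPy sketch below evaluates $J$ on a grid for $K = 1,\ldots,4$. The true position, seed, grid resolution, and $\sigma_x = \sigma_y = 0.25$ are assumptions, and the grid argmin is only a rough stand-in for the MAP estimate. For the contour plots, passing the same `levels` list to Matplotlib's `contour` for every $K$ keeps the contour values identical across plots.

```python
import numpy as np


def landmarks_on_unit_circle(k):
    """K evenly spaced landmark positions on the unit circle centred at the origin."""
    angles = 2 * np.pi * np.arange(k) / k
    return np.column_stack([np.cos(angles), np.sin(angles)])


def sample_ranges(true_pos, landmarks, sigma, rng):
    """r_i = d_Ti + n_i with n_i ~ N(0, sigma^2); negative draws are rejected and redrawn."""
    ranges = np.empty(len(landmarks))
    for i, landmark in enumerate(landmarks):
        d_true = np.linalg.norm(true_pos - landmark)
        r = -1.0
        while r < 0:
            r = d_true + sigma * rng.standard_normal()
        ranges[i] = r
    return ranges


def map_objective(x, y, landmarks, ranges, sigma, sigma_x, sigma_y):
    """J(x, y): twice the negative log-posterior with constant terms removed."""
    obj = x**2 / sigma_x**2 + y**2 / sigma_y**2       # prior term
    for (lx, ly), r in zip(landmarks, ranges):
        d = np.sqrt((x - lx) ** 2 + (y - ly) ** 2)     # candidate-to-landmark distance
        obj = obj + (r - d) ** 2 / sigma**2            # range-likelihood term
    return obj


rng = np.random.default_rng(3)
true_pos = np.array([0.3, -0.4])    # hypothetical location inside the unit circle
sigma, sigma_x, sigma_y = 0.3, 0.25, 0.25
grid = np.linspace(-2, 2, 201)
xx, yy = np.meshgrid(grid, grid)    # xx varies along columns, yy along rows

estimates = {}
for k in (1, 2, 3, 4):
    landmarks = landmarks_on_unit_circle(k)
    ranges = sample_ranges(true_pos, landmarks, sigma, rng)
    J = map_objective(xx, yy, landmarks, ranges, sigma, sigma_x, sigma_y)
    row, col = np.unravel_index(np.argmin(J), J.shape)
    estimates[k] = (grid[col], grid[row])   # grid-search approximation of the MAP estimate
```

Evaluating $J$ rather than the posterior itself avoids computing exponentials at every grid point, which is the computational saving the simplification is meant to deliver.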
Provide plots of the MAP objective function contours for each value of $K$. When preparing your final contour plots for different $K$ values, make sure to plot contours at the same function values across all of the contour plots for easy visual comparison of the MAP objective landscapes. Suggestion: For $\sigma_x$ and $\sigma_y$, you could use values around 0.25 and perhaps make them equal to each other. Note that your choice of these values indicates how confident the prior is about the origin as the location.

Supplement your plots with a brief description of how your code works. Comment on the behavior of the MAP estimate of position (visually assessed from the contour plots; roughly the center of the innermost contour) relative to the true position. Does the MAP estimate get closer to the true position as $K$ increases? Does it get more certain? Explain how your contours justify your conclusions.

Note: The additive Gaussian distributed noise used in this question is likely not appropriate for a proper distance sensor, since it could lead to negative measurements. However, in this question, we will ignore this issue and proceed with this noise model for illustration. In practice, multiplicative log-normal distributed noise may be more appropriate than additive normal distributed noise, depending on the measurement mechanism.

Question 4 (20%)

Problem 2.13 from the Duda-Hart-Stork textbook:

Question 5 (20%)

Let $Z$ be drawn from a categorical distribution (takes discrete values) with $K$ possible outcomes/states and parameter $\Theta$, represented by $\mathrm{Cat}(\Theta)$. Describe the value/state using a 1-of-K scheme for $\mathbf{z} = [z_1,\ldots,z_K]^T$, where $z_k = 1$ if the variable is in state $k$ and $z_k = 0$ otherwise. Let the parameter vector for the pdf be $\Theta = [\theta_1,\ldots,\theta_K]^T$, where $P(z_k = 1) = \theta_k$ for $k \in \{1,\ldots,K\}$. Given $D = \{\mathbf{z}_1,\ldots,\mathbf{z}_N\}$ with iid samples $\mathbf{z}_n \sim \mathrm{Cat}(\Theta)$ for $n \in \{1,\ldots,N\}$:

• What is the ML estimator for $\Theta$?
• Assuming that the prior $p(\Theta)$ for the parameters is a Dirichlet distribution with hyperparameter $\alpha$, what is the MAP estimator for $\Theta$?

Hint: The Dirichlet distribution with parameter $\alpha$ is

$$p(\Theta|\alpha) = \frac{1}{B(\alpha)}\prod_{k=1}^{K}\theta_k^{\alpha_k-1},$$

where the normalization constant is

$$B(\alpha) = \frac{\prod_{k=1}^{K}\Gamma(\alpha_k)}{\Gamma\left(\sum_{k=1}^{K}\alpha_k\right)}.$$
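With $N_k = \sum_{n=1}^{N} z_{nk}$ counting the samples in state $k$, maximizing the log-likelihood (or the log-posterior) subject to $\sum_k \theta_k = 1$ with a Lagrange multiplier gives the standard results $\hat{\theta}_k^{ML} = N_k/N$ and, assuming every $\alpha_k \ge 1$, $\hat{\theta}_k^{MAP} = \frac{N_k + \alpha_k - 1}{N + \sum_{j=1}^{K}\alpha_j - K}$. A small NumPy check of both formulas on a hypothetical one-hot dataset might look like this (the function names are illustrative, and states are indexed from 0 in code):

```python
import numpy as np


def state_counts(Z):
    """N_k = sum_n z_nk for a stack of 1-of-K vectors Z (shape N x K)."""
    return np.asarray(Z, dtype=float).sum(axis=0)


def ml_categorical(Z):
    """theta_k^ML = N_k / N."""
    counts = state_counts(Z)
    return counts / counts.sum()


def map_categorical(Z, alpha):
    """theta_k^MAP = (N_k + alpha_k - 1) / (N + sum_j alpha_j - K), assuming all alpha_k >= 1."""
    numer = state_counts(Z) + np.asarray(alpha, dtype=float) - 1.0
    return numer / numer.sum()   # numer.sum() equals N + sum(alpha) - K


# Hypothetical dataset: N = 4 samples of a K = 3 state variable, in states 0, 0, 1, 2.
Z = np.eye(3)[[0, 0, 1, 2]]
theta_ml = ml_categorical(Z)                 # counts (2, 1, 1) -> (0.5, 0.25, 0.25)
theta_map = map_categorical(Z, [2, 2, 2])    # (3, 2, 2) / 7
```

Setting every $\alpha_k = 1$ (a uniform Dirichlet prior) makes the MAP estimate coincide with the ML estimate; larger values of $\alpha_k$ behave like $\alpha_k - 1$ extra pseudo-counts added to $N_k$.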