Assignment Chef

Assignment catalog

33,401 assignments available

[SOLVED] Cse100 lab 03 solving the max subarray problem via divide-and-conquer

In this lab assignment, your job is to implement the O(n log n) time divide-and-conquer algorithm for the Max Subarray Problem; for the pseudo-code, see page 72 in the textbook or the lecture slides. Recall that in the problem, we are given as input an array A[1..n] of n integers, and would like to find i* and j* (1 ≤ i* ≤ j* ≤ n) such that A[i*] + A[i* + 1] + ... + A[j*] is maximized.

Input structure: The input starts with an integer n, which indicates the array size. Then the integers A[1], A[2], ..., A[n] follow, one per line.

Output structure: Output the sum of the integers in the max subarray, i.e., A[i*] + A[i* + 1] + ... + A[j*].

Example of input and output:

Input:
6
-3
11
-2
-3
10
-5

Output:
16

Note that in this example, the max subarray is A[2..5]. So, we output A[i*] + ... + A[j*] = 11 − 2 − 3 + 10 = 16. The output is only one number and has no whitespace. See the lab guidelines for submission/grading, etc., which can be found in Files/Labs.
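The divide-and-conquer idea described above can be sketched as follows. This is a minimal Python illustration, not the textbook's pseudo-code verbatim: the function names `max_crossing_sum` and `max_subarray_sum` are hypothetical, indices are 0-based rather than 1-based, and the array is assumed to be non-empty.

```python
def max_crossing_sum(a, lo, mid, hi):
    """Best sum of a subarray that contains both a[mid] and a[mid + 1]."""
    # Grow leftwards from mid, remembering the best running total.
    left_best = float("-inf")
    total = 0
    for i in range(mid, lo - 1, -1):
        total += a[i]
        left_best = max(left_best, total)
    # Grow rightwards from mid + 1 in the same way.
    right_best = float("-inf")
    total = 0
    for j in range(mid + 1, hi + 1):
        total += a[j]
        right_best = max(right_best, total)
    return left_best + right_best


def max_subarray_sum(a, lo=0, hi=None):
    """Maximum subarray sum of a[lo..hi] in O(n log n) time (a must be non-empty)."""
    if hi is None:
        hi = len(a) - 1
    if lo == hi:  # base case: a single element
        return a[lo]
    mid = (lo + hi) // 2
    return max(
        max_subarray_sum(a, lo, mid),        # entirely in the left half
        max_subarray_sum(a, mid + 1, hi),    # entirely in the right half
        max_crossing_sum(a, lo, mid, hi),    # straddles the midpoint
    )


if __name__ == "__main__":
    print(max_subarray_sum([-3, 11, -2, -3, 10, -5]))
```

On the sample input from the lab statement, the crossing case supplies the answer: the best left part is 11 − 2 = 9 and the best right part is −3 + 10 = 7.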

[SOLVED] Cse100 lab 02 merge sort

In this lab assignment, your job is to implement Merge-Sort. See the textbook for the algorithm and its pseudo-code. You must output the given elements (integers) in non-decreasing order.

Input structure: The input starts with an integer which indicates the number of elements, n. Then the elements follow, one per line.

Output structure: Output the elements in non-decreasing order. Each element must be followed by ;.

Example of input and output:

Input:
6
5
3
2
1
6
4

Output:
1;2;3;4;5;6;

Note that the output has only one line and contains no whitespace characters. See the lab guidelines for submission/grading, etc., which can be found in Files/Labs.
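A minimal Python sketch of merge sort together with the required output format is given below. The helper names `merge`, `merge_sort`, and `format_output` are hypothetical, and this list-slicing version is only one of several ways to write the algorithm; the textbook pseudo-code works in place on index ranges instead.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" keeps equal elements in their original order (stable merge).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(values):
    """Return a new list with the elements in non-decreasing order."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    return merge(merge_sort(values[:mid]), merge_sort(values[mid:]))


def format_output(values):
    """Follow every element with ';' and emit no whitespace."""
    return "".join(f"{v};" for v in values)


if __name__ == "__main__":
    print(format_output(merge_sort([5, 3, 2, 1, 6, 4])))
```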

[SOLVED] Cse100 lab 00 find max and min

This lab is intended to help you understand how you should test and submit your own code. Therefore, we provide a solution (yourid.cpp). This lab will be worth ZERO points, but we still strongly encourage you to submit your code to check that you have fully understood how submission and grading work.

Input structure: The input starts with an integer which indicates the number of elements (integers) in the input sequence, n. Then the elements in the sequence follow, one per line.

Output structure: Output the maximum of all numbers in the sequence, followed by a semicolon and the minimum number. There should be no whitespace characters in your output.

Example of input and output:

Input:
6
15
13
12
11
16
14

Output:
16;11

See the lab guidelines for submission/grading, etc., which can be found in Files/Labs or in the CatCourses page for this assignment.
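The input and output conventions above can be illustrated with a short Python sketch. It is not the provided yourid.cpp solution; the function name `max_min_line` is hypothetical, and the sketch assumes the whole input is available as one string.

```python
def max_min_line(text):
    """Parse 'n' followed by n integers and return 'max;min' with no whitespace."""
    tokens = text.split()
    n = int(tokens[0])                       # first token is the element count
    values = [int(t) for t in tokens[1:1 + n]]
    return f"{max(values)};{min(values)}"


if __name__ == "__main__":
    sample = "6\n15\n13\n12\n11\n16\n14\n"
    print(max_min_line(sample))
```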

[SOLVED] Cse100 lab 01 insertion sort

In the first lab assignment, your job is to implement insertion-sort. (Yes, this is just a warm-up, and the labs will be increasingly difficult. So heads up!)

Input structure: The input starts with an integer which indicates the number of elements (integers) to be sorted, n. Then the elements follow, one per line.

Output structure: Recall that Insertion Sort first sorts the first two elements (in non-decreasing order), then the first three elements, and so on. You are asked to output the snapshot of the array at the end of each iteration. More precisely, for each 2 ≤ k ≤ n, output the first k elements (in non-decreasing order) on a separate line, where each element is followed by ;. Each line ends with a newline character.

Example of input and output:

Input:
6
5
3
2
1
6
4

Output:
3;5;
2;3;5;
1;2;3;5;
1;2;3;5;6;
1;2;3;4;5;6;

More precisely, the above output example has 6 lines since a “cout
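The snapshot output described in this lab can be sketched as follows. This is a minimal Python illustration with a hypothetical function name, `insertion_sort_snapshots`; it returns the lines instead of printing them so that the format is easy to check.

```python
def insertion_sort_snapshots(values):
    """Insertion sort that records the sorted prefix after each iteration."""
    a = list(values)
    lines = []
    for k in range(1, len(a)):
        key = a[k]
        i = k - 1
        # Shift larger elements of the sorted prefix one slot to the right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
        # Snapshot of the first k + 1 elements, each followed by ';'.
        lines.append("".join(f"{v};" for v in a[:k + 1]))
    return lines


if __name__ == "__main__":
    for line in insertion_sort_snapshots([5, 3, 2, 1, 6, 4]):
        print(line)
```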

[SOLVED] (csc420) assignment 4 part i: theoretical problems (70 marks) [question 1] ransac (10 marks)

We have two images of a planar object (e.g. a painting) taken from different viewpoints and we want to align them. We have used SIFT to find a large number of point correspondences between the two images and visually estimate that at least 70% of these matches are correct, with only small potential inaccuracies. We want to find the true transformation between the two images with a probability greater than 99.5%.

1. (5 marks) Calculate the number of iterations needed for fitting a homography.
2. (5 marks) Without calculating, briefly explain whether you think fitting an affine transformation would require fewer or more RANSAC iterations and why.

Assume a plane passing through point $\vec{P}_0 = [X_0, Y_0, Z_0]^T$ with normal $\vec{n}$. The corresponding vanishing points for all the lines lying on this plane form a line called the horizon. In this question, you are asked to prove the existence of the horizon line by following the steps below:

1. (15 marks) Find the pixel coordinates of the vanishing point corresponding to a line $L$ passing through point $\vec{P}_0$ and going along direction $\vec{d}$. Hint: $\vec{P} = \vec{P}_0 + t\vec{d}$ are the points on line $L$, and

$$\vec{p} = \begin{bmatrix} \omega x \\ \omega y \\ \omega \end{bmatrix} = K\vec{P} = K \begin{bmatrix} X_0 + t d_x \\ Y_0 + t d_y \\ Z_0 + t d_z \end{bmatrix}$$

are the pixel coordinates of the same line in the image, and

$$K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix},$$

where $f$ is the camera focal length and $(p_x, p_y)$ is the principal point.

2. (15 marks) Prove that the vanishing points of all the lines lying on the plane form a line. Hint: all the lines on the plane are perpendicular to the plane's normal $\vec{n}$; that is, $\vec{n} \cdot \vec{d} = 0$, or $n_x d_x + n_y d_y + n_z d_z = 0$.

Using homogeneous coordinates:

1. (15 marks) (a) Show that the intersection of the 2D lines $l$ and $l'$ is the 2D point $p = l \times l'$ (here $\times$ denotes the cross product).
2. (15 marks) (b) Show that the line that goes through the 2D points $p$ and $p'$ is $l = p \times p'$.

You are given three images hallway1.jpg, hallway2.jpg, hallway3.jpg which were shot with the same camera (i.e. 
same internal camera parameters), but held at slightly different positions/orientations (i.e. with different external parameters).

[Figure: hallway1.jpg, hallway2.jpg, hallway3.jpg]

Consider the homographies $H$,

$$\begin{bmatrix} \tilde{w}\tilde{x} \\ \tilde{w}\tilde{y} \\ \tilde{w} \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},$$

that map corresponding points of one image $I$ to a second image $\tilde{I}$, for three cases:

A. The right wall of $I$ = hallway1.jpg to the right wall of $\tilde{I}$ = hallway2.jpg.
B. The right wall of $I$ = hallway1.jpg to the right wall of $\tilde{I}$ = hallway3.jpg.
C. The floor of $I$ = hallway1.jpg to the floor of $\tilde{I}$ = hallway3.jpg.

For each of these three cases:

1. (10 marks) Use a Data Cursor to select corresponding points by hand. Select more than four pairs of points. (Four pairs will give a good fit for those points, but may give a poor fit for other points.) Also, avoid choosing three (or more) collinear points, since these do not provide independent information. This is trickier for case C. Make two figures showing the gray-level images of $I$ and $\tilde{I}$ with a colored square marking each of the selected points. You can convert the image $I$ or $\tilde{I}$ to gray level using an RGB to grayscale function (or the formula gray = 0.2989 × R + 0.5870 × G + 0.1140 × B).

2. (10 marks) Fit a homography $H$ to the selected points. Include the estimated $H$ in the report, and describe its effect using words such as scale, shear, rotate, translate, if appropriate. You are not allowed to use any homography estimation function in OpenCV or other similar packages.

3. (10 marks) Make a figure showing the $\tilde{I}$ image with red squares that mark each of the selected $(\tilde{x}, \tilde{y})$, and green squares that mark the locations of the estimated $(\tilde{x}, \tilde{y})$; that is, use the homography to map the selected $(x, y)$ to the $(\tilde{x}, \tilde{y})$ space.

4. (25 marks) Make a figure showing a new image that is larger than the original one(s). The new image should be large enough that it contains the pixels of the $I$ image as a subset, along with all the inverse mapped pixels of the $\tilde{I}$ image. The new image should be constructed as follows:

• RGB values are initialized to zero.
• The red channel of the new image must contain the rgb2gray values of the $I$ image (for the appropriate pixel subset only).
• The blue and green channels of the new image must contain the rgb2gray values of the corresponding pixels $(\tilde{x}, \tilde{y})$ of $\tilde{I}$. The correspondence is computed as follows: for each pixel $(x, y)$ in the new image, use the homography $H$ to map this pixel to the $(\tilde{x}, \tilde{y})$ domain (not forgetting to divide by the homogeneous coordinate), and round the value so you get an integer grid location. If this $(\tilde{x}, \tilde{y})$ location indeed lies within the domain of the $\tilde{I}$ image, then copy the rgb2gray'ed value from that $\tilde{I}(\tilde{x}, \tilde{y})$ into the blue and green channels of pixel $(x, y)$ in the new image. (This amounts to an inverse mapping.)

If the homography is correct and if the surface were Lambertian*, then corresponding points in the new image would have the same values of R, G, and B, and so the new image would appear to be gray at these pixels.

• Based on your results, what can you conclude about the relative 3D positions and orientations of the camera? Give only qualitative answers here. Also, what can you conclude about the surface reflectance of the right wall and floor, namely are they more or less Lambertian? Limit your discussion to a few sentences. (5 marks)

Along with your writeup, hand in the program that you used to solve the problem. You should have a switch statement that chooses between cases A, B, C.

* Lambertian reflectance is the property that defines an ideal “matte” or diffusely reflecting surface. The apparent brightness of a Lambertian surface to an observer is the same regardless of the observer’s angle of view. Unfinished wood exhibits roughly Lambertian reflectance, but wood finished with a glossy coat of polyurethane does not, since the glossy coating creates specular highlights. 
Specular reflection, or regular reflection, is the mirror-like reflection of waves, such as light, from a surface. Reflections on still water are an example of specular reflection.

In Tutorial 10, we learned about mean shift and cam shift tracking. In this question, we first evaluate the performance of mean shift tracking in a single case and then implement a small variation of the standard mean shift tracking. For both parts you can use the attached short video KylianMbappe.mp4 or, alternatively, you can record and use a short (2-3 second) video of yourself. You can use any OpenCV (or other) functions you want in this question.

1. (20 marks) Performance Evaluation
• Use the Viola-Jones face detector to detect the face on the first frame of the video. The default detector can detect the face in the first frame of the attached video. If you record a video of yourself, make sure your face is visible and facing the camera in the first frame (and throughout the video) so the detector can detect your face in the first frame.
• Construct the hue histogram of the detected face on the first frame using appropriate saturation and value thresholds for masking. Use the constructed hue histogram and mean shift tracking to track the bounding box of the face over the length of the video (from frame #2 until the last frame). So far, this is similar to what we did in the tutorial.
• Also, use the Viola-Jones face detector to detect the bounding box of the face in each video frame (from frame #2 until the last frame).
• Calculate the intersection over union (IoU) between the tracked bounding box and the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis of the plot should be the frame number (from 2 until the last frame) and the y axis should be the IoU on that frame.
• In your report, include a sample frame in which the IoU is large (e.g. over 50%) and another sample frame in which the IoU is low (e.g. below 10%). Draw the tracked and detected bounding boxes in each frame using different colors (and indicate which is which).
• Report the percentage of frames in which the IoU is larger than 50%.
• Look at the detected and tracked boxes at frames in which the IoU is small (< 10%) and report which (Viola-Jones detection or tracked bounding box) is correct more often (we don’t need a number, just eyeball it). Very briefly (1-2 sentences) explain why that might be.

2. (10 marks) Implement a Simple Variation
• In the examples in Tutorial 10 (and the previous part of this question) we used a hue histogram for mean shift tracking. Here, we implement an alternative in which a histogram of gradient direction values is used instead.
• After converting to grayscale, use blurring and the Sobel operator to first generate image gradients in the x and y directions (Ix and Iy). You can then use cartToPolar (with angleInDegrees=True) to get the gradient magnitude and angle at each frame. You can use 24 histogram bins and [0,360] (i.e. not [0,180]) directions.
• When constructing hue histograms, we thresholded saturation and value channels to create a mask. Here, you can threshold the gradient magnitude to create a mask. For example, you can mask out pixels in the region of interest in which the gradient magnitude is less than 10% of the maximum gradient magnitude in the RoI.
• Calculate the intersection over union (IoU) between the tracked bounding box and the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis of the plot should be the frame number (from 2 until the last frame) and the y axis should be the IoU on that frame.
• In your report, include a sample frame in which the IoU is large (e.g. over 50%) and another sample frame in which the IoU is low (e.g. below 10%). Draw the tracked and detected bounding boxes in each frame using different colors (and indicate which is which).
• Report the percentage of frames in which the IoU is larger than 50%.
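The intersection over union used in both parts of the tracking question can be computed as in the sketch below. It assumes boxes are given as (x, y, width, height) tuples, which is an assumption about how the tracked and detected boxes are stored; the function names `iou` and `fraction_above` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def fraction_above(ious, threshold=0.5):
    """Fraction of frames whose IoU exceeds the threshold."""
    if not ious:
        return 0.0
    return sum(1 for v in ious if v > threshold) / len(ious)


if __name__ == "__main__":
    tracked = (0, 0, 2, 2)
    detected = (1, 1, 2, 2)
    print(iou(tracked, detected))
```

Frames in which the detector finds no face have no detected box; how to score those frames is a choice the sketch leaves open.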

[SOLVED] (csc420) assignment 3 part i: theoretical problems (50 marks) [question 1] laplacian of gaussian (25 marks)

The Laplacian of Gaussian operator is defined as:

$$\nabla^2 G(x, y, \sigma) = \frac{\partial^2 G(x, y, \sigma)}{\partial x^2} + \frac{\partial^2 G(x, y, \sigma)}{\partial y^2} = \frac{1}{\pi\sigma^4}\left(\frac{x^2 + y^2}{2\sigma^2} - 1\right) e^{-\frac{x^2 + y^2}{2\sigma^2}},$$

where the Gaussian filter $G$ is:

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

The characteristic scale is defined as the scale that produces the peak value (minimum or maximum) of the Laplacian response.

1. (10 marks) What scale (i.e. what value of $\sigma$) maximizes the magnitude of the response of the Laplacian filter to an image of a black circle with diameter $D$ on a white background? Justify your answer.
2. (5 marks) What scale should we use if we want to instead detect a white circle of the same size on a black background?
3. (10 marks) Experimentally find the value of $\sigma$ that maximizes the magnitude of the response for a black square of size 100×100 pixels on a sufficiently large white background. Hint: You can simply implement this task and automatically test for a large set of samples. You may also want to first generate the samples in the log domain to accurately locate the optimal value in a large spectrum.

For corner detection, we defined the Second Moment Matrix as follows:

$$M = \sum_x \sum_y w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

Let’s denote the 2×2 matrix used in the equation by $N$; i.e.:

$$N = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

1. (10 marks) Compute the eigenvalues of $N$, denoted by $\lambda_1$ and $\lambda_2$.
2. (15 marks) Prove that matrix $M$ is positive semi-definite.

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of scale-invariant feature transform (SIFT) descriptors and shape contexts (a similar technique we have not seen in class), but differs in the sense that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. 
Until deep learning, HOG was one of the long-standing top representations for object detection.

In this assignment, you will implement a variant of HOG. Given an input image, your algorithm will compute the HOG feature and visualize it as shown in Figure 1 (the line directions are perpendicular to the gradient to show edge alignment).

Figure 1: HOG features plotted on an example image.

The orientation and magnitude of the red lines represent the gradient components in a local cell. A HOG descriptor is formed at a specified image location as follows:

1. Compute image gradient magnitudes and directions over the whole image, thresholding small gradient magnitudes to zero. You should empirically set a reasonable value for the threshold for each of the input images.

2. Center a cell grid (m × n) on the image. To create this grid, assume the grid cells are square and have a fixed side length; let us call that size τ. For example, if your image size is 1021 × 975 and τ = 8, then you will have a grid size of (m = 127) × (n = 121). You can ignore the boundary of the image that cannot be fit into the grid (on either end), i.e., just consider the crop of the image that fits the grid perfectly, which is 1016 × 968 in this example.

3. For each cell, form an orientation histogram by quantizing the gradient directions and, for each such orientation bin, add the (thresholded) gradient magnitudes. This process can be done in two steps. Imagine gradient orientations are discretized into 6 bins: [−15°, 15°), [15°, 45°), [45°, 75°), [75°, 105°), [105°, 135°), [135°, 165°). Remember that 165° is equivalent to −15° when the orientation is not directed. Now create a 3D array (m × n × 6) where in element (i, j, k) you store the accumulated gradient magnitudes over all the pixels in cell (i, j) with gradient orientations corresponding to bin k.

Another approach for constructing the HOG is to collect the number of occurrences in each bin, rather than accumulating the magnitudes of occurrences; i.e. in element (i, j, k) of the histogram, we store the number of pixels in cell (i, j) with gradient orientations corresponding to bin k. Choose reasonable values for the threshold and cell size, and then visualize the resulting 3D arrays (using both approaches) on the given images, similar to the quiver plot of Figure 1. Briefly compare the two approaches by inspecting the visualizations. (15 marks)

Hint: You can use any package/function for creating the visualization in Figure 1. One way to do that is to superimpose 6 quiver plots (one for each bin), generated by the quiver function in the matplotlib package. For the remaining tasks, you can use either approach for constructing HOG. Make sure to explicitly mention your choice in the report.

4. To account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells together into larger, spatially connected blocks (adjacent cells). Given the histogram of oriented gradients, you apply L2 normalization as follows:

• Build a descriptor for the first block by concatenating the HOG within the block. You can use block size = 2, i.e., a 2 × 2 block will contain 2 × 2 × 6 entries that will be concatenated to form one long vector.
• Normalize the descriptor as follows:

$$\hat{h}_i = \frac{h_i}{\sqrt{\sum_i h_i^2 + e^2}}$$

where $h_i$ is the $i$-th element of the vector and $\hat{h}_i$ is the normalized histogram. $e$ is the normalization constant to prevent division by zero (e.g., $e = 0.001$).
• Assign the normalized histogram to the first cell of a new histogram array, i.e. cell (1,1).
• Move to the next block of the old histogram array with stride 1 and iterate steps 1-3 above to compute the next cell of the new histogram array.

The resulting new histogram array will have size (m − 1) × (n − 1) × 24. Compute normalized histogram arrays for the provided images, and store them in a single-line text file where the data is stored row by row (first row, then second row, etc.). Submit both your code and the files that are generated by your code. Please note that the file should have the same name as the image (e.g. ‘image.jpg’ → ‘image.txt’). (15 marks)

In addition to the provided images, use your own camera (e.g. smartphone camera) to capture two images of the same scene, one with flash and one without flash. Convert the images to gray-scale, and down-sample the images if needed to avoid excessive computation overhead.

First, compute the original HOG arrays for these two images and visualize them similar to Figure 1. (5 marks)

Second, compute the normalized histogram arrays for each of these two images, and store them in two txt files as instructed earlier. (5 marks)

Third, by comparing the results, argue why or why not the normalization of HOG may be beneficial. Limit your discussion to a paragraph containing the main points. You can compare the histograms visually or you may want to define a quantifiable measure to compare the histograms for the pair of with-flash/no-flash images. If you choose to compare visually, provide the details of your visualization approach for normalized HOG; alternatively, if you decide to compare the histograms quantitatively, include the function you used and your justification in the report. (20 marks)

Download two images (I1 and I2) of the Sandford Fleming Building taken under two different viewing directions:
• https://commons.wikimedia.org/wiki/File:University College, University of Toronto.jpg
• https://commons.wikimedia.org/wiki/File:University College Lawn, University of Toronto, Canada.jpg

1. Calculate the eigenvalues of the Second Moment Matrix (M) for each pixel of I1 and I2.
2. Show the scatter plot of λ1 and λ2 for all the pixels in I1 (5 marks) and the same scatter plot for I2 (5 marks). Each point shown at location (x, y) in the scatter plot corresponds to a pixel with eigenvalues λ1 = x and λ2 = y.
3. Based on the scatter plots, pick a threshold for min(λ1, λ2) to detect corners. Illustrate detected corners on each image using the chosen threshold (10 marks).
4. Constructing matrix M involves the choice of a window function w(x, y). Often a Gaussian kernel is used. Repeat steps 1, 2, and 3 above, using a significantly different Gaussian kernel (i.e. a different σ) than the one used before. For example, choose a σ that is significantly (e.g. 5 times, or 10 times) larger than the previous one (10 marks). Explain how this choice influenced the corner detection in each of the images (10 marks).
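For the per-pixel eigenvalue computation in the corner detection steps above, a symmetric 2×2 matrix has a closed-form solution, sketched below in plain Python. The arguments a, b, c are assumed to be the already-windowed sums of Ix², IxIy, and Iy² at one pixel; the function names and the threshold argument are hypothetical.

```python
import math


def second_moment_eigenvalues(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0                    # half the trace
    radius = math.hypot((a - c) / 2.0, b)   # sqrt(((a - c) / 2)^2 + b^2)
    return mean + radius, mean - radius


def is_corner(a, b, c, threshold):
    """Flag a pixel as a corner when min(lambda1, lambda2) exceeds the threshold."""
    lam1, lam2 = second_moment_eigenvalues(a, b, c)
    return min(lam1, lam2) > threshold


if __name__ == "__main__":
    print(second_moment_eigenvalues(2.0, 0.0, 1.0))
    print(is_corner(2.0, 0.0, 1.0, threshold=0.5))
```

The formula follows from the characteristic polynomial λ² − (a + c)λ + (ac − b²) = 0. Applied over a whole image, the same expressions work element-wise on arrays.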

[SOLVED] (csc420) assignment 2 part i: theoretical problems (80 marks) [question 1] image pyramids (10 marks)

In Gaussian pyramids, the image at each level $I_k$ is constructed by blurring the image at the previous level $I_{k-1}$ and downsampling it by a factor of 2. A Laplacian pyramid, on the other hand, consists of the difference between the image at each level ($I_k$) and the upsampled version of the image in the next level of the Gaussian pyramid ($I_{k+1}$).

Given an image of size $2^n \times 2^n$ denoted by $I_0$, and its Laplacian pyramid representation denoted by $L_0, \dots, L_{n-1}$, show how we can reconstruct the original image using the minimum information from the Gaussian pyramid. Specify the minimum information required from the Gaussian pyramid and a closed-form expression for reconstructing $I_0$. Hint: The reconstruction follows a recursive process; what is the base case that contains the minimum information?

Show that in a fully connected neural network with linear activation functions, the number of layers has effectively no impact on the network. Hint: Express the output of a network as a function of its inputs and the weights of its layers.

Consider a neural network that represents the following function:

$$\hat{y} = \sigma(w_5\,\sigma(w_1 x_1 + w_2 x_2) + w_6\,\sigma(w_3 x_3 + w_4 x_4))$$

where $x_i$ denotes input variables, $\hat{y}$ is the output variable, and $\sigma$ is the logistic function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$

Suppose the loss function used for training this neural network is the L2 loss, i.e. $L(y, \hat{y}) = (y - \hat{y})^2$. Assume that the network has its weights set as: $(w_1, w_2, w_3, w_4, w_5, w_6) = (-0.65, -0.55, 1.74, 0.79, -0.13, 0.93)$.

[3.a] (5 marks) Draw the computational graph for this function. Define appropriate intermediate variables on the computational graph.

[3.b] (5 marks) Given an input data point $(x_1, x_2, x_3, x_4) = (1.2, -1.1, 0.8, 0.7)$ with true label of 1.0, compute the partial derivative $\frac{\partial L}{\partial w_3}$ by using the back-propagation algorithm. Indicate the partial derivatives of your intermediate variables on the computational graph. Round all your calculations to 4 decimal places. Hint: For any vector (or scalar) $x$, we have $\frac{\partial}{\partial x}\left(\lVert x \rVert_2^2\right) = 2x$. Also, you do not need to write any code for this question! You can do it by hand.

In this problem, our goal is to estimate the computation overhead of CNNs by counting the FLOPs (floating point operations). Consider a convolutional layer C followed by a max pooling layer P. The input of layer C has 50 channels, each of which is of size 12×12. Layer C has 20 filters, each of which is of size 4×4. The convolution padding is 1 and the stride is 2. Layer P performs max pooling over each of C’s output feature maps, with 3×3 local receptive fields and stride 1.

Given scalar inputs $x_1, x_2, \dots, x_n$, we assume:
• A scalar multiplication $x_i \cdot x_j$ accounts for one FLOP.
• A scalar addition $x_i + x_j$ accounts for one FLOP.
• A max operation $\max(x_1, x_2, \dots, x_n)$ accounts for $n - 1$ FLOPs.
• All other operations do not account for FLOPs.

How many FLOPs do layers C and P perform in total during one forward pass, with and without accounting for bias?

The following CNN architecture is one of the most influential architectures that was presented in the 90s. Count the total number of trainable parameters in this network. Note that the Gaussian connections in the output layer can be treated as a fully connected layer similar to F6.

For backpropagation in a neural network with logistic activation function, show that, in order to compute the gradients, as long as we have the outputs of the neurons, there is no need for the inputs. Hint: Find the derivative of a neuron’s output with respect to its inputs.

One alternative to the logistic activation function is the hyperbolic tangent function:

$$\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}.$$

• (a) What is the output range for this function, and how does it differ from the output range of the logistic function?
• (b) Show that its gradient can be formulated as a function of the logistic function.
• (c) When do we want to use each of these activation functions?

In this question, we train (or fine-tune) a few different neural network models to classify dog breeds. We also investigate their dataset bias and cross-dataset performances. All the tasks should be implemented using Python with a deep learning package of your choice, e.g. PyTorch or TensorFlow.

We use two datasets in this assignment.
1. Stanford Dogs Dataset
2. Dog Breed Images

The Stanford Dogs Dataset (SDD) contains over 20,000 images of 120 different dog breeds. The annotations available for this dataset include class labels (i.e. dog breed name) and bounding boxes. In this assignment, we’ll only be using the class labels. Further, we will only use a small portion of the dataset (as described below) so you can train your models on Colab. Dog Breed Images (DBI) is a smaller dataset containing images of 10 different dog breeds.

To prepare the data for the implementation tasks, follow these steps:

1- Download both datasets and unzip them. There are 7 dog breeds that appear in both datasets:
• Bernese mountain dog
• Border collie
• Chihuahua
• Golden retriever
• Labrador retriever
• Pug
• Siberian husky

2- Delete the folders associated with the remaining dog breeds in both datasets. You can also delete the folders associated with the bounding boxes in the SDD.

3- For the 7 breeds that are present in both datasets, the names might be written slightly differently (e.g. Labrador Retriever vs. Labrador). Manually rename the folders so the names match (e.g. make them both labrador retriever).

4- Rename the folders to indicate that they are subsets of the original datasets (to avoid potential confusion if you later want to use them for another project). For example, SDDsubset and DBIsubset. Each of these should now contain 7 subfolders (e.g. border collie, pug, etc.) and the names should match.

5- Zip the two folders (e.g. SDDsubset.zip and DBIsubset.zip) and upload them to your Google Drive.

You can find sample code working with the SDD on the internet. If you want, you are welcome to look at these examples (particularly the one linked here) and use them as your starting code or use code snippets from them. You will need to modify the code, as our questions ask you to do different tasks from the ones in these online examples. But using and copying code snippets from these resources is fine. If you choose to use this (or any other online example) as your starting code, please acknowledge them in your submission. We also suggest that before starting to modify the starting code, you run it as is on your data (e.g. DBIsubset) to 1) make sure your dataset setup is correct and 2) make sure you fully understand the starter code before you start modifying it.

Look at the images in both datasets, and briefly explain if you observe any systematic differences between images in one dataset vs. the other.

Construct a simple convolutional neural network (CNN) for classifying the images in SDD. For example, you can construct a network as follows:
• convolutional layer, 16 filters of size 3×3
• batch normalization
• convolutional layer, 16 filters of size 3×3
• max pooling (2×2)
• convolutional layer, 8 filters of size 3×3
• batch normalization
• convolutional layer, 8 filters of size 3×3
• max pooling (2×2)
• dropout (e.g. 0.5)
• fully connected (32)
• dropout (0.5)
• softmax

If you want, you can change these specifications; but if you do so, please specify them in your submission. Use ReLU as your activation function, and cross-entropy as your cost function. Train the model with the optimizer of your choice, e.g., SGD, Adam, RMSProp, etc.

Use random cropping, random horizontal flipping, and random rotations for augmentation. Make sure to tune the parameters of your optimizer to get the best performance on the validation set. 
Plot the training and test accuracy over the first 10 epochs. Note that the accuracy is different from the loss function; the accuracy is defined as the percentage of images classified correctly.

Train the same CNN model again; this time, with dropout. Plot the training and test accuracy over the first 10 epochs, and compare them with the model trained without dropout. Report the impact of dropout on the training and its generalization to the test set.

[III.a] (15 marks) ResNet models were proposed in the “Deep Residual Learning for Image Recognition” paper. These models have had great success in image recognition on benchmark datasets. In this task, we use the ResNet-18 model for the classification of the images in the DBI dataset. To do so, use the ResNet-18 model from PyTorch, modify the input/output layers to match your dataset, and train the model from scratch; i.e., do not use the pretrained ResNet. Plot the training, validation, and testing accuracy, and compare those with the results of your CNN model.

[III.b] (10 marks) Run the trained model on the entire SDD dataset and report the accuracy. Compare the accuracy obtained on the (test set of) DBI vs. the accuracy obtained on the SDD. Which is higher? Why do you think that might be? Explain very briefly, in one or two sentences.

Similar to the previous task, use the following three models from PyTorch: ResNet18, ResNet34, and ResNeXt32. This time you are supposed to use the pre-trained models and fine-tune the input/output layers on DBI training data. Report the accuracy of these fine-tuned models on the DBI test dataset, and also on the entire SDD dataset. Discuss the cross-performance of these trained models. For example, are there cases in which two different models perform equally well on the test portion of the DBI but have significant performance differences when evaluated on the SDD?

Train a model that, instead of classifying dog breeds, can distinguish whether a given image is more likely to belong to SDD or DBI. To do so, first, you need to divide your data into training and test data (and possibly validation data if you need it for tuning the hyperparameters of your model).

You need to either reorganize the datasets (to load the images using torchvision.datasets.ImageFolder) or write your own data loader function. Train your model on the training portion of the dataset. Include your network model specifications in the report, and make sure to include your justifications for that choice. Report your model’s accuracy on the test portion of the dataset.
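The back-propagation exercise in question [3.b] of this assignment, which the statement says can be done by hand, can also be cross-checked numerically. The sketch below applies the chain rule to the two-branch logistic network defined there, with the L2 loss and the given weights and input. All function and variable names are hypothetical, and no rounding to 4 decimal places is applied.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x):
    """Return the hidden activations h1, h2 and the output y_hat."""
    w1, w2, w3, w4, w5, w6 = w
    x1, x2, x3, x4 = x
    h1 = sigmoid(w1 * x1 + w2 * x2)
    h2 = sigmoid(w3 * x3 + w4 * x4)
    y_hat = sigmoid(w5 * h1 + w6 * h2)
    return h1, h2, y_hat


def loss(w, x, y):
    """L2 loss (y - y_hat)^2 as defined in the question."""
    return (y - forward(w, x)[2]) ** 2


def grad_w3(w, x, y):
    """dL/dw3 via the chain rule, using sigma'(z) = sigma(z) * (1 - sigma(z))."""
    _, h2, y_hat = forward(w, x)
    dL_dyhat = -2.0 * (y - y_hat)         # derivative of (y - y_hat)^2
    dyhat_dz3 = y_hat * (1.0 - y_hat)     # output logistic
    dz3_dh2 = w[5]                        # w6 multiplies h2
    dh2_dz2 = h2 * (1.0 - h2)             # hidden logistic
    dz2_dw3 = x[2]                        # w3 multiplies x3
    return dL_dyhat * dyhat_dz3 * dz3_dh2 * dh2_dz2 * dz2_dw3


W = (-0.65, -0.55, 1.74, 0.79, -0.13, 0.93)
X = (1.2, -1.1, 0.8, 0.7)
Y = 1.0

if __name__ == "__main__":
    print(round(grad_w3(W, X, Y), 4))
```

Note that every factor after the first depends only on neuron outputs (y_hat, h2) and not on the pre-activation inputs, which is the property another question in this assignment asks you to show.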

[SOLVED] (csc420) assignment 1 part i: theoretical problems (60 marks) [question 1] convolution (10 marks)

[1.a] (5 marks) Calculate and plot the convolution of x[n] and h[n] specified below:

$$x[n] = \begin{cases} 1, & -3 \le n \le 3 \\ 0, & \text{otherwise} \end{cases} \qquad h[n] = \begin{cases} 1, & -2 \le n \le 2 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

[1.b] (5 marks) Calculate and plot the convolution of x[n] and h[n] specified below:

$$x[n] = \begin{cases} 1, & -3 \le n \le 3 \\ 0, & \text{otherwise} \end{cases} \qquad h[n] = \begin{cases} 2 - |n|, & -2 \le n \le 2 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

We define a system as something that takes an input signal, e.g. x(n), and produces an output signal, e.g. y(n). Linear Time-Invariant (LTI) systems are a class of systems that are both linear and time-invariant. In linear systems, the output for a linear combination of inputs is equal to the same linear combination of the individual responses to those inputs. In other words, for a system T, signals $x_1(n)$ and $x_2(n)$, and scalars $a_1$ and $a_2$, system T is linear if and only if:

$$T[a_1 x_1(n) + a_2 x_2(n)] = a_1 T[x_1(n)] + a_2 T[x_2(n)]$$

Also, a system is time-invariant if a shift in its input merely shifts the output; i.e., if T[x(n)] = y(n), system T is time-invariant if and only if:

$$T[x(n - n_0)] = y(n - n_0)$$

[2.a] (5 marks) Consider a discrete linear time-invariant system T with discrete input signal x(n) and impulse response h(n). Recall that the impulse response of a discrete system is defined as the output of the system when the input is an impulse function δ(n), i.e. T[δ(n)] = h(n), where:

$$\delta(n) = \begin{cases} 1, & \text{if } n = 0 \\ 0, & \text{else} \end{cases}$$

Prove that T[x(n)] = h(n) ∗ x(n), where ∗ denotes the convolution operation. Hint: represent signal x(n) as a function of δ(n).

[2.b] (5 marks) Is Gaussian blurring linear? Is it time-invariant? Make sure to include your justifications.

[2.c] (5 marks) Is time reversal, i.e. T[x(n)] = x(−n), linear? Is it time-invariant? Make sure to include your justifications.

Vectors can be used to represent polynomials. For example, the 3rd-degree polynomial $a_3 x^3 + a_2 x^2 + a_1 x + a_0$ can be represented by the vector $[a_3, a_2, a_1, a_0]$. If u and v are vectors of polynomial coefficients, prove that convolving them is equivalent to multiplying the two polynomials they each represent.
Hint: You need to assume proper zero-padding to support the full-size convolution.

The Laplace operator is a second-order differential operator in n-dimensional Euclidean space, defined as the divergence ($\nabla \cdot$) of the gradient ($\nabla f$). Thus if f is a twice-differentiable real-valued function, then the Laplacian of f is defined by:

$$\Delta f = \nabla^2 f = \nabla \cdot \nabla f = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}$$

where the latter notations derive from formally writing $\nabla = \left( \frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n} \right)$.

Now, consider a 2D image I(x, y) and its Laplacian, given by $\Delta I = I_{xx} + I_{yy}$. Here the second partial derivatives are taken with respect to the directions of the variables x, y associated with the image grid for convenience. Show that the Laplacian is in fact rotation invariant. In other words, show that $\Delta I = I_{rr} + I_{r'r'}$, where r and r′ are any two orthogonal directions.

Hint: Start by using polar coordinates to describe a chosen location (x, y). Then use the chain rule.

Using the sample code provided in Tutorial 2, examine the sensitivity of the Canny edge detector to Gaussian noise. To do so, take an image of your choice, and add i.i.d. Gaussian noise to each pixel. Analyze the performance of the edge detector as a function of noise variance. Include your observations and three sample outputs (corresponding to low, medium, and high noise variances) in the report.

In this question, the goal is to implement a rudimentary edge detection process that uses a derivative of Gaussian, through a series of steps. For each step (excluding step 1) you are supposed to test your implementation on the provided image, and also on one image of your own choice. Include the results in your report.

Step I – Gaussian Blurring (10 marks): Implement a function that returns a 2D Gaussian matrix for a given input size and scale σ. Please note that you should not use any of the existing libraries to create the filter, e.g. cv2.getGaussianKernel(). Moreover, visualize this 2D Gaussian matrix for two choices of σ with appropriate filter sizes.
For the visualization, you may consider a 2D image with a colormap, or a 3D graph. Make sure to include the color bar or axis values.

Step II – Gradient Magnitude (10 marks): In the lectures, we discussed how partial derivatives of an image are computed. We know that the edges in an image come from sudden changes of intensity, and one way to capture such a change is to calculate the gradient magnitude at each pixel. The edge strength or gradient magnitude is defined as:

$$g(x, y) = |\nabla f(x, y)| = \sqrt{g_x^2 + g_y^2}$$

where $g_x$ and $g_y$ are the gradients of image f(x, y) along the x- and y-axis directions, respectively.

Using the Sobel operator, $g_x$ and $g_y$ can be computed as:

$$g_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * f(x, y) \quad \text{and} \quad g_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * f(x, y)$$

Implement a function that receives an image f(x, y) as input and returns its gradient magnitude g(x, y) as output using the Sobel operator. You are supposed to implement the convolution required for this task from scratch, without using any existing libraries.

Step III – Threshold Algorithm (20 marks): After finding the image gradient, the next step is to automatically find a threshold value so that edges can be determined. One algorithm to automatically determine an image-dependent threshold is as follows:

1. Let the initial threshold $\tau_0$ be equal to the average intensity of the gradient image g(x, y), as defined below:

$$\tau_0 = \frac{\sum_{j=1}^{h} \sum_{i=1}^{w} g(i, j)}{h \times w}$$

where h and w are the height and width of the image under consideration.

2. Set iteration index i = 0, and categorize the pixels into two classes, where the lower class consists of the pixels whose gradient magnitudes are less than $\tau_0$, and the upper class contains the rest of the pixels.

3. Compute the average gradient magnitudes $m_L$ and $m_H$ of the lower and upper classes, respectively.

4. Set iteration i = i + 1 and update the threshold value as:

$$\tau_i = \frac{m_L + m_H}{2}$$

5. Repeat steps 2 to 4 until $|\tau_i - \tau_{i-1}| \le \epsilon$ is satisfied, where ϵ → 0; take $\tau_i$ as the final threshold and denote it by τ.
Once the final threshold is obtained, each pixel of the gradient image g(x, y) is compared with τ. Pixels whose gradient is at least τ are considered edge points and are represented as white pixels; all other pixels are designated as black. The edge-mapped image E(x, y) thus obtained is:

$$E(x, y) = \begin{cases} 255, & \text{if } g(x, y) \ge \tau \\ 0, & \text{otherwise} \end{cases}$$

Implement the aforementioned threshold algorithm. The input to this algorithm is the gradient image g(x, y) obtained from step II, and the output is a black and white edge-mapped image E(x, y).

Step IV – Test (10 marks): Use the image provided along with this assignment, and also one image of your choice, to test all the previous steps (I to III) and to visualize your results in the report. Convert the images to grayscale first. Please note that the input to each step is the output of the previous step. In a brief paragraph, discuss how the algorithm works for these two examples and highlight its strengths and/or its weaknesses.
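The threshold procedure in Step III can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the assignment's reference solution: the function name `iterative_threshold`, the list-of-rows image representation, and the default `eps` are assumptions made here. It also assumes that each pass re-splits the pixels using the most recent threshold, and that an empty class reuses the current threshold as its mean.

```python
def iterative_threshold(g, eps=1e-3):
    """Return (tau, edge_map) for a gradient image g given as a list of rows.

    Hypothetical helper: name, signature, and eps default are illustrative.
    """
    pixels = [v for row in g for v in row]
    tau = sum(pixels) / len(pixels)  # step 1: mean gradient magnitude
    while True:
        # step 2: split pixels around the current threshold
        lower = [v for v in pixels if v < tau]
        upper = [v for v in pixels if v >= tau]
        # step 3: class means; an empty class falls back to tau (assumption)
        m_low = sum(lower) / len(lower) if lower else tau
        m_high = sum(upper) / len(upper) if upper else tau
        # step 4: new threshold is the midpoint of the two class means
        new_tau = (m_low + m_high) / 2
        converged = abs(new_tau - tau) <= eps  # step 5: stopping rule
        tau = new_tau
        if converged:
            break
    # Final edge map: white (255) where g >= tau, black (0) elsewhere.
    edge_map = [[255 if v >= tau else 0 for v in row] for row in g]
    return tau, edge_map
```

For real images a NumPy version would be faster; plain lists are used here only to keep the sketch free of dependencies.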

[SOLVED] Cs 170 homework 3 updating labels

You are given a tree T = (V, E) with a designated root node r, and a non-negative integer label l(v) for each node v. We wish to relabel each vertex v, such that $l_{new}(v)$ is equal to l(w), where w is the k-th ancestor of v in the tree for k = l(v). In other words, v “inherits” the label from its k-th ancestor. Follow the convention that the root node, r, is its own parent.

Design a linear time algorithm to compute the new label, $l_{new}(v)$, for each v in V. Just describe the algorithm and give a runtime analysis in terms of n = |V| and m = |E|; no proof of correctness is necessary.

Each of the following problems can be solved with techniques taught in lecture. Construct a simple directed graph and write an algorithm for each problem by black-boxing algorithms taught in lecture and in the textbook.

(a) Sarah wants to do an extra credit problem for her math class. She is given three numbers: 1, x, and y. Starting from x, she needs to find the shortest sequence of additions, subtractions, and divisions (only possible when the number is divisible by y) using 1 and y to get to 2022. If there are multiple sequences with the shortest length, return any one of them. She can use 1 and y multiple times. Give an algorithm that Sarah can query to get this sequence of arithmetic operations.

(b) There are n different species of Pokemon, all descended from the original Mew. For any species of Pokemon, Professor Juniper knows all of the species directly descended from it. She wants to write a program that answers queries about these Pokemon. The program would have two inputs: a and b, which represent two different species of Pokemon.

Her program would then output one of three options in constant time (the time complexity cannot rely on n): (1) a is descended from b. (2) b is descended from a.
(3) a and b share a common ancestor, but neither is descended from the other.

Unfortunately, Professor Juniper’s laptop is very old and its SSD only has enough space to store up to O(n) pieces of data for the program. Give an algorithm that Professor Juniper’s program could use to solve the problem above given the constraints.

Hint: Professor Juniper can run some algorithm on her data before all of her queries and store the outputs of the algorithm for her program; time taken for this precomputation is not considered in the query run time.

(c) Bob has n different boxes. He wants to send the famous “Blue Roses’ Unicorn” figurine from his glass menagerie to his crush. To protect it, he will put it in a sequence of boxes. Each box has a weight w and size s; with advances in technology, some boxes have negative weight. A box a inside a box b cannot be more than 15% smaller than the size of box b; otherwise, the box will move, and the precious figurine will shatter. The figurine needs to be placed in the smallest box x of Bob’s box collection.

Bob (and Bob’s computer) can ask his digital home assistant Falexa to give him a list of all boxes less than 15% smaller (but not necessarily lighter) than a given box c. Bob will need to pay postage for each unit of weight. Find an algorithm that will find the lightest sequence of boxes that can fit in each other in linear time (in terms of the graph). Hint: how can we create a graph knowing that no larger box can fit into a smaller box, and what property does this graph have?

Arguably, one of the best things to do in America is to take a great American road trip. And in America there are some amazing roads to drive on (think Pacific Coast Highway, Route 66, etc.). An intrepid traveler has chosen to set course across America in search of some amazing driving.
What is the length of the shortest path that hits at least k of these amazing roads?

Assume that the roads in America can be expressed as a directed weighted graph G = (V, E, d), and that our traveler wishes to drive across at least k roads from the subset R ⊆ E of “amazing” roads. Furthermore, assume that the traveler starts and ends at her home a ∈ V. You may also assume that the traveler is fine with repeating roads from R, i.e. the k roads chosen from R need not be unique.

Design an efficient algorithm to solve this problem. Provide a 3-part solution with runtime in terms of n = |V|, m = |E|, k. Hint: Create a new graph G′ based on G such that for some s′, t′ in G′, each path from s′ to t′ in G′ corresponds to a path of the same length from a to itself in G containing at least k roads in R. It may be easier to start by trying to solve the problem for k = 1.

Design an efficient algorithm that, given a directed graph G = (V, E), outputs the set of all vertices v such that there is a cycle containing v. In other words, your algorithm should output a set containing all vertices v such that there exists a path from v to itself in G that traverses at least one other unique vertex u ≠ v ∈ V. Provide a 3-part solution with run-time in terms of n = |V| and m = |E|.

For this week’s coding questions, we’ll be implementing FFT, DFS, and BFS, and applying them to common problems. The Jupyter Notebooks are located under the hw3 folder here. You can also download these files to your personal computer and complete the exercises locally. Please complete both Jupyter notebooks and make sure to submit them to the right subparts on Gradescope. (a) fft.ipynb (b) dfs bfs.ipynb

Notes:
• Submission Instructions: Please merge your completed Jupyter Notebooks with your written solutions for other questions and submit one merged pdf file to Gradescope.
• OH/HWP Instructions: While we will be providing conceptual help on the coding portion of homeworks, OH staff will not look at your code and/or help you debug.
• Academic Honesty Guideline: We realize that code for some of the algorithms we ask you to implement may be readily available online, but we strongly encourage you to not directly copy code from these sources. Instead, try to refer to the resources mentioned in the notebook and come up with code yourself. That being said, we do acknowledge that there may not be many different ways to code up particular algorithms and that your solution may be similar to other solutions available online.
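For the label-updating problem at the start of this homework, one possible linear-time idea is a depth-first traversal that keeps the current root-to-node path in a list, so the k-th ancestor of a node is a single index lookup. The sketch below is illustrative only: the function name `relabel`, the dictionary-based tree representation, and the iterative traversal are assumptions made here, not part of the assignment.

```python
def relabel(children, root, label):
    """Compute l_new(v) = l(k-th ancestor of v), with k = l(v).

    children: dict mapping a node to a list of its children (hypothetical format)
    label:    dict mapping a node to its non-negative integer label
    """
    new_label = {}
    path = []                    # nodes on the current root-to-node path
    stack = [(root, False)]      # (node, subtree_finished) pairs
    while stack:
        node, finished = stack.pop()
        if finished:
            path.pop()           # leaving this node's subtree
            continue
        path.append(node)
        depth = len(path) - 1
        # The k-th ancestor sits k steps up the path; clamp at index 0
        # because the root is treated as its own parent.
        ancestor = path[max(0, depth - label[node])]
        new_label[node] = label[ancestor]   # always read the ORIGINAL label
        stack.append((node, True))
        for child in children.get(node, []):
            stack.append((child, False))
    return new_label
```

Each node is pushed and popped a constant number of times, so the traversal takes O(n + m) time, which is O(n) for a tree.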

[SOLVED] Cs 170 homework 2 werewolves

You are playing a party game with n other friends, who play either as werewolves or villagers. You do not know who is a villager and who is a werewolf, but all your friends do. There are always more villagers than there are werewolves.

Your goal is to identify one player who is certain to be a villager. Your allowed ‘query’ operation is as follows: you pick two people as partners. You ask each person if their partner is a villager or a werewolf. When you do this, a villager must tell the truth about the identity of their partner, but a werewolf doesn’t have to (they may lie or tell the truth about their partner).

Your algorithm should work regardless of the behavior of the werewolves.

(a) Given a single person, devise an algorithm that returns whether or not that person is a villager using O(n) queries. Just an informal description of your test and a brief explanation of why it works are needed.

(b) Show how to find a villager in O(n log n) queries (where one query is taking two people x and y and asking x to identify y and y to identify x). There is a linear-time algorithm for this problem, but you cannot use it here, as we would like you to get practice with divide and conquer.

Hint: Split the group into two groups, and use part (a). What invariant must hold for at least one of the two groups?

Give a 3-part solution.

(c) (Extra Credit) Can you give a linear-time algorithm? Give a 3-part solution. Hint: Don’t be afraid to sometimes ‘throw away’ a pair of people once you’ve asked them to identify their partners.

We are playing a variant of The Resistance, a board game where there are n players, k of which are spies. In this variant, in every round, we choose a subset of players to go on a mission. A mission succeeds if no spies are chosen to go on the mission, but fails if at least one spy goes on the mission, and when a mission fails we are not told who the spies are that went on the mission.

Come up with a strategy that identifies all the spies in O(k log(n/k)) missions.
Only a main idea and runtime analysis are needed.

Fourier transforms (FT) have to deal with computations involving irrational numbers, which can be tricky to implement in practice. Motivated by this, in this problem you will demonstrate how to do a Fourier transform in modular arithmetic, using modulo 5 as an example.

(a) There exist values ω ∈ {0, 1, 2, 3, 4} that are 4th roots of unity (modulo 5), i.e., solutions to $z^4 = 1$. When doing the FT in modulo 5, this ω will serve a similar role to the primitive root of unity in our standard FT. Show that {1, 2, 3, 4} are the 4th roots of unity (modulo 5). Also show that $1 + \omega + \omega^2 + \omega^3 = 0 \pmod 5$ for ω = 2.

(b) Using the FFT, produce the transform of the sequence (0, 2, 3, 0) modulo 5; that is, evaluate the polynomial $2x + 3x^2$ at {1, 2, 4, 3} using the recursive FFT algorithm defined in class, but with ω = 2 and in modulo 5 instead of with ω = i in the complex numbers. All calculations should be performed modulo 5. Hint: You can verify your calculation by evaluating the polynomial at the roots of unity using the slow method, if you like.

(c) Now perform the inverse FFT on the sequence (0, 1, 4, 0), also using the recursive algorithm. Recall that the inverse FFT is the same as the forward FFT, but using $\omega^{-1}$ instead of ω, and with an extra multiplication by $4^{-1}$ for normalization.

(d) Now show how to multiply the polynomials $3x + 2x^2$ and $3 - x$ using the FFT modulo 5. You may use the fact that the FT of (3, 4, 0, 0) modulo 5 is (2, 1, 4, 0) without doing your own calculation.

5 Pattern Matching

Consider the following string matching problem:

Input:
• A string g of length n made of 0s and 1s. Let us call g the “pattern”.
• A string s of length m made of 0s and 1s. Let us call s the “sequence”.
• Integer k

Goal: Find the (starting) locations of all length-n substrings of s which match g in at least n − k positions.
Example: Using 0-indexing, if g = 0111, s = 01010110111, and k = 1, your algorithm should output 0, 2, 4 and 7.

(a) Give an O(nm) time algorithm for this problem.

We will now design an O(m log m) time algorithm for the problem using FFT. Pause a moment here to contemplate how strange this is. What does matching strings have to do with roots of unity and complex numbers?

(b) Devise an FFT-based algorithm for the problem that runs in time O(m log m). Write down the algorithm, prove its correctness and show a runtime bound. Hint: On the example strings g and s, the first step of the algorithm is to construct the following polynomials:

0111 → $1 + x + x^2 - x^3$
01010110111 → $-1 + x - x^2 + x^3 - x^4 + x^5 + x^6 - x^7 + x^8 + x^9 + x^{10}$

To start, try to think about the case when k = 0 (i.e. g matches perfectly), and then work from there.

(c) (Extra Credit) Oftentimes in biology, we would like to locate the existence of a gene in a species’ DNA. Of course, due to genetic mutations, there can be many similar but not identical genes that serve the same function, and genes often appear multiple times in one DNA sequence. So a more practical problem is to find all genes in a DNA sequence that are similar to a known gene.

This problem is very similar to the one we solved earlier: the string s is the complete sequence and the pattern g is a specific gene. We would like to find all locations in the complete sequence s where the gene g appears, up to k modifications. In genetics, however, the strings g and s are written over the four-letter alphabet {A, C, T, G} (not 0s and 1s). Can you devise an O(m log m) time algorithm for this modified problem?

As a new initiative this semester, we’re going to have short coding exercises so that you can get a feel of what it’s like to implement these amazing algorithms that you learn about in lecture in real code.
While the notebook may seem large, the actual code you have to write will usually be very short and direct.

There are two ways you can access the hw2.ipynb notebook and complete the problems:
1. Click here if you prefer to complete this question on Berkeley DataHub.
2. Run git clone https://github.com/Berkeley-CS170/cs170-coding-notebooks-fa22.git in your computer’s terminal if you prefer to complete it locally.

Notes:
• Submission Instructions: Please merge your completed Jupyter Notebook with your written solutions for other questions and submit one merged pdf file to Gradescope.
• OH/HWP Instructions: While we will be providing conceptual help on the coding portion of homeworks, OH staff will not look at your code and/or help you debug.
• Academic Honesty Guideline: We realize that code for some of the algorithms we ask you to implement may be readily available online, but we strongly encourage you to not directly copy code from these sources. Instead, try to refer to the resources mentioned in the notebook and come up with code yourself. That being said, we do acknowledge that there may not be many different ways to code up particular algorithms and that your solution may be similar to other solutions available online.
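To make the modular Fourier transform problem above more concrete, here is a minimal sketch of a recursive FFT over the integers modulo a prime. The function names `fft_mod` and `inverse_fft_mod` are made up for this illustration, and the recursion shown is the standard even/odd split rather than necessarily the exact formulation used in the course. It assumes the input length is a power of two and that `omega` is a primitive root of unity of that order modulo `p` (for length 4 and p = 5, ω = 2 as in the problem).

```python
def fft_mod(coeffs, omega, p):
    """Evaluate the polynomial with coefficients `coeffs` (lowest degree
    first) at omega^0, omega^1, ..., omega^(n-1), all modulo p."""
    n = len(coeffs)
    if n == 1:
        return [coeffs[0] % p]
    omega_sq = omega * omega % p
    even = fft_mod(coeffs[0::2], omega_sq, p)   # even-indexed coefficients
    odd = fft_mod(coeffs[1::2], omega_sq, p)    # odd-indexed coefficients
    out = [0] * n
    w = 1
    for j in range(n // 2):
        out[j] = (even[j] + w * odd[j]) % p
        # omega^(n/2) = -1 for a primitive n-th root, hence the minus sign.
        out[j + n // 2] = (even[j] - w * odd[j]) % p
        w = w * omega % p
    return out


def inverse_fft_mod(values, omega, p):
    """Inverse transform: forward FFT with omega^-1, then scale by n^-1."""
    n = len(values)
    omega_inv = pow(omega, -1, p)   # modular inverse (needs Python 3.8+)
    n_inv = pow(n, -1, p)
    return [v * n_inv % p for v in fft_mod(values, omega_inv, p)]
```

As a sanity check against the fact quoted in part (d), `fft_mod([3, 4, 0, 0], 2, 5)` returns `[2, 1, 4, 0]`, and `inverse_fft_mod([2, 1, 4, 0], 2, 5)` recovers `[3, 4, 0, 0]`.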

[SOLVED] Cs 170 homework 1 4 recurrence relations

For each part, find the asymptotic order of growth of T; that is, find a function g such that T(n) = Θ(g(n)). In all subparts, you may ignore any issues arising from whether a number is an integer.
(a) $T(n) = 4T(n/4) + 32n$
(b) $T(n) = 4T(n/3) + n^2$
(c) $T(n) = T(3n/5) + T(4n/5)$ (We have T(1) = 1)

Find a function f(n) ≥ 0 such that:
• For all c > 0, $f = \Omega(n^c)$
• For all α > 1, $f = O(\alpha^n)$
Give a proof for why it satisfies both these properties.

Suppose we have a sequence of integers $A_n$, where $A_0, \ldots, A_{k-1} < 50$ are given, and each subsequent term in the sequence is given by some integer linear combination of the k previous terms: $A_i = A_{i-1} b_1 + A_{i-2} b_2 + \cdots + A_{i-k} b_k$. You are given as inputs $A_0$ through $A_{k-1}$ and the coefficients $b_1$ through $b_k$.

(a) Devise an algorithm which computes $A_n \bmod 50$ in O(log n) time (Hint: use the matrix multiplication technique from class). You should treat k as a constant, and you may assume that all arithmetic operations involving numbers of O(log n) bits take constant time (see footnote 1). Give a 3-part solution as described in the homework guidelines.

(b) Devise an even faster algorithm which doesn’t use matrix multiplication at all. Once again, you should still treat k as a constant. Hint: Exploit the fact that we only want the answer mod a constant (here 50). Give a 3-part solution as described in the homework guidelines.

Given the n-digit decimal representation of a number, converting it into binary in the natural way takes $O(n^2)$ steps. Give a divide and conquer algorithm to do the conversion and show that it does not take much more time than Karatsuba’s algorithm for integer multiplication.

Just state the main idea behind your algorithm and its runtime analysis; no proof of correctness is needed as long as your main idea is clear.
1. A similar assumption (that arithmetic operations involving numbers of O(log N) bits take constant time, where N is the number of bits needed to describe the entire input) is known as the transdichotomous word RAM model, and it is typically at least implicitly assumed in the study of algorithms. Indeed, in an input of size N it is standard to assume that we can index into the input in constant time (do we not typically assume that indexing into an input array takes constant time?!). Implicitly this is assuming that the register size on our computer is at least log N bits, which means it is natural to assume that we can do all standard machine operations on log N bits in constant time.
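The matrix-multiplication hint in the linear recurrence problem can be illustrated with repeated squaring of a companion matrix. The sketch below is one way to set this up, not the course's official solution; the names `mat_mul` and `recurrence_mod` and the argument layout are assumptions made here. With k treated as a constant, each matrix product costs O(1) arithmetic operations and the loop runs O(log n) times.

```python
def mat_mul(X, Y, mod):
    """Multiply two k x k matrices modulo `mod`."""
    k = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) % mod
             for j in range(k)] for i in range(k)]


def recurrence_mod(initial, coeffs, n, mod=50):
    """Return A_n mod `mod`, given initial = [A_0, ..., A_{k-1}] and
    coeffs = [b_1, ..., b_k] with A_i = b_1*A_{i-1} + ... + b_k*A_{i-k}."""
    k = len(initial)
    if n < k:
        return initial[n] % mod
    # Companion matrix: maps (A_{i-1}, ..., A_{i-k}) to (A_i, ..., A_{i-k+1}).
    M = [[c % mod for c in coeffs]] + [
        [1 if j == i else 0 for j in range(k)] for i in range(k - 1)
    ]
    result = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    power = n - (k - 1)
    while power:                       # binary exponentiation of M
        if power & 1:
            result = mat_mul(result, M, mod)
        M = mat_mul(M, M, mod)
        power >>= 1
    state = initial[::-1]              # (A_{k-1}, ..., A_0)
    return sum(result[0][j] * state[j] for j in range(k)) % mod
```

For example, with `initial = [0, 1]` and `coeffs = [1, 1]` the sequence is the Fibonacci numbers, and `recurrence_mod([0, 1], [1, 1], 10)` gives 5, since the 10th Fibonacci number is 55.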

[SOLVED] Si251 – convex optimization homework 1 to 4 solutions

1. Please prove that the following sets are convex:
1) $S = \{x \in \mathbb{R}^m \mid |p(t)| \le 1 \text{ for } |t| \le \pi/3\}$, where $p(t) = x_1 \cos t + x_2 \cos 2t + \cdots + x_m \cos mt$. (5 pts)
2) (Ellipsoids) $\{x \mid \sqrt{(x - x_c)^T P (x - x_c)} \le r\}$ ($x_c \in \mathbb{R}^n$, $r \in \mathbb{R}$, $P \succeq 0$). (5 pts)
3) (Symmetric positive semidefinite matrices) $S_+^{n \times n} = \{P \in S^{n \times n} \mid P \succeq 0\}$. (5 pts)
4) The set of points closer to a given point than to a given set, i.e., $\{x \mid \|x - x_0\|_2 \le \|x - y\|_2 \text{ for all } y \in S\}$, where $S \subseteq \mathbb{R}^n$. (5 pts)

2. (15 pts) For a given norm $\|\cdot\|$ on $\mathbb{R}^n$, the dual norm, denoted $\|\cdot\|_*$, is defined as $\|y\|_* = \sup_{x \in \mathbb{R}^n} \{y^T x \mid \|x\| \le 1\}$. Show that the dual of the Euclidean norm is the Euclidean norm, i.e., $\sup_{x \in \mathbb{R}^n} \{z^T x \mid \|x\|_2 \le 1\} = \|z\|_2$.

3. (15 pts) Define a norm cone as $C \equiv \{(x, t) : x \in \mathbb{R}^d,\ t \ge 0,\ \|x\| \le t\} \subseteq \mathbb{R}^{d+1}$. Show that the norm cone is convex by using the definition of convex sets.

4. (18 pts) Let $C \subset \mathbb{R}^n$ be convex and $f : C \to \mathbb{R}^\star$. Show that the following statements are equivalent:
(a) epi(f) is convex.
(b) For all points $x_i \in C$ and $\{\lambda_i \mid \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i = 1,\ i = 1, 2, \cdots, n\}$, we have $f\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le \sum_{i=1}^{n} \lambda_i f(x_i)$.
(c) For all $x, y \in C$ and $\lambda \in [0, 1]$, $f((1 - \lambda)x + \lambda y) \le (1 - \lambda) f(x) + \lambda f(y)$.

5. (14 pts) Monotone Mappings. A function $\psi : \mathbb{R}^n \to \mathbb{R}^n$ is called monotone if for all $x, y \in \operatorname{dom} \psi$,
$$(\psi(x) - \psi(y))^T (x - y) \ge 0. \tag{1}$$
Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable convex function. Show that its gradient $\nabla f$ is monotone. Is the converse true, i.e., is every monotone mapping the gradient of a convex function?

6. (18 pts) Please determine whether the following functions are convex, concave or neither, and give a detailed explanation for your choice.
1) $f_1(x_1, x_2, \cdots, x_n) = \begin{cases} -(x_1 x_2 \cdots x_n)^{1/n}, & \text{if } x_1, \cdots, x_n > 0 \\ \infty, & \text{otherwise;} \end{cases}$
2) $f_2(x_1, x_2) = x_1^{\alpha} x_2^{1 - \alpha}$, where $0 \le \alpha \le 1$, on $\mathbb{R}^2_{++}$;
3) $f_3(x, u, v) = -\log(uv - x^T x)$ on $\operatorname{dom} f = \{(x, u, v) \mid uv > x^T x,\ u, v > 0\}$.

In the lecture, we have learned about robust linear programming as an application of second-order cone programming.
Now we will consider a similar robust variation of the convex quadratic program
$$\text{minimize } (1/2) x^T P x + q^T x + r \quad \text{subject to } Ax \preceq b.$$
For simplicity, we assume that only the matrix P is subject to errors, and the other parameters (q, r, A, b) are exactly known. The robust quadratic program is defined as
$$\text{minimize } \sup_{P \in \mathcal{E}} \left( (1/2) x^T P x + q^T x + r \right) \quad \text{subject to } Ax \preceq b$$
where $\mathcal{E}$ is the set of possible matrices P.

For each of the following sets $\mathcal{E}$, express the robust QP as a convex problem in a standard form (e.g., QP, QCQP, SOCP, SDP).
(a) A finite set of matrices: $\mathcal{E} = \{P_1, \ldots, P_K\}$, where $P_i \in S^n_+$, $i = 1, \ldots, K$.
(b) A set specified by a nominal value $P_0 \in S^n_+$ plus a bound on the eigenvalues of the deviation $P - P_0$: $\mathcal{E} = \{P \in S^n \mid -\gamma I \preceq P - P_0 \preceq \gamma I\}$, where $\gamma \in \mathbb{R}$ and $P_0 \in S^n_+$.
(c) An ellipsoid of matrices: $\mathcal{E} = \left\{ P_0 + \sum_{i=1}^{K} P_i u_i \mid \|u\|_2 \le 1 \right\}$. You can assume $P_i \in S^n_+$, $i = 0, \ldots, K$.

Please consider the following convex optimization problem and calculate its solution:
$$\text{minimize } -\sum_{i=1}^{n} \log(\alpha_i + x_i) \quad \text{subject to } x \succeq 0,\ \mathbf{1}^T x = 1.$$

1. (50 pts) L-smooth functions. Suppose the function $f : \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable. Please prove that the following relations hold for all $x, y \in \mathbb{R}^n$ when f satisfies an L-Lipschitz continuity condition, in the order [1] ⇒ [2] ⇒ [3]:
[1] $\langle \nabla f(x) - \nabla f(y), x - y \rangle \le L \|x - y\|^2$,
[2] $f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2$,
[3] $f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{1}{2L} \|\nabla f(y) - \nabla f(x)\|^2$, $\forall x, y$.

2. (50 pts) Backtracking line search. Please show the convergence of backtracking line search on an m-strongly convex and M-smooth objective function f as
$$f(x^{(k)}) - p^\star \le c^k \left( f(x^{(0)}) - p^\star \right)$$
where $c = 1 - \min\{2m\alpha,\ 2\beta\alpha m / M\} < 1$.

For each of the following convex functions, compute the proximal operator $\operatorname{prox}_f$.
(1) (10 pts) $f(x) = \lambda \|x\|_1$, where $x \in \mathbb{R}^d$ and $\lambda \in \mathbb{R}_+$ is the regularization parameter.
(2) (20 pts) $f(X) = \lambda \|X\|_*$, where $X \in \mathbb{R}^{d \times m}$ is a matrix, $\|X\|_*$ denotes the nuclear norm, and $\lambda \in \mathbb{R}_+$ is the regularization parameter.
2 Alternating Direction Method of Multipliers (35 pts)

Consider the following problem:
$$\text{minimize } -\log \det X + \operatorname{Tr}(XC) + \rho \|X\|_1 \quad \text{subject to } X \succeq 0 \tag{1}$$
In (1), $\|\cdot\|_1$ is the entrywise $\ell_1$-norm. This problem arises in estimation of sparse undirected graphical models. C is the empirical covariance matrix of the observed data. The goal is to estimate a covariance matrix with sparse inverse for the observed data. In order to apply ADMM we rewrite (1) as
$$\text{minimize } -\log \det X + \operatorname{Tr}(XC) + I_{X \succeq 0}(X) + \rho \|Y\|_1 \quad \text{subject to } X = Y \tag{2}$$
where $I_{X \succeq 0}(\cdot)$ is the indicator function associated with the set $X \succeq 0$. Please provide the ADMM update (the derivation process is required) for each variable at the t-th iteration.

3 Monotone Operators and Base Splitting Schemes (35 pts)

Prove the theorem below.

Theorem 1. For $v \in \mathbb{R}^n$, the solution of the equation
$$u^* = (I - JW)^{-T} v \tag{3}$$
is given by
$$u^* = v + W^T \tilde{u}^* \tag{4}$$
where I is the identity matrix and $\tilde{u}^*$ is a zero of the operator splitting problem $0 \in (F + G)(u^*)$, with operators defined as
$$F(\tilde{u}) = (I - W^T)(\tilde{u}), \quad G(\tilde{u}) = D\tilde{u} - v \tag{5}$$
where D is a diagonal matrix defined by $J = (I + D)^{-1}$ (where $J_{ii} > 0$).

(Hint-1: please refer to Monotone Operators-note.pdf) (Hint-2: $I = (I - JW)^{-T} (I - JW)^T$)

[SOLVED] Si251 – convex optimization homework 4

For each of the following convex functions, compute the proximal operator $\operatorname{prox}_f$.
(1) (10 pts) $f(x) = \lambda \|x\|_1$, where $x \in \mathbb{R}^d$ and $\lambda \in \mathbb{R}_+$ is the regularization parameter.
(2) (20 pts) $f(X) = \lambda \|X\|_*$, where $X \in \mathbb{R}^{d \times m}$ is a matrix, $\|X\|_*$ denotes the nuclear norm, and $\lambda \in \mathbb{R}_+$ is the regularization parameter.

2 Alternating Direction Method of Multipliers (35 pts)

Consider the following problem:
$$\text{minimize } -\log \det X + \operatorname{Tr}(XC) + \rho \|X\|_1 \quad \text{subject to } X \succeq 0 \tag{1}$$
In (1), $\|\cdot\|_1$ is the entrywise $\ell_1$-norm. This problem arises in estimation of sparse undirected graphical models. C is the empirical covariance matrix of the observed data. The goal is to estimate a covariance matrix with sparse inverse for the observed data. In order to apply ADMM we rewrite (1) as
$$\text{minimize } -\log \det X + \operatorname{Tr}(XC) + I_{X \succeq 0}(X) + \rho \|Y\|_1 \quad \text{subject to } X = Y \tag{2}$$
where $I_{X \succeq 0}(\cdot)$ is the indicator function associated with the set $X \succeq 0$. Please provide the ADMM update (the derivation process is required) for each variable at the t-th iteration.

3 Monotone Operators and Base Splitting Schemes (35 pts)

Prove the theorem below.

Theorem 1. For $v \in \mathbb{R}^n$, the solution of the equation
$$u^* = (I - JW)^{-T} v \tag{3}$$
is given by
$$u^* = v + W^T \tilde{u}^* \tag{4}$$
where I is the identity matrix and $\tilde{u}^*$ is a zero of the operator splitting problem $0 \in (F + G)(u^*)$, with operators defined as
$$F(\tilde{u}) = (I - W^T)(\tilde{u}), \quad G(\tilde{u}) = D\tilde{u} - v \tag{5}$$
where D is a diagonal matrix defined by $J = (I + D)^{-1}$ (where $J_{ii} > 0$).

(Hint-1: please refer to Monotone Operators-note.pdf) (Hint-2: $I = (I - JW)^{-T} (I - JW)^T$)
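For part (1) of the proximal operator question, the well-known closed form is elementwise soft-thresholding. The sketch below assumes the standard definition $\operatorname{prox}_f(v) = \arg\min_x \{ f(x) + \tfrac{1}{2}\|x - v\|_2^2 \}$ (the homework may use a scaled variant), and the function name `prox_l1` is chosen here purely for illustration; deriving the formula is still the point of the exercise.

```python
import math


def prox_l1(v, lam):
    """Soft-thresholding: component i becomes sign(v_i) * max(|v_i| - lam, 0).

    v is a list of floats and lam >= 0 is the regularization parameter.
    Assumes prox_f(v) = argmin_x lam*||x||_1 + 0.5*||x - v||^2.
    """
    # copysign transfers the sign of the input onto the shrunk magnitude.
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]
```

Entries with magnitude at most `lam` are set to zero, and every other entry is shrunk toward zero by `lam`; for instance, `prox_l1([3.0, -0.5, -2.0], 1.0)` shrinks 3.0 to 2.0, zeroes out -0.5, and shrinks -2.0 to -1.0.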

[SOLVED] Si251 – convex optimization homework 3

1. (50 pts) L-smooth functions. Suppose the function $f : \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable. Please prove that the following relations hold for all $x, y \in \mathbb{R}^n$ when f satisfies an L-Lipschitz continuity condition, in the order [1] ⇒ [2] ⇒ [3]:
[1] $\langle \nabla f(x) - \nabla f(y), x - y \rangle \le L \|x - y\|^2$,
[2] $f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2$,
[3] $f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{1}{2L} \|\nabla f(y) - \nabla f(x)\|^2$, $\forall x, y$.

2. (50 pts) Backtracking line search. Please show the convergence of backtracking line search on an m-strongly convex and M-smooth objective function f as
$$f(x^{(k)}) - p^\star \le c^k \left( f(x^{(0)}) - p^\star \right)$$
where $c = 1 - \min\{2m\alpha,\ 2\beta\alpha m / M\} < 1$.
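To see what the backtracking line search in problem 2 does in practice, here is a small gradient descent sketch in plain Python. It assumes the usual setup in which α ∈ (0, 0.5) is the sufficient-decrease parameter and β ∈ (0, 1) is the step shrink factor, with the step size starting at 1 in every iteration. The function name `backtracking_gd`, its defaults, and the stopping rule on the squared gradient norm are choices made for this illustration.

```python
def backtracking_gd(f, grad, x0, alpha=0.3, beta=0.8, tol=1e-8, max_iter=1000):
    """Gradient descent with backtracking line search on a list-valued x.

    f:    callable returning the objective value at x
    grad: callable returning the gradient (a list) at x
    """
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq <= tol:          # gradient small enough: stop
            break
        fx = f(x)
        t = 1.0
        # Shrink t until the sufficient-decrease condition
        # f(x - t*g) <= f(x) - alpha * t * ||g||^2 holds.
        while f([xi - t * gi for xi, gi in zip(x, g)]) > fx - alpha * t * g_norm_sq:
            t *= beta
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x
```

On a strongly convex quadratic such as $f(x) = x_1^2 + 10 x_2^2$ (so m = 2 and M = 20), the iterates approach the minimizer at the origin, in line with the linear rate stated in the problem.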

[SOLVED] Si251 convex optimization project

In this project, you will explore advanced topics in optimization by engaging deeply with recent research papers. The goal is to enhance your understanding and ability to implement, analyze, and possibly extend current optimization techniques. This project will require you to select a paper, replicate its results, and develop incremental improvements or new insights related to the work.

1. Paper Selection:
• You can choose from the list of papers provided in the accompanying folder, but each paper may be reproduced by no more than 3 teams. Alternatively, you are encouraged to select a paper from reputable optimization journals or machine learning conferences published in recent years (after 2019), but you must send an email to inform the instructors. Here are some recommended conferences and journals:
– (Machine Learning) NeurIPS, ICLR, ICML, JMLR, TPAMI, etc.
– (Operations Research) Operational Research, Mathematical Programming, etc.
• The topic of the paper should align with one or more of the following keywords or themes discussed in the course:
– Optimal transport
– Bilevel optimization
– Combinatorial optimization
– Implicit differentiation
– Diffusion model
– Federated learning
– Smart predict-then-optimize

2. Project requirement:
• Replication (Basic Pass Score): You should successfully replicate the study presented in your chosen paper. This involves understanding, coding, and achieving similar results as those documented in the original work.
• Incremental Work (Higher Score): To achieve a higher grade, you are expected to make some original contribution. This could be an improvement on the existing methods, application to a new problem, or a novel insight or analysis.

3. Assessment Criteria
• Replication Accuracy: How closely your results match those of the original paper.
• Original Contribution: The significance and relevance of any improvements or new insights you provide.
• Clarity and Quality of Presentation and Report: How well you communicate your ideas and findings.4. Project Report: • Submit a report of at least 4 pages, distinct from the original paper. It should detail your replication process, any incremental work, and present your results clearly. • The report should reflect a comprehensive understanding of the topic and document any new contributions made during the project. • The submission should use the NeurIPS 2024 template attached in the accompanying folder.5. Team Collaboration: • Groups of 1-3 students are allowed. Collaboration within your group is essential, as all members will share the same grade. Contributions by all team members should be equitable, with no adjustments made for individual efforts.Hint: Use of Large Language Models (LLMs): • You are permitted to use LLMs for assistance with coding, understanding concepts, or generating ideas.• However, it is imperative that you critically evaluate and understand the output from LLMs. You are responsible for the content and integrity of the final submission. This project is an opportunity to delve into the complexities of optimization, challenge your understanding, and contribute to the field. We look forward to your innovative approaches and solutions.

[SOLVED] Si251 – convex optimization homework 1

1. Please prove that the following sets are convex:
1) $S = \{x \in \mathbb{R}^m \mid |p(t)| \le 1 \text{ for } |t| \le \pi/3\}$, where $p(t) = x_1 \cos t + x_2 \cos 2t + \cdots + x_m \cos mt$. (5 pts)
2) (Ellipsoids) $\{x \mid \sqrt{(x - x_c)^T P (x - x_c)} \le r\}$ ($x_c \in \mathbb{R}^n$, $r \in \mathbb{R}$, $P \succeq 0$). (5 pts)
3) (Symmetric positive semidefinite matrices) $S^{n \times n}_{+} = \{P \in S^{n \times n} \mid P \succeq 0\}$. (5 pts)
4) The set of points closer to a given point than a given set, i.e., $\{x \mid \|x - x_0\|_2 \le \|x - y\|_2 \text{ for all } y \in S\}$, where $S \subseteq \mathbb{R}^n$. (5 pts)

2. (15 pts) For a given norm $\|\cdot\|$ on $\mathbb{R}^n$, the dual norm, denoted $\|\cdot\|_*$, is defined as $\|y\|_* = \sup_{x \in \mathbb{R}^n} \{y^T x \mid \|x\| \le 1\}$. Show that the dual of the Euclidean norm is the Euclidean norm, i.e., $\sup_{x \in \mathbb{R}^n} \{z^T x \mid \|x\|_2 \le 1\} = \|z\|_2$.

3. (15 pts) Define a norm cone as $C \equiv \{(x, t) : x \in \mathbb{R}^d,\ t \ge 0,\ \|x\| \le t\} \subseteq \mathbb{R}^{d+1}$. Show that the norm cone is convex by using the definition of convex sets.

4. (18 pts) Let $C \subset \mathbb{R}^n$ be convex and $f : C \to \mathbb{R}^\star$. Show that the following statements are equivalent:
(a) $\operatorname{epi}(f)$ is convex.
(b) For all points $x_i \in C$ and $\{\lambda_i \mid \lambda_i \ge 0,\ \sum_{i=1}^n \lambda_i = 1,\ i = 1, 2, \cdots, n\}$, we have $f\left(\sum_{i=1}^n \lambda_i x_i\right) \le \sum_{i=1}^n \lambda_i f(x_i)$.
(c) For all $x, y \in C$ and $\lambda \in [0, 1]$, $f\left((1 - \lambda)x + \lambda y\right) \le (1 - \lambda) f(x) + \lambda f(y)$.

5. (14 pts) Monotone Mappings. A function $\psi : \mathbb{R}^n \to \mathbb{R}^n$ is called monotone if for all $x, y \in \operatorname{dom} \psi$,
$(\psi(x) - \psi(y))^T (x - y) \ge 0$. (1)
Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable convex function. Show that its gradient $\nabla f$ is monotone. Is the converse true, i.e., is every monotone mapping the gradient of a convex function?

6. (18 pts) Please determine whether the following functions are convex, concave, or neither, and give a detailed explanation for your choice.
1) $f_1(x_1, x_2, \cdots, x_n) = \begin{cases} -(x_1 x_2 \cdots x_n)^{1/n}, & \text{if } x_1, \cdots, x_n > 0 \\ \infty, & \text{otherwise;} \end{cases}$
2) $f_2(x_1, x_2) = x_1^{\alpha} x_2^{1 - \alpha}$, where $0 \le \alpha \le 1$, on $\mathbb{R}^2_{++}$;
3) $f_3(x, u, v) = -\log(uv - x^T x)$ on $\operatorname{dom} f = \{(x, u, v) \mid uv > x^T x,\ u, v > 0\}$.
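The monotonicity inequality (1) in problem 5 can be spot-checked numerically before proving it. The sketch below assumes a hypothetical convex test function, log-sum-exp $f(x) = \log \sum_i e^{x_i}$, whose gradient is the softmax vector; the function names and the sampling range are illustrative choices, not part of the homework. A numerical check of this kind is not a proof.

```python
import math
import random

def grad_logsumexp(x):
    # Gradient of f(x) = log(sum_i exp(x_i)); shifting by max(x) avoids overflow.
    shift = max(x)
    weights = [math.exp(v - shift) for v in x]
    total = sum(weights)
    return [w / total for w in weights]

def monotonicity_gap(x, y):
    # Left-hand side of (1): (grad f(x) - grad f(y))^T (x - y).
    gx, gy = grad_logsumexp(x), grad_logsumexp(y)
    return sum((a - b) * (u - v) for a, b, u, v in zip(gx, gy, x, y))

random.seed(0)
gaps = []
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(4)]
    y = [random.uniform(-5.0, 5.0) for _ in range(4)]
    gaps.append(monotonicity_gap(x, y))

# For a convex f, every sampled gap should be nonnegative up to rounding error.
print(min(gaps))
```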

[SOLVED] Si152 homework 1 to 4 solutions

Problem i. Write the gradient and Hessian matrix of the following expression. [10pts]
$x^T A x + b^T x + c$ ($A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, $c \in \mathbb{R}$)

Problem ii. Write the gradient and Hessian matrix of the following expression. [10pts]
$\|Ax - b\|_2^2$ ($A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$)

Problem iii. Convert the following problem to a linear program. [10pts]
$\min_{x \in \mathbb{R}^n} \|Ax - b\|_1 + \|x\|_\infty$ ($A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$)

Problem iv. Prove the convergence rates of the following point sequences. [30pts]
$x^k = \frac{1}{k}$, $x^k = \frac{1}{k!}$, $x^k = \frac{1}{2^{2^k}}$
(Hint: Given two successive iterates $x^{k+1}$ and $x^k$ and their limit point $x^*$, suppose there exists a real number $q \ge 0$ such that $\lim_{k \to \infty} \frac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|} = q$. If $0 < q < 1$, the sequence converges Q-linearly; if $q = 1$, the sequence converges Q-sublinearly; if $q = 0$, the sequence converges Q-superlinearly.)

Problem v. Select the Haverly Pool Problem or the Horse Racing Problem in the courseware, write the program in the AMPL modeling language, and submit it to https://neos-server.org/neos/solvers/index.html. (Hint: both the AMPL solver and the NEOS solver can be used. Please indicate the type of solver used in the submitted job, show the solution results (e.g., screenshots attached to the PDF file), and submit the source code together with the submitted job, packaged as a .zip file that includes your PDF and source code.) [40pts]

Convert the following problem to a linear program in standard form. [20pts]
$\max_{x \in \mathbb{R}^4} \ 2x_1 - x_3 + x_4$
s.t. $x_1 + x_2 \ge 5$
$x_1 - x_3 \le 2$
$4x_2 + 3x_3 - x_4 \le 10$
$x_1 \ge 0$ (1)

Use the two-phase simplex procedure to solve the following problem. [40pts]
$\min_{x \in \mathbb{R}^4} \ -3x_1 + x_2 + 3x_3 - x_4$
s.t. $x_1 + 2x_2 - x_3 + x_4 = 0$
$2x_1 - 2x_2 + 3x_3 + 3x_4 = 9$
$x_1 - x_2 + 2x_3 - x_4 = 6$
$x_1, x_2, x_3, x_4 \ge 0$ (2)

3.1 Q1 Prove that the extreme points of the following two sets are in one-to-one correspondence. [20pts]
$S_1 = \{x \in \mathbb{R}^n : Ax \le b,\ x \ge 0\}$
$S_2 = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m : Ax + y = b,\ x \ge 0,\ y \ge 0\}$ (3)
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$.

3.2 Q2 Does the set $P = \{x \in \mathbb{R}^2 : 0 \le x_1 \le 1\}$ have extreme points? What is its standard form? Does it have extreme points in its standard form? If so, give an extreme point and explain why it is an extreme point. [20pts]

Problem 1. Prove that the dual of the dual of a linear program (in standard form) is the original program. [25pts]

Problem 2. Prove that the dual objective increases after a pivot of the dual simplex method. [25pts]

Problem 3. Let $L(x, \lambda)$ be the Lagrangian of a linear programming problem, and let $(x^*, \lambda^*)$ be the optimal primal-dual solution. Prove that $L(x, \lambda^*) \ge L(x^*, \lambda^*) \ge L(x^*, \lambda)$ for any primal feasible $x$ and dual feasible $\lambda$. [25pts]

Problem 4. Construct a linear programming problem for which neither the primal nor the dual problem has a feasible solution. [25pts]

Problem 1. Let $f$ be a positive definite quadratic function $f(x) = \frac{1}{2} x^T A x + b^T x$, with $A \in S^n_{++}$ and $b \in \mathbb{R}^n$. Let $x^k$ be the current iterate and $d^k$ a descent direction. Derive the step size of exact line search [20pts]
$\alpha^k = \arg\min_{\alpha > 0} f(x^k + \alpha d^k)$.

Problem 2. Prove that $f : \mathbb{R}^n \to \mathbb{R}$ is affine if and only if $f$ is both convex and concave. [20pts]

Problem 3. Find the minimizer of the Rosenbrock function $f(x, y) = (1 - x)^2 + 100(y - x^2)^2$ using MATLAB to implement three algorithms (20pts each): the gradient descent (GD) method, Newton's method, and a quasi-Newton method (rank-1, DFP, or BFGS). You are required to print the iteration information of the last 10 steps, including the objective, step size, and gradient residual. Technical implementation: explain how to choose the step size, how to set the termination criteria, how to choose the initial point, the values of the required parameters, whether the method converges, and its convergence rate. (Paste the code in the PDF to submit it; there is no need to submit the source code.) [60pts]
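For the Rosenbrock problem above, the structure of the Newton iteration can be sketched as follows. The assignment asks for MATLAB; this is only an illustrative Python sketch, assuming a pure Newton step (step size 1, no line search), the commonly used starting point $(-1.2, 1)$, and a gradient-norm stopping rule with tolerance $10^{-10}$. The function names are hypothetical.

```python
import math

def rosenbrock(x, y):
    return (1 - x) ** 2 + 100 * (y - x * x) ** 2

def gradient(x, y):
    return (-2 * (1 - x) - 400 * x * (y - x * x), 200 * (y - x * x))

def hessian(x, y):
    # Entries f_xx, f_xy, f_yy of the symmetric 2x2 Hessian.
    return (1200 * x * x - 400 * y + 2, -400 * x, 200.0)

def newton(x, y, tol=1e-10, max_iter=50):
    """Pure Newton iteration; history holds (objective, gradient norm) per step."""
    history = []
    for _ in range(max_iter):
        gx, gy = gradient(x, y)
        gnorm = math.hypot(gx, gy)
        history.append((rosenbrock(x, y), gnorm))
        if gnorm < tol:
            break
        fxx, fxy, fyy = hessian(x, y)
        det = fxx * fyy - fxy * fxy
        # Solve H d = -g with the closed-form inverse of a 2x2 matrix.
        dx = -(fyy * gx - fxy * gy) / det
        dy = -(-fxy * gx + fxx * gy) / det
        x, y = x + dx, y + dy  # step size fixed at 1 (assumption)
    return x, y, history

x_opt, y_opt, history = newton(-1.2, 1.0)
for objective, gnorm in history[-10:]:
    print(objective, gnorm)
print(x_opt, y_opt)
```

A pure Newton step does not guarantee descent at every iteration, so the objective may rise temporarily; a damped variant with a line search is the usual safeguard and would also give a non-trivial step size to report.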

[SOLVED] Si251 – convex optimization homework 2

In the lecture, we have learned about robust linear programming as an application of second-order cone programming. Now we will consider a similar robust variation of the convex quadratic program
minimize $(1/2) x^T P x + q^T x + r$
subject to $Ax \preceq b$.
For simplicity, we assume that only the matrix $P$ is subject to errors, and the other parameters ($q$, $r$, $A$, $b$) are exactly known. The robust quadratic program is defined as
minimize $\sup_{P \in \mathcal{E}} \left( (1/2) x^T P x + q^T x + r \right)$
subject to $Ax \preceq b$,
where $\mathcal{E}$ is the set of possible matrices $P$.

For each of the following sets $\mathcal{E}$, express the robust QP as a convex problem in a standard form (e.g., QP, QCQP, SOCP, SDP).
(a) A finite set of matrices: $\mathcal{E} = \{P_1, \ldots, P_K\}$, where $P_i \in S^n_+$, $i = 1, \ldots, K$.
(b) A set specified by a nominal value $P_0 \in S^n_+$ plus a bound on the eigenvalues of the deviation $P - P_0$: $\mathcal{E} = \{P \in S^n \mid -\gamma I \preceq P - P_0 \preceq \gamma I\}$, where $\gamma \in \mathbb{R}$ and $P_0 \in S^n_+$.
(c) An ellipsoid of matrices: $\mathcal{E} = \left\{ P_0 + \sum_{i=1}^K P_i u_i \;\middle|\; \|u\|_2 \le 1 \right\}$. You can assume $P_i \in S^n_+$, $i = 0, \ldots, K$.

Please consider the convex optimization problem and calculate its solution
minimize $-\sum_{i=1}^n \log(\alpha_i + x_i)$
subject to $x \succeq 0$, $\mathbf{1}^T x = 1$,
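The last problem has the form of the classical water-filling problem. As a minimal numerical sketch, assuming $\alpha_i > 0$ (the statement above is cut off before any condition on $\alpha$), the KKT conditions give $x_i = \max\{0, w - \alpha_i\}$ for a water level $w$ chosen so that the $x_i$ sum to 1. The code below finds $w$ by bisection; the function name and iteration count are hypothetical choices.

```python
def water_filling(alphas, iters=100):
    """Return x with x_i = max(0, w - alpha_i), where w is chosen so sum(x) = 1."""
    # The total allocation is 0 at w = min(alphas) and at least 1 at w = min(alphas) + 1,
    # so the water level lies in this bracket.
    lo = min(alphas)
    hi = lo + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(max(0.0, mid - a) for a in alphas) > 1.0:
            hi = mid
        else:
            lo = mid
    level = 0.5 * (lo + hi)
    return [max(0.0, level - a) for a in alphas]

# Illustrative data (assumed, not from the homework).
x = water_filling([0.1, 0.5, 1.5])
print(x, sum(x))
```

Bisection is used here because the total allocation is a continuous, nondecreasing function of the water level; sorting the $\alpha_i$ and solving for $w$ in closed form is an equally valid alternative.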