Instructions • Online submission: You must submit your solutions online on the course Gradescope site (you can find a link on the course Brightspace site). You need to submit (1) a PDF that contains the solutions to all questions to the Gradescope HW3 Paperwork assignment (including the questions asked in the programming problems), (2) x.py or x.ipynb files for the programming questions to the Gradescope HW3 Code Files assignment. We recommend that you type the solution (e.g., using LATEX or Word), but we will accept scanned/pictured solutions as well (clarity matters). • Generative AI Policy: You are free to use any generative AI, but you are required to document the usage: which AI do you use, and what’s the query to the AI. You are responsible for checking the correctness. Before you start: this homework only has programming problems. You should still have all questions answered in the write-up pdf. Also note that some sub-problems are still essentially math problems and you need to show detailed derivations. 1 Programming Problem: Logistic Regression [40 points] In this problem, you will implement the gradient descent algorithm and mini-batch stochastic gradient descent algorithm for multi-class logistic regression from scratch (which means that you cannot use builtin logistic regression modules in scikit-learn or any other packages). Then you will be asked to try your implementation on a hand-written digits dataset to recognize the hand-written digits in given images. We provide an ipython notebook “logistic regression digits.ipynb” for you to complete. You need to complete the scripts below “TODO” (please search for every “TODO”), and submit the completed ipynb file. In your write-up, you also need to include the plots and answers to the questions required in this session. In this problem, you will implement a logistic regression model for multi-class classification. Given features of a sample x, a multi-class logistic regression produces class probability (i.e., the probability of the sample belonging to class k) xW b z Pr((1) where c is the number of all possible classes, model parameter W is a d × c matrix and b is a c-dim vector, and we call the c values in z = xW +b as the c logits associated with the c classes. Binary logistic regression model (c = 2) is a special case of the above multi-class logistic regression model (you can figure out why by yourself with a few derivations). The predicted class ˆy of x is yˆ = arg max Pr(y = k|x;W,b). (2) k=1,2,···,c For simplicity of implementation, we can extend each x by adding an extra dimension with value 1, i.e., x ← [x,1], and accordingly add an extra row to W, i.e, W ← [W;b]. After using the extended representation of x and W, the logits z become z = xW. Logistic regression solves the following optimization for maximum likelihood estimation. minF(W) wherelog[Pr(, (3) W where we use a regularization similar to the ℓ2-norm in ridge regression, i.e., the Frobenius norm ∥W∥2F = . Note that W·,j is the jth column of W. 1. [15 points] Derive the gradient of F(W) w.r.t. W, i.e., ∇WF(W), and write down the gradient descent rule for W. Hint: compute the gradient w.r.t. to each Wj for every class j first. 2. [10 points] Below “2 batch gradient descent (GD) for Logistic regression”, implement the batch gradient descent algorithm with a constant learning rate. To avoid numerical problems when computing the exponential in the probability Pr(y = k|x;W,b), you can use a modification of the logits z′, i.e., z′ = z − maxzj. (4) j Explain why such a modification could avoid potential numerical problems and show that the overall result is unchanged by applying such a trick. When the change of objective F(W) comparing to F(W) in the previous iteration is less than ϵ = 1.0e − 4, i.e., |Ft(W) − Ft−1(W)| ≤ ϵ, stop the algorithm. Please record the value of F(W) after each iteration of gradient descent. Please run the implemented algorithm to train a logistic regression model on the randomly split training set. We recommend to use η = 0.1. Try three different learning rates [5.0e − 3,1.0e − 2,5.0e − 2], report the final value of F(W) and training/test accuracy in these three cases, and draw the three convergence curves (i.e., Ft(W) vs. iteration t) in a 2D plot. 3. [5 points] Compare the convergence curves: what is the advantages and disadvantages of large and small learning rates? 4. [5 points] Below “4 stochastic gradient descent (SGD) for Logistic regression”, implement the minibatch stochastic gradient descent (SGD) for logistic regression. You can reuse some code from the previous gradient descent implementation. You can start by using an initial learning rate of 1.0e−2 and a mini-batch size of 100 for this problem. You can discard the last mini-batch of every epoch if it is not full. Please remember to record the value of F(W) after each epoch and the final training and test accuracy. Run your code for different mini-batch sizes: [10, 50, 100]. Report the final value of F(W) and final training/test accuracy, and draw the three convergence curves (Ft(W) vs. epoch t) in a 2D plot. 2 Programming Problem: Lasso [60 points] You need to complete everything below each of the “TODO”s that you find (please search for every “TODO”). Once you have done that, please submit the completed ipynb file as part of your included .zip file. In your write-up, you also need to include the plots and answers to the questions required in this session. Please include the plots and answers in the pdf file in your solution. Recall that for lasso, we aim to solve: argminθ,θ0F(θ,θ0) where . (5) Remarks: Do not include θ0 in the computation of precision/recall/sparsity. However, do not forget to include it when you compute the prediction produced by the lasso model, because it is one part of the model. 1. [15 points] Implement the coordinate descent algorithm to solve the lasso problem in the notebook. We provide a function “DataGenerator” to generate synthetic data in the notebook. Please read the details of the function to understand how the data are generated. In this problem, you need to use n = 50,m = 75,σ = 1.0,θ0 = 0.0 as input augments to the data generator. Do not change the random seed for all the problems afterwards. Stopping criteria of the outer loop: stop the algorithm when either of the following is fulfilled: 1) the number of steps exceeds 100; or 2) no element in θ changes more than ϵ between two successive iterations of the outer loop, i.e., maxj |θj(t)−θj(t−1)| ≤ ϵ, where the recommended value for ϵ = 0.01, where θj(t) is the value of θj after t iterations. At the beginning of the lasso, use the given initialization function “θ,θ0 = Initialw(X, y)” to initialize θ and θ0 by the least square regression or ridge regression. You can try different values of λ to make sure that your solution makes sense to you (Hint: DataGenerator gives the true θ and θ0). Solve lasso on the generated synthetic data using the given parameters and report indices of non-zero weight entries. Plot the objective value F(θ,θ0) v.s. coordinate descent step. The objective value should always be non-increasing. 2. [5 points] Implement an evaluation function in the notebook to calculate the precision and recall of the non-zero indices of the lasso solution with respect to the non-zero indices of the true vector that generates the synthetic data. Precision and recall are useful metrics for many machine learning tasks. For this problem in specific, |{non-zero indices in θˆ} ∩ {non-zero indices in θ∗}| precision = ; |{non-zero indices in θˆ}| (6) |{non-zero indices in θˆ} ∩ {non-zero indices in θ∗}|recall = non-zero indices in θ∗}| , |{ (7) where θ∗ is the θ in true model weight, while θˆ is the θ in lasso solution. 3. [10 points] Vary λ and solve the lasso problem multiple times. Choose 50 evenly spaced λ values starting with λmax = ∥(y − y¯)X∥∞ (¯y is the average of elements in y, and ∥a∥∞ = maxj aj), and ending with λmin = 0. Plot the precision v.s. λ and recall v.s. λ curves on a single 2D plot. Briefly explain the plotted pattern and curves. On top of this, try to have fun with λ and play with this hyperparameter, explore, discover, and tell us what you have discovered. Draw a “lasso solution path” for each entry of θ in a 2D plot. In particular, use λ as the x-axis, for each entry θi in θ achieved by lasso, plot the curve of θi vs. λ for all the values of λ you tried similar to the plot we showed in class from the Murphy text (in your case, there are 50 points on the curve). Draw such curves for all the m entries in θ within a 2D plot, use the same color for the 5 features in DataGenerator used to generate the data, and use another very noticeably distinct color for other features. If necessary, set a proper ranges for x-axis and y-axis, so you can see sufficient detail. Now change the noise’s standard deviation σ = 10.0 when using “DataGenerator” to generate synthetic data, draw the lasso solution path again. Compare the two solution path plots with different σ, and explain their difference. Be complete, and clear. 4. [5 points] Use the synthetic data generation code with different parameters: (n = 50,m = 75),(n = 50,m = 150),(n = 50,m = 1000),(n = 100,m = 75),(n = 100,m = 150),(n = 100,m = 1000) (keeping other parameters the same as in Sub-Problem 1). Vary λ in the same way as in the previous question (Sub-Problem 3), and find the λ value that can generate both good precision and recall for each set of synthetic data points. For each case, draw the “lasso solution path” defined in Sub-Problem 3. 5. [25 points] This question is challenging, requiring major changes to your previous implementation as well as significant training time. Run lasso to predict review stars on Yelp by selecting important features (words) from review comments. We provide the data in hw3 data.zip. You can unzip the file and use the provided function “DataParser” in the notebook to load the data. There are three files: “star data.mtx”, “star labels.txt”, “star features.txt”. The first file stores a matrix market matrix and DataParser reads it into a scipy csc sparse matrix, which is your data matrix X. The second file contains the labels, which are the stars of comments on Yelp, is your y. The third file contains the names of features (words). For the last two txt files, you can open them in an editor and take a look at their contents. The sparse data X has size 45000 × 2500, and is split into the training set (the first 30000 samples), validation set (the following 5000 samples) and the test set (the last 10000 samples) by “DataParse”. Each column corresponds to a feature, i.e., a word appearing in the comments. Your mission is to solve lasso on the training set, tune the λ value to find the best RMSE on the validation set, and evaluate the performance of the obtained lasso model on the test set. Important to read before you start: Here, we are dealing with a sparse data matrix. Most numpy operations for dense matrices you used for implementing lasso in Sub-Problem 1 cannot be directly applied to sparse matrices here. You can still use the framework you got in Sub-Problem 1, but you need to replace some dense matrix operations (multiply, dot, sum, slicing, etc.) by using sparse matrix operations from “scipy.sparse” (please refer to https://docs.scipy.org/doc/scipy/ reference/sparse.html for details of sparse matrix operations). The sparse matrix format here aims to help you make the algorithm more efficient in handling sparse data. Do not try to directly transform the sparse matrix X to a dense one by using “X.todense()”, since it will waste too much memory. Instead, try to explore the advantages of different sparse matrix types (csc, coo, csr) and avoid their disadvantages, which are listed under each sparse matrix type in the above link. You can change the format of a sparse matrix X to another one by using (for example) “X.tocsc()” if necessary, but do not use it too often. For some special sparse matrix operations, it might be more efficient to write it by yourself. We provide an example “cscMatInplaceEleMultEveryRow” in the notebook. You can use it, or modify it for your own purpose. This will be a good practice for you to think about how to write an efficient ML algorithm. Try to avoid building new objects inside the loop, or computing anything from scratch. You can initialize them before the loop start, and use the lightest way to update them in the loop. Note any operation that seems “small” inside the loop could possibly lead to expensive computations, considering the total number of times it will be executed. You can use “if” to avoid unnecessary operations on zeros. Do not loop over matrices or vectors if not necessary: use matrix or batch operations provided by numpy or scipy instead. Try to use the most efficient operation when there are many choices reaching the same result. If you write an inefficient code here, running it will take extremely longer time. During debugging, timing each step or operation in the loop will help you figure out which step takes longer time, and you can then focus on how to accelerate it. • Explain how do you modify your implementation to make the code more efficient compared to the “naive” implementation you did for Sub-Problem 1. Compare the computational costs of every coordinate descent iteration in terms of n (number of data points) and m (number of dimensions). • Plot the training RMSE (on the training set) v.s. λ values and validation RMSE (on the validation set) v.s. λ values on a 2D plot. Use the definition of λmax in sub-problem 3 and run experiments on multiple values of λ. You can reduce the number of different λ values to 20. You can also increase the minimal λ to be slightly larger than 0 such that 0 ≤ λmin ≤ 0.1λmax. These two changes will save you some time. • Plot the lasso solution path defined in Sub-Problem 3. • Report the best λ value achieving the smallest validation RMSE you find on the validation set, and report the corresponding test RMSE (on the test set). • Report the top-10 features (words in comments) with the largest magnitude in the lasso solution w when using the best λ value, and briefly explain if/why they are meaningful.
Instructions • Online submission: You must submit your solutions online on the course Brightspace site. You need to submit (1) a PDF that contains the solutions to all questions (2) x.py or x.ipynb files for the programming questions. We recommend that you type the solution (e.g., using LATEX or Word), but we will accept scanned/pictured solutions as well (clarity matters). • Generative AI Policy: You are free to use any generative AI, but you are required to document the usage: which AI do you use, and what’s the query to the AI. You are responsible for checking the correctness. 1 Simpson’s Paradox [10 points] Imagine you and a friend are playing the slot machine in a casino. Having played on two separate machines for a while, you decide to swap machines between yourselves, and measure for differences in “luck”. The wins/losses for you and your friend for each machine are tabulated below. Machine 1 Wins Losses You 40 60 Friend 30 70 Machine 2 Wins Losses You 210 830 Friend 14 70 Assuming that the outcomes of playing the slot machine are independent of their history and that of the other machine, answer the following questions. 1. (3 points) Estimate the winning probability of you and your friend for each of the machines. Compare your winning probability with your friend’s on different machines. Who is more likely to win? 2. (3 points) Estimate the overall winning probability of you and your friend in the casino (assume that there are only two slot machines in the casino). Who is more likely to win? 3. (4 points) Compare your conclusions from (1) and (2). Can you explain them mathematically? I.e., write down the equations to compute the winning probabilities and compare the terms.Figure 1: Problem 2 2 Matrix as Operations [25 points] On the 2D plane, we have four vectors (all with the starting point on the origin) a1 = (1,1), a2 = (1,−1), b1 = (−0.8,1.6) and b2 = (2.6,−0.2). See Fig. 1 1. [5 points] Write down one 2 × 2 matrix W that transforms a1 to b1, and a2 to b2. 2. [5 points] Write down one rotation matrix V , which rotates clockwise by α degrees such that tan(α) = 3. Then, write down one matrix Σ, which scales the y-axis by 2, while keeping the x-axis unchanged. Finally, write down one rotation matrix U, which rotates counter-clockwise by β degrees, such that tan(β) = 1/3. Multiply three matrices together, namely UΣV , what do you discover? 3. [10 points] Compute the eigenvalues and the corresponding eigenvectors of WTW. Now consider the unit circle. Suppose every point on the unit circle gets transformed by W; what shape do you get after the transformation (consider the break-up transformations in the previous question)? What relationship do you find between the eigenvalues/eigenvectors and the transformed shape, and why? 4. [5 points] Compute the determinant of W. What’s the area of the shape transformed from the unit circle? What is the relationship between the determinant and the area of a transformed shape? Based on that, can you use one sentence to explain that the determinant of a product of two matrices is equal to the product of the determinants of the two matrices or, in other words, det(AB) = det(A)det(B)? 3 Some Practices [10 points] 1. [3 points] For a discrete random variable X of finite values, show that (E[X3])2 ≤E[X6]. (1) You only need the materials taught in class. 2. [3 points] Prove the above in a different way. Again, you only need the materials taught in class. 3. [4 points] For two PSD matrices A,B both of shape n × n, show that λA + (1 − λ)B is also PSD for 0 ≤ λ ≤ 1. 4 Programming Problem: Density Estimation of Multivariate Gaussian [25 points] Please check the following tutorials if you are not familiar with numpy: http://cs231n.github.io/ python-numpy-tutorial/ and https://docs.scipy.org/doc/numpy-1.15.0/user/quickstart.html In this problem, you will estimate a multivariate Gaussian distribution from 2D points sampled a multivariate Gaussian distribution, and learn how to use python and matplotlib to generate simple plots in ipython notebook. If you are not familiar with jupyter notebook, please check a quick tutorial on https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html. We provide an ipython notebook “2d gaussian.ipynb” for you to complete. In your terminal, please go to the directory where this ipynb file is located, and run command “jupyter notebook”. A local webpage will be automatically opened in your web browser. Click the above file to open the notebook. You need to complete everything below each of the “TODO”s that you find (please search for every “TODO”). Once you have done that, please submit the completed ipynb file as part of your included .zip file. In your write-up, you also need to include the plots and answers to the questions required in this session. Please include the plots and answers in the pdf file in your solution. 1. [2 points] Run the first cell in the ipython notebook to generate 1000 points X (a 1000×2 matrix, each point has two coordinates) sampled from a multivariate Gaussian distribution. Estimate the mean and covariance of the sampled points and report them in your write-up. 2. [3 points] Plot the histogram for the x-coordinates of X and y-coordinates of X respectively. You can use the plt.hist() function from matplotlib. 3. [5 points] From the histogram, can you tell whether the x-coordinates of the 1000 points (the first column of X) follow some Gaussian distribution? If so, compute the mean and the variance. How about the y-coordinates? If so, compute the mean and the variance. 4. [5 points] Sample 1000 numbers from a 1D Gaussian distribution with the mean and variance of the x-coordinates you got from subproblem 3. Sample another 1000 numbers from a 1D Gaussian distribution with the mean and variance of the y-coordinates. You can use np.random.normal from numpy. Generate a new 2D scatter plot of 1000 points with the first 1000 numbers as x-coordinates and the second 1000 numbers as y-coordinates. Compared to the original 1000 points, what is the difference in their distributions? What causes the difference? 6. [5 points] Draw the histogram of the x-coordinates of the projected points. Are the x-coordinates of the projected points sampled from some Gaussian distribution? If so, compute the mean and the variance. 5 Programming Problem: KNN for Iris flowers classification [20 points] For this problem, we want to use the K nearest neighbor algorithm to classify Iris flowers. 1. [5 Points] Load the Iris data using sklearn.datasets. Calculate how many elements there are for every class. 2. [5 Points] Build a KNeighborsClassifier with k = 1 to predict the class. Train it on the whole dataset. For the classification problem, different goodness-of-fit metrics are used. For this exercise, you can use accuracy, which is defined in the formula given below. Calculate the accuracy of the KNN classifier on the iris dataset. Does this result give you meaningful information? #correctly predicted Accuracy = (2) M 3. [5 Points] Split the dataset into two parts using sklearn’s train test split. Use the following arguments: (a) X,y : the dataset (b) test size = 0.5 (use 50% of the dataset for testing) (c) shuffle = True (randomly shuffle the dataset before making a cut) (d) random state = 0 (random seed, this ensures consistent results) Use the split to find the optimal value of k. Please try different values of k from 1 to 50, fit the model on the training data, and calculate the model’s accuracy on the training data and the testing data, respectively. Plot the training accuracy and testing accuracy against the value of k. Which k value is the best? 4. [5 Points] You observed a flower and measured the following characteristics: (a) sepal width = 5.0 (b) petal width = 4.1 (c) sepal length = 3.8 (d) petal length = 1.2 Use your prediction model to classify this plant. What’s the predicted class? 6 Programming Problem: K Means clustering [10 points] For this question, use the data in clust data.csv. We will attempt to cluster the data using k-means. But, what k should we use? 1. [5 Points] Apply k-means to this data 15 times, changing the number of centers from 1 to 15. Each time use nstart = 10 and store the within-cluster sum-of-squares/inertia value from the resulting object. The inertia measures how variable the observations are within a cluster. This value will be lower with more centers, no matter how many clusters there truly are naturally in the given data. Plot this value against the number of centers. Look for an “elbow”, the number of centers where the improvement suddenly drops off. Based on this plot, how many clusters do you think should be used for this data? 2. [2 Points] Re-apply k-means for your chosen number of centers. How many observations are placed in each cluster? What is the value of inertia? 3. [3 Points] Visualize this data. Plot the data using the first two variables and color the points according to the k-means clustering. Based on this plot, do you think you made a good choice for the number of centers? Briefly explain your conclusion.
This assignment has in total 55 base points and 10 extra points, and the cap is 55. Bonus questions are indicated using the ⋆ mark. Please specify the following information before submission: Problem 1: Infinite knapsack [10⋆ +15 pts] (a)⋆ Prove that in any optimal solution (n1,…,nm), the total number of copies of the items I2,…,Im included in the knapsack is at most wmax − 1, i.e., (b) Based on the conclusion of (a), design an algorithm Knapsack(m,W,(w1,v1),…,(wm,vm)) for infinite knapsack with running time ). For convenience, your algorithm only need to return the maximum value achieved. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. (Hint: According to (a), we know that, in an optimal solution, the total weight of the items ), and hence the item I1 occupies “most” space of the knapsack. Try to combine this observation with the O(mW)-time DP algorithm.) Solution. Please write down your solution to Problem 1 here. (a) We prove the problem by contradiction. Suppose there exists an optimal solution where . We denote these items as where n ≥ wmax, and for each , ∃i = 2,…,m such that wj′ = wi. There can be multiple wj′ s that are equal to the same wi. Now, we want to show that ∃1 ≤ l ≤ r ≤ n such that is a multiple of w1. Consider the sum of the first k wj′ s, denoted as . Assume Sk ≡ Rk(mod w1), where Rk = 0,1,…,w1−1. (i) if ∃k such that Rk = 0, then take is a multiple of w1. (ii) if ∄k such that Rk = 0, Rk can only take w1 − 1 possible values. Because n ≥ wmax ≥ w1 ≥ w1 − 1, by the Pigeon-hole Principle, ∃k1,k2 such that Rk1 = Rk2, hence take w1). Next, we can take 1 ≤ l ≤ r ≤ n such that . Therefore, we can replace with t w1’s, and the final value is going to be larger, which is contradictory to our assumption that the original solution is optimal. Therefore, , i.e., (b) We know from (a) that 1, hence . Consider the naive infinite knapsack algorithm. Let DP(i,w) represent the maximul value by choosing from items {i,i + 1,…,m} with total weight w. Because we can either take ni the i-th item or take none of them, DP(i,w)=max{DP(i + 1,w), vi+DP(i,w − wi)}. Because i = 1,2,…,m, w = 0,1,…,W, hence this algorithm is of time complexity O(nW). Now, with the conclusion from (a), we can limit the search space of w for i = 2,…,m to . Then, the global maximum is equal to maxw{DP(2,w . For items 2,…,m, given any total weight limit w, with the infinte knapsack problem, we can obtain the optimal solution that maximizes the total value of the items chosen. For the whole problem, we learned from (a) that the total weight of items 2,…,m ≤ wmax2. Therefore, we can limit the total for 2,…,m to wmax2 and simply take as many item 1 as possible for the weight limit we haven’t used. By using this method, we can obtain the correct optimal solution. Because the weight limit for item 2,…,m shrinks from , and it takes O(1) time to add item 1 to the partial solution, the time complexity of the revised algorithm is Below is the pseudocode: def KNAPSACK(m, W, I ): # I is a l i s t of (w, v ) ’ s I = sorted(I , key=lambda x : x . v / x .w) w max = max([ item .w for item in I ]) DP = [[0 for w in range(w max ∗∗ 2)] for idx in range(m + 1)] # m+1 is set to handle boundary cases for idx in range(m − 1 , 0 , −1): for w in range(w max ∗∗ 2 + 1): if w >= I [ idx ] .w: DP[ idx ] [w] = max( I [ idx ] . v + DP[ idx ] [w − I [ idx ] .w] , DP[ idx + 1][w]) else : DP[ idx ] [w] = DP[ idx + 1][w] ans = max([DP[ 1 ] [w] + floor ((W − w) / I [ 0 ] .w) ∗ I [ 0 ] . v ]) return ans Problem 2: Counting shortest paths [15 pts] Given an undirected graph G = (V,E) and a vertex s ∈ V , we want to know for every vertex t ∈ V , how many (distinct) shortest paths from s to t there are. This can be viewed as a generalization of the unique shortest-path problem. Design an algorithm Count(G,s) to solve this problem. The algorithm should return a list L that contains a pair (v,δv) for each vertex v of G, where δv is the number of shortest paths from s to v. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n + m) time, where n = |V | and m = |E|. A sample example. Suppose the input graph is G = (V,E) where V = {s,a,b,c,d,e} and E = {(a,b),(b,c),(c,d),(a,d),(s,a),(c,e)}; see the figure below. Then the algorithm Count(G,s) should return L = [(s,1),(a,1),(b,1),(c,2),(d,1),(e,2)]. (Hint: Exploit the BFS tree and extend the idea in the unique shortest-path problem.)Solution. Please write down your solution to Problem 2 here. We solve the problem by BFS. Consider a BFS tree of the graph, we view s as the root, and perform a level-order traversal. For each node, we count the number of it on that level as the number of shortest paths. For the neighbors of that node, if the number of shortest paths has not been counted for a neighbor, we append it to the queue that maintains the BFS tree, otherwise we just ignore it, because it must have been studied in a higher level, and the node’s appearance in the current level does not corresponds to a shortest path. The pseudocode is as follows: class Node: def i n i t ( s e l f ): s e l f . neighbors = [ ] def add neighbor ( self , other ): s e l f . neighbors . append( other ) other . neighbors . append( s e l f ) def COUNT(G, s ): (V, E) = G # V: l i s t of Node ; E: l i s t of tuples of 2 Nodes for edge in E: edge [ 0 ] . add neighbor ( edge [1]) cnt = {} queue = [ s ] while queue != [ ] : L = len(queue) print(L) for u in queue [ : L ] : cnt [u] = cnt . get (u, 0) + 1 print(u. neighbors ) for v in u. neighbors : if v not in cnt . keys (): queue . append(v) queue = queue [L : ] L = [( item [0] , item [1]) for item in cnt . items ()] return L This algorithm is correct because all the nodes that have the same shortest-path-length are on the same level of the BFS tree, so by counting the times a node appear in the shallowest level where it appears, we can count the number of shortest paths to the node. Because only the shortest path will be considered, each edge will be visited once and only once. The sum of the number of times each node is visited is O(m), therefore, this algorithm is O(m) (and consequently O(m + n)). Problem 3: Reaching the one [15 pts] Let n ≥ 3 be a given positive integer and we are going to play a game as follows. During the game, we have a number k which is initially equal to 2. In each round of the game, we are allowed to change the current number k to a new number in one of the following ways: • Type-A: Change k to (k + 1) mod n. • Type-B: Change k to a divisor of k that is greater than 1. • Type-C: Change k to k2 mod n. Note that the above three rules guarantee that our number k is always in the range {0,1,…,n−1} during the entire game. The game terminates when k becomes 1. Our goal is to finish the game in fewest rounds. For example, suppose n = 91 and we can play the game as follows. • Round 1: 2 =⇒ 3 using Type-A change. • Round 2: 3 =⇒ 9 using Type-C change. • Round 3: 9 =⇒ 81 using Type-C change. • Round 4: 81 =⇒ 27 using Type-B change. • Round 5: 27 =⇒ 1 using Type-C change. As you can verify, this is an optimal solution. Design an algorithm Game(n) which returns an optimal solution to the game as a list. So Game(91) should return [2,3,9,81,27,1] (or another solution with 5 rounds). Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(nlogn) time. (Hint: Formulate the problem as a graph problem with n vertices and O(nlogn) edges.) Solution. Please write down your solution to Problem 3 here. We construct a graph based on the problem setting: all the possible remainders of n: 0,1,…,n−1 are constructed as n vertices. For Type-A operation, we connect vertex k to vertex k+1 for k = 0,1,…,n−2, and connect n−1 to 0. For Type-B operation, we connect vertex k to all its divisors larger than 1. For Type-C operation, we connect vertex k to k2(mod n) for k = 2,…,n − 1. Next, we consider the total number of directed edges. For Type-A, the number of directed edges is n; for Type-B, the total number of edges is ; for Type-C, the total number of edges is n − 2. Therefore, the total number of directed edges 1) . By integral test, +1, where ). Therefore, )). Also, n is O(nlog(n)), hence the number of directed edges is O(nlog(n)). Next, we perform a BFS search on this graph G. We use a queue to maintain the BFS procedure. We first append the starting vertex 2 to the queue. Every time, for the first vertex in the queue, if it’s 1, we quit the BFS; otherwise we append all of its unvisited neighbors to the end of the queue, and mark the current vertex the previous vertex of all its neighbors (if a neighbor does not have a recorded previous vertex). This algorithm can always find the shortest path between 2 and 1, because the nearest vertex will always come to the head of the queue. So when 1 is visited, the path is guaranteed to be shortest. The time complexity of BFS is O(nlog(n)) because the total number of edges in the graph is O(nlog(n)), and each edge is visited at most once. The time needed to construct the graph is also O(nlog(n)) because each edge takes O(1) time to construct. Therefore, the overall complexity of the algorithm is O(nlog(n)). The pseudocode is given as follows: class Vertex : def i n i t ( self , val ): s e l f . val = val s e l f . neighbors = [ ] def add neighbor ( self , other ): s e l f . neighbors . append( other ) def GAME(n ): V = [ Vertex ( i ) for i in range(n )] # Type A edge for i in range(n − 1): V[ i ] . add neighbor (V[ i + 1]) # Type B edge for i in range(2 , n ): for j in range(2 ∗ i , n, i ): V[ j ] . add neighbor (V[ i ]) # Type C edge for i in range(2 , n ): if i ∗∗ 2 % n != i : V[ i ] . add neighbor (V[ i ∗∗ 2 % n ]) # BFS visited = [0 for i in range(n )] prev = [−1 for i in range(n )] queue = [2] while queue != [ ] : head = queue [0] queue = queue [ 1 : ] visited [ head ] = 1 if head == 1: break for n in V[ head ] . neighbors : if not visited [n. val ] : queue . append(n. val ) if prev [n. val ] == −1: prev [n. val ] = head sol = [1] while sol [−1] != −1: sol . append( prev [ sol [ −1]]) return sol [ −2:: −1] Problem 4: Finding a strongly connected component [10 pts] Given a directed graph G = (V,E) and a vertex s ∈ V , we want to compute the strongly connected component of G containing s. Design an algorithm SCC(G,s) which returns the set of vertices of G that lie in the same strongly connected component as s. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n + m) time, where n = |V | and m = |E|. (Hint: By definition, a vertex v is in the same strongly connected component as s iff v is reachable from s and s is reachable from v. We already know how to find the vertices reachable from s using DFS. Can you use a similar idea to compute the vertices from which s is reachable?) Solution. Please write down your solution to Problem 4 here. For each node u, we store all the nodes v that (u,v) ∈ E as its out-neighbors and all the nodes w that (w,u) ∈ E as its in-neighbors. Then we perform DFS based on out-neighbors to obtain all the nodes that s can reach, and DFS based on in-neighbors to obtain all the nodes that can reach s. Lastly, we take the intersection of the two results, so that we can obtain the set of points in the strongly connected components containing s. This algorithm is correct, as we know DFS can reach a point t iff it’s reachable from s. If we reverse the directions of all edges, and perform DFS again, a point t is reached iff ∃ a path consisting of reversed from s to t, i.e., ∃ a edge from t to s, s is reachable from t. By taking the union of the results obtained by the 2 DFS’s, a node t is kept iff it is reachable from s and it can reach s, i.e., it is strongly connected to s. The time complexity to process in-neighbors and out-neighbors is O(m). The time complexity of DFS is O(n + m). Therefore, the time complexity of the whole algorithm is O(n + m). The pseudocode is as follows: class Node: def i n i t ( self , val ): s e l f . val = val s e l f . neighbors = [ [ ] , [ ] ] # 0 for in , 1 for out def add neighbor ( self , other ): s e l f . neighbors [ 1 ] . append( other ) other . neighbors [ 0 ] . append( s e l f ) # add a edge from s e l f to other def SCC(G, s ): (V, E) = G for (u, v) in E: u. add neighbor (v) res = [ set () , set ()] # res [0] for a l l the nodes can be reached from s ; # res [1] for a l l the nodes that can reach s def DFS(u, direction = 0): # direction = 0 for in−neighbors , 1 for out−neighbors res [ direction ] . add(u) for v in u. neighbors [ direction ] : if v not in res [ direction ] : DFS(v , direction = direction ) DFS(s , 0) DFS(s , 1) ans = res [ 0 ] . intersection ( res [1])ans . remove( s ) return ans
This assignment has in total 70 base points and 10 extra points, and the cap is 70. Bonus questions are indicated using the ⋆ mark. Please specify the following information before submission: • Your Name: Yufeng Xu • Your NetID: yx3038 Problem 1: Packing heavy unit intervals [15 pts] Suppose we are given a set P = {x1,…,xn} of n points on the x-axis and a positive integer k. We say an interval on the x-axis is k-heavy if it contains at least k points in P. We consider left-open right-closed unit intervals, i.e., intervals of the form I = (a,b] where b − a = 1 (for convenience, below we simply call them unit intervals). Our goal is to find a maximum number of k-heavy unit intervals on the x-axis that are disjoint from each other. For example, if and k = 1, then one optimal solution is I1 = (−1.6,−0.6], I2 = (−0.6,0.4], I3 = (0.5,1.5], which consists of three disjoint k-heavy unit intervals (note that the optimal solution is not necessarily unique). Design a greedy algorithm HeavyInt(n,P,k) to solve this problem, which should output the set of intervals you find. First describe your greedy strategy and prove its correctness. Then implement your algorithm by giving the pseudocode. Your algorithm is supposed to run in O(n) time, assuming the points in P have already been sorted in the left-right order. Solution. Please write down your solution to Problem 1 here. Intuition of the algorithm: First, for P = {x1,…,xn}, choose the smallest t1(t1 ≥ k) such that xt1+k−1 − xt1 < 1. Take (xt1+k−1 − 1,xt1+k−1] as I1 = (a1,b1]. Repeat the process for P2 = {xl,…,xn} such that xl is the smallest element in P that is larger than b1. Take t2 ≥ l such that xt2+k−1 − xt2 < 1, then let a2 = max(b1,xt2+k−1 − 1), I2 = (a2,a2 + 1]. Note we are choosing the smallest non-conflicting interval with this strategy. Repeat until for some Pc+1 that we can no longer find tc+1 such that xtc+k−1 − xtc < 1. Then the total number of intervals is c. Proof of correctness: . Assume the intervals chosen by this greedy algorithm are Ig1,Ig2,…,Igc, we want to show for any j ∈ {0,1,…,c}, ∃ an optimal solution that contains Ig1,…,Igj. When j = 0, an optimal solution doesn’t have to contain any intervals among Ig1,…,Igc, so this statement is true. Now, assume for some j, ∃ an optimal solution that contains Ig1,…,Igj. Assume the j + 1th chosen interval in this optimal solution is Ix, if we replace Ix with Igj+1, by definition, Igj+1 is the interval with the smallest right end after choosing Ig1,…,Igj, hence the right end of Igj+1 is smaller than that of Ix. In other words, replacing Ix with Igj+1 doesn’t interfere the selection of the subsequent intervals. The modified solution is still optimal. By induction, when j = c, ∃ an optimal solution that contains Ig1,…,Igc. By definition, after making this selection, we can no longer choose another interval, hence this optimal solutions is {Ig1,…,Igc}, and the greedy algorithm is optimal. Python implementation: def HeavyInt(n, P, k ): t = 0 intervals = [ ] while t + k − 1 < n: while t + k − 1 < n and P[ t + k − 1] − P[ t ] >= 1: t += 1 if t + k − 1 < n: if len( intervals ) > 0: l e f t = max( intervals [ −1][1] , P[ t + k − 1] − 1) else : l e f t = P[ t + k − 1] − 1 intervals . append (( left , l e f t + 1)) while t + k − 1 < n and P[ t ] pivot : R. append( each ) rank [ pivot ] = add + len(L) indexer (L, add , rank) indexer (R, add + len(L) + 1 , rank) indexer (A, 0 , rank) def swapSort(n, A): swap cnt = 0 for i in range(n − 1): while i != rank [A[ i ] ] : swap( i , rank [A[ i ] ] ) swap cnt += 1 print( swap cnt ) swapSort(n, A) Analysis of number of oracle calls: After obtaining the rank of all of the elements, for each swap, we put at least 1 element in place and will never put one element was already in place out of place. Therefore, in at most n − 1 swaps, we will put n − 1 elements in place, so the remaining one element will also be in place. In other words, all n elements will be in place after n − 1 calls, so the number of times we call oracle swap is at most n − 1. (b) A counter example: A = [6,5,1,3,2,4]. Illustration: Assume ∃i < j such that A[i] > A[j], if we swap A[i] and A[j], the decrease of #Inv is equal to 1 + |{k|i < k < j,A[j] < A[k] < A[i]}|. Therefore, in order to make #Inv decrease the most, we should swap 6 and 2 in the first place (#Inv decreases by 3). However, sorting the swapped array A′ = [2,5,1,3,6,4] takes at least 5 swaps, so in total it takes 6 swaps to sort this array with the greedy algorithm. Alternatively, this array can be sorted by swapping 2 and 5 in the first place(#Inv decreases by 2), and sorting the swapped array A′ = [6,2,1,3,5,4] takes only 3 steps ([6,2,1,3,5,4] → [1,2,6,3,5,4] → [1,2,3,6,5,4] → [1,2,3,4,5,6]), so in total it only takes 4 swaps to sort this array. Therefore, in this case, the greedy algorithm is not optimal. (c) We apply exactly the same algorithm in 2(a) and show it’s optimal in terms of # swaps. Analysis of minimum number of swaps: We define a ”cycle” as follows: Ci = {Ai1,…,Aiki}, idx(a) as the actual index of a in A, rank(a) as the rank of a in A, then rank(Ai1) = idx(Ai2),…, rank(Aiki−1) = idx(Aiki),rank(Aiki) = idx(Ai1). Any unsorted array A can be decomposed into several cycles. Any swap between the elements in two cycles is meaningless because it will not put any element in place or increase the number of cycles. Therefore, the number of swaps needed to sort A is equal to the sum of the number of swaps needed to sort each cycle. Next, we prove by induction that in order to sort a cycle with |Ci| = ki, we need to perform at least ki − 1 swaps. For ki = 2, we have Ci = {Ai1,Ai2}, where rank(Ai1) = idx(Ai2),rank(Ai2) = idx(Ai1), we only need to swap Ai1 and Ai2. Hence we need to make 1 = 2 − 1 swap to sort this cycle. Assume for |Ci| = k, we need to make k − 1 swaps to make the cycle sorted. For |Ci| = k + 1, any swap within the array will only put at most 1 element in place. Consider making a swap that puts one element in place (swap Aij and Aij+1), now we have idx(Aij) = rank(Aij), and rank(Aij−1) = idx(Aij+1),rank(Aij+1) = idx(Aij+2). In other words, we obtained an element in place and a new cycle Ci′ with |Ci′| = k, which takes at least k − 1 swaps to sort. Hence |Ci| takes at least k steps to sort. Finally, we prove that our algorithm obtains this optimal solution. For each i, our algorithm keeps swapping A[i] and A[rank(Ai)] if i ̸= rank(A[i]). This is equivalent to swap Aij and Aij+1 where Aij,Aij+1 ∈ C, rank(Aij) = idx(Aij+1) Consequently, we have Aij+1 at place i and rank(Aij+1) = idx(Aij+2) and rank(Aij−1) = idx(Aij+1). Therefore, when at last i = rank(A[i]), we finish sorting a cycle C with |C| − 1 swaps. For any sorted cycle C, we have i = rank(Ai) for all Ai ∈ C and will not swap at all. Therefore, this algorithm sorts A with only ΣC(|C|−1) = n−#cycles swaps, which is optimal. Pseudocode: See 2(a). Problem 3: Longest doubling subsequence [15 pts] Recall that an array (or sequence) A of real numbers is doubling if A[i] ≥ 2A[i − 1] for all i ≥ 2 (thus any array of length 1 is doubling). Given an array A, we want to find a longest subsequence of A that is doubling. For example, if A = [7,1,3,8,2,4,5,6,9], then a longest doubling subsequence of A is be [1,2,4,9], which is of length 4. Design an algorithm LDS(n,A) that returns a longest doubling subsequence A′ of the array A (where n is the size of A). Describe the basic idea of your algorithm and give the pseudocode. Briefly justify its correctness and analyze its time complexity. Your algorithm should run in O(n2) time or faster. Solution. Please write down your solution to Problem 3 here. Intuition of algorithm: We define the maximum length of a subsequence ending at Ai as Li. Pi = {i1,…,ik} such that ∀j ∈ Pi,j < i,2Aj ≤ Ai. Let Li = maxj∈Pi(Lj) + 1, previ = argminj∈Pi(Lj). After we obtain all Li for ∀i ∈ {1,2,…,n}, we say that the maximal length of any LDS is maxi∈{1,…,n}Li, and the corresponding LDS is obtained by checking previ recursively. Pseudocode: def LDS(n, A): length = [ ] last = [ ] length . append(1) last . append(None) for i in range(1 , n ): max len = 0 tmp = None for j in range(0 , i ): if length [ j ] > max len and A[ i ] >= 2 ∗ A[ j ] : max len = length [ j ] tmp = j length . append(max len + 1) last . append(tmp) tail = length . index (max( length )) res = [ ] while tail != None: res . append(A[ tail ]) tail = last [ tail ] return res [:: −1] Proof of correctness: Assume an optimal solution ending at Ai contains {Ai1,…,Aj,Ai}, then {Ai1,…,Aj} is an optimal solution that ends at Aj. Therefore, in order to obtain the optimal solution that ends at Ai, we only need to retrieve the best solution among the optimal solutions at ends at Ai1,…,Aik, where ∀j ∈ {i1,…,ik},j < i,2Aj ≤ Ai. Time complexity analysis: For any i, we need to check j = 1,…,i − 1 to see if 2Aj ≤ Ai and the optimal solution that ends at Aj. For each j, the checking time is O(1); for j = 1,…,i − 1, the total time is O(n); for i = 1,2,…,n, the total time is O(n2). Hence the time complexity of this algorithm is O(n2). Problem 4: Removing the numbers optimally [20 pts] Given a sequence of (positive) numbers, we want to remove the numbers from the sequence one by one. When removing one number x, we gain a score equal to l2xr2 where l is the number to the left of x in the current sequence (l = 1 if x is the leftmost number in the current sequence) and r is the number to the right of x in the current sequence (r = 1 if x is the rightmost number in the current sequence). For example, suppose the given sequence is (2,9,12,3). If we remove 12 first, the score we gain is 92 × 12 × 32 = 8748, and the sequence becomes (2,9,3). Now if we remove 9 from the sequence, then the score is 22 × 9 × 32 = 324, and the sequence becomes (2,3). Next if we remove 3, then the score is 22 × 3 × 12 = 12, and the sequence becomes (2). Finally we remove 2, and the score is 12 × 2 × 12 = 2. The total score we gain is 8748 + 324 + 12 + 2 = 9086. Our goal is to find an order to remove the numbers from the given sequence such that the total score we gain is maximized. Formally, design an algorithm Remove(n,A) where A is an array of n positive numbers; the algorithm returns the maximum total score we can gain if the given sequence is A (for convenience, you are not required to return the optimal order for removing the numbers). Describe the basic idea of your algorithm and give the pseudocode. Briefly justify its correctness and analyze its time complexity. Your algorithm should run in O(n3) time or faster. Solution. Please write down your solution to Problem 4 here. Idea of the algorithm: In the first place, we augment the array A to a new array B, B = (1,A1,…,An,1) = (B1,B2,…,Bn+1,Bn+2). We need to remove B2,…,Bn+1, and the score obtained by removing Bj is equal to Bj2−1 · Bj · Bj2+1 for any j ∈ {2,…,n + 1}. Now, let 1 ≤ l < r ≤ n + 2. Consider the optimal score obtained after removing all elements in {Bl+1,…,Br−1} f(l,r). Suppose the last element we remove from {Bl+1,…,Br−1} is Bk where l < k < r, then the optimal score can be represented as: f(l,r) = maxk{f(l,k) + f(k,r) + Bl2 · Bk · Br2} The boundary case is that if r − l = 2, f(l,r) = Bl2 · Bl+1 · Br2. Pseudocode: def Remove(n, A): memory = {} def helper (B, l , r ): if memory. get (( l , r ) , 0) > 0: return memory[( l , r )] if r − l == 2: return B[ l ] ∗∗ 2 ∗ B[ l + 1] ∗ B[ r ] ∗∗ 2 maxn = 0 for i in range( l + 1 , r ): tmp = helper (B, l , i ) + helper (B, i , r ) + B[ l ] ∗∗ 2 ∗ B[ i ] ∗ B[ r ] ∗∗ 2 if tmp > maxn: maxn = tmp memory[( l , r )] = maxn return maxn B = [1] + A + [1] helper (B, 0 , n + 1) return memory[(0 , n + 1)] Justification of correctness: We prove by induction that this algorithm gives the optimal result. When n = 1, f(1,3) = B2 = A1, this algorithm gives the best result. Assume when n = 1,2,…,k, this algorithm gives the optimal result. When n = k + 1, we know in A = {A1,…,Ak+1}, we must choose one element Aj to remove at the last, and the score obtained is equivalent to the score obtained by removing A1,…,Aj−1(denote as L) + the score by removing Aj+1,…,An(denote as R)+Aj, where the first part and the second part are independent of each other, and Aj is not dependent on L and R(but L and R are dependent on Aj). Therefore, max(L,R,Aj){L + R + Aj} = maxAj{maxAj{L} + maxAj{R} + Aj}, where by our assumption, maxAj{L} and maxAj{R} can be obtained by our algorithm. Therefore when n = k+1, the global optimal can also be obtained by this algorithm. By induction, this algorithm works for ∀n ∈ N∗. Analysis of time complexity: In this algorithm, we need to traverse all the possible l,r,k without redundancy, where l < r < k. The number of all possible combinations (l,k,r) is equal to ). The time complexity for each combination is O(1), hence the time complexity is O(n3).
Title: Report Description: In this coursework there are two (2) components. Part 1 – write a summary of the history of HCI and its importance to contemporary computing (1000 words max). Part 2 – conduct and report a usability evaluation of an app of your choice. You must correctly use the Cooperative Evaluation method. (1000 words max)Part 1 – A summary of the history of HCI and its importance to contemporary computing (1000 words max). You are expected to plan and write a 1000-word report that critically explores the history of HCI and its importance to a particular aspect of contemporary computing. You will select a particular area of contemporary computing to focus your writing on. For example: ▪ AI and Machine Learning in home or workplace ▪ Social Media ▪ Smart Cities ▪ Platform Economies ▪ Health technologies ▪ Or something of your choosing i. Introduce an element of contemporary computing and its relevance to HCI. ii. Describe the area of focus and its impact in the world. iii. Provide a brief history of HCI as a background to this aspect of computing and evaluate the impact of HCI on it now. iv. Conclude with a summary of this part of your report highlighting the key points and/or arguments set out. The report should utilise external sources (e.g., grey literature, academic articles, news articles, reports, etc.) to develop analysis and make critical points about your chosen element of computing. The report will be created for a general audience, avoiding jargon and technical language where possible, and will be expected to clearly demonstrate understanding and applications of key concepts of HCI as they relate to your chosen topic.Format: Reports should use the template provided. Do not include your name anywhere on the document. Submission details: Coursework is to be submitted as an MS Word doc (e.g. .docx) file, on NESS. Word limit: for whole coursework is 2000 words, not including section or table headings in report template.Under the word over the word limit affected component shall be deducted from the assessment mark. For clarity: a piece of work which would have scored 65% for that component but that has a word count greater than 10% of the prescribed word limit will be allocated 55%; a piece of work which would have scored 45% for that component but that has a word count greater than 10% of the prescribed word limit will be allocated 35%.Contact: Any questions about the coursework please contact me [email protected] Marking Criteria: See tables below.Marking Criteria: The following grade descriptor is used for marking part 1 of the coursework – which is more essay like. The descriptions for each category are intended as a ‘best fit’. It is inevitable that responses will have characteristics of more than one of the descriptions, or that not all aspects of any one description are met, in which case academic judgement is used to assign a mark based on the overall quality of the work. Once a grade is selected it will then be divided by 50% to give a mark out of 50 for this component of the coursework.Class Regular Grade Boundaries Criteria Distinction 90-100 A brilliant piece of work of outstanding quality and innovation. Has total control of all relevant material. Shows outstanding insight and an ability to structure and synthesise material. Work of the highest order. The candidate could be expected to achieve no more. Expression / style / grammar outstanding.Distinction 80-89 An outstanding piece of work. Has total control of relevant material and shows an excellent synthesis of factual and conceptual components. Work of a very high order. Expression/style/grammar excellent. Distinction 70-79 An excellent piece of work. High level of understanding of all relevant material with excellent, relevant use of referencing and examples. Communicates clearly and effectively using a coherent structure showing insight and perceptiveness. A commendable degree of academic originality. Expression/style/grammar excellent. Merit 65-69 A very good piece of work. Demonstrates all the qualities of 6064 level essay to a higher degree of development. Evidence of good background reading beyond the materials suggested. Sustained argument throughout. Merit 60-64 A good piece of work. Shows a firm grasp of the majority of the relevant material. Argues well and effectively. Is able to criticise and evaluate material. Well-structured and shows evidence of wider background reading. Correctly and appropriately referenced. Some evidence of originality of thought. Expression/style/grammar good. Pass 55-59 A competent piece of work which shows reasonable understanding of the material and presents it satisfactorily with appropriate examples and referencing. Structure is apparent and there is a coherent (though possibly weak) argument with adequate conclusion. Evaluative/critical /analytical skills present but not highly developed. No obvious weaknesses except a lack of originality. Expression/style/grammar moderately good. Pass 50-54 An adequate piece of work which shows some structure, relevant use of examples and evidence of background reading. Some limited referencing. Limited evidence of independent thought and the development of a substantiated argument. Conclusions not well developed. Expression/style/grammar adequate. Fail 45-49 Marginal fail. Argument obscure, weak or unbalanced. Only partially relevant. Has major omissions. Some understanding, reflection, structure and referencing. Partially successful attempt to use relevant examples and facts. Some reading. Conclusions weak. Expression/style/grammar limited. Fail 40-44 Clear fail. Some relevant material, few or no relevant examples. Little or no attempt to relate this to the question. Very little reading. Many unsubstantiated remarks. Naïve – i.e. simplistic and lacks control / awareness of the subject material and reflective thought. Referencing poor. Limited understanding. Lacks a structure. Material not well organised. Expression/style/grammar weak. Fail 34-39 Clear fail. Little or no reading at an appropriate level. Some material of relevance but with major omissions and errors. Generally unsatisfactory but with redeeming features. e.g. some evidence of preparation, some limited understanding, some reflective thought. Expression/style/grammar poor. Fail 30-34 Unsatisfactory. Lacking evidence of preparation, evaluative or reflective skills. Largely irrelevant. Little or no understanding. Expression / style / grammar / presentation very poor. Hardly any, or no, evidence of reading / organisation. Fail 16-29 Wholly unsatisfactory, little or no evidence of preparation, analytical or evaluative skills. No evidence of understanding of the material or ability to structure or present material. Hastily thrown together. Presentation poor. Expression/style/grammar extremely poor Fail 0-15 Very little material; or irrelevant or incomprehensible material.Marking Criteria:A – Excellent, all elements correctly completed B – Good completion of form fields – with occasional omissions C – Mistakes present in most areas of form D – Large areas of the form missing / left incomplete E – Report form not usedA – Excellent / insightful summary of key issues found in the analysis and the major fixes required B – Good summary of key issues found, lacking in either the summary of issues or detail of fixes C – Average summary of the issues, maybe missing points of value or obvious fixes D – Weak summary of the issues, some points made but hard to follow, not clearly written, missing issues or fixes or both E – Attempt to provide a summary but largely absent and/or incoherentA – Excellent, complete and highly detailed description of the study details B – Very good description of the study details, lacking or improvable in some small area C – Average description of the study details, possibly with some significant omission D – Weak description of the study details that is hard to follow E – Attempt to provide study details but largely or completely absent and/or incoherentA – Excellent observations brilliantly documented with appropriate and justified severity ratings B – Good observations of usability issues, with clear documentation and severity ratings present C – Average observations of user issues, with some problems with description and/or documentation, severity ratings partial or missing D – Weak observations, hard to follow or understand, documentation poor or missing, no severity ratings E – No user issues documented, or some present but too poorly described to follow and understand, little or no documentation.
This piece of coursework is worth 50% of the total assessment for this module. You are expected to make use of MySQL to complete this coursework. Submit a single answer document to NESS containing your answers to this coursework. NESS will accept .doc/.docx and .pdf files.Aims:To assess your ability to:• Construct SQL queries over a database implementation. • Decide whether a table is in the first three normal forms or not. • Decide on an appropriate database architecture to solve given problem statements.Learning Outcomes:• To be able to formulate SQL queries. • To be able to normalise a database table. • To be able to analyse problem statements and determine an appropriate database architecture.Problem statement:You are the manager at a city centre bus station and are wanting to store information about the companies, vehicles and routes that use it.The bus station has six stands, five in regular use plus a spare in case of a problem. All stands are identified by a unique letter. There are limits on the length of bus that can go on each stand. Note that a bus of the maximum length can fit on the stand, for example if the maximum length permitted on a stand is 10, a 10M long bus can fit on it (but an 11M one can’t).Routes have a destination and a frequency, expressed as an integer number of buses per hour. Each route uses only one stand in the bus station. For simplicity, the destination is free text (i.e. a VARCHAR). There is no need to store information about all the different places that a route passes through between the bus station and the destination. There is also a Yes/No box (which for simplicity is also a VARCHAR length 3 to accommodate the words “Yes” or “No”) to indicate if a route is suitable for an electric vehicle (EV). Most routes are too long and/or have no charging facilities at the other end for the time being.A route can be operated by one or more operators. Some basic contact details need to be stored about these operators.There are a variety of different types of bus which use the bus station. These have a length and a description. Again, this description is free text, though it will always identify if a bus is electric. We also assume that the type of the bus is unique and are not interested in further vehicle details.Some CREATE TABLE statements have been written. However, the database has not been implemented yet.There are two linking tables needed for many-many relationships. A route can have multiple operators (with each operator working a proportion of the route). Different operators can operate the same type of bus as well.The relationship between Route and Stand can be 1-Many because we assume that each route will only ever depart from one stand at the bus station, hence there is a foreign key to represent this relationship.Tasks:i) Run all of the CREATE TABLE statements from Appendix A on MySQL to get your database established. You MUST NOT change any of the table structures, data types or keys. Populate your database with the data in Appendix B.iii) At the moment, the destination of each route is simply a column in the BusRoute table. You have realised that this is not a very good way of storing the destination and raise this with your manager. Your manager agrees and says that it would perhaps be better to list all stops that a particular route serves, not just the destination. A stop has a unique ID (e.g. 1838201) and a human-readable name (e.g. “Darlington Tubwell Row”). The manager proposes adding all of this into the same table and getting rid of the “Destination” column. The BusRoute table would therefore be:BusRoute: (number (PK), IDsOfStopsServed, NamesOfStopsServed, frequency, stand (FK), EV).Where PK and FK are Primary and Foreign Key respectively.Explain the following in your answer document – you do not need to implement the changes in MySQL:a) State ONE problem you can think of with having the destination as it is now, a simple column in BusRoute. b) How many of the three normal form rules would the manager’s suggested change break and why? c) Think about how you can include the new data without breaking any of the normal form rules. Complete the description of the BusRoute table below and if you feel you would need any new tables, also describe them in the same format. Remember to use (FK) to indicate any columns which are foreign keys. It is up to you how many (if any) new tables you create.BusRoute: (number (PK),NewTable: (column 1 (PK), ).iv) One of your friends works for one of the bus companies and has just heard about key-value stores, document databases and graph databases. Your friend is wondering if they should convert their existing relational databases to a graph database or to a document database or to a key-value store or leave it as a relational database. For each of the two databases below, state which one of the databases you would use and give TWO reasons as to why. a) A database of bus drivers’ personal details including their bank account details to ensure they are paid correctly. There are approx. 500 drivers. The main operations required are to add or delete a driver from the database or to view a driver’s details. b) A database of bus parts. There are approx. 1 million parts. The main operations required are to add or delete parts, view a part’s details and to search for a specific part. What to submit: A single Word or PDF document containing your answers to the above tasks. You should submit your work electronically through NESS.This is the end of the assignment but there are two appendices on the following pages that contain the CREATE TABLE statements and the data that you need.Appendix AThis appendix contains the CREATE TABLE statements needed to get the above scenario working. You should hopefully be able to paste straight into MySQL or DBeaver.Please do NOT change the table names, column names or any of the keys. You should also not add in any additional tables to those shown here.These are the basic tables needed for the strong entities. The foreign key in BusRoute is there to stop someone trying to enter a stand that doesn’t exist in the bus station.CREATE TABLE Stand (ID VARCHAR(1), maxLength INT, PRIMARY KEY (ID));CREATE TABLE BusOperator (name VARCHAR(50), address VARCHAR(100), email VARCHAR(50), phone VARCHAR(15), PRIMARY KEY (name));CREATE TABLE BusRoute (number VARCHAR(3), destination VARCHAR(50), frequency INT, stand VARCHAR(1), EV VARCHAR(3), PRIMARY KEY (number), FOREIGN KEY (stand) REFERENCES Stand (ID));CREATE TABLE Bus (type VARCHAR(30), length INT, description VARCHAR(50), PRIMARY KEY (type));These are the linking tables needed for the two many-to-many relationships.Link table OperatesRoute needed for the relationship from operator to route – in this coursework, the proportion will be either 100 if an operator works a route on its own or 50 if it is shared equally with another operator: CREATE TABLE OperatesRoute (routeNumber VARCHAR(3), operatorName VARCHAR(50), proportion INT, PRIMARY KEY (routeNumber,operatorName), FOREIGN KEY (routeNumber) REFERENCES BusRoute (number), FOREIGN KEY (operatorName) REFERENCES BusOperator (name));Link table Uses needed for the relationship from bus operator to bus: CREATE TABLE Uses (operatorName VARCHAR(50), busType VARCHAR(30), PRIMARY KEY (operatorName,busType), FOREIGN KEY (operatorName) REFERENCES BusOperator (name), FOREIGN KEY (busType) REFERENCES Bus (type));Appendix B Stands: Stand A – max bus length permitted 10M. Stand F – max bus length permitted 11M. Stands B, C, D and E – all permit a max bus length of 12M. Bus Routes: Route 21: Destination Durham, 2 per hour operated by Northern. Uses Stand B. Route 54: Destination Gateshead, 4 per hour operated by Northern. Uses Stand F. Route 308: Destination Blyth, 2 per hour operated equally by Coastline and Northumbria. Uses Stand C. Route 355: Destination Whitley Bay, 2 per hour operated by Northumbria. Uses Stand C. Route 366: Destination Cramlington, 1 per hour operated by Moor-Dale Coaches. Uses Stand F. Route 505: Destination Berwick, 1 per hour operated by Northumbria. Uses Stand D. Route 518: Destination Alnwick, 1 per hour operated by Northumbria. Uses Stand D. Route 602: Destination Hexham, 2 per hour operated by Northumbria. Uses Stand A. Route 722: Destination Darlington, 2 per hour operated equally by Northern and United. Uses Stand B. Route Q1: Destination Quayside, 2 per hour operated by Northern Uses Stand C. Only routes 54 and Q1 are currently suitable for electric vehicles. All the others are not. Bus operators: Coastline, Coastline House, North Shields, NE29 2AB, e-mail: [email protected], phone 0191 222 1451. Moor-Dale Coaches, Dudley Road, Newcastle, NE23 2MD, e-mail [email protected], phone 0191 255 3272. Northern Transport Co, The Depot, Chester-le-Street, Co Durham, DH3 1ZZ, e-mail [email protected], phone 0191 300 3000. Northumbria Motor Services, Jesmond Garage, Newcastle, NE2 2EN, e-mail [email protected], phone 0191 208 4929. United Services, Grange Road, Darlington, Co Durham, DL1 1DK, e-mail [email protected], phone 01325 222 555. Buses: Olympian, double-deck bus, 10M long. Used by Coastline, Northern, Northumbria and United. Spectra, double-deck bus, 10M long. Used by Northern. Tiger, coach, 12M long. Used by Northumbria. Dart, medium single-deck bus, 9M long. Used by Moor-Dale Coaches. Delta, single-deck bus, 12M long. Used by Northumbria and United. Excel, single-deck bus, 11M long. Used by Northern. Metro, short single-deck bus, 6M long. Used by Northumbria. Renown, single-deck bus, 11M long. Used by Northern and Northumbria. E10, electric single-deck bus, 10M long. Used by Northern. IE, electric single-deck bus, 12M long. Used by Northern. END OF DATA
Aims:To assess your ability to:• Design a relational database, expressing that design in an entity-relationship diagram. • Implement the design in MySQL.Learning Outcomes:• Design a database from a problem statement. • Implement a database designed with an E-R diagram.Problem Statement:Tasks:2) Implement your design in MySQL. Populate your database tables with data of your choice (this does not need to be a huge amount of data, maybe 5-10 rows for each base table). If you make any changes to your original design from Task 1, these must be explained. Include the following TWO items:i) SQL statements you used to create your tables in your answer document (plus a description of any changes made from Task 1). This MUST be pasted into your answer document and MUST NOT be a screenshot. Screenshots of SQL will score 0/10.What to submit: A single Word or PDF document containing your answers to the above tasks. You should submit your work electronically through NESS.
Submit your completed answer to the questions via NESS, as a pdf document.Instructions Answer ALL questions This assessment contributes 100% towards the total mark for this module. It is an individual exercise: no group work is required for the assessment. In particular, be careful to cite properly bibliographic references. You can of course discuss your work with other students, but the work you submit must be individual. [Turn Over] Question 1 – Videos of attack demonstrations (40%) Learning Objective of the assessment As a security analyst, you must able to conduct concrete attacks on the system you are analysing, to test if some vulnerabilities are present or not. You must be able to use the relevant security tools and to develop the adversarial mindset (i.e., thinking as an attacker).Instructions Guidance on how to record your screen is provided on Canvas. Only a link to the video will be accepted, please do not send us a video file.– Marking criteria for the video demonstrations.Question 2 – Security Analysis Methods (60%) Learning Objective of the assessment As a security analyst, you must be able to design and justify an analysis strategy when considering a system, before actually conducting the attack. You must consider ethical and professional aspects and assess the potential effectiveness of the tools they want to use for a specific context. InstructionsMarking criteria for the analysis methods. Each report is marked individually, following a marking scheme divided in the following categories:Case study 1- Network Security Building and the Urban Observatory. https://3d.usb.urbanobservatory.ac.uk Describe a method to analyse the security of this network of sensors.Case study 2: Web-application Security Describe a method to analyse the security of this server.
Submission Notes You should submit a .zip file containing four Java files and three text files through NESS as follows: • SortedLinkedList.java, which contains your SortedLinkedList class for Task 1. • Meal.java, which contains your Meal class for Task 2. • Subscriber.java, which contains your Subscriber class for Task 2. • MainProgram.java, which contains your MainProgram driver class for Task 2; your main() method should be in this class. • input_data.txt, which is the input file for your program (this can be the same as the provided input_data.txt file; your program should only read this file in, not change it). • clerk.txt should contain a record of the clerk’s interactions with the program displayed in the Terminal Window (both what the clerk inputs and what your program prints out; copy and paste this from the Terminal Window in IntelliJ; your program should not write to this file). • letters.txt should be the output file created by your program, which contains the letters to subscribers from when they try to add meals to their subscription, but there are not enough meals available. Your text files should show that you have tested your program and the letters.txt file should correspond to the input in clerk.txt. Aims • To gain experience in designing an interactive system of practical importance. • To reinforce your knowledge about list classes in java. • To gain experience at using the java.lang.Comparable interface, inheritance and generic classes in Java. Background Sometimes we want to keep the items in a list in sorted order, which we can achieve with a sorted list. The fundamental difference between a sorted list and an unsorted one is the “adding an item”/insertion method. Having a definition of a class for lists, we can obtain a sorted list class by altering the existing one or adding a new insertion method. Specification This document should be read together with the coursework guidance notes document and the coursework FAQ at https://ncl.instructure.com/courses/50031/pages/summative-coursework. Task 1 LinkedList is a java class that implements the java.util.List interface, just like the ArrayList you are familiar with from CSC8011; both these classes have similar methods available to use. Derive a SortedLinkedList class as a subclass of the java.util.LinkedList class in such a way that the items of a sorted LinkedList are sorted in ascending order. This subclass of the LinkedList class will be needed to complete Task 2 (see Additional Assumptions (4) below). You only need to provide your new insertion method in the SortedLinkedList class. You do not need to consider the other methods from the LinkedList class that can modify the list and you must use the inherited LinkedList class, not implement a SortedLinkedList class from scratch. Task 2 The FrozenFood company is setting up a weekly subscription service that delivers frozen meals to customers. You are asked to write a program to help an office clerk manage subscribers’ subscriptions. The company offers various types of meals to its subscribers. Subscribers can add/remove meals to/from their subscription, but the number of each type of meal available is limited. Your program should read a list of registered subscribers and a list of available meals from a file. The content of the input file should have the following form: The first line contains an integer representing the number of registered subscribers, followed by the information about the subscribers (one line for every subscriber with their first name and surname). The next line contains an integer representing the number of different types of meal available, and is followed by the information about the meals (two lines for every meal: one line containing the name of the meal and the second one containing the number of these meals available each week). Example file content is given on the next page (same as the input_data.txt file available from Canvas), but your program should run correctly on any file in the correct format: 4 Ted Smith Carl Smith Anna Jones Anna Smith 7 Curry 2 Chilli 6 Lasagne 8 Steak 11 Fish 8 Mushroom Stew 4 Spaghetti 5 The program should be able to store information about meals and subscribers: 1. For each meal, the information required is: the name of the meal and the number meals still availableto be ordered. 2. For each subscriber, the office should know their first name, surname and their chosen types of mealstogether with the number of each type of meal that they have subscribed to. To simplify shipping, a subscriber can be subscribed to at most three different types of meal at a time (but can subscribe to more than three meals in total). We also assume that no two subscribers share both their first name and their surname. After the initial information has been read from the file, the clerk will work with the program interactively. The program should display a menu on the screen offering a choice of possible operations, each represented by a lower case letter: • f – to finish running the program. • m – to display on the screen information about all the meals. • s – to display on the screen information about all the subscribers. • a – to update the stored data when a registered subscriber adds meals to their subscription. • r – to update the stored data when a registered subscriber removes meals from their subscription. You should implement all the above operations. Additional Assumptions 1. When f is selected, the program stops running and all the data is lost. The program could be extended to save all the data to a file, but this is not part of the project and you must not do this! 2. When m is selected, the names of all meals should be displayed, along with the number of each type of meal still available. 3. When s is selected, the names of all the registered subscribers should be displayed, along with the types of meals they are subscribed to (if any) and the number of each of these in their subscription. 6. When a subscriber removes meals (a meal) from their subscription, your program must check whetherthe subscriber is a valid subscriber, whether the meal specified is on the list of meals types and whether the subscriber has subscribed to at least this many meals of this type. If not, an appropriate message should be displayed on the screen. If the request is a valid one, the stored information should be updated accordingly. You should assume that a subscriber can remove meals of only one type from their subscription in a single transaction. If a subscriber is subscribed to a number of meals of some type, they they should be able to remove one, some or all such meals from their subscription in a single transaction (your program should ask them how many of this type of meal they want to remove). 7. We assume that the company only sells the meals initially allocated, serves only its registered subscribers and processes transactions sequentially, one after the other. 8. When the program first starts, you should assume that no subscriber is subscribed to any meals. Marking Scheme
Instructions Prepare and deliver an oral presentation detailing the complete course development and recordings from Homework 4. Discuss the remaining lectures, supplementary materials, visual aids, educational standards, and potential future improvements. Reflect on the overall course development process and its relevance to teaching and training activities. Utilize visual aids to support your presentation. Keep your oral presentation under 10 minutes. Make sure you cover the required contents (from the instructions above) in the presentation. Assignment Notes Review Submission Requirements You have a time limit of 10 minutes There is no limit on the total number of slides Keep each slide as simple as possible (explain one idea at a time) Use large font sizes. Use examples. Ensure writing is academic and formal, avoid first-person perspective Submit your work in one single file (Do not submit multiple files) There is no specific page limit, but literature review must be comprehensive and detailed Submission Format: Presentation slides (.pptx or .pdf) URL of your video recordings pasted in Dropbox Message (URL obtained when video is uploaded to the cloud using Zoom) View Grading Rubric Assignment Tips Keep it simple. Do not try to squeeze a lot of material into each presentation slide. Do not rush through presentation slides Avoid having too much technical detail. Should be designed to get the audience interested in your work, (not designed to replace the work) Start strongly. The beginning of your talk is the most important as you need to grab the attention from your audience. Start from a compelling introduction of the background of your work, and motivate your ideas with convincing arguments. Use examples. Use a timer to avoid losing track of time. Ensure that you provide sufficient background information and detailed explanations to make your work approachable, understandable, and accessible Aim for your work to be accessible to both the general public and experts in your field.
Instructions Prepare and deliver an oral presentation describing the extended literature review and initial course recordings from Homework 3. Detail the course outline, design, introductory lectures, and alignment with the approved topic. Reflect on the development process, challenges, and engagement strategies. Use visual aids to enhance understanding. Keep your oral presentation under 10 minutes. Make sure you cover the required contents (from the instructions above) in the presentation. Assignment Notes Review Submission Requirements You have a time limit of 10 minutes There is no limit on the total number of slides Keep each slide as simple as possible (explain one idea at a time) Use large font sizes. Use examples. Ensure writing is academic and formal, avoid first-person perspective Submit your work in one single file (Do not submit multiple files) There is no specific page limit, but literature review must be comprehensive and detailed Submission Format: Presentation slides (.pptx or .pdf) URL of your video recordings pasted in Dropbox Message (URL obtained when video is uploaded to the cloud using Zoom) View Grading Rubric Assignment Tips Keep it simple. Do not try to squeeze a lot of material into each presentation slide. Do not rush through presentation slides Avoid having too much technical detail. Should be designed to get the audience interested in your work, (not designed to replace the work) Start strongly. The beginning of your talk is the most important as you need to grab the attention from your audience. Start from a compelling introduction of the background of your work, and motivate your ideas with convincing arguments. Use examples. Use a timer to avoid losing track of time. Ensure that you provide sufficient background information and detailed explanations to make your work approachable, understandable, and accessible Aim for your work to be accessible to both the general public and experts in your field.
Instructions Prepare and deliver an oral presentation summarizing the initial literature review conducted in Homework 2 related to your chosen course topic. Highlight key research papers, findings, methodologies, and relevance to your course content. Discuss how these works inform your course development. Include visual aids for clarity. Keep your oral presentation under 10 minutes. Make sure you cover the required contents (from the instructions above) in the presentation. Assignment Notes Review Submission Requirements You have a time limit of 10 minutes There is no limit on the total number of slides Keep each slide as simple as possible (explain one idea at a time) Use large font sizes. Use examples. Ensure writing is academic and formal, avoid first-person perspective Submit your work in one single pdf file (Do not submit multiple files) There is no specific page limit, but literature review must be comprehensive and detailed Submission Format: Presentation slides (.pptx or .pdf) URL of your video recordings pasted in Dropbox Message (URL obtained when video is uploaded to the cloud using Zoom) View Grading Rubric Assignment Tips Keep it simple. Do not try to squeeze a lot of material into each presentation slide. Do not rush through presentation slides Avoid having too much technical detail. Should be designed to get the audience interested in your work, (not designed to replace the work) Start strongly. The beginning of your talk is the most important as you need to grab the attention from your audience. Start from a compelling introduction of the background of your work, and motivate your ideas with convincing arguments. Use examples. Use a timer to avoid losing track of time. Homework 2-4: Consider finding recent and representative works related to your research. Look beyond research papers; explore technical blogs, white papers, and seminar/video talks. Explore notable venues in your research field (e.g., RSA, BlackHat conferences for security). Homework 2 or 3: Consider creating a visual representation (e.g., a figure or workflow) to help convey your proposed idea. Homework 2, 3, and 4: Pay attention to readability and generalizability. Ensure that you provide sufficient background information and detailed explanations to make your work approachable, understandable, and accessible Aim for your work to be accessible to both the general public and experts in your field.
Instructions Based on the extended literature review and initial course recordings developed in previous assignments, complete the development of the Udemylike online short course on the approved topic. Record the remaining lectures, create supplementary materials, visual aids, and any additional resources. Ensure that the course is well-structured, engaging, and adheres to high educational standards. Include a reflection on the course development process, challenges faced, and potential future improvements. Assignment Notes Review Submission Requirements Ensure writing is academic and formal, avoid first-person perspective Submit your work in one single pdf file (Do not submit multiple files) There is no specific page limit, but literature review must be comprehensive and detailed Submission must follow IEEE Format Refer to IEEE Reference Materials for more information Utilize the Example IEEE Template Assignment Tips Homework 2-4: Consider finding recent and representative works related to your research. Look beyond research papers; explore technical blogs, white papers, and seminar/video talks. Explore notable venues in your research field (e.g., RSA, BlackHat conferences for security). Research involves not only stating your goals but also presenting your contribution clearly to a wide audience. Homework 2 or 3: Consider creating a visual representation (e.g., a figure or workflow) to help convey your proposed idea. Homework 2, 3, and 4: Pay attention to readability and generalizability. Ensure that you provide sufficient background information and detailed explanations to make your work approachable, understandable, and accessible Aim for your work to be accessible to both the general public and experts in your field.
Instructions Extend your literature review with additional insights and research investigation. Record at least two introductory lectures for your course, ensuring they are engaging, clear, and aligned with the approved course topic. Assignment Notes Review Submission Requirements Ensure writing is academic and formal, avoid first-person perspective Submit your work in one single pdf file (Do not submit multiple files) There is no specific page limit, but literature review must be comprehensive and detailed Submission must follow IEEE Format Refer to IEEE Reference Materials for more information Utilize the Example IEEE Template Assignment Tips Homework 2-4: Consider finding recent and representative works related to your research. Look beyond research papers; explore technical blogs, white papers, and seminar/video talks. Explore notable venues in your research field (e.g., RSA, BlackHat conferences for security). Research involves not only stating your goals but also presenting your contribution clearly to a wide audience. Homework 2 or 3: Consider creating a visual representation (e.g., a figure or workflow) to help convey your proposed idea. Homework 2, 3, and 4: Pay attention to readability and generalizability. Ensure that you provide sufficient background information and detailed explanations to make your work approachable, understandable, and accessible Aim for your work to be accessible to both the general public and experts in your field.
Please review the objectives of Homework 2 and the Grading Rubrics one more time. They are from page 4 to page 7. Path: D2L -> Week 4: Steps towards Homework 2 -> Topics, Assignments, and Grading Rubrics Link: https://d2l.sdbor.edu/d2l/le/content/1849145/viewContent/11872235/View1. Project Option 1: Independent Research Exploration Assignment 2: Initial Literature Review Instructions: Conduct an initial literature review related to your chosen research problem. Summarize at least 5 key research papers or articles, highlighting their main findings, methodologies, and relevance to your research problem. Provide insights into how these works inform your research direction. Submission template: 1. Introduction Background of the Research Topic Importance of the Research Topic in the Cyber Domain 2. Literature Review Summary of Research Paper 1 Main Findings Methodologies Relevance to Your Research Summary of Research Paper 2 (Similar sections as above) (Continue similarly for at least 5 research papers) 3. Incorporation of Findings Discussion on how the findings from the literature review inform your research direction Preliminary Ideas or Hypotheses based on the Literature Review 4. Conclusion Summary of Key Insights Identified Gaps and Future Directions (for your Homework 3) 5. References Proper citation of all referenced papers and articles2. Project Option 2: Practical Research Assignment 2: Initial Literature Review Instructions: Conduct an initial literature review related to the research discovery you are replicating. Summarize at least 5 key research papers or articles, highlighting their main findings, methodologies, and relevance to your replication project. Submission template: 1. Introduction Background of the Research Discovery Importance and Relevance of the Research Discovery in Security or AI Research Areas 2. Literature Review Summary of Research Paper 1 Main Findings Methodologies Relevance to Your Replication Project Summary of Research Paper 2 (Similar sections as above) (Continue similarly for at least 5 research papers) 3. Incorporation of Findings Discussion on how the findings from the literature review inform your replication project Preliminary Insights and Approach based on the Literature Review 4. Conclusion Summary of Key Insights Identified Gaps and Future Directions, or potential papers for your replication project (for your Homework 3) 5. References Proper citation of all referenced papers and articles3. Project Option 3: Short Seminar Course Development Assignment 2: Initial Literature Review Instructions: Conduct an initial literature review related to your chosen course topic. Summarize at least 5 key research papers or articles, highlighting their main findings, methodologies, and relevance to your course content. Submission template: 1. Introduction Background of the Course Topic Importance and Relevance of the Course Topic 2. Literature Review Summary of Research Paper 1 Main Findings Methodologies Relevance to Your Course Content Summary of Research Paper 2 (Similar sections as above) (Continue similarly for at least 5 research papers) 3. Incorporation of Findings into the Course Discussion on how the findings from the literature review inform your course development Preliminary Ideas for Course Content based on the Literature Review 4. Table of Contents for the Course Introduction Module 1: (Title and brief description) Submodule 1.1 … …. Module 2: (Title and brief description) Submodule 2.1 … …. (Continue for all planned modules) Conclusion References and Further Reading 5. Key Components of the Newly Developed Course Learning Objectives Target Audience Assessment Strategies Supplementary Materials 6. Conclusion Summary of Key Insights Potential Impact of the Course 7. References Proper citation of all referenced papers and articles
Instructions Submit a 1-page report outlining the background of your chosen course topic. Clearly state the research problem or theme you intend to explore in your course, including its relevance, significance, and potential impact. Assignment Notes Please submit your work in one single pdf file. (Do not submit multiple files.) Please also leave a message in the text box indicating the project option you choose. View Grading Rubric Assignment Tips Have a clear research problem to solve. Make sure that you use concise and clear sentences to describe: the research background (what is the current state of this research problem), research motivation (why your proposed research is even needed), and your research problem (what specific problem you are trying to solve). Homework 2-4: Consider finding recent and representative works related to your research. Look beyond research papers; explore technical blogs, white papers, and seminar/video talks. Explore notable venues in your research field (e.g., RSA, BlackHat conferences for security). Research involves not only stating your goals but also presenting your contribution clearly to a wide audience. Homework 2 or 3: Consider creating a visual representation (e.g., a figure or workflow) to help convey your proposed idea. Homework 2, 3, and 4: Pay attention to readability and generalizability. Ensure that you provide sufficient background information and detailed explanations to make your work approachable, understandable, and accessible Aim for your work to be accessible to both the general public and experts in your field. Resources HW1 Template Example 1 – Personal Privacy and Data Security by Laura Schuck Example 2 – Analyzing Priority Transformations by Alexis VanderWilt Example 3 – Fuzzing the C Compiler by Jarod Keene and Logan Stratton
Description: On our Labs CTF site you will find three new challenges under the “Extra Credit” category. Each is built from a (very slightly) modified version of this source code. Each contains the same vulnerability. The only difference is their compile options: • Morris Worm (easy) is built much like the original program in 1988 would have been; simple, with no exploit mitigations at all. • Morris Worm (medium) adds NX protection which did not exist at the time. This will make writing an exploit more difficult, but not impossible. • Morris Worm (hard) Is compiled 64-bit, and includes NX, but is missing a few key components you might have wanted to use in your ROP chain. Can you find creative alternative solutions? Put yourself in the shoes of Robert Morris and try to write exploits capable of running code on the remote machine for each. I’ll give 3 points extra credit for each solve, and 1 more bonus point if you solve all three, meaning that you can earn up to 3+3+3+1 = 10 points total here, the equivalent of a full assignment! Don’t forget to submit the flags! Deliverables: • For each of the labs listed above, please turn in to the dropbox… ◦ Your well documented script which solves the challenge ◦ A screenshot of it running, showing that it works.
• Objective – use the parallel programming languages learned from the class to solve a computing intensive problem and demonstrate how parallel programming can be used to Class Project • Part 1 Identify a problem to solve • Part 2 Provide three solutions to solve the same problem – One sequential program – Two parallel programs Parallel Programming Frameworks • Parallel programming frameworks for consideration – Multithreaded programing – MPI, openMP, CUDA: to be discussed in the class – More parallel programming frameworks available such as OpenCL, OpenACC, etc. How About Python, C#, … • You can also choose a language, e.g., python C#, other than C/C++ for the class project – MPI support in Python is possible – https://mpi4py.readthedocs.io/en/stable/ • Contact with me if you have any questions for this option Comparisons and Evaluation a) Create a sequential code to solve the problem b) Create two solutions using the two parallel programming frameworks as identified c) Benchmark the three implementations (sequential code and implementations using two parallel programming frameworks) d) Compare and analyze the performance (e.g., speedup, memory consumption, etc.) of your implementation e) What do you learn in the project? Submission • Submit the source code of the three implementations • Information expected in the final Report – Summarize your implementations, the challenges and the lessons you learned in the project – Comparisons and benchmark the solutions developed Create an all-in-one zip file for all the source code, reports, and supporting documents for submission. Important Dates – A brief introduction of the problem you identify – Source code and readme files – Final report