Total: 100 points

The homework assignment will be graded based on the following criteria:
• Accuracy: 1) the solution meets the specific requirements in the problem description; 2) the solution produces correct results; 3) the procedures adopted in the solution are technically sound.
• Efficiency: efficiency will be one of the criteria when grading programming assignments. The solution should produce the desired results efficiently.
• Effort/neatness: the solution shows excellent effort, and all related work is presented neatly and organized well.

Homework assignment feedback will be available through the DropBox folder on D2L.

For all the programming assignments, you can choose any operating system you want. I will usually provide C/C++ samples for the programming assignments. If you prefer to use another language, e.g., Java, that is accepted too. A README.txt is required with any programming assignment submission. In the README.txt, you need to provide the following information: 1) How to compile your program? 2) How to run your program? 3) What are the output and the results when I run your program? Zip all your source code, project files, supporting files, and README.txt, and submit the all-in-one zip file to the D2L Dropbox.

(20 points) Programming Assignment 1: We have learned the parallel sieve of Eratosthenes algorithm. To find all prime numbers up to some limit value N, this algorithm works as follows:
1. Create the sieve — an array indexed from 2 to N. (This can be an array of bools or ints or anything else that supports random access and provides entries that are easy to “mark”.)
2. Set k to 2.
3. Loop:
   1. In the sieve, mark all entries whose indices are multiples of k between k^2 and N, inclusive.
   2. Find the smallest index greater than k that is unmarked, and set k to this value.
   Until k^2 > N.
4. The indices of the unmarked sieve entries are prime numbers.
We have discussed how to convert the sequential program into a parallel program in our class.
We also identified a few ways to improve the algorithm. In this exercise, three versions of the sieve program are given, i.e., sieve1.c, sieve2.c, sieve3.c. Go through the code and answer the following questions:
• (5 points) Read the code and briefly explain how each version works.
• (5 points) What are the differences between sieve1.c and sieve2.c? How does sieve2.c improve on sieve1.c?
• (5 points) What are the differences between sieve1.c and sieve3.c? How does sieve3.c improve on sieve1.c?
• (5 points) Benchmark the performance of the three versions on the Rushmore cluster using different numbers of processors and fill in the table below. Do the benchmark results meet your expectation? Why?

Problem    | Np=1 | Np=2 | Np=3 | Np=4
sieve1.c   |      |      |      |
sieve2.c   |      |      |      |
sieve3.c   |      |      |      |

• (Bonus 10 points) If you are able to find another way to optimize the sieve program that performs better than the three versions provided in the exercise, you will have a chance to get an extra 10 points for this homework. Proof needs to be submitted to demonstrate that your improvement is better than the three programs provided. Examples of proof include, but are not limited to, faster execution times compared with the three provided sample programs.

(10 points) Assignment 2: Compare the differences between MPI and OpenMP programming. Which one do you like better and why?

(30 points) Assignment 3: gprof is a type of tool called a profiler. Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing. This information can show you which pieces of your program are slower than you expected and might be candidates for rewriting to make your program execute faster. gprof can be made available in the VMs in the Rushmore virtual cluster.
• (15 points) For linpack_bench.cpp, read the gprof user manual (see the HTML guide), run the profiling tool, and report the percentage of running time for each function.
• (15 points) For linpack_bench.cpp, based on the percentages of running time from gprof, briefly describe your strategy for modifying the program if OpenMP is chosen to optimize it.

(40 points) Programming Assignment 4: For the 4 sequential C programs, p1.c, p2.c, p3.c, p4.c, use OpenMP to parallelize them as much as possible and do the following profiling.
• Add timestamp functions to the code to measure the running time of the sequential programs and your OpenMP programs, and show the results in the following table.
• Copy your OpenMP program running outputs to a report file running.txt. You need to make sure the OpenMP programs generate the same results as the sequential code.

                             | P1 | P2 | P3 | P4
Sequential code running time |    |    |    |
OpenMP running time          |    |    |    |
Speed up                     |    |    |    |
No. of threads               |    |    |    |

Note: speed up is the ratio of the sequential code running time to the OpenMP running time.
CSC718 Homework 1

The homework assignment will be graded based on the following criteria:
• Accuracy: 1) the solution meets the specific requirements in the problem description; 2) the solution produces correct results; 3) the procedures adopted in the solution are technically sound.
• Efficiency: efficiency will be one of the criteria when grading programming assignments. The solution should produce the desired results efficiently.
• Effort/neatness: the solution shows excellent effort, and all related work is presented neatly and organized well.

Homework assignment feedback will be available through the DropBox folder on D2L.

For all the programming assignments, you can choose any operating system you want. I will usually provide C/C++ samples for the programming assignments. If you prefer to use another language, e.g., Java, that is accepted too. A README.txt is required with any programming assignment submission. In the README.txt, you need to provide the following information: 1) How to compile your program? 2) How to run your program? 3) What is the average running time of your program? 4) What are the expected output results when I run your program? Zip all your source code, project files, supporting files, and README.txt, and submit the all-in-one zip file to the D2L Dropbox. If you have any questions about the homework, please let me know.

1. (10 points) The central processing unit (CPU), also called a processor, receives a program’s instructions; decodes those instructions, breaking them into individual parts; executes those instructions; and reports the results, writing them back into memory. The format for that processor comes in one of two primary types: vector and scalar.
a. What is a vector processor?
b. What is a scalar processor? How about a superscalar processor?
c. What are the differences between vector and scalar processors?
2.
(10 points) Instruction-Level Parallelism (ILP) can be implemented at both the hardware and software levels to increase the number of instructions executed in parallel. Briefly describe the following ILP techniques:
a. Instruction pipelining
b. Speculative execution
c. Branch prediction

CSC718 Operating Sys & Parallel Programming

3. (10 points) Why is parallel computing critical to improving computer performance? In other words, what are the limitations of increasing transistor density in a single chip to improve its performance?
4. (10 points) What are the tradeoffs between preemptive scheduling and non-preemptive scheduling?
5. (10 points) Two sample solutions for the dining philosophers problem have been posted on the class website: one solution is based on a mutex and condition variable, the other is based on a semaphore. Go through the two solutions and answer the following two questions:
a. (5 points) Describe the differences between these two solutions.
b. (5 points) Which solution is better? Why?
6. (Programming Assignment, 50 points) myhttpd1.cpp is a simple webserver. The sample code has been posted on the class website. You can compile and run the program using the following commands in a Linux environment:
>gcc myhttpd1.cpp -o myhttpd1
>./myhttpd1 -p 8080
After you run the program, open a browser and type http://localhost:8080, and you will see “Welcome to my first page!” in the browser.
a. (10 points) The posted program has a bug. A function call has been skipped on purpose in the program. Review the myhttpd1.cpp code, find the problem, and fix the code.
b. (15 points) myhttpd1.cpp is a single-threaded program. Create a new program, myhttpd2.cpp, based on myhttpd1.cpp, and use multiple threads to process HTTP requests.
c. (15 points) Create a new program, myhttpd3.cpp, based on your modification in b). Change the program to a daemon (running in the background).
Array Sum
1. Trace the recursive array sum algorithm for the following arrays. Show for each recursive call the input array, the returned value, and the number of sums executed. At the end of the trace, present the total number of sums executed (the total of sums of all recursive calls).
a. A = [38, 21, 39, 60, -1, 10, 81, 23]
b. B = [2, 97, 5, 88, 9, 72, 12, 64, 17, 56, 21]
c. C = [100, 33, 22, 213, 65, 29, 153, 199, 47, 181, 85]

Mergesort
2. Trace the Mergesort algorithm for the following arrays. Show for each recursive call the two input arrays and the output array.
a. A = [38, 21, 39, 60, -1, 10, 81, 23]
b. B = [2, 97, 5, 88, 9, 72, 12, 64, 17, 56, 21]
c. C = [100, 33, 22, 213, 65, 29, 153, 199, 47, 181, 85]
Russian Peasants Multiplication
1. Trace the Russian Peasants Multiplication algorithm for the following products. Show each recursive call and the final result, as shown in the live session (table).
a. 64 * 13
b. 60 * 13
c. 59 * 13

Lomuto partition
2. Trace the Lomuto partition with the array:
a. A = [100, 33, 22, 213, 65, 29, 153, 199, 47, 181, 85]
Using A[10] = 85 as the pivot, the final array will be:
● A = [33, 22, 65, 29, 47, 85, 153, 199, 100, 181, 213]
In your trace, write down each change in either i or j, stating: the values of i and j, the swaps made, and the elements divided into less than the pivot, greater than the pivot, and yet to compare.
Compute the complexity of the recursive algorithms based on the recursive equation and stop condition. Show your work, not just your final answer.
1. T(n) = 2T(n-1) + 1 and T(0) = 1
   a. You can compute this complexity as a tight upper bound.
2. T(n) = T(n-2) + n^2 and T(0) = 1
   a. Hint: Assume n is even; that is, n = 2k for some integer k.
3. T(n) = T(n-1) + 1/n and T(1) = 1
   a. Hint: Go online and find a formula for the sum of the first n terms of the “harmonic series”.

Master Method
Compute the complexity of the recursive algorithms based on the recursive equation and stop condition. Show your work, not just your final answer.
4. T(n) = 2T(n/4) + 1 and T(0) = 1
   a. Be sure to rewrite 1 as n^0.
5. T(n) = 2T(n/4) + n^(1/2) and T(0) = 1
   a. Note that n^(1/2) is the square root of n.
6. T(n) = 2T(n/4) + n^2 and T(0) = 1
   a. This is similar to the previous one.
7. T(n) = 10T(n/3) + n^2 and T(0) = 1
   a. In your answer, round the value of the logarithm to 2 decimal places.
   b. Remember that log_b(a) is equal to log2(a) / log2(b).
8. T(n) = 2T(2n/3) + 1 and T(0) = 1
   a. In your answer, round the value of the logarithm to 2 decimal places.
   b. Be sure to rewrite 1 as n^0.
   c. Remember that log_b(a) is equal to log2(a) / log2(b).
   d. Hint: rewrite 2n/3 as n / (3/2).
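The Master Method problems above compare the exponent of the driving term against log_b(a). One common statement of the master theorem (the version given in class may differ in details):

```latex
T(n) = a\,T(n/b) + \Theta(n^c), \quad a \ge 1,\ b > 1
\;\Longrightarrow\;
T(n) =
\begin{cases}
\Theta\!\left(n^{\log_b a}\right) & \text{if } c < \log_b a,\\
\Theta\!\left(n^{c}\log n\right) & \text{if } c = \log_b a,\\
\Theta\!\left(n^{c}\right) & \text{if } c > \log_b a.
\end{cases}
```

The hints about rewriting 1 as n^0 and 2n/3 as n/(3/2) exist precisely so each recurrence matches this a, b, c form.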
DFS – Depth First Search using the brute force algorithm as seen in class

Consider the graph below.
1) Represent this graph using an adjacency list. Arrange the neighbors of each vertex in alphabetical order.
– List the triplets for this graph in the form (A, B, 1), where there is an edge from vertex A to vertex B;
– Note that this graph is directed, unlike the one presented in class.
• (A, E, 1), (A, H, 1)
• (B, A, 1)
• (C, F, 1), (C, G, 1)
• (D, A, 1), (D, E, 1)
• (E, C, 1)
• (F, D, 1), (F, E, 1)
• (G, B, 1), (G, E, 1)
• (H, D, 1)
2) Trace the DFS execution by adapting the code to deal with a directed graph (remove lines 7 and 8) and instrumenting it to print every time a recursive call is made and a vertex is visited:
– Each time a recursive call is made for vertex A, print: DFS called for vertex A;
– Each time a vertex A is visited, print: Vertex A visited and received the stamp “”, and the current array V.
BFS – Breadth First Search using the brute force algorithm as seen in class

Consider the graph below.
1) Represent this graph using adjacency lists. Arrange the neighbors of each vertex in alphabetical order.
– List the triplets for this graph in the form (A, B, 1), where there is an edge from vertex A to vertex B;
– Note that this graph is directed, unlike the one presented in class.
2) Trace the BFS execution by adapting the code to deal with a directed graph (remove lines 14, 15, and 16) and instrumenting it to print every time a vertex is visited and every time a vertex is enqueued or dequeued.
– Each time a vertex A is visited, print: “Vertex A visited” and the current array V;
– Each time a vertex B is enqueued, print: “Vertex B enqueued” and the current queue Q;
– Each time a vertex C is dequeued, print: “Vertex C dequeued” and the current queue Q.
1) Complete this table, rounding each decimal to the nearest integer. This should give you a sense of the comparative growth rates of the functions. Note that lg(n) means Python’s binary log, log2(n).
2) Solve the summation in two ways: write out all terms of the sum and perform the arithmetic; then use one of the formulas in the class notes to confirm your answer.
3) Solve the summation in two ways: write out all terms of the sum and perform the arithmetic; then use one of the formulas in the class notes to confirm your answer.
4) Solve the summation in two ways: write out all terms of the sum and perform the arithmetic; then use one of the formulas in the class notes to confirm your answer (round off this answer to three decimal places).
5) Solve this summation in two ways: write out all terms of the sum, but there will be no arithmetic to perform; then use one of the formulas in the class notes to express the sum as a polynomial function of n.

CSC6013 – Worksheet for Week 2

6) Determine the Big-Oh class of each algorithm. That is, formally compute the worst-case running time as we did in class, using a table to produce a function that tracks the work required by all lines of code. Include all steps of the algebraic simplification, but you do not need to provide comments to justify each step. Arithmetic mean = “add them all up, and divide by how many”. Let the size of the problem = n = the number of entries in the array.
7) Determine the Big-Oh class of each algorithm. That is, formally compute the worst-case running time as we did in class, using a table to produce a function that tracks the work required by all lines of code. Include all steps of the algebraic simplification, but you do not need to provide comments to justify each step. Sum of entries in an upper triangular n×n array. Let the size of the problem = n = the dimension of the n×n matrix.
Introduction

Sorting algorithms are of great importance in computer science. In data structure and algorithm courses, we have learned about various sorting algorithms. However, have you ever considered making these algorithms run in parallel on multi-core CPUs, or even distributing them across a cluster? In fact, distributed and parallel sorting algorithms are more challenging than the projects we have completed, for two main reasons. Firstly, it is not that easy to divide the entire task into different threads or processes. Secondly, the work of different threads or processes is no longer independent, which means that they must communicate with each other to synchronize their states and ensure that the algorithms run correctly.

1. Quick Sort
2. Bucket Sort
3. Odd-Even Sort

You are required to implement either the thread-level parallel or process-level parallel version of these algorithms. Get started and do your best!

Task 1: Process-Level Parallel Quick Sort with MPI

Quick Sort is a highly efficient and widely used sorting algorithm in computer science. It is known for its fast average-case performance and is often the sorting algorithm of choice in many applications. Quick Sort operates on the principle of a divide-and-conquer strategy, where it repeatedly divides the unsorted list into two sublists: elements smaller than a chosen pivot and elements larger than the pivot. The algorithm sorts these sublists and combines them to create the final sorted list. Here’s a step-by-step explanation of how Quick Sort works:
1. Pivot Selection: The algorithm selects a pivot element from the unsorted list. The choice of the pivot can significantly affect the algorithm’s efficiency, and various methods are used to select the pivot, such as selecting the first, last, middle, or a random element from the list.
2. Partitioning: The list is then partitioned into two sublists: elements less than the pivot and elements greater than the pivot.
This is done by comparing each element in the list with the pivot and moving elements to the appropriate sublist.
3. Recursion: Quick Sort is applied recursively to both sublists, which means that the algorithm is called on the sublists created in the previous step. This process continues until the sublists are small enough to be considered sorted.
4. Combining: Once the recursion reaches small enough sublists, no further action is needed. The sorted sublists are then combined to form the final sorted list.

```c++
int partition(std::vector<int> &vec, int low, int high) {
    int pivot = vec[high];
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (vec[j] <= pivot) {
            i++;
            std::swap(vec[i], vec[j]);
        }
    }
    std::swap(vec[i + 1], vec[high]);
    return i + 1;
}
```

Task 2: Process-Level Parallel Bucket Sort

Bucket Sort distributes the elements into a number of buckets, sorts each bucket individually (here with insertion sort), and concatenates the sorted buckets to obtain the final sorted array.

```c++
void insertionSort(std::vector<int>& bucket) {
    for (int i = 1; i < (int)bucket.size(); i++) {
        int key = bucket[i];
        int j = i - 1;
        while (j >= 0 && bucket[j] > key) {
            bucket[j + 1] = bucket[j];
            j--;
        }
        bucket[j + 1] = key;
    }
}

void bucketSort(std::vector<int>& vec, int num_buckets) {
    int max_val = *std::max_element(vec.begin(), vec.end());
    int min_val = *std::min_element(vec.begin(), vec.end());

    int range = max_val - min_val + 1;
    int small_bucket_size = range / num_buckets;
    int large_bucket_size = small_bucket_size + 1;
    int large_bucket_num = range - small_bucket_size * num_buckets;
    int boundary = large_bucket_num * large_bucket_size;

    std::vector<std::vector<int>> buckets(num_buckets);
    // Pre-allocate space to avoid re-allocation
    for (std::vector<int>& bucket : buckets) {
        bucket.reserve(large_bucket_size);
    }

    // Place each element in the appropriate bucket
    for (int num : vec) {
        int offset = num - min_val;   // position within [0, range)
        int index;
        if (offset < boundary) {
            index = offset / large_bucket_size;
        } else {
            index = large_bucket_num + (offset - boundary) / small_bucket_size;
        }
        if (index >= num_buckets) {
            // Handle elements at the upper bound
            index = num_buckets - 1;
        }
        buckets[index].push_back(num);
    }

    // Sort each bucket using insertion sort
    for (std::vector<int>& bucket : buckets) {
        insertionSort(bucket);
    }

    // Combine sorted buckets to get the final sorted array
    int index = 0;
    for (const std::vector<int>& bucket : buckets) {
        for (int num : bucket) {
            vec[index++] = num;
        }
    }
}
```

The sequential version of bucket sort has been provided in
src/bucketsort/sequential.cpp, and your task is to implement a process-level parallel bucket sort using MPI in src/bucketsort/mpi.cpp.

Hints:
1. The implementation of process-level bucket sort is relatively easy. The general idea is to assign each MPI worker the task of sorting several buckets and then send the sorted buckets back to the main worker to create the final sorted array.

Task 3: Process-Level Parallel Odd-Even Sort

Odd-Even Sort, also known as Brick Sort or Parallel Sort, is a relatively simple sorting algorithm that is particularly suited for parallel processing. It is mainly used for sorting data in parallel computing environments, where multiple processors or threads can work on sorting different parts of the data simultaneously. Odd-Even Sort is an iterative sorting algorithm that repeatedly compares and swaps adjacent pairs of elements until the entire list is sorted. Here’s how Odd-Even Sort works:
1. Initialization: The algorithm starts by dividing the list into two parts: odd- and even-indexed elements. It begins by comparing and potentially swapping elements at even and odd positions within the list.
2. Iteration: Odd-Even Sort proceeds in iterations, with each iteration consisting of two phases: the “odd phase” and the “even phase.”
3. Odd Phase: In the odd phase, the algorithm compares and swaps adjacent elements at odd indices in the list. It starts with the element at index 1 and proceeds to indices 3, 5, and so on, until it reaches the second-to-last element. At each comparison, it checks if the elements are out of order and swaps them if necessary.
4. Even Phase: In the even phase, the algorithm compares and swaps adjacent elements at even indices. It starts with the element at index 0 and proceeds to indices 2, 4, and so on, until it reaches the second-to-last element. Again, it checks for out-of-order elements and swaps them as needed.
5.
Repeat Iterations: These odd and even phases are repeated until no swaps are made in an entire iteration, indicating that the list is sorted.
6. Finalize: After the final iteration, the list is guaranteed to be sorted, and the algorithm terminates.

```c++
void oddEvenSort(std::vector<int>& vec) {
    bool sorted = false;
    while (!sorted) {
        sorted = true;
        // Perform the odd phase
        for (int i = 1; i < (int)vec.size() - 1; i += 2) {
            if (vec[i] > vec[i + 1]) {
                std::swap(vec[i], vec[i + 1]);
                sorted = false;
            }
        }
        // Perform the even phase
        for (int i = 0; i < (int)vec.size() - 1; i += 2) {
            if (vec[i] > vec[i + 1]) {
                std::swap(vec[i], vec[i + 1]);
                sorted = false;
            }
        }
    }
}
```

The sequential version of odd-even sort has been provided in src/odd-even-sort/sequential.cpp, and your task is to implement a process-level parallel odd-even sort using MPI in src/odd-even-sort/mpi.cpp.

Hints:
Odd-even sort is designed to be parallel-friendly. In fact, its implementation is straightforward with a shared-memory approach: we assign different threads to specific ranges of the array, and each thread is responsible for exchanging elements within its designated range. However, when dealing with a communication-based approach like MPI, the process becomes much more complex. In MPI, each process has its own memory space and cannot access the data of other processes. Therefore, when a comparison encounters the boundary of a subarray, communication is required among the workers to determine whether an exchange is needed. For example, let’s assume the complete array is [1, 2, 4, 3, 5, 6], and there are two processes. Process 1 gets [1, 2, 4], and process 2 gets [3, 5, 6]. In the even phase of exchanging, process 1 doesn’t know 4’s next neighbor. Therefore, process 2 has to send 3 to process 1, and then process 1 can determine whether an exchange is needed. In this example, an exchange is required. Therefore, process 1 replaces 4 with 3 and then informs process 2 that the replacement has occurred.
Then process 2 replaces 3 with 4. Another important aspect is that, at the end of each iteration, every process must inform the master process whether its local sub-vector is sorted or not. If the master process determines that all the sub-vectors are sorted, it then informs all the processes that the vector has been sorted, and it’s time to send the sub-vectors back to form a complete one.

The primary challenge in this task is to design the communication mechanism between different workers when the comparison reaches the boundary. The general idea isn’t difficult, but the implementation has to address many details, so please be careful.

Extra Credits: Dynamic Parallel Sorting Algorithms

Task 4: Dynamic Thread-Level Parallel Merge Sort

Merge Sort is a highly efficient and widely used sorting algorithm in computer science. It is known for its ability to sort large datasets with excellent time complexity and stability. Merge Sort employs a divide-and-conquer strategy, which involves breaking down the unsorted list into smaller sub-lists, sorting each sub-list, and then merging the sorted sub-lists to obtain the final sorted result. Here’s a brief overview of how Merge Sort works:
1. Divide: The unsorted list is divided into two equal-sized sub-lists until each sub-list contains only one element. This process continues recursively.
2. Conquer: The one-element sub-lists are considered sorted by default. Then, the adjacent sub-lists are merged together. During the merge process, elements are compared and rearranged in a way that ensures they are in the correct order.
3. Combine: The merging and sorting process continues until all sub-lists are merged into a single, fully sorted list.
This final list contains all the elements from the original list, sorted in ascending order.

```c++
// Merge two subarrays of vector vec[]
// First subarray is vec[l..m]
// Second subarray is vec[m+1..r]
void merge(std::vector<int>& vec, int l, int m, int r) {
    int n1 = m - l + 1;
    int n2 = r - m;

    // Create temporary vectors
    std::vector<int> L(n1);
    std::vector<int> R(n2);

    // Copy data to temporary vectors L[] and R[]
    for (int i = 0; i < n1; i++) {
        L[i] = vec[l + i];
    }
    for (int i = 0; i < n2; i++) {
        R[i] = vec[m + 1 + i];
    }

    // Merge the temporary vectors back into vec[l..r]
    int i = 0; // Initial index of the first subarray
    int j = 0; // Initial index of the second subarray
    int k = l; // Initial index of the merged subarray
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            vec[k++] = L[i++];
        } else {
            vec[k++] = R[j++];
        }
    }

    // Copy any remaining elements of L[] and R[]
    while (i < n1) {
        vec[k++] = L[i++];
    }
    while (j < n2) {
        vec[k++] = R[j++];
    }
}
```
Submission Link: https://forms.gle/LiXSz9sf4oaCv14T9

Read all the instructions below carefully before you start working on the assignment.
• The purpose of this course is for you to learn RL, and the best way to do that is by implementation and experimentation.
• The assignment requires you to implement some algorithms, and you are required to report your findings after experimenting with those algorithms. The report should be submitted in pdf format.
• In case your solution is found to have an overlap with a solution by someone else (including external sources), all the parties involved will get zero in this and all future assignments, plus further penalties in the overall grade. We will check not just for lexical but also semantic overlap. The same applies to the code as well. Even an iota of cheating will NOT be tolerated. Whether you cheat one line or one page, the penalty will be the same.
• Be a smart agent; think long term. If you cheat, we will discover it somehow, and the price you would pay is not worth it.
• You have to submit your assignment via the Google Form (link above).
• Since the assignment involves experimentation and reporting your results and observations, there is a lot of scope for creativity, innovation, and presenting new perspectives. Such efforts will be highly appreciated and accordingly well rewarded. Be an exploratory agent!
• In your plots, have a clear legend and clear lines, etc. Of course you will be generating the plots in your code, but you must also put these plots in your notebook. Generate high-resolution pdf/svg versions of the plots so that they don’t pixelate on zooming.
• For all experiments, document the seed used in the code documentation.
• In your notebook, write about all things that are not obvious from the code, e.g., any assumptions you have made, references/sources, running time, etc.
• In addition to checking your code and report, very likely we will be conducting a one-on-one viva for the evaluation. So please make sure that you do not cheat!
• Use of LLM-based tools or AI-based code tools is strictly prohibited! Use of ChatGPT, VS Code, Gemini, Co-Pilot, etc. is not allowed. NOTE: VS Code is also not allowed. Even in Colab, disable the AI assistant. If you use it, we will know it very easily. Use of any of these tools will be counted as cheating and will be given a ZERO, with no questions asked.

Random-Maze Environment

Figure 1: (a) Maze Environment; (b) Maze Environment Transitions. The grid (living penalty of -0.04):

  0     1     2     3 (G, +1)
  4    Wall   6     7 (H, -1)
  8 (S) 9    10    11

In this assignment we will be exploring a variant of the Random Maze Environment (RME) that we have been looking at in the lectures. The environment is represented as a grid world in Figure 1a. The random maze environment is a highly stochastic environment with 11 states: two terminal states (a goal state (G) and a hole state (H)), 9 non-terminal states, and a wall in the middle of the environment. The wall behaves similarly to the wall on the periphery of the environment: if an agent bumps against the wall, it bounces back. The boundary of the environment behaves the same way; if an agent hits the boundary, it bounces back. The agent receives a reward of +1 when it lands in the goal state (3), and it receives a reward of -1 when it lands in the hole state (7). For the rest of the transitions there is a reward of -0.04; essentially, the agent has a living cost of -0.04. The transitions are stochastic, as shown in Figure 1b. In this environment, four actions are possible: left, top, right, and bottom. For every intended action, there is an 80% chance of going in the intended direction and a 20% chance of going in one of the orthogonal directions; the 20% chance is equally distributed between the two orthogonal directions. The agent starts from state 8 (S). Assume γ = 0.99 for the problems below.
In this assignment we will be looking at the control algorithms we learnt in the lectures. For each plot, create the legend on the left/right side so that it doesn’t overlay the plot. For all the algorithms below, this time we will not be specifying the hyper-parameters; please play with the hyper-parameters to come up with the best values. This way you will learn to tune the model. As you are aware from your past experience, a single run of the algorithm over the environment results in plots that have a lot of variance and look very noisy. One way to overcome this is to create several different instances of the environment using different seeds, then average out the results across these instances and plot the averages. Use this strategy for all the plots below.

Problem 1: Monte Carlo Control (40+20+20+5+5+5+5=100 points)

MonteCarloControl(env, γ, α0, ϵ0, maxSteps, noEpisodes, firstVisit = True)

(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the action-value (Q) function with time. Plot the Q-function vs episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.

Problem 2: SARSA (TD Control) (40+20+20+5+5+5+5=100 points)

SARSA(env, γ, α0, ϵ0, noEpisodes)

(a) Plot the evolution of the state-value (V) function with time.
Basically, plot the V-function vs episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the action-value (Q) function with time. Plot the Q-function vs episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.

Problem 3: Q-Learning (40+20+20+5+5+5+5=100 points)

Q-Learning(env, γ, α0, ϵ0, noEpisodes)

(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the action-value (Q) function with time. Plot the Q-function vs episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.

Problem 4: Double Q-Learning (40+20+20+5+5+5+5=100 points)

Double-Q-Learning(env, γ, α0, ϵ0, noEpisodes)

(a) Plot the evolution of the state-value (V) function with time.
Basically, plot V-function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of the state in the same plot. (b) Plot evolution of action state value (Q) function with time. Plot Q function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of the state in the same plot. (c) Describe over how many instances of the environments did you average the results? Write about the seeds used for each instance. (d) Draw the environment diagram with optimal policy (shown using arrows) obtained using the algorithm. (e) Write about the hyper-parameters you finally used for the algorithm and describe how did you arrive at these set of hyper-params. (f) Write about your observations from the plots above. Problem 5: Comparing Control Algorithms (20+5+5+5+5+5=40 points) For FVMCC, SARSA, Q and Double-Q algorithms implemented above, do the following: (a) For each of the algorithm, in a single plot, plot the evolution of Policy Success Rate (in %) vs Episodes. Policy Success Rate is defined as number of times the agent reaches the goal state out of the total number of the episodes run using a specific policy. Basically implement the following function that would return the policy success percentage. As you are training the agent, at each episode, you will have a version of the policy, use that policy along with the function below to get the policy success rate. def getPolicySuccessRate(env, πcurrent, goalState, maxEpisodes=100, maxSteps=200) (b) What are your observations from the Policy Success Rate (in %) plot. (c) For each of the algorithm (in a single plot), plot the Estimated Expected Return (from the start state) vs Episodes. (d) What are your observations for the Estimated Expected Return plot? (e) For each of the algorithm (in a single plot), plot the State-value Function Estimation Error vs Episodes. 
State-value Function Estimation Error is defined as Mean Absolute Error across all V-function estimates (across all states) from the respective optimal value. (f) What are your observations for the State-value Function Estimation Error plot? Problem 6: SARSA(λ) Replacing (40+20+20+5+5+5+5=100 points) SARSA-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = True) (a) Plot evolution of State-value (V) function with time. Basically, plot V-function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of the state in the same plot. (b) Plot evolution of action state value (Q) function with time. Plot Q function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of the state in the same plot. (c) Describe over how many instances of the environments did you average the results? Write about the seeds used for each instance. (d) Draw the environment diagram with optimal policy (shown using arrows) obtained using the algorithm. (e) Write about the hyper-parameters you finally used for the algorithm and describe how did you arrive at these set of hyper-params. (f) Write about your observations from the plots above. Problem 7: SARSA(λ) Accumulating (40+20+20+5+5+5+5=100 points) SARSA-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = False) (a) Plot evolution of State-value (V) function with time. Basically, plot V-function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of the state in the same plot. (b) Plot evolution of action state value (Q) function with time. Plot Q function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of the state in the same plot. (c) Describe over how many instances of the environments did you average the results? Write about the seeds used for each instance. 
(d) Draw the environment diagram with optimal policy (shown using arrows) obtained using the algorithm. (e) Write about the hyper-parameters you finally used for the algorithm and describe how did you arrive at these set of hyper-params. (f) Write about your observations from the plots above. Problem 8: Q(λ) Replacing (40+20+20+5+5+5+5=100 points) Q-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = True) (a) Plot evolution of State-value (V) function with time. Basically, plot V-function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of the state in the same plot. (b) Plot evolution of action state value (Q) function with time. Plot Q function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of the state in the same plot. (c) Describe over how many instances of the environments did you average the results? Write about the seeds used for each instance. (d) Draw the environment diagram with optimal policy (shown using arrows) obtained using the algorithm. (e) Write about the hyper-parameters you finally used for the algorithm and describe how did you arrive at these set of hyper-params. (f) Write about your observations from the plots above. Problem 9: Q(λ) Accumulating (40+20+20+5+5+5+5=100 points) Q-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = False) (a) Plot evolution of State-value (V) function with time. Basically, plot V-function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of the state in the same plot. (b) Plot evolution of action state value (Q) function with time. Plot Q function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of the state in the same plot. (c) Describe over how many instances of the environments did you average the results? Write about the seeds used for each instance. 
(d) Draw the environment diagram with optimal policy (shown using arrows) obtained using the algorithm. (e) Write about the hyper-parameters you finally used for the algorithm and describe how did you arrive at these set of hyper-params. (f) Write about your observations from the plots above. Problem 10: Dyna-Q (40+20+20+5+5+5+5=100 points) Dyna-Q(env, γ, α0, ϵ0, noEpisodes, noPlanning) (a) Plot evolution of State-value (V) function with time. Basically, plot V-function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of the state in the same plot. (b) Plot evolution of action state value (Q) function with time. Plot Q function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of the state in the same plot. (c) Describe over how many instances of the environments did you average the results? Write about the seeds used for each instance. (d) Draw the environment diagram with optimal policy (shown using arrows) obtained using the algorithm. (e) Write about the hyper-parameters you finally used for the algorithm and describe how did you arrive at these set of hyper-params. (f) Write about your observations from the plots above. Problem 11: Trajectory Learning (40+20+20+5+5+5+5=100 points) TrajectorySampling(env, γ, α0, ϵ0, noEpisodes, maxTrajectory) (a) Plot evolution of State-value (V) function with time. Basically, plot V-function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of the state in the same plot. (b) Plot evolution of action state value (Q) function with time. Plot Q function vs Episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of the state in the same plot. (c) Describe over how many instances of the environments did you average the results? Write about the seeds used for each instance. 
(d) Draw the environment diagram with optimal policy (shown using arrows) obtained using the algorithm. (e) Write about the hyper-parameters you finally used for the algorithm and describe how did you arrive at these set of hyper-params. (f) Write about your observations from the plots above. Problem 12: Comparing Control Algorithms (5+5+5+5+5+5=25 points) For SARSA(λ) Replacing, SARSA(λ) Accumulating, Q(λ) Replacing, Q(λ) Accumulating, Dyna-Q, Trajectory Learning implemented above, do the following: (a) For each of the algorithm, in a single plot, plot the evolution of Policy Success Rate (in %) vs Episodes. (b) What are your observations from the Policy Success Rate (in %) plot. (c) For each of the algorithm (in a single plot), plot the Estimated Expected Return (from the start state) vs Episodes. (d) What are your observations for the Estimated Expected Return plot? (e) For each of the algorithm (in a single plot), plot the State-value Function Estimation Error vs Episodes. State-value Function Estimation Error is defined as Mean Absolute Error across all V-function estimates (across all states) from the respective optimal value. (f) What are your observations for the State-value Function Estimation Error plot?
Submission Link: https://forms.gle/LiXSz9sf4oaCv14T9
Read all the instructions below carefully before you start working on the assignment.
• The purpose of this course is for you to learn RL, and the best way to do that is by implementation and experimentation.
• The assignment requires you to implement some algorithms and report your findings after experimenting with them. The report should be submitted in PDF format.
• In case your solution is found to overlap with someone else's solution (including external sources), all the parties involved will get zero in this and all future assignments, plus further penalties in the overall grade. We will check not just for lexical but also semantic overlap. The same applies to the code. Even an iota of cheating will NOT be tolerated. Whether you cheat on one line or on one page, the penalty will be the same.
• Be a smart agent; think long term. If you cheat, we will discover it somehow, and the price you would pay is not worth it.
• You have to submit your assignment via the Google Form linked above.
• Since the assignment involves experimentation and reporting your results and observations, there is a lot of scope for creativity, innovation, and presenting new perspectives. Such efforts would be highly appreciated and accordingly well rewarded. Be an exploratory agent!
• In your plots, have a clear legend, clear lines, etc. Of course you will be generating the plots in your code, but you must also put these plots in your notebook. Generate high-resolution pdf/svg versions of the plots so that they do not pixelate on zooming.
• For all experiments, document the seeds used in your code.
• In your notebook, write about everything that is not obvious from the code, e.g., any assumptions you made, references/sources, running time, etc.
• In addition to checking your code and report, very likely, we will be conducting a one-on-one viva for the evaluation. So please make sure that you do not cheat!
• Use of LLM-based tools or AI-based coding tools is strictly prohibited! Use of ChatGPT, Gemini, Co-Pilot, etc. is not allowed. NOTE: VS Code is also not allowed. Even in Colab, disable the AI assistant. If you use such a tool, we will know it very easily. Use of any of these tools would be counted as cheating and would be given a ZERO, with no questions asked.
Random-Maze Environment
[Figure 1: (a) Maze environment, a 3×4 grid with row-major state numbering:
0 | 1 | 2 | 3 (G, +1)
4 | 5 (Wall) | 6 | 7 (H, −1)
8 (S) | 9 | 10 | 11
with a living penalty of −0.04. (b) Maze environment transitions.]
In this assignment we will be exploring a variant of the Random Maze Environment (RME) that we have been looking at in the lectures. The environment is represented as a grid world in Figure 1a. The random maze environment is highly stochastic, with 11 states: two terminal states (a goal state (G) and a hole state (H)), 9 non-terminal states, and a wall in the middle of the environment. The wall behaves like the wall on the periphery of the environment: if an agent bumps against the wall, it bounces back. The boundary of the environment behaves similarly; if an agent hits the boundary, it bounces back. The agent receives a reward of +1 when it lands in the goal state (3) and a reward of -1 when it lands in the hole state (7). For the rest of the transitions there is a reward of -0.04; essentially, the agent has a living cost of -0.04. The transitions are stochastic, as shown in Figure 1b. In this environment, four actions are possible: left, top, right, and bottom. For every intended action, there is an 80% chance of going in the intended direction and a 20% chance of going in one of the two orthogonal directions, split equally between them (10% each). The agent starts from state 8 (S). Assume γ = 0.99 for the problems below.
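The action noise and reward structure described above can be sketched in Python. This is a minimal illustration, not provided course code; the action encoding (0=left, 1=top, 2=right, 3=bottom) and the function names are assumptions.

```python
import random

# For each intended action, the two orthogonal actions (assumed encoding:
# 0=left, 1=top, 2=right, 3=bottom).
ORTHOGONAL = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (0, 2)}

def sample_action(intended, rng=random):
    """Action actually executed: 80% intended, 10% each orthogonal direction."""
    r = rng.random()
    if r < 0.8:
        return intended
    side1, side2 = ORTHOGONAL[intended]
    return side1 if r < 0.9 else side2

def reward(next_state, goal=3, hole=7):
    """Reward on landing in next_state: +1 at G, -1 at H, else living penalty."""
    if next_state == goal:
        return 1.0
    if next_state == hole:
        return -1.0
    return -0.04
```

Wall and boundary bounce-back (the agent stays in place) would be handled in the environment's step function, which is left to your implementation.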
In this assignment we will be looking at the control algorithms we learned in the lectures. For each plot, place the legend on the left/right side so that it does not overlay the plot. For all the algorithms below, this time we will not be specifying the hyper-parameters; please play with the hyper-parameters to come up with the best values. This way you will learn to tune the model. As you know from past experience, a single run of an algorithm on the environment produces plots that have a lot of variance and look very noisy. One way to overcome this is to create several different instances of the environment using different seeds, average the results across them, and plot the averages. For all the plots below, use this strategy.
Problem 1: Monte Carlo Control (40+20+20+5+5+5+5=100 points)
MonteCarloControl(env, γ, α0, ϵ0, maxSteps, noEpisodes, firstVisit = True)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 2: SARSA (TD Control) (40+20+20+5+5+5+5=100 points)
SARSA(env, γ, α0, ϵ0, noEpisodes)
(a) Plot the evolution of the state-value (V) function with time.
Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 3: Q-Learning (40+20+20+5+5+5+5=100 points)
Q-Learning(env, γ, α0, ϵ0, noEpisodes)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 4: Double Q-Learning (40+20+20+5+5+5+5=100 points)
Double-Q-Learning(env, γ, α0, ϵ0, noEpisodes)
(a) Plot the evolution of the state-value (V) function with time.
Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 5: Comparing Control Algorithms (20+5+5+5+5+5=40 points)
For the FVMCC, SARSA, Q, and Double-Q algorithms implemented above, do the following:
(a) For each algorithm, in a single plot, plot the evolution of the Policy Success Rate (in %) vs. episodes. The Policy Success Rate is defined as the number of times the agent reaches the goal state out of the total number of episodes run using a specific policy. Basically, implement the following function, which returns the policy success percentage. As you train the agent, at each episode you will have a version of the policy; use that policy, along with the function below, to get the policy success rate.
def getPolicySuccessRate(env, πcurrent, goalState, maxEpisodes=100, maxSteps=200)
(b) What are your observations from the Policy Success Rate (in %) plot?
(c) For each algorithm (in a single plot), plot the Estimated Expected Return (from the start state) vs. episodes.
(d) What are your observations for the Estimated Expected Return plot?
(e) For each algorithm (in a single plot), plot the State-value Function Estimation Error vs. episodes.
The State-value Function Estimation Error is defined as the Mean Absolute Error of the V-function estimates (across all states) from the respective optimal values.
(f) What are your observations for the State-value Function Estimation Error plot?
Problem 6: SARSA(λ) Replacing (40+20+20+5+5+5+5=100 points)
SARSA-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = True)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 7: SARSA(λ) Accumulating (40+20+20+5+5+5+5=100 points)
SARSA-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = False)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 8: Q(λ) Replacing (40+20+20+5+5+5+5=100 points)
Q-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = True)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 9: Q(λ) Accumulating (40+20+20+5+5+5+5=100 points)
Q-Lambda(env, γ, α0, ϵ0, λ, noEpisodes, replaceTrace = False)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 10: Dyna-Q (40+20+20+5+5+5+5=100 points)
Dyna-Q(env, γ, α0, ϵ0, noEpisodes, noPlanning)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 11: Trajectory Learning (40+20+20+5+5+5+5=100 points)
TrajectorySampling(env, γ, α0, ϵ0, noEpisodes, maxTrajectory)
(a) Plot the evolution of the state-value (V) function with time. Basically, plot the V-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true V-function for each of these states in the same plot.
(b) Plot the evolution of the state-action value (Q) function with time. Plot the Q-function vs. episodes for states 0, 1, 2, 4, 6, 8, 9, 10, 11 in a single plot. Also plot the true Q-function for each of these states in the same plot.
(c) Over how many instances of the environment did you average the results? Write about the seeds used for each instance.
(d) Draw the environment diagram with the optimal policy (shown using arrows) obtained using the algorithm.
(e) Write about the hyper-parameters you finally used for the algorithm and describe how you arrived at this set of hyper-parameters.
(f) Write about your observations from the plots above.
Problem 12: Comparing Control Algorithms (5+5+5+5+5+5=25 points)
For the SARSA(λ) Replacing, SARSA(λ) Accumulating, Q(λ) Replacing, Q(λ) Accumulating, Dyna-Q, and Trajectory Learning algorithms implemented above, do the following:
(a) For each algorithm, in a single plot, plot the evolution of the Policy Success Rate (in %) vs. episodes.
(b) What are your observations from the Policy Success Rate (in %) plot?
(c) For each algorithm (in a single plot), plot the Estimated Expected Return (from the start state) vs. episodes.
(d) What are your observations for the Estimated Expected Return plot?
(e) For each algorithm (in a single plot), plot the State-value Function Estimation Error vs. episodes. The State-value Function Estimation Error is defined as the Mean Absolute Error of the V-function estimates (across all states) from the respective optimal values.
(f) What are your observations for the State-value Function Estimation Error plot?
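The two comparison metrics used in Problems 5 and 12 can be sketched as below. This is a minimal illustration, not provided course code: the env interface (reset() returning a state, step(a) returning (state, reward, done)) and the tabular policy representation are assumptions you should adapt to your own environment class.

```python
import numpy as np

def getPolicySuccessRate(env, policy, goalState, maxEpisodes=100, maxSteps=200):
    """Percentage of episodes in which the given policy reaches the goal state.

    `policy` is assumed to map a state to an action (e.g., a dict or list).
    """
    successes = 0
    for _ in range(maxEpisodes):
        state = env.reset()
        for _ in range(maxSteps):
            state, _, done = env.step(policy[state])
            if state == goalState:
                successes += 1
                break
            if done:
                break
    return 100.0 * successes / maxEpisodes

def stateValueEstimationError(V_est, V_true):
    """Mean Absolute Error of the V-function estimate across all states."""
    return float(np.mean(np.abs(np.asarray(V_est) - np.asarray(V_true))))
```

During training you would call getPolicySuccessRate once per episode with the greedy policy derived from the current Q-function, and stateValueEstimationError with the true V-function of the environment.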
• Only electronic submissions will be accepted. Your main PDF writeup must be typeset in LaTeX (please also refer to the "Additional Instructions" below).
• The PDF writeup containing your solution has to be submitted via Gradescope https://www.gradescope.com/.
• We have created your Gradescope account (you should have received the notification). Please use your IITK CC ID (not any other email ID) to login. Use the "Forgot Password" option to set your password.
Additional Instructions
• We have provided a LaTeX template file hw2sol.tex to help typeset your PDF writeup. There is also a style file ml.sty that contains shortcuts to many useful LaTeX commands for things such as boldfaced/calligraphic fonts for letters, various mathematical/Greek symbols, and others. Use of these shortcuts is recommended (but not necessary).
• Your answer to every question should begin on a new page. The provided template is designed to do this automatically. However, if it fails to do so, use \clearpage in LaTeX before starting the answer to a new question, to enforce this.
• While submitting your assignment on the Gradescope website, you will have to specify on which page(s) question 1 is answered, on which page(s) question 2 is answered, etc. To do this properly, first ensure that the answer to each question starts on a different page.
• Be careful to flush all your floats (figures, tables) corresponding to question n before starting the answer to question n + 1; otherwise, while grading, we might miss important parts of your answers.
• Your solutions must appear in proper order in the PDF file, i.e., the solution to question n must be complete in the PDF file (including all plots, tables, proofs, etc.) before you present the solution to question n + 1.
(SGD for K-means Objective) Recall the K-means objective function: L = Σn Σk znk ||xn − μk||2, where znk ∈ {0, 1} indicates whether point xn is assigned to cluster k and μk denotes the mean of cluster k.
As we have seen, the K-means algorithm minimizes this objective by taking a greedy iterative approach of assigning each point to its closest center (finding the znk's) and updating the cluster means μk. The standard K-means algorithm is a batch algorithm and uses all the data in every iteration. It can, however, be made online by taking a random example xn at a time, and then (1) assigning xn "greedily" to the "best" cluster, and (2) updating the cluster means using SGD on the objective L. Assuming you have initialized the cluster means randomly and are reading one data point xn at a time,
• How would you solve step 1?
• What will be the SGD-based cluster-mean update equations for step 2? Intuitively, why does the update equation make sense?
• Note that the SGD update requires a step size. For your derived SGD update, suggest a good choice of the step size (and mention why you think it is a good choice).
(An Ideal Projection) Suppose we are given some labeled training data {xn, yn} with inputs xn ∈ RD and labels yn ∈ {−1, +1}. We want to project the inputs into one dimension using a projection direction given by a vector w ∈ RD such that, after the projection, the distance between the means of the inputs from the two classes becomes as large as possible, and the inputs within each class become as close to each other as possible. Write down the objective/loss function that will achieve this and briefly justify it. You don't need to solve it.
(Eigenchangers!) Suppose we wish to do PCA for an N × D matrix X, and assume D > N. The traditional way to do PCA is to compute the eigenvectors of the covariance matrix S = (1/N)XᵀX (assuming centered data). Show that, if someone instead gives you an eigenvector v ∈ RN of the matrix (1/N)XXᵀ, you can use it to get an eigenvector u ∈ RD of S. What is the advantage of this way of obtaining the eigenvectors of S?
(Latent Variable Models for Supervised Learning) Consider learning a regression model given training data {(xn, yn)}, with xn ∈ RD and yn ∈ R.
Let us give a small twist to the standard probabilistic linear model for regression that we have seen in the class. In particular, we will introduce a latent variable zn with each training example (xn, yn), where zn ∈ {1, 2, …, K} denotes the cluster to which xn belongs, and assume a multinoulli prior on zn, i.e., p(zn) = multinoulli(zn|π1, …, πK). The model also has K weight vectors W = [w1, w2, …, wK]. Given the cluster id zn = k, we will assume that p(yn|xn, zn = k, W) = N(yn|wkᵀxn, β−1). Note that although there are K weight vectors in the overall model, only one of them is being used to model yn, depending on the value of zn. Also note that the model for the responses yn is still discriminative, since the inputs are not being modeled. The latent variables are Z = (z1, …, zN) and the global parameters are Θ = {(w1, …, wK), (π1, …, πK)}. (1) Give a brief explanation (max. 5 sentences) of what the above model is doing and why you might want to use it instead of the standard probabilistic linear model, which uses a single weight vector w and models each response as p(yn|xn, w) = N(yn|wᵀxn, β−1). (2) Derive an ALT-OPT algorithm to estimate Z and (the MLE of) Θ, and clearly write down each step's update equations. For Z, you must give the update equation for each individual latent variable zn, n = 1, …, N. Likewise, for Θ, you must give the update equation for each wk, k = 1, …, K, and πk, k = 1, …, K. Also, what will be the update of each zn if πk = 1/K, ∀k? Give a brief intuitive explanation (max 1-2 sentences) of what these updates are doing.
(Programming Problem: Part 1) For this problem, your task will be to implement kernel ridge regression (see the solution to mid-sem exam question 6) and ridge regression with landmark-based features (let's call the latter simply "landmark-ridge"), and train and test these two models on the provided dataset (already split into training and test). For both models, you have to use the RBF kernel k(xn, xm) = exp(−γ||xn − xm||2) with bandwidth parameter γ = 0.1.
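Kernel ridge regression with this RBF kernel can be sketched as below. This is a minimal NumPy sketch of the standard closed-form dual solution α = (K + λI)⁻¹y, with illustrative function names; the mid-sem solution referenced above remains the authoritative derivation for your implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=0.1, gamma=0.1):
    """Dual coefficients alpha = (K + lam*I)^{-1} y on the training data."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, gamma=0.1):
    """Prediction f(x*) = sum_n alpha_n k(x*, x_n) for each test input."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

For the assignment, X would be the single-feature inputs from ridgetrain.txt reshaped to a column, and you would sweep λ over 0.1, 1, 10, 100 before computing the test RMSE.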
The training and test datasets are given in the files ridgetrain.txt and ridgetest.txt. In each file, each line is an example, with the first number being the input (a single feature) and the second number being the output. You need to do the following:
1. For kernel ridge regression, train the model with the regularization hyperparameter λ = 0.1 and use the learned model to predict the outputs for the test data. Compare the model's predictions with the true test outputs (given to you) by plotting them on a graph. In particular, plot all the test inputs and the corresponding true outputs (x, y) on a 2D plot in blue, and likewise all the test inputs and the corresponding predicted outputs in red. Repeat this exercise for λ = 1, 10, 100. What do you observe from the plots? For each case, also report the root-mean-squared error (RMSE) on the test data, which is defined as the square root of the mean of the squared differences between the true and predicted outputs.
2. For landmark-ridge, you first need to extract landmark-based features using the RBF kernel, and you will then use the data with these features to train a linear ridge regression model. Again, keep the regularization hyperparameter fixed at λ = 0.1. Try L = 2, 5, 20, 50, 100 uniformly randomly chosen landmark points from the training set, train the model for each case, and compute the predictions on the test data. Plot the results for each case just like you did in part 1 (but only for λ = 0.1). What do you observe from the plots? What's the RMSE in each case? What value of L seems good enough?
(Programming Problem: Part 2) For this problem, your task will be to implement the K-means clustering algorithm and try it on a provided toy (but interesting) dataset (kmeansdata.txt) consisting of points in two dimensions. The provided dataset also has 2 clusters (so you would use K = 2).
However, the data is such that standard K-means will NOT work well, since the clusters are not spherical and not linearly separable (you can check this by plotting the data in 2D using a scatter plot). You will consider two ways to handle this issue. 1. Using Hand-crafted Features: Propose a feature transformation of the original data that will transform it such that K-means will be able to learn the correct clusters! Before proposing the transformation, plot the original data to see what transformation(s) might work. Apply your K-means implementation on this transformed version of the data to verify that your transformation works. Plot your obtained clustering results by showing points (in the original 2D space) in cluster 1 in red and points in cluster 2 in green. 2. Using Kernels: Although you could kernelize K-means via the kernel K-means algorithm, you will try something else. You will use the landmark-based approach to extract good features and run your implementation of standard K-means on these features. The kernel that you will use for the landmark-based approach is the RBF kernel k(xn, xm) = exp(−γ||xn − xm||²) with bandwidth parameter γ = 0.1. Important Note: To avoid randomness in cluster mean initialization, we will choose the first two points in the dataset (the first two lines in the provided dataset) as the initial cluster centers. (Programming Problem: Part 3) You are provided a subset of the MNIST digits data (as a pickle file) with 10,000 examples from digits 0-9 (10 classes). Loading the pickle file will show that the X and Y fields contain the digit input features (784-dimensional) and labels (0-9), respectively, of the 10,000 examples. Deliverables: For each of the three parts, you should prepare separate Python notebooks, zip them together in a single zip file (named yourrollnumber-problem-5.zip) and upload it at the following Dropbox link: https://www.dropbox.com/request/YKHU6d5RX4nSXDdNV6gd.
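A minimal K-means sketch that follows the deterministic initialization rule above (the first K data points as the initial centers); the function name and structure are illustrative, not the required implementation:

```python
import numpy as np

def kmeans(X, K=2, n_iters=100):
    # Deterministic initialization per the problem statement:
    # the first K data points are the initial cluster centers.
    centers = X[:K].astype(float).copy()
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center (Euclidean).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The same routine works unchanged on the hand-crafted features (part 1) or the landmark-based features (part 2), since only X changes.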
In the main PDF writeup, write about your observations for each of the three parts and include the plots in the PDF.
Instructions: • Only electronic submissions will be accepted. Your main PDF writeup must be typeset in LaTeX (please also refer to the “Additional Instructions” below). • The PDF writeup containing your solution has to be submitted via Gradescope https://www.gradescope.com/. • We have created your Gradescope account (you should have received the notification). Please use your IITK CC ID (not any other email ID) to log in. Use the “Forgot Password” option to set your password. Additional Instructions • We have provided a LaTeX template file hw1sol.tex to help typeset your PDF writeup. There is also a style file ml.sty that contains shortcuts to many useful LaTeX commands for things such as boldface/calligraphic fonts for letters, various mathematical/Greek symbols, etc. Use of these shortcuts is recommended (but not necessary). • Your answer to every question should begin on a new page. The provided template is designed to do this automatically. However, if it fails to do so, use the clearpage option in LaTeX before starting the answer to a new question to enforce this. • While submitting your assignment on the Gradescope website, you will have to specify on which page(s) question 1 is answered, on which page(s) question 2 is answered, etc. To do this properly, first ensure that the answer to each question starts on a different page. • Be careful to flush all your floats (figures, tables) corresponding to question n before starting the answer to question n + 1; otherwise, while grading, we might miss important parts of your answers. • Your solutions must appear in proper order in the PDF file, i.e., the solution to question n must be complete in the PDF file (including all plots, tables, proofs, etc.) before you present the solution to question n + 1. (Another Simple Classifier) Consider a classification model where we are given training data from K classes.
Each input xn ∈ R^D and each class c is defined by two parameters, wc ∈ R^D and a D × D positive definite (PD) matrix Mc, c = 1,2,…,K. Assume Nc denotes the number of training examples from class c. Suppose we estimate wc and Mc by solving the following optimization problem: (ŵc, M̂c) = arg min_{wc, Mc} (1/Nc) Σ_{n: yn = c} (xn − wc)ᵀ Mc (xn − wc) − log |Mc| (note that, in the above objective, the log |Mc| term ensures positive definiteness of Mc because the determinant of a PD matrix is always positive). For the given objective/loss function, find the optimal values of wc and Mc. Also, what will this model reduce to as a special case when Mc is an identity matrix? (Consistent or Not?) An important notion for a classifier is that of consistency. A classification algorithm is said to be consistent if, whenever it has access to infinite amounts of training data, its error rate approaches the optimal error rate (a.k.a. the Bayes optimal error rate). Consider the noise-free setting (i.e., every training input is labeled correctly). Here, the Bayes optimal error rate is zero. Is the one-nearest-neighbor algorithm consistent in this setting? Briefly justify your answer in 100 words or less. (Linear Regression Viewed as Nearest Neighbors) Show that, for the unregularized linear regression model, where the solution is ŵ = (XᵀX)⁻¹Xᵀy, the prediction at a test input x∗ can be written as a weighted sum of all the training responses, i.e., f(x∗) = Σ_{n=1}^{N} wn yn. Give the expression for the weights wn in this case and briefly discuss how this connects the model to nearest-neighbors-style prediction. For method 2, the weight matrix is W = (AsᵀAs + λI)⁻¹AsᵀMs, where As is the 40 × 85 matrix containing the class attribute vectors of the 40 seen classes, and Ms is the 40 × 4096 matrix containing the means of the 40 seen classes. Once you have W, you can compute the mean µc of any unseen class as W ac, where ac is its class attribute vector. This procedure gives the means of the 10 unseen classes, and you can then apply the prototype-based classifier to predict the label of each test input.
Your task is to implement both methods. In your main write-up, report the test-set classification accuracy for each method by comparing the respective model's predictions with the provided ground-truth labels (accuracy is the percentage of test inputs whose classes are predicted correctly). For method 2, try λ = 0.01, 0.1, 1, 10, 20, 50, 100 and report the test accuracy for each value of λ. Which value of λ gives the best test-set accuracy? Note: In the dataset folder, I've also provided a brief pseudo-code (or rather a summary) of the procedure to follow for the implementation (since this is the very first assignment, I felt it would be helpful for you all). Important: To predict the class of a test input, you should only compute its distances from the means of the 10 unseen classes, not from the means of the seen classes (since the test data only consists of inputs from the unseen classes). The other setting, where the test data can consist of both seen and unseen class inputs, is also possible, but we are not considering it here. Deliverables: The code for each method, submitted as separate files. For method 1, name the file convex.py; likewise, for method 2, name the file regress.py. Your code should be easy to read; please use comments wherever necessary. Also include a README file that briefly describes how to run the code. All the code and the README should be zipped together and submitted as a single file named yourrollnumber.zip. Please DO NOT submit the provided data.
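Method 2's closed-form step can be sketched as follows. The matrix shapes follow the problem statement; the function names and the synthetic check are illustrative, not the required regress.py:

```python
import numpy as np

def learn_W(A_s, M_s, lam):
    # Ridge-regression map from class attributes to class means:
    # W = (A_s^T A_s + lam*I)^{-1} A_s^T M_s
    # A_s: (num_seen, attr_dim), M_s: (num_seen, feature_dim)
    D = A_s.shape[1]
    return np.linalg.solve(A_s.T @ A_s + lam * np.eye(D), A_s.T @ M_s)

def predict(X_test, A_unseen, W):
    # Means of the unseen classes via W, then prototype (nearest-mean) labels;
    # distances are computed only to the unseen-class means, as instructed.
    mu = A_unseen @ W                                       # (num_unseen, feature_dim)
    d = np.linalg.norm(X_test[:, None, :] - mu[None, :, :], axis=2)
    return d.argmin(axis=1)
```

In the assignment, A_s would be the 40 × 85 attribute matrix, M_s the 40 × 4096 seen-class means, and A_unseen the attributes of the 10 unseen classes.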
Setting up Instructions For the “simple” dataset: python train.py --dataset simple --model simple --num_epochs --learning_rate --seed For the “digits” dataset: python train.py --dataset digits --model digits --num_epochs --learning_rate --seed The default values for each of these parameters are available in args.py. The seed argument is useful if your model uses random initialization (default PyTorch behavior). With a fixed seed, your experiment will be reproducible. You can fix a seed throughout your experiments for best reproducibility. It is possible that a different seed will give a different result, but the differences will mostly be minor, so you should focus on choosing better hyperparameters and making a better model. TensorBoard This assignment uses PyTorch Lightning, which allows us to use a far more sophisticated logger called TensorBoard, developed originally for the TensorFlow framework. To view your training logs, go into the checkpoints/ directory and run tensorboard --logdir ./ to start TensorBoard. You can now view the (live) training progress of all your models by going to http://localhost:6006 in your browser! You can also download the data from there as a CSV for making plots in your report. Kaggle Submission You can compete in the Kaggle competition by creating a submission using python make_kaggle_submission.py. This will create a file called kaggle_upload.csv which you can upload on Kaggle to see results. Moodle Submission Once you are done with both tasks, copy the weights corresponding to your best models for each dataset into the submission/ directory. Also copy the completed version (implementing both models) of model.py and your observation report (with filename report.pdf) into the same directory. Overall, make sure your submission folder looks as below.
This is crucial since the assignment will be autograded: submission/ model.py best_simple.ckpt best_digits.ckpt report.pdf You can get a hint of the accuracy and loss values that the autograder will use for grading your submission by running python evaluate_submission.py in this directory itself. The observation report should also contain the roll numbers of both students in the team. Once you are satisfied with the submission, use tar -cvzf _.tar.gz submission/ to create the final submission. Only the student with the lower roll number in the group needs to upload this .tar.gz on Moodle. Visualizing (Optional)
Setting up Instructions For logistic regression: python train.py --dataset binary --model logistic_regression --num_epochs --learning_rate --momentum For the linear classifier: python train.py --dataset iris --model linear_classifier --num_epochs --learning_rate --momentum The default values for each of these parameters are available in args.py. Submission Once you are done with both tasks, copy the weights corresponding to your best models for each dataset into the submission/ directory. Also copy the completed version (implementing both models) of model.py and your observation report (with filename report.pdf) into the same directory. Overall, make sure your submission folder looks as below. This is crucial since the assignment will be autograded: submission/ model.py best_binary.weights.npy best_iris.weights.npy report.pdf You can get a hint of the accuracy and loss values that the autograder will use for grading your submission by running python evaluate_submission.py in this directory itself. The observation report should also contain the roll numbers of both students in the team. Once you are satisfied with the submission, use tar -cvzf _.tar.gz submission/ to create the final submission. Only the student with the lower roll number in the group needs to upload this .tar.gz on Moodle. Visualizing (Optional)
In this assignment, you will write a particle tracing program to generate streamlines from a 3D vector field data set using Runge–Kutta 4 (RK4) integration. We have discussed the method in class. Here is what your code should do: 1. You will load a vector field data set that has a vector array named ‘vectors’. It is a 3D vector field data set, so the vector at each grid point has three components. 2. Given a user-provided seed location, you will apply RK4 integration to generate a streamline. Your Python script should take the 3D seed location as an input parameter. 3. You will perform RK4 integration in both forward and backward directions, starting from the seed location. Finally, you will create a single streamline and store it as a vtkPolyData file (*.vtp). To trace the particle in the backward direction, you can simply use a negative integration step size. Hence, your streamline should consist of the points traced in the backward direction + the seed point (at the center) + the points traced in the forward direction. It should be a single continuous line, as shown in Fig. 1. 4. Your code should take the 3D seed location as input from the user and output a single *.vtp file, which is the streamline traced from the input seed location. 5. You can set the RK4 step size to 0.05 and the maximum number of steps to 1000 for testing. Note that your streamline can go out of the bounds of the data set. Therefore, you must check that the current point on the streamline is within the bounds of the vector field. You can use data.GetBounds() to query the vector field bounds and compare the point location on your streamline with the XYZ bounds of the data set. If the streamline is out of bounds after an integration step, you should stop tracing and return the result. 6. To get a vector at each step, you must perform interpolation to obtain the vector at each location. In this assignment, you are not required to implement your own interpolation routine.
Instead, you will use vtkProbeFilter to query the vector at any point in the data set. It will give you the vector at any queried location, and you can use that to perform particle tracing. 7. You are not allowed to use the VTK Stream Tracer; you must implement the program on your own. If you have implemented it correctly, you should see images like the following for input seed (0,0,7). Fig. 1: Plot of a streamline from three different views. The seed location is (0,0,7), the step size is 0.05, and the maximum number of steps is 1000. Both forward and backward tracing are used, and the results are combined into a single streamline. Dataset for this task: The dataset that you will use in this assignment is a 3D vector field of a simulated tornado. The VTI file is provided with the assignment. How to submit? “groupnum_rollnum1_rollnum2_Assignment3.zip”. ** You must ensure you are submitting the correct submission files. If you upload wrong files, you will not be allowed to submit again. We will only grade your submitted version from HelloIITK. No submissions will be accepted after the allowed late days. **
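The forward/backward tracing loop can be sketched as below, with a generic interpolation callback standing in for the VTK probe. In the actual assignment the callback would wrap vtkProbeFilter and the bounds would come from data.GetBounds(); all names here are illustrative:

```python
import numpy as np

def rk4_step(p, h, field):
    # One RK4 integration step through the vector field: field(p) -> 3-vector.
    k1 = field(p)
    k2 = field(p + 0.5 * h * k1)
    k3 = field(p + 0.5 * h * k2)
    k4 = field(p + h * k3)
    return p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def trace(seed, field, bounds, h=0.05, max_steps=1000):
    # bounds = (xmin, xmax, ymin, ymax, zmin, zmax), the GetBounds() layout.
    def inside(p):
        return all(bounds[2 * i] <= p[i] <= bounds[2 * i + 1] for i in range(3))
    pts = [np.asarray(seed, dtype=float)]
    for _ in range(max_steps):
        p = rk4_step(pts[-1], h, field)
        if not inside(p):          # stop tracing once we leave the data set
            break
        pts.append(p)
    return pts

def streamline(seed, field, bounds, h=0.05, max_steps=1000):
    # Backward points (negative step size), reversed, + seed + forward points,
    # forming one continuous line as required.
    back = trace(seed, field, bounds, -h, max_steps)
    fwd = trace(seed, field, bounds, h, max_steps)
    return back[:0:-1] + fwd       # fwd already starts with the seed point
```

The resulting point list would then be packed into a vtkPolyData with a single polyline cell and written with a *.vtp writer.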
Visualization with Plotly. In this assignment, you will create a simple interactive interface using Plotly and Jupyter Widgets. Plotly will be used for generating visualizations, and Jupyter Widgets will be used to add sliders and buttons. Reference for Jupyter Widgets: https://ipywidgets.readthedocs.io/en/stable/ Sliders and Buttons: https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20Events.html Here is what your interface should do: 1. On the left side, your interface should show an interactive isosurface visualization of the 3D dataset provided with this assignment. Plotly has an API to draw isosurfaces. [See Fig. 1] 3. Your interface should contain a slider that allows changing the isovalue for the isosurface. The slider's value range should be mapped to the entire data range, and by sliding through different scalar values, the isosurface plot should get updated automatically. You should also color your isosurface using a standard Python/Plotly colormap so that the color of your isosurface changes as you change the isovalue. I have used the colormap ‘plasma’ in this example. You can use Jupyter Widgets to add the slider. [See Fig. 1] 4. At the beginning, you should show the isosurface for isovalue = 0.0 and the histogram of the entire volume data set, as seen in the figure. [See Fig. 1] 5. When you change your slider, you will also update the histogram plot. Let's say you have selected the isovalue x from the slider. Then your histogram plot should contain the data points satisfying the following condition: (x – 0.25)
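The condition in step 5 is cut off above. Assuming it selects scalar values within ±0.25 of the chosen isovalue x (an assumption, since the original statement is truncated), the histogram-filtering step might look like this; the function name and window width are illustrative:

```python
import numpy as np

def histogram_subset(values, isovalue, half_width=0.25):
    # Keep only the scalars near the selected isovalue. The +/-0.25 window
    # is an assumption -- the condition in the original statement is truncated.
    values = np.asarray(values)
    mask = (values > isovalue - half_width) & (values < isovalue + half_width)
    return values[mask]
```

In the notebook, the slider callback would call this on the flattened volume data and redraw the Plotly histogram with the returned subset.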