Assignment catalog

CS 261 Programming Assignment 7: Graph Traversals

This assignment is shorter than previous assignments. It consists of two parts: a short implementation and a few questions.

Part 1: Implementation

Complete the implementation of the depth-first search (DFS) and breadth-first search (BFS) algorithms in graph.c. To support the implementation of these algorithms, we have provided you with a circular doubly linked list (see cirListDeque.h and cirListDeque.c), which can be used as either a stack or a queue. This gives you a chance to apply some of your data structure knowledge from earlier in the course.

There is a reference implementation of DFS, written recursively, in graph.c. You should write an iterative implementation using a stack; you can use the recursive version to check your own implementation. In the main function there are some parameters that you can tweak to generate different graphs and run your two search algorithms on them. For every pair of nodes A and B in the graph, the main function will determine whether A is reachable from B using either DFS or BFS. You may want to add some new graphs of your own, or modify this file to run tests on several different graphs at once.

You may assume that all graphs are undirected, but you should NOT assume that they are all connected. That is, there are not necessarily paths between all nodes.

Files: graphAssignmentFiles.zip

Part 2: Questions

Run your program on each of the five example graphs and compute the reachability between each pair of nodes using both BFS and DFS. Then answer the following questions:

1. How is the graph stored in the provided code — adjacency matrix or edge list?
2. Which of the graphs are connected? How can you tell?
3. Imagine that we ran each search in the other direction (from destination to source, instead of source to destination) — would the output change at all? What if the graphs were directed graphs?
4. What are a few pros and cons of DFS vs. BFS?
5. What is the big-O execution time to determine whether a node is reachable from another node?

Tips

Pay careful attention to the struct definitions. In particular, the Graph struct contains an array of Vertex structs (all of the vertices in the graph), while each Vertex struct contains an array of pointers to the Vertex structs that are its neighbors. These pointers point to the same vertices stored in the array in the Graph struct. If a vertex named pueblo has at least three neighbors, we would access the third neighbor as pueblo->neighbors[2].

What to submit

graph.c
answers.txt (or .pdf)

Grading Rubric (Total: 75 pts)

Compiles (10 pts)
DFS (20 pts)
BFS (20 pts)
Questions (5 pts each = 25 pts)
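The iterative search the assignment asks for can be sketched in a few lines. The assignment itself is in C against the provided Graph/Vertex structs and cirListDeque; the Python below is only an illustration of the algorithm, with an adjacency-list dict (`adj`) standing in for those structures:

```python
def reachable_dfs(adj, src, dst):
    """Return True if dst is reachable from src, using an explicit stack
    instead of recursion. `adj` maps each node to a list of neighbors."""
    stack = [src]
    visited = {src}
    while stack:
        v = stack.pop()              # LIFO pop makes this depth-first
        if v == dst:
            return True
        for w in adj[v]:
            if w not in visited:     # mark on push so nodes enter the stack once
                visited.add(w)
                stack.append(w)
    return False
```

Swapping the stack for a FIFO queue (popping from the front) turns the same loop into BFS, which is why the provided deque, usable as either, covers both algorithms.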


CS 261 Assignment 6: Hashtable Implementation and Concordance

In this assignment there will be a programming portion and a written portion.

PROGRAMMING

For this assignment, you will complete the hash map with chaining as specified by the provided header file. Chapter 12 and Worksheet 38 will be good resources for completing this assignment. After completing the hash map implementation, you will write a concordance program in your main.c. A concordance counts the number of occurrences of each word in a document. For this assignment, we will work with text files.

The files for the programming portion are:

main.c — where you will write the concordance code.
hashMap.h — the header file, which defines structs and public functions for your hash table and explains the purpose of each function. Notice that there is a variable HASHING_FUNCTION which determines which hashing function should be used in hashMap.c. Your code should check this value to determine which hashing function to use: HASHING_FUNCTION == 1 means use stringHash1, and HASHING_FUNCTION == 2 means use stringHash2.
hashMap.c — a mostly empty implementation file which you will fill in with your code.
Makefile — remember to rename this from Makefile.txt to makefile when you "save as".

You will be making changes to main.c and hashMap.c. Do not make changes to hashMap.h.

Three functions are provided to you:

getWord(FILE*) will parse out the next word from the file for you. Read the description of the function in the comments inside main.c for more information.
stringHash1(char*) is the first function you will use for converting a key into an integer hash index. This function is located in the provided hashMap.c.
stringHash2(char*) is the second function for computing an integer hash index; part of your job will be to explain why stringHash2 is better than stringHash1. This function is located in the provided hashMap.c.

There is also some code in the main function that gives you timing information. You should not need to change anything in order to use it. The purpose of the timing information will be made clear in the questions section. The worksheets and the function explanations should provide the information needed to complete this assignment, except for some more details on what a concordance is.

Concordance

The job of the concordance is to count how many times each word occurs in a document. You will implement a concordance using a hash table implementation of the Map interface. In this implementation, the hash table stores hashLinks, each of which consists of a key, a value, and a pointer to the next link in the chain. The keys are the words and the values are the number of occurrences of each word.

You are provided with a function to retrieve words from a FILE pointer. It is your job to open this file (fopen()) and close it (fclose()), but the reading of the file will be handled for you inside getWord. For help with fopen() and fclose(), see the slides posted on the schedule for the recitation day.

Your concordance will run a loop until the end of the file is reached. In the loop, you will:

1. Read in a word with getWord().
2. If the word is already in your hash table, increment its number of occurrences.
3. If the word is not in your hash table, insert it with an occurrence count of 1.

After processing the text file into your concordance, you will print all the words in your hash table, with only one word on each line, in the following form. For the input file:

It was the best of times, It was the worst of times.

the output would be:

best: 1
It: 2
was: 2
the: 2
of: 2
worst: 1
times: 2

You may choose any order in which to print the words.

Challenge: Extra Credit (10 points)

There are a lot of uses for a hash map, and one of them is implementing a spell-checker. All you need to get started is a dictionary.

Files: Dictionary, spellcheck.c. Inside spellcheck.c you will find some code to get you started; it should look very much like main.c.

Written Questions

Your written questions will be handed in electronically, preferably as comments on the TEACH turn-in page; just cut and paste from your preferred editor.

1. Give an example of two words that would hash to the same value using stringHash1() but would not using stringHash2().
2. Why does the above make stringHash2() superior to stringHash1()?
3. When you run your program on the same input file, with one run using stringHash1() and the other using stringHash2(), is it possible for your size() function to return different values?
4. When you run your program on the same input file, with one run using stringHash1() and the other using stringHash2(), is it possible for your tableLoad() function to return different values?
5. When you run your program on the same input file, with one run using stringHash1() and the other using stringHash2(), is it possible for your emptyBuckets() function to return different values?
6. Is there any difference in the number of empty buckets when you change the table size from an even number, like 1000, to a prime, like 997?
7. Using the timing code provided to you, run your code on different size hash tables. How does changing the hash table size affect your performance?

Rubric for Assignment 6 (Total: 100 points)

Compile and Style: 20 points
Concordance: 20 points
Hash map: 50 points
Written questions: 10 points
Extra Credit Challenge (SpellChecker): 10 points

What to submit

1. main.c
2. hashMap.c
3. spellcheck.c (optional)
4. The answers to the written questions as either a TEACH comment or .pdf


CS 261 Assignment 6: HashMap Implementation

1. Implement the HashMap class by completing the provided skeleton code in the file hash_map_sc.py. Once completed, your implementation will include the following methods: put(), get(), remove(), contains_key(), clear(), empty_buckets(), resize_table(), table_load(), get_keys(), find_mode().

2. Use a dynamic array to store your hash table and implement chaining for collision resolution using a singly linked list. Chains of key/value pairs must be stored in linked list nodes. (A diagram in the original assignment illustrates the overall architecture of the HashMap class.)

3. Two pre-written classes are provided for you in the skeleton code — DynamicArray and LinkedList (in a6_include.py). You must use objects of these classes in your HashMap class implementation. Use a DynamicArray object to store your hash table, and LinkedList objects to store chains of key/value pairs.

4. The provided DynamicArray and LinkedList classes may provide different functionality than those described in the lectures or implemented in prior homework assignments. Review the docstrings in the skeleton code to understand the available methods, their use, and input/output parameters.

5. The number of objects stored in the hash map will be between 0 and 1,000,000 inclusive.

6. Two pre-written hash functions are provided in the skeleton code. Make sure you test your code with both functions. We will use these two functions in our testing of your implementation.

7. RESTRICTIONS: You are NOT allowed to use ANY built-in Python data structures and/or their methods. You are NOT allowed to directly access any variables of the DynamicArray or LinkedList classes. All work must be done only by using class methods.

8. Variables in the SLNode class are not private. You ARE allowed to access and change their values directly. You do not need to write any getter or setter methods.

9.
You may not use any imports beyond the ones included in the assignment source code provided.

put(self, key: str, value: object) -> None: This method updates the key/value pair in the hash map. If the given key already exists in the hash map, its associated value must be replaced with the new value. If the given key is not in the hash map, a new key/value pair must be added. For this hash map implementation, the table must be resized to double its current capacity when this method is called and the current load factor of the table is greater than or equal to 1.0.

Example #1: m = HashMap(53, hash_function_1) for i in range(150): m.put('str' + str(i), i * 100) if i % 25 == 24: print(m.empty_buckets(), round(m.table_load(), 2), m.get_size(), m.get_capacity())

Output: 39 0.47 25 53 39 0.94 50 53 82 0.7 75 107 79 0.93 100 107 184 0.56 125 223 181 0.67 150 223

Example #2: m = HashMap(41, hash_function_2) for i in range(50): m.put('str' + str(i // 3), i * 100) if i % 10 == 9: print(m.empty_buckets(), round(m.table_load(), 2), m.get_size(), m.get_capacity())

Output: 37 0.1 4 41 34 0.17 7 41 31 0.24 10 41 28 0.34 14 41 26 0.41 17 41

empty_buckets(self) -> int: This method returns the number of empty buckets in the hash table.

Example #1: m = HashMap(101, hash_function_1) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key1', 10) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key2', 20) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key1', 30) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key4', 40) print(m.empty_buckets(), m.get_size(), m.get_capacity())

Output: 101 0 101 100 1 101 99 2 101 99 2 101 98 3 101

Example #2: m = HashMap(53, hash_function_1) for i in range(150): m.put('key' + str(i), i * 100) if i % 30 == 0: print(m.empty_buckets(), m.get_size(), m.get_capacity())

Output: 52 1 53 39 31 53 83 61 107 80 91 107 184 121 223

table_load(self) -> float: This method returns the current hash table load factor.
Example #1: m = HashMap(101, hash_function_1) print(round(m.table_load(), 2)) m.put('key1', 10) print(round(m.table_load(), 2)) m.put('key2', 20) print(round(m.table_load(), 2)) m.put('key1', 30) print(round(m.table_load(), 2))

Output: 0.0 0.01 0.02 0.02

Example #2: m = HashMap(53, hash_function_1) for i in range(50): m.put('key' + str(i), i * 100) if i % 10 == 0: print(round(m.table_load(), 2), m.get_size(), m.get_capacity())

Output: 0.02 1 53 0.21 11 53 0.4 21 53 0.58 31 53 0.77 41 53

clear(self) -> None: This method clears the contents of the hash map. It does not change the underlying hash table capacity.

Example #1: m = HashMap(101, hash_function_1) print(m.get_size(), m.get_capacity()) m.put('key1', 10) m.put('key2', 20) m.put('key1', 30) print(m.get_size(), m.get_capacity()) m.clear() print(m.get_size(), m.get_capacity())

Output: 0 101 2 101 0 101

Example #2: m = HashMap(53, hash_function_1) print(m.get_size(), m.get_capacity()) m.put('key1', 10) print(m.get_size(), m.get_capacity()) m.put('key2', 20) print(m.get_size(), m.get_capacity()) m.resize_table(100) print(m.get_size(), m.get_capacity()) m.clear() print(m.get_size(), m.get_capacity())

Output: 0 53 1 53 2 53 2 101 0 101

resize_table(self, new_capacity: int) -> None: This method changes the capacity of the internal hash table. All existing key/value pairs must remain in the new hash map, and all hash table links must be rehashed. (Consider calling another HashMap method for this part.) If new_capacity is less than 1, the method does nothing. If new_capacity is 1 or more, make sure it is a prime number; if not, change it to the next highest prime number.
You may use the methods _is_prime() and _next_prime() from the skeleton code.

Example #1: m = HashMap(23, hash_function_1) m.put('key1', 10) print(m.get_size(), m.get_capacity(), m.get('key1'), m.contains_key('key1')) m.resize_table(30) print(m.get_size(), m.get_capacity(), m.get('key1'), m.contains_key('key1'))

Output: 1 23 10 True 1 31 10 True

Example #2: m = HashMap(79, hash_function_2) keys = [i for i in range(1, 1000, 13)] for key in keys: m.put(str(key), key * 42) print(m.get_size(), m.get_capacity()) for capacity in range(111, 1000, 117): m.resize_table(capacity) m.put('some key', 'some value') result = m.contains_key('some key') m.remove('some key') for key in keys: result &= m.contains_key(str(key)) result &= not m.contains_key(str(key + 1)) print(capacity, result, m.get_size(), m.get_capacity(), round(m.table_load(), 2))

Output: 77 79 111 True 77 113 0.68 228 True 77 229 0.34 345 True 77 347 0.22 462 True 77 463 0.17 579 True 77 587 0.13 696 True 77 701 0.11 813 True 77 821 0.09 930 True 77 937 0.08

get(self, key: str) -> object: This method returns the value associated with the given key. If the key is not in the hash map, the method returns None.

Example #1: m = HashMap(31, hash_function_1) print(m.get('key')) m.put('key1', 10) print(m.get('key1'))

Output: None 10

Example #2: m = HashMap(151, hash_function_2) for i in range(200, 300, 7): m.put(str(i), i * 10) print(m.get_size(), m.get_capacity()) for i in range(200, 300, 21): print(i, m.get(str(i)), m.get(str(i)) == i * 10) print(i + 1, m.get(str(i + 1)), m.get(str(i + 1)) == (i + 1) * 10)

Output: 15 151 200 2000 True 201 None False 221 2210 True 222 None False 242 2420 True 243 None False 263 2630 True 264 None False 284 2840 True 285 None False

contains_key(self, key: str) -> bool: This method returns True if the given key is in the hash map, otherwise it returns False.
An empty hash map does not contain any keys.

Example #1: m = HashMap(53, hash_function_1) print(m.contains_key('key1')) m.put('key1', 10) m.put('key2', 20) m.put('key3', 30) print(m.contains_key('key1')) print(m.contains_key('key4')) print(m.contains_key('key2')) print(m.contains_key('key3')) m.remove('key3') print(m.contains_key('key3'))

Output: False True False True True False

Example #2: m = HashMap(79, hash_function_2) keys = [i for i in range(1, 1000, 20)] for key in keys: m.put(str(key), key * 42) print(m.get_size(), m.get_capacity()) result = True for key in keys: # all inserted keys must be present result &= m.contains_key(str(key)) # NOT inserted keys must be absent result &= not m.contains_key(str(key + 1)) print(result)

Output: 50 79 True

remove(self, key: str) -> None: This method removes the given key and its associated value from the hash map. If the key is not in the hash map, the method does nothing (no exception needs to be raised).

Example #1: m = HashMap(53, hash_function_1) print(m.get('key1')) m.put('key1', 10) print(m.get('key1')) m.remove('key1') print(m.get('key1')) m.remove('key4')

Output: None 10 None

get_keys_and_values(self) -> DynamicArray: This method returns a dynamic array where each index contains a tuple of a key/value pair stored in the hash map. The order of the keys in the dynamic array does not matter.

Example #1: m = HashMap(11, hash_function_2) for i in range(1, 6): m.put(str(i), str(i * 10)) print(m.get_keys_and_values()) m.put('20', '200') m.remove('1') m.resize_table(2) print(m.get_keys_and_values())

Output: [('1', '10'), ('2', '20'), ('3', '30'), ('4', '40'), ('5', '50')] [('2', '20'), ('3', '30'), ('20', '200'), ('4', '40'), ('5', '50')]

find_mode(arr: DynamicArray) -> (DynamicArray, int): Write a standalone function outside of the HashMap class that receives a dynamic array (that is not guaranteed to be sorted).
This function will return a tuple containing, in this order, a dynamic array comprising the mode (most-occurring) value(s) of the array, and an integer that represents the highest frequency (how many times the mode value(s) appear). If there is more than one value with the highest frequency, all values at that frequency should be included in the array being returned (the order does not matter). If there is only one mode, the dynamic array will only contain that value.

You may assume that the input array will contain at least one element, and that all values stored in the array will be strings. You do not need to write checks for these conditions. For full credit, the function must be implemented with O(N) time complexity. For best results, we recommend using the separate chaining hash map provided for you in the function's skeleton code.

Example #1: da = DynamicArray(["apple", "apple", "grape", "melon", "peach"]) mode, frequency = find_mode(da) print(f"Input: {da}\nMode : {mode}, Frequency: {frequency}")

Output: Input: ['apple', 'apple', 'grape', 'melon', 'peach'] Mode : ['apple'], Frequency: 2

Example #2: test_cases = ( ["Arch", "Manjaro", "Manjaro", "Mint", "Mint", "Mint", "Ubuntu", "Ubuntu", "Ubuntu"], ["one", "two", "three", "four", "five"], ["2", "4", "2", "6", "8", "4", "1", "3", "4", "5", "7", "3", "3", "2"] ) for case in test_cases: da = DynamicArray(case) mode, frequency = find_mode(da) print(f"{da}\nMode : {mode}, Frequency: {frequency}\n")

Output: Input: ['Arch', 'Manjaro', 'Manjaro', 'Mint', 'Mint', 'Mint', 'Ubuntu', 'Ubuntu', 'Ubuntu'] Mode : ['Mint', 'Ubuntu'], Frequency: 3 Input: ['one', 'two', 'three', 'four', 'five'] Mode : ['one', 'four', 'two', 'five', 'three'], Frequency: 1 Input: ['2', '4', '2', '6', '8', '4', '1', '3', '4', '5', '7', '3', '3', '2'] Mode : ['2', '3', '4'], Frequency: 3

1. Implement the HashMap class by completing the provided skeleton code in the file hash_map_oa.py.
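As an illustration of the separate-chaining put() and resize-at-load >= 1.0 rule described earlier, here is a minimal Python sketch. It is not the assignment's implementation: plain Python lists stand in for the required DynamicArray and LinkedList classes (which the real assignment mandates, while forbidding built-ins), and the prime-capacity adjustment via _next_prime() is omitted.

```python
class ChainSketch:
    """Separate-chaining hash table sketch: one list of [key, value]
    pairs per bucket; doubles capacity when load factor reaches 1.0."""

    def __init__(self, capacity, hash_fn):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._hash = hash_fn

    def table_load(self):
        return self._size / len(self._buckets)

    def put(self, key, value):
        if self.table_load() >= 1.0:            # resize BEFORE inserting
            self._resize(2 * len(self._buckets))
        chain = self._buckets[self._hash(key) % len(self._buckets)]
        for pair in chain:
            if pair[0] == key:                  # existing key: replace value
                pair[1] = value
                return
        chain.append([key, value])              # new key/value pair
        self._size += 1

    def _resize(self, new_capacity):
        old = self._buckets
        self._buckets = [[] for _ in range(new_capacity)]
        self._size = 0
        for chain in old:                       # rehash every pair
            for key, value in chain:
                self.put(key, value)
```

The real methods differ in detail (prime capacities, the DynamicArray/LinkedList API), but the control flow — check load, locate chain, replace or append — is the core of put().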
Your implementation will include the following methods: put(), get(), remove(), contains_key(), clear(), empty_buckets(), resize_table(), table_load(), get_keys(), __iter__(), __next__().

2. Use a dynamic array to store your hash table, and implement Open Addressing with Quadratic Probing for collision resolution inside that dynamic array. Key/value pairs must be stored in the array. Refer to the Explorations for an example of this implementation.

3. Use the pre-written DynamicArray class in a6_include.py. You must use objects of this class in your HashMap class implementation. Use a DynamicArray object to store your Open Addressing hash table.

4. The provided DynamicArray class may provide different functionality than the one described in the lectures or implemented in prior homework assignments. Review the docstrings in the skeleton code to understand the available methods, their use, and input/output parameters.

5. The number of objects stored in the hash map will be between 0 and 1,000,000 inclusive.

6. Two pre-written hash functions are provided in the skeleton code. Make sure you test your code with both functions. We will use these two functions in our testing of your implementation.

7. RESTRICTIONS: You are NOT allowed to use ANY built-in Python data structures and/or their methods. You are NOT allowed to directly access any variables of the DynamicArray class. All work must be done only by using class methods.

8. Variables in the HashEntry class are not private. You ARE allowed to access and change their values directly. You do not need to write any getter or setter methods.

9. You may not use any imports beyond the ones included in the assignment source code.

put(self, key: str, value: object) -> None: This method updates the key/value pair in the hash map. If the given key already exists in the hash map, its associated value must be replaced with the new value.
If the given key is not in the hash map, a new key/value pair must be added. For this hash map implementation, the table must be resized to double its current capacity when this method is called and the current load factor of the table is greater than or equal to 0.5.

Example #1: m = HashMap(53, hash_function_1) for i in range(150): m.put('str' + str(i), i * 100) if i % 25 == 24: print(m.empty_buckets(), m.table_load(), m.get_size(), m.get_capacity())

Output: 28 0.47 25 53 57 0.47 50 107 148 0.34 75 223 123 0.45 100 223 324 0.28 125 449 299 0.33 150 449

Example #2: m = HashMap(41, hash_function_2) for i in range(50): m.put('str' + str(i // 3), i * 100) if i % 10 == 9: print(m.empty_buckets(), m.table_load(), m.get_size(), m.get_capacity())

Output: 37 0.1 4 41 34 0.17 7 41 31 0.24 10 41 27 0.34 14 41 24 0.41 17 41

table_load(self) -> float: This method returns the current hash table load factor.

Example #1: m = HashMap(101, hash_function_1) print(m.table_load()) m.put('key1', 10) print(m.table_load()) m.put('key2', 20) print(m.table_load()) m.put('key1', 30) print(m.table_load())

Output: 0.0 0.01 0.02 0.02

Example #2: m = HashMap(53, hash_function_1) for i in range(50): m.put('key' + str(i), i * 100) if i % 10 == 0: print(m.table_load(), m.get_size(), m.get_capacity())

Output: 0.02 1 53 0.21 11 53 0.4 21 53 0.29 31 107 0.38 41 107

empty_buckets(self) -> int: This method returns the number of empty buckets in the hash table.
Example #1: m = HashMap(101, hash_function_1) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key1', 10) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key2', 20) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key1', 30) print(m.empty_buckets(), m.get_size(), m.get_capacity()) m.put('key4', 40) print(m.empty_buckets(), m.get_size(), m.get_capacity())

Output: 101 0 101 100 1 101 99 2 101 99 2 101 98 3 101

Example #2: m = HashMap(53, hash_function_1) for i in range(150): m.put('key' + str(i), i * 100) if i % 30 == 0: print(m.empty_buckets(), m.get_size(), m.get_capacity())

Output: 52 1 53 76 31 107 162 61 223 132 91 223 328 121 449

resize_table(self, new_capacity: int) -> None: This method changes the capacity of the internal hash table. All existing key/value pairs must remain in the new hash map, and all hash table links must be rehashed. If new_capacity is less than the current number of elements in the hash map, the method does nothing. If new_capacity is valid, make sure it is a prime number; if not, change it to the next highest prime number.
You may use the methods _is_prime() and _next_prime() from the skeleton code.

Example #1: m = HashMap(23, hash_function_1) m.put('key1', 10) print(m.get_size(), m.get_capacity(), m.get('key1'), m.contains_key('key1')) m.resize_table(30) print(m.get_size(), m.get_capacity(), m.get('key1'), m.contains_key('key1'))

Output: 1 23 10 True 1 31 10 True

Example #2: m = HashMap(79, hash_function_2) keys = [i for i in range(1, 1000, 13)] for key in keys: m.put(str(key), key * 42) print(m.get_size(), m.get_capacity()) for capacity in range(111, 1000, 117): m.resize_table(capacity) m.put('some key', 'some value') result = m.contains_key('some key') m.remove('some key') for key in keys: result &= m.contains_key(str(key)) result &= not m.contains_key(str(key + 1)) print(capacity, result, m.get_size(), m.get_capacity(), round(m.table_load(), 2))

Output: 77 163 111 True 77 227 0.34 228 True 77 229 0.34 345 True 77 347 0.22 462 True 77 463 0.17 579 True 77 587 0.13 696 True 77 701 0.11 813 True 77 821 0.09 930 True 77 937 0.08

get(self, key: str) -> object: This method returns the value associated with the given key. If the key is not in the hash map, the method returns None.

Example #1: m = HashMap(31, hash_function_1) print(m.get('key')) m.put('key1', 10) print(m.get('key1'))

Output: None 10

Example #2: m = HashMap(151, hash_function_2) for i in range(200, 300, 7): m.put(str(i), i * 10) print(m.get_size(), m.get_capacity()) for i in range(200, 300, 21): print(i, m.get(str(i)), m.get(str(i)) == i * 10) print(i + 1, m.get(str(i + 1)), m.get(str(i + 1)) == (i + 1) * 10)

Output: 15 151 200 2000 True 201 None False 221 2210 True 222 None False 242 2420 True 243 None False 263 2630 True 264 None False 284 2840 True 285 None False

contains_key(self, key: str) -> bool: This method returns True if the given key is in the hash map, otherwise it returns False.
An empty hash map does not contain any keys.

Example #1: m = HashMap(53, hash_function_1) print(m.contains_key('key1')) m.put('key1', 10) m.put('key2', 20) m.put('key3', 30) print(m.contains_key('key1')) print(m.contains_key('key4')) print(m.contains_key('key2')) print(m.contains_key('key3')) m.remove('key3') print(m.contains_key('key3'))

Output: False True False True True False

Example #2: m = HashMap(79, hash_function_2) keys = [i for i in range(1, 1000, 20)] for key in keys: m.put(str(key), key * 42) print(m.get_size(), m.get_capacity()) result = True for key in keys: # all inserted keys must be present result &= m.contains_key(str(key)) # NOT inserted keys must be absent result &= not m.contains_key(str(key + 1)) print(result)

Output: 50 163 True

remove(self, key: str) -> None: This method removes the given key and its associated value from the hash map. If the key is not in the hash map, the method does nothing (no exception needs to be raised).

Example #1: m = HashMap(53, hash_function_1) print(m.get('key1')) m.put('key1', 10) print(m.get('key1')) m.remove('key1') print(m.get('key1')) m.remove('key4')

Output: None 10 None

clear(self) -> None: This method clears the contents of the hash map. It does not change the underlying hash table capacity.
Example #1: m = HashMap(101, hash_function_1) print(m.get_size(), m.get_capacity()) m.put('key1', 10) m.put('key2', 20) m.put('key1', 30) print(m.get_size(), m.get_capacity()) m.clear() print(m.get_size(), m.get_capacity())

Output: 0 101 2 101 0 101

Example #2: m = HashMap(53, hash_function_1) print(m.get_size(), m.get_capacity()) m.put('key1', 10) print(m.get_size(), m.get_capacity()) m.put('key2', 20) print(m.get_size(), m.get_capacity()) m.resize_table(100) print(m.get_size(), m.get_capacity()) m.clear() print(m.get_size(), m.get_capacity())

Output: 0 53 1 53 2 53 2 101 0 101

get_keys_and_values(self) -> DynamicArray: This method returns a dynamic array where each index contains a tuple of a key/value pair stored in the hash map. The order of the keys in the dynamic array does not matter.

Example #1: m = HashMap(11, hash_function_2) for i in range(1, 6): m.put(str(i), str(i * 10)) print(m.get_keys_and_values()) m.resize_table(2) print(m.get_keys_and_values()) m.put('20', '200') m.remove('1') m.resize_table(12) print(m.get_keys_and_values())

Output: [('1', '10'), ('2', '20'), ('3', '30'), ('4', '40'), ('5', '50')] [('1', '10'), ('2', '20'), ('3', '30'), ('4', '40'), ('5', '50')] [('4', '40'), ('5', '50'), ('20', '200'), ('2', '20'), ('3', '30')]

__iter__(): This method enables the hash map to iterate across itself. Implement this method in a similar way to the example in the Exploration: Encapsulation and Iterators. You ARE permitted (and will need) to initialize a variable to track the iterator's progress through the hash map's contents. You can use either of the two models demonstrated in the Exploration — you can build the iterator functionality inside the HashMap class, or you can create a separate iterator class.
Example #1: m = HashMap(10, hash_function_1) for i in range(5): m.put(str(i), str(i * 10)) print(m) for item in m: print('K:', item.key, 'V:', item.value)

Output: 0: None 1: None 2: None 3: None 4: K: 0 V: 0 TS: False 5: K: 1 V: 10 TS: False 6: K: 2 V: 20 TS: False 7: K: 3 V: 30 TS: False 8: K: 4 V: 40 TS: False 9: None 10: None K: 0 V: 0 K: 1 V: 10 K: 2 V: 20 K: 3 V: 30 K: 4 V: 40

__next__(): This method will return the next item in the hash map, based on the current location of the iterator. Implement this method in a similar way to the example in the Exploration: Encapsulation and Iterators. It will need to iterate over active items only.

Example #2: m = HashMap(10, hash_function_2) for i in range(5): m.put(str(i), str(i * 24)) m.remove('0') m.remove('4') print(m) for item in m: print('K:', item.key, 'V:', item.value)

Output: 0: None 1: None 2: None 3: None 4: K: 0 V: 0 TS: True 5: K: 1 V: 24 TS: False 6: K: 2 V: 48 TS: False 7: K: 3 V: 72 TS: False 8: K: 4 V: 96 TS: True 9: None 10: None K: 1 V: 24 K: 2 V: 48 K: 3 V: 72
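The quadratic-probing lookup and the tombstone (TS) behavior shown in the examples above can be sketched as follows. This is an illustration only: a plain Python list stands in for the DynamicArray, and Entry/hash_fn are stand-ins for the skeleton's HashEntry class and hash functions. Probes visit index (initial + j^2) % capacity for j = 0, 1, 2, ...

```python
class Entry:
    """Stand-in for the skeleton's HashEntry: key, value, tombstone flag."""
    def __init__(self, key, value):
        self.key, self.value, self.is_tombstone = key, value, False

def probe_get(table, key, hash_fn):
    """Quadratic-probe lookup: stop at the first truly empty slot,
    but probe PAST tombstones, since a later slot may hold the key."""
    capacity = len(table)
    start = hash_fn(key) % capacity
    for j in range(capacity):
        slot = table[(start + j * j) % capacity]
        if slot is None:
            return None                      # empty slot: key cannot be present
        if not slot.is_tombstone and slot.key == key:
            return slot.value
    return None
```

This is why remove() sets a tombstone instead of emptying the slot: clearing it would break probe chains for keys inserted after a collision, exactly the situation the __next__() iterator must also skip over.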


CS 261 Programming Assignment 2

This assignment is comprised of 3 parts: Part 1: Implementation of Dynamic Array, Stack, and Bag First, complete the Worksheets 14 (Dynamic Array), 15 (Dynamic Array Amortized Execution Time Analysis), 16(Dynamic Array Stack), and 21 (Dynamic Array Bag). These worksheets will get you started on the implementations,but you will NOT turn them in. Next, complete the dynamic array and the dynamic array-based implementation of a stack and a bag indynamicArray.c. The comments for each function will help you understand what each function should do. We have provided the header file for this assignment, DO NOT change the provided header file (dynArray.h). You can test your implementation by using the code in testDynArray.c. This file contains several test cases for thefunctions in dynamicArray.c. Try to get all the test cases to pass. You should also write more test cases on your own,but do not submit testDynArray.c.   Part 2: Amortized Analysis of the Dynamic Array (written) Consider the push() operation for a Dynamic Array Stack. In the best case, the operation is O(1). This corresponds tothe case where there was room in the space we have already allocated for the array. However, in the worst case, thisoperation slows down to O(n). This corresponds to the case where the allocated space was full and we must copy eachelement of the array into a new (larger) array. This problem is designed to discover runtime bounds on the average casewhen various array expansion strategies are used, but first some information on how to perform an amortized analysisis necessary. 1. Each time an item is added to the array without requiring reallocation, count 1 unit of cost. This cost will coverthe assignment which actually puts the item in the array. 2. Each time an item is added and requires reallocation, count X + 1 units of cost, where X is the number of itemscurrently in the array. 
This cost will cover the X assignments which are necessary to copy the contents of thefull array into a new (larger) array, and the additional assignment to put the item which did not fit originally. To make this more concrete, if the array has 8 spaces and is holding 5 items, adding the sixth will cost 1. However, if the array has 8 spaces and is holding 8 items, adding the ninth will cost 9 (8 to move the existing items + 1 to assignthe ninth item once space is available). When we can bound an average cost of an operation in this fashion, but not bound the worst case execution time, wecall it amortized constant execution time, or average execution time. Amortized constant execution time is oftenwritten as O(1)+, the plus sign indicating it is not a guaranteed execution time bound. In a file called amortizedAnalysis.txt, please provide answers to the following questions: 1. How many cost units are spent in the entire process of performing 32 consecutive push operations on an empty array which starts out at capacity 8, assuming that the array will double in capacity each time a new item is added to an already full dynamic array? As N (ie. the number of pushes) grows large, under this strategy for resizing, what is the amortized complexity for a push? 2. How many cost units are spent in the entire process of performing 32 consecutive push operations on an emptyarray which starts out at capacity 8, assuming that the array will grow by a constant 2 spaces each time a new item is added to an already full dynamic array? As N (ie. the number of pushes) grows large, under this strategy for resizing, what is the amortized complexity for a push? 3. Suppose that a dynamic array stack doubles its capacity when it is full, and shrinks (on Pop only) its capacity by half when the array is half full or less. Can you devise a sequence of N push() and pop() operations which willresult in poor performance (O(N^2) total cost)? 
How might you adjust the array's shrinking policy to avoid this? (Hint: You may assume that the initial capacity of the array is N/2.)

Part 3: Application of the Stack – Checking balanced parentheses, braces, and brackets

Note: For this exercise you need to first make the following change in dynArray.h: Change #define T
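Since the cost model above is fully mechanical, you can sanity-check your hand counts for questions 1 and 2 with a few lines of code. A minimal sketch (Python here purely for illustration; the assignment's code is in C, and the function name is made up):

```python
def push_cost_total(n_pushes, capacity, grow):
    """Total cost units for n_pushes onto an initially empty array.

    Cost model from the assignment: an ordinary push costs 1 unit; a
    push into a full array costs (current size) + 1 units, covering
    the copy into the new array plus the new assignment.  `grow` maps
    an old capacity to the new one."""
    size = cost = 0
    for _ in range(n_pushes):
        if size == capacity:
            cost += size + 1          # copy `size` items, then place the new one
            capacity = grow(capacity)
        else:
            cost += 1
        size += 1
    return cost

doubling = push_cost_total(32, 8, lambda c: 2 * c)   # question 1's strategy
plus_two = push_cost_total(32, 8, lambda c: c + 2)   # question 2's strategy
```

The doubling strategy makes full-array copies geometrically rarer as N grows, which is the heart of the O(1)+ amortized argument; constant-increment growth does not.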


[SOLVED] Cmsc 430 project 4

The fourth project involves modifying the semantic analyzer for the attached compiler by adding checks for semantic errors. The static semantic rules of this language are the following: Variables and parameter names have local scope. The scope rules require that all names be declared and prohibit duplicate names within the same scope. The type correspondence rules are as follows:

- Boolean expressions cannot be used with arithmetic or relational operators.
- Arithmetic expressions cannot be used with logical operators.
- Reductions can only contain numeric types.
- Only integer operands can be used with the remainder operator.
- The two statements in an if statement must match in type. No coercion is performed.
- All the statements in a case statement must match in type. No coercion is performed.
- The type of the if expression must be Boolean. The type of the case expression must be Integer.
- A narrowing variable initialization or function return occurs when a real value is being forced into an integer. Widening is permitted.
- Boolean types cannot be mixed with numeric types in variable initializations or function returns.

Type coercion from an integer to a real type is performed within arithmetic expressions. You must make the following semantic checks.
Those highlighted in yellow are already performed by the code that you have been provided, although you must make minor modifications to account for the addition of real types, the need to perform type coercion, and the additional arithmetic and logical operators.

- Using Boolean Expressions with Arithmetic Operator
- Using Boolean Expressions with Relational Operator
- Using Arithmetic Expressions with Logical Operator
- Reductions containing nonnumeric types
- Remainder Operator Requires Integer Operands
- If-Then Type Mismatch
- Case Types Mismatch
- If Condition Not Boolean
- Case Expression Not Integer
- Narrowing Variable Initialization
- Variable Initialization Mismatch
- Undeclared Variable
- Duplicate Variable
- Narrowing Function Return

This project requires modification to the bison input file, so that it defines the additional semantic checks necessary to produce these errors, and addition of functions to the library of type-checking functions already provided in types.cc. You must also make some modifications to the functions provided. You need to add a check to the checkAssignment function for mismatched types in the case that Boolean and numeric types are mixed. You also need to add code to the checkArithmetic function to coerce integers to reals when the types are mixed, and the error message must be modified to indicate that numeric rather than only integer types are permitted. The provided code includes a template class Symbols that defines the symbol table. It already includes a check for undeclared identifiers. You need to add a check for duplicate identifiers.
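The coercion and mismatch rules above are implemented in C++ in types.cc; the following Python sketch (the function name and the error-message text are illustrative, not the real ones from the provided code) just shows the decision logic a checkArithmetic-style function needs:

```python
# Illustrative only: the real checks live in types.cc (C++).
INT, REAL, BOOL, ERROR = "Integer", "Real", "Boolean", "Error"

def check_arithmetic(left, right, errors):
    """Result type of an arithmetic expression.

    Integers are coerced to reals when the operand types are mixed;
    Booleans are rejected with a 'numeric types' message (the exact
    message text here is made up)."""
    if ERROR in (left, right):
        return ERROR          # don't cascade an earlier error
    if BOOL in (left, right):
        errors.append("Arithmetic Operator Requires Numeric Types")
        return ERROR
    return REAL if REAL in (left, right) else INT
```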
Like the lexical and syntax errors, the compiler should display the semantic errors in the compilation listing, after the line in which they occur. An example of compilation listing output containing semantic errors is shown below:

 1 — Test of Multiple Semantic Errors
 2
 3 function test a: integer returns integer;
 4 b: integer is
 5 if a + 5 then
 6 2;
 7 else
 8 5;
 9 endif;
Semantic Error, If Expression Must Be Boolean
10 c: real is 9.8 – 2 + 8;
11 d: boolean is 7 = f;
Semantic Error, Undeclared f
12 begin
13 case b is
14 when 1 => 4.5 + c;
15 when 2 => b;
Semantic Error, Case Types Mismatch
16 others => c;
17 endcase;
18 end;

Lexical Errors 0
Syntax Errors 0
Semantic Errors 3

You are to submit two files.

1. The first is a .zip file that contains all the source code for the project. The .zip file should contain the flex input file, which should be a .l file, the bison file, which should be a .y file, all .cc and .h files, and a makefile that builds the project.
2. The second is a Word document (PDF or RTF is also acceptable) that contains the documentation for the project, which should include the following:
   a. A discussion of how you approached the project
   b. A test plan that includes test cases that you have created, indicating what aspects of the program each one is testing, and a screen shot of your compiler run on that test case
   c.
A discussion of lessons learned from the project and any improvements that could be made

Grading Rubric

Functionality (Meets: 70 points / Does Not Meet: 0 points)
- Generates semantic error when a remainder operator has non-integer operands (10); does not (0)
- Generates semantic error when if and then types don't match (10); does not (0)
- Generates semantic error when case types don't match (10); does not (0)
- Generates semantic error when if condition is not Boolean (10); does not (0)
- Generates semantic error when case expression is not integer (10); does not (0)
- Generates semantic error on narrowing initialization (10); does not (0)
- Generates semantic error for duplicate variables (10); does not (0)

Test Cases (Meets: 15 points / Does Not Meet: 0 points)
- Includes test cases that test all type checking errors (10); does not (0)
- Includes test cases that test all symbol table errors (3); does not (0)
- Includes test case with multiple errors (2); does not (0)

Documentation (Meets: 15 points / Does Not Meet: 0 points)
- Discussion of approach included (5); not included (0)
- Lessons learned included (5); not included (0)
- Comment blocks with student name, project, date, and code description included in each file (5); not included (0)


[SOLVED] Cmsc 430 project 3

The third project involves modifying the attached interpreter so that it interprets programs for the complete language. You may convert all values to double values, although you can maintain their individual types if you wish. When the program is run on the command line, the parameters to the function should be supplied as command line arguments. For example, for the following function header of a program in the file test.txt:

function main a: integer, b: integer returns integer;

One would execute the program as follows:

$ ./compile < test.txt 2 4

In this case, the parameter a would be initialized to 2 and the parameter b to 4. An example of a program execution is shown below:

$ ./compile < test.txt 2 4
 1 function main a: integer, b: integer returns integer;
 2 c: integer is
 3 if a > b then
 4 a rem b;
 5 else
 6 a ** 2;
 7 endif;
 8 begin
 9 case a is
10 when 1 => c;
11 when 2 => (a + b / 2 – 4) * 3;
12 others => 4;
13 endcase;
14 end;
Compiled Successfully
Result = 0

After the compilation listing is output, the value of the expression which comprises the body of the function should be displayed as shown above. The existing code evaluates some of the arithmetic, relational, and logical operators, together with the reduction statement and integer literals only. You are to add the necessary code to include all of the following:

- Real and Boolean literals
- All additional arithmetic operators
- All additional relational and logical operators
- Both if and case statements
- Functions with multiple variables
- Functions with parameters

This project requires modification to the bison input file, so that it defines the necessary computations for the features added above. You will need to add functions to the library of evaluation functions already provided in values.cc. You must also make some modifications to the functions already provided. You are to submit two files.

1. The first is a .zip file that contains all the source code for the project.
The .zip file should contain the flex input file, which should be a .l file, the bison file, which should be a .y file, all .cc and .h files, and a makefile that builds the project.

2. The second is a Word document (PDF or RTF is also acceptable) that contains the documentation for the project, which should include the following:
   a. A discussion of how you approached the project
   b. A test plan that includes test cases that you have created, indicating what aspects of the program each one is testing, and a screen shot of your compiler run on that test case
   c. A discussion of lessons learned from the project and any improvements that could be made

Grading Rubric

Functionality (Meets: 70 points / Does Not Meet: 0 points)
- Functions with real and Boolean literals evaluated correctly (5); not evaluated correctly (0)
- Subtraction and division operators evaluated correctly (5); not (0)
- Remainder operator evaluated correctly (5); not (0)
- Exponentiation operator evaluated correctly (5); not (0)
- Additional relational operators evaluated correctly (5); not (0)
- Additional logical operators evaluated correctly (5); not (0)
- if conditional expressions evaluated correctly (10); not (0)
- case conditional expressions evaluated correctly (10); not (0)
- Functions with multiple variables evaluated correctly (10); not (0)
- Functions with parameters evaluated correctly (10); not (0)

Test Cases (Meets: 15 points / Does Not Meet: 0 points)
- Includes test cases that test real and Boolean literals (3); does not (0)
- Includes test cases that test all arithmetic operators (3); does not (0)
- Includes test cases that test all relational and logical operators (3); does not (0)
- Includes test cases that test both conditional expressions (3); does not (0)
- Includes test cases with variables and parameters (3); does not (0)

Documentation (Meets: 15 points / Does Not Meet: 0 points)
- Discussion of approach included (5); not included (0)
- Lessons learned included (5); not included (0)
- Comment blocks with student name, project, date, and code description included in each file (5); not included (0)
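If you do take the everything-is-a-double route that the project permits, the evaluation functions reduce to small numeric helpers. A hedged sketch in Python (operator spellings and function names are illustrative; the real work goes in values.cc):

```python
def eval_binary(op, left, right):
    """Apply a binary operator with every value held as a double,
    as the project explicitly permits."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "rem": lambda a, b: float(int(a) % int(b)),  # defined on integers
        "**": lambda a, b: a ** b,
        ">": lambda a, b: float(a > b),   # relational results as 1.0 / 0.0
        "=": lambda a, b: float(a == b),
    }
    return ops[op](left, right)

def eval_if(cond, then_val, else_val):
    """An if expression selects one of two (here pre-evaluated) arms."""
    return then_val if cond != 0.0 else else_val
```

With a = 2 and b = 4, composing these helpers on the case arm (a + b / 2 – 4) * 3 gives 0, matching the Result = 0 line in the sample run.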


[SOLVED] Cmsc 430 project 2

The second project involves modifying the syntactic analyzer for the attached compiler by adding to the existing grammar. The full grammar of the language is shown below. The highlighted portions of the grammar show what you must either modify or add to the existing grammar.

function: function_header {variable} body
function_header: FUNCTION IDENTIFIER [parameters] RETURNS type ;
variable: IDENTIFIER : type IS statement
parameters: parameter {, parameter}
parameter: IDENTIFIER : type
type: INTEGER | REAL | BOOLEAN
body: BEGIN statement END ;
statement: expression ; |
    REDUCE operator {statement} ENDREDUCE ; |
    IF expression THEN statement ELSE statement ENDIF ; |
    CASE expression IS {case} OTHERS ARROW statement ; ENDCASE ;
operator: ADDOP | MULOP
case: WHEN INT_LITERAL ARROW statement
expression: ( expression ) | expression binary_operator expression | NOT expression |
    INT_LITERAL | REAL_LITERAL | BOOL_LITERAL | IDENTIFIER
binary_operator: ADDOP | MULOP | REMOP | EXPOP | RELOP | ANDOP | OROP

In the above grammar, the red symbols are nonterminals, the blue symbols are terminals, and the black punctuation marks are EBNF metasymbols. The braces denote repetition 0 or more times and the brackets denote optional. You must rewrite the grammar to eliminate the EBNF brace and bracket metasymbols and to incorporate the significance of parentheses, operator precedence, and associativity for all operators. Among arithmetic operators the exponentiation operator has highest precedence, followed by the multiplying operators and then the adding operators. All relational operators have the same precedence. Among the binary logical operators, and has higher precedence than or. Of the categories of operators, the unary logical operator has the highest precedence, the arithmetic operators have the next highest precedence, followed by the relational operators and finally the binary logical operators. All operators except the exponentiation operator are left associative.
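One standard way to encode the stated precedence and associativity directly in the productions, since precedence directives are off limits, is to stratify the expression grammar with one nonterminal per precedence level. A sketch with hypothetical nonterminal names (where REMOP belongs is an assumption; it is grouped with the multiplying operators here):

```
expression : expression OROP conjunction | conjunction ;
conjunction: conjunction ANDOP relation | relation ;
relation   : relation RELOP arith | arith ;
arith      : arith ADDOP term | term ;
term       : term mulop exponent | exponent ;
mulop      : MULOP | REMOP ;
exponent   : unary EXPOP exponent | unary ;    /* right recursion */
unary      : NOT unary | primary ;
primary    : '(' expression ')' | INT_LITERAL | REAL_LITERAL |
             BOOL_LITERAL | IDENTIFIER ;
```

Left recursion makes each level left associative, the right recursion in the exponent rule makes EXPOP right associative, and placing NOT at the bottom gives the unary operator the highest precedence, as the rules require.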
The directives to specify precedence and associativity, such as %prec and %left, may not be used. Your parser should be able to correctly parse any syntactically correct program without any problem. You must modify the syntactic analyzer to detect and recover from additional syntax errors using the semicolon as the synchronization token. To detect the additional errors, an error production must be added to the function header and another to the variable declaration. Your bison input file should not produce any shift/reduce or reduce/reduce errors. Eliminating them can be difficult, so the best strategy is not to introduce any. That is best achieved by making small incremental additions to the grammar and ensuring that no addition introduces any such errors.

An example of compilation listing output containing syntax errors is shown below:

 1 — Multiple errors
 2
 3 function main a integer returns real;
Syntax Error, Unexpected INTEGER, expecting ':'
 4 b: integer is * 2;
Syntax Error, Unexpected MULOP
 5 c: real is 6.0;
 6 begin
 7 if a > c then
 8 b 3.0;
Syntax Error, Unexpected REAL_LITERAL, expecting ';'
 9 else
10 b = 4.;
11 endif;
12 ;
Syntax Error, Unexpected ';', expecting END

Lexical Errors 0
Syntax Errors 4
Semantic Errors 0

You are to submit two files.

1. The first is a .zip file that contains all the source code for the project. The .zip file should contain the flex input file, which should be a .l file, the bison file, which should be a .y file, all .cc and .h files, and a makefile that builds the project.
2. The second is a Word document (PDF or RTF is also acceptable) that contains the documentation for the project, which should include the following:
   a. A discussion of how you approached the project
   b. A test plan that includes test cases that you have created, indicating what aspects of the program each one is testing, and a screen shot of your compiler run on that test case
   c.
A discussion of lessons learned from the project and any improvements that could be made

Grading Rubric

Functionality (Meets: 70 points / Does Not Meet: 0 points)
- Parses all syntactically correct programs (25); does not (0)
- Productions correctly implement precedence and associativity (10); do not (0)
- Grammar contains no shift/reduce or reduce/reduce errors (5); contains such errors (0)
- Detects and recovers from all programs with single syntax errors (20); does not (0)
- Detects and recovers from a program with multiple syntax errors (10); does not (0)

Test Cases (Meets: 15 points / Does Not Meet: 0 points)
- Includes test cases that test all grammar productions (6); does not (0)
- Includes test cases that test errors in all productions (6); does not (0)
- Includes test case with multiple errors (3); does not (0)

Documentation (Meets: 15 points / Does Not Meet: 0 points)
- Discussion of approach included (5); not included (0)
- Lessons learned included (5); not included (0)
- Comment blocks with student name, project, date, and code description included in each file (5); not included (0)


[SOLVED] Cmsc 430 project 1

The first project involves modifying the attached lexical analyzer and the compilation listing generator code. You need to make the following modifications to the lexical analyzer, scanner.l:

1. A new token ARROW should be added for the two-character punctuation symbol =>.
2. The following reserved words should be added: case, else, endcase, endif, if, others, real, then, when. Each reserved word should be a separate token. The token name should be the same as the lexeme, but in all upper case.
3. Two additional logical operators should be added. The lexeme for the first should be or and its token should be OROP. The second logical operator added should be not and its token should be NOTOP.
4. Five relational operators should be added. They are =, /=, >, >= and …

… 6 and 8 = 5 * (7 – 4);
 6 end;
Compiled Successfully

Here is the required output for a program that contains more than one lexical error on the same line:

 1 — Function with two lexical errors
 2
 3 function test2 returns integer;
 4 begin
 5 7 $ 2 ^ (2 + 4);
Lexical Error, Invalid Character $
Lexical Error, Invalid Character ^
 6 end;

Lexical Errors 2
Syntax Errors 0
Semantic Errors 0

You are to submit two files.

1. The first is a .zip file that contains all the source code for the project. The .zip file should contain the flex input file, which should be a .l file, all .cc and .h files, and a makefile that builds the project.
2. The second is a Word document (PDF or RTF is also acceptable) that contains the documentation for the project, which should include the following:
   a. A discussion of how you approached the project
   b. A test plan that includes test cases that you have created, indicating what aspects of the program each one is testing, and a screen shot of your compiler run on that test case
   c.
A discussion of lessons learned from the project and any improvements that could be made

Grading Rubric

Functionality (Meets: 70 points / Does Not Meet: 0 points)
- Defines new comment lexeme (5); does not (0)
- Correctly modifies identifier definition to include underscores (5); does not (0)
- Adds real and Boolean tokens (5); does not (0)
- Defines additional logical operators (5); does not (0)
- Defines additional relational operators (5); does not (0)
- Defines additional arithmetic operators (5); does not (0)
- Defines additional reserved words and arrow symbol (5); does not (0)
- Adds new tokens to the token header file (5); does not (0)
- Implements modifications to display multiple errors on the same line (15); does not (0)
- Implements modifications to count and display each type of compilation error (15); does not (0)

Test Cases (Meets: 15 points / Does Not Meet: 0 points)
- Includes test case containing all lexemes (5); does not (0)
- Includes test case with multiple errors on one line (5); does not (0)
- Includes test case with no errors (5); does not (0)

Documentation (Meets: 15 points / Does Not Meet: 0 points)
- Discussion of approach included (5); not included (0)
- Lessons learned included (5); not included (0)
- Comment blocks with student name, project, date, and code description included in each file (5); not included (0)
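For orientation, new single-token rules in a flex file follow the usual pattern-action shape. A hedged sketch (the token names come from the assignment text; the surrounding scanner.l layout, and the choice to fold relational lexemes into one shared RELOP token, are assumptions):

```
"=>"    { return ARROW; }
if      { return IF; }
when    { return WHEN; }
or      { return OROP; }
not     { return NOTOP; }
">="    { return RELOP; }
"/="    { return RELOP; }
```

Keyword rules like these should appear before the general identifier rule, so that flex's tie-breaking (longest match, then earliest rule) resolves in their favor.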


[SOLVED] Statistics 215b assignment 5

Math stats. Work the following exercises in Efron (2010): 1.1, 1.2, 1.4, 1.5.

Simulation. Produce your own version of Table 1.2 in Efron (2010) by repeating the simulation study described on pp. 7–9. Use the same μ_i's as Efron. Explain how many decimal places of agreement one would expect to see between your results and Efron's. How well did you meet this expectation?

Shrinking radon. The file srrs2.dat contains 12,777 observed radon levels from households throughout the United States. This data file comes from Andrew Gelman's website, http://www.stat.columbia.edu/~gelman/arm/software/. We will focus on the 766 measurements taken in the basements of the Minnesota homes. These homes are spread across 85 counties in Minnesota; the data set tells us which observations came from which counties.

- Load the data into R. Extract the subset of observations taken in Minnesota basements. Although there is a basement variable, you should instead use the floor variable (a zero value means a basement). (Don't ask.)
- Reduce the data set further: keep only the data for counties with at least 10 observations. You should find 17 such counties, with a total of 511 observations.
- Now split the data into two sets: a training set with five randomly chosen observations from each county, and a test set with the other observations.
- Compute μ, the vector of mean radon levels by county in the test data. Radon levels are given in the variable activity. From now on we will treat μ as a population-level parameter to be estimated. Make the standard James-Stein independent-normals assumption: the five observations in county i are iid draws from a N(μ_i, σ²) distribution; these five draws are independent of the draws from every other county. Compute μ̂(MLE), the maximum-likelihood estimate of μ based on the training data.
- Now compute μ̂(JS), the James-Stein estimator, using the average value in μ̂(MLE) as the shrinkage target. We are assuming that the components of μ̂(MLE) share a common SE.
Using the same number of observations in each county tends to aid this assumption. To estimate this shared SE, you must estimate σ², using the pooled-variance technique: add up all the within-county squared residuals, and divide by the total degrees of freedom. Caution: The SE of μ̂_i(MLE) is not σ. If you proceed as though it is, you will over-shrink.

- What is the total squared error of μ̂(MLE)? Of μ̂(JS)? What is the ratio of the larger to the smaller? What do you conclude about Stein shrinkage in this application?
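The assignment's computations belong in R, but the estimator itself is small enough to pin down in a few lines. A Python sketch, assuming equal counts per county and shrinkage toward the grand mean (the k - 3 factor is the usual adjustment when the shrinkage target is itself estimated; the function names are mine):

```python
def pooled_sigma2(groups):
    """Pooled within-group variance: total within-group squared
    residuals divided by the total degrees of freedom (n - #groups)."""
    ss = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    dof = sum(len(g) for g in groups) - len(groups)
    return ss / dof

def james_stein(mle, se2):
    """Shrink per-county means toward their grand mean.

    mle: the per-county sample means (the MLE); se2: the shared
    squared SE of one such mean, i.e. sigma^2 / (obs per county)."""
    k = len(mle)
    grand = sum(mle) / k
    ss = sum((x - grand) ** 2 for x in mle)
    shrink = 1.0 - (k - 3) * se2 / ss
    return [grand + shrink * (x - grand) for x in mle]
```

Note the caution above: the se2 argument is σ² divided by the five training observations per county, not σ² itself; passing σ² directly over-shrinks.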


[SOLVED] Statistics 215b assignment 4

For this assignment you will use simulation to study the performance of OLS and IVLS coefficient estimators, as well as two different estimators of the error variance. Consider the model

Y_i = X_i β + ε_i,          (1)
X_i = U_i + 2 V_i + δ_i.    (2)

Everything in sight is scalar. The vectors (U_i, V_i, ε_i, δ_i), i = 1, ..., n, are iid across i. Each vector is normal with mean zero. For each i we take the three random objects (1) U_i, (2) V_i, and (3) (ε_i, δ_i) to be mutually independent; furthermore,

Var(U_i) = Var(V_i) = 1,  Var(ε_i) = Var(δ_i) = σ²,  and  Cov(ε_i, δ_i) = ρ.

In (1), the variable X_i is endogenous (explain why), and (U_i, V_i) are instruments (explain why). Suppose β = 3, σ² = 1, and ρ = 3/4. For each of 1,000 simulation runs, generate n = 100 independent realizations of the vector (U_i, V_i, ε_i, δ_i, X_i, Y_i) according to (1) and (2). Use OLS to obtain the estimate β̂_OLS, and IVLS to obtain β̂_IVLS. Plot the histogram for each estimator; report the mean, SD, and RMSE in each case. What are the relative merits of OLS versus IVLS?

For each simulation, estimate the error variance σ² in two ways: first using the residuals obtained from plugging β̂_IVLS into (1), then using the residuals from the transformed equation

(Z′Z)^(−1/2) Z′Y = (Z′Z)^(−1/2) Z′X β + ε*.

Here Z is the n × 2 matrix of instruments, X is the n × 1 design matrix, and Y is the n × 1 vector of responses. What is the appropriate denominator in each case? Plot the histograms for the two estimators and report sample means and SDs. Comment briefly.
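A minimal self-contained sketch of the simulation loop, in Python rather than the R you may prefer (building δ_i as ρ·ε_i plus independent noise is one standard way to get the required covariance; all names are mine):

```python
import math
import random

def simulate(n=100, beta=3.0, rho=0.75, seed=0):
    """One run of the model: Y = X*beta + eps, X = U + 2V + delta,
    with Var(eps) = Var(delta) = 1 and Cov(eps, delta) = rho."""
    rng = random.Random(seed)
    U = [rng.gauss(0, 1) for _ in range(n)]
    V = [rng.gauss(0, 1) for _ in range(n)]
    eps = [rng.gauss(0, 1) for _ in range(n)]
    delta = [rho * e + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for e in eps]
    X = [u + 2 * v + d for u, v, d in zip(U, V, delta)]
    Y = [beta * x + e for x, e in zip(X, eps)]
    return U, V, X, Y

def ols(X, Y):
    """No-intercept OLS slope (every variable has mean zero)."""
    return sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)

def ivls(U, V, X, Y):
    """Two-stage least squares with instruments (U, V): project X on
    the instruments via the 2x2 normal equations, then regress Y on
    the fitted values."""
    suu = sum(u * u for u in U)
    svv = sum(v * v for v in V)
    suv = sum(u * v for u, v in zip(U, V))
    sux = sum(u * x for u, x in zip(U, X))
    svx = sum(v * x for v, x in zip(V, X))
    det = suu * svv - suv * suv
    a = (svv * sux - suv * svx) / det
    b = (suu * svx - suv * sux) / det
    xhat = [a * u + b * v for u, v in zip(U, V)]
    return sum(xh * y for xh, y in zip(xhat, Y)) / sum(xh * x for xh, x in zip(xhat, X))
```

Across runs, β̂_OLS concentrates near β + Cov(X, ε)/Var(X) = 3 + 0.75/6 = 3.125, while β̂_IVLS concentrates near 3; that contrast is what the histograms should show.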


[SOLVED] Statistics 215b assignment 3

The Police Foundation and the Metro-Dade Police Department ran a big, complicated field experiment. The execution was excellent. The data analysis in Pate and Hamilton (1992) (hereafter PH), on the other hand, is a let-down. Randomization does not justify logistic regression, as we have seen. Your mission: carry out an analysis of the Dade County experimental data that is justified by the randomization.

A simple and effective approach is to compare rates of recidivism (that is, repeat spousal abuse) between the treatment group, who were assigned to arrest, and the control group, who were not. Also of interest is the same rate comparison in each of two subgroups: unemployed subjects and employed subjects. To pull off these comparisons, you need the numbers in the following table:

              no_arrest   arrest
unemployed    n00/N00     n01/N01     n0./N0.
employed      n10/N10     n11/N11     n1./N1.
              n.0/N.0     n.1/N.1     n/N

The N's are total subject counts, while the n's tally the corresponding number of recidivist subjects. The dots denote summation over an index; for example, n.0 = n00 + n10. You might expect to find these various counts in an appendix to the paper. But you will not. The only count directly reported in PH is N = 907, the total number of randomized subjects.

To figure out the N's, use the data file part6_907.txt. This file is derived from a dataset available on the National Institute of Justice website. It has one row per subject.¹ With the help of its columns, you can cross-tabulate the total subject counts by employment and assigned-treatment status. In addition to the data file, you have been given an excerpt from the experiment's "Codebook," explaining what the columns are and how the numerical codes are interpreted.

- Compute all the N's, including the margins.
- PH report the rate of unemployment among the subjects. Does your rate agree with theirs?

If the data file had a column for recidivism, you'd figure out the n's in the same way.
Alas, recidivism outcomes appear in a separate file in the dataset, "Part 4". Why don't we just match up the records in these two files? A quote from the Codebook:

"Each of the six data files contain at least one variable to identify cases. However, one common case identification variable is not present across all files. At the time of this release, ICPSR had not been successful in linking all files."

Welcome to applied statistics. As luck would have it, PH provide enough information to recover the n's. See their Figure 1.

- Compute all the n's, including the margins.
- In a part of the paper separate from Figure 1 and its discussion, PH report the rate of recidivism among arrestees and among non-arrestees. Do your rates agree with theirs?

¹ The original file had 916 rows, for reasons of little interest. Applying an imputation procedure in reverse, this has been reduced to 907.

Statistical work. PH draw several conclusions from their logistic-regression analyses:

- "Among employed suspects, arrest had a statistically significant deterrent effect on the occurrence of a subsequent assault."
- "Among unemployed suspects. . . significant increases in subsequent assault were associated with arrest."
- "[Among all suspects, there is] no statistically significant effect of arrest on the occurrence of a subsequent spouse assault."

Evaluate each of these conclusions in turn, by comparing the relevant observed rates in your hard-won counts table. Report p-values justified by the randomization that took place. Applicable methods include Fisher's exact test and the two-sample test of equal binomial proportions. One of the assumptions underlying Fisher's exact test: the total number of observed recidivists (overall, and in each employment-status subgroup) would not change if there had been a different randomization outcome in the Dade County experiment.
Discuss whether this assumption is compatible with the Neyman model of the experiment. On the other hand, the binomial test as rendered in textbooks concerns independent Bernoulli trials. We are not thinking of recidivism outcomes as random coin flips (unlike PH). Instead, the treatment assignment is what's random. How then can the randomization justify the textbook p-value?
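Fisher's exact test is easy to compute from scratch once the 2×2 counts are in hand. A Python sketch (two-sided, by the common "sum all tables no more probable than the observed one" convention, which is one of several accepted two-sided definitions):

```python
from math import comb

def fisher_exact_p(n11, n12, n21, n22):
    """Two-sided Fisher exact p-value for a 2x2 table: with all
    margins fixed, sum the hypergeometric probabilities of every
    table that is no more probable than the observed one."""
    r1, r2 = n11 + n12, n21 + n22
    c1, n = n11 + n21, n11 + n12 + n21 + n22

    def prob(k):  # P(top-left cell = k) under the hypergeometric null
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = prob(n11)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs + 1e-12)
```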


[SOLVED] Statistics 215b assignment 2

Define the hazard function for any disease as

H(t) = f(t) / (1 − F(t)),

where F(t) and f(t) are the distribution function and density, respectively, for the time to first onset. Thus

H(t) = lim_{Δ→0} (1/Δ) P(get disease during time t to t + Δ | never had disease up to time t).

What is the hazard function if time to first onset is modelled with the exponential distribution? Is this a reasonable model? Explain. One alternative is to use the Weibull distribution, which has cumulative distribution function

G_{α,β}(u) = 1 − exp{−(u/α)^β},  u > 0, α > 0, β > 0.

What is the hazard function if time to failure has distribution G_{α,β}?

Survival curves. Consider the following stylized model of a study to test the effectiveness of a new surgical procedure. One thousand individuals are enrolled into the study, half receiving the surgery and half serving as the control group (no surgery). The time to death by any cause is measured in years, up to a maximum of five years, at which time the study ends. The ith participant in the study carries two random values: time to death if assigned to receive the surgery, X_i, and time to death if assigned to the control group, Y_i. Assume that {(X_i, Y_i) : i = 1, 2, ..., N} are drawn independently, with X_i and Y_i having distributions G_{3,2} and G_{2,2}, respectively, as defined in part one. Two survival functions can be defined as

S_X(t) = P(individual i lives past time t | individual i receives surgery) = P(X_i > t)

and

S_Y(t) = P(individual i lives past time t | individual i is in control group) = P(Y_i > t).

1. Write an R function to simulate draws of a G_{α,β} random variable. Your function should take as input a sample size n as well as α and β. It should return a length-n vector of independent realizations of G_{α,β}. Your function must not use any of R's random sampling routines apart from runif.
2. Write an R function that creates a Kaplan-Meier survival-function estimate. The input is two length-n vectors. The first contains event times.
The second vector, parallel to the first, contains logical values: TRUE for observed deaths, FALSE for censoring events. The return value should be the estimated survival function: an R function which, when evaluated at t, returns the Kaplan-Meier estimate of surviving past t. Your function must not use anything from the survival package or any similar package: the requirement is for you to build a Kaplan-Meier estimate "from scratch". If you are in any doubt about whether a supporting function is permissible, check with the course staff.

3. Simulate the performance of the clinical trial by drawing a sample of 500 failure times from G_{3,2} and 500 from G_{2,2}. Estimate S_X and S_Y using Kaplan-Meier. Graphically compare these estimates to the true curves. (Recall that the study has a length of five years.) Note: do not be surprised by low survival rates at the five-year horizon. The people in this study are very sick.

4. The simulation in (3) did not include the possibility of censoring. Assume that for individual i there is another random variable, denoted Z_i, that gives the time at which i will be censored, if he lives that long. Consider the case where the Z_i are i.i.d. exponential random variables with mean 10, chosen independently of X_i and Y_i. Simulate censoring times under this scenario and create new Kaplan-Meier survival curves. Compare these with the true survival curves.

5. Repeat the previous exercise, but now suppose the Z_i are independent exponential random variables whose mean depends on the individual's time of death. If the time of death is less than two years, the mean of the distribution of Z_i is 10; otherwise, the mean is 5. This could arise in a study where the sicker patients are more likely to remain under the care of their doctors. Discuss the results. What is the key difference between this censoring scenario and the previous one?
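Both required R functions have short language-neutral skeletons. A Python sketch, assuming the inverse-CDF method for the Weibull draws (which satisfies the runif-only restriction, since it consumes nothing but uniforms) and the product-limit formula for Kaplan-Meier; all names are mine:

```python
import math
import random

def rweibull(n, alpha, beta, seed=0):
    """Inverse-transform Weibull(alpha, beta) draws built from nothing
    but uniforms: if p ~ U(0,1), then alpha * (-log(1 - p))**(1/beta)
    has CDF G_{alpha,beta}(u) = 1 - exp(-(u/alpha)**beta)."""
    rng = random.Random(seed)
    return [alpha * (-math.log(1.0 - rng.random())) ** (1.0 / beta)
            for _ in range(n)]

def kaplan_meier(times, observed):
    """Product-limit survival estimate 'from scratch'.

    times: event times; observed: True for a death, False for a
    censoring event.  Returns a step function S with
    S(t) = estimated P(survive past t)."""
    events = sorted(zip(times, observed))
    steps = []                        # (time, survival just after time)
    s, at_risk = 1.0, len(events)
    i = 0
    while i < len(events):
        t = events[i][0]
        n_t, deaths = at_risk, 0
        while i < len(events) and events[i][0] == t:
            deaths += events[i][1]    # censored rows leave the risk set
            at_risk -= 1              # but contribute no death
            i += 1
        if deaths:
            s *= 1.0 - deaths / n_t
            steps.append((t, s))

    def S(t):
        val = 1.0
        for time, sv in steps:
            if time <= t:
                val = sv
            else:
                break
        return val
    return S
```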


[SOLVED] Statistics 215b assignment 1

Overview

This assignment has two goals: to exercise your skills in using R for data analysis, and to recall basic ideas from descriptive statistics, visualization, hypothesis testing, and multiple linear regression. Your job in this assignment is to investigate the connection between maternal smoking and infant health, using data. You will accomplish this by working through a guided analysis, detailed below. This case study is adapted from Chapter 10 of Nolan and Speed (2000), but the presentation here is self-contained. Please read the entire assignment before you begin your work.

Maternal smoking and infant health

Nolan and Speed (2000) present the following quotation from the 1989 Report of the Surgeon General:

. . . cigarette smoking seems to be a more significant determinant of birth weight than the mother’s pre-pregnancy height, weight, parity, payment status, or history of previous pregnancy outcome, or the infant’s sex. The reduction in birth-weight associated with maternal tobacco use seems to be a direct effect of smoking on fetal growth. Mothers who smoke also have increased rates of premature delivery.

(“Parity” refers to whether or not a pregnant woman has previously given birth. “Payment status” has to do with the type of the mother’s pre-natal health insurance.) We can isolate two claims:

1. Mothers who smoke deliver premature babies more often than mothers who do not.
2. Cigarette smoking has a stronger relationship to infant birth weight than several other relevant covariates.

At the risk of stating the obvious, premature delivery and small, underweight newborns are bad things. The first step in deciding whether maternal smoking causes these bad outcomes is to figure out whether maternal smoking is associated with them; that association is the content of these claims. You will study the claims in turn.
The dataset forming the basis of your analysis is (a subset of) the Child Health and Development Studies (CHDS), a large survey on all babies born between 1960 and 1967 at the Kaiser Foundation Hospital in Oakland, California. On the course website is the file babies.data. It contains observations (rows) for 1236 live male births. The variables recorded for each birth are given in the following table:

Name       Description
bwt        Newborn weight (rounded to the nearest ounce)
gestation  Length of the pregnancy (days)
parity     Whether the baby is (1) or is not (0) the first-born
age        Age of the mother at conception (years)
height     Mother’s height (inches)
weight     Mother’s weight (pounds)
smoke      Whether the mother smokes (1) or not (0)

What to submit

Write a report which addresses your findings about the claims. Summarize each claim in your own words, as you understand it. For each claim, outline why your analysis of the data ought to be informative, explain the practical meaning of the possible analysis outcomes, report what outcome you obtained, and describe your conclusions. Some specific guidelines appear in subsequent sections of this document. Refer to figures and tables obtained from your R session whenever it seems helpful. Please remember to give every figure a title, axis labels with units, and (where appropriate) a legend. I strongly encourage you to install and use the R package ggplot2 to make your figures; once you learn how to use it, many otherwise difficult graphical tasks become simple one-line commands.

The report should be long enough to convey what you understood about the content of the claims, and how strong a case is made for or against them by this data. The report should be no longer than that. The report should be written using LaTeX and submitted in PDF format. Your submission should include three files:

1. a file assignment1.pdf containing your report;
2. a file assignment1.R containing all the R commands you used for your analyses;
3.
a file assignment1-transcript.Rt containing a transcript of an R session in which assignment1.R has been run without errors.

Please submit these materials through the course website before the due date.

Preparing the data

• Download the data file from the website and load it into R, as a data frame named babies.
• The variables gestation, age, height, weight, and smoke all have some missing values. The code for a missing value is not exactly the same across the variables. Figure out the missingness code for each variable, then replace all occurrences of the missingness code with R’s missing value code, NA.
• Some of the variables in the dataset are actually categorical, but are coded numerically. Convert these variables from numeric vectors to factors in the babies data frame, with appropriately named levels. Confirm the conversion worked by inspecting a summary of the data frame.
• Look at a small number of other descriptive statistics or graphics that might be helpful in getting an initial feel for the data.

Analyzing claim 1: guidelines

Claim 1 states: mothers who smoke deliver premature babies more often than mothers who do not. A full-term pregnancy is defined by the medical community as lasting 40 weeks. A premature birth is defined as occurring prior to the 37th week of gestation.

1. Make one or more suitable graphical comparisons of the gestation distribution for smoking mothers to the gestation distribution for non-smoking mothers.
2. Add to the babies data frame a two-level factor variable indicating whether or not each baby was born prematurely. Use this factor and the factor smoke to carry out a relevant tabular comparison of distributions.
3. Make a figure which allows the comparison in the previous bullet point to be carried out visually.
4. Use the same table to carry out one or more hypothesis tests of the null hypothesis that smoking and non-smoking mothers have the same rate of premature delivery.
5.
A related question is whether the overall average gestation time is shorter for smoking mothers, compared to non-smoking mothers. Conduct one or more appropriate hypothesis tests.
6. If there are other statistics, tables, figures, tests, or analyses that seem useful or important to you in assessing claim 1, produce them and report on them.

Analyzing claim 2: guidelines

Claim 2 states: cigarette smoking has a stronger relationship to infant birth weight than several other relevant covariates. The only other covariates available in the data for us to check are parity, age, height, and weight.

1. Compare the difference in the average birth-weight between smoking and non-smoking mothers to the difference in the average birth-weight between first-borns and non-first-borns. Conduct suitable hypothesis tests to accompany the comparison.
2. Divide the mothers into “tall” (above median height in the data) and “short” (below median height in the data). Repeat the comparison of the previous bullet point for babies born to tall versus short women (rather than for first-borns versus non-first-borns).
3. Do the same again, for mothers who are “heavy” (above median weight) and “light” (below median weight).
4. Make a multi-panel figure which allows the comparisons of the previous three bullet points to be carried out visually for whole distributions, rather than averages. Put the y-axes across the panels in exactly the same range, to ease visual comparison.
5. Fit a multiple linear regression of birth-weight against height, weight, and parity (but not smoking status). Summarize and check the fit.
6. Fit a second regression like the previous bullet point, but including smoking status. Compare the two regression models informally and formally. Interpret the results of the comparison.
7. What are the pros and cons of the multiple-regression approach, as compared to the univariate comparisons you carried out initially?
8.
If there are other statistics, tables, figures, tests, or analyses that seem useful or important to you in assessing claim 2, produce them and report on them.
9. (EXTRA CREDIT) Use the plotting package ggplot2 to produce a single multi-panel figure which does the following: for each bin created in a three-way classification by ⟨tall/short, heavy/light, parity⟩, visually compare the birth-weight distribution of smokers versus non-smokers. Create the figure using a single R expression that involves only ggplot2 functions. What advantages does this comparison have over the linear regression approach?

References

Deborah Nolan and Terry Speed. Stat Labs: Mathematical Statistics through Applications. Springer Texts in Statistics. Springer, 2000.
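For the hypothesis test about premature-delivery rates in claim 1, a Pearson chi-square test on the 2×2 table of smoking status versus prematurity is one natural choice. A minimal sketch of the statistic (in Python for illustration; the counts in the usage are made-up placeholders and `chi_square_2x2` is an invented name, not part of the assignment):

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table.

    table = [[a, b], [c, d]], e.g. rows = smoker / non-smoker,
    columns = premature / full-term.
    """
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total   # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    return stat  # compare against a chi-square distribution with 1 df
```

For example, a table whose rows are exactly proportional gives a statistic of 0 (no evidence against independence), while at the 5% level the statistic would be compared to the critical value 3.84.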


[SOLVED] Cs6250 project 5 bgp hijacking attacks

In this project, using an interactive Mininet demo [1], we will explore some of the vulnerabilities of the Border Gateway Protocol (BGP). In particular, we will see how BGP is vulnerable to abuse and manipulation through a class of attacks called BGP hijacking attacks. A malicious Autonomous System (AS) can mount these attacks through false BGP announcements from a rogue AS, causing victim ASes to route their traffic bound for another AS through the malicious AS. This attack succeeds because the false advertisement exploits BGP routing behavior by advertising a shorter path to reach a particular prefix, which causes victim ASes to attempt to use the newly advertised (and seemingly better!) route.

A. Browse this paper as a reference for subsequent tasks and for some important background on prefix hijack attacks.
B. Refer to this resource on configuring a BGP router with Quagga.
C. Check out the following example configurations: Example 1 and Example 2.
D. Project Intro Presentation Video Link and Slides from CS6250 in Spring 2019 (where it was Project 7).

The demo creates the network topology shown below, consisting of four ASes and their peering relationships. AS4 is the malicious AS that will mount the attack. Once again, we will be simulating this network in Mininet; however, there are some important distinctions from our previous projects. In this setup, each container is not a host but an entire autonomous system. Each AS runs a routing daemon (quagga), communicates with other ASes using BGP (bgpd), and configures its own isolated set of routing entries in the kernel (zebra). Each AS has an IP address, which is the IP address of its border router.

NOTE: In this topology, solid lines indicate peering relationships and the dotted boxes indicate the prefix advertised by that AS.

1. First, download and unzip the Project-5 files (modify permissions if necessary).
2. Next, in the Project-5 directory, start the demo using the following command:
o sudo python bgp.py
3.
After loading the topology, the Mininet CLI should be visible. Keep this terminal open throughout the experiment.
4. Start another terminal and navigate to the Project-5 directory. We will use this terminal to start a remote session with AS1’s routing daemon:
o ./connect.sh
5. This script will start quagga, which will require access verification. The password is:
o en
6. Next, use the following commands to start the admin shell and view the routing table entries for AS1:
o en
o You will be prompted for the password again; retype en
o sh ip bgp
7. You should see output very much like the screen grab below. In particular, notice that AS1 has chosen the path via AS2 and AS3 to reach the prefix 13.0.0.0/8:
9. Next, let’s verify that network traffic is traversing this path. Open a third terminal and navigate to the Project-5 directory. In this terminal you will start a script that continuously makes web requests from a host within AS1 to a web server in AS3:
o ./website.sh
10. Leave this terminal running as well, and open a fourth terminal, also in the Project-5 directory. Now, we will start a rogue AS (AS4) that will connect directly to AS1 and advertise the same 13.0.0.0/8 prefix. This will allow AS4 to hijack the prefix due to the shorter AS path length:
o ./start_rogue.sh
11. Return to the third terminal window and observe the continuous web requests. After the BGP routing tables converge on this simple network, you should eventually see the attacker start responding to requests from AS1, rather than AS3.
12. Additionally, return to the second terminal and rerun the command to print the routing table. You may need to repeat the steps to establish the remote session if it closes due to inactivity.
You should now see the fraudulent advertisement for the 13.0.0.0/8 prefix in the routing table, in addition to the longer unused path to the legitimate owner.
13. Finally, let’s stop the attack by switching to the fourth terminal and using the following command:
o ./stop_rogue.sh
14. You should notice a fairly quick re-convergence to the original legitimate route in the third terminal window, which should now be delivering the original traffic. Additionally, you can check the BGP routing table again to see that the original path is being traversed.

As demonstrated in Part 2, network virtualization can be very useful in demonstrating and analyzing network attacks that would otherwise require a large amount of physical hardware. In Part 3, you are tasked with replicating a different topology and attack scenario to demonstrate the effects of a different instance of a prefix hijack attack.

1. To start, we recommend making a working copy of the code provided to you in the Project-5 directory. You will likely find this project more approachable if you spend time exploring the demo code and fully understanding how each part works, rather than immediately trying to edit the code.
2. Next, refer to the paper referenced in Part 1A and locate Figure 1.
3. Edit the working copy of the demo code you just made to reconstruct the topology in Figure 1. When complete, you should be able to use the commands from Part 2 to explore the routing tables generated by each border router. For our purposes, you can assume:
a. All links are bidirectional peering links.
b. Each AS advertises a single prefix: AS1: 1.0.0.0/8, AS2: 2.0.0.0/8, AS3: 3.0.0.0/8, AS4: 4.0.0.0/8, AS5: 5.0.0.0/8, AS6: 1.0.0.0/8. (Note: We highly recommend using these prefix values in your configuration to simplify grading and for consistency in communication and discussion on Piazza. However, you may use any valid prefix values in your configuration.)
c.
The number of hosts in each AS is the same as in the provided code.
4. Do not change the passwords in the zebra and conf files. If you change the passwords, the auto-grader will fail, resulting in a 0 for the assignment.
5. Next, locate Figure 2 in the referenced paper. Draw a topology map using any drawing tool of your choice. You may hand-draw your topology with pencil and paper and scan or photograph your drawing. All configuration values drawn on the map must be legible. Save your topology diagram in PDF format with the name fig2_topo.pdf. You must use this filename as part of your submission to receive credit for your diagram.
6. Continue to adapt the code in your working copy to simulate this hijack scenario. When complete, you should be able to use the commands from Part 2 to start a rogue AS and demonstrate a similar change in routing table information as was shown in Part 2.
7. Finally, create a compressed file (zip format) named Part3.zip containing your entire attack demonstration. You must include all of the files necessary to run your demo in an empty directory; do NOT assume that we will provide any of the files necessary to run your demonstration for grading purposes. Include your fig2_topo.pdf file in your Part3.zip.

• When viewing the BGP tables, note the “Status codes”. Give your topology enough time to converge before recreating the hijack simulation portion. It may take a minute or so for your topology to fully converge. You may continue to check the BGP tables to determine whether the topology has converged.
• The order in which you set up your peering links using addLink() matters. In previous projects, we manually selected which port on the switch to use. There is an optional parameter to the addLink() call which allows you to specify which switch port to use. In this project, you will not use those options. Therefore, the order of the links matters.
• Some of the commands in the boilerplate code may not be necessary to complete Part 3.
Some of it is there just so that you know it exists.
• Check for more descriptive errors in the /logs directory. See the zebra files for the location of additional log files.
• Run “links” on the Mininet CLI terminal to see if all links are connected and OK OK.
• Run “net” on the Mininet CLI terminal to see if your Ethernet links are connected as you expect.
• Run “ifconfig -a” on all routers and hosts to ensure that all IP addresses are assigned correctly.
• Run “sh ip bgp” and “sh ip bgp summary” on all routers.
• The command pingall may not work, and that is fine.
• website.sh may sometimes hang intermittently. If this happens, restart the simulation. We are aware of this issue, and we keep it in mind as we grade your submission. You will not lose points if website.sh hangs so long as we are eventually able to run the simulation.
• Watch the Intro presentation and read through the additional debugging tips on the intro slides.

This part of the project is optional, but it is worth extra credit if you complete it. Your task here is to design and implement a countermeasure to the attack demonstrated in Part 3. We recommend you start by creating a complete copy of the code you produced in Part 3 and pasting it into a fresh working directory.

Next, design and implement a countermeasure to the attack from Part 3. When complete, you should be able to use the commands from Part 2 to launch the simulation and start a rogue AS that mounts a prefix hijack attack as in Part 3. In this case, the attack should fail, and you should be able to observe the victim AS routing table maintain (or revert back to) its original state before the attack commences.

The paper referenced in Part 1A describes some example countermeasures, and you can implement / modify them as required for this project. You are also free to explore other methods; this part is open ended.
The first stipulation is that the solution you implement must be applicable in the general case, meaning it is not a hard-coded defense. Your defense should work regardless of which AS is attacked, which AS mounts the attack, and what prefix is targeted. The second is that the countermeasure must be demonstrable on the course VM. It is permissible to use additional libraries in the development of your countermeasure; however, they must be documented so the grader can install them prior to grading your code.

As was done in Part 3, create a compressed file (zip format) named Part4.zip containing your entire countermeasure demonstration. You must include all of the files necessary to run your demo in an empty directory; do NOT assume that we will provide any of the files necessary to run your demonstration for grading purposes. Additionally, you should provide a supplementary document (PDF format) named Countermeasure.pdf. This document should provide the following:

1. A brief summary of how your solution counters the attack.
2. A list of files you modified from Part 3 or created in order to implement the countermeasure.
3. A brief description of what is changed in each file (or the purpose of newly created files), including how it functions as a part of the larger system.
4. Instructions for demonstrating the countermeasure, including instructions for installing required software / libraries.
5. A brief closing containing any additional information the grader may need to reproduce your countermeasure, and contact information (if different than your GT student email address) in case the grader has questions.

For this project you need to turn in the Part3.zip file you created in Part 3. Include your topology diagram fig2_topo.pdf in Part3.zip. If you chose to pursue the extra credit, also turn in the Part4.zip and Countermeasure.pdf files you created in Part 4.
Please upload Part3.zip, Part4.zip, and Countermeasure.pdf directly into Canvas; there is no need to zip these three files into another zip. So please make sure you submit these three files on Canvas directly. While discussion of the project in general is always permitted on Piazza, you are not permitted to share your code generated for Part 3 or Part 4. You may quote snippets of the unmodified skeleton code provided to you when discussing the project. You may not share the topology diagram you created in Part 3 Step 5.

Rubric (out of 150 points)

5 pts Submission: for turning in all the correct demo files with the correct names, where significant effort has been made towards completing the project.
5 pts Fig 2 Topo Diagram: for turning in the correctly named topology diagram file fig2_topo.pdf with legible configuration values.
140 pts Attack Demo: for accurately recreating the topology, links, router configuration, and attack per the instructions. Partial credit is available for this rubric item.
50 pts Extra Credit: for correctly designing and implementing a countermeasure to the attack from Part 3. Submissions MUST include both the code and documentation; extra credit will not be considered for code without accompanying documentation. Some partial credit may be provided for a thorough Countermeasure.pdf identifying a viable solution without accompanying code, or with non-working code, if the documentation acknowledges the lack of code or the failing code.

[1] This project was inspired by a Mininet demo originally presented at SIGCOMM 2014.
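The reason the hijack in Part 2 succeeds, BGP's preference for shorter AS paths, can be illustrated with a toy best-path selection routine. This is an illustration only, not part of the project code, and real BGP best-path selection has many earlier tie-breakers (local preference, origin, MED, and so on) that are deliberately omitted here:

```python
def best_route(advertisements):
    """Pick the preferred route among advertisements for one prefix.

    Each advertisement is (as_path, next_hop). This toy version keeps
    only the shortest-AS-path rule from BGP best-path selection.
    """
    return min(advertisements, key=lambda ad: len(ad[0]))

# Legitimate route to 13.0.0.0/8 learned via AS2 then AS3 (path length 2).
legit = ([2, 3], "AS2")
routes = [legit]
assert best_route(routes) == legit

# The rogue AS4 advertises the same prefix with a direct, shorter path,
# so the victim switches over: traffic now flows to the attacker.
hijack = ([4], "AS4")
routes.append(hijack)
assert best_route(routes) == hijack
```

This is exactly the behavior you observe in the demo when start_rogue.sh runs: the one-hop advertisement from AS4 displaces the legitimate two-hop route in AS1's table.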


[SOLVED] Ece467 natural language processing project 2: first deep learning project

The purpose of this project is for you to get used to using PyTorch and Google Colab. As such, you are going to use those resources to complete one of the following two tasks:

• Choice 1: Implement an RNN-based text categorization system using Google Colab and PyTorch, and then apply it to one of the three datasets from project #1. This is an example of a sequence classification task.
• Choice 2: Implement an RNN-based POS tagger using Google Colab and PyTorch, and then apply it to a subset of the Penn Treebank. This is an example of a sequence labelling task.

I have posted the Penn Treebank to our class Team, as a gzipped tar file. If you go to the General channel of the Team, then switch to the Files tab, then open the Class Materials folder, you will find it. Please do not share any part of this corpus outside of Cooper Union. The datasets from project #1, of course, have been available from the course website since project #1 was assigned. (I am still not sharing the test sets from the second or third dataset, so if you use one of those, you need to split what I gave you into a training set and a test set, and optionally a tuning set.)

Whichever choice you make, you don’t need to implement the system from scratch. You can find tutorials online for either type of task and use their starting code. The hardest part of this project will likely be preprocessing the data and creating a proper dataset to be used with your system. You are not going to be graded on accuracy, as long as it is reasonable. Rather, I want to see that you set things up cleanly and have done some proper experimentation. For example, you might want to compare vanilla RNNs to LSTMs; single-direction LSTMs to bi-LSTMs; stacking two or three layers of LSTMs together; etc. You can also experiment with hyperparameters. Perhaps experiment with learning an embedding layer versus using static embeddings such as those produced by word2vec.
Do not use transformers for this assignment; I want you to use architectures that we learned about in the second unit of the course. You should also submit a short writeup that answers the following questions:

• Which of the two tasks did you do?
• Which data did you use, exactly? How did you divide it into a training set and test set (if necessary)? How did you convert the data to the proper format for your system?
• What architectures and hyperparameters did you experiment with? What choices helped or hurt? What settings are used by your final system?
• Does your final architecture learn embeddings for the task, or does it use static word embeddings, and if so, from what method?
• What are the final results? (I should be able to easily run your system and verify this.)
• If you think I need any additional instructions to run your code in Colab, specify this.

When you are finished, share with me your Colab notebook containing the variation of the architecture that works best. Also send me your short writeup. The project is due the night of Sunday, November 17, before midnight. (I am not accepting presubmissions for this project; come and talk to me if you have questions.)
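As noted above, preprocessing is likely the hardest part. One common approach is to build a vocabulary, map tokens to integer indices, and pad each batch to a uniform length before handing it to an embedding layer. A minimal sketch (all names here are illustrative, and PyTorch itself is omitted so the idea stands alone):

```python
PAD, UNK = 0, 1  # reserved indices for padding and unknown words

def build_vocab(tokenized_docs, min_count=1):
    """Map each sufficiently frequent token to a unique integer index."""
    counts = {}
    for doc in tokenized_docs:
        for tok in doc:
            counts[tok] = counts.get(tok, 0) + 1
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, c in sorted(counts.items()):
        if c >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode_batch(tokenized_docs, vocab):
    """Convert documents to equal-length index lists, padding on the right."""
    encoded = [[vocab.get(tok, UNK) for tok in doc] for doc in tokenized_docs]
    width = max(len(seq) for seq in encoded)
    return [seq + [PAD] * (width - len(seq)) for seq in encoded]
```

The resulting integer matrix is what you would wrap in a tensor and feed to an embedding layer; reserving an unknown-word index also lets the trained model handle test-set tokens it has never seen.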


[SOLVED] Ece467 natural language processing project 1: conventional text categorization project

For this project, you will implement a text categorization system, using one of the three conventional machine learning methods for text categorization that we covered in class. You must implement the crux of the algorithm yourself. You are allowed to use available NLP resources that are not specifically related to text categorization, machine learning, or word statistics. For example, you may use an existing tokenizer, stemmer, or lemmatizer, if you wish. Assuming you implement the project in Python (which I recommend), you may use NLTK. However, you may not use any pre-existing routine (from NLTK or any other library) that calculates word statistics or that applies text categorization or a machine learning approach.

Your program must allow the user to specify the names of two input files. The first will contain a list of labeled training documents. Each row of the file will list the relative path and filename of one training document, followed by a single space and then the category of the document. The system should use these training documents to train itself appropriately so that it can predict the labels of future documents according to the learned categories. The second file will contain a list of unlabeled test documents. Each row will consist of a single string representing the relative path and filename of one test document. The program should loop through the indicated test documents and categorize each one based on its training. After all the predictions have been made, the program should allow the user to specify the name of an output file. This file will list the test documents along with their predicted labels; the file format should be the same as the format of the training file. Do not assume any specific file names, folder names, or relative paths. It’s OK if you assume that the input files (but not the documents) will exist in the current directory. If you wish, you may separate training and testing into two separate programs.
A disadvantage of this approach would be that your training program will need to save all its learned statistics in a file, and the test program would have to reload this information. (If you choose to do this, you should have both programs prompt the user for the name of this file.) An advantage of this approach is that it would allow you to train your system once for a set of categories, and then, using the saved representation of your trained system, you could evaluate the trained system on multiple test sets without having to retrain it. Since training is usually more computationally intensive than testing, this can be very beneficial in practice; but for this project, it is optional.

To test your programs, I am providing you with one entire corpus, including a training set and a test set. This corpus involves the categorization of news documents into the categories: Str (Struggle), Pol (Politics), Dis (Disaster), Cri (Crime), or Oth (Other). It is guaranteed that each document belongs to exactly one of these categories (i.e., the categories are mutually exclusive and exhaustive). I am also providing you with just the training sets (but not the test sets) for two other corpora. The second corpus involves the categorization of images, based on the first sentences of their captions, into the mutually exclusive and exhaustive categories: O (Outdoor) or I (Indoor). The third corpus involves the categorization of news documents into the mutually exclusive and exhaustive categories: Wor (World News), USN (U.S. News), Sci (Science and Technology), Fin (Finance), Spo (Sports), or Ent (Entertainment). Note that I am only providing the training sets for two of the corpora. (I have separate test sets for these corpora that I will use for my evaluation.)
Since I am not providing these test sets, you should use one of the methods mentioned in class to evaluate your system for these corpora; e.g., you can either split the training set into a smaller training set and a tuning set, or you can apply k-fold cross-validation. You should never include any documents in both the training set and the tuning set for a single experiment; that would likely lead to an unreasonably high estimate of future accuracy. Generally, using either a tuning set or cross-validation should give you a reasonable idea as to what sort of accuracy to expect when I run the system on my test sets. (If you tune parameters, improving accuracy as you go, the estimate still might be a bit high.)

I have posted a file on the course website called TC_provided.tar.gz. If you download this file to a Linux system or to a Windows system with Cygwin, you can extract the contents by typing “gunzip TC_provided.tar.gz” and then “tar -xvf TC_provided.tar”. This will create a directory called TC_provided, including the following files:

• corpus1_train.labels: This file contains the list of labeled training files for corpus 1. Note that there are 885 training documents, and the 5 categories are not represented equally (there are 282 Str, 243 Pol, 207 Dis, 100 Cri, and 53 Oth documents).
• corpus1_test.list: This file contains the list of 443 test files for corpus 1.
• corpus1_test.labels: This file contains the list of labeled test files for corpus 1. I am providing this to you so that you can evaluate your text categorization system after applying it to the files listed in corpus1_test.list.
• corpus2_train.labels: This file contains the list of labeled training files for corpus 2. Note that there are 894 training documents, including 621 with category O and 273 with category I.
• corpus3_train.labels: This file contains the list of labeled training files for corpus 3.
Note that there are 955 training documents, and the 6 categories are not represented equally (there are 338 Wor, 245 USN, 124 Sci, 114 Fin, 92 Spo, and 42 Ent documents).

There will also be 3 subfolders called corpus1, corpus2, and corpus3. Within corpus1, you will find both the training and test documents. Within corpus2 and corpus3, you will find only the training documents. Of course, I have test sets for all three corpora. In all cases, files were distributed randomly between the training and test sets, so while the relative sizes of categories in the test set will not be identical to the training set, you can expect them to be similar. In all three cases, approximately 1/3 of the documents were placed in the test set.

I have also provided, in the root directory, a file called analyze.pl, a Perl script I wrote (as a graduate student) that you can use to compare the output of your system to a file with the actual labels for the test set. You can use this directly to evaluate your system’s results for corpus 1. For example, if you name your output file corpus1_predictions.labels for corpus 1, you can type the following (assuming Perl is installed on your system): “perl analyze.pl corpus1_predictions.labels corpus1_test.labels”. If your output file is in the correct format, this will display a confusion matrix indicating the breakdown of correct and incorrect predictions according to categories, with columns indicating actual categories and rows indicating predictions. You will also see reported the precision, recall, and F1 measures for each category, as well as the overall accuracy of the system.

A user-friendly text categorization system should not assume specific categories; it should learn them from the training set that is provided. However, you may assume that your system will only be applied to the sets of categories mentioned in this document, and you may hardcode them into your system if you wish.
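The metrics that analyze.pl reports can be computed directly from the predicted/actual label pairs. A small sketch of the arithmetic (this is an illustration, not the script itself; the function name is invented):

```python
def per_category_metrics(pairs):
    """Compute precision, recall, and F1 per category from
    (predicted, actual) label pairs, plus overall accuracy."""
    categories = {p for p, a in pairs} | {a for p, a in pairs}
    metrics = {}
    for c in categories:
        tp = sum(1 for p, a in pairs if p == c and a == c)
        predicted = sum(1 for p, a in pairs if p == c)
        actual = sum(1 for p, a in pairs if a == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    accuracy = sum(1 for p, a in pairs if p == a) / len(pairs)
    return metrics, accuracy
```

For instance, if your system predicts Str twice but only one of those documents is actually Str (and it finds every true Str document), precision for Str is 0.5 and recall is 1.0; the F1 measure is their harmonic mean.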
In any case, you should not require the user to indicate which set of categories is being used. If you decide to hardcode the categories, your system should still figure out which of the three sets of categories it is dealing with based on the training set. (Of course, you also have the option of creating a general system which detects all the categories based on the training set.) Also, I don’t think I should have to specify this, but for the first corpus, you may not hardcode the answers for any specific document, nor may you include rules that are in any way specific to the examples in the test set. Doing something like that would invalidate the evaluation for that corpus. Your project may be written in any language, as long as I can run it easily using Cygwin or Ubuntu, but I recommend using Python. If you e-mail me your program early (at least two days before the deadline), I will test it on the three data sets and let you know the performance on the test sets (one of which is provided to you). You will then have an opportunity to improve your system and resubmit it if you are not satisfied. Expect that it might take me a couple of days to reply to each presubmission. Assuming that your program meets all the requirements, the grade will largely depend on how well the system performs, according to the overall accuracy metric. I will specifically consider the performance relative to other systems I have received over the years that have used the same sort of machine learning approach. When you submit your project, I would also like a short write-up describing your system; one or at most two pages should be enough. The information provided in your write-up should include:
• Instructions explaining how to use your system, including:
o What operating system did you use to develop your system?
o What programming language and version did you use?
o How do I run your program (and how do I compile it, if necessary)?
o What libraries do I need (and how can I install them if they are not already installed)?
• Which basic machine learning method did you use?
• How does your system tokenize training and test files?
• What weighting scheme, if any, is used for tokens (not relevant for naïve Bayes)?
• If you used naïve Bayes, what method of smoothing is used?
• Which optional parameters or features did you experiment with (e.g., possibilities might include case sensitivity, stemming or lemmatization, stop lists, etc.)? Which parameters or features made a significant difference, and how are they set in your final system?
• How did you evaluate your system’s performance for the second and third data sets?
• You may include any additional information that you wish.
NOTE: I am requesting that submissions be sent to my Cooper Gmail account because it is better for accepting Python programs. The project is due the night of Sunday, October 13, before midnight. I advise you to get started early and have fun with it!
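The metrics that analyze.pl reports (per-category precision, recall, and F1, plus overall accuracy) can all be derived from the confusion counts. The following is a minimal Python sketch of that computation, not the Perl script itself; the `evaluate` function and the corpus-1-style labels in the example are illustrative only:

```python
from collections import Counter

def evaluate(predicted, actual):
    """Per-category precision/recall/F1 and overall accuracy.

    predicted, actual: parallel lists of category labels.
    """
    categories = sorted(set(actual) | set(predicted))
    # Confusion counts: (predicted label, actual label) -> count.
    confusion = Counter(zip(predicted, actual))
    report = {}
    for c in categories:
        tp = confusion[(c, c)]
        fp = sum(confusion[(c, a)] for a in categories if a != c)
        fn = sum(confusion[(p, c)] for p in categories if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1)
    accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
    return report, accuracy
```

The `confusion` counter holds everything needed to print the full confusion matrix as well, with one row per predicted category and one column per actual category.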


[SOLVED] Ece-210-b homework #1 the basics

Read the rest of this document – hopefully it helps with the coding, and it details style points that you’ll want to remember. As for code… do the following. You may wish to read the matlab docs for a few of these!
Scale-‘ers
Create the following scalar variables (with these names – they’ll be referenced later). These should all be doable in one line.
1. a = |sin(π/3) + j/sec(−5π/3)| (| · | denotes absolute value)
2. l = ∛8 (the cube root of 8)
3. u = √( (2/6!) · Σ_{n=1}^{80} n ) (hint: use sum and the colon operator)
4. m = ( Im⌊ln(√66 / 7j)⌋ )² (⌊·⌋ denotes the floor function. Note that floor and the natural log work on complex numbers!)
Mother…?
Create the following vectors & matrices. Reference the above variables by name rather than writing the expressions again.
1. A = (a column vector with those four scalars in any order)
2. F = [a l; u m] (a 2 × 2 matrix – bonus points if you index A to do it)
3. T = F-transpose (should be short!)
4. B = (TF)-inverse (verify this by multiplying B by TF)
5. C = [T F; F T] (should be a 4 × 4 matrix)
Cruelty
Use mean to compute the mean of all four entries in B, as well as the row-wise means of C. Store the latter in a 4 × 1 column vector.
Odd Types
Try evaluating T+F. Then T+1. Then, just for kicks, C+A. What happens? Does this remotely make sense? Tell me your thoughts.
Not What It Seems…
This matlab thing seems great for evaluating functions and manipulating matrices. What about something a little less… limited? Create a (row or column) vector with k = 3 elements, with values evenly spaced between zero and one (think linspace). Square each individual element, take the sum, and divide by k. See what you get. Repeat this for k = 5, 10, 300, 1e6. Could you have predicted the value you’re approaching? How?
A Style
For historical purposes, I’ve included Jon Lam’s reminder on style (first given in a matlab seminar before my time), as it succinctly covers most of the important points: Remember, Good Code Style™ is important!
Here are some recommendations, but feel free to do what suits you, so long as it is consistent and logical.
• Begin your scripts with clc, clear and close all. (Don’t remember what these do? Use help!)
• Suppress outputs of intermediate values by ending the line with a semicolon. Long outputs printed in the command window are hard to follow. I prefer suppressing all outputs and storing answers in descriptive variables.
• Follow a convention for variable names. It can be snake_case, camelCase, PascalCase, alllowercase, etc. Names that are too short (e.g., x) are not descriptive, and variableNamesThatAreTooLongLikeThis become tedious. The exception to the first rule is when the name is clear from context, e.g., x and t to denote time series data, but even then it is usually nice to subscript them (e.g., x_1 and t_1 if you are working with multiple time-series). Be sensible!
• Long lines tend to be hard to read, especially on smaller screens. Try to limit lines to 80 characters. (MATLAB has a visual indicator for this.) To break an expression over multiple lines, use ellipses (…), e.g.:
this_is_a_long_variable_name …
= some_long_expression …
* another_long_expression;
• Use comments to explain code. (Recall that comments start with %.) The better and more consistently your variables are named, the less commenting you need to keep your code maintainable.
• Using sections and consistent spacing makes for easier reading/debugging. Section separators must have two percent signs at the beginning of the line, followed by a space, followed by the section title.
Here’s a snippet of matlab code written in the above style (in the original handout, annotations point out the useful comment, the snake_case name, the consistent spacing, the concise names, and the output-suppressing semicolon):
% broadcasting should not be attempted here!
weighted_sum = sum(values_read .* weights);
There’s more to say on style (of course), and we’ll get to that as the semester progresses.
A.1 Arranging MATLAB Files
This should be more of a reminder – I’ll go over this in class, and the lecture notes are formatted similarly to this. You can also deviate from this style if you believe it right – I provide these guidelines as, well, guidelines, so you have somewhere to start. Begin the file with a preamble:
%%%% A Title Befitting the Extraordinary Works You Have Produced
% Your Name, The Date
% A description, perhaps across multiple lines if the project was
% sufficiently complex, of what you have created.
close all; clear; clc;
This tells whoever reads it what it is and clears out the matlab environment in preparation for further works. Each section of the file should start with a section designator (%%) to allow running sections individually. This will make your life easier, as you’ll be able to run (and thus debug) sections of the file individually using matlab’s Run Section button (under Editor). It’s somewhat similar to notebook-style programming (e.g., Jupyter), if you’re familiar with that. A section might look like this:
%% The First Task
% (1) in the beginning…
x = linspace(0, 1, 1000);
y = sin(x);
% (2) …there was Mathworks!
figure;
plot(x, y);
title('why did i bother putting a title here?');
For reference, this is what my submission for this assignment might look like:
% Just the basics…
close all; clear; clc;
%% Scale-‘ers
% …
% (1) …
a = …;
% (2) …
l = …;
% …
%% Mother…?
% …
% and so on…
B Reading the Docs
There are two ways to get documentation: via help and via Mathworks’ website. The built-in help command yields text-format help for a given command. For example:
>> help log
log Natural logarithm.
log(X) is the natural logarithm of the elements of X.
Complex results are produced if X is not positive.
See also log1p, log2, log10, exp, logm, reallog.
Documentation for log
Other uses of log
This documentation is pretty terse and is useful as a refresher if you’ve forgotten how to do something. If you want more depth (or are learning a function for the first time), Mathworks provides extensive documentation for matlab at mathworks.com/help/matlab/. If you need help on anything from syntax to function arguments to design strategies, this is the place to go! StackOverflow is, of course, a wonderful resource too, but being able to read Mathworks’ own documentation is a useful skill. For this assignment, you might check out the following pages:
1. the colon (:) operator: mathworks.com/help/matlab/ref/colon.html
2. the sum function: mathworks.com/help/matlab/ref/sum.html
3. the mean function: mathworks.com/help/matlab/ref/mean.html
4. the page for floor
5. the page for imag
6. …
…you get the idea. These pages are also available within the matlab interface itself via the built-in command doc, which takes arguments like help does.
B.1 How To Learn Something New
What do you do when you need to do something but don’t know the first thing about it? I don’t have a magic answer for this. So much in software comes down to knowing where the right documentation lies, and, frustratingly, there’s no one answer about how to find it. Knowing a few places to look is helpful – hence the references to Mathworks, help, and SO above – but that doesn’t solve everything, and traditional software documentation is not written in a way that makes learning a library, a language, etc. from scratch very easy. I’m no expert in researching stuff in general, nor am I one in software, but here’s what I do when I need something I don’t have:
1. Realize I need new knowledge. This is more important than you might think and I’m kinda bad at it.
Signs you might need to learn something are: your work seems harder than expected, you keep doing the same thing over and over, you suspect the problem you’re working on is useful outside the bounds of what you’re doing, you’re bored – in general, someone might have encountered exactly what you’re facing now and already worked out a solution to it.
2. Feed it into a search engine (or several – using Bing and Google simultaneously often gives non-overlapping results). But not just once: I often reword queries on the order of ten times before I find what I’m looking for. Pay attention to language – how do other people describe the problem you’re facing? What words do they use that you could use in your next search?
3. If you find examples with links to the relevant documentation pages, great – use those. But type them yourself rather than copying them. This ensures you think about what you’re writing – which, after all, is the point.
4. If you’re left with software docs: good luck. More on that below.
Reading software documentation is a research art in itself. Most of the time, docs are set up hierarchically, starting with general packages and working their way down towards individual features. The “How To Read This Manual” section of the docs for gnu make gives what is often a decent approach: read the first few sections of whatever you find, looking for general information, and skip the technical details on the first pass. Look up language or concepts you don’t know, either in the manual itself or on the wider internet. Learning from docs is, again, a skill, and an explorative process. It will take time. Speaking of exploration: one unorthodox means of discovery is to simply read other people’s code and look for things you don’t understand. Don’t be afraid of talking to them, either!
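The “Not What It Seems…” exercise above is a Riemann sum in disguise: the mean of the squared, evenly spaced samples approaches ∫₀¹ x² dx = 1/3 as k grows. A quick sketch of the same experiment in plain Python (the assignment itself is in MATLAB; this is only an illustration):

```python
def mean_of_squares(k):
    """k evenly spaced samples on [0, 1], squared, summed, divided by k."""
    xs = [i / (k - 1) for i in range(k)]  # like linspace(0, 1, k)
    return sum(x * x for x in xs) / k

for k in (3, 5, 10, 300, 1_000_000):
    print(k, mean_of_squares(k))  # values approach 1/3
```

Because the samples form a rectangle-rule approximation of the integral, the limiting value can be predicted in closed form before running anything.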


[SOLVED] Ece-210-b homework #2 vectors & matrices

We’ve now seen some more powerful vector and matrix operations, so let’s use ‘em! Thinking and writing vectorized code is a powerful way of improving operation efficiency (as you will see). For each computation, save the result to a variable or print it to the screen (omit the semicolon). Bonus points if you use disp or fprintf to label the output and make really pretty charts (FF loves really pretty charts). Same submission format as the first assignment, and will be the same going forward. Have fun… Read the rest of this document… as before. Some of it may be review, but hey – lecture notes for this class aren’t super formal.
Spatial Awareness
Perform the following operations without the use of for.
1. Use reshape and the : operator to create the 8 × 8 matrix A whose rows read 0, 1, …, 7; 8, 9, …, 15; …; 56, 57, …, 63, then dot-exponentiate 2 by it to create the matrix B whose entries are 2⁰, 2¹, …, 2⁶³ in the same arrangement.
2. Flatten B into a vector v (row or column – should be quick either way!) and extract the prime-numbered components of v (in other words, v_2, v_3, …, v_61). Save these in their own variable, and name it well! Note: matlab has a lot of funny little helper functions…
3. Find the geometric mean of those prime components (for some x_1, x_2, …, x_n, the geometric mean is the n-th root of x_1 x_2 ⋯ x_n – see the documentation for prod and nthroot for how to compute this concisely).
4. Flip one row of A end for end (see the course notes for how to do this).
5. Delete one column of A.
Smallest Addition
We’re going to try out a couple variations of numerical integration and differentiation. Specifically, let’s see how accurately we can evaluate erf(z) = (2/√π) ∫₀ᶻ exp(−t²) dt and d/dt [exp(−t²)]. Matlab has a function for doing the former (called, unsurprisingly, erf), as this is a rather useful but nonelementary integral, but because we enjoy pain we’ll do it by hand.
1. Evaluate exp(−t²) at 100 evenly spaced points from 0 to 6.66.
2. Approximate the derivative of this function using diff. Find the mean squared error (using mean) between this derivative and samples of the analytical derivative over this range. (Note: you will likely have to truncate or pad these.)
3. Approximate the integral of the original function using cumsum and cumtrapz. This will result in two estimations of the original. Find the mean squared error against matlab’s erf over the same range for each of them. (Sorry to make you do this before we hit functions, but…) Comment on the integral estimations – which is better? How close did they get to erf?
A Indexing & How MATLAB Thinks
Vectors and matrices are indexed in multiple ways. Each element in a vector or a matrix has an index, which uniquely identifies it (as is probably painfully evident by now). Vectors are indexed by only this number. Matrix indexing can be more complex, so I’ll give a refresher here. All indices in matlab start from 1. If this irks your programmer self to no end, Python provides some excellent matlab alternatives with zero-indexing (not to mention the expressive power of a real programming language!). Each element of a matrix has two (or more!) numbers which represent it. As in mathematics, matrices are indexed first by row, then by column, then by… whatever other dimensions you have. For a 3 × 3 matrix, you’d have something like this:
A = [ A(1,1) A(1,2) A(1,3)
      A(2,1) A(2,2) A(2,3)
      A(3,1) A(3,2) A(3,3) ].
Each element of a matrix also has a single scalar associated with it, for an alternate method of indexing. Note the column-major order – linear indices run down each column first:
A = [ A(1) A(4) A(7)
      A(2) A(5) A(8)
      A(3) A(6) A(9) ].
B Notes on Numerical Calculus
The lecture notes on numerical calculus walk you through most of the code you need to know here.
One important thing to note, though, is padding: the diff function returns the difference between subsequent elements of a vector, and thus will return a vector one element shorter than the one passed to it. For example:
x = [0 1 3 5 9]; % length 5
y = diff(x);     % y becomes [1 2 2 4], length 4
If you’re computing a derivative via diff, you might want to pad the resulting vector (depending on what you’re doing with it) so that its length matches with other data you’re using:
% x, y defined above…
dydx = diff(y)./diff(x);
dydx_pad_start = dydx([1 1:end]);   % duplicates first element
dydx_pad_end   = dydx([1:end end]); % duplicates last element
The functions cumsum and cumtrapz do not need padding and are a little easier to use – cumtrapz makes doing numerical integrals really easy if you pass it the right arguments. See its documentation for more details!
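The padding idea above is language-independent. Here is a hedged Python equivalent using plain lists (purely for illustration; the homework itself is in MATLAB):

```python
def diff(xs):
    """Differences between subsequent elements; result is one shorter."""
    return [b - a for a, b in zip(xs, xs[1:])]

x = [0, 1, 3, 5, 9]      # length 5
y = diff(x)              # [1, 2, 2, 4], length 4

# Pad by duplicating the first (or last) element so the lengths line up,
# mirroring dydx([1 1:end]) and dydx([1:end end]) in MATLAB:
y_pad_start = y[:1] + y  # [1, 1, 2, 2, 4]
y_pad_end = y + y[-1:]   # [1, 2, 2, 4, 4]
```

Which end you pad depends on whether you treat each difference as belonging to the left or the right sample of its pair.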
