Assignment Chef


Assignment catalog

33,401 assignments available

[SOLVED] CSCI 570 HW 4 Graded Problems: 1. Suppose you were to drive from USC to Santa Monica along I-10.

1 Graded Problems

1. Suppose you were to drive from USC to Santa Monica along I-10. Your gas tank, when full, holds enough gas to go p miles, and you have a map that contains the information on the distances between gas stations along the route. Let d1 < d2 < … < dn be the locations of all the gas stations along the route, where di is the distance from USC to the gas station. We assume that the distance between neighboring gas stations is at most p miles. Your goal is to make as few gas stops as possible along the way. Give the most efficient algorithm to determine at which gas stations you should stop, and prove that your strategy yields an optimal solution. Give the time complexity of your algorithm as a function of n.

2. Consider the following modification to Dijkstra's algorithm for single-source shortest paths to make it applicable to directed graphs with negative edge lengths. If the minimum edge length in the graph is −w < 0, then add w + 1 to each edge length, thereby making all the edge lengths positive. Now apply Dijkstra's algorithm starting from the source s and output the shortest paths to every other vertex. Does this modification work? Either prove that it correctly finds the shortest path from s to every vertex, or give a counterexample where it fails.

3. Solve Kleinberg and Tardos, Chapter 4, Exercise 3.

4. Solve Kleinberg and Tardos, Chapter 4, Exercise 5.

2 Practice Problems

1. Solve Kleinberg and Tardos, Chapter 4, Exercise 4.
2. Solve Kleinberg and Tardos, Chapter 4, Exercise 8.
3. Solve Kleinberg and Tardos, Chapter 4, Exercise 22.
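For illustration, the standard greedy strategy for problem 1 — from each stop, drive to the farthest station still within range — can be sketched as below. This is an illustrative sketch, not the graded solution; `gas_stops` is a hypothetical name, and the last station is treated as the destination.

```python
def gas_stops(d, p):
    """Greedy gas-stop selection.

    d: sorted list of station distances from the start (last entry treated
       as the destination); p: full-tank range in miles.
    Assumes neighboring stations are at most p miles apart.
    Returns the list of station distances at which to stop.
    Runs in O(n): each station is examined once.
    """
    stops = []
    pos = 0            # current position, tank full
    i = 0
    n = len(d)
    while pos + p < d[-1]:          # destination not yet reachable
        last = None
        while i < n and d[i] <= pos + p:
            last = d[i]             # farthest station within range
            i += 1
        stops.append(last)
        pos = last
    return stops
```

The exchange argument for optimality is the usual one: any optimal solution can be transformed, stop by stop, into the greedy one without adding stops.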

$25.00 View

[SOLVED] CSCI 570 HW 3 Graded Problems: 1. Design a data structure that has the following properties

1 Graded Problems

1. Design a data structure that has the following properties (assume n elements in the data structure, and that the data structure properties need to be preserved at the end of each operation):
• Find-Median takes O(1) time
• Extract-Median takes O(log n) time
• Insert takes O(log n) time
• Delete takes O(log n) time
Do the following: (a) Describe how your data structure will work. (b) Give algorithms that implement the Extract-Median() and Insert() functions. Hint (read this only if you really need to): your data structure should use a min-heap and a max-heap simultaneously, where half of the elements are in the max-heap and the other half are in the min-heap.

2. There is a stream of integers that comes continuously to a small server. The job of the server is to keep track of the k largest numbers it has seen so far. The server has the following restrictions: (a) It can process only one number from the stream at a time: it takes a number from the stream, processes it, finishes with that number, and takes the next number from the stream. It cannot take more than one number from the stream at a time due to memory restrictions. (b) It has enough memory to store up to k integers in a simple data structure (e.g. an array), and some extra memory for computation (comparisons, etc.). (c) The time complexity for processing one number must be better than Θ(k); anything that is Θ(k) or worse is not acceptable. Design an algorithm for the server to perform its job with the requirements listed above.

3. When we have two sorted lists of numbers in non-descending order and need to merge them into one sorted list, we can simply compare the first two elements of the lists, extract the smaller one and attach it to the end of the new list, and repeat until one of the two original lists becomes empty; then we attach the remaining numbers to the end of the new list and it's done. This takes linear time. Now, give an algorithm using O(n log k) time to merge k sorted lists (you may assume they also contain numbers in non-descending order) into one sorted list, where n is the total number of elements in all the input lists. (Hint: use a min-heap for k-way merging.)

4. Suppose you are given two sets A and B, each containing n positive integers. You can choose to reorder each set however you like. After reordering, let a_i be the i-th element of set A, and let b_i be the i-th element of set B. You then receive a payoff of ∏_{i=1}^{n} a_i^{b_i}. Give an algorithm that will maximize your payoff. Prove that your algorithm maximizes the payoff, and state its running time.

2 Practice Problems

1. The police department in a city has made all streets one-way. The mayor contends that there is still a way to drive legally from any intersection in the city to any other intersection, but the opposition is not convinced. A computer program is needed to determine whether the mayor is right. However, the city elections are coming up soon, and there is just enough time to run a linear-time algorithm.
(a) Formulate this as a graph problem and design a linear-time algorithm. Explain why it can be solved in linear time.
(b) Suppose it now turns out that the mayor's original claim is false. She next makes the following claim to supporters gathered in the Town Hall: "If you start driving from the Town Hall (located at an intersection), navigating one-way streets, then no matter where you reach, there is always a way to drive legally back to the Town Hall." Formulate this claim as a graph problem, and show how it can also be verified in linear time.

2. You are given a weighted directed graph G = (V, E, w) and the shortest-path distances δ(s, u) from a source vertex s to every other vertex in G. However, you are not given π(u) (the predecessor pointers). With this information, give an algorithm to find a shortest path from s to a given vertex t in O(|V| + |E|) time.
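The two-heap design hinted at in graded problem 1 can be sketched as below — a max-heap of the lower half and a min-heap of the upper half. This is an illustrative sketch, not the graded solution; `MedianBag` is a hypothetical name, and Python's `heapq` (a min-heap) simulates the max-heap by negating values.

```python
import heapq

class MedianBag:
    """Two-heap median structure: lo is a max-heap of the lower half
    (stored negated), hi is a min-heap of the upper half."""

    def __init__(self):
        self.lo = []   # negated lower half; -lo[0] is the median
        self.hi = []   # upper half

    def insert(self, x):                       # O(log n)
        if self.lo and x > -self.lo[0]:
            heapq.heappush(self.hi, x)
        else:
            heapq.heappush(self.lo, -x)
        # rebalance so len(lo) is len(hi) or len(hi) + 1
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def find_median(self):                     # O(1)
        return -self.lo[0]

    def extract_median(self):                  # O(log n)
        m = -heapq.heappop(self.lo)
        if len(self.hi) > len(self.lo):        # restore the invariant
            heapq.heappush(self.lo, -heapq.heappop(self.hi))
        return m
```

Because the size invariant is restored after every operation, the median is always the top of `lo`, giving the O(1) Find-Median bound.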

$25.00 View

[SOLVED] CSCI 570 HW 2 Graded Problems: 1. Solve Kleinberg and Tardos, Chapter 2, Exercise 3.

1 Graded Problems

1. Solve Kleinberg and Tardos, Chapter 2, Exercise 3.
2. Solve Kleinberg and Tardos, Chapter 2, Exercise 4.
3. Solve Kleinberg and Tardos, Chapter 2, Exercise 5.
4. Which of the following statements are true?
(a) If f, g, and h are positive increasing functions with f in O(h) and g in Ω(h), then the function f + g must be in Θ(h).
(b) Given a problem with input of size n, a solution with O(n) time complexity always costs less in computing time than a solution with O(n^2) time complexity.
(c) F(n) = 4n + √(3n) is both O(n) and Θ(n).
(d) For a search starting at node s in graph G, the DFS tree is never the same as the BFS tree.
(e) BFS can be used to find the shortest path between any two nodes in an unweighted graph.
5. Solve Kleinberg and Tardos, Chapter 3, Exercise 2.

2 Practice Problems

1. Reading Assignment: Kleinberg and Tardos, Chapters 2 and 3.
2. Solve Kleinberg and Tardos, Chapter 2, Exercise 6.
3. Solve Kleinberg and Tardos, Chapter 3, Exercise 6.
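Statement (e) concerns BFS and shortest paths in unweighted graphs; a standard BFS with parent pointers recovers a fewest-edges path. This is an illustrative sketch, not part of the assignment's solution; `bfs_shortest_path` and the adjacency-dict representation are assumptions.

```python
from collections import deque

def bfs_shortest_path(adj, s, t):
    """BFS from s; returns a shortest (fewest-edges) path s..t, or None.
    adj: dict mapping each node to a list of its neighbors."""
    parent = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:                      # walk parent pointers back to s
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:         # first visit = shortest distance
                parent[v] = u
                q.append(v)
    return None
```

BFS explores nodes in order of distance from s, so the first time t is dequeued its recorded path is shortest — which is why statement (e) holds for unweighted graphs.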

$25.00 View

[SOLVED] CSCI 570 HW 1 Graded Problems: 1. State True/False: an instance of the stable marriage problem

1 Graded Problems

1. State True/False: An instance of the stable marriage problem has a unique stable matching if and only if the version of the Gale-Shapley algorithm where the men propose and the version where the women propose both yield the exact same matching.

2. A stable roommate problem with 4 students a, b, c, d is defined as follows. Each student ranks the other three in strict order of preference. A matching is defined as the separation of the students into two disjoint pairs. A matching is stable if no two separated students prefer each other to their current roommates. Does a stable matching always exist? If yes, give a proof. Otherwise, give example roommate preferences for which no stable matching exists.

3. Solve Kleinberg and Tardos, Chapter 1, Exercise 4.

4. N men and N women were participating in a stable matching process in a small town named Walnut Grove. A stable matching was found after the matching process finished, and everyone got engaged. However, a man named Almanzo Wilder, who is engaged to a woman named Nelly Oleson, suddenly changes his mind by preferring another woman named Laura Ingles, who was originally ranked right below Nelly in his preference list; therefore Laura and Nelly swapped their positions in Almanzo's preference list. Your job now is to find a new matching for all of these people that takes into account the new preference of Almanzo, but you don't want to run the whole process from the beginning again, and want to take advantage of the results you currently have from the previous matching. Describe your algorithm for this problem. Assume that no woman gets offended if she refused a proposal and is then proposed to again by the same person.

2 Practice Problems

1. Reading Assignment: Kleinberg and Tardos, Chapter 1.
2. Solve Kleinberg and Tardos, Chapter 1, Exercise 1.
3. Solve Kleinberg and Tardos, Chapter 1, Exercise 2.
4. Solve Kleinberg and Tardos, Chapter 1, Exercise 3.
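For reference, the proposer-optimal Gale-Shapley algorithm that problem 1 refers to can be sketched as below. This is an illustrative sketch, not the graded solution; `gale_shapley` and the dict-of-preference-lists representation are assumptions.

```python
def gale_shapley(men_prefs, women_prefs):
    """Proposer-optimal Gale-Shapley. Prefs map name -> ordered list.
    Returns a dict mapping each man to his matched woman."""
    # Precompute each woman's ranking of the men for O(1) comparisons.
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    next_pick = {m: 0 for m in men_prefs}   # index of next woman to propose to
    engaged = {}                            # woman -> man
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_pick[m]]
        next_pick[m] += 1
        if w not in engaged:
            engaged[w] = m                  # w accepts her first proposal
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])         # w trades up; old fiancé freed
            engaged[w] = m
        else:
            free.append(m)                  # w rejects m
    return {m: w for w, m in engaged.items()}
```

Running this with the men proposing and again with the women proposing (arguments swapped) gives the two matchings compared in the True/False statement.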

$25.00 View

[SOLVED] COP3502 Lab 8: The Cow Strikes Back

This lab’s purpose is to provide students with experience in inheritance of classes and working with multiple classes. It is recommended (though not strictly required) that students use command line tools and editors for this lab. This lab requires students to build on their previous lab, in which a version of the cowsay utility was created. Students will update the driver program file (cowsay.py) and also create two new classes: Dragon and IceDragon. The Dragon class should extend the Cow class, and IceDragon must be derived from Dragon. As before, heifer_generator.py is provided for you, updated to handle the new Dragon class and its subtypes. (Please refer to the specification for the previous lab for a refresher.) Students may implement private attributes and methods if they choose to do so; this is not required, purely optional. No public attributes/methods should be added to the specification!

cowsay.py (Program Driver)
Your program must accept command line arguments as follows:
python3 cowsay.py -l                 Lists the available cows
python3 cowsay.py MESSAGE            Prints out the MESSAGE using the default COW
python3 cowsay.py -n COW MESSAGE     Prints out the MESSAGE using the specified COW
In addition, this version of the utility handles a special set of Cow-derived Dragon classes. Whenever a dragon-type cow is selected, the display of the message must be followed by a line stating whether or not the dragon is fire-breathing.

Cow Class
The Cow class must have all of the same methods as previously required (though students may add private methods). The methods are repeated here, briefly, for reference:
__init__(self, name)          Constructor
get_name(self)                Returns the name of this cow object
get_image(self)               Returns the image for this cow object
set_image(self, image)        Sets the image for this cow object to image

Dragon Class
The Dragon class must be derived from the Cow class and must make all of its methods available. In addition, Dragon must provide the following methods:
__init__(self, name, image)   Constructor; creates a new Dragon object with the given name and image.
can_breathe_fire()            This method should exist in every Dragon class. For the default Dragon type, it should always return True.

IceDragon Class
The IceDragon class must be derived from the Dragon class and must make all of its methods available:
__init__(self, name, image)   Constructor; creates a new IceDragon object with the given name and image.
can_breathe_fire()            For the IceDragon type, this method should always return False.

Submissions
NOTE: Your output must match the example output *exactly*.
Files: cowsay.py, cow.py, dragon.py, ice_dragon.py
Method: Submit on Canvas

Sample Output
>python3 cowsay.py Hello World!
Hello World!
^__^ (oo)_______ (__) )/ ||—-w | || ||
>python3 cowsay.py -n kitteh Moew-Moew!
Moew-Moew!
(“`-‘ ‘-/”) .___..–‘ ‘ “`-._ ` *_ * ) `-. ( ) .`-.__. `) (_Y_.) ‘ ._ ) `._` ; “ -. .-‘ _.. `–‘_..-_/ /–‘ _ .’ ,4 ( i l ),-” ( l i),’ ( ( ! .-‘
>python3 cowsay.py -l
Cows available: heifer kitteh dragon ice-dragon
>python3 cowsay.py -n ninja Hello world!
Could not find ninja cow!
>python3 cowsay.py -n dragon Fiery RAWR
Fiery RAWR
|___/| / //|\ /0 0 __ / // | / / /_ / // | _^_’/ /_ // | //_^_/ /_ // | ( //) | // | ( / /) _|_ / ) // | _ ( // /) ‘/,_ _ _/ ( ; -. | _ _.-~ .-~~~^-. (( / / )) ,-{ _ `.|.-~-. .~ `. (( // / )) ‘/ / ~-. _.-~ .-~^-. (( /// )) `. { } / (( / )) .—-~-. -‘ .~ `. __ ///.—-..> _ -~ `. ^-` ///-._ _ _ _ _ _ _}^ – – – – ~ `—–‘
This dragon can breathe fire.
>python3 cowsay.py -n ice-dragon Ice-cold RAWR
Ice-cold RAWR
|___/| / //|\ /0 0 __ / // | / / /_ / // | _^_’/ /_ // | //_^_/ /_ // | ( //) | // | ( / /) _|_ / ) // | _ ( // /) ‘/,_ _ _/ ( ; -. | _ _.-~ .-~~~^-. (( / / )) ,-{ _ `.|.-~-. .~ `. (( // / )) ‘/ / ~-. _.-~ .-~^-. (( /// )) `. { } / (( / )) .—-~-. -‘ .~ `. __ ///.—-..> _ -~ `. ^-` ///-._ _ _ _ _ _ _}^ – – – – ~ `—–‘
This dragon cannot breathe fire.
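The inheritance chain described in the spec can be sketched as below. This is a minimal sketch of the required public API, not a complete submission; the underscore-prefixed attributes are one (optional) way to keep state private.

```python
class Cow:
    def __init__(self, name):
        self._name = name
        self._image = None

    def get_name(self):
        return self._name

    def get_image(self):
        return self._image

    def set_image(self, image):
        self._image = image


class Dragon(Cow):
    def __init__(self, name, image):
        super().__init__(name)   # reuse Cow's initialization
        self.set_image(image)

    def can_breathe_fire(self):
        return True              # default dragons breathe fire


class IceDragon(Dragon):
    # __init__(self, name, image) is inherited from Dragon unchanged
    def can_breathe_fire(self):
        return False             # ice dragons do not
```

The driver would then print the extra "This dragon can/cannot breathe fire." line whenever the selected cow has a `can_breathe_fire` method.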

$25.00 View

[SOLVED] COP3502 Lab 7: The Cow Says…

This lab is designed to introduce students to the Bash Command Line Interface (CLI) and the concept of CLI arguments, and to give them practice writing classes. The cowsay utility is a popular Unix program from the 20th century (see https://en.wikipedia.org/wiki/Cowsay). You will write a slightly simplified cowsay program that takes in several arguments and prints out different text depending on the arguments.

Tools
You are strongly recommended to use a text editor and the terminal to edit and run your program and its directories. It is advised that students learn/review basic Unix shell commands before beginning; a good run-through can be found here: https://linuxjourney.com/lesson/the-shell. You are also allowed to use PyCharm and its terminal to write and run your program.

Follow these steps to get started on the lab:
1. Open a terminal and enter the pwd command to identify the path to the working (current) directory (folder).
2. Enter ls to list the contents of the current directory.
3. Use the mkdir command to make a new directory called CowLab.
4. Use ls to see the change, then cd to change to the directory CowLab.
5. Do your lab work in that folder.
Use your Google skills to find more commands. You can read more information about some of these commands here: https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/ https://pythonbasics.org/execute-python-scripts/

Students will write two files: a driver file with a main() entry point (cowsay) and a data class (cow). Note that heifer_generator.py is provided for you; your code must use this class to create the cow objects.

Provided For Students – HeiferGenerator
get_cows()   Static method which returns a Python list of cow objects from the built-in data set. This uses the Cow constructor and image property of the cow class to properly initialize new cow objects uniquely for each data set. This means it is dependent on your Cow class, so you should write that before working on main!

cowsay.py (Program Driver)
Your program must accept command line arguments. Command line arguments are captured as part of the argv variable found in the sys module; this can be accessed with sys.argv after you import sys (review lecture slides for examples!). The command line arguments that must be supported are as follows (use the python command for Windows and python3 for Mac):
python cowsay.py -l                 Lists the available cows
python cowsay.py MESSAGE            Prints out the MESSAGE using the default COW
python cowsay.py -n COW MESSAGE     Prints out the MESSAGE using the specified COW
If a user calls for a cow that does not exist, the program should print out “Could not find [COWNAME] cow!”

Suggested Functions
The following functions are suggested to make development easier, but are not required:
list_cows(cows)        Displays the available cows from a Python list of Cow objects.
find_cow(name, cows)   Given a name and a Python list of Cow objects, returns the Cow object with the specified name. If no such Cow object can be found, returns None.

Cow Class
The Cow class facilitates the creation and use of cow objects by providing the following methods (which students must implement):
__init__(self, name)     Initializes a cow object with the given name and an image of None.
get_name(self)           Returns the name of the cow. Note: the name property should NOT have a setter.
get_image(self)          Returns the image used to display the cow (this should be called after the message has been displayed).
set_image(self, image)   Sets the image used to display the cow.

Submissions
NOTE: Your output must match the example output *exactly*. If it does not, you will not receive full credit for your submission!
Files: cowsay.py, cow.py

Sample Output
>python3 cowsay.py Hello World!
Hello World!
^__^ (oo)_______ (__) )/ ||—-w | || ||
>python3 cowsay.py -n kitteh Hello World!
Hello World!
(“`-‘ ‘-/”) .___..–‘ ‘ “`-._ ` *_ * ) `-. ( ) .`-.__. `) (_Y_.) ‘ ._ ) `._` ; “ -. .-‘ _.. `–‘_..-_/ /–‘ _ .’ ,4 ( i l ),-” ( l i),’ ( ( ! .-‘
>python3 cowsay.py -l
Cows available: heifer kitteh
>python3 cowsay.py -n ninja Hello world!
Could not find ninja cow!
>python3 cowsay.py Hello -n kitteh
Hello -n kitteh
^__^ (oo)_______ (__) )/ ||—-w | || ||
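The Cow class and the two suggested helpers can be sketched as below. This is a minimal sketch under the spec's method signatures, not a complete submission; the driver's `sys.argv` dispatch is omitted.

```python
class Cow:
    """Data class per the spec: name fixed at construction, image settable."""
    def __init__(self, name):
        self._name = name
        self._image = None      # image starts as None

    def get_name(self):
        return self._name       # no setter for name, per the spec

    def get_image(self):
        return self._image

    def set_image(self, image):
        self._image = image


def find_cow(name, cows):
    """Suggested helper: return the Cow with the given name, or None."""
    for cow in cows:
        if cow.get_name() == name:
            return cow
    return None


def list_cows(cows):
    """Suggested helper: display the available cows."""
    print('Cows available:')
    for cow in cows:
        print(cow.get_name())
```

In the real lab the `cows` list would come from `HeiferGenerator.get_cows()`; the driver would then route `-l` to `list_cows` and `-n COW MESSAGE` through `find_cow`, printing the "Could not find ... cow!" message on a `None` result.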

$25.00 View

[SOLVED] COP3502 HW 4: RLE with Images (Python)

In this project students will develop routines to encode and decode data for images using run-length encoding (RLE). Students will implement encoding and decoding of raw data, conversion between data and strings, and display of information by creating procedures that can be called from within their programs and externally. This project will give students practice with loops, strings, Python lists, methods, and type-casting.

RLE is a form of lossless compression used in many industry applications, including imaging. It is intended to take advantage of datasets where elements (such as bytes or characters) are repeated several times in a row, as in certain types of data (such as pixel art in games). Black pixels often appear in long “runs” in some animation frames; instead of representing each black pixel individually, the color is recorded once, followed by the number of instances. For example, consider the first row of pixels from the pixel image of a gator (shown in Figure 1), where black is “0” and green is “2”:
Flat (unencoded) data: 0 0 2 2 2 0 0 0 0 0 0 2 2 0
Run-length encoded data: 2 0 3 2 6 0 2 2 1 0
Figure 1 – Gator Pixel Image
The encoding for the entire image in RLE (in hexadecimal) – width, height, and pixels – is:
1E|162032602220121F10721AF21092301210326032308250
W / H / ---------- PIXELS ----------

Image Formatting
The images are stored in uncompressed/unencoded format natively. In addition, there are a few other rules to make the project more tractable:
1. Images are stored as a list of numbers, with the first two numbers holding image width and height.
2. Pixels will be represented by a number between 0 and 15 (representing 16 unique colors).
3. No run may be longer than 15 pixels; if any run of pixels is longer, it should be broken into a new run.
For example, the chubby smiley image (Figure 2) would contain the data shown in Figure 3.
Figure 2 / Figure 3 – Data for “Chubby Smiley”
NOTE: Students do not need to work with the image file format itself – they only need to work with lists and encode or decode them. The information about image formatting is provided for context. Student programs must present a menu when run in standalone mode and must also implement several methods, defined below, during this assignment.

Standalone Mode (Menu)
When run as the program driver via the main() method, the program should: 1) display a welcome message; 2) display the color test (ConsoleGfx.test_rainbow); 3) display the menu; 4) prompt for input. Note: for colors to display properly, it is highly recommended that students install the “CS1” theme on the project page. There are five ways to load data into the program that should be provided and four ways the program must be able to display data to the user.

Loading a File
Accepts a filename from the user and invokes ConsoleGfx.load_file(filename):
Select a Menu Option: 1
Enter name of file to load: testfiles/uga.gfx

Loading the Test Image
Loads ConsoleGfx.test_image:
Select a Menu Option: 2
Test image data loaded.

Reading RLE String
Reads RLE data from the user in delimited notation (run lengths in decimal, values in hexadecimal; smiley example):
Select a Menu Option: 3
Enter an RLE string to be decoded: 28:10:6B:10:10B:10:2B:10:12B:10:2B:10:5B:20:11B:10:6B:10

Reading RLE Hex String
Reads RLE data from the user in hexadecimal notation without delimiters (smiley example):
Select a Menu Option: 4
Enter the hex string holding RLE data: 28106B10AB102B10CB102B105B20BB106B10

Reading Flat Data Hex String
Reads raw (flat) data from the user in hexadecimal notation (smiley example):
Select a Menu Option: 5
Enter the hex string holding flat data: 880bbbbbb0bbbbbbbbbb0bb0bbbbbbbbbbbb0bb0bbbbb00bbbbbbbbbbb0bbbbbb0

Displaying the Image
Displays the current image by invoking the ConsoleGfx.display_image(image_data) method.

Displaying the RLE String
Converts the current data into a human-readable RLE representation (with delimiters):
Select a Menu Option: 7
RLE representation: 28:10:6b:10:10b:10:2b:10:12b:10:2b:10:5b:20:11b:10:6b:10
Note that each entry is 2-3 characters; the length is always in decimal, and the value in hexadecimal!

Displaying the RLE Hex Data
Converts the current data into RLE hexadecimal representation (without delimiters):
Select a Menu Option: 8
RLE hex values: 28106b10ab102b10cb102b105b20bb106b10

Displaying the Flat Hex Data
Displays the current raw (flat) data in hexadecimal representation (without delimiters):
Select a Menu Option: 9
Flat hex values: 880bbbbbb0bbbbbbbbbb0bb0bbbbbbbbbbbb0bb0bbbbb00bbbbbbbbbbb0bbbbbb0

Class Methods
Student classes are required to provide all of the following methods with the defined behaviors. We recommend completing them in the following order:
1. to_hex_string(data)   Translates data (RLE or raw) into a hexadecimal string (without delimiters). This method can also aid debugging. Ex: to_hex_string([3, 15, 6, 4]) yields the string “3f64”.
2. count_runs(flat_data)   Returns the number of runs of data in an image data set; double this result for the length of the encoded (RLE) list. Ex: count_runs([15, 15, 15, 4, 4, 4, 4, 4, 4]) yields the integer 2.
3. encode_rle(flat_data)   Returns the RLE encoding of the raw data passed in; used to generate the RLE representation of the data. Ex: encode_rle([15, 15, 15, 4, 4, 4, 4, 4, 4]) yields the list [3, 15, 6, 4].
4. get_decoded_length(rle_data)   Returns the decompressed size of RLE data; used to generate flat data from an RLE encoding. (Counterpart to #2.) Ex: get_decoded_length([3, 15, 6, 4]) yields the integer 9.
5. decode_rle(rle_data)   Returns the decoded data set from RLE-encoded data; this decompresses RLE data for use. (Inverse of #3.) Ex: decode_rle([3, 15, 6, 4]) yields the list [15, 15, 15, 4, 4, 4, 4, 4, 4].
6. string_to_data(data_string)   Translates a string in hexadecimal format into byte data (can be raw or RLE). (Inverse of #1.) Ex: string_to_data(“3f64”) yields the list [3, 15, 6, 4].
7. to_rle_string(rle_data)   Translates RLE data into a human-readable representation. For each run, in order, it should display the run length in decimal (1-2 digits); the run value in hexadecimal (1 digit); and a delimiter, ‘:’, between runs. (See examples in the standalone section.) Ex: to_rle_string([15, 15, 6, 4]) yields the string “15f:64”.
8. string_to_rle(rle_string)   Translates a string in human-readable RLE format (with delimiters) into RLE byte data. (Inverse of #7.) Ex: string_to_rle(“15f:64”) yields the list [15, 15, 6, 4].

Submissions
NOTE: Your output must match the example output *exactly*. If it does not, you will not receive full credit for your submission!
File: initials_HW{num}.py
Method: Submit on Canvas. Do not submit any other files!
For this assignment, students will complete the final 2 methods on page 3 of this document as well as the remainder of the project involving the menu options, and will see how all the individual methods are intertwined with each other. You will submit your whole program, including the 8 methods listed above and the main method. We will only test your remaining 2 methods and the main method in HW4.
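Methods #7 and #8 (the two tested in HW4) can be sketched as below. This is an illustrative sketch, not the graded solution; it assumes well-formed input in the spec's format (run lengths 1-15, values 0-15).

```python
def to_rle_string(rle_data):
    """Human-readable RLE: decimal run length + one hex value digit,
    with ':' between runs. Ex: [15, 15, 6, 4] -> '15f:64'."""
    parts = []
    for i in range(0, len(rle_data), 2):
        length, value = rle_data[i], rle_data[i + 1]
        parts.append(str(length) + format(value, 'x'))
    return ':'.join(parts)


def string_to_rle(rle_string):
    """Inverse of to_rle_string: each ':'-separated entry is 1-2 decimal
    length digits followed by one hex value digit."""
    data = []
    for entry in rle_string.split(':'):
        data.append(int(entry[:-1]))      # decimal run length
        data.append(int(entry[-1], 16))   # hex run value
    return data
```

Since the value is always exactly one hex digit, splitting each entry at its last character recovers the (length, value) pair unambiguously, even though lengths vary between one and two digits.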

$25.00 View

[SOLVED] COP3502C Pakudex Project

This project will provide students with practice building and working with object-oriented programming constructs, including classes and objects, by building classes to represent creatures and a cataloguing system.

Scenario
NOTE: This project concept is a work of satire. To state the obvious: we do not advise one to go around imprisoning creatures in small receptacles held in one’s pockets and/or having them fight for sport. Pouch Creatures – abbreviated “Pakuri” – are the latest craze sweeping elementary schools around the world. Tiny magical creatures small enough to fit into one’s trouser pouches (with enough force applied, ‘natch) have begun appearing in forests all around the world. They come in all shapes and colors. When stolen from their parents at a young enough age, they can be kept in small spherical cages (for their own good) easily carried by elementary school children (though they are also popular with adults). This has led to an unofficial catch phrase for the phenomenon – “Gotta steal ‘em all!” – a play on the abbreviation “Pakuri” (which doubles as Japanese slang meaning “to steal”). Young children can then pit their Pakuri against one another in battle for bragging rights or to steal them from one another. (Don’t worry – they heal their wounds quickly!) Of course, keeping track of all these critters can be a real task, especially when you are trying to steal so many of them at such a young age! You’ve decided to cash in – hey, if you don’t, someone else will – on the morally ambiguous phenomenon by developing an indexing system – a pakudex – for kids and adult participants.

Students will construct three classes: a driver class with a main() entry point (pakuri_program) and two data object classes (pakuri and pakudex). All attributes/methods must be private unless noted in the specification!

pakuri_program.py (Driver / Program)
When run, the program should: 1) display a welcome message; 2) prompt for / read the pakudex capacity and confirm it; 3) display the menu; 4) prompt for input:
Welcome to Pakudex: Tracker Extraordinaire!
Enter max capacity of the Pakudex: 30
The Pakudex can hold 30 species of Pakuri.
Pakudex Main Menu
—————–
1. List Pakuri
2. Show Pakuri
3. Add Pakuri
4. Evolve Pakuri
5. Sort Pakuri
6. Exit
What would you like to do?

List Pakuri
This should number and list the critters in the pakudex in the order contained. For example, if “Pikaju” and “Charasaurus” were added to the pakudex (in that order), before sorting, the list should be:
Pakuri In Pakudex:
1. Pikaju
2. Charasaurus
Failure (empty pakudex): No Pakuri in Pakudex yet!

Show Pakuri
The program should prompt for a species and collect species information, then display it:
Success:
Enter the name of the species to display: PsyGoose
Species: PsyGoose
Attack: 65
Defense: 57
Speed: 61
Failure:
Enter the name of the species to display: PsyDuck
Error: No such Pakuri!

Add Pakuri
When adding a pakuri, a prompt should be displayed to read in the species name, and a confirmation displayed following successful addition (or failure).
Success:
Enter the name of the species to add: PsyGoose
Pakuri species PsyGoose successfully added!
Failure – Duplicate:
Enter the name of the species to add: PsyGoose
Error: Pakudex already contains this species!
Failure – Full:
Error: Pakudex is full!

Evolve Pakuri
The program should prompt for a species and then cause the species to evolve if it exists:
Success:
Enter the name of the species to evolve: PsyGoose
PsyGoose has evolved!
Failure:
Enter the name of the species to evolve: PsyDuck
Error: No such Pakuri!

Sort Pakuri
Sort pakuri in Python standard lexicographical order: Pakuri have been sorted! (Hint: use the sort() function.)

Exit
Quit the program: Thanks for using Pakudex! Bye!

Pakuri Class
This class will be the blueprint for the different critter objects that you will create. You will need to store information about the critter’s species, attack level, defense level, and speed. All variables storing information about the critters must be private (not accessible from outside the class). We recommend (but do not mandate) the following variable types and names: species (type: string); attack, defense, speed (type: int). The attack, defense, and speed levels should have the following initial values when first created:
attack: (len(species) * 7) + 9
defense: (len(species) * 5) + 17
speed: (len(species) * 6) + 13
(You may have noticed Pakuri don’t have individual names, just species; don’t worry! They won’t live long enough for it to matter with all of the fighting. Your conscience can be clear!)
The class must also have the following methods and behaviors (this is mandatory):
__init__(self, species)          Initializes the pakuri object with the species attribute
get_species(self)                Returns the species of this critter
get_attack(self)                 Returns the attack value for this critter
get_defense(self)                Returns the defense value for this critter
get_speed(self)                  Returns the speed of this critter
set_attack(self, new_attack)     Changes the attack value for this critter to new_attack
evolve(self)                     Evolves the critter as follows: a) double the attack; b) quadruple the defense; and c) triple the speed

Pakudex Class
The Pakudex class will contain all the pakuri that you encounter, as Pakuri objects. Note: the pakudex will have a set size determined by user input at the beginning of the program’s run; the number of species contained in the pakudex will never grow beyond this point. The class must also have the following methods and behaviors (this is mandatory):
__init__(self, capacity=20)      Initializes this object to contain exactly capacity objects when completely full. The default capacity for the pakudex should be 20.
get_size(self)                   Returns the number of critters currently being stored in the pakudex
get_capacity(self)               Returns the maximum number of critters that the pakudex has the capacity to hold
get_species_array(self)          Returns a string list containing the species of the critters as ordered in the pakudex; if no species have been added yet, this method should return None
get_stats(self, species)         Returns an int list containing the attack, defense, and speed statistics of species at indices 0, 1, and 2 respectively; if species is not in the pakudex, returns None
sort_pakuri(self)                Sorts the pakuri objects in this pakudex according to Python standard lexicographical ordering of species name
add_pakuri(self, species)        Adds species to the pakudex; returns True if successful, and False otherwise
evolve_species(self, species)    Attempts to evolve species within the pakudex; returns True if successful, and False otherwise

Submissions
NOTE: Your output must match the example output *exactly*. If it does not, you will not receive full credit for your submission! MAKE SURE YOUR CLASSES ARE DEFINED WITH LOWERCASE LETTERS AS SHOWN ABOVE!
Files: pakuri_program.py, pakuri.py, pakudex.py
Method: Submit on ZyLabs
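The pakuri data class can be sketched as below using the spec's stat formulas (which reproduce the PsyGoose sample: attack 65, defense 57, speed 61). This is a minimal sketch, not a complete submission; the lowercase class name follows the spec's requirement, against usual Python naming style.

```python
class pakuri:
    """Critter data class; lowercase name is mandated by the spec.
    All state is kept in underscore-prefixed (private-by-convention)
    attributes, per the 'all variables must be private' rule."""

    def __init__(self, species):
        self._species = species
        self._attack = len(species) * 7 + 9     # initial stats per spec
        self._defense = len(species) * 5 + 17
        self._speed = len(species) * 6 + 13

    def get_species(self):
        return self._species

    def get_attack(self):
        return self._attack

    def get_defense(self):
        return self._defense

    def get_speed(self):
        return self._speed

    def set_attack(self, new_attack):
        self._attack = new_attack

    def evolve(self):
        self._attack *= 2       # a) double the attack
        self._defense *= 4      # b) quadruple the defense
        self._speed *= 3        # c) triple the speed
```

The pakudex class would then hold a capacity-bounded list of these objects and implement `get_stats` by returning `[p.get_attack(), p.get_defense(), p.get_speed()]` for the matching species, or None.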

$25.00 View

[SOLVED] COP3502 P2: RLE with Images (Python)

In this project students will develop routines to encode and decode data for images using run-length encoding (RLE). Students will implement encoding and decoding of raw data, conversion between data and strings, and display of information by creating procedures that can be called from within their programs and externally.This project will give students practice with loops, strings, Python lists, methods, and type-casting.RLE is a form of lossless compression used in many industry applications, including imaging. It is intended to take advantage of datasets where elements (such as bytes or characters) are repeated several times in a row in certain types of data (such as pixel art in games). Black pixels often appear in long “runs” in some animation frames; instead of representing each black pixel individually, the color is recorded once, following by the number of instances.For example, consider the first row of pixels from the pixel image of a gator (shown in Figure 1). The color black is “0”, and green is “2”: Flat (unencoded) data: 0 0 2 2 2 0 0 0 0 0 0 2 2 0_ Run-length encoded data: 2 0 3 2 6 0 2 2 1 0_. Figure 1 – Gator Pixel ImageThe encoding for the entire image in RLE (in hexadecimal) – width, height, and pixels – is: 1E|162032602220121F10721AF21092301210326032308250 W/ H/ ——————————————PIXELS———————————————–/Image Formatting The images are stored in uncompressed / unencoded format natively. In addition, there are a few other rules to make the project more tractable: 1. Images are stored as a list of numbers, with the first two numbers holding image width and height. 2. Pixels will be represented by a number between 0 and 15 (representing 16 unique colors). 3. No run may be longer than 15 pixels; if any pixel runs longer, it should be broken into a new run.For example, the chubby smiley image (Figure 2) would contain the data shown in Figure 3. 
Figure 2; Figure 3 – Data for “Chubby Smiley”

NOTE: Students do not need to work with the image file format itself – they only need to work with lists and encode or decode them. Information about image formatting is provided for context. Student programs must present a menu when run in standalone mode and must also implement several methods, defined below, during this assignment.

Standalone Mode (Menu): When run as the program driver via the main() method, the program should: 1) display a welcome message, 2) display the color test (ConsoleGfx.test_rainbow), 3) display the menu, 4) prompt for input. Note: for colors to display properly, it is highly recommended that students install the “CS1” theme on the project page. There are five ways to load data into the program that should be provided and four ways the program must be able to display data to the user.

Loading a File: Accepts a filename from the user and invokes ConsoleGfx.load_file(filename):
Select a Menu Option: 1
Enter name of file to load: testfiles/uga.gfx

Loading the Test Image: Loads ConsoleGfx.test_image:
Select a Menu Option: 2
Test image data loaded.

Reading RLE String: Reads RLE data from the user in hexadecimal notation with delimiters (smiley example):
Select a Menu Option: 3
Enter an RLE string to be decoded: 28:10:6B:10:10B:10:2B:10:12B:10:2B:10:5B:20:11B:10:6B:10

Reading RLE Hex String: Reads RLE data from the user in hexadecimal notation without delimiters (smiley example):
Select a Menu Option: 4
Enter the hex string holding RLE data: 28106B10AB102B10CB102B105B20BB106B10

Reading Flat Data Hex String: Reads raw (flat) data from the user in hexadecimal notation (smiley example):
Select a Menu Option: 5
Enter the hex string holding flat data: 880bbbbbb0bbbbbbbbbb0bb0bbbbbbbbbbbb0bb0bbbbb00bbbbbbbbbbb0bbbbbb0

Displaying the Image: Displays the current image by invoking the ConsoleGfx.display_image(image_data) method.
Displaying the RLE String: Converts the current data into a human-readable RLE representation (with delimiters):
Select a Menu Option: 7
RLE representation: 28:10:6b:10:10b:10:2b:10:12b:10:2b:10:5b:20:11b:10:6b:10
Note that each entry is 2-3 characters; the length is always in decimal, and the value in hexadecimal!

Displaying the RLE Hex Data: Converts the current data into RLE hexadecimal representation (without delimiters):
Select a Menu Option: 8
RLE hex values: 28106b10ab102b10cb102b105b20bb106b10

Displaying the Flat Hex Data: Displays the current raw (flat) data in hexadecimal representation (without delimiters):
Select a Menu Option: 9
Flat hex values: 880bbbbbb0bbbbbbbbbb0bb0bbbbbbbbbbbb0bb0bbbbb00bbbbbbbbbbb0bbbbbb0

Class Methods: Student classes are required to provide all of the following methods with the defined behaviors. We recommend completing them in the following order:
1. to_hex_string(data): Translates data (RLE or raw) into a hexadecimal string (without delimiters). This method can also aid debugging. Ex: to_hex_string([3, 15, 6, 4]) yields the string “3f64”.
2. count_runs(flat_data): Returns the number of runs of data in an image data set; double this result for the length of the encoded (RLE) list. Ex: count_runs([15, 15, 15, 4, 4, 4, 4, 4, 4]) yields the integer 2.
3. encode_rle(flat_data): Returns the RLE encoding of the raw data passed in; used to generate the RLE representation of the data. Ex: encode_rle([15, 15, 15, 4, 4, 4, 4, 4, 4]) yields the list [3, 15, 6, 4].
4. get_decoded_length(rle_data): Returns the decompressed size of RLE data; used to generate flat data from an RLE encoding. (Counterpart to #2.) Ex: get_decoded_length([3, 15, 6, 4]) yields the integer 9.
5. decode_rle(rle_data): Returns the decoded data set from RLE-encoded data. This decompresses RLE data for use. (Inverse of #3.) Ex: decode_rle([3, 15, 6, 4]) yields the list [15, 15, 15, 4, 4, 4, 4, 4, 4].
6. string_to_data(data_string): Translates a string in hexadecimal format into byte data (can be raw or RLE). (Inverse of #1.) Ex: string_to_data(“3f64”) yields the list [3, 15, 6, 4].
7. to_rle_string(rle_data): Translates RLE data into a human-readable representation. For each run, in order, it should display the run length in decimal (1-2 digits); the run value in hexadecimal (1 digit); and a delimiter, ‘:’, between runs. (See examples in the standalone section.) Ex: to_rle_string([15, 15, 6, 4]) yields the string “15f:64”.
8. string_to_rle(rle_string): Translates a string in human-readable RLE format (with delimiters) into RLE byte data. (Inverse of #7.) Ex: string_to_rle(“15f:64”) yields the list [15, 15, 6, 4].

Submissions NOTE: Your output must match the example output *exactly*. If it does not, you will not receive full credit for your submission!
File: rle_program.py
Method: Submit on ZyLabs. Do not submit any other files!

Part A (5 points): For part A of this assignment, students will set up the standalone menu alongside the 4 requirements listed on page 2 of this document. In addition, students should also set up menu options 1 (loading an image), 2 (loading specifically the test image), and 6 (displaying whatever image was loaded) in order to help grasp the bigger picture of the project. This involves correctly setting up the console_gfx.py file and utilizing its methods. You will use ConsoleGfx.display_image(...) to display images. Notice that it takes in a decoded list. This is the format in which you will locally (in your program) store any image data that you are working with. When the document says that something is “loaded”, it means that it is stored as a list of flat (decoded) data.

Part B (60 points): For part B of this assignment, students will complete the first 6 methods on page 3 of this document. They must match specifications and pass the test cases in chapter 12.2 in Zybooks, which will be your means of submission for this part of the assignment. Your grade will be the score received on Zybooks. To guarantee functionality moving forward to part C, it is expected that you will receive full marks for this section.

Part C (35 points): For part C of this assignment, students will complete the final 2 methods on page 3 of this document as well as the remainder of the project, involving the menu options and understanding how all the individual methods are intertwined with each other. You will submit your whole program, including the 8 methods listed above and the main method, in chapter 12.3 in Zybooks. We will only test your remaining 2 methods and the main method in part C.
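The core helpers from Part B can be sketched as follows. This is an illustrative sketch matching the examples in the spec, including the 15-pixel run cap; it is not the official solution.

```python
# Sketch of the RLE helpers described above. Assumes values are 0-15
# (one hex digit each) and enforces the rule that no run may exceed 15.

def count_runs(flat_data):
    """Number of runs, counting a run longer than 15 as multiple runs."""
    runs, i = 0, 0
    while i < len(flat_data):
        length = 1
        while (i + length < len(flat_data)
               and flat_data[i + length] == flat_data[i]
               and length < 15):              # cap each run at 15
            length += 1
        runs += 1
        i += length
    return runs

def encode_rle(flat_data):
    """e.g. [15,15,15,4,4,4,4,4,4] -> [3,15,6,4]"""
    rle, i = [], 0
    while i < len(flat_data):
        length = 1
        while (i + length < len(flat_data)
               and flat_data[i + length] == flat_data[i]
               and length < 15):
            length += 1
        rle.extend([length, flat_data[i]])    # run length, then run value
        i += length
    return rle

def get_decoded_length(rle_data):
    return sum(rle_data[0::2])                # sum of the run lengths

def decode_rle(rle_data):
    flat = []
    for i in range(0, len(rle_data), 2):
        flat.extend([rle_data[i + 1]] * rle_data[i])
    return flat

def to_hex_string(data):
    return "".join(format(v, "x") for v in data)

def string_to_data(data_string):
    return [int(ch, 16) for ch in data_string]

def to_rle_string(rle_data):
    # run length in decimal (1-2 digits), value in hex (1 digit), ':' between runs
    return ":".join(str(rle_data[i]) + format(rle_data[i + 1], "x")
                    for i in range(0, len(rle_data), 2))
```

Each function is the counterpart or inverse the spec names (count_runs pairs with get_decoded_length, encode_rle with decode_rle, to_hex_string with string_to_data), so round-tripping the examples from the document is a quick sanity check.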

$25.00 View

[SOLVED] Coms 4771 hw4 1 from distances to embeddings your friend from overseas is visiting you and asks

Your friend from overseas is visiting you and asks you the geographical locations of popular US cities on a map. Not having access to a US map, you realize that you cannot provide your friend accurate information. You recall that you have access to the relative distances between nine popular US cities, given by the following distance matrix D:

Distances (D)  BOS   NYC   DC    MIA   CHI   SEA   SF    LA    DEN
BOS            0     206   429   1504  963   2976  3095  2979  1949
NYC            206   0     233   1308  802   2815  2934  2786  1771
DC             429   233   0     1075  671   2684  2799  2631  1616
MIA            1504  1308  1075  0     1329  3273  3053  2687  2037
CHI            963   802   671   1329  0     2013  2142  2054  996
SEA            2976  2815  2684  3273  2013  0     808   1131  1307
SF             3095  2934  2799  3053  2142  808   0     379   1235
LA             2979  2786  2631  2687  2054  1131  379   0     1059
DEN            1949  1771  1616  2037  996   1307  1235  1059  0

Being a machine learning student, you believe that it may be possible to infer the locations of these cities from the distance data. To find an embedding of these nine cities on a two-dimensional map, you decide to solve it as an optimization problem as follows. You associate a two-dimensional variable x_i as the unknown latitude and longitude value for each of the nine cities (that is, x_1 is the lat/lon value for BOS, x_2 is the lat/lon value for NYC, etc.). You write down the (unconstrained) optimization problem

minimize_{x_1,...,x_9}  Σ_{i,j} (‖x_i − x_j‖ − D_ij)²,

where Σ_{i,j} (‖x_i − x_j‖ − D_ij)² denotes the embedding discrepancy function.

(i) What is the derivative of the discrepancy function with respect to a location x_i?
(ii) Write a program in your preferred language to find an optimal setting of locations x_1,...,x_9. You must submit your code to receive full credit.
(iii) Plot the result of the optimization showing the estimated locations of the nine cities.
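Part (ii) can be sketched with plain gradient descent on the discrepancy, using the part (i) derivative 2(‖x_i − x_j‖ − D_ij)(x_i − x_j)/‖x_i − x_j‖ summed over j. This is a sketch, not the required solution; the function name, step size, and iteration count are my own choices.

```python
# Gradient descent on sum_{i,j} (||x_i - x_j|| - D_ij)^2 over all ordered
# pairs (i, j). Learning rate and step count are illustrative.

import numpy as np

def embed(D, dim=2, steps=2000, lr=0.01, seed=0):
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim)) * D.mean() / 10   # small random start
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]            # x_i - x_j, shape (n, n, dim)
        dist = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(dist, 1.0)                     # avoid division by zero
        dist = np.maximum(dist, 1e-9)
        coef = 2.0 * (dist - D) / dist                  # scalar factor from part (i)
        np.fill_diagonal(coef, 0.0)
        X -= lr * (coef[:, :, None] * diff).sum(axis=1) # gradient step on every x_i
    return X
```

Note the recovered layout is only determined up to rotation, reflection, and translation, since those transformations preserve all pairwise distances; this is worth keeping in mind when comparing against the true city locations in part (iii).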
(here is a sample code to plot the city locations in Matlab)
>> cities={’BOS’,’NYC’,’DC’,’MIA’,’CHI’,’SEA’,’SF’,’LA’,’DEN’};
>> locs = [x1;x2;x3;x4;x5;x6;x7;x8;x9];
>> figure; text(locs(:,1), locs(:,2), cities);

What can you say about your result of the estimated locations compared to the actual geographical locations of these cities?

2. In this question, you will get a glimpse into some of the similarities and differences of kernelized SVMs (KSVM) and neural networks (NN) using both theory and empirics.

Understanding KSVMs for Regression: In lecture we covered kernelized SVMs for classification, but here we will consider a formulation for regression. For classification, we aimed to find the maximum-margin decision boundary between our two classes. For regression, we can first study the linear case with one-dimensional inputs and outputs. We aim to learn a predictor of the form f(x) = wx + b given a set of observations (x_i, y_i) for i ∈ [N]. We now want to constrain the possible choices of our parameters; namely, we want ∀i ∈ [N]: |y_i − (wx_i + b)| ≤ ε, where ε is a hyperparameter. This ensures that all our residuals are within ε distance.

(a) Assuming there exists such a predictor, we can formulate our optimization problem as follows:
min_{w,b} (1/2)w²  such that  ∀i ∈ [N]: |y_i − (wx_i + b)| ≤ ε.
Notice that the constraints are making sure that our prediction is within distance ε. What role does the objective play in this optimization? (Hint: consider the formulation of ridge regression.)

(b) Of course, it is not guaranteed that all predictions will be within ε distance. Thus, we can add in slack variables as follows:
min_{w,b,ξ,ξ*} (1/2)w² + C Σ_{i=1}^N (ξ_i + ξ*_i)
such that: y_i − (wx_i + b) ≤ ε + ξ_i  ∀i ∈ [N]
(wx_i + b) − y_i ≤ ε + ξ*_i  ∀i ∈ [N]
ξ_i ≥ 0, ξ*_i ≥ 0.
Describe in words the role of ξ_i and ξ*_i.

(c) Show that the learned predictor will take the form f(x) = Σ_{i=1}^N (α_i − α*_i) x_i x + b by setting up the Lagrangian and examining the stationary points with respect to w.

So far our SVM formulation was for linear regression, where the prediction is done by f(x) = Σ_{i=1}^N (α_i − α*_i) x_i x + b. Note that by taking β_i = α_i − α*_i, and replacing the (inner) product between x_i and x by any non-linear kernel, we can now have a kernelized version as f_kern(x) = Σ_{i=1}^N β_i K(x_i, x) + b. For the remainder of the question we will be focusing on the RBF kernel.

Approximation Power of RBF KSVMs: RBF kernels are known to fit arbitrarily complex functions. Here, we will show specifically that our KSVM model can approximate any continuous function over the interval [−1, 1] with arbitrary precision [1]. Let Z be the set of all possible RBF KSVM regressors of the type [−1, 1] → R on a non-empty dataset, that is, Z := {f_kern}.

(d) Prove that Z is an algebra, that is, it is closed under: (i) addition (∀f1, f2 ∈ Z: (f1 + f2) ∈ Z), (ii) multiplication (∀f1, f2 ∈ Z: f1·f2 ∈ Z, i.e. element-wise multiplication), and (iii) scalar multiplication (∀f1 ∈ Z, ∀c ∈ R: c·f1 ∈ Z). For multiplication, you may assume γ = 1 as the scaling parameter of the RBF kernel to simplify calculations.

(e) Prove that Z can “isolate” each point in [−1, 1], that is, for a fixed dataset and any distinct x, y ∈ [−1, 1], there exists f ∈ Z such that f(x) ≠ f(y).

(f) Prove that ∀x ∈ [−1, 1]: ∃f ∈ Z: f(x) ≠ 0.

Parts d-f collectively imply RBF KSVMs can approximate any continuous function on the interval [−1, 1]. [2]

Approximation Power of 2-Layer Neural Networks: A 2-layer neural network (with one-dimensional input, one-dimensional output, and an N-width hidden layer) is defined as f_NN(x) := Σ_{i=1}^N α_i σ(w_i x + b_i), where w_i, b_i and α_i are the network weights, and σ is the “activation” function (usually a sigmoid, ReLU, or tanh).

[1] The set of continuous functions on the interval [−1, 1] is denoted C([−1, 1]).
[2] One can combine these results (see e.g. the Stone–Weierstrass Theorem) to show universal approximability.

Here we will show that ReLU-activated 2-layer neural networks, i.e. f_NN, can approximate any function from [−1, 1] → R, by showing that ReLU activations are “discriminatory” [3]. An activation function σ is called discriminatory if for a given µ ∈ C([−1, 1]),
(∀w, b ∈ R: ∫_{−1}^{1} σ(wx + b) µ(x) dx = 0)  ⟹  µ(x) = 0.

(g) Prove that the linear activation function, that is, σ(x) = x, is not discriminatory. (Hint: think of a discrete case first.)

(h) Assuming that all continuous functions σ with σ(x) → 1 as x → ∞ and σ(x) → 0 as x → −∞ are discriminatory, prove that ReLU is discriminatory. (Hint: try to construct such a function out of ReLUs and prove by contradiction.)

Comparing Empirical Performance of KSVMs and NNs: If both KSVMs and NNs have universal approximation (as seen in the previous parts), then why are NNs more used in practice? While both KSVMs and NNs are powerful models that can work well on arbitrarily complex datasets, on simpler datasets we want models that require fewer training samples to yield good predictions. Here we will compare the relative performance of KSVMs and NNs on increasingly complex datasets and study which model class adapts better [4].

Download the dataset provided. It contains train and test samples of various sizes of (x, y) pairs of functions with increasing complexity. For this question, you may use any library you want.

(i) To get a feel for the data, create scatter plots (x vs. y) using 50 training samples and 1000 training samples for each function complexity. (There are a total of 10 plots.)

(j) Train an RBF KSVM regressor and a NN for each function complexity with a varying number of training samples. Suggestions: For the SVM, it is advised you use SciKitLearn’s built-in Support Vector Regression function. The default settings should work fine; just ensure that you specify the right kernel. For the NN, it is advised you use PyTorch.
Using a small neural network with 2 or 3 hidden layers and a dozen or a few dozen neurons per layer, with the Adam optimizer and ReLU activation function, should work well. To squeeze out good performance, try changing the number of epochs and the batch size first. Additionally, see this reference for more training tips: http://karpathy.github.io/2019/04/25/recipe/. You must use the MSE loss. For training sample sizes, consider sizes 50, 100, 300, 500, 750, and 1000. You must submit your code to receive credit.

(k) For both KSVM and NN predictors, and each function complexity, plot the test MSE error for varying training sizes.

(l) What can you conclude about the adaptability of KSVMs vs NNs? Is one model better than the other? Analyze how the prediction quality varies with function complexity and training size.

[3] See e.g. “Approximation by Superpositions of a Sigmoidal Function” by Cybenko for a proof of why discriminatory activation functions imply universal approximation.
[4] For a more theoretical analysis of this topic, please see https://francisbach.com/quest-for-adaptivity/ and references therein.
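For intuition about predictors of the form f_kern(x) = Σ_i β_i K(x_i, x) + b, here is a small sketch that fits such an RBF predictor by regularized least squares (kernel ridge). Note this is a stand-in for the ε-insensitive SVR objective the problem uses (the assignment itself suggests SciKitLearn's SVR); the function names and the gamma/lam hyperparameters are illustrative.

```python
# Fit f(x) = sum_i beta_i * K(x_i, x) with an RBF kernel, using kernel
# ridge regression as a simple stand-in for the SVR training objective.

import numpy as np

def rbf_kernel(a, b, gamma=10.0):
    # K(a_i, b_j) = exp(-gamma * (a_i - b_j)^2) for 1-D inputs
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def fit_rbf(x_train, y_train, gamma=10.0, lam=1e-6):
    # regularized least squares: (K + lam*I) beta = y
    K = rbf_kernel(x_train, x_train, gamma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def predict_rbf(x_test, x_train, beta, gamma=10.0):
    return rbf_kernel(x_test, x_train, gamma) @ beta
```

Both this and a trained SVR produce predictions that are kernel-weighted combinations of the training points, which is the structural similarity to the 2-layer network f_NN that the question is probing.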

$25.00 View

[SOLVED] Coms 4771 hw3 1 inconsistency of the fairness definitions

Recall the notation and definitions of group-based fairness conditions.

Notation: Denote X ∈ R^d, A ∈ {0, 1} and Y ∈ {0, 1} to be three random variables: the non-sensitive features of an instance, the instance’s sensitive feature, and the target label of the instance respectively, such that (X, A, Y) ∼ D. Denote a classifier f: R^d → {0, 1} and denote Ŷ := f(X). For simplicity, we also use the following abbreviations: P := P_{(X,A,Y)∼D} and P_a := P_{(X,a,Y)∼D}.

Group-based fairness definitions:
– Demographic Parity (DP): P_0[Ŷ = ŷ] = P_1[Ŷ = ŷ] ∀ŷ ∈ {0, 1} (equal positive rate across the sensitive attribute)
– Equalized Odds (EO): P_0[Ŷ = ŷ | Y = y] = P_1[Ŷ = ŷ | Y = y] ∀y, ŷ ∈ {0, 1} (equal true positive- and true negative-rates across the sensitive attribute)
– Predictive Parity (PP): P_0[Y = y | Ŷ = ŷ] = P_1[Y = y | Ŷ = ŷ] ∀y, ŷ ∈ {0, 1} (equal positive predictive- and negative predictive-value across the sensitive attribute)

Unfortunately, achieving all three fairness conditions simultaneously is not possible. An impossibility theorem for group-based fairness is stated as follows.
• If A is dependent on Y, then Demographic Parity and Predictive Parity cannot hold at the same time.
• If A is dependent on Y and Ŷ is dependent on Y, then Demographic Parity and Equalized Odds cannot hold at the same time.
• If A is dependent on Y, then Equalized Odds and Predictive Parity cannot hold at the same time.

These three results collectively show that it is impossible to simultaneously satisfy the fairness definitions except in some trivial cases.
(i) State a scenario where all three fairness definitions are satisfied simultaneously.
(ii) Prove the first statement.
(iii) Prove the second statement.
(iv) Prove the third statement.
Hint: First observe that P_0[Y = y | Ŷ = ŷ] = P_1[Y = y | Ŷ = ŷ] ∀y, ŷ ∈ {0, 1} is equivalent to: P_0[Y = 1 | Ŷ = ŷ] = P_1[Y = 1 | Ŷ = ŷ] ∀ŷ ∈ {0, 1}.
A necessary condition for PP is the equality of positive predictive value (PPV): P_0[Y = 1 | Ŷ = 1] = P_1[Y = 1 | Ŷ = 1]. To prove the third statement, it is enough to prove a stronger statement: if A is dependent on Y, Equalized Odds and equality of Positive Predictive Value cannot hold at the same time. Next, try to express the relationship between FPR_a (= P_a[Ŷ = 1 | Y = 0]) and FNR_a (= P_a[Ŷ = 0 | Y = 1]) using p_a (= P[Y = 1 | A = a]) and PPV_a (= P_a[Y = 1 | Ŷ = 1]), ∀a ∈ {0, 1}, and finish the proof.

2. The concept of “wisdom-of-the-crowd” posits that the collective knowledge of a group as expressed through their aggregated actions or opinions is superior to the decision of any one individual in the group. Here we will study a version of the “wisdom-of-the-crowd” for binary classifiers: how can one combine prediction outputs from multiple possibly low-quality binary classifiers to achieve an aggregate high-quality final output? Consider the following iterative procedure to combine classifier results.

Input:
– S: a set of training samples S = {(x_1, y_1), ..., (x_m, y_m)}, where each y_i ∈ {−1, +1}
– T: number of iterations (also, number of classifiers to combine)
– F: a set of (possibly low-quality) classifiers; each f ∈ F is of the form f: X → {−1, +1}

Output:
– F: a set of selected classifiers {f_1, ..., f_T}, where each f_i ∈ F
– A: a set of combination weights {α_1, ..., α_T}

Iterative Combination Procedure:
– Initialize distribution weights D_1(i) = 1/m [for i = 1, ..., m]
– for t = 1, ..., T do
–   // ε_j is the weighted error of the j-th classifier w.r.t. D_t
–   Define ε_j := Σ_{i=1}^m D_t(i) · 1[y_i ≠ f_j(x_i)] [for each f_j ∈ F]
–   // select the classifier with the smallest (weighted) error
–   f_t = argmin_{f_j ∈ F} ε_j
–   ε_t = min_{f_j ∈ F} ε_j
–   // recompute weights w.r.t. the performance of f_t
–   Compute classifier weight α_t = (1/2) ln((1 − ε_t)/ε_t)
–   Compute distribution weight D_{t+1}(i) = D_t(i) exp(−α_t y_i f_t(x_i))
–   Normalize distribution weights D_{t+1}(i) = D_{t+1}(i) / Σ_i D_{t+1}(i)
– endfor
– return weights α_t and classifiers f_t for t = 1, ..., T

Final Combined Prediction: For any test input x, define the aggregation function as g(x) := Σ_t α_t f_t(x), and return the prediction as sign(g(x)).

We’ll prove the following statement: if for each iteration t there is some γ_t > 0 such that ε_t = 1/2 − γ_t (that is, assuming that at each iteration the error of the classifier f_t is just γ_t better than random guessing), then the error of the aggregate classifier
err(g) := (1/m) Σ_i 1[y_i ≠ sign(g(x_i))] ≤ exp(−2 Σ_{t=1}^T γ_t²).
That is, the error of the aggregate classifier g decreases exponentially fast with the number of combinations T!

(i) Let Z_t := Σ_i D_t(i) exp(−α_t y_i f_t(x_i)) (i.e., Z_t denotes the normalization constant for the weighted distribution D_{t+1}). Show that D_{T+1}(i) = (1/m) · exp(−y_i g(x_i)) / ∏_t Z_t.
(ii) Show that the error of the aggregate classifier g is upper bounded by the product of the Z_t: err(g) ≤ ∏_t Z_t. (Hint: use the fact that the 0-1 loss is upper bounded by the exponential loss.)
(iii) Show that Z_t = 2√(ε_t(1 − ε_t)). (Hint: separate the expression for correctly and incorrectly classified cases and express it in terms of ε_t.)
(iv) By combining results from (ii) and (iii), we have that err(g) ≤ ∏_t 2√(ε_t(1 − ε_t)); now show that
∏_t 2√(ε_t(1 − ε_t)) = ∏_t √(1 − 4γ_t²) ≤ exp(−2 Σ_t γ_t²),
thus establishing that err(g) ≤ exp(−2 Σ_t γ_t²).

3. 1-Norm Support Vector Machine
(i) Recall the standard support vector machine formulation:
minimize ‖w‖₂²  subject to  y_i(w · x_i + w_0) ≥ 1, i = 1, ..., m,
where m is the number of points and n is the number of dimensions. This is a quadratic program because the objective function is quadratic and the constraints are affine.
A linear program, on the other hand, uses only an affine objective function and affine constraints, and is generally easier to solve than a quadratic program. By replacing the 2-norm in the objective function with the 1-norm (‖x‖₁ = Σ_{j=1}^n |x_j|), we get
minimize ‖w‖₁  subject to  y_i(w · x_i + w_0) ≥ 1, i = 1, ..., m.
Note that the objective function here is not linear because there are absolute values involved. Show that this problem is equivalent to a linear program with 2n variables and m + 2n constraints.

(ii) The Chebyshev (ℓ∞) distance between two points x and y is defined as max_i |x_i − y_i|. Show that the 1-norm SVM maximizes the Chebyshev distance between the two separating hyperplanes w · x + w_0 = ±1. (Hint: show that the vector (sign(w_1), ..., sign(w_n)) minimizes the ℓ∞ distance from the origin to the plane w · x = 2.)

(iii) When the input data are not perfectly separable, we can apply a soft-margin approach (this is an alternative to the usual slack-variables approach discussed in class):
minimize ‖w‖₁ + Σ_{i=1}^m [1 − y_i(w · x_i + w_0)]₊,   (1)
where [·]₊ is the hinge loss function given by max(0, ·). Note that we’ve replaced the constraints with a penalty in the objective function. Using the fact that strong duality always applies for linear programs, show that (1) can be expressed as
maximize ‖π‖₁  subject to  |Σ_{i=1}^m y_i x_ij π_i| ≤ 1, j = 1, ..., n;  Σ_{i=1}^m y_i π_i = 0;  0 ≤ π_i ≤ 1, i = 1, ..., m,
where π ∈ R^m. (Hint: first express (1) as a linear program and then find its dual.)

(iv) Suppose we know that the output y depends only on a few input variables (i.e. the optimal w is sparse). Would the 1-norm or 2-norm SVM make more sense? Justify your answer.

4. Let P be the probability distribution on R^d × R for the random pair (X, Y) (where X = (X_1, ..., X_d)) such that X_1, ..., X_d ∼iid N(0, 1), and Y | X = x ∼ N(x^T θ, ‖x‖²), x ∈ R^d. Here, θ = (θ_1, ..., θ_d) ∈ R^d are the parameters of P, and N(µ, σ²) denotes the Gaussian distribution with mean µ and variance σ².

(i) Let (x_1, y_1), ..., (x_n, y_n) ∈ R^d × R be a given sample, and assume x_i ≠ 0 for all i = 1, ..., n. Let f_θ be the probability density for P as defined above. Define Q: R^d → R by Q(θ) := (1/n) Σ_{i=1}^n ln f_θ(x_i, y_i), θ ∈ R^d. Write a convex optimization problem over the variables θ = (θ_1, ..., θ_d) ∈ R^d such that its optimal solutions are maximizers of Q over all vectors of Euclidean length at most one.

(ii) With the same sample and the same Q, find a system of linear equations Aθ = b over the variables θ = (θ_1, ..., θ_d) ∈ R^d such that the solutions are maximizers of Q over all vectors in R^d.
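The iterative combination procedure in the second problem is the AdaBoost algorithm. Here is a minimal sketch using one-dimensional threshold "stumps" as the classifier pool F; the function names, the stump pool, and the small clipping guard on ε_t are illustrative assumptions, not part of the problem statement.

```python
# Sketch of the iterative combination procedure (AdaBoost) above.

import math

def stump(threshold, sign):
    # weak classifier: predicts +sign if x > threshold, else -sign
    return lambda x: sign if x > threshold else -sign

def boost(points, labels, pool, T):
    m = len(points)
    D = [1.0 / m] * m                         # D_1(i) = 1/m
    selected, alphas = [], []
    for _ in range(T):
        # eps_j: weighted error of each classifier in the pool w.r.t. D_t
        errs = [sum(D[i] for i in range(m) if f(points[i]) != labels[i])
                for f in pool]
        j = min(range(len(pool)), key=lambda k: errs[k])
        f_t, eps = pool[j], errs[j]
        eps = min(max(eps, 1e-12), 1 - 1e-12)  # guard against eps = 0 or 1
        alpha = 0.5 * math.log((1 - eps) / eps)
        # reweight: up-weight mistakes, down-weight correct points, normalize
        D = [D[i] * math.exp(-alpha * labels[i] * f_t(points[i])) for i in range(m)]
        Z = sum(D)
        D = [w / Z for w in D]
        selected.append(f_t)
        alphas.append(alpha)
    return selected, alphas

def predict(x, selected, alphas):
    g = sum(a * f(x) for a, f in zip(alphas, selected))   # g(x) = sum_t alpha_t f_t(x)
    return 1 if g >= 0 else -1                            # sign(g(x))
```

On the +/−/−/+ pattern below, no single stump does better than 25% training error, yet three boosting rounds classify every training point correctly, which is the "wisdom-of-the-crowd" effect the problem quantifies.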

$25.00 View

[SOLVED] Coms 4771 hw2  1 designing socially aware classifiers traditional machine learning research focuses on simply improving the accuracy.

Traditional machine learning research focuses on simply improving accuracy. However, the model with the highest accuracy may be discriminatory and thus may have an undesirable social impact that unintentionally hurts minority groups [1]. To overcome such undesirable impacts, researchers have put a lot of effort into the field called Computational Fairness in recent years. Two central problems of Computational Fairness are: (1) what is an appropriate definition of fairness that works under different settings of interest? (2) How can we achieve the proposed definitions without sacrificing prediction accuracy?

In this problem, we will focus on some of the ways we can address the first problem. There are two categories of fairness definitions: individual fairness [2] and group fairness [3]. Most works in the literature focus on group fairness. Here we will study some of the most popular group fairness definitions and explore them empirically on a real-world dataset. Generally, group fairness is concerned with ensuring that group-level statistics are the same across all groups. A group is usually formed with respect to a feature called the sensitive attribute. The most common sensitive features include gender, race, age, religion, income level, etc. Thus, group fairness ensures that statistics across the sensitive attribute (such as across, say, different age groups) remain the same.

For simplicity, we only consider the setting of binary classification with a single sensitive attribute. Unless stated otherwise, we also consider the sensitive attribute to be binary. (Note that the binary assumption is only for convenience and the results can be extended to non-binary cases as well.)

Notation: Denote X ∈ R^d, A ∈ {0, 1} and Y ∈ {0, 1} to be three random variables: the non-sensitive features of an instance, the instance’s sensitive feature, and the target label of the instance respectively, such that (X, A, Y) ∼ D. Denote a classifier f: R^d → {0, 1} and denote Ŷ := f(X). For simplicity, we also use the following abbreviations: P := P_{(X,A,Y)∼D} and P_a := P_{(X,a,Y)∼D}.

[1] See e.g. Machine Bias by Angwin et al. for bias in recidivism prediction, and Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification by Buolamwini and Gebru for bias in face recognition.
[2] See e.g. Fairness Through Awareness by Dwork et al.
[3] See e.g. Equality of Opportunity in Supervised Learning by Hardt et al.

We will explore the following three fairness definitions:
– Demographic Parity (DP): P_0[Ŷ = ŷ] = P_1[Ŷ = ŷ] ∀ŷ ∈ {0, 1} (equal positive rate across the sensitive attribute)
– Equalized Odds (EO): P_0[Ŷ = ŷ | Y = y] = P_1[Ŷ = ŷ | Y = y] ∀y, ŷ ∈ {0, 1} (equal true positive- and true negative-rates across the sensitive attribute)
– Predictive Parity (PP): P_0[Y = y | Ŷ = ŷ] = P_1[Y = y | Ŷ = ŷ] ∀y, ŷ ∈ {0, 1} (equal positive predictive- and negative predictive-value across the sensitive attribute)

(i) Why is it not enough to just remove the sensitive attribute A from the dataset to achieve fairness as per the definitions above? Explain with a concrete example.

Part 1: Sometimes, people write the same fairness definition in different ways.
(ii) Show that the following two definitions of Demographic Parity are equivalent under our setting: P_0[Ŷ = 1] = P_1[Ŷ = 1] ⟺ P[Ŷ = 1] = P_a[Ŷ = 1] ∀a ∈ {0, 1}.
(iii) Generalize the result of the above equivalence and state an analogous equivalence relationship between the two equalities when A ∈ N and Ŷ ∈ R.

Part 2: In this part, we will explore the COMPAS dataset. The task is to predict two-year recidivism. Download the COMPAS dataset posted on the class discussion board.
In this dataset, the target label Y is two-year recidivism and the sensitive feature A is race.

(iv) Develop the following classifiers for the given dataset: (1) an MLE-based classifier, (2) a nearest neighbor classifier, and (3) a naive Bayes classifier. For the MLE classifier, you can model the class-conditional densities by a multivariate Gaussian distribution. For the nearest neighbor classifier, you should consider different values of k and the distance metric (e.g. L1, L2, L∞). For the naive Bayes classifier, you can model the conditional density for each feature value as count probabilities. (You may use built-in functions for performing basic linear algebra and probability calculations, but you should write the classifiers from scratch.) You must submit your code to Courseworks to receive full credit.

(v) Which classifier (discussed in the previous part) is better for this prediction task? You must justify your answer with appropriate performance graphs demonstrating the superiority of one classifier over the other. Example things to consider: how does the training sample size affect the classification performance?

(vi) To what degree are the fairness definitions satisfied for each of the classifiers you developed? Show your results with appropriate performance graphs. For each fairness measure, which classifier is the most fair? How would you summarize the differences between these algorithms?

(vii) Choose any one of the three fairness definitions. Describe a real-world scenario where this definition is most reasonable and applicable. What are the potential disadvantage(s) of this fairness definition? (You are free to reference online and published materials to understand the strengths and weaknesses of each of the fairness definitions. Make sure to cite all your resources.)

(viii) [Optional problem, will not be graded] Can an algorithm simultaneously achieve high accuracy and be fair and unbiased on this dataset? Why or why not, and under what fairness definition(s)?
Justify your reasoning.

2. Data dependent perceptron mistake bound
In class we have seen and proved the perceptron mistake bound, which states that the number of mistakes made by the perceptron algorithm is bounded by (R/γ)².

(i) Prove that this is tight. That is, give a dataset and an order of updates such that the perceptron algorithm makes exactly (R/γ)² mistakes.

Interestingly, although you have hence proved that the perceptron mistake bound is tight, this does not mean that it cannot be improved upon. The claimed “tightness” of the bound simply means that there exists a “bad” case which achieves this worst-case bound. If we make some extra assumptions, these bad cases might be ruled out and the worst-case bound could significantly improve. In ML, it is common to look at how extra assumptions can help improve such bounds [4]. This is what we will do in this problem.

As in class, let S = {(x_i, y_i)}_{i=1}^n be our (linearly separable) dataset where x_i ∈ R^D and y_i ∈ {−1, 1}. Also let w* be the unit vector defining the optimal linear boundary with the optimal margin γ (i.e. ∀i, y_i(w* · x_i) ≥ γ). Finally, let R = max_{x_i ∈ S} ‖x_i‖. Note that the standard bound tells us that the perceptron algorithm will make at most (R/γ)² mistakes.

[4] Indeed there is a vast field that comprises trying to get “data-dependent bounds”, i.e. bounds that give better results if you know some nice properties of your data.

Now assume that we are given the extra information that max_{x_i ∈ S} ‖(I − P)x_i‖ ≤ ε < R, where P = w*w*ᵀ and thus (I − P) is the projector onto the orthogonal complement space of w* [5]. The goal of this problem is to show that when running the perceptron algorithm on S, the number of mistakes is bounded by (ε/γ)² + 1 (which can be arbitrarily better than the standard bound).

Let i_T be the index of the element on which the T-th mistake was made. Let w_T be the weight vector after T mistakes have been made. Note that w_0 = 0.

(ii) Show that ‖w_T‖² ≤ ε²T + Σ_{t=1}^T ‖P x_{i_t}‖². Hint: Start by showing that ‖w_T‖² ≤ Σ_{t=1}^T ‖x_{i_t}‖². Also, it may be helpful for both (ii) and (iii) to review the properties of projection matrices (but make sure you prove any facts you use).

(iii) Show that (w_T · w*)² ≥ γ²T(T − 1) + Σ_{t=1}^T ‖P x_{i_t}‖². Hint: start by showing that w_T · w* = Σ_{t=1}^T y_{i_t} x_{i_t} · w*.

(iv) Use parts (ii) and (iii) to show that T ≤ (ε/γ)² + 1. Notice (for yourself, no writing necessary) that you successfully proved the tighter bound!

3. Compute the distance from the hyperplane g(x) = w · x + w_0 = 0 to a point x_a by using constrained optimization techniques, that is, by minimizing the squared distance ‖x − x_a‖² subject to the constraint g(x) = 0.

4. In class, you have studied that decision trees can be simple but powerful classifiers that separate the training data by making a sequence of binary decisions along the optimal features to split the data. In their most basic implementation (without early stopping or pruning), decision trees continue this procedure until every data point in the dataset has a leaf node associated with it. In this problem, we will choose a method to characterize the complexity of decision trees and test the limits of single decision trees on a challenging dataset, before moving on to examining a more powerful algorithm and testing its limits, too.

(i) Download the FashionMNIST dataset [6] provided to you as train.npy, trainlabels.npy, test.npy and testlabels.npy. These will be the train and test splits of the dataset you should use for all the following parts. Please ensure that you do not use any other split or download method, as you will be evaluated against this split. Once downloaded, visualize a few data points and their associated labels.
Briefly comment on why this is a harder dataset to work with than classical MNIST, and why a simple classifier such as nearest neighbors is likely to do poorly.

⁵ Geometrically, the data resides in a high-dimensional oval (ellipsoid) whose longest axis is in the direction of w* and has radius R, while the other axes have radius ε. While this assumption seems quite contrived (if we know that the longest axis is in the direction of w*, it seems that we could trivially find w*), similar situations can be quite common when doing metric learning (which would approximately stretch the dataset in the direction of w* while compressing the orthogonal directions, hence resulting in a similarly oval-shaped dataset).

⁶ FashionMNIST dataset – https://github.com/zalandoresearch/fashion-mnist

There are many ways to measure the number of parameters in classifiers. In this problem, we will use the number of leaves in the trained decision tree as a measure of how complex it is. Trivially, this is related to other ways of capturing decision tree complexity, such as its maximum depth and its total number of decisions/nodes.

(ii) As a first step, train a series of decision trees on the training split of FashionMNIST, with a varying limit on the maximum number of permitted leaf nodes. Once trained, evaluate the performance of your classifiers on both the train and test splits, plotting the 0-1 loss of the train/test curves against the maximum permitted number of leaf nodes (log-scale horizontal axis). You are permitted to use an open source implementation of a decision tree classifier (such as sklearn’s DecisionTreeClassifier) as long as you are able to control the maximum number of leaves. What is the minimum loss you can achieve, and what do you observe on the plot?

For your analysis in the rest of the question, recall the definitions of bias and variance as sources of error. Consider your classifier f̂(x; D), where D is the dataset used to train it.
High-bias classifiers are ones where, in expectation over D, the learnt classifier predicts far away from the true classifier. On the other hand, a high-variance classifier is one whose estimate of the function is strongly sensitive to the split D of the training data used. High-bias and high-variance classifiers are typically associated with models considered too simple (underfit) or too complex (overfit) respectively, and study of the bias-variance tradeoff helps find the correct model class for good train-test generalization.

(iii) Inspect your decision tree classifiers for the maximum number of leaves they actually used. Did they always use the full capacity of leaves you permitted? (Hint: if your answer is yes, go back to step 2 and try a tree with higher complexity, i.e. one with a greater number of maximum leaf nodes than what you have already tried.) What is the maximum number of leaves that a trained classifier ends up using?

Clearly there is a limit to the training of the decision tree beyond which the loss starts going up again. As the complexity of the decision tree increases, its variance does too: it achieves low training (empirical) risk but high test (true) risk. A trivial way to mitigate this would be to keep restricting the maximum number of leaves, but this would bring us back to the high-bias zone. Instead, we will turn to ensembling: a trick that allows us to reduce variance without resorting to simpler, high-bias classifiers. The idea behind ensembling is to train a range of independent weaker models and combine them towards the final goal of making a stronger model: if our problem is classification, for instance, we may have them vote on the final answer.

The canonical first ensembling method associated with decision trees is the Random Forest algorithm. The algorithm trains a series of (shallower) decision trees on random subsets of the original training set, before having them vote on the final answer.
Intuitively, this allows each tree to be smaller and requires a larger number of trees to agree on an answer before providing a result, hence keeping model bias low while reducing variance. Additionally, a random forest also allows each individual decision tree to access only a random subset of the features in the training data.

(iv) Why do you think it is important for individual estimators in the random forest to have access to only a subset of all features, for the purpose of reducing variance?

(v) With the random forest model, we now have two hyperparameters to control: the number of estimators and the maximum permitted leaves in each estimator, making the total parameter count the product of the two. In the ensuing sections, you are allowed to use an open source implementation of the random forest classifier (such as sklearn’s RandomForestClassifier) as long as you can control the number of estimators used and the maximum number of leaves in each decision tree trained.

(a) First, make a plot measuring the train and test 0-1 loss of a random forest classifier with a fixed number of estimators (the default works just fine) but with a varying maximum number of allowed tree leaves for individual estimators. You should plot the train and test error on the same axis against a log scale of the total number of parameters on the horizontal axis. In this case, you are making individual classifiers more powerful but keeping the size of the forest the same. What do you observe: does overfitting seem possible?

(b) Second, make a plot measuring the train and test 0-1 loss of a random forest classifier with a fixed maximum number of leaves but a varying number of estimators. You should plot the train and test error on the same axis against a log scale of the total number of parameters on the horizontal axis.
Ensure that the maximum number of leaves permitted is small compared to your answer in part (iii), so as to have shallower trees. In this case, you are making the whole forest larger without allowing any individual tree to fit the data perfectly, i.e. without any individual tree achieving zero empirical risk. How does your best loss compare to the best loss achieved with a single decision tree? What about for a similar number of total parameters? With a sufficiently large number of estimators chosen, you should still see variance increasing, albeit with an overall lower test loss curve.

(vi) Now we will generate a final plot. Here we will vary both the number of estimators and the maximum number of leaves allowed, albeit in a structured manner. First, while allowing only a single estimator (effectively reducing the random forest to a decision tree), increase the maximum leaves permitted up to your answer in part (iii), i.e. the number of leaves needed for the single tree to overfit; we will call this Phase 1. Now, keeping the maximum permitted leaves the same, keep doubling the number of estimators allowed. We will call this Phase 2.

Train a random forest for all these combinations and make a plot of the train and test 0-1 loss versus the total number of parameters on a log scale. Note that Phase 1 and Phase 2 are clearly separable on the horizontal axis. Please note the following details as you perform the experiment, which make it clear why this experiment is different from the ones you have previously performed. What surprising result do you observe from the loss curve at the end of Phase 1?

• In this experiment, we follow a very structured way of increasing the number of parameters in the model: we first increase the maximum number of permitted leaves of a single tree until we see no further change in the operation of the algorithm. Only then do we increase the model complexity by growing the size of the forest.
In (v), on the contrary, you experimented with growing the size of the forest without having any single strong tree, and with growing the size of the trees while keeping the forest size constant.

• Observe what point on the horizontal axis (number of parameters) corresponds to the ‘surprising’ result on the test loss. Is there a relationship between the order of the number of parameters and the number of data points in the training set?

The phenomenon, called double descent (in seeming contrast to the traditional U-shaped test set curve observed in classical ML models), is a recent discovery and has been seen as a way to reconcile the ‘classical’ U-shaped test loss curve and the ‘modern regime’ monotonically decreasing test loss curve usually found in neural networks. You can read more about the theoretical reasoning (as well as some more experiments relating parameters and dataset size concretely) behind the phenomenon in this paper: https://arxiv.org/abs/1812.11118.

Important Note: Remember to include all the plots as required by the questions in your report, organized by the questions that require them, with details clearly given about any hyperparameters you chose to use. Make sure to submit all code in accordance with the coding instructions already provided. Also, list all your dependencies in a file called requirements.txt and submit this inside your code zip.
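As an aside, the leaf-limit sweep from part (ii) can be sketched as follows. This uses sklearn’s DecisionTreeClassifier, which the assignment names; the synthetic two-blob dataset and the particular leaf grid are illustrative stand-ins for FashionMNIST, not part of the assignment:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Illustrative stand-in for FashionMNIST: two noisy Gaussian blobs in 10 dimensions.
X = np.vstack([rng.normal(0, 1, (200, 10)), rng.normal(1, 1, (200, 10))])
y = np.array([0] * 200 + [1] * 200)
idx = rng.permutation(400)
Xtr, ytr, Xte, yte = X[idx[:300]], y[idx[:300]], X[idx[300:]], y[idx[300:]]

leaf_grid = [2, 4, 8, 16, 32, 64, 128]   # varying limit on permitted leaf nodes
train_loss, test_loss = [], []
for leaves in leaf_grid:
    clf = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0).fit(Xtr, ytr)
    train_loss.append(float(np.mean(clf.predict(Xtr) != ytr)))  # 0-1 loss, train split
    test_loss.append(float(np.mean(clf.predict(Xte) != yte)))   # 0-1 loss, test split
# Plot train_loss/test_loss against leaf_grid on a log-scale x-axis; the train
# curve is driven toward 0 as the leaf budget grows, while the test curve
# typically flattens or rises once the tree starts to overfit.
```

The same loop structure applies to the real FashionMNIST split; only the data loading changes.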


[SOLVED] Coms 4771 hw1 1 analyzing bayes classifier consider a binary classification problem where the output

Consider a binary classification problem where the output variable Y ∈ {0, 1} is fully determined by variables A, B and C (each in ℝ). In particular, Y = 1 if A + B + C < 7, and Y = 0 otherwise.

(i) Let A, B and C be i.i.d. exponential random variables (with mean parameter λ = 1).
(a) Suppose the variables A and B are known but C is unknown; compute P[Y = 1 | A, B], the optimal Bayes classifier, and the corresponding Bayes error.
(b) If only the variable A is known but the variables B and C are unknown, compute P[Y = 1 | A], the optimal Bayes classifier, and the corresponding Bayes error.
(c) If none of the variables A, B, C are known, what is the Bayes classifier and the corresponding Bayes error?
(ii) Assume that variables A, B and C are independent. For known A and B, show that there exists a distribution on C for which the Bayes classification error rate can be made as close to 1/2 as desired.

As discussed in class, occasionally the standard misclassification error is not a good metric for a given task. In this question, we consider binary classification (Y = {0, 1}) where the two possible types of errors are not symmetric. Specifically, we associate a cost of p > 0 for outputting 0 when the class is 1, and a cost of q > 0 for outputting 1 when the class is 0. Furthermore, we allow the classifier to output −1, indicating an overall lack of confidence as to the label of the provided example, and incur a cost r > 0. Formally, given a classifier f : X → {−1, 0, 1} and an example x ∈ X with label y ∈ {0, 1}, we define the loss of f with respect to the example (x, y) by:

ℓ(f(x), y) =
  0 if f(x) = y,
  p if f(x) = 0 and y = 1,
  q if f(x) = 1 and y = 0,
  r if f(x) = −1.

(i) Describe a real-world scenario where this model of classification may be more appropriate than the standard model seen in class.
(ii) Assume that r < pq/(p + q). Let f* be defined as:

f*(x) =
  0 if 0 ≤ η(x) ≤ r/p,
  −1 if r/p < η(x) < 1 − r/q,
  1 if 1 − r/q ≤ η(x) ≤ 1,
where η(x) = Pr[Y = 1 | X = x]. Show that f* is the Bayes classifier for this model of classification. Specifically, show that for any other classifier g : X → {−1, 0, 1} we have:

E_{x,y} ℓ(f*(x), y) ≤ E_{x,y} ℓ(g(x), y).

(iii) Assume that r ≥ pq/(p + q). Show that the Bayes classifier is given by:

f*(x) =
  0 if 0 ≤ η(x) ≤ q/(p + q),
  1 if q/(p + q) ≤ η(x) ≤ 1,

where η(x) = Pr[Y = 1 | X = x].

(iv) Using the previous parts, show that when p = q and r > p/2 the Bayes classifier is the same as the one we derived in class. Explain intuitively why this makes sense.

3 Finding (local) minima of generic functions

Finding extreme values of functions in closed form is often not possible. Here we will develop a generic algorithm to find the extremal values of a function. Consider a smooth function f : ℝ → ℝ.

(i) Recall that Taylor’s Remainder Theorem states: for any a, b ∈ ℝ, there exists z ∈ [a, b] such that f(b) = f(a) + f′(a)(b − a) + ½ f″(z)(b − a)². Assuming that there exists L ≥ 0 such that for all a, b ∈ ℝ, |f′(a) − f′(b)| ≤ L²|a − b|, prove the following statement: for any x ∈ ℝ, there exists some η > 0 such that if x̄ := x − ηf′(x), then f(x̄) ≤ f(x), with equality if and only if f′(x) = 0. (Hint: first show that the assumption implies that f has bounded second derivative, i.e., f″(z) ≤ L² for all z; then apply the remainder theorem and analyze the difference f(x) − f(x̄).)

(ii) Part (i) gives us a generic recipe to find a new value x̄ from an old value x such that f(x̄) ≤ f(x). Using this result, develop an iterative algorithm to find a local minimum starting from an initial value x₀.

(iii) Use your algorithm to find the minimum of the function f(x) := (x − 3)² + 4e^{2x}. You should code your algorithm in a scientific programming language like Matlab to find the solution. You don’t need to submit any code. Instead, you should provide a plot of how the minimum is approximated by your implementation as a function of the number of iterations.
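As a sketch of the iterative scheme from parts (ii)-(iii) (written in Python rather than Matlab for illustration, and assuming the intended objective is f(x) = (x − 3)² + 4e^{2x}, reconstructed from the garbled original), the update x̄ = x − ηf′(x) can be implemented as:

```python
import math

def f(x):
    # Assumed reconstruction of the objective: (x - 3)^2 + 4 e^{2x}.
    return (x - 3) ** 2 + 4 * math.exp(2 * x)

def fprime(x):
    return 2 * (x - 3) + 8 * math.exp(2 * x)

x, eta = 0.0, 0.01           # initial value x0 and a step size small enough for descent
history = [f(x)]
for _ in range(2000):
    x = x - eta * fprime(x)  # the update x_bar = x - eta * f'(x) from part (i)
    history.append(f(x))
# history is the curve to plot: f decreases monotonically toward a local minimum,
# and f'(x) approaches 0 at the limit point.
```

Plotting history against the iteration index gives exactly the convergence plot requested in (iii).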
(iv) Why is this technique only useful for finding local minima? Suggest some possible improvements that could help find global minima.

4 Exploring the Limits of Current Language Models

Recently, OpenAI launched their Chat Generative Pre-trained Transformer (ChatGPT), a powerful language model trained on top of GPT-3. Models like ChatGPT and GPT-3 have demonstrated remarkable performance in conversation, Q&A, and large-scale generative modeling in general. Despite their impressive performance, the sentences produced by such models are not “foolproof”. Here we will explore how effectively one can distinguish between human- vs. AI-generated written text. The classification model which you will develop can also be used to generate new sentences in a similar fashion to ChatGPT!

One of the simplest language models is the N-gram. An N-gram is a sequence of N consecutive words w_{1:N}. The model calculates the probability of a word w_i appearing using only the N − 1 words before it. This local independence assumption can be considered a Markov-type property. For the bigram model, the probability of a sequence of words w_1, w_2, …, w_n appearing is

P(w_{1:n}) = ∏_{i=1}^n P(w_i | w_{i−1}).

We use maximum likelihood estimation (MLE) to determine the conditional probabilities above. Given a training corpus D, let C(w_{i−1}w_i) denote the number of times the bigram (w_{i−1}, w_i) appears in D. Then we can approximate the conditional probability as

P(w_i | w_{i−1}) = P(w_{i−1}w_i) / P(w_{i−1}) ≈ C(w_{i−1}w_i) / C(w_{i−1}).

The probability that a sequence of words is from class y is

P(y | w_{1:n}) = P(y, w_{1:n}) / P(w_{1:n}) = P(y)P(w_{1:n} | y) / Σ_ỹ P(ỹ)P(w_{1:n} | ỹ),

where P(y) is the prior. Formally, the bigram classifier is

f(w_{1:n}) = argmax_y P(y | w_{1:n}) = argmax_y P(y)P(w_{1:n} | y).

To address out-of-vocabulary (OOV) words (i.e. N-grams in the test set but not in the training corpus), N-gram models often employ “smoothing”. One common technique is Laplacian smoothing, which assumes that every N-gram appears exactly one more time than actually observed.
Given a vocabulary V, the conditional probability becomes

P(w_i | w_{i−1}) ≈ (C(w_{i−1}w_i) + 1) / (C(w_{i−1}) + |V|).

Note that the denominator is increased by |V| so that the probabilities sum to one.

(i) Calculate the class probability P(y | w_{1:n}) for a trigram model with Laplacian smoothing.

Download the datafile humvgpt.zip¹. This file contains around 40,000 human-written and 20,000 ChatGPT-written entries. The data is stored in separate text files named hum.txt and gpt.txt respectively.

(ii) (a) Clean the data by removing all punctuation except “,.?!” and converting all words to lower case. You may also find it helpful to add special start-of-sentence and end-of-sentence tokens for calculating N-gram probabilities. Partition 90% of the data into a training set and 10% into a test set.
(b) Train a bigram and a trigram model by finding the N-gram frequencies in the training corpus. Calculate the percentage of bigrams/trigrams in the test set that do not appear in the training corpus (this is called the OOV rate). (You must submit your code to get full credit.)
(c) Evaluate the models on the test set and report the classification accuracy. Which model performs better and why? Your justification should consider the bigram and trigram OOV rates. This study will also tell you how difficult or easy it is to distinguish human- vs. AI-generated text!

Besides classification, N-gram models may also be used for text generation. Given a sequence of n − 1 previous tokens w_{i−n+2:i}, the model selects the next word with probability

P(w_{i+1} = w) = P(w | w_{i−n+2:i}) ≈ exp(C(w_{i−n+2:i}w)/T) / Σ_w̃ exp(C(w_{i−n+2:i}w̃)/T).

In the above equation, T is referred to as the temperature.

(iii) (a) Using T = 50, generate 5 sentences of 20 words each with the human and ChatGPT bigram/trigram models. Which text corpus and N-gram model generates the best sentences? What happens when you increase/decrease the temperature?
(b) Nowadays, language models (e.g. transformers) use an attention mechanism to capture long-range context.
Why do sentences generated from the n-gram model often fail in this regard?

¹ The data is adapted from https://huggingface.co/datasets/Hello-SimpleAI/HC3.
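The bigram MLE and Laplace-smoothed estimates discussed above can be sketched in a few lines. The tiny corpus and the start/end marker strings below are illustrative, not part of the assignment data:

```python
from collections import Counter

# Toy corpus; each sentence is padded with illustrative start/end markers.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
vocab = set(unigrams)

def p_mle(w, prev):
    # MLE estimate: C(prev w) / C(prev); zero for unseen bigrams.
    return bigrams[(prev, w)] / unigrams[prev]

def p_laplace(w, prev):
    # Laplace smoothing: (C(prev w) + 1) / (C(prev) + |V|).
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + len(vocab))

# Both estimates define a proper distribution over the next word, e.g.:
assert abs(sum(p_laplace(w, "the") for w in vocab) - 1.0) < 1e-12
```

The trigram version is identical except that counts are keyed on the two preceding words.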


[SOLVED] Coms w4701: artificial intelligence homework 4 problem 1: robot localization (30 points) a robot is wandering around a room with some obstacles

A robot is wandering around a room with some obstacles, labeled as # in the grid below. It can occupy any of the free cells labeled with a letter, but we are uncertain about its true location and thus keep a belief distribution over its current location. At each timestep it moves from its current cell to a neighboring free cell in one of the four cardinal directions with uniform probability; it cannot stay in the same cell. For example, from A the robot can move to either B or C with probability 1/2, while from D it can move to B, C, E, or F, each with probability 1/4.

A B #
C D E
# F #

The robot may also make an observation after each transition, returning what it sees in a random cardinal direction. Possibilities include observing #, “wall”, or “empty” (for a free cell). For example, in B the robot observes “wall” or # (each with probability 1/4), or “empty” (with probability 1/2).

(a) Suppose the robot wanders around forever without making any observations. What is the stationary distribution π over the robot’s predicted location? Hint: You can use numpy.linalg.eig in Python. The first return value is a 1D array of eigenvalues; the second return value is a 2D array, where each column is a corresponding eigenvector. Remember that eigenvectors may not sum to 1 by default.

(b) Now suppose that we know that the robot is in state D; i.e., Pr(X0 = D) = 1. Starting from this state, the robot makes one transition and observes e1 = #. What is the updated belief distribution Pr(X1 | e1)?

(c) The robot makes a second transition and observes e2 = “empty”. What is the updated belief distribution Pr(X2 | e1, e2)?

(d) Compute the joint distribution Pr(X1, X2 | e1, e2). You do not need to explicitly list the values that have probability 0. What is the most likely state sequence(s)?

(e) Compute b = Pr(e2 | X1). Briefly explain what this quantity represents.
(f) Compute the smoothed distribution Pr(X1 | e1, e2) by multiplying f = Pr(X1 | e1) with b = Pr(e2 | X1) and normalizing the result. Confirm that the distribution is the same as that obtained by marginalization of Pr(X1, X2 | e1, e2).

We will investigate the absence of conditional independence guarantees between two random variables when an arbitrary descendant of a common effect is observed. We will consider the simple case of a causal chain of descendants D0 → D1 → ⋯ → Dn. Suppose that all random variables are binary. The marginal distributions of A and B are both uniform (0.5, 0.5), and the CPTs of the common effect D0 and its descendants are as follows:

A   B   Pr(+d0 | A, B)
+a  +b  1.0
+a  −b  0.5
−a  +b  0.5
−a  −b  0.0

Di−1     Pr(+di | Di−1)
+di−1    1.0
−di−1    0.0

(a) Give an analytical expression for the joint distribution Pr(D0, D1, …, Dn). Your expression should only contain CPTs from the Bayes net parameters. What is the size of the full joint distribution, and how many entries are nonzero?

(b) Suppose we observe Dn = +dn. Numerically compute the CPT Pr(+dn | D0). Please show how you can solve for it using the joint distribution in (a), even if you do not actually use it.

(c) Let’s turn our attention to A and B. Give a minimal analytical expression for Pr(A, B, D0, +dn). Your expression should only contain CPTs from the Bayes net parameters or the CPT you found in part (b) above.

(d) Lastly, compute Pr(A, B | +dn). Show that A and B are not independent conditioned on Dn.

In this problem you will explore part-of-speech (POS) tagging, a standard task in natural language processing. The goal is to identify parts of speech and related labels for each word in a given corpus. Hidden Markov models are well suited to this problem, with parts of speech being hidden states and the words themselves being observations. We will be using data from the English EWT treebank from Universal Dependencies, which uses 17 POS tags. We are providing clean versions of training and test data for you.
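Problem 1(a)'s numpy.linalg.eig hint can be sketched as follows, assuming the neighbor pairs implied by the grid in Problem 1 (A-B, A-C, B-D, C-D, D-E, D-F); the cell ordering and variable names are illustrative:

```python
import numpy as np

cells = ["A", "B", "C", "D", "E", "F"]
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
             "D": ["B", "C", "E", "F"], "E": ["D"], "F": ["D"]}

# Column-stochastic transition matrix: T[j, i] = Pr(move from cell i to cell j).
n = len(cells)
T = np.zeros((n, n))
for i, s in enumerate(cells):
    for t in neighbors[s]:
        T[cells.index(t), i] = 1.0 / len(neighbors[s])

vals, vecs = np.linalg.eig(T)
k = int(np.argmin(np.abs(vals - 1.0)))  # the eigenvector for eigenvalue 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                      # eigenvectors are not normalized to sum to 1
# For a uniform random walk on a graph, pi is proportional to the cell degrees.
```

The final renormalization is exactly the caveat in the hint: eig returns unit-norm eigenvectors, not probability vectors.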
The data format is such that each line contains a word and its associated tag, and an empty line signifies the end of a sentence. Feel free to open the files in a text editor to get an idea. The provided Python file contains a couple of functions that can be used to read a data file (you do not need to call these yourself). The global variable POS contains a list of all possible parts of speech. You will be filling in the remaining functions in the file and running the code in main where instructed.

3.1: Supervised Learning (12 points)

Your first task is to learn the three sets of HMM parameters from the training data. The initial distribution Pr(X0) will be stored in a 1D array of size 17. The transition probabilities will be stored in a 2D array of size 17 × 17. The observation probabilities will be stored in a dictionary, where each key is a word and the value is a 1D array (size 17) of probabilities Pr(word | POS). These probabilities should follow the same order as in the POS list.

Implement learn model so that it compiles and returns these three structures. The data input is a list of sentences, each of which is a list of (word, POS) pairs. Your method should iterate over each sentence in the training data, counting the POS appearances in the first word, the number of POS-to-POS transitions, and the number of POS-to-word observations. Treat each sentence independently; do not count transitions between different sentences. Be sure to correctly normalize all of these distributions. Make sure that the quantities Σ_i Pr(X0)_i, Σ_i Pr(X_{t+1} = i | X_t = j), and Σ_k Pr(E_t = k | X_t) are all equal to 1.

3.2: Viterbi Algorithm (12 points)

The next task is to implement the Viterbi algorithm to predict the most likely sequence of states given a sequence of observations. We will break this into two pieces: viterbi forward and viterbi backward.
viterbi forward takes in four parameters: initial distribution X0 (1D array), transition matrix Tprobs (2D array), observation probabilities Oprobs (dictionary), and observation sequence obs (a list of word observations) of length T. viterbi forward should compute and return two quantities: max_{x1,…,x_{T−1}} Pr(x1, …, x_{T−1}, X_T, e_{1:T}) as a 1D array of size 17, and a T × 17 2D array of pointer indices. Row i of the pointer array should contain argmax_{x_{i−1}} Pr(x1, …, x_{i−1}, X_i, e_{1:i}). For simplicity, pointers will be the indices of the POS in the POS array rather than strings. Note that it is possible for an observation e to not exist in the Oprobs dictionary if it was not present when the model was trained. If this occurs, you may simply take Pr(e | x) = 1 for all POS x; this is equivalent to skipping the observation step.

viterbi backward takes the two quantities returned by viterbi forward as parameters. It should start with the most likely POS according to max_{x1,…,x_{T−1}} Pr(x1, …, x_{T−1}, X_T, e_{1:T}) and then follow the pointers backward to reconstruct the entire POS sequence argmax_{x1,…,xT} Pr(x1, …, xT | e_{1:T}). The returned object should be a list of POS (strings) from state 1 to state T. Note that the predicted state for X0 should not be included, so your list should be of length T.

3.3: Model Evaluation (8 points)

Your Viterbi implementation can now be used for prediction. evaluate viterbi takes in the HMM parameters, along with a data list in the same format as in learn model. Complete the function so that it runs Viterbi on each sentence of the data set separately. Then compare all returned POS predictions with the true POS tags in data. Compute and return the accuracy as the proportion of correct predictions (this should be a number between 0 and 1). After implementing this function, run the associated code in main and answer the following questions.

(a) Report the accuracies of your Viterbi implementation on each data set.
Why is the accuracy on the test data set lower than that on the training set?

(b) Why can we not expect 100% accuracy on the training set, despite defining the HMM parameters to maximize the likelihood on the training data set?

3.4: Forward and Backward Algorithms (12 points)

Next, implement the forward, backward, and forward-backward algorithms (ideally fewer than 5 lines of code each). forward takes in the same four parameters as viterbi forward above, and it computes and returns Pr(Xk, e_{1:k}) (no normalization). backward also takes in four parameters, dropping X0 but newly including the state index k. It should compute and return Pr(e_{k+1:T} | Xk). Note that k follows Python indexing; in other words, k=0 corresponds to X1, and backward should return Pr(e_{2:T} | X1). Again, to deal with the scenario in which a word is not in the observation model, you should explicitly check when this occurs and if so simply use Pr(e | x) = 1. Once you have both forward and backward, it should be straightforward to call these to implement forward backward, which computes the smoothed state distribution Pr(Xk | e_{1:T}). Note that the call to forward should only use the observations up to (and including) the kth one. Don’t forget to normalize the smoothed distribution.

3.5: Inference Comparisons (6 points)

Now that you have all of these inference algorithms implemented, come up with a short English phrase p, ideally around 10 words or fewer. You should then identify a word w within this phrase such that (i) the most likely POS of w according to the result of the forward algorithm on the first portion of p up to word w, and (ii) the most likely POS of w according to the result of the forward-backward algorithm on the entirety of p, are different. Leave the code that you used to obtain these observations in the main function of the code file. In your writeup, give the phrase that you used and the word that you evaluated. Indicate the POS returned by each of the results as described above.
Give a brief explanation of why each of the methods returns something different.

Submission

You should have one PDF document containing your solutions for problems 1-2, as well as your responses to 3.3 and 3.5. You should also have a completed Python file implementing problem 3; make sure that all provided function headers and the filename are unchanged. Submit the document and .py code file to the respective assignment bins on Gradescope. For full credit, you must tag your pages for each given problem on the former.
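As an illustration of the Viterbi recurrence described in Section 3.2, here is a self-contained sketch on a toy two-state HMM. The toy "umbrella" parameters are illustrative, and the sketch folds the forward pass and backtracking into one function; the assignment's exact array orientations and X0 handling may differ:

```python
import numpy as np

def viterbi(init, trans, obs_probs, obs):
    """Most likely state sequence for a toy HMM.
    init[i] = Pr(X1 = i); trans[i, j] = Pr(X_{t+1} = j | X_t = i);
    obs_probs[word][i] = Pr(word | state i). Unseen words get Pr(e | x) = 1,
    which is equivalent to skipping the observation step."""
    ones = np.ones(len(init))
    m = init * obs_probs.get(obs[0], ones)  # prob of best path ending in each state
    pointers = []
    for word in obs[1:]:
        scores = m[:, None] * trans         # scores[i, j]: best path into j via i
        pointers.append(scores.argmax(axis=0))
        m = scores.max(axis=0) * obs_probs.get(word, ones)
    # Backtrack from the most likely final state, following stored pointers.
    path = [int(m.argmax())]
    for ptr in reversed(pointers):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

# Toy two-state weather HMM: state 0 = rainy, state 1 = sunny.
init = np.array([0.5, 0.5])
trans = np.array([[0.7, 0.3], [0.3, 0.7]])
obs_probs = {"umbrella": np.array([0.9, 0.2]), "no_umbrella": np.array([0.1, 0.8])}
path = viterbi(init, trans, obs_probs, ["umbrella", "umbrella", "no_umbrella"])
```

In the assignment the same logic is split into viterbi forward (which returns m and the pointer array) and viterbi backward (which does the backtracking), with 17 POS states instead of 2.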


[SOLVED] Coms w4701: artificial intelligence homework 3 problem 1: mcts practice (12 points) the partial game tree below was discussed in class on the topic of monte carlo tree search

The partial game tree below was discussed in class on the topic of Monte Carlo tree search. Each node shows its win rate: the number of playout wins / the total number of playouts from that node. The leaf node labeled 0/0 was just expanded in the middle of an MCTS iteration.

(a) Suppose that a rollout is performed and the player corresponding to the purple nodes (root, third layer, and newly expanded leaf) wins. Sketch a copy of the tree and indicate all win rates after backpropagation, whether updated or not.

(b) Using the new win rates from your tree in (a) above and the exploration parameter α = 1, compute the UCT values of each of the nodes in the second layer of the tree (immediate children of the root node). Which of these three nodes is traversed by the selection policy in the next MCTS iteration?

(c) We will refer to the child node you found above as n. Suppose that the player corresponding to the second layer (orange) always loses in simulation. After how many iterations of MCTS will a child node of the root different from n be selected? (You may use a program like WolframAlpha to solve any nonlinear equations.)

(d) Starting once again from your tree in (a), solve for the minimum value of α for which a child node of the root different from n would be selected in the next MCTS iteration. Explain why we need a higher, not lower, α value in order to select one of the other two nodes.

We will model a mini-blackjack game as an MDP. The goal is to draw cards from a deck containing 2s, 3s, and 4s (with replacement) and stop with a card sum as close to 6 as possible without going over. The possible card sums form the states: 0, 2, 3, 4, 5, 6, “done”. The last state is terminal and has no associated actions. From all other states, one action is to draw a card and advance to a new state according to the new card sum, with “done” representing card sums of 7 and 8.
Alternatively, one may stop and receive reward equal to the current card sum, also advancing to “done” afterward.

(a) Draw a state transition diagram of this MDP. The diagram should be a graph with seven nodes, one for each state. Draw edges that represent transitions between states due to the draw action only; you may omit transitions due to the stop action. Write the transition probabilities adjacent to each edge.

(b) Based on the given information and without solving any equations, what are the optimal actions and values of states 5 and 6? You may assume that V*(done) = 0. Then, using γ = 0.9, solve for the optimal actions and values of states 4, 3, 2, and 0 (you should do so in that order). Briefly explain why dynamic programming is not required for this particular problem.

(c) Find the largest value of γ that would possibly lead to a different optimal action in state 3 (compared to those above) but leave all others the same. Is there any nonzero value of γ that would yield a different optimal action for state 0? Why or why not?

Let’s revisit the mini-blackjack game, but from the perspective of dynamic programming. You will be thinking about both value iteration and policy iteration at the same time. Assume γ = 0.9.

(a) Let’s initialize the time-limited state values: V0(s) = 0 for all s (we will ignore “done”). Find the state values of V1 after one round of value iteration. You do not need to write out every calculation if you can briefly explain how you infer the new values.

(b) Coincidentally, V0 = 0 are also the values for the (suboptimal) policy π0(s) = draw for all s. If we were to run policy iteration starting from π0, what would be the new policy π1 after performing policy improvement? Choose the draw action in the case of ties.

(c) Perform a second round of value iteration to find the values V2. Have the values converged?

(d) Perform a second round of policy iteration to find the policy π2.
Has the policy converged?

Let’s now study the mini-blackjack game from the perspective of reinforcement learning, given a set of game episodes and transitions. This time, assume γ = 1. We initialize all values and Q-values to 0 and observe the following episodes of state-action sequences:

• 0, draw, 3, draw, done (reward = 0)
• 0, draw, 2, draw, 4, draw, done (reward = 0)
• 0, draw, 4, draw, 6, stop, done (reward = 6)
• 0, draw, 3, draw, 5, stop, done (reward = 5)
• 0, draw, 2, draw, 5, stop, done (reward = 5)

(a) Suppose that the above episodes were generated by following a fixed policy. According to Monte Carlo prediction, what are the values of the six states other than the “done” state? Explain whether the order in which we see these episodes affects the estimated state values.

(b) Suppose we use temporal-difference learning with α = 0.8 instead. Write out each of the updates for which a state value is changed. Again, explain whether the order in which we see these episodes affects the estimated state values.

(c) Now suppose we had generated the above transitions using Q-learning, starting with all Q-values initialized to zero. Which Q-values are nonzero after all episodes are complete? Assuming that draw is the default “exploit” action in the case of equal Q-values, which transitions would be considered exploratory?

We will now extend mini-blackjack from 6 to 21 and add in a few other twists. We have 22 numbered states, one for each possible card sum from 0 to 21, in addition to the “done” terminal state. From each non-terminal state we can either stop and receive reward equal to the current card sum, or we can draw an additional card. When drawing a card from the deck, any of the standard set of cards may show up, but the jack, queen, and king cards are treated as having value 10 (aces will just be treated as 1s).
We will still be drawing cards with replacement, so the probability of drawing a card with value 1 through 9 is 1/13 each, while the probability of drawing a 10 value is 4/13. Lastly, we will add a constant living reward that is received with every draw action. Given this information, we can model the problem as a Markov decision process and solve for the optimal policy and value functions. You will be implementing several of the dynamic programming and reinforcement learning algorithms in the provided blackjack Python file.

5.1: Value Iteration (12 points)
Implement value iteration to compute the optimal values for each of the 22 non-terminal states. The first argument is the initial value function V0, stored in a 1D NumPy array with the index corresponding to the state. The other arguments are the living reward, discount factor, and stopping threshold. As discussed in class, your stopping criterion should be based on the maximum value change between successive iterations. Your value updates should be synchronous; only use Vi to compute Vi+1. When finished, your function should return the converged values array V*.

5.2: Policy Extraction (10 points)
An agent would typically care more about a policy than a value function. Implement value_to_policy, which takes in a set of values, living reward, and discount factor to compute the best policy associated with the provided values. Return the policy as a NumPy array with 22 entries (one for each state), with value 0 representing stop and value 1 representing draw.

5.3: DP Analysis (9 points)
Now that you can generate optimal policies and values, we can study the impact of living reward and discount factor on the problem. For each of the following experiments, you can simply use an initial set of values all equal to 0.

(a) Compute and plot the values V* and policy π* for living reward lr = 0 and γ = 1. You should see that V* consists of three continuous "segments".
Briefly explain why the discontinuities (which show up as "dips") between the segments exist, referring to the optimal policy found and the game rules.

(b) Experiment with decreasing the discount factor, e.g., in 0.1 decrements. For sufficiently low values of γ you should see the three segments of V* merge into two. Show the plots for an instance of this effect and report the γ value you used. Briefly explain the changes that you observe in V* and π*.

(c) Reset γ to 1 and experiment with changing the living reward, e.g., using intervals of 1 or 2. Which segments of V* shift for negative living rewards or slightly positive living rewards? In which direction do these values shift in each case? How does π* change as these state values shift? Find approximate thresholds of the living reward at which π* becomes stop in all states, and alternatively draw in all states.

5.4: Temporal-Difference Learning (12 points)
You will now investigate using reinforcement learning, and in particular the Q-learning algorithm, to learn the optimal policy and values for our blackjack game purely through self-play. We will need to keep track of Q-values; we can use a 22×2 NumPy array Q, so that Q[s,a] = Q(s, a). a=0 corresponds to the stop action and a=1 corresponds to the draw action. To allow for exploration, we will use ε-greedy action selection.

The Qlearn function takes in an initial array of Q-values, living reward lr, discount factor γ, learning rate α, exploration rate ε, and the number of transitions N. In addition to updating Q, your procedure should keep track of the states, actions, and rewards seen in an N × 3 array record. After initializing the arrays and initial state, the procedure within the simulation loop is as follows:

• Use the ε-greedy method to determine whether we explore or exploit.
• If stopping, the reward is r = s, where s is the current state.
• If drawing, the reward is r = lr; also call the draw() function so that you can compute the successor state s'.
• Update the appropriate Q-value, as well as record with the current state, action, and reward. You should use 0 for the "Q-value" of the "done" state.
• Reset s to 0 if we either took the stop action or s' > 21; otherwise set s = s'.

After finishing N transitions, return Q and record.

5.5: RL Analysis (9 points)
After you implement Qlearn, you should ensure that it is working correctly by comparing the learned state values (the maximum values in each row of Q) with those returned by value iteration. A suitable default set of learning parameters would be a low α (e.g., 0.1), low ε (e.g., 0.1), and high N (e.g., 50000). Once you are ready, call the provided RL analysis function (no additional code needed) and answer the following questions using the plots that you see. Your Qlearn must return the two arrays as specified above in order for this to work properly.

(a) The first plot shows the number of times each state is visited when running Qlearn with lr = 0, γ = 1, α = ε = 0.1, and N ranging from 0 to 50k in 10k intervals. Explain how the value of N is important to ensuring that all states are visited sufficiently. You should also note that the two most visited states are the same regardless of N; explain why these two states attract so much attention.

(b) The second plot shows the cumulative reward received for a game using the same parameters above, except with N = 10000 and ε ranging from 0 to 1 in 0.2 intervals. Explain what you see with the ε = 0 curve and why we need exploration to learn. Looking at the other curves, how does increasing exploration affect the overall rewards received, and why?

(c) The third plot shows the estimated state values using the same parameters above, except with ε = 0.1 and α ranging from 0.1 to 1. While the set of curves may be hard to read, each should bear some resemblance to the true state values V*.
What is a problem that arises when α is too low, particularly with less-visited states? What is a problem that arises when α is too high? Try to compare the stability or smoothness of the different curves.

Submission
You should have one PDF document containing your solutions for problems 1-4, as well as the information and plots for 5.3 and 5.5. You should also have a completed Python file implementing problem 5; make sure that all provided function headers and the filename are unchanged. Submit the document and .py code file to the respective assignment bins on Gradescope. For full credit, you must tag your pages for each given problem on the former.
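As an illustrative footnote to problem 5: the synchronous backup described in 5.1 can be sketched as below, under the stated model (stop pays the current sum and ends the game, draw pays the living reward and moves by the card value, and sums above 21 go to "done" with value 0). The function name and signature here are hypothetical, not the assignment's required headers.

```python
import numpy as np

def value_iteration_sketch(V0, lr=0.0, gamma=1.0, tol=1e-9):
    """Synchronous value iteration for the 22-state blackjack MDP."""
    # Card value -> probability: 1-9 each with prob 1/13, 10 with prob 4/13.
    p = {v: 1 / 13 for v in range(1, 10)}
    p[10] = 4 / 13
    V = np.array(V0, dtype=float)
    while True:
        Vnew = np.empty_like(V)
        for s in range(22):
            stop_val = s          # stop: reward = current sum, then done (value 0)
            draw_val = lr         # draw: living reward plus discounted successor value
            for c, pc in p.items():
                if s + c <= 21:   # busting (> 21) contributes V(done) = 0
                    draw_val += gamma * pc * V[s + c]
            Vnew[s] = max(stop_val, draw_val)
        if np.max(np.abs(Vnew - V)) < tol:
            return Vnew
        V = Vnew
```

Because drawing only moves the sum upward, the backups stabilize after a small number of sweeps; the optimal policy of 5.2 can then be read off by comparing the stop value s against the one-step draw backup in each state.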


[SOLVED] COMS W4701: Artificial Intelligence Homework 2

Problem 1: Tic-Tac-Twist (22 points)

Two players are playing a modified tic-tac-toe game in which grid spaces have point values, shown on the left grid below. The players take turns marking a grid space with their own designation (X or O) until either one player gets three marks in a row or the board has no empty spaces. When the game ends, a score is computed as the sum of the values of the X spaces minus the sum of the values of the O spaces. In addition, if X has three in a row, 3 points are added to the score; if O has three in a row, 3 points are subtracted from the score. X seeks to maximize the total score and O seeks to minimize it.

(a) The right grid shows the current board configuration, whose current value is −3. It is O's turn to move. Draw out the entire game tree with the root corresponding to the current board. Use game tree convention to draw the MAX and MIN nodes. Also sketch out the tic-tac-toe board configuration for each terminal node (e.g., draw them right below each node).

(b) Compute the minimax values of each node. What is the best move for O to make, and what is the expected score of the game assuming both players play optimally?

(c) Suppose we are performing alpha-beta search. In what order would the successors of the root node have to be processed in order to maximize the number of nodes that can be pruned? Identify the node(s) in the game tree that can be pruned, and specify the α or β inequality that allows pruning to take place.

(d) Suppose that instead of playing to maximize the score, player X chooses a random valid move with uniform probability. Explain how the game tree will change (you do not have to redraw it), and compute the new utility and best move for O starting from the current state.

We will consider a simplified version of the game of Yahtzee. We first roll three 4-sided dice (with results 1, 2, 3, 4 occurring with equal probability), and then we can either reroll one of the dice or keep the original result. Let S be the sum of the final dice results.
We get a score equal to the max over S and one of the following, if the situation applies:
• 10 points for two-of-a-kind
• 15 points for three-of-a-kind
• 7 points for a series (1-2-3 or 2-3-4)

The expectimax tree representing the player's decision is shown below. Suppose we initially rolled a 3, 4, and 4. From left to right, the second layer of the tree represents the actions of rerolling the 3, rerolling one 4, rerolling the other 4, and keeping the original dice results. The third layer represents the outcomes of each of the three die rerolls from 1 to 4 (going left to right for each).

(a) Fill in the values of the nodes of the tree (you can simply write out the values for each node variable, e.g., M = 1). Start with the utilities of the leaf nodes, followed by the chance nodes and finally the value at the root. What decision maximizes our expected score?

(b) Suppose we change the rules slightly. We can still either reroll a die or keep the initial results, but choosing the "reroll" action results in a random die being rerolled (we do not get to choose which). Describe precisely how the structure of the expectimax tree would change and say what the new optimal action is. You do not need to draw the new tree or compute new node values.

(c) Consider another change of rules. Instead of choosing one die to reroll, the options are to choose exactly two dice to reroll (we can indicate which two dice we want) or to keep the initial results. Describe precisely how the structure of the expectimax tree would change. You do not need to draw the new tree or compute new node values.

Sudoku is a logic-based number placement puzzle. The basic rules are as follows: given an n^2 × n^2 grid and a set of pre-filled cells (clues), fill in the remaining cells such that each row, column, and (non-overlapping) n × n subgrid contains an instance of each number from 1 to n^2 (inclusive). Classic sudoku has n = 3 and a 9 × 9 grid.
Like n-queens, we can define sudoku as a constraint satisfaction problem, which we can then solve using exhaustive or local search methods. We are providing a simple Python scaffold for solving sudoku using hill-climbing search. A state is a filled-in grid (a 2D NumPy array) with n^2 of each of the numbers from 1 to n^2. A transition occurs by swapping two numbers that are not in the clue indices. Finally, we can measure the number of "conflicts" for a given state by counting the number of errors in each row, column, and subgrid.

The code that we provide generates a sudoku puzzle, which you can test out yourself. Given the order (n) of the desired puzzle and the number of clues (which should be fewer than n^4, the total number of cells), generate returns a dictionary (problem) specifying the indices and values of the clue cells. The display function prints out the board with the clues filled in and zeroes elsewhere. initialize returns a state that satisfies the clues (but is not necessarily a solution), and successors returns a list of valid successor states given a current state. The latter two functions will be used by your hill-climbing procedure.

3.1: Computing Errors (5 points)
Write the num_errors function that will count and return the number of errors in a given state. The simplest way to do this would be to iterate over each of the n^2 rows, n^2 columns, and n^2 non-overlapping n × n subgrids and count the number of missing values in each. Note that this will inevitably double- or triple-count individual errors.

3.2: Basic Hill-Climbing (10 points)
Implement the basic hill-climbing procedure in hill_climb, ignoring the optional arguments for now. You should only initialize the given problem once, and no sideways moves are permitted. Either move to a "better" successor state with the lowest number of errors (choosing a random one if there are multiple best states), or return the current state if no successor state has fewer errors than the current one.
Also create a list tracking the number of errors in each iteration, and return that along with the solution.

3.3: Preliminary Tests (5 points)
You should find that your basic procedure performs decently on order-2 puzzles, although it is certainly prone to getting stuck at local minima. Generate a few different puzzles with n = 2 and c = 5 clues (the minimum number of clues for a unique solution, though not sufficient for all puzzles). For two puzzles, one that hill_climb can solve and one that it cannot solve successfully, show the problem using display, the returned solution state, and a plot showing the number of errors over each iteration. Then run hill_climb in batch on 100 random order-2 puzzles with 5 clues each. Report the average success rate (the proportion of final states with 0 errors) as well as the average error over all final states (counting both successes and failures).

3.4: Sideways Moves (5 points)
The basic hill-climbing procedure can be improved in a couple of ways. First, we should allow for sideways moves. If the best neighbors all have the same number of conflicts as the current state, we can move to a random one. You should also keep a counter and increment it each time a sideways move is taken consecutively. If max_sideways moves are made consecutively, quit and return the current state. Generate a puzzle with n = 2 and c = 5 in which at least one (preferably more) sideways move is taken and show the corresponding error-per-iteration plot. The sideways move(s) should clearly be seen as a horizontal segment on this plot. Then do a batch solve of 100 random puzzles allowing up to 10 sideways moves each, and report the success rate and average error.

3.5: Random Restarts (5 points)
The last modification of hill climbing that you will implement is random restarts. When either all current neighbors are worse than the current state or max_sideways moves have been made, we should initialize a new sudoku board and assign it to be the current state.
A counter on the number of restarts should be created and incremented each time this happens. The return condition is when we have hit max_restarts or when a solution is found. Run the batch of 100 (n = 2, c = 5) puzzles with max_sideways and max_restarts both equal to 10 and report your findings.

Lastly, experiment with (n = 3, c = 40) puzzles; these are standard-sized sudoku problems, but on the easier side given the number of clues. Run hill climbing on a few of these and show the result of one puzzle that was successfully solved, including the given problem, the returned solution, and a plot of errors per iteration. This may take a few tries, and you may tweak the max_sideways and max_restarts parameters. Write down the parameter values that you used.

The game of Connect Four bears some similarities to tic-tac-toe. Two players take turns placing marks or pieces on a grid, trying to be the first to achieve four consecutive pieces. The difference, aside from the size of the board (typically six rows and seven columns) and that we want four rather than three in a row, is that the grid is vertical and affected by gravity. In short, a player's actions consist only of the column in which to place their piece, and the piece is then placed in the bottom-most free cell in that column.

We are providing a simple Python scaffold that will be developed into a Connect Four-playing agent using alpha-beta depth-limited search. A game state is represented as a 2D NumPy array. We will use 'X' and 'O' to represent each player just as in tic-tac-toe, and the character '.' to represent an empty cell. To make our implementation more general, we will allow grids of any size (m × n) and make this a "Connect k" game, rather than just Connect Four. 'X' will always go first and be the maximizing player, and 'O' will be the minimizing player.
'X' wins have positive utility, 'O' wins have negative utility, and draws have zero utility.

We provide several utility functions; the relevant ones for you are terminal and eval. terminal takes in a current game state and returns either a utility value if the state is terminal or None if the game is not over (it thus serves as both a utility function and a terminal test). eval takes in the same inputs as terminal and returns an evaluation of a non-terminal state based on both players' potential for winning (how this is computed is not important for this assignment, but we are happy to talk more about it if you are interested). You will use both functions when implementing alpha-beta search.

Finally, given the game parameters, game_loop runs a game from start to finish, using your implemented alpha-beta search procedure for both players. While this nominally means that both players will be playing with identical search capabilities, the optional X_params and O_params will allow you to tweak the strength of each player in the form of search depth.

4.1: Successor Function (5 points)
Before writing alpha-beta search, we first need to write the successors function. Given both a state and the player making a move (either 'X' or 'O'), return a list of possible successor states. Be sure to follow the rules of Connect k as described above. Note that this function combines the actions and result functions if following the textbook's pseudocode.

4.2: Depth-Limited Alpha-Beta Search (20 points)
We provide the wrapper function for depth-limited alpha-beta search, which is called repeatedly by game_loop. You will write the two recursive functions, mostly following the textbook pseudocode but with some additional functionality. The arguments to each are the state, alpha and beta values, k value, current node depth, and maximum allowed search depth. Each function should perform the following tasks:

• Perform the terminal test; if the state is terminal, return its score and a None move.
• If the current node depth is equal to the maximum allowed depth, treat this state as a leaf: run the evaluation function on the state and return the value and a None move.
• Otherwise, expand the node's successors and search them using the alternate recursive function. Before doing so, you must generate and sort the successors such that the "best" nodes occur first for each player. You can do this by calling eval on each successor and then sorting them from largest to smallest for MAX and smallest to largest for MIN.
• The recursive calls should be similar to the pseudocode. Be sure to increment the current node depth by 1 when you make a new recursive call.

4.3: Testing Connect k (5 points)
Once you are ready to test your implementations, have your agent play itself in Connect 3 and then Connect 4, both on a 4 × 4 board. Run both without depth limiting; you can simply omit the two optional arguments to game_loop. Show the final state for each. To play larger games, depth limiting will be necessary. Have your agent play the full Connect 4 game on a 6 × 7 board. Show the final game states for maximum search depths of 5 and 6 (for both players). You should find that one ends in a draw and the other ends in a win for one player. Then try to change the depth values, making them different if necessary, so that the other player wins the game. Show the final state for this last run.

Submission
You should have one PDF document containing your solutions for problems 1-2, as well as the information and plots in the relevant parts of problems 3-4. You should also have the completed code file implementing problems 3 and 4; make sure that all provided function headers are unchanged. Submit the document and code files to the respective assignment bins on Gradescope. For full credit, you must tag your pages for each given problem on the former.
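As an illustrative footnote to problem 4: the gravity rule that the successors function must follow can be sketched as below. The name successors_sketch is hypothetical; the provided scaffold's exact interface may differ.

```python
import numpy as np

def successors_sketch(state, player):
    """Return successor states: drop `player`'s piece into each non-full
    column, landing in the bottom-most empty cell (gravity)."""
    m, n = state.shape
    succ = []
    for col in range(n):
        # Scan from the bottom row upward for the first empty cell.
        for row in range(m - 1, -1, -1):
            if state[row, col] == '.':
                child = state.copy()
                child[row, col] = player
                succ.append(child)
                break  # only one legal placement per column
    return succ
```

For example, on an empty 2 × 2 board there are exactly two successors, one per column, each with the piece in the bottom row; a full column simply contributes no successor.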


[SOLVED] COMS W4701: Artificial Intelligence Homework 1

Problem 1: HODL (15 points)

Consider the problem of investing in cryptocurrency. We have many trader agents who must decide when and how much to invest at any given time. Their decisions are based on individual risk levels, current market conditions, and inferences about company events (e.g., IPOs, earnings reports, acquisitions, etc.).

(a) Give a state space description of this problem. What information should individual states contain? What are the valid actions that an agent can take in each state?

(b) Classify this task environment according to the six properties discussed in class, and include a one- or two-sentence justification for each. For some of these properties, your reasoning may determine the correctness of your choice.

This is a variation of the famous Monty Hall problem. Let's say we play the following game prior to our final exam. You are given 50 mystery boxes and you have to choose one of them. One mystery box grants you a free A on the final (U = 20). Another one will force you to retake the entire class in the fall (U = −100). All other boxes give you a single bonus point on the final exam (U = 1).

(a) Compute the expected utility of selecting a mystery box at random.

(b) Suppose you've chosen a box. Before you open it, we choose and open five other boxes granting the single exam bonus point. What is the expected utility of switching to another box after this occurs?

(c) Is it better to stick with your original choice or switch to another one? Compute the value of information of seeing the five boxes granting the single bonus point.

(d) We will now slightly change the game rules. After you select a box, we will open it for you. You can then either claim the revealed prize, or choose to open another mystery box and keep the new prize (forgoing the first one). What is the expected utility of this game procedure?

In the state space graph below, S is the start state and G is the goal state. Costs are shown along edges and heuristic values are shown adjacent to each node.
All edges are bidirectional. Assume that search algorithms expand states in alphabetical order when ties are present.

(a) List the ordering of the states expanded, as well as the solution (as a state sequence) returned, by each of DFS and BFS, assuming that they use the early goal test (upon insertion into the frontier) and a reached table.

(b) List the ordering of the states expanded, as well as the solution returned, by each of DFS and BFS, assuming that they use the late goal test (upon removal from the frontier) and a reached table.

(c) List the ordering of the states expanded, as well as the solution returned, by each of UCS and A*, assuming that they use the late goal test (upon removal from the frontier) and a reached table.

(d) Change exactly one heuristic value so that A* returns a suboptimal solution. Give the value of the heuristic that you changed and explain why it becomes inadmissible. List the ordering of the states expanded as well as the solution returned by A*.

(e) A heuristic value has not been assigned to S. Give the upper bound on the value of h(S) so that i) h is admissible (but possibly inconsistent), and ii) h is consistent (and admissible). Do these values change how A* runs on this graph?

In this problem you will implement and compare the performance of various search algorithms on solving word ladder puzzles. Given two English words, the goal is to transform the first word into the second word by changing one letter at a time. The catch is that each new word in the process must also be an English (dictionary) word. For example, given the start word "fat" and the goal word "cop", a solution would be the word sequence "fat", "cat", "cot", "cop".

We have provided a Python skeleton file for you to complete this problem. Note that the function headers are type annotated to clearly indicate argument and return value types.
You can optionally use static type checkers like mypy to verify your code as you work on it.

In our implementation, states will be equated with words (strings). Successor states are words that differ from the current state by one letter. Using this idea, we provide a successors function that returns a list of "actions" and successor states given a state. The action is simply the index of the changed letter. This function uses the pyenchant library to perform dictionary checking. Finally, the cost of each action can be treated uniformly (e.g., 1).

To perform search, we represent a search tree node as a Python dictionary containing three components: state, parent, and cumulative cost. For example, the root node may be defined as {'state': start, 'parent': None, 'cost': 0}. The frontier can be implemented as a list, while the implementation of the reached set will differ in each part below.

4.1: Depth-Limited Depth-First Search (15 points)
We will start by implementing depth-limited DFS. First, write the expand function, which takes in a node and returns a list of nodes, one per successor state of the state in the given node. You should use the provided successors function here.

The depth_limited_dfs function takes in a start state, a goal state, and a depth limit. Nodes at the depth limit will be considered to be leaves. In this procedure, you can implement the reached set as a list of states (as opposed to the frontier, which is a list of nodes). Since this is DFS, you should perform the early goal test for time efficiency by checking for the goal upon a node's insertion into, rather than removal from, the frontier.

You should also continually update two local variables: nodes_expanded and frontier_size. The first is an integer that is incremented every time a node is expanded. The second is a list that contains the size of the frontier at the beginning of each search iteration, updated before a node is popped.
Once the goal node has been found, the procedure returns that node, along with nodes_expanded and frontier_size. Alternatively, if the frontier becomes empty, you should return None along with the latter two values.

If you would like to test your solution before moving on, you can run the code in the main function and also add your own (see 4.4). Investigate the return values of depth_limited_dfs. You can also pass the goal node into the provided sequence function to retrieve the sequence of words from start to goal. Think about how changing the depth limit affects the solution.

4.2: Iterative Deepening (5 points)
Since we have a DFS implementation that can account for depth limits, a natural extension would be an iterative deepening wrapper around it. Recall that this algorithm repeatedly calls depth_limited_dfs with a larger depth parameter each time, starting from 0. To prevent this from potentially searching forever, iterative deepening will stop and return no solution if one is not found by max_depth. As with depth_limited_dfs, iterative_deepening should also update and return quantities indicating nodes expanded and frontier size. The former should be the total number of expanded nodes over all search iterations. The latter should be a single flat list containing the frontier sizes over all iterations.

4.3: A* Search (10 points)
Your iterative deepening implementation, like all uninformed search approaches, uses no information about the goal word. But we should use this knowledge to our advantage: it makes sense to "favor" successor words that look more like the goal. We can do this using A* search, and a suitable heuristic would be the Hamming distance between the current word and the goal, i.e., the number of indices where the corresponding letters differ. A* search will mostly follow the implementation of depth-first search, with a few key changes.
To simulate priority queue behavior for the frontier, the heapq module, and in particular its heappush and heappop functions, can be used to efficiently treat regular Python lists as priority queues.

In order to "sort" the nodes in the frontier, you can place each node within a data structure like a tuple, so that the first element captures the f-cost and the last element is the node itself. For example, (1, node1) would be ordered before (2, node2). If the first elements of two tuples are equal, they are then compared according to their second elements, followed by their third, and so on. For this problem, you should break ties (and thus include an additional element between the f-cost and the node) by alphabetical order of the states.

Another difference is that the reached structure is now a dictionary with each key a state and the corresponding value the cheapest node reaching that state. Even if a state already exists in reached, a cheaper path to it may be discovered later, so a cost comparison should be done when determining whether a child node should be added to the frontier. Finally, remember to conduct the goal test only when popping a node from the frontier; the "early" version should no longer be used.

Implement the astar_search function following the specifications above. The return values are the same as those of your other implementations.

4.4: Analysis (20 points)
You should now be able to run each search algorithm implementation and solve different word ladder puzzle instances. We provide a sequence function that takes the goal node output from best-first search and returns the entire sequence of words from start to goal. Consider the puzzles below:

• Start: "fat"; goal: "cop"
• Start: "cold"; goal: "warm"
• Start: "small"; goal: "large"

(a) Let's investigate A* first, as the results will help us better understand the other two algorithms. Run A* on each of the puzzles above and report the solution length as well as the number of nodes expanded.
Also generate three line plots (e.g., using matplotlib), one per puzzle, showing the size of the frontier per iteration.

(b) Now let's look at iterative deepening search. From what you saw with A*, what is the range of maximum allowable depth values that would yield solutions for the first two puzzles but no solution for the third? Explain whether these solutions would be identical to those of A*.

(c) Pick two different integer values in the range you found above and perform two runs of IDS on the three puzzles with these max depths (we recommend you choose smaller values for faster runs). Report the solution lengths and number of nodes expanded for each, and compare these values with those of A* for each puzzle.

(d) Do the same experiment using the max depth values you used above for depth-limited DFS. Report the solution lengths and number of nodes expanded. Compare these values with those of IDS for each puzzle. Why might the results be different even though the maximum depths are all identical?

Submission
You should have one PDF document containing your solutions for problems 1-3, as well as the information and plots for 4.4. You should also have the completed code file implementing the search algorithms for the word ladder problem in a .py file. Submit the document and code file to the respective assignment bins on Gradescope. For full credit, you must tag your pages for each given problem on the former.
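As an illustrative footnote to 4.3: the Hamming-distance heuristic and the tuple-based tie-breaking can be sketched as below. The words and g-costs here are made-up demonstration data, not output of the actual skeleton.

```python
import heapq

def hamming(word: str, goal: str) -> int:
    """Number of positions at which two same-length words differ."""
    return sum(a != b for a, b in zip(word, goal))

# Frontier entries are (f-cost, state, node) tuples: equal f-costs fall
# back to alphabetical comparison of the state strings. States are unique
# here, so the (unorderable) node dicts are never actually compared.
frontier = []
for state, g in [("cot", 2), ("cat", 1), ("bat", 1)]:
    node = {"state": state, "parent": None, "cost": g}
    heapq.heappush(frontier, (g + hamming(state, "cop"), state, node))

order = [heapq.heappop(frontier)[1] for _ in range(3)]
# "cat" and "cot" tie at f = 3, so "cat" pops first alphabetically,
# followed by "cot", then "bat" (f = 4).
```

Note that without the state string in the middle of the tuple, two entries with equal f-cost would force Python to compare the node dictionaries, which raises a TypeError; the alphabetical tie-breaker also makes the expansion order deterministic.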


[SOLVED] COMS 4701 Homework 5 – Coding

The objective of this homework is to build a hand gesture classifier for sign language. We will be using Google Colaboratory to train our model (setup instructions are at the end). The dataset will be in the form of csv files, where each row represents one image and its true label.

We have provided the skeleton code to read in the training and testing datasets. Before you begin coding, go through the provided code and try to figure out what each function is responsible for. Most of the functionality is implemented in the SignLanguage class. Below are brief descriptions of each function and what is expected of you.

• create_model(self): You will generate a Keras sequential model here. Make sure to set the self.model variable with your compiled sequential model.
• prepare_data(self, images, labels): Mainly, this splits the data into training and validation sets. You may choose to normalize or downsample your data here as well.
• train(self, batch_size, epochs, verbose): This method invokes the training of the model. Make sure to return the generated history object. Your model will be trained for a maximum of 50 epochs during grading. Make sure you are using the input parameters (batch_size, epochs, verbose).
• predict(self, data): This method will be invoked with the test images. Make sure to downsample/resize the test images the same way as the training images, and return a list of predictions.
• visualize_data(self, data): Nothing to do here. This is solely to help you visualize the type of data you are dealing with.
• visualize_accuracy(self, history): Nothing to do here. It plots the accuracy improvement over training time.

Here are a few guides that may help you get started:
1. Keras Guide: Getting started with the Keras Sequential model
2. Introduction to Convolutional Neural Networks (CNNs)
3. A practical guide to CNNs

A few points to note:
• We will train each model for 30 epochs to ensure convergence.
However, you are free to tune the batch size hyperparameter.
• We require a train-validation split, but you are free to choose the dataset split ratio.

The following questions might help you better approach CNNs and Keras:
1. What is one-hot encoding? Why is it important, and how do you implement it in Keras?
2. What is dropout, and how does it help with overfitting?
3. How does ReLU differ from the sigmoid activation function?
4. Why is the softmax function necessary in the output layer?
5. This is a more practical calculation. Consider the following convolutional network: (a) input image dimensions = 100x100x1; (b) a convolution layer with filters=16 and kernel size=(5,5); (c) a max pooling layer with pool size=(2,2). What are the dimensions of the outputs of the convolution and max pooling layers?

You will submit a README.txt or README.pdf file that should contain the following:
1. Your name and UNI
2. 1-2 sentence answers to the questions above
3. A very brief explanation of your architecture choices, workflow, or any other relevant information about your model

1 Test-Run Your Code
Before you submit, make sure it works! Go to Runtime → "Restart and run all. . . ". This will restart your kernel, clearing any set variables, and run all cells again. This ensures your results are reproducible and do not depend on an overwritten variable!

2 Grading Submissions
Your model will be tested on the test set using the grading script given in the skeleton code. We will only be using your SignLanguage class. Please make sure not to edit the grading script portion of the notebook to avoid grading issues. Any model that achieves 90% accuracy on the test set will receive a full score. The scoring algorithm is tentatively grade = 100 ∗ min(90, accuracy)/90.
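The tentative scoring formula can be checked numerically. This is a small sketch; the function name and the sample accuracies are illustrative, not part of the assignment.

```python
# Sketch of the tentative scoring rule: grade = 100 * min(90, accuracy) / 90,
# where accuracy is the test-set accuracy in percent. Any accuracy of 90%
# or above is capped at full credit.

def grade(accuracy_pct):
    return 100 * min(90, accuracy_pct) / 90

print(grade(95))  # 100.0 (capped at full credit)
print(grade(81))  # 90.0
print(grade(45))  # 50.0
```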
Any changes to the grading scheme will be updated here. To avoid unlucky optimizations, we will train your model twice and take the maximum accuracy on the test set as your model accuracy.

3 Submission
When you are ready to submit, run the notebook using Runtime > "Restart and run all. . . ". Once you have your output, go to File > Download .ipynb and File > Download .py. Upload the following files to the Gradescope assignment HW6 – Programming:
• README.txt/README.pdf
• sign_language.ipynb
• sign_language.py
Make sure to include your UNI and name in the first cell of your notebook!

4 Setting up Google Colaboratory
1. Go to https://colab.research.google.com and sign in using your LionMail/Gmail account.
2. In the pop-up window, select "UPLOAD" and upload the sign_language.ipynb file we have provided. (If this is not your first time, it should automatically appear in the "RECENT" tab.)
3. Once the notebook is open, you will need to set up the runtime to use a GPU for training. Go to Runtime > Change Runtime Type and fill in the following values: Runtime type: Python 3; Hardware accelerator: GPU; Omit code cell output: False (unchecked).
4. Next, upload your data files to Colaboratory. On the left-hand panel, go to the Files tab, click "Upload", and select your train.csv and test.csv files.
5. You should be set up and ready to go! The main advantage of Google Colaboratory is that all your packages come pre-installed (especially tensorflow!).
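As a worked check of the dimension calculation in question 5 above, assuming the Keras defaults of "valid" padding and stride 1 for the convolution and a non-overlapping pooling window (these defaults are an assumption; the assignment does not state them explicitly):

```python
# Output-size arithmetic for question 5, under "valid" padding and stride 1.

def conv_out(size, kernel, stride=1):
    """Spatial output size of a valid-padding convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Spatial output size of non-overlapping max pooling."""
    return size // pool

side = conv_out(100, 5)         # 100 - 5 + 1 = 96
print((side, side, 16))         # convolution output: (96, 96, 16)
side = pool_out(side, 2)        # 96 / 2 = 48
print((side, side, 16))         # max pooling output: (48, 48, 16)
```

The channel count after the convolution equals the number of filters (16), and pooling only shrinks the spatial dimensions, leaving the channel count unchanged.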
