Assignment Chef


Assignment catalog

33,401 assignments available

[SOLVED] ECE 404 Introduction to Computer Security: Homework 09

iptables is an IPv4 packet-filtering and network address translation (NAT) tool used to set up, maintain, and inspect the tables of IP packet-filter rules in the Linux kernel. Your learning objectives for this homework are as follows:

1. Understand the overall organization of the Linux iptables tool
2. Write your own iptables rules based on specific, real-world requirements

As always, please read the homework document in its entirety before coming to office hours with your questions. The teaching staff have spent a long time writing the assignment to cover many common questions you might have.

2 Getting Ready for This Homework

Before embarking on this homework, it is advised that you familiarize yourself with the relevant material discussed in Prof. Kak's Lecture 18 notes. The following review material on iptables may also be helpful when writing your own firewall rules.

In its simplest form, iptables is a Linux firewall program that monitors network traffic to and from a server (your local machine in this case) using a set of tables. Each of these tables consists of lists of rules called chains, to which incoming and outgoing packets are subjected. When a data packet matches a particular rule, it is given a target. This target can be another chain, or one of the following special values:

• ACCEPT – allows the packet to pass through the firewall
• DROP – prohibits the packet from entering, with no indication to the sender that the connection failed
• REJECT – prohibits the packet from entering, and sends an error message to the source indicating that the connection failed
• RETURN – stops the packet from further traversal through the chain and sends it back to the previous chain

Defining new rules with iptables boils down to appending a new rule to a specified chain. Shown below is how one might use iptables to add new rules.
    sudo iptables -t <table> -A <chain> -i <in-interface> -o <out-interface> -p <protocol> -s <source> --dport <port no.> -j <target>

• -t – specifies the table you would like to append a new rule to
• -A – specifies the chain in that table you would like to append the new rule to
• -i – specifies the network interface incoming packets are received on (eth0, lo, ppp0, etc.)
• -o – specifies the network interface outgoing packets are sent on
• -p – specifies the network protocol your filtering applies to (tcp, udp, icmp, sctp, icmpv6, all, etc.)
• -s – specifies the address the traffic comes from (can be symbolic or numerical)
• --dport – specifies the destination port number (22, 80, 8000, etc.)
• -j – specifies the target (another chain or one of the four special values mentioned above)

Note: the information provided in this section is just the tip of the iceberg when it comes to interacting with iptables. The requirements for the programming portion of the assignment draw on additional knowledge found in Lecture 18 as well as the official Linux manual page [1].

3 Programming Assignment

Before making any changes to your current firewall, it is important to save its current state. The listing below shows how you could accomplish this. DO NOT SKIP THIS STEP!

    # save the current state of your firewall into a file called MyFirewall.bk
    iptables-save > MyFirewall.bk

    # restore your firewall from the file MyFirewall.bk
    iptables-restore < MyFirewall.bk

Design a firewall for your Linux machine using the iptables packet-filtering modules. It is likely that iptables came pre-installed with the Linux distribution you are using; otherwise, you may need to install or update it. If you don't have a Linux environment on your PC, you can try setting up a virtual machine using software such as VirtualBox or VMware.
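To make the save/append syntax above concrete, here is a minimal sketch of what a rules script could look like. The table, chain, interface, and port choices below are illustrative placeholders only, not the rules the assignment asks for:

```shell
#!/bin/sh
# Save the current firewall state first, so it can be restored if needed.
iptables-save > MyFirewall.bk

# Example only: accept incoming TCP traffic to port 22 on interface eth0.
sudo iptables -t filter -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT

# Example only: silently drop all other traffic arriving on eth0.
sudo iptables -t filter -A INPUT -i eth0 -j DROP

# If anything goes wrong, the saved state can be restored with:
#   iptables-restore < MyFirewall.bk
```

Running the script requires superuser privileges, which is why each rule is prefixed with sudo.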
Write a set of iptables rules (as a shell script titled firewall404.sh) to do the following:

1. Flush and delete all previously defined rules and chains.
2. Write a rule that only accepts packets that originate from f1.com.
3. For all outgoing packets, change their source IP address to your own machine's IP address (hint: refer to the MASQUERADE target in the nat table).
4. Write a rule to protect yourself against indiscriminate and nonstop scanning of ports on your machine.
5. Write a rule to protect yourself from a SYN-flood attack by limiting the number of incoming 'new connection' requests to 1 per second once your machine has reached 500 requests.
6. Write a rule to allow full loopback access on your machine, i.e. access using localhost (hint: you will need two rules, one for the INPUT chain and one for the OUTPUT chain of the filter table; the interface is 'lo').
7. Write a port-forwarding rule that routes all traffic arriving on port 8888 to port 25565. Make sure you specify the correct table and chain; accordingly, the target for the rule should be DNAT.
8. Write a rule that only allows outgoing ssh connections to engineering.purdue.edu. You will need two rules, one for the INPUT chain and one for the OUTPUT chain of the filter table. Make sure to specify the correct options for the --state suboption in both rules.
9. Drop any other packets that are not caught by the above rules.

To run your script, you will have to include a shebang line at the beginning of the .sh file (the shell is almost always located at /bin/sh). Your script should run without error. You will also need superuser privileges to edit any of the packet-filtering tables.

4 Submission Instructions

• For this homework you will be submitting a zip file titled hw09.zip, which consists of:
  – A pdf titled hw09.pdf containing:
    ∗ For each requirement in section 3, your solution as well as an in-depth explanation of your solution.
      In-depth to the point where a beginner (who has only basic knowledge of iptables) can follow along.
    ∗ Output (e.g. screenshots) of your updated firewall after running firewall404.sh. The command for this is sudo iptables -L.
  – The file firewall404.sh containing your iptables commands.

References

[1] iptables: administration tool for IPv4 packet filtering and NAT. Linux manual page.

$25.00

[SOLVED] ECE 404 Introduction to Computer Security: Homework 08

This assignment marks the start of the system/protocol side of ECE 404. The goal of this assignment is to give you a deeper understanding of the Transmission Control Protocol (TCP) and its vulnerabilities to denial-of-service (DoS) attacks. As always, please read the homework document in its entirety before coming to office hours with your questions. The teaching staff have spent a long time writing the assignment to cover many common questions you might have.

2 Problem 1

Write a Python object-oriented program that scans a specific target IP for open ports, and subsequently performs a SYN flood attack.

2.1 Starter Code

    class TcpAttack():
        def __init__(self, spoofIP: str, targetIP: str) -> None:
            # spoofIP: string containing the IP address to spoof
            # targetIP: string containing the IP address of the target

        def scanTarget(self, rangeStart: int, rangeEnd: int) -> None:
            # rangeStart: integer designating the first port in the range of ports being scanned
            # rangeEnd: integer designating the last port in the range of ports being scanned
            # return value: no return value; however, writes open ports to openports.txt

        def attackTarget(self, port: int, numSyn: int) -> int:
            # port: integer designating the port that the attack will use
            # numSyn: integer number of SYN packets to send to the target IP address at the given port
            # If the port is open, perform a DoS attack and return 1. Otherwise return 0.

    if __name__ == "__main__":
        # Construct an instance of the TcpAttack class and perform scanning and the SYN flood attack

2.2 Program Requirements

Construct a class called TcpAttack that implements both open-port scanning and the SYN flood attack. A breakdown of how you might use the starter code to accomplish this is as follows:

1. Define the constructor of the TcpAttack class:
   • The constructor is a built-in function of the class that is executed when creating new instances of that class.
   • Every instance of the TcpAttack class has two instance variables, spoofIP and targetIP.
   Thus the constructor of this class accepts two strings as arguments:
   (a) spoofIP: any IP that is not your own machine's
   (b) targetIP: the target of the scan and the SYN flood attack
   (c) Note that there is flexibility in how you express the IPs: they can be expressed either as symbolic hostnames or in the corresponding dotted-decimal notation.

2. Define the scanTarget class method:
   • The method accepts two integer arguments:
     – rangeStart: the first port in the range of ports to be scanned
     – rangeEnd: the last port in the range of ports to be scanned
   • This method scans the target machine for open ports in the range [rangeStart, rangeEnd] and writes all open ports detected to an output file called openports.txt.
   • The format of openports.txt should be one open port per line, in ascending order.

3. Define the attackTarget class method:
   • This method accepts two integer arguments:
     – port: the port number on which the attack will be mounted
     – numSyn: the number of SYN packets to be sent to the target on the specified port
   • This method first verifies that the specified port is open. If so, it performs the DoS attack and returns 1; otherwise it returns 0.

2.3 Program Dependencies

For this assignment, you will need to use a combination of functions from the socket [2] and scapy [1] libraries. Feel free to consult the official documentation for these modules, as well as Prof. Kak's implementation in Lecture 16.15.

• socket: a module that allows you to set up a socket connection
• scapy: a module that allows you to create and send network packets

Please note that you will need to install scapy in order to use its methods and objects. If you elected to create a conda environment at the beginning of the semester, installing scapy is as easy as running the following command in your ece404 conda environment:

    pip install scapy

2.4 Implementation Details for the SYN Flood Attack

Note that SYN flood attacks have become more difficult to mount over the years.
As shown in Lecture 16.14 of the lecture notes, most ISPs now use BCP 38 ingress filtering to prevent spoofing across a router. Therefore you would have to carry out the spoofing attack between two computers on the same LAN, where the packets would not go through a router. For this assignment, it is totally acceptable if you do not actually manage to cause a DoS outside your LAN, or do not have the means to try it with another computer on the same LAN. We are simply looking to see that a theoretical attack is implemented correctly.

2.5 How to Tell That Your Program Is Working

To test that the target machine is actually receiving packets, you should run tcpdump (or some equivalent program) while your script is running to confirm that you are actually sending packets to the target IP address (i.e. start tcpdump and then run your program). If you are using Windows, you can use Wireshark instead of tcpdump to look at the packets. In the event that you are on a busy network, you can use tcpdump to selectively sniff packets as outlined in Lecture 16. To further avoid clutter, you can optionally turn off all other applications connecting to the internet. As mentioned below, you will include output from these programs in your homework submission. If you do not have access to another computer to test on, you can use Prof. Kak's machine in RVL, whose symbolic hostname is moonshine.ecn.purdue.edu.

2.6 How Your Code Will Be Tested

Your source code will be tested with a script similar to the one below:

    from TcpAttack import *

    spoofIP = '10.10.10.10'
    targetIP = 'moonshine.ecn.purdue.edu'

    rangeStart = 1000
    rangeEnd = 4000

    port = 1716
    numSyn = 100

    tcp = TcpAttack(spoofIP, targetIP)
    tcp.scanTarget(rangeStart, rangeEnd)

    if tcp.attackTarget(port, numSyn):
        print(f"Port {port} was open, and flooded with {numSyn} SYN packets")

3 Submission Instructions

• For this homework you will be submitting a zip file titled hw08.zip, which consists of:
  – A pdf titled hw08.pdf containing:
    ∗ Output (e.g. screenshots) from tcpdump (or an equivalent program) of both the port scanning and the SYN flood attack. Your PDF should indicate in the tcpdump output (e.g. highlight, circle, etc.) which packets were sent as a result of the program you wrote.
    ∗ Example screenshots have been provided below in section 4.
  – The file TcpAttack.py containing your code for the programming problem.

4 Example Screenshots from tcpdump

Figure 1: tcpdump output indicating port scanning
Figure 2: tcpdump output indicating a SYN flood attack on port 1716

References

[1] Scapy: interactive packet manipulation tool. URL https://pypi.org/project/scapy/.
[2] socket: Low-level networking interface. URL https://docs.python.org/3/library/socket.html.
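As a sketch of the scanTarget half only, a connect-based scan with the standard socket module might look like the following. Note the simplification: connect_ex completes the TCP handshake rather than sending a lone SYN as a scapy-based scan would, and the flood itself (which needs scapy's raw-packet crafting) is deliberately omitted:

```python
import socket

class TcpAttack:
    def __init__(self, spoofIP: str, targetIP: str) -> None:
        self.spoofIP = spoofIP    # used only by the (omitted) scapy-based flood
        self.targetIP = targetIP

    def scanTarget(self, rangeStart: int, rangeEnd: int) -> None:
        """Write every open port in [rangeStart, rangeEnd] to openports.txt."""
        open_ports = []
        for port in range(rangeStart, rangeEnd + 1):
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(0.5)
                # connect_ex returns 0 when the TCP connection succeeds
                if sock.connect_ex((self.targetIP, port)) == 0:
                    open_ports.append(port)
        # one open port per line, ascending, as the assignment requires
        with open("openports.txt", "w") as f:
            for port in sorted(open_ports):
                f.write(f"{port}\n")
```

For attackTarget, the scapy approach discussed in Lecture 16 would craft packets along the lines of IP(src=self.spoofIP, dst=self.targetIP)/TCP(dport=port, flags="S") and send them numSyn times.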

$25.00

[SOLVED] ECE 404 Introduction to Computer Security: Homework 07

In this homework assignment, we will dive into the practical side of cryptography by implementing the Secure Hash Algorithm 512 (SHA-512) in Python. SHA-512 is a widely used cryptographic hash function that generates a fixed-size message digest from input data as large as 2^128 bits. Furthermore, the algorithm is part of the SHA family developed by the National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST). As always, please read the homework document in its entirety before coming to office hours with your questions. The teaching staff have spent a long time writing the assignment to cover many common questions you might have.

2 Problem 1: SHA-512

Write a Python script that implements the SHA-512 algorithm. More specifically, your program should have the following call syntax:

    python3 sha512.py input.txt hashed.txt

An explanation of the syntax and some key points to note:

• Using the logic described in Lecture 17.5.2, hash the ASCII text in input.txt and write the resulting message digest to hashed.txt.
• Note that the resulting message digest should be written in hexstring format.
• You can check the correctness of your work by comparing the message digest your implementation produces with that produced by the online tool linked in the handout.
• For those of you who will skip over the previous bullet, the message digest for the given input, represented as a one-line hexstring, is:

    84f353348a552229554fba7ba822005edcb6bca2fac8cf1735d53ae9e2915aa2e625f6d3cfa0106c8707ff0004d3ce95281b47b851b380ef91c86d2fb0e58b28

Submission Instructions

• Make sure to follow the program requirements specified above. Failure to follow these instructions may result in loss of points!
• You must turn in a single zip file on Brightspace with the following naming convention: HW07.zip. Your submission must include:
  – A PDF titled hw07.pdf containing:
    ∗ a brief explanation of your code.
    ∗ the input and the output of sha512.py.
  – The file sha512.py containing your code for your SHA-512 implementation.
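While developing, Python's standard hashlib module gives a quick local reference against which to diff your own digest. This is a sanity check only (the assignment still requires your own SHA-512 implementation); the helper name here is ours, not part of the required interface:

```python
import hashlib

def reference_sha512_hex(path: str) -> str:
    """Return the SHA-512 digest of a file's bytes as a one-line hexstring."""
    with open(path, "rb") as f:
        return hashlib.sha512(f.read()).hexdigest()

# FIPS 180 test vector for "abc" -- a known-good value to compare against
assert hashlib.sha512(b"abc").hexdigest().startswith("ddaf35a1")
```

Comparing your hashed.txt against reference_sha512_hex("input.txt") catches padding and endianness mistakes early, before you resort to the online tool.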

$25.00

[SOLVED] ECE 404 Introduction to Computer Security: Homework 06

The goal of this homework is to give you a deeper understanding of RSA encryption and decryption and their underlying principles. Before starting this assignment, make sure that you understand the relationship between the modulus and the block size for the RSA cipher, and how RSA is made feasible by the fact that modular exponentiation has a fast implementation. As always, please read the homework document in its entirety before coming to office hours with your questions. The teaching staff have spent a long time writing the assignment to cover many common questions you might have.

2 Problem 1: RSA Encryption and Decryption

Write a Python object-oriented program to implement a 256-bit RSA algorithm for encryption and decryption. The plaintext message has been provided in the zip file and is called message.txt. Your data block from the text will be 128 bits. For the reasons explained in Lecture 12.4, prepend your 128-bit data block with 128 zeros on the left to make it a 256-bit block. If the overall length of the plaintext is not an integral multiple of 128 bits, make sure to pad the appropriate number of zeros on the right before prepending the 128 zeros mentioned in the previous step. This method of creating the blocks can be a little tricky, so make sure you understand it, or you will face issues when trying to get the correct encryption results.

2.1 Problem 1 Program Requirements

Your solution should accept the following command-line syntax:

    1 python3 rsa.py -g p.txt q.txt
    2 python3 rsa.py -e message.txt p.txt q.txt encrypted.txt
    3 python3 rsa.py -d encrypted.txt p.txt q.txt decrypted.txt

An explanation of the command-line syntax is as follows:

• For key generation (indicated by '-g' in line 1):
  – The generated values of p and q will be written to p.txt and q.txt respectively.
  – The .txt files should contain the number as an integer represented in ASCII.
  – For example, if p = 7, the corresponding text file will display 7 when opened in a text editor.
• For encryption (indicated by '-e' in line 2):
  – Given the p and q values found in p.txt and q.txt respectively, encrypt the plaintext message in message.txt using the RSA algorithm, and write the output to encrypted.txt.
  – The key generation step mentioned in the previous bullet is there simply to make you aware of its necessity in real-world applications. Make sure to use the p and q values we provided to check the fidelity of your implementation.
• For decryption (indicated by '-d' in line 3):
  – Given the p and q values found in p.txt and q.txt respectively, decrypt the ciphertext in encrypted.txt using the RSA algorithm, and write the output to decrypted.txt.

2.2 Important Implementation Details

Regarding key generation, keep the following points in mind while writing your solution:

• The priority in RSA is to select a particular value of e and then choose p and q accordingly. For this assignment use e = 65537.
• Use the logic in PrimeGenerator.py (found in Lecture 12) to generate values of p and q. Both p and q must satisfy the following conditions:
  1. The two leftmost bits of p and q must be set.
  2. p and q should not be equal.
  3. (p − 1) and (q − 1) should both be co-prime to e.
  4. If any of the above conditions are not satisfied, repeat the process until they are.

Regarding decryption, keep the following points in mind:

• RSA specifies that the recovered plaintext can be computed as C^d mod n.
• However, d is roughly the same size as the modulus n, which makes this modular exponentiation an expensive process.
• To circumvent this obstacle, use the Chinese Remainder Theorem (CRT), explained in Lecture 12.5, to compute the modular exponentiation for decryption.

2.3 RSA Class Skeleton File

Below is a skeleton file to get you started on this assignment.

    import sys

    class RSA():
        def __init__(self, e) -> None:
            self.e = e
            self.n = None
            self.d = None
            self.p = None
            self.q = None

        # You are free to have other RSA class methods you deem necessary for your solution

        def encrypt(self, plaintext: str, ciphertext: str) -> None:
            # your implementation goes here

        def decrypt(self, ciphertext: str, recovered_plaintext: str) -> None:
            # your implementation goes here

    if __name__ == "__main__":
        cipher = RSA(e=65537)
        if sys.argv[1] == "-e":
            cipher.encrypt(plaintext=sys.argv[2], ciphertext=sys.argv[5])
        elif sys.argv[1] == "-d":
            cipher.decrypt(ciphertext=sys.argv[2], recovered_plaintext=sys.argv[5])

3 Problem 2: Breaking RSA for Small Values of e

Lecture 12.3.2 describes a method for breaking RSA encryption for small values of e. In this scenario, a sender, say Party A, sends the same message M to 3 different receivers using their respective public keys. All of the public keys have the same value of e but different values of n. An attacker can intercept all three ciphertexts and use the Chinese Remainder Theorem to calculate the value of M^3 mod N, where N is the product of the three n's. Then he or she can simply take the cube root to recover the plaintext message M. Your task is to replicate this scenario with a Python script that does the following:

1. Generates three sets of public and private keys with e = 3.
2. Encrypts the given plaintext with each of the three public keys. (You should have three ciphertexts after this step.)
3. Takes the three ciphertexts generated in step 2 and uses the CRT to recover the original plaintext.

3.1 Problem 2 Program Requirements

Your solution should accept the following command-line syntax:

    python3 breakRSA.py -e message.txt enc1.txt enc2.txt enc3.txt n_1_2_3.txt
    python3 breakRSA.py -c enc1.txt enc2.txt enc3.txt n_1_2_3.txt cracked.txt

An explanation of the command-line syntax is as follows:

• For encryption (indicated by the '-e' argument):
  – Encrypt the plaintext in message.txt with three different self-generated public keys and write each ciphertext to enc1.txt, enc2.txt, and enc3.txt respectively.
  – Also write the moduli used (i.e. n1, n2, n3) to n_1_2_3.txt.
• For cracking the encryption (indicated by the '-c' argument):
  – Given the three ciphertexts and the respective public keys used to compute them, recover the original plaintext and write it to a file called cracked.txt.

3.2 Important Implementation Details for Problem 2

• Problems 1 and 2 share a lot of overlap in terms of the operations that need to be performed. We therefore highly recommend doing a good job of defining your RSA class in Problem 1 so that it can be used directly in Problem 2.
• Because Python's pow() method does not provide enough precision to solve the cube root, we have provided a method called solve_pRoot() that provides the necessary precision.

4 Submission Instructions

• Make sure to follow the program requirements specified above. Failure to follow these instructions may result in loss of points!
• For this homework you will be submitting a zip file titled HW06.zip to Brightspace containing:
  – The file rsa.py containing your code for Part 1.
  – The file breakRSA.py containing your code for Part 2.
  – You can import PrimeGenerator.py and solve_pRoot.py in your .py files with the assumption that they will be in the same directory as your files when being graded.
  – A pdf titled HW06.pdf containing a detailed explanation of how you implemented RSA in Part 1 and used the CRT to break RSA in Part 2.
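The CRT speed-up for decryption described in section 2.2 can be sketched with toy parameters. The primes and public exponent below are tiny illustrative values chosen so the arithmetic is visible, nothing like the 128-bit primes and e = 65537 the assignment requires:

```python
# Textbook RSA decryption via the Chinese Remainder Theorem (toy parameters).
p, q = 61, 53                 # illustrative small primes only
n = p * q
e = 17                        # small public exponent for this toy example
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)           # private exponent (Python 3.8+ modular inverse)

m = 42
c = pow(m, e, n)              # encryption: C = M^e mod n

# Plain decryption: one exponentiation with the full-size exponent d
assert pow(c, d, n) == m

# CRT decryption: two half-size exponentiations plus a cheap recombination
dp, dq = d % (p - 1), d % (q - 1)
q_inv = pow(q, -1, p)         # precomputable, like dp and dq
m_p = pow(c, dp, p)
m_q = pow(c, dq, q)
h = (q_inv * (m_p - m_q)) % p
assert (m_q + h * q) % n == m  # same plaintext, cheaper arithmetic
```

Because dp and dq are roughly half the bit-length of d, and the moduli p and q are half the size of n, each CRT exponentiation is substantially cheaper than the single full-size one, which is exactly why Lecture 12.5 recommends this route.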

$25.00

[SOLVED] ECE 404 Introduction to Computer Security: Homework 05

This homework assignment consists of two parts, the first of which concerns using block ciphers in counter (CTR) mode. The second concerns cryptographically secure pseudo-random number generators. Both parts of the assignment require the use of your AES implementation from homework 04. As always, please read the homework document in its entirety before coming to office hours with your questions. The teaching staff have spent a long time writing the assignment to cover many common questions you might have.

2 Problem 1: AES Encryption in Counter Mode

In homework 02, the sudden changes in the image of the helicopter allowed you to see the helicopter's outline even after encrypting the image. To prevent this from happening, encrypt the same helicopter image using AES in counter mode as described in Section 9.5.5 of the lecture notes [2]. In a real-world implementation you would normally choose a random 16-byte number; however, for testing and reproducibility purposes, please use a BitVector containing the ASCII encoding of the text string 'counter-mode-ctr' as your initialization vector. Please note that your solution must be an object-oriented Python program. The bullet points below should give you an idea of how your solution should be written as an extension of the AES class from homework 04.

• The method that performs AES encryption in CTR mode has the following format. Note that this function should be an AES class method.

    def ctr_aes_image(self, iv, image_file, enc_image):
        """
        Inputs:
            iv (BitVector): 128-bit initialization vector
            image_file (str): input .ppm file name
            enc_image (str): output .ppm file name

        Method Description:
            * This method encrypts the contents of image_file using CTR-mode AES
              and writes the encrypted content to enc_image
            * Method returns void
        """

• To ensure that the encryption does not take too long, write each block to the output image file as you encrypt it.
  Do not store the entire encrypted image in a single BitVector as you encrypt (this will cause a slowdown that grows noticeably with the size of the image).
• As in homework 02, the encrypted image should still be a viewable image file, and as such should have an image header.
• The command-line syntax to invoke your ctr_aes_image method should be the following:

    python3 AES.py -i image.ppm key.txt enc_image.ppm

• Note that the key used is the same as the one used in homework 04.
• Modify your if __name__ == "__main__" guard to handle the above command-line syntax. An example is shown below:

    if __name__ == "__main__":
        cipher = AES(keyfile=sys.argv[3])
        if sys.argv[1] == "-e":
            cipher.encrypt(plaintext=sys.argv[2], ciphertext=sys.argv[4])
        elif sys.argv[1] == "-d":
            cipher.decrypt(ciphertext=sys.argv[2], recovered_plaintext=sys.argv[4])
        elif sys.argv[1] == "-i":
            cipher.ctr_aes_image(iv=BitVector(textstring="counter-mode-ctr"),
                                 image_file=sys.argv[2], enc_image=sys.argv[4])

3 Problem 2: X9.31 CSPRNG

Lecture 10.6 introduces the ANSI X9.31 cryptographically secure pseudo-random number generator. Your task is to implement a more modern version of this PRNG with the following requirements:

• Instead of using 3DES to encrypt 64-bit vectors as indicated in the lecture notes, use your implementation of AES from homework 04 to encrypt 128-bit vectors.
• Interestingly enough, newer versions of the X9.31 algorithm do use AES instead of 3DES to generate their pseudo-random numbers [1].
• As in Problem 1, your solution for Problem 2 should be in the form of an object-oriented Python program (i.e. an extension of your AES class from homework 04).
• The method that generates your cryptographically secure pseudo-random numbers has the following format. Note that this function should be an AES class method.
    def x931(self, v0, dt, totalNum, outfile):
        """
        Inputs:
            v0 (BitVector): 128-bit seed value
            dt (BitVector): 128-bit date/time value
            totalNum (int): total number of pseudo-random numbers to generate
            outfile (str): output file name

        Method Description:
            * This method uses the arguments with the X9.31 algorithm to compute
              totalNum pseudo-random numbers, each represented as a BitVector object
            * These numbers are then written to the output file in base-10 notation
            * Method returns void
        """

• For testing and reproducibility purposes, please use the IV from Problem 1 for v0, and the integer value 501 represented as a 128-bit BitVector for dt.
• The command-line syntax to invoke your x931 method should be the following:

    python3 AES.py -r 3 key.txt random_numbers.txt

• The argument directly following the '-r' flag indicates the number of random numbers to generate. For this assignment please generate 5 cryptographically secure random numbers.
• Note that the key used is the same as the one used in homework 04.
• Modify your if __name__ == "__main__" guard to handle the above command-line syntax. An example is shown below:

    if __name__ == "__main__":
        cipher = AES(keyfile=sys.argv[3])
        if sys.argv[1] == "-e":
            cipher.encrypt(plaintext=sys.argv[2], ciphertext=sys.argv[4])
        elif sys.argv[1] == "-d":
            cipher.decrypt(ciphertext=sys.argv[2], recovered_plaintext=sys.argv[4])
        elif sys.argv[1] == "-i":
            cipher.ctr_aes_image(iv=BitVector(textstring="counter-mode-ctr"),
                                 image_file=sys.argv[2], enc_image=sys.argv[4])
        else:
            cipher.x931(v0=BitVector(textstring="counter-mode-ctr"),
                        dt=BitVector(intVal=501, size=128),
                        totalNum=int(sys.argv[2]), outfile=sys.argv[4])

• Using the given dt and v0 values, your program should generate the following five numbers:

    331374527193731622526773163027689011175
    26263303708022960927873924862754889187
    6213881104399286406150948824157995508
    317525806849049200816126045738729418009
    240080400546264647934751409092776671804

4 Submission Instructions

Make sure that the program requirements and submission instructions are followed. Failure to follow these instructions may result in loss of points!

• For this homework you will be submitting a zip file to Brightspace titled hw05.zip containing:
  – The modified AES.py file containing your solutions for Problems 1 and 2
  – A PDF titled hw05.pdf containing:
    ∗ a brief explanation of your code for both problems
    ∗ the encrypted image from Problem 1
    ∗ the 5 pseudo-random numbers generated in Problem 2

References

[1] NIST-Recommended Random Number Generator Based on ANSI X9.31 Appendix A.2.4 Using the 3-Key Triple DES and AES Algorithms. URL https://csrc.nist.rip/cryptval/rng/931rngext.pdf.
[2] ECE 404 Lecture Notes. URL https://engineering.purdue.edu/kak/compsec/Lectures.html.
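The CTR keystream idea from Problem 1 can be sketched independently of the AES internals. The block cipher below is a hashlib-based stand-in, emphatically NOT AES (the assignment requires your homework 04 AES class); the point is the counter/XOR structure, which makes encryption and decryption the same operation and never stores the whole message in one buffer:

```python
import hashlib

BLOCK = 16  # 128-bit blocks, as in AES

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in PRF for encrypting one block with AES (NOT real AES).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ctr_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR each block of data with E(key, counter)."""
    counter = int.from_bytes(iv, "big")      # the IV seeds the counter
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        keystream = toy_block_encrypt(key, counter.to_bytes(BLOCK, "big"))
        chunk = data[i:i + BLOCK]            # the last chunk may be short
        out.extend(b ^ k for b, k in zip(chunk, keystream))
        counter = (counter + 1) % (1 << 128) # increment the counter per block
    return bytes(out)

iv = b"counter-mode-ctr"   # the 16-byte IV the assignment specifies
ct = ctr_crypt(b"demo-key", iv, b"some plaintext bytes")
assert ctr_crypt(b"demo-key", iv, ct) == b"some plaintext bytes"
```

Because each block's keystream depends only on the counter, identical plaintext blocks encrypt to different ciphertext blocks, which is precisely why CTR mode hides the helicopter outline that ECB-style encryption leaked in homework 02.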

$25.00

[SOLVED] ECE 404 Introduction to Computer Security: Homework 04

In our recent lectures, finite fields have been a key focus. These mathematical structures lay the groundwork for the Advanced Encryption Standard (AES), enabling the secure handling of arithmetic operations within a limited set. This understanding will be crucial as we embark on implementing the AES algorithm, where finite fields are at the heart of the encryption process. As always, please read the homework document in its entirety before coming to office hours with your questions. The teaching staff have spent a long time writing the assignment to cover many common questions you might have.

2 Programming Assignment

Write an object-oriented Python [1] program that implements the full AES algorithm. More specifically, given a 256-bit key and a plaintext message, your program must produce the correct encryption and decryption results. The two commands below specify the exact command-line syntax for invoking encryption and decryption.

    python3 AES.py -e message.txt key.txt encrypted.txt
    python3 AES.py -d encrypted.txt key.txt decrypted.txt

An explanation of the command-line syntax is as follows:

• Encryption (indicated by the -e argument):
  – Perform AES encryption on the plaintext in message.txt using the key in key.txt, and write the ciphertext to a file called encrypted.txt.
  – You can assume that message.txt and key.txt contain text strings (i.e. ASCII characters).
  – However, the final ciphertext should be saved as a single-line hex string.
• Decryption (indicated by the -d argument):
  – Perform AES decryption on the ciphertext in encrypted.txt using the key in key.txt, and write the recovered plaintext to decrypted.txt.

A skeleton file for your AES.py has been provided below.
    import sys
    from BitVector import *

    class AES():
        # class constructor - when creating an AES object, the class's
        # constructor is executed and instance variables are initialized
        def __init__(self, keyfile: str) -> None:

        # encrypt - performs AES encryption on the plaintext and writes the ciphertext to disk
        # Inputs: plaintext (str)  - filename containing plaintext
        #         ciphertext (str) - filename containing ciphertext
        # Return: void
        def encrypt(self, plaintext: str, ciphertext: str) -> None:

        # decrypt - performs AES decryption on the ciphertext and writes the recovered plaintext to disk
        # Inputs: ciphertext (str) - filename containing ciphertext
        #         decrypted (str)  - filename containing recovered plaintext
        # Return: void
        def decrypt(self, ciphertext: str, decrypted: str) -> None:

    if __name__ == "__main__":
        cipher = AES(keyfile=sys.argv[3])

        if sys.argv[1] == "-e":
            cipher.encrypt(plaintext=sys.argv[2], ciphertext=sys.argv[4])
        elif sys.argv[1] == "-d":
            cipher.decrypt(ciphertext=sys.argv[2], decrypted=sys.argv[4])
        else:
            sys.exit("Incorrect Command-Line Syntax")

2.1 Useful Notes for AES Implementation

The following points may aid you in your implementation of AES; for full documentation, please refer to Lecture 8 [3].

• Each round of AES involves the following four steps:
  1. Single-byte based substitution
  2. Row-wise permutation
  3. Column-wise mixing
  4. XOR with the round key
• Please note that the order in which these four steps are executed is different for encryption and decryption.
• The last round of encryption does not involve the 'Mix Columns' step. Similarly, the last round of decryption does not involve the 'Inverse Mix Columns' step.
• As you know, AES has a variable key length, and the number of rounds of processing depends upon the key length.
The lecture assumes a 128-bit key length, and all subsequent explanation is based on that assumption. However, the key provided to you is 256 bits long, so there will be a slight variation in how you generate the key schedule. The following explanation will be helpful in that regard:

1. For the key-expansion algorithm, note that irrespective of the key length, each round still uses only 4 words from the key schedule. Just as we organised the 128-bit key into 4 words for key expansion, we organise the 256-bit key into 8 words.
2. Each step of the key-expansion algorithm takes us from 8 words in one round to 8 words in the next round. Hence, 8 such steps will give us a 64-word key schedule. The implementation of the g(·) function remains the same. The logic for obtaining the 8 words of the (j+1)-th step of key expansion from the 8 words of the j-th step also remains the same.
3. Note that since the key is 256 bits long, there will be 14 rounds of processing in AES, plus the initial processing. Because each round of processing uses only 4 words from the key schedule, you will require only a 60-word key schedule. However, the previous step generates a 64-word schedule, so you can ignore the last 4 words in the schedule.

• Keep in mind that the block size is still 128 bits, despite the key size being 256 bits.
• Should the last block of plaintext not be an integral multiple of 128 bits, pad the block with trailing 0s. Note that this will result in trailing NULL bytes in your recovered plaintext files.
• We have provided first round.txt, which allows you to verify your results for each of the four steps in the first round of processing the first block.
• You can verify your final ciphertext with this online tool from javainuse [2]. Please note that due to different padding methods for blocks that are not integral multiples of the block size, the final block of encryption generated from the website will not match your final encrypted block.
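The word-count bookkeeping in steps 1-3 above can be sketched as follows. Note that round_g is a stand-in of my own for the real g(·) function from the lecture (and this sketch omits the extra byte-substitution AES-256 applies mid-block), so only the 8-word stepping and the 64-to-60-word truncation are illustrated here.

```python
# Sketch of the 256-bit key-schedule bookkeeping described above.
# round_g is a STAND-IN for the real g() from the lecture notes; here it
# returns the word unchanged so only the word-count logic is illustrated.

def round_g(word, round_no):
    return word  # placeholder for rotate + substitute + Rcon XOR

def expand_key_256(key_words):
    """key_words: the 256-bit key as 8 32-bit words (ints).
    Returns the 60-word schedule (14 rounds x 4 words + 4 initial words)."""
    assert len(key_words) == 8
    schedule = list(key_words)         # the first 8 words come from the key
    step = 1
    while len(schedule) < 64:          # 7 more 8-word blocks -> 64 words total
        base = len(schedule)
        schedule.append(schedule[base - 8] ^ round_g(schedule[base - 1], step))
        for i in range(1, 8):
            schedule.append(schedule[base - 8 + i] ^ schedule[base + i - 1])
        step += 1
    return schedule[:60]               # ignore the last 4 of the 64 words
```

Swapping the real g(·) into round_g turns this bookkeeping into the actual schedule; the loop structure itself does not change.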
3 Submission Instructions

Make sure that the program requirements and submission instructions are followed. Failure to follow these instructions may result in loss of points!

• For this homework you will be submitting a zip file to Brightspace titled hw04 .zip containing:
  – The file AES.py containing your code for the programming problem.
  – A PDF titled hw04 .pdf containing a brief explanation of your code, and the encrypted and decrypted output for the text mentioned above using the key provided.

References

[1] Object-Oriented Programming in Python. URL https://realpython.com/python3-object-oriented-programming/.
[2] AES Encryption and Decryption Online Tool. URL https://www.javainuse.com/aesgenerator.
[3] ECE 404 Lecture Notes. URL https://engineering.purdue.edu/kak/compsec/Lectures.html.


[SOLVED] Ece404 introduction to computer security: homework 03

"It is almost impossible to fully understand practically any facet of modern cryptography and several important aspects of general computer security if you do not know what is meant by a finite field" [1]. Thus, the goal of this homework is to help further your understanding of finite fields in preparation for later topics to come. The assignment consists of a theory problem section whose details are specified below.

2 Theory Problems

Solve the following theory problems. Your solutions must be typed in a PDF titled HW03 .pdf.

1. Given A = {0, 1}, determine whether or not the set forms a group with each of the following binary operators:
   • boolean AND
   • boolean OR
   • boolean XOR
2. Given W, the set of all unsigned integers, determine whether or not W forms a group under the gcd(·) operator.
3. Let's say we have a ring with the group operator + as addition and the ring operator × as multiplication. If you switch the two (i.e. multiplication is the group operator and addition is the ring operator), would it still be a ring? Explain why or why not (i.e. indicate all the properties that are true/not true that show it is/is not a ring).
4. Explain in detail how one would use Bezout's identity to find the multiplicative inverse of an integer in the field Z_p, where p is a prime number. Then, use those steps to find the multiplicative inverse of 47 in Z_97.
5. In the following, find the smallest possible integer x that solves each congruence. You should not solve them by simply plugging in arbitrary values of x until you get the correct value. Make sure to show your work.
   (a) 28x ≡ 34 (mod 37)
   (b) 19x ≡ 42 (mod 43)
   (c) 54x ≡ 69 (mod 79)
   (d) 153x ≡ 182 (mod 271)
   (e) 672x ≡ 836 (mod 997)
6.
Simplify the following polynomial expression in GF(89):

(54x^10 − 62x^9 − 84x^8 + 70x^7 − 75x^6 + x^5 − 50x^3 + 84x^2 + 65x + 78) + (−67x^9 + 44x^8 − 26x^7 − 37x^6 + 61x^5 + 68x^4 + 22x^3 + 74x^2 + 87x + 38)

7. Simplify the following polynomial expression in GF(11):

(8x^3 + 6x^2 + 8x + 1) × (3x^3 + 9x^2 + 7x + 5)

8. For the finite field GF(2^3), simplify the following expressions with modulus polynomial (x^3 + x + 1):
   (a) (x^2 + x + 1) × (x^2 + x)
   (b) x^2 − (x^2 + x + 1)
   (c) (x^2 + x + 1) / (x^2 + 1)

3 Submission Instructions

• You must turn in a single PDF file on Brightspace containing your solutions to the theory questions in Section 2. The PDF must have the following naming convention: HW03 .pdf.
• You are allowed to include scans of handwritten work in the PDF, but please make sure it is legible.

References

[1] ECE 404 Lecture Notes. URL https://engineering.purdue.edu/kak/compsec/Lectures.html.
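Problem 4 above asks for the Bezout-identity route to a multiplicative inverse in Z_p. A minimal extended-Euclid sketch of that route (function names are my own) looks like this:

```python
# Extended Euclidean algorithm: returns (g, s, t) with s*a + t*b = g = gcd(a, b).
# When p is prime and gcd(a, p) = 1, Bezout's identity s*a + t*p = 1 implies
# s*a = 1 (mod p), so s mod p is the multiplicative inverse of a in Z_p.

def extended_gcd(a, b):
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def mod_inverse(a, p):
    g, s, _ = extended_gcd(a, p)
    if g != 1:
        raise ValueError("a has no inverse modulo p")
    return s % p
```

The same helper also applies to the congruences in Problem 5, since ax ≡ b (mod m) has the solution x = (mod_inverse(a, m) * b) % m whenever gcd(a, m) = 1.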


[SOLVED] Ece404 introduction to computer security: homework 02

The goal of this homework is to help you further your understanding of the Data Encryption Standard (DES) covered in Lecture 3 [2]. Before you start the programming tasks, you are encouraged to take a look at the files in the gzipped archive for Lecture 3. These files include scripts for generating the round keys, permuting the encryption key, as well as performing the substitution step, all of which are crucial in DES.

2 Programming Tasks

2.1 Problem 1

Write an object-oriented Python [1] program that implements the full DES algorithm. Refer to Lecture 3, as it outlines the key steps to implementing DES. Given an encryption key and some plaintext, your program must produce the correct encryption and decryption results. The two commands below specify the exact command-line syntax for invoking encryption and decryption.

1 python3 DES.py -e message.txt key.txt encrypted.txt
2 python3 DES.py -d encrypted.txt key.txt decrypted.txt

An explanation of the command-line syntax is as follows:

• Encryption (indicated with the -e argument in line 1)
  – Perform DES encryption on the plaintext in message.txt using the key in key.txt, and write the ciphertext to a file called encrypted.txt.
  – You can assume that message.txt and key.txt contain text strings (i.e. ASCII characters).
  – However, the final ciphertext should be saved as a single-line hex string.
• Decryption (indicated with the -d argument in line 2)
  – Perform DES decryption on the ciphertext in encrypted.txt using the key in key.txt, and write the recovered plaintext to decrypted.txt.

A skeleton file for DES.py has been provided below.
from BitVector import *
import sys

class DES():
    # Class constructor -- when a DES object is created, the
    # constructor is called and the instance variables are initialized.
    #
    # Note that the constructor specifies that each instance of DES
    # be created with a key file (str).
    def __init__(self, key):
        # Within the constructor, initialize instance variables.
        # These could be the s-boxes, permutation boxes, and
        # other variables you think each instance of the DES
        # class would need.
        pass  # to be implemented

    # encrypt method declaration for students to implement
    # Inputs: message_file (str), outfile (str)
    # Return: void
    def encrypt(self, message_file, outfile):
        # Encrypts the contents of the message file and writes
        # the ciphertext to the outfile.
        pass  # to be implemented

    # decrypt method declaration for students to implement
    # Inputs: encrypted_file (str), outfile (str)
    # Return: void
    def decrypt(self, encrypted_file, outfile):
        # Decrypts the contents of the encrypted_file and
        # writes the recovered plaintext to the outfile.
        pass  # to be implemented

# Drive the encryption/decryption process.
if __name__ == '__main__':
    # Example of construction of a DES object instance:
    cipher = DES(key=sys.argv[3])

A couple of things to keep note of while working on Problem 1:

• The plaintext bit size is not necessarily divisible by the DES block size. For the sake of this assignment, your program can pad the last block with zeros if this is the case.
• You can expect some null-byte characters in your recovered plaintext should you need to pad the plaintext at encryption time.
• Remember to parse the command-line arguments for your program using the calling conventions described above. Please do not hard-code the file names into your program.
• For debugging purposes, we have included a text file called first round.txt containing the left and right halves of the first plaintext block after the first Feistel round for the text file mentioned above.

2.2 Problem 2

As you will soon learn in Lecture 9, block ciphers such as DES should not be used in electronic code book (ECB) mode as seen in Problem 1 (i.e. directly encrypting the data in independent blocks) to encrypt viewable media such as images. Overall patterns in the data may still be obvious, since each block of data is encrypted independently. Your job in Problem 2 is to demonstrate this characteristic with a script that performs DES encryption on an image. More specifically, given an image in PPM format, perform DES encryption only on the image data, with the same key from Problem 1. Do not encrypt the PPM header, as you will combine it with the encrypted image data to produce an encrypted PPM image. The structure of a PPM image file is detailed below.

PPM Image Format: A ".ppm" image file consists of a header and the actual image data. The header occupies the first 3 lines of the file, while the image data (i.e. the data you need to encrypt) begins on the subsequent lines following the header [3].

python3 DES.py -i image.ppm key.txt image_enc.ppm

The command-line syntax for invoking image encryption is specified above. It says to perform DES encryption on the image in image.ppm with the key in key.txt, and store the encrypted image in image_enc.ppm.

A couple of things to keep note of while working on Problem 2:

• Many parts of Problem 1 will get reused in Problem 2. It is in your best interest to keep your Problem 1 solution nice and tidy so that it can be ported over efficiently.
• The implementation of your Problem 2 solution can be made as short as one additional class method in the DES class, plus some modification to the if __name__ == '__main__' construct.
• Unlike in Problem 1, where you are required to write the encrypted data as a hex string, make sure to write the encrypted image data directly to the file (no hex-string conversion); otherwise your final output will not be viewable.

3 Submission Instructions

• You must turn in a single zip file on Brightspace with the following naming convention: HW02 .zip. Do not turn in files other than those listed below. Your submission must include:
  – A PDF containing:
    ∗ For Problem 1: a brief explanation of your code, and the encrypted and decrypted output for the text mentioned above using the key provided.
    ∗ For Problem 2: a brief explanation of your code, and a picture of the encrypted PPM image.
  – The file DES.py containing your code for Problems 1 and 2.

References

[1] Object-Oriented Programming in Python. URL https://realpython.com/python3-object-oriented-programming/.
[2] ECE 404 Lecture Notes. URL https://engineering.purdue.edu/kak/compsec/Lectures.html.
[3] PPM - Netpbm color image format. URL https://netpbm.sourceforge.net/doc/ppm.html.
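The header/body split that Problem 2 relies on can be sketched as below. The helper name is my own; it only uses the stated fact that the PPM header occupies the first 3 lines of the file.

```python
# Split raw PPM bytes into (header, body): the header is the first three
# newline-terminated lines, and the body is everything after them. Only the
# body would be DES-encrypted; the header is copied through unchanged.

def split_ppm(raw: bytes):
    idx = 0
    for _ in range(3):                    # advance past each of the 3 header lines
        idx = raw.index(b"\n", idx) + 1
    return raw[:idx], raw[idx:]
```

The encrypted image would then be written out as header + encrypted_body, with the ciphertext bytes written raw rather than hex-encoded, per the note above.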


[SOLVED] Ece404 introduction to computer security: homework 01

In this exercise, you will assume the role of a cryptanalyst and attempt to break a cryptographic system composed of the two Python scripts EncryptForFun.py and DecryptForFun.py described in Section 2.11 of Lecture 2. As you will recall, the script EncryptForFun.py can be used for encrypting a message file, while the script DecryptForFun.py recovers that message from the ciphertext produced with the previous script. Both of these scripts can be found on the ECE 404 web page for the Lecture Notes (click on the "download code" tab for Lecture 2) [3].

2 Programming Tasks

2.1 Downloading the Necessary Dependencies

Before writing any code, you will first need to download the BitVector package [2]. You are free to acquire this dependency in any fashion you wish; however, the guidelines below detail the setup via Anaconda environments.

1. Follow the instructions here [1] for installing Anaconda on your local machine.
2. Create your ece404 conda environment: conda create --name ece404 python=3.8
3. Activate your new conda environment: conda activate ece404
4. Install the BitVector package: pip install BitVector

2.2 Problem Statement

With the BLOCKSIZE parameter set to 16 and the passphrase parameter set to "Hopes and dreams of a million years", the script EncryptForFun.py produces the following one-line ciphertext for a plaintext message regarding Scuderia Ferrari driver Charles Leclerc.
5e0e392c531926696d58196b3f735c512c640a6469460c737c4216325c07393b5a1
93f695b562f6809093979704a02756640123025591a3e3c0749772e0765380d0d6c
3c191d6a284c0a2b3944072577475528734b51612104047026131861241e546a65
0f5c676b41522922691d1d206e1c096c40522322431d376d625e116a72481d793b
6f584d2c7c584570353e0018277119573d39081f1f497c6f7f0c5d63681618732014
1d62240b1a7e221102373b5f4b6b2b5c4d7a7c1248346c5a597c4f335e15487c6a1
74f7d7e5b1c7d0f133e781e17217f02113b674b08752e17187629080f146c2706127
221541c332850583e614b110b2e7f130c2f6b5f226141112324470135715a4f7c2d4
a4c743c140360733542467025520f2268521a677d5a5b6b3b5d54613b5049652b1
55a2c28140e026d2019116d28475827675317222f4d1c236c4d596b7e0c5d2a7d4
d5f3f2f5619703d1e08381e5b3c2f0d5b347d200919792c0d5c4a6f2d5b4e663118

Your job is to recover both the original quote and the encryption key by mounting a brute-force attack on the encryption/decryption algorithms.

HINT 1: The correctly decrypted message should contain the word Ferrari.

HINT 2: The logic used in the scripts assumes that the effective key size is 16 bits when the BLOCKSIZE variable is set to 16. So your brute-force attack needs to search through a keyspace of size 2^16.

2.3 Programming Instructions

Implement the following function to decrypt the ciphertext within ciphertextFile using key_bv; it returns the original plaintext as a string.

def cryptBreak(ciphertextFile, key_bv):
    # Arguments:
    # * ciphertextFile: String containing the file name of the ciphertext
    # * key_bv: 16-bit BitVector for the decryption key

A couple of things to keep note of:

• The function must be implemented and saved in a file named cryptBreak.py.
• This function must be implemented to decrypt the message for a single key, not to perform the complete brute-force analysis; the brute-force analysis must be done within the code's main function/statement, or in a separate Python file by importing cryptBreak.py into that file.
• Note that the string returned by the above function may or may not be the correct plaintext, since the correct key_bv is unknown. Therefore, to determine the correct value for key_bv, you will need to brute-force all possible values for key_bv and check the returned string to find the right one.
• You need to submit only the cryptBreak.py file, which will be autograded; hence, make sure that the cryptBreak.py file does not run the entire brute-force analysis or any other routine when imported.

2.4 Example Usage

Below is an example of how your implemented function could be used: if your function is implemented correctly, the following code snippet should run without any errors.

from cryptBreak import cryptBreak
from BitVector import *

RandomInteger = 9999  # Arbitrary integer for creating a BV
key_bv = BitVector(intVal=RandomInteger, size=16)
decryptedMessage = cryptBreak('encrypted.txt', key_bv)
if 'Ferrari' in decryptedMessage:
    print('Encryption Broken!')
else:
    print('Not decrypted yet')

3 Submission Instructions

• You must turn in a single zip file on Brightspace with the following naming convention: HW01 .zip. Do not turn in files other than those listed below. Your submission must include:
  – The file containing your cryptBreak implementation, named cryptBreak.py.
  – A PDF named HW01 .pdf containing:
    ∗ The recovered plaintext quote
    ∗ The recovered encryption key
    ∗ A brief explanation of your code

References

[1] Anaconda Installation Instructions. URL https://conda.io/projects/conda/en/latest/user-guide/install/index.html.
[2] BitVector Python. URL https://pypi.org/project/BitVector/.
[3] ECE 404 Lecture Notes. URL https://engineering.purdue.edu/kak/compsec/Lectures.html.
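The shape of the brute-force search is worth seeing end to end. The toy XOR "cipher" below is a stand-in of my own, not the EncryptForFun.py logic; with the real scripts, the same loop would instead call cryptBreak('encrypted.txt', BitVector(intVal=key, size=16)) and test each returned string for the 'Ferrari' crib.

```python
# Toy illustration of searching a 2**16 keyspace: decrypt under every
# candidate 16-bit key and keep the one whose output contains a known crib.

def toy_decrypt(ciphertext: bytes, key: int) -> bytes:
    kb = key.to_bytes(2, "big")
    return bytes(c ^ kb[i % 2] for i, c in enumerate(ciphertext))

def brute_force(ciphertext: bytes, crib: bytes):
    for key in range(2 ** 16):            # the entire 16-bit keyspace
        plaintext = toy_decrypt(ciphertext, key)
        if crib in plaintext:
            return key, plaintext
    return None, None
```

Because the keyspace is only 65536 entries, an exhaustive search like this finishes in seconds even with a slow per-key decryption.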


[SOLVED] Cs434 homework 3 naïve bayes and neural networks

In this homework, we are going to do some exercises to understand Naïve Bayes a bit better and then get some hands-on experience with simple neural networks in a multi-class classification setting.

How to Do This Assignment.

• Each question that you need to respond to is in a blue "Task Box" with its corresponding point-value listed.
• We prefer typeset solutions (LaTeX / Word) but will accept scanned written work if it is legible. If a TA can't read your work, they can't give you credit.
• Programming should be done in Python and numpy. If you don't have Python installed, you can install it from here. This is also the link showing how to install numpy. You can also search the internet for numpy tutorials if you haven't used it before. Google and APIs are your friends!

You are NOT allowed to…

• Use machine learning packages such as sklearn.
• Use data analysis packages such as pandas or seaborn.
• Discuss low-level details or share code / solutions with other students.

Advice. Start early. There are two sections to this assignment: one involving working with math (20% of grade) and another focused more on programming (80% of the grade). Read the whole document before deciding where to start.

How to submit. Submit a zip file to Canvas. Inside, you will need to have all your working code and hw3-report.pdf. You will also submit test-set predictions to a class-wide Kaggle competition.

1 Written Exercises: Analyzing Naïve Bayes [5pts]

1.1 Bernoulli Naïve Bayes As A Linear Classifier

Consider a Naïve Bayes model for binary classification where the features X_1, …, X_d are also binary variables. As part of training this Naïve Bayes model, we would estimate the conditional distributions P(X_i | y=c) for c ∈ {0, 1} and i = 1, …, d, as well as the prior distribution P(y=c) for c ∈ {0, 1}. For notational simplicity, let's denote P(X_i=1 | y=c) as θ_{ic} and P(X_i=0 | y=c) as 1 − θ_{ic}. Likewise, let θ_1 be P(y=1) and θ_0 = 1 − θ_1 be P(y=0).
If we write out the posterior of this classifier (leaving out the normalizing constant), we would have:

P(y=1 | x_1, …, x_d) = P(y=1) ∏_{i=1}^{d} P(x_i | y=1) / P(x_1, …, x_d) ∝ θ_1 ∏_{i=1}^{d} θ_{i1}^{x_i} (1 − θ_{i1})^{1−x_i}    (1)

P(y=0 | x_1, …, x_d) = P(y=0) ∏_{i=1}^{d} P(x_i | y=0) / P(x_1, …, x_d) ∝ θ_0 ∏_{i=1}^{d} θ_{i0}^{x_i} (1 − θ_{i0})^{1−x_i}    (2)

The classification rule in Naïve Bayes is to choose the class with the highest posterior probability, that is to say, predicting y = argmax_c P(y=c | x_1, …, x_d) for a new example x = [x_1, …, x_d]. In order for the prediction to be class 1, P(y=1 | x_1, …, x_d) must be greater than P(y=0 | x_1, …, x_d), or equivalently we could check if:

P(y=1 | x_1, …, x_d) / P(y=0 | x_1, …, x_d) > 1    (3)

In this setting with binary inputs, we will show that this is a linear decision boundary. This is also true for continuous inputs if they are modelled with a member of the exponential family (including Gaussian); see the proof in §3.1.

Q1 Prove Bernoulli Naïve Bayes has a linear decision boundary [4pts]. Prove the Bernoulli Naïve Bayes model described above has a linear decision boundary. Specifically, you'll need to show that class 1 will be predicted only if b + w^T x > 0 for some parameters b and w. To do so, show that:

P(y=1 | x_1, …, x_d) / P(y=0 | x_1, …, x_d) > 1  ⟹  b + ∑_{i=1}^{d} w_i x_i > 0    (4)

As part of your report, explicitly write expressions for the bias b and each weight w_i. Hints: Expand Eq. (3) by substituting the posterior expressions from Eqs. (1) and (2) into Eq. (3). Take the log of that expression and combine like terms.

1.2 Duplicated Features in Naïve Bayes

Naïve Bayes classifiers assume features are conditionally independent given the class label. When this assumption is violated, Naïve Bayes classifiers may be over-confident or under-confident in their outputs. To examine this more closely, we'll consider the situation where an input feature is duplicated. These duplicated features are maximally dependent, making this a strong violation of conditional independence.
Here, we'll examine a case where the confidence increases when the duplicated features are included, despite no new information actually being added as input.

Q2 Duplicate Features in Naïve Bayes [1pts]. Consider a Naïve Bayes model with a single feature X_1 for a binary classification problem. Assume a uniform prior such that P(y=0) = P(y=1). Suppose the model predicts class 1 for an example x; then we know:

P(y=1 | X_1=x_1) > P(y=0 | X_1=x_1)    (5)

Now suppose we make a mistake and duplicate the X_1 feature in our data: now we have two identical inputs X_1 and X_2. Show that the predicted probability for class 1 is higher than it was before; that is, prove:

P(y=1 | X_1=x_1, X_2=x_2) > P(y=1 | X_1=x_1)    (6)

Hints: Use the assumption that P(y=0) = P(y=1) to simplify the posterior expressions in Eq. (6) and Eq. (5). As X_2 is an exact copy of X_1, P(X_2=x_2 | y) is the same as P(X_1=x_1 | y) for any example.

2 Implementing a Neural Network For Digit Identification [20pts]

Small MNIST. In this section, we will implement a feed-forward neural network model for predicting the value of a drawn digit. We are using a subset of the MNIST dataset commonly used in machine learning research papers. A few examples of these handwritten-then-digitized digits from the dataset are shown below. Each digit is a 28 × 28 greyscale image with values ranging from 0 to 255. We represent an image as a row vector x ∈ R^{1×784}, where the image has been serialized into one long vector. Each digit has an associated class label from 0, 1, 2, …, 9 corresponding to its value. We provide three dataset splits for this homework: a training set containing 5000 examples, a validation set containing 1000, and our test set containing 4220 (no labels). These datasets can be downloaded from the class Kaggle competition for this homework.

2.1 Cross-Entropy Loss for Multiclass Classification

Unlike the previous classification tasks we've examined, we have 10 different possible class labels here.
How do we measure the error of our model? Let's formalize this a little and say we have a dataset D = {x_i, y_i}_{i=1}^{N} with y_i ∈ {0, 1, 2, …, 9}. Assume we have a model f(x; θ), parameterized by a set of parameters θ, that predicts P(Y | X=x) (a distribution over our labels given an input). Let's refer to P(Y=c | X=x) predicted from this model as p_{c|x} for compactness. We can write this output as a categorical distribution:

P(Y=y | X=x) = p_{0|x} if y=0, p_{1|x} if y=1, …, p_{9|x} if y=9;  equivalently,  P(Y=y | X=x) = ∏_{c=0}^{9} p_{c|x}^{I[y==c]}    (7)

where I[condition] is the indicator function that is 1 if the condition is true and 0 otherwise. Using this, we can write the negative log-likelihood of a single example as:

−log P(D | θ) = −∑_{c=0}^{9} I[y_i == c] log p_{c|x_i} = −log p_{y_i|x_i}    (8)

This loss function is also often referred to as a cross-entropy loss. In this homework, we will minimize this negative log-likelihood by stochastic gradient descent. In the following, we will refer to this negative log-likelihood as

L(θ) = −log p_{y_i|x_i}    (9)

Note that we write L as a function of θ because each p_{y_i|x_i} is produced by our model f(x_i; θ).

2.2 Implementing Backpropagation for a Feed-forward Neural Network

In this homework, we'll consider feed-forward neural networks composed of a sequence of linear layers x W_1 + b_1 and non-linear activation functions g_1(·). As such, a network with 3 of these layers stacked together can be written as

b_3 + g_2(b_2 + g_1(b_1 + x W_1) W_2) W_3    (10)

Note how this is a series of nested functions, reflecting the sequential feed-forward nature of the computation. To make our notation easier in the future, I want to give a name to the intermediate outputs at each stage, so I will expand this to write:

z_1 = x W_1 + b_1    (11)
a_1 = g_1(z_1)    (12)
z_2 = a_1 W_2 + b_2    (13)
a_2 = g_2(z_2)    (14)
z_3 = a_2 W_3 + b_3    (15)

where the z's are intermediate outputs from the linear layers and the a's are post-activation-function outputs.
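Eqs. (11)-(15) above, as a minimal numpy sketch (the hidden width of 64 and the ReLU activations are illustrative choices of mine, not fixed by the assignment):

```python
import numpy as np

def relu(z):
    # A common choice for the activations g1, g2
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 784))                     # one serialized 28x28 digit
W1, b1 = rng.standard_normal((784, 64)), np.zeros((1, 64))
W2, b2 = rng.standard_normal((64, 64)), np.zeros((1, 64))
W3, b3 = rng.standard_normal((64, 10)), np.zeros((1, 10))

z1 = x @ W1 + b1        # Eq. (11)
a1 = relu(z1)           # Eq. (12)
a2 = relu(a1 @ W2 + b2) # Eqs. (13)-(14)
z3 = a2 @ W3 + b3       # Eq. (15): one unnormalized score per digit class
```

Each line is one of the named intermediate quantities; the nesting in Eq. (10) becomes a straight sequence of assignments.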
In the case of our MNIST experiments, z_3 will have 10 dimensions, one for each of the possible labels. Finally, the output vector z_3 is not yet a probability distribution, so we apply the softmax function:

p_{j|x} = e^{z_{3j}} / ∑_{c=0}^{9} e^{z_{3c}}    (17)

and let p_{·|x} be the vector of these predicted probability values.

Gradient Descent for Neural Networks. Considering this simple 3-layer neural network, there are quite a few parameters spread out through the function: weight matrices W_3, W_2, W_1 and bias vectors b_3, b_2, b_1. Suppose we would like to find parameters that minimize our loss L, which measures our error in the network's prediction. How can we update the weights to reduce this error? Let's use gradient descent and start by writing out the chain rule for the gradient of each of these. I'll work backwards from W_3 to W_1 to expose some structure here.

∂L/∂W_3 = (∂L/∂p_{·|x}) (∂p_{·|x}/∂z_3) (∂z_3/∂W_3)    (18)
∂L/∂W_2 = (∂L/∂p_{·|x}) (∂p_{·|x}/∂z_3) (∂z_3/∂a_2) (∂a_2/∂z_2) (∂z_2/∂W_2)    (19)
∂L/∂W_1 = (∂L/∂p_{·|x}) (∂p_{·|x}/∂z_3) (∂z_3/∂a_2) (∂a_2/∂z_2) (∂z_2/∂a_1) (∂a_1/∂z_1) (∂z_1/∂W_1)    (20)

As the repeated leading factors above make clear, we end up reusing the same intermediate terms over and over as we compute derivatives for weights further and further from the output in our network.¹ As discussed in class, this suggests the straightforward backpropagation algorithm for computing these efficiently. Specifically, we will compute these intermediate terms starting from the output and working backwards.

Forward-Backward Pass in Backpropagation.
One convenient way to implement backpropagation is to consider each layer (or operation) f as having a forward pass that computes the function output normally, as

output = f_forward(input)    (21)

and a backward pass that takes in the gradient of the loss up to this point in our backward pass and then outputs the gradient of the loss with respect to its input:

∂L/∂input = f_backward(∂L/∂output) = (∂L/∂output) (∂output/∂input)    (22)

The backward operator will also compute any gradients with respect to parameters of f and store them, to be used in a gradient descent update step after the backwards pass. The starter code implements this sort of framework. See the snippet on the following page that defines a neural network like the one we've described here, except that it allows for a configurable number of linear layers. Please read the comments and code below before continuing reading this document.

To give a concrete example of the forward-backward steps for an operator, consider the Sigmoid (aka the logistic) activation function below:

Sigmoid(x) = 1 / (1 + e^{−x})    (23)

The implementation of forward and backward for the Sigmoid is below. In forward it computes Eq. (23); in backward it computes and returns

∂L/∂input = (∂L/∂output) (∂output/∂input) = (∂L/∂output) Sigmoid(input) (1 − Sigmoid(input))    (24)

It has no parameters, so it does nothing during the "step" function.

class Sigmoid:

    # Given the input, apply the sigmoid function and
    # store the output value for use in the backwards pass.
    def forward(self, input):
        self.act = 1 / (1 + np.exp(-input))
        return self.act

    # Compute the gradient of the output with respect to the input,
    # self.act * (1 - self.act), and then multiply by the loss gradient with
    # respect to the output to produce the loss gradient with respect to the input.
    def backward(self, grad):
        return grad * self.act * (1 - self.act)

    # The Sigmoid has no parameters, so there is nothing to do during a gradient descent step.
    def step(self, step_size):
        return

¹ I don't repeat this for the bias vectors, as it would simply involve changing the final terms from ∂z_i/∂W_i to ∂z_i/∂b_i.

class FeedForwardNeuralNetwork:

    # Builds a network of linear layers separated by non-linear activations,
    # either ReLU or Sigmoid. Each internal layer has hidden_dim dimensions.
    def __init__(self, input_dim, output_dim, hidden_dim, num_layers, activation="ReLU"):

        if num_layers == 1:  # Just a linear mapping from input to output
            self.layers = [LinearLayer(input_dim, output_dim)]

        else:  # At least two layers

            # Layer to map input to hidden dimension size
            self.layers = [LinearLayer(input_dim, hidden_dim)]
            self.layers.append(Sigmoid() if activation == "Sigmoid" else ReLU())

            # Hidden layers
            for i in range(num_layers - 2):
                self.layers.append(LinearLayer(hidden_dim, hidden_dim))
                self.layers.append(Sigmoid() if activation == "Sigmoid" else ReLU())

            # Layer to map hidden dimension to output size
            self.layers.append(LinearLayer(hidden_dim, output_dim))

    # Given an input, call the forward function of each of our layers,
    # passing the output of each layer to the next one.
    def forward(self, X):
        for layer in self.layers:
            X = layer.forward(X)
        return X

    # Given a gradient with respect to the network output, call
    # the backward function of each of our layers, passing the output
    # of each layer to the one before it.
    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)

    # Tell each layer to update its weights based on the gradient computed in the backward pass.
    def step(self, step_size=0.001):
        for layer in self.layers:
            layer.step(step_size)

Operating on Batches.
The network described in the equations earlier in this section operates on a single input at a time. In practice, we will want to operate on sets of n examples at once, such that the layer actually computes Z = XW + b for X ∈ R^{n×input_dim} and Z ∈ R^{n×output_dim}; call this a batched operation. It is straightforward to change the forward pass to operate on these all at once. For example, a linear layer can be rewritten as Z = XW + b, where the +b is a broadcasted addition; this is already done in the code above. On the backward pass, we simply need to aggregate the gradient of the loss of each data point with respect to our parameters. For example,

∂L/∂W_1 = ∑_{i=1}^{n} ∂L_i/∂W_1    (25)

where L_i is the loss of the i'th datapoint and L is the overall loss.

Deriving the Backward Pass for a Linear Layer. In this homework, we'll implement the backward pass of a linear layer. To do so, we'll need to be able to compute dZ/db, dZ/dW, and dZ/dX. For each, we'll start by considering the problem for a single training example x (i.e. a single row of X) and then generalize to the batch setting. In this single-example setting, z = xW + b such that z, b ∈ R^{1×c}, x ∈ R^{1×d}, and W ∈ R^{d×c}. Once we solve this case, extending to the batch setting just requires summing over the gradient terms for each example.

dZ/db. Considering just the i'th element of z, we can write z_i = x w_{·,i} + b_i, where w_{·,i} is the i'th column of W. From this equation, it is straightforward to observe that element b_i only affects the corresponding output z_i, such that

dz_i/db_j = 1 if i = j, and 0 otherwise    (26)

This suggests that the Jacobian dz/db is an identity matrix I of dimension c × c. Applying the chain rule and summing over all our datapoints, we see dL/db can be computed as a sum of the rows of dL/dZ:

dL/db = ∑_{k=1}^{n} (dL_k/dZ_k) (dZ_k/db) = ∑_{k=1}^{n} (dL_k/dZ_k) I = ∑_{k=1}^{n} dL_k/dZ_k    (27)

dZ/dW.
Following the same process of reasoning from the single-example case, we can again write the i'th element of z as z_i = x w_{·,i} + b_i, where w_{·,i} is the i'th column of W. When considering the derivative of z_i with respect to the columns of W, we see that it is just x for w_{·,i} and 0 for the other columns, as they don't contribute to z_i – that is to say:

    dz_i/dw_{·,j} = x if i = j, and 0 otherwise    (28)

Considering the loss gradient dL/dw_{·,i} for a single example, we can write:

    dL/dw_{·,i} = (dL/dz_i)(dz_i/dw_{·,i}) = (dL/dz_i) x    (29)

That is to say, each column i of dL/dW is the input x scaled by the loss gradient of z_i. As such, we can compute the gradient for the entire W as the product:

    dL/dW = x^T (dL/dz)    (30)

Notice that x^T is d × 1 and dL/dz is 1 × c – resulting in a d × c gradient that matches the dimensions of W. Now let's consider multiple datapoints x_1, …, x_n as the matrix X and likewise multiple activation vectors z_1, …, z_n as the matrix Z. As our loss simply sums each datapoint's loss, the gradient also decomposes into a sum over examples:

    dL/dW = Σ_{k=1}^{n} (dL/dZ_k)(dZ_k/dW) = Σ_{k=1}^{n} X_k^T (dL_k/dZ_k)    (31)

We can write this even more compactly as:

    dL/dW = X^T (dL/dZ)    (32)

dZ/dX. This follows a very similar path as dZ/dW. We again consider the i'th element of z as z_i = x w_{·,i} + b_i, where w_{·,i} is the i'th column of W. Taking the derivative with respect to x, it is clear that for z_i the result will be w_{·,i}:

    dz_i/dx = w_{·,i}    (33)

This suggests that the rows of dz/dx are simply the columns of W, such that dz/dx = W^T, and we can write

    dL/dx = (dL/dz)(dz/dx) = (dL/dz) W^T    (34)

Moving to the multiple-example setting, the above expression gives each row of dL/dX, and the entire matrix can be computed efficiently as

    dL/dX = (dL/dZ) W^T    (35)

I Q3 Implementing the Backward Pass for a Linear Layer [6pt]. Implement the backward pass function of the linear layer in the skeleton code. The function takes in the matrix dL/dZ as the variable grad and you must compute dL/dW, dL/db, and dL/dX.
The first two are stored as self.grad_weights and self.grad_bias, and the third is returned. The expressions for these can be found above in Eq. 27 (dL/db), Eq. 32 (dL/dW), and Eq. 35 (dL/dX).

    class LinearLayer:

        # Initialize our layer with an (input_dim, output_dim) weight matrix and a (1, output_dim) bias vector
        def __init__(self, input_dim, output_dim):
            self.weights = np.random.randn(input_dim, output_dim).astype(np.float64) * np.sqrt(2. / input_dim)
            self.bias = np.ones((1, output_dim)).astype(np.float64) * 0.5

        # During the forward pass, we simply compute XW+b
        def forward(self, input):
            self.input = input
            return self.input @ self.weights + self.bias

        #################################################
        # Q3 Implementing Backward Pass for Linear
        #################################################
        # Inputs:
        #
        # grad dL/dZ -- For a batch size of n, grad is an (n x output_dim) matrix where
        #               the i'th row is the gradient of the loss of example i with
        #               respect to z_i (the output of this layer for example i)
        #
        # Computes and stores:
        #
        # self.grad_weights dL/dW -- An (input_dim x output_dim) matrix storing the
        #                            gradient of the loss with respect to the weights
        #                            of this layer.
        #
        # self.grad_bias dL/db -- A (1 x output_dim) matrix storing the gradient of
        #                         the loss with respect to the bias of this layer.
        #
        # Return Value:
        #
        # grad_input dL/dX -- For a batch size of n, grad_input is an (n x input_dim)
        #                     matrix where the i'th row is the gradient of the loss of
        #                     example i with respect to x_i (the input of this layer
        #                     for example i)
        #################################################

        def backward(self, grad):  # grad is dL/dZ
            self.grad_weights = ?  # Compute dL/dW as in Eq. 32
            self.grad_bias = ?     # Compute dL/db as in Eq. 27
            return ?               # Compute dL/dX as in Eq. 35

        # During the gradient descent step, update the weights and biases based on the stored gradients from the backward pass
        def step(self, step_size):
            self.weights -= step_size * self.grad_weights
            self.bias -= step_size * self.grad_bias

Once you've completed the above task, running the skeleton code should load the digit data and train a 2-layer neural network with a hidden dimension of 16 and Sigmoid activations. This model is trained on the training set and evaluated once per epoch on the validation data. After training, it will produce a plot of your results that should look like the one below. This curve plots training and validation loss (cross-entropy in this case) over training iterations (in red, measured on the left vertical axis). It also plots training and validation accuracy (in blue, measured on the right vertical axis). As you can see, this model achieves between 80% and 90% accuracy on the validation set.

2.3 Analyzing Hyperparameter Choices

Neural networks have many hyperparameters. These range from architectural choices (How many layers? How wide should each layer be? What activation function should be used?) to optimization parameters (What batch size for stochastic gradient descent? What step size (aka learning rate)? How many epochs should I train?). This section has you modify many of these to examine their effect. The default parameters are below for easy reference.

    # GLOBAL PARAMETERS FOR STOCHASTIC GRADIENT DESCENT
    np.random.seed(102)
    step_size = 0.01
    batch_size = 200
    max_epochs = 200

    # GLOBAL PARAMETERS FOR NETWORK ARCHITECTURE
    number_of_layers = 2
    width_of_layers = 16  # only matters if number of layers > 1
    activation = "ReLU" if False else "Sigmoid"

Optimization Parameters. Optimization parameters in stochastic gradient descent are very inter-related. Large batch sizes mean less noisy estimates of the gradient, so larger step sizes could be used.
But larger batch sizes also mean fewer gradient updates per epoch, so we might need to increase the max epochs. Finding a set of parameters that works well can be tricky and requires checking the validation set performance. Further, these "good parameters" will vary model-to-model.

I Q4 Learning Rate [2pts]. The learning rate (or step size) in stochastic gradient descent controls how large a step we take in the direction of the loss gradient at each iteration. The batch size determines how many data points we use to estimate the gradient. Modify the hyperparameters to run the following experiments:
1. Step size of 0.0001 (leave default values for other hyperparameters)
2. Step size of 5 (leave default values for other hyperparameters)
3. Step size of 10 (leave default values for other hyperparameters)
Include these plots in your report and answer the following questions:
a) Compare and contrast the learning curves with your curve using the default parameters. What do you observe in terms of smoothness, shape, and what performance they reach?
b) For (a), what would you expect to happen if the max epochs were increased?

Activation Function and Depth. As networks get deeper (or have more layers), they tend to become able to fit more complex functions (though this may also lead to overfitting). However, this also means the backpropagated gradient has many product terms before reaching the lower layers – resulting in gradients of relatively small magnitude. This has the effect of making learning slower. Certain activation functions make this better or worse depending on the shape of their derivative. One popular choice is to use a Rectified Linear Unit, or ReLU, activation that computes:

    ReLU(x) = max(0, x)    (36)

This is especially common in very deep networks. In the next question, we'll see why experimentally.

I Q5 ReLU's and Vanishing Gradients [3pts]. Modify the hyperparameters to run the following experiments:
1.
5-layer with Sigmoid activation (leave default values for other hyperparameters)
2. 5-layer with Sigmoid activation and a 0.1 step size (leave default values for other hyperparameters)
3. 5-layer with ReLU activation (leave default values for other hyperparameters)
Include these plots in your report and answer the following questions:
a) Compare and contrast the learning curves you observe and the curve for the default parameters in terms of smoothness, shape, and what performance they reach. Do you notice any differences in the relationship between the train and validation curves in each plot?
b) If you observed that increasing the learning rate in (2) improves over (1), why might that be?
c) If (3) outperformed (1), why might that be? Consider the derivatives of the Sigmoid and ReLU functions.

2.4 Randomness in Training

There is also a good deal of randomness in training a neural network with stochastic gradient descent – network weights are randomly initialized and the batches are randomly ordered. This can make a non-trivial difference to outcomes.

I Q6 Measuring Randomness [1pt]. Using the default hyperparameters, set the random seed to 5 different values and report the validation accuracies you observe after training. What impact does this randomness have on the certainty of your conclusions in the previous questions?

2.5 Make Your Kaggle Submission

Great work getting here. In this section, you'll submit the predictions of your best model to the class-wide Kaggle competition. You are free to make any modification to your neural network or the optimization procedure to improve performance; however, it must remain a feed-forward neural network! For example, you can change any of the optimization hyperparameters, add momentum / weight decay, vary the number of layers or width, add dropout or residual connections, etc.

I Q7 Kaggle Submission [8pt]. Submit a set of predictions to Kaggle that outperforms the baseline on the public leaderboard.
To make a valid submission, use the train set to train your neural network classifier and then apply it to the test instances in mnist_small_test.csv, available from Kaggle's Data tab. Format your output as a two-column CSV as below:

    id,digit
    0,3
    1,9
    2,4
    3,1
    ...

where the id is just the row index in mnist_small_test.csv. You may submit up to 10 times a day. In your report, tell us what modifications you made for your final submission.

Extra Credit and Bragging Rights [1.25pt Extra Credit]. The TA has made a submission to the leaderboard. Any submission outperforming the TA on the private leaderboard at the end of the homework period will receive 1.25 extra credit points on this assignment. Further, the top 5 ranked submissions will "win HW3" and receive bragging rights.

3 Debriefing (required in your report)
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone or did you discuss the problems with others?
4. How deeply do you feel you understand the material it covers (0%–100%)?
5. Any other comments?


[SOLVED] CS434 Homework 2: Linear Models for Regression and Classification

In this homework, we are going to do some exercises about alternative losses for linear regression, practice recall and precision calculations, and implement a logistic regression model to predict whether a tumor is malignant or benign. There is substantial skeleton code provided with this assignment to take care of some of the details you already learned in the previous assignment, such as cross-validation, data loading, and computing accuracies.

How to Do This Assignment.
• Each question that you need to respond to is in a blue "Task Box" with its corresponding point-value listed.
• We prefer typeset solutions (LaTeX / Word) but will accept scanned written work if it is legible. If a TA can't read your work, they can't give you credit.
• Programming should be done in Python and numpy. If you don't have Python installed, you can install it from here. This is also the link showing how to install numpy. You can also search the internet for numpy tutorials if you haven't used it before. Google and APIs are your friends!

You are NOT allowed to…
• Use machine learning packages such as sklearn.
• Use data analysis packages such as pandas or seaborn.
• Discuss low-level details or share code / solutions with other students.

Advice. Start early. There are two sections to this assignment – one involving working with math (20% of the grade) and another focused more on programming (80% of the grade). Read the whole document before deciding where to start.

How to submit. Submit a zip file to Canvas. Inside, you will need to have all your working code and hw1-report.pdf. You will also submit test set predictions to a class Kaggle. This is required to receive credit for Q8.

1 Written Exercises: Linear Regression and Precision/Recall [5pts]

I'll take any opportunity to sneak in another probability question. It's a small one.
1.1 Least Absolute Error Regression

In lecture, we showed that the solution for least squares regression was equivalent to the maximum likelihood estimate of the weight vector of a linear model with Gaussian noise. That is to say, our probabilistic model was

    y_i ∼ N(μ = w^T x_i, σ)  −→  P(y_i | x_i, w) = (1 / (σ√(2π))) e^{−(y_i − w^T x_i)² / (2σ²)}    (1)

and we showed that the MLE estimate under this model also minimized the sum-of-squared-errors (SSE):

    argmax_w ∏_{i=1}^{N} P(y_i | x_i, w)  [Likelihood]  =  argmin_w Σ_{i=1}^{N} (y_i − w^T x_i)²  [Sum of Squared Errors]    (2)

However, we also demonstrated that least squares regression is very sensitive to outliers – large errors squared can dominate the loss. One suggestion was to instead minimize the sum of absolute errors. In this first question, you'll show that changing the probabilistic model to assume Laplace error yields a least absolute error regression objective. To be more precise, we will assume the following probabilistic model for how y_i is produced given x_i:

    y_i ∼ Laplace(μ = w^T x_i, b)  −→  P(y_i | x_i, w) = (1 / (2b)) e^{−|y_i − w^T x_i| / b}    (3)

I Q1 Linear Model with Laplace Error [2pts]. Assuming the model described in Eq. 3, show that the MLE for this model also minimizes the sum of absolute errors (SAE):

    SAE(w) = Σ_{i=1}^{N} |y_i − w^T x_i|    (4)

Note that you do not need to solve for an expression for the actual MLE of w to do this problem. Simply showing that the negative log-likelihood is proportional to the SAE (up to additive constants) is sufficient.

1.2 Recall and Precision

    y   P(y|x)        y   P(y|x)
    0   0.1           0   0.55
    0   0.1           1   0.7
    0   0.25          1   0.8
    1   0.25          0   0.85
    0   0.3           1   0.9
    0   0.33          1   0.9
    1   0.4           1   0.95
    0   0.52          1   1.0

Beyond just calculating accuracy, we discussed recall and precision as two other measures of a classifier's abilities.
Remember that we defined recall and precision in terms of true positives, false positives, true negatives, and false negatives:

    Recall = #TruePositives / (#TruePositives + #FalseNegatives)    (5)

and

    Precision = #TruePositives / (#TruePositives + #FalsePositives)    (6)

I Q2 Computing Recall and Precision [3pts]. To get a feeling for recall and precision, consider the set of true labels (y) and model predictions P(y|x) shown in the tables above. We compute recall and precision at a specific threshold t – considering any point with P(y|x) > t as being predicted to be the positive class (1) and ≤ t to be the negative class (0). Compute and report the recall and precision for thresholds t = 0, 0.2, 0.4, 0.6, 0.8, and 1.

2 Implementing Logistic Regression for Tumor Diagnosis [20pts]

In this section, we will implement a logistic regression model for predicting whether a tumor is malignant (cancerous) or benign (non-cancerous). The dataset has eight attributes – clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bland chromatin, normal nucleoli, and mitoses – all rated between 1 and 10. You will again be submitting your predictions on the test set via the class Kaggle. You'll need to download the train_cancer.csv and test_cancer_pub.csv files from the Kaggle's Data page to run the code.

2.1 Implementing Logistic Regression

Logistic Regression. Recall from lecture that the logistic regression algorithm is a binary classifier that learns a linear decision boundary. Specifically, it predicts the probability of an example x ∈ R^d being class 1 as

    P(y_i = 1 | x_i) = σ(w^T x_i) = 1 / (1 + e^{−w^T x_i}),    (7)

where w ∈ R^d is a weight vector that we want to learn from data.
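The prediction rule in Eq. 7 is a one-liner in numpy. The sketch below is purely illustrative and not part of the provided logreg.py skeleton – the helper name predict_prob and the toy values of X and w are ours:

```python
import numpy as np

# Predicted probability that each example (row of X) is class 1, per Eq. 7.
# X is (n x d), w is (d x 1); returns an (n x 1) vector of probabilities.
def predict_prob(X, w):
    return 1.0 / (1.0 + np.exp(-X @ w))

X = np.array([[0.5, 1.0],
              [-2.0, 0.5]])
w = np.array([[1.0], [-1.0]])
print(predict_prob(X, w))  # each entry lies in (0, 1)
```

Note that np.exp applies elementwise, so the same expression handles a single example or a whole batch at once.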
To estimate these parameters from a dataset of n input-output pairs D = {x_i, y_i}, we assumed y_i ∼ Bernoulli(θ = σ(w^T x_i)) and wrote the negative log-likelihood:

    −log P(D|w) = −Σ_{i=1}^{n} log P(y_i | x_i, w) = −Σ_{i=1}^{n} [ y_i log σ(w^T x_i) + (1 − y_i) log(1 − σ(w^T x_i)) ]    (8)

I Q3 Negative Log Likelihood [2pt]. Implement the calculateNegativeLogLikelihood function in logreg.py. The function takes in an n×d matrix X of example features (each row is an example) and an n×1 vector of labels y. It should return the result of computing Eq. 8 (which is a scalar value). Note that np.log and np.exp will apply the log or exponential function to each element of an input matrix.

Gradient Descent. We want to find the optimal weights w* = argmin_w −log P(D|w). However, taking the gradient of the negative log-likelihood yields the expression below, which does not offer a closed-form solution.

    ∇_w(−log P(D|w)) = Σ_{i=1}^{n} (σ(w^T x_i) − y_i) x_i    (9)

Instead, we opted to minimize −log P(D|w) by gradient descent. We've provided pseudocode in the lecture, but to review, the basic procedure is written below (α is the step size).

1. Initialize w to some initial vector (all zeros, random, etc.)
2. Repeat until max iterations:
   (a) w = w − α ∗ ∇_w(−log P(D|w))

For convex functions (and sufficiently small values of the step size α), this will converge to the minima.

I Q4 Gradient Descent for Logistic Regression [5pt]. Finish implementing the trainLogistic function in logreg.py. The function takes in an n×d matrix X of example features (each row is an example) and an n×1 vector of labels y. It returns the learned weight vector and a list containing the observed negative log-likelihood after each epoch (using calculateNegativeLogLikelihood). The skeleton code is shown below.

    def trainLogistic(X, y, max_iters=2000, step_size=0.0001):

        # Initialize our weights with zeros
        w = np.zeros((X.shape[1], 1))

        # Keep track of losses for plotting
        losses = []

        # Take up to max_iters steps of gradient descent
        for i in range(max_iters):

            # Make a variable to store our gradient
            w_grad = np.zeros((X.shape[1], 1))

            # Compute the gradient over the dataset and store in w_grad
            ##### TODO: Implement equation 9.

            # This is here to make sure your gradient is the right shape
            assert(w_grad.shape == (X.shape[1], 1))

            # Take the update step in gradient descent
            w = w - step_size * w_grad

            # Calculate the negative log-likelihood with w
            losses.append(calculateNegativeLogLikelihood(X, y, w))

        return w, losses

To complete this code, you'll need to implement Eq. 9 to compute the gradient of the negative log-likelihood of the dataset with respect to the weights w. Note that an approach that loops over the dataset to compute Eq. 9 is about 15x slower than a fully matrix-form version. The max_iters variable can be set lower (around 200) during development to avoid this slowness being too annoying – but when optimizing performance later you may want to raise it back up. Either solution is fine for this assignment if you're patient. If you've implemented this question correctly, running logreg.py should print out the learned weight vector and training accuracy. You can expect something around 86% for the train accuracy. Provide your weight vector and accuracy in your report.

2.2 Playing with Logistic Regression on This Dataset

Adding a Bias. The model we trained in the previous section did not have a constant offset (called a bias) in the model – computing w^T x rather than w^T x + b. A simple way to include this in our model is to add a new column to X that has all ones in it. This way, the first weight in our weight vector will always be multiplied by 1 and added.

I Q5 Adding A Dummy Variable [1pt].
Implement the dummyAugment function in logreg.py to add a column of 1's to the left side of an input matrix and return the new matrix. Once you've done this, running the code should produce the training accuracy for both the no-bias and this updated model. Report the new weight vector and accuracy. Did it make a meaningful difference?

Observing Training Curves. After finishing the previous question, the code now also produces a plot showing the negative log-likelihood for the bias and no-bias models over the course of training. If we change the learning rate (also called the step size), we can see significant differences in how this plot behaves – and in our accuracies.

I Q6 Learning Rates / Step Sizes. [2pt] Gradient descent is sensitive to the learning rate (or step size) hyperparameter and the number of iterations. Does it look like the gradient descent algorithm has converged, or does it look like the negative log-likelihood could continue to drop if max_iters was set higher? Different values of the step size will change the nature of the curves in the training curve plot. In the skeleton code, this is originally set to 0.0001. Change the step size to 1, 0.1, 0.01, and 0.00001. Provide the resulting training curve plots and training accuracies. Discuss any trends you observe.

Cross Validation. The code will also now print out K-fold cross-validation results (mean and standard deviation of accuracy) for K = 2, 3, 4, 5, 10, 20, and 50. This part may be a bit slow, but you'll see how the mean and standard deviation change with larger K.

I Q7 Evaluating Cross Validation [2pt] Come back to this after making your Kaggle submission. The point of cross-validation is to help us make good choices for model hyperparameters. For different values of K in K-fold cross-validation, we got different estimates of the mean and standard deviation of our accuracy. How well did these means and standard deviations capture your actual performance on the leaderboard?
Discuss any trends you observe.

2.3 Make Your Kaggle Submission

Great work getting here. In this section, you'll submit the predictions of your best model to the class-wide Kaggle competition. You are free to make any modification to your logistic regression algorithm to improve performance; however, it must remain logistic regression! For example, you can change the feature representation or adjust the learning rate and max_steps parameters.

I Q8 Kaggle Submission [8pt]. Submit a set of predictions to Kaggle that outperforms the baseline on the public leaderboard. To make a valid submission, use the train set to build your logistic regression classifier and then apply it to the test instances in test_cancer_pub.csv, available from Kaggle's Data tab. Format your output as a two-column CSV as below:

    id,type
    0,0
    1,1
    2,1
    3,0
    ...

where the id is just the row index in test_cancer_pub.csv. You may submit up to 10 times a day. In your report, tell us what modifications you made for your final submission.

Extra Credit and Bragging Rights [1.25pt Extra Credit]. The TA has made a submission to the leaderboard. Any submission outperforming the TA on the private leaderboard at the end of the homework period will receive 1.25 extra credit points on this assignment. Further, the top 5 ranked submissions will "win HW2" and receive bragging rights.

3 Debriefing (required in your report)
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone or did you discuss the problems with others?
4. How deeply do you feel you understand the material it covers (0%–100%)?
5. Any other comments?


[SOLVED] CS434 Homework 1: k-Nearest Neighbor Classification and Statistical Estimation

In this homework, we are going to implement a kNN classifier and get some practice with the concepts in statistical estimation we went over in lecture. The assignment will help you understand how to apply the concepts from lecture to applications.

How to Do This Assignment.
• Each question that you need to respond to is clearly marked in a blue "Task Box" with its corresponding point-value listed.
• We prefer typeset solutions (LaTeX / Word) but will accept scanned written work if it is legible. If a TA can't read your work, they can't give you credit. Submit your solutions to Canvas as a zip including your report and any code the assignment asks you to develop.
• Programming should be done in Python and numpy. If you don't have Python installed on your machine, you can install it from https://www.python.org/downloads/. This is also the link showing how to install numpy: https://numpy.org/install/. You can also search the internet for numpy tutorials if you haven't used it before. Google is your friend!

You are NOT allowed to…
• Use machine learning packages such as sklearn.
• Use data analysis packages such as pandas or seaborn.
• Discuss low-level details or share code / solutions with other students.

Advice. Start early. Start early. Start early. There are two sections to this assignment – one involving working with math (20% of the grade) and another focused more on programming (80% of the grade). Read the whole document before deciding where to start.

How to submit. Submit a zip file to Canvas. Inside, you will need to have all your working code and hw1-report.pdf. You will also submit test set predictions to a class Kaggle. This is required to receive credit for Q10.

1 Statistical Estimation [10pts]

The Poisson distribution is a discrete probability distribution over the natural numbers starting from zero (i.e. 0, 1, 2, 3, …). Specifically, it models how many occurrences of an event will happen within a fixed interval.
For example – how many people will call into a call center within a given hour. Importantly, it assumes that an individual event (a person calling the center) is independent and occurs with a fixed probability. Say there is a 2% chance of someone calling every minute; then on average we would expect 1.2 people per hour – but sometimes there would be more and sometimes fewer. The Poisson distribution models that uncertainty over the total number of events in an interval. The probability mass function for the Poisson with parameter λ is given as:

    Pois(X = x; λ) = λ^x e^{−λ} / x!   ∀x ∈ {0, 1, 2, …},  λ ≥ 0    (1)

where the rate λ can be interpreted as the average number of occurrences in an interval. In this section, we'll derive maximum likelihood estimates for λ and show that the Gamma distribution is a conjugate prior to the Poisson.

I Q1 Maximum Likelihood Estimation of λ [4pts]. Assume we observe a dataset of occurrence counts D = {x_1, x_2, …, x_N} coming from N i.i.d. random variables distributed according to Pois(X = x; λ). Derive the maximum likelihood estimate of the rate parameter λ. To help guide you, consider the following steps:
1. Write out the log-likelihood function log P(D|λ).
2. Take the derivative of the log-likelihood with respect to the parameter λ.
3. Set the derivative equal to zero and solve for λ – call this maximizing value λ̂_MLE.
Your answer should make intuitive sense given our interpretation of λ as the average number of occurrences.

In lecture, we discussed how to use priors to encode beliefs about parameters before any data is seen. We showed the Beta distribution was a conjugate prior to the Bernoulli – that is to say, the posterior of a Bernoulli likelihood and Beta prior is itself a Beta distribution (i.e. Beta ∝ Bernoulli * Beta). The Poisson distribution also has a conjugate – the Gamma distribution.
Gamma is a continuous distribution over the range [0, ∞) with probability density function:

    Gamma(Λ = λ; α, β) = β^α λ^{α−1} e^{−βλ} / Γ(α)   ∀λ ≥ 0,  α, β > 0    (2)

I Q2 Maximum A Posteriori Estimate of λ with a Gamma Prior [4pts]. As before, assume we observe a dataset of occurrence counts D = {x_1, x_2, …, x_N} coming from N i.i.d. random variables distributed according to Pois(X = x; λ). Further, assume that λ is distributed according to Gamma(Λ = λ; α, β). Derive the MAP estimate of λ. To help guide you, consider the following steps:
1. Write out the log-posterior log P(λ|D) ∝ log P(D|λ) + log P(λ).
2. Take the derivative of log P(D|λ) + log P(λ) with respect to the parameter λ.
3. Set the derivative equal to zero and solve for λ – call this maximizing value λ̂_MAP.

In the previous question we found the MAP estimate for λ under a Poisson-Gamma model; however, we didn't demonstrate that Gamma is a conjugate prior to Poisson – i.e. we did not show that the product of a Poisson likelihood and a Gamma prior results in another Gamma distribution (or at least is proportional to one).

I Q3 Deriving the Posterior of A Poisson-Gamma Model [2pt]. Show that the Gamma distribution is a conjugate prior to the Poisson by deriving expressions for the parameters α_P, β_P of a Gamma distribution such that P(λ|D) ∝ Gamma(λ; α_P, β_P). [Hint: Consider P(D|λ)P(λ) and group like terms/exponents. Try to massage the equation into looking like the numerator of a Gamma distribution. The denominator can mostly be ignored if it is constant with respect to λ, as we are only trying to show a proportionality (∝).]

2 k-Nearest Neighbor (kNN) [50pts]

In this section, we'll implement our first machine learning algorithm of the course – k Nearest Neighbors. We are considering a binary classification problem where the goal is to classify whether a person has an annual income of more or less than $50,000 given census information. Test results will be submitted to a class-wide Kaggle competition.
As no validation split is provided, you'll need to perform cross-validation to fit good hyperparameters.

Data Analysis. We've done the data processing for you for this assignment; however, getting familiar with the data before applying any algorithm is a very important part of applying machine learning in practice. Quirks of the dataset could significantly impact your model's performance or how you interpret the outcome of your experiments. For instance, if I have an algorithm that achieved 70% accuracy on this task – how good or bad is that? We'll see!

We've split the data into two subsets – a training set with 8,000 labelled rows, and a test set with 2,000 unlabelled rows. These can be downloaded from the Data tab of the HW1 Kaggle as "train.csv" and "test_pub.csv" respectively. Both files come with a header row, so you can see which column belongs to which feature. Below you will find a table listing the attributes available in the dataset. We note that categorizing some of these attributes into two or a few categories is reductive (e.g. only 14 occupations) or might reinforce a particular set of social norms (e.g. categorizing sex or race in particular ways). For this homework, we reproduced this dataset from its source without modifying these attributes; however, it is useful to consider these issues as machine learning practitioners.

attribute name: type. list of values
- id: numerical. Unique for each point. Don't use this as a feature (it will hurt, badly).
- age: numerical.
- workclass: categorical. Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay
- education-num: ordinal. 1:Preschool, 2:1st-4th, 3:5th-6th, 4:7th-8th, 5:9th, 6:10th, 7:11th, 8:12th, 9:HS-grad, 10:Some-college, 11:Assoc-voc, 12:Assoc-acdm, 13:Bachelors, 14:Masters, 15:Prof-school, 16:Doctorate
- marital-status: categorical. Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse
- occupation: categorical.
Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
- relationship: categorical. Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
- race: categorical. White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
- sex: categorical. 0:Male, 1:Female
- capital-gain: numerical.
- capital-loss: numerical.
- hours-per-week: numerical.
- native-country: categorical. United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands
- income: ordinal. 0: ≤50K, 1: >50K. This is the class label. Don't use this as a feature.

Our dataset has three types of attributes – numerical, ordinal, and nominal. Numerical attributes represent continuous numbers (e.g. hours-per-week worked). Ordinal attributes are a discrete set with a natural ordering, for instance different levels of education. Nominal attributes are also discrete sets of possible values; however, there is no clear ordering between them (e.g. native-country). These different attribute types require different preprocessing. As discussed in class, numerical fields have been normalized. For nominal variables like workclass, marital-status, occupation, relationship, race, and native-country, we've transformed these into one column for each possible value with either a 0 or a 1.
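This binarization can be sketched in a few lines of numpy. Note this is an illustrative helper of our own – the provided data is already preprocessed, so you do not need to implement it:

```python
import numpy as np

# Expand a column of categorical values into one binary column per category.
# Categories are sorted so the output column order is deterministic.
def one_hot(values):
    categories = sorted(set(values))
    encoded = np.zeros((len(values), len(categories)), dtype=int)
    for row, v in enumerate(values):
        encoded[row, categories.index(v)] = 1
    return categories, encoded

cats, enc = one_hot(["Private", "State-gov", "Never-worked", "State-gov"])
print(cats)  # ['Never-worked', 'Private', 'State-gov']
print(enc)   # one row per example, a single 1 per row
```

Each original categorical column thus becomes several 0/1 columns, which is why the preprocessed data has many more dimensions than the raw attribute list suggests.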
For example, the first instance in the training set reads: [0, 0.136, 0.533, 0.0, 0.659, 0.397, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0] where all the zeros and ones correspond to these binarized variables. The following questions guide you through exploring the dataset and help you understand some of the steps we took when preprocessing the data.

I Q4 Encodings and Distance [2pt] To represent nominal attributes, we apply a one-hot encoding technique – transforming each possible value into its own binary attribute. For example, if we have an attribute workclass with three possible values Private, State-gov, Never-worked – we would binarize the workclass attribute as shown below (each row is a single example data point):

The original field     Fields after binarization
workclass              workclass=Private   workclass=State-gov   workclass=Never-worked
Private                1                   0                     0
State-gov              0                   1                     0
Never-worked           0                   0                     1
State-gov              0                   1                     0

A common naive preprocessing is to treat all categorical variables as ordinal – assigning increasing integers to each possible value. For example, such an encoding would say 1=Private, 2=State-gov, and 3=Never-worked. Contrast these two encodings. Focus on how each choice affects Euclidean distance in kNN.

I Q5 Looking at Data [3pt] What percent of the training data has an income >50k? Explain how this might affect your model and how you interpret the results. For instance, would you say a model that achieved 70% accuracy is a good or poor model? How many dimensions does each data point have (ignoring the id attribute and class label)? [Hint: check the data, one-hot encodings increased dimensionality]

2.1 k-Nearest Neighbors

In this part, you will need to implement the k-NN classifier algorithm with the Euclidean distance.
(If you are not familiar with Euclidean distance, you can go back to the L1.2 – kNN slide 18.) The kNN algorithm is simple as you already have the preprocessed data. To make a prediction for a new data point, there are three main steps:
1. Calculate the distance of that data point to every training example.
2. Get the k nearest points (i.e. the k with the lowest distance).
3. Return the majority class of these neighbors as the prediction.

Implementing this using native Python operations will be quite slow, but lucky for us there is the numpy package. numpy makes matrix operations quite efficient and easy to code. How will matrix operations help us? The next question walks you through figuring out the answer. Some useful things to check out in numpy: broadcasting, np.linalg.norm, np.argsort() or np.argpartition(), and array slicing.

I Q6 Norms and Distances [2pt] Distances and vector norms are closely related concepts. For instance, the L2 norm of a vector x (defined below) can be interpreted as the Euclidean distance between x and the zero vector:

\|x\|_2 = \sqrt{ \sum_{i=1}^{d} x_i^2 }    (3)

Given a new vector z, show that the Euclidean distance between x and z can be written as an L2 norm. [kNN implementation note for later: you can compute norms efficiently with numpy using np.linalg.norm]

In kNN, we need to compute the distance between every training example xi and a new point z in order to make a prediction. As you showed in the previous question, computing this for one xi can be done by applying an arithmetic operation between xi and z, then taking a norm. In numpy, arithmetic operations between matrices and vectors are sometimes defined by “broadcasting”, even if standard linear algebra doesn’t allow for them. For example, given a matrix X of size n × d and a vector z of size d, numpy will happily compute Y = X − z such that Y is the result of the vector z being subtracted from each row of X.
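Putting broadcasting and np.linalg.norm together, distances from one query to every training row can be computed in one shot. This is only a sketch with made-up array sizes, not the required solution:

```python
import numpy as np

# Made-up sizes purely for illustration.
rng = np.random.default_rng(0)
X = rng.random((500, 20))   # n x d training matrix
z = rng.random(20)          # a single query point

# X - z broadcasts z across all rows; the norm over axis=1 then
# yields one Euclidean distance per training example.
dists = np.linalg.norm(X - z, axis=1)

# Indices of the k nearest training points (unordered).
k = 5
nearest = np.argpartition(dists, k)[:k]
```

np.argpartition avoids a full sort when only the k smallest distances are needed.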
Combining this with your answer from the previous question can make computing distances to every training point quite efficient and easy to code.

I Q7 Implement kNN [10pts] Okay, enough math and talk about numpy. It is time to get our hands dirty. Let’s write some code! Implement k-Nearest Neighbors using Euclidean distance for this dataset in Python in a file named knn.py. Conceptually, you will need to load the training and test data and have some function to find the nearest neighbors for each test point to make a prediction.

2.2 K-Fold Cross Validation

We do not provide class labels for the test set or a validation set, so you’ll need to implement K-fold Cross Validation (note that K here has nothing to do with k in kNN – just overloaded notation) to check if your implementation is correct and to select hyperparameters. As discussed in class, K-fold Cross Validation divides the training set into K segments and then trains K models – leaving out one of the segments each time and training on the others. Then each trained model is evaluated on the left-out fold to estimate performance. Overall performance is estimated as the mean of these K fold performances.

I Q8 Implement 4-fold Cross Validation [8pts] Inside of your knn.py file, implement 4-fold cross validation for your kNN algorithm. Your implementation of cross validation should return the observed accuracies for each fold. We’ll use this in the next part.

I Q9 Hyperparameter Search [15pt] To search for the best hyperparameters, we will use cross-validation to estimate our accuracy on new data. For each k in 1, 3, 5, 7, 9, 99, 999, and 8000, report:
• accuracy on the training set when using the entire training set for kNN (call this training accuracy),
• the mean and variance of the 4-fold cross validation accuracies (call this validation accuracy).
What is the best number of neighbors (k) you observe? When k = 1, is training error 0%? Why or why not?
What trends (train and cross-validation accuracy rate) do you observe with increasing k? Do they relate to underfitting and overfitting?

2.3 Kaggle Submission

Great work getting here. In this section, you’ll submit the predictions of your best model to the class-wide Kaggle competition. You are free to make any modification to your kNN algorithm to improve performance; however, it must remain a kNN algorithm! You can change distances, weightings, select subsets of features, etc.

I Q10 Kaggle Submission [10pt]. Submit a set of predictions to Kaggle that outperforms the baseline on the public leaderboard. To make a valid submission, use the train set to build your kNN classifier and then apply it to the test instances in test_pub.csv available from Kaggle’s Data tab. Format your output as a two-column CSV as below:

id,income
0,0
1,0
2,0
3,0
4,0
5,1
6,0
. . .

You may submit up to 5 times a day. In your report, tell us what modifications you made to kNN for your final submission.

Extra Credit and Bragging Rights [2.5pt Extra Credit]. The TA has made a submission to the leaderboard. Any submission outperforming the TA on the private leaderboard at the end of the homework period will receive 2.5 extra credit points on this assignment. Further, the top 5 ranked submissions will “win HW1” and receive bragging rights.

3 Debriefing (required in your report)
1. Approximately how many hours did you spend on this assignment?
2. Would you rate it as easy, moderate, or difficult?
3. Did you work on it mostly alone or did you discuss the problems with others?
4. How deeply do you feel you understand the material it covers (0%–100%)?
5. Any other comments?


[SOLVED] Cs434 homework 4 decision trees and k-means clustering

In this homework, we are going to do some exercises to understand Decision Trees a bit better and then get some hands-on experience with k-means clustering.

How to Do This Assignment.
• Each question that you need to respond to is in a blue “Task Box” with its corresponding point-value listed.
• We prefer typeset solutions (LaTeX / Word) but will accept scanned written work if it is legible. If a TA can’t read your work, they can’t give you credit.
• Programming should be done in Python and numpy. If you don’t have Python installed, you can install it from here. This is also the link showing how to install numpy. You can also search through the internet for numpy tutorials if you haven’t used it before. Google and APIs are your friends!

You are NOT allowed to…
• Use machine learning packages such as sklearn.
• Use data analysis packages such as pandas or seaborn.
• Discuss low-level details or share code / solutions with other students.

Advice. Start early. There are two sections to this assignment – one involving small exercises (20% of grade) and another focused more on programming (80% of the grade). Read the whole document before deciding where to start.

How to submit. Submit a zip file to Canvas. Inside, you will need to have all your working code and hw4-report.pdf.

1 Exercises: Decision Trees and Ensembles [5pts]

To get warmed up and reinforce what we’ve learned, we’ll do some light exercises with decision trees – how to interpret a decision tree and how to learn one from data.

I Q1 Drawing Decision Tree Predictions [2pts]. Consider the following decision tree:
a) Draw the decision boundaries defined by this tree over the interval x1 ∈ [0, 30], x2 ∈ [0, 30]. Each leaf of the tree is labeled with a letter. Write this letter in the corresponding region of input space.
b) Give another decision tree that is syntactically different (i.e., has a different structure) but defines the same decision boundaries.
c) This demonstrates that the space of decision trees is syntactically redundant. How does this redundancy influence learning – i.e., does it make it easier or harder to find an accurate tree?

I Q2 Manually Learning A Decision Tree [2pts]. Consider the following training set and learn a decision tree to predict Y. Use information gain to select attributes for splits.

A B C Y
0 1 1 0
1 1 1 0
0 0 0 0
1 1 0 1
0 1 0 1
1 0 1 1

For each candidate split include the information gain in your report. Also include the final tree and your training accuracy.

Now let’s consider building an ensemble of decision trees (also known as a random forest). We’ll specifically look at how decreasing correlation can lead to further improvements in ensembling.

I Q3 Measuring Correlation in Random Forests [1pts]. We’ve provided a Python script decision.py that trains an ensemble of 15 decision trees on the Breast Cancer classification dataset we used in HW1. We are using the sklearn package for the decision tree implementation as the point of this exercise is to consider ensembling, not to implement decision trees. When run, the file displays the plot: The non-empty cells in the upper-triangle of the figure show the correlation between predictions on the test set for each of 15 decision tree models trained on the same training set. Variations in the correlation are due to randomly breaking ties when selecting split attributes. The plot also reports the average correlation (a very high 0.984 for this ensemble) and accuracy for the ensemble (majority vote) and a separately-trained single model. Even with the high correlation, the ensemble managed to improve performance marginally. As discussed in class, uncorrelated errors result in better ensembles.

Modify the code to train the following ensembles (each separately). Provide the resulting plots for each and describe what you observe.
a) Apply bagging by uniformly sampling train datapoints with replacement to train each ensemble member.
b) The sklearn API for the DecisionTreeClassifier provides many options to modify how decision trees are learned, including some of the techniques we discussed to increase randomness. When set less than the number of features in the dataset, the max_features argument will cause each split to only consider a random subset of the features. Modify line 44 to include this option at a value you decide.

2 Implementation: k-Means Clustering [20pts]

2.1 Implementing k-Means Clustering

As discussed in lecture, k-Means is a clustering algorithm that divides an input dataset into k separate groups. In this section, we’ll get an implementation running for a simple toy problem and then apply it to an unsorted image collection to discover underlying structure.

k-Means Algorithm. More formally, given an input dataset X = \{x_i\}_{i=1}^{n} (notice that we do not have any associated y_i, as we are in an unsupervised setting) and the number of clusters k, k-Means produces a set of assignments Z = \{z_i\}_{i=1}^{n} where z_i ∈ \{1, 2, …, k\} maps each point to one of the k clusters, and a set of k cluster centers C = \{c_j\}_{j=1}^{k} (also referred to as centroids). The k-Means algorithm attempts to minimize the sum-of-squared-error (SSE) between each datapoint and its assigned centroid – we can write this objective as

SSE(X, Z, C) = \sum_{i=1}^{n} \|x_i − c_{z_i}\|_2^2    (1)

where c_{z_i} is the centroid vector that point i is assigned to. k-Means tries to minimize this objective by alternating between two stages, updating either the assignments or the centroids:

1. Update Assignments. For each point x_i, compute which centroid is closest (i.e., has the minimum distance) and set z_i to the id of this nearest centroid:

z_i = \arg\min_{j=1,2,…,k} \|x_i − c_j\|_2^2    (2)

This operation is similar to finding the nearest neighbor of x_i within the set of centroids.

2. Update Centroids. For each centroid c_j, update its value as the mean of all assigned points:

c_j = \frac{1}{\sum_{i=1}^{n} I[z_i == j]} \sum_{i=1}^{n} I[z_i == j] x_i    (3)

where I[z_i == j] is the indicator function that is 1 if z_i == j and 0 otherwise.
For instance, the denominator of the leading term effectively just counts the number of points assigned to cluster j. The figure below demonstrates this process for two iterations on a simple problem. The centroids shown as large circles with black borders are initialized randomly. In the first assignment step (a), points are colored according to the nearest centroid. Then in the update step (b), the centroids are moved to the mean of their assigned points. The process repeats in iteration 2, assignments are updated (c) and then the centroids are updated (d).

Additional Implementation Details. The algorithm above does not specify how to initialize the centroids. Popular options include uniformly random samples between the min/max of the dataset, starting the centroids at random data points, and iterative furthest point sampling (a bit tricky to implement). Another issue that can arise during runtime is having “dead” clusters – clusters that have no or very few points assigned. Commonly, clusters with fewer members than some threshold are re-initialized.

The skeleton code provided in kmeans.py implements this algorithm in the kMeansClustering function shown below. You will implement the helper functions to perform each step.

def kMeansClustering(dataset, k, max_iters=10, min_size=0, visualize=False):

    # Initialize centroids
    centroids = initalizeCentroids(dataset, k)

    # Keep track of sum of squared error for plotting later
    SSE = []

    # Main loop for clustering
    for i in range(max_iters):

        # 1. Update Assignments Step
        assignments = computeAssignments(dataset, centroids)

        # 2. Update Centroids Step
        centroids, counts = updateCentroids(dataset, centroids, assignments)

        # Re-initialize any cluster with fewer than min_size points
        for c in range(k):
            if counts[c]
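A minimal sketch of what the two helper steps could look like (the function names mirror the skeleton, but this is not the provided code, and it assumes dataset is an (n, d) array and centroids a (k, d) array):

```python
import numpy as np

def computeAssignments(dataset, centroids):
    # (n, 1, d) - (1, k, d) broadcasts to (n, k, d); the norm over the
    # last axis gives an (n, k) matrix of point-to-centroid distances.
    dists = np.linalg.norm(dataset[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)  # Eq. (2): nearest centroid per point

def updateCentroids(dataset, centroids, assignments):
    k = centroids.shape[0]
    new_centroids = centroids.copy()  # dead clusters keep their old value
    counts = np.zeros(k, dtype=int)
    for j in range(k):
        members = dataset[assignments == j]
        counts[j] = len(members)
        if counts[j] > 0:
            new_centroids[j] = members.mean(axis=0)  # Eq. (3)
    return new_centroids, counts
```

Returning the per-cluster counts lets the main loop detect and re-initialize dead clusters.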


[SOLVED] Comp 5600/6600 assignment 3 – deep learning applications

Part 1 [25 points]

In this assignment you will implement two deep learning algorithms to predict the genres of a movie from a short description given in text. This will be a multi-label classification problem since a movie can belong to multiple genres at the same time.

Models:
1. Implement an RNN model to predict the genre classes of a movie given the short textual description. There are 20 classes of genres in the dataset and each sample may have multiple genres to predict. You can use a sigmoid activation function to compute the probability of each class and find multiple genres given a text description.
2. Implement an LSTM model to do the same and find accuracy, precision and recall and compare the numbers in your report.
3. The dataset contains the titles of the movies as well. Use the titles along with your descriptions and feed them to one of your models and see if you can get any boost in your performance.

Dataset: IMDB data from 2006 to 2016 (https://www.kaggle.com/datasets/PromptCloudHQ/imdb-data) has 1000 samples with the fields Title, Genre, Description, Director, Actors, Year, Runtime, Rating, Votes, Revenue, Metascore. For this assignment you will use only the Title, Genre and Description columns. To load the data you can refer to the following code snippet:

import os
import kagglehub
import pandas as pd

path = kagglehub.dataset_download("PromptCloudHQ/imdb-data")
print("Path to dataset files:", path)
files = os.listdir(path)
print("Files in dataset directory:", files)
csv_file_path = os.path.join(path, 'IMDB-Movie-Data.csv')  # Replace with the actual filename
data = pd.read_csv(csv_file_path)
print(data.columns)
print(data['Title'][1])
print(data['Genre'][1])
print(data['Description'][1])

Preprocessing: Since the text descriptions of the movies contain some noise and stopwords, you need to preprocess and clean the data, e.g. remove stopwords, remove numbers, apply lemmatization, etc.
You can refer to the following code to do some preprocessing on your data or you can write your own.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
stop_words = set(stopwords.words('english'))
print(stop_words)
lemmatizer = WordNetLemmatizer()
tokens = nltk.word_tokenize(data['Description'][1].lower())
print(tokens)
filtered_tokens = [lemmatizer.lemmatize(token) for token in tokens
                   if token not in stop_words and token.isalnum()]
print(filtered_tokens)

Since you cannot feed your text data directly into a neural network, you need to tokenize each word and then use some sort of vector embedding to represent your words. In this assignment, you will use the GloVe embedding vectors to represent each of your words. Also, the neural network will only take a fixed size of input to its model, so you need to pad your text sequences to give it a fixed size. You can use the pad_sequences function from Tensorflow.

Here is how you can download the GloVe embedding and load the embedding vectors for all your words. This might take a while to download.

!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove.6B.zip

import numpy as np

embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))

Multi-Label labelling: Since your dataset has multiple labels for each sample, you need to preprocess the labels as well. There are 20 classes of genres and you need to create a vector of size 20 for each sample such that only the corresponding indices of its genres are 1 and all other values are 0.
You can follow the following code to preprocess your labels:

from sklearn.preprocessing import MultiLabelBinarizer

genres = data['Genre'].apply(lambda x: x.split(','))
print(genres)
mlb = MultiLabelBinarizer()
genres_encoded = mlb.fit_transform(genres)
print(genres_encoded[0])
print(mlb.classes_)

Once you have done all the preprocessing for your data you can load the data using a custom dataloader from keras. Use the first 700 samples for training, 100 samples for validation and the last 200 to test your model. Train your networks for 20 epochs. Plot the graphs for training and validation loss, accuracy, etc.

Deliverables: One .ipynb file with code for data preprocessing and the implementation of the two models (RNN and LSTM). Plot the graphs for loss and accuracy for training and validation. Report the numbers for the test set for both models. Add titles to your text descriptions. Report the numbers for one of the models and compare with the previous model. Comment on how it helps or does not help.

Part 2 [25 points]

In this part you will implement a convolutional neural network to classify natural images into 6 classes. This dataset is called the Intel Image Classification dataset, which has 6 classes. The dataset has 14,000 training images, 3,000 validation images, and 7,000 pred_seg images. You need to train your model using the 14,000 training images and test it on the 3,000 validation images. You do not need to do anything with the 7,000 pred_seg images. The images of the dataset belong to classes such as streets, buildings, woods, mountains, seas, and glaciers. The data is organized into subfolders with class names for both training and validation (test) sets. When you load the images into a dataloader, you need to label the images according to your folder names. You can load the images using the opencv library.
This is how you can download the data:

import kagglehub
import os

path = kagglehub.dataset_download("puneet6060/intel-image-classification")
print("Path to dataset files:", path)

You can also download the images from the following link, unzip it and then upload to your google drive: https://www.kaggle.com/datasets/puneet6060/intel-image-classification?resource=download

Models:
1. Implement one CNN model with 3 Conv layers and 3 max_pooling layers. Add a dropout layer and a dense layer before the output layer.
2. Implement a model with 6 Conv layers and 3 pooling layers. Add a dropout layer and a dense layer before the output layer.
3. Play around with different learning rates, batch sizes and optimizers.

Deliverables: Include your code and report. Document your findings in a table and write a summary. Report your numbers in terms of total accuracy and per-class accuracy on the test set (3000 images). Visualize your predictions from the test set with the label for 2 samples.

**Submit your .ipynb file with output cells. Please do not submit without running the code or with empty output cells.**

Use the free GPUs in google colab, otherwise it will take a long time to train your models. Click on Runtime, change runtime type, then select the available GPUs.


[SOLVED] Comp 5600/6600 assignment 2 – basic machine learning

In this assignment, you will be implementing and analyzing three different machine learning algorithms. Each algorithm will focus on a different problem scenario that can be solved by different learning methods. You must implement all algorithms from scratch using Python, and you cannot use external machine learning libraries such as Scikit-learn for the algorithms themselves. You can use other libraries for data loading and pre-processing as necessary.

[Problem 1] [25 Points]

You are tasked with developing models to predict customer churn for a subscription-based service. Using the provided dataset, your goal is to build two classification models: one using Logistic Regression and the other using Naive Bayes. You will compare their performance, interpret the results, and provide insights into customer churn based on your findings. You will use the provided Telco Customer Churn dataset, which contains customer information such as demographic details, account features, and whether the customer has churned. Your target variable is “Churn,” indicating whether a customer has left the service. Ensure you follow the below instructions:
• Evaluate both models using the following metrics: Accuracy, Precision, Recall, F1-Score, and ROC-AUC.
• Perform 5-fold cross-validation on both models and report the averaged results.
• If there are any missing values (there will be!), fill them in during a preprocessing step using two of the three common strategies outlined below. Do this for the entire dataset!
  o Use the most common value in the dataset that has a value for this feature/attribute.
  o Use a default value to fill in for missing values. It can be anything.
  o Drop that feature all together and use only features that have values for all data points.
• Scale or normalize numerical features if required.
• Ensure that your IPython notebook has text cells with the following details:
  o Discuss your outcomes from using your chosen preprocessing steps to handle missing data.
  o Compare the performance of both models and discuss their strengths and weaknesses. Which model is more suited for this dataset and why?
  o Insights gained from your experiments.

[Problem 2] [25 Points]

In this question, you will be using k-means to perform image compression. Implement a naïve version of the k-means algorithm based on your understanding. Your code must take the number of clusters k as input and perform k-means clustering on the given image (test_image.png). Once the algorithm finishes running, the cluster centroids represent the top-k common colors in the image. Iterate through each pixel in the image and assign the closest color to each pixel. Save and visualize the resulting image.

For reading and writing images, you can use OpenCV, which is an open-source computer vision toolkit. The following code will load the image into a NumPy array. You can use this as input to your K-Means algorithm.

import cv2
import numpy as np

img = cv2.imread('test_image.png')
height, width, channels = np.shape(img)
for i in range(width):
    for j in range(height):
        pixel = img[j][i]     # Read the pixel at location (i,j)
        img[j][i] = newValue  # Assign a new value to the pixel

Experiment with different values of k and briefly describe your thoughts about which value works best for this problem. You can use plots, error bars, etc. to support your conclusions.

Deliverables: A single IPython Notebook that contains your code and report. You can use the text cells to write your report and embed any plots, illustrations, and/or images that you need to support your claims.
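For large images, the per-pixel assignment loop can be slow; one possible vectorized sketch (assuming img is an (H, W, 3) array and centroids is a (k, 3) array of colors — illustrative only, not the required solution):

```python
import numpy as np

def recolor(img, centroids):
    # Flatten to one row per pixel, then compute distances from every
    # pixel to every centroid color via broadcasting: (H*W, k).
    pixels = img.reshape(-1, 3).astype(float)
    dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)            # nearest centroid per pixel
    return centroids[nearest].reshape(img.shape)  # replace colors, restore shape
```

The returned array can then be cast back to uint8 and saved with cv2.imwrite.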


[SOLVED] Comp 5600/6600 assignment 1 – using search for robot path planning

You are designing a robot whose goal is to move from one location to another, just as we discussed in Lecture 6. You want to evaluate the ability of informed and uninformed searches to find the optimal path (shortest) to get from the start to the goal state. To help you, I have converted the map of the building into a state-space graph. Your goal is to implement 2 uninformed search algorithms (BFS and DFS) and 1 informed search (the A* algorithm) to search for an optimal path from the start node to the end node. You will be given 3 test cases on which to evaluate your implementations.

Deliverables: You will need to have your own implementation of each of the three algorithms as well as a short report that discusses your design choices, your implementation strategy, and a comparison of the performance of the three algorithms on each of the test cases. You can compare them using any metric(s) of your choice, such as success rate, time taken to find a solution, and number of steps taken to find a solution. You should briefly describe your findings and provide your insights on which is suitable for this problem. For the informed search (A* algorithm) you must devise and evaluate at least 2 heuristics of your choice. You are free to choose any function as your heuristic. In your report, you should describe why you chose it, whether it is an admissible heuristic, and whether it helped the A* algorithm perform better than uninformed search.

Here are some more details about your assignment.
• Input: 3 test cases, with varying levels of complexity, are provided. Each test case consists of 2 files. The first, labeled “TestCase_XX_EdgeList.txt”, is a text file where each line corresponds to an edge list entry of the form “n1 n2 w”, which indicates an edge between nodes n1 and n2 with a weight of w. The second file, labeled “TestCase_XX_NodeID.csv”, is a CSV file with each line of the form “n1,x,y”, where x and y are the coordinates of the state n1.
• Expected Output: Your program should print out a list of states visited by your algorithm, from the start state (indicated by the first line of the NodeID file) to the goal state (the last line of the NodeID file).
• Submission Format: Your code must be an IPython Notebook. You can have text blocks to write your report and the code blocks for your implementation. You can use Google Colab for implementing your code since your code will be evaluated on Colab so that everyone’s code is evaluated on a standard platform. You can download your file for submission by going to “File” -> “Download” -> “Download .ipynb”.

To verify your implementation, the expected outputs of BFS and DFS are given below. Since heuristic functions can vary and hence result in different solutions, the output for A* is not provided here.

Case 1:
—————
BFS: [‘N_0’, ‘N_1’, ‘N_6’, ‘N_2’, ‘N_5’, ‘N_7’, ‘N_3’, ‘N_10’, ‘N_12’, ‘N_15’, ‘N_11’, ‘N_13’, ‘N_17’, ‘N_20’, ‘N_16’, ‘N_8’, ‘N_14’, ‘N_18’, ‘N_22’, ‘N_21’, ‘N_9’, ‘N_19’, ‘N_23’, ‘N_4’, ‘N_24’]
DFS: [‘N_0’, ‘N_1’, ‘N_2’, ‘N_3’, ‘N_6’, ‘N_7’, ‘N_12’, ‘N_17’, ‘N_22’, ‘N_23’, ‘N_13’, ‘N_18’, ‘N_19’, ‘N_24’]
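As one possible candidate heuristic (an illustration, not the required answer), the straight-line Euclidean distance to the goal can be computed from the x, y coordinates in the NodeID file. It is admissible whenever every edge weight is at least the Euclidean distance between its endpoints. The coords dictionary below is made up purely for illustration:

```python
import math

# coords maps node name -> (x, y), as parsed from the NodeID file.
def euclidean_heuristic(coords, node, goal):
    (x1, y1), (x2, y2) = coords[node], coords[goal]
    return math.hypot(x2 - x1, y2 - y1)  # straight-line distance

# Made-up coordinates for two nodes:
coords = {"N_0": (0.0, 0.0), "N_24": (3.0, 4.0)}
print(euclidean_heuristic(coords, "N_0", "N_24"))  # 5.0
```

A second, cheaper heuristic could follow the same signature (e.g. Manhattan distance), which makes comparing the two in the report straightforward.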


[SOLVED] Cs 5600/6600 assignment 0 – python fundamentals

The goal of this assignment is to provide a refresher on Python programming and the associated libraries that we will be using in this course, such as NumPy and Matplotlib. There will be three parts (each worth 5 points) to this assignment, for a total of 15 points. Please submit your code as an IPython notebook on Canvas. Please use Google Colab to create and test your code as it will be graded on Colab to ensure a common, fair platform for everybody. Do not submit your output files. We will generate them when we run your code.

Problem 1

In this part of the assignment, you will be working with NumPy arrays. Create a 1D NumPy array of shape [1×5] with random values drawn from a uniform distribution. Perform the following operations:
• Compute the mean and standard deviation of the array.
• Reshape the array into a 2D array with 5 rows and 1 column.
• Add 5 to each element in the array and print the result.
• Compute the dot product of this reshaped array with itself.
After each step, print the resulting value.

Problem 2

In this part of the assignment, you will be working with visualizing plots with Matplotlib. Generate a set of x values ranging from 0 to 100 with an increment of 0.1 using NumPy. Compute the corresponding y values using the function y = sin(x). Plot the sine wave using Matplotlib, and add appropriate labels for the x-axis, y-axis, and a title for the plot. Save the plot as a PNG file named “sine_wave.png”.

Problem 3

In this part of the assignment, you will be integrating NumPy and Matplotlib to analyze and visualize data. Perform the following steps:
• Create two NumPy arrays: one for x values ranging from 0 to 100 with an increment of 1, and another for y values that represent the quadratic function y = 0.5x^2 + 2x + 1.
• Plot the quadratic function using Matplotlib with appropriate labels and a legend.
• Add gridlines to the plot and display it with a line style of your choice.
• Save the plot as a PDF file named quadratic_function.pdf.
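The Problem 1 steps can be sketched as follows (a sketch, not the required solution; note that the dot product of a (5, 1) array with itself is only defined after transposing one operand):

```python
import numpy as np

# Problem 1 steps (random values, so printed outputs will vary).
arr = np.random.uniform(size=5)  # 1D array of 5 uniform random values
print(arr.mean(), arr.std())     # mean and standard deviation

arr2d = arr.reshape(5, 1)        # reshape to 5 rows, 1 column
print(arr2d + 5)                 # add 5 to each element

# (5, 1) @ (5, 1) is not defined; transposing the first operand
# gives a well-defined (1, 1) inner product.
print(arr2d.T @ arr2d)
```
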


[SOLVED] Comp 3700: project 4– dragon game part 2

Goals:
• To design a use case diagram to capture the requirements of project 4.
• To use the argoUML tool to create a use case diagram and specify use cases.

1. Overview

Write a few important use cases. Remember, these use cases describe how the user interacts with the text-based game (what they do, what the system does in response, etc.). Your use cases should have enough basic details such that someone unfamiliar with the system can understand what is happening in the text-based game. They should not include internal technical details that users are not (and should not be) aware of. Make sure that any special rules/features you plan to add are clearly described in your analysis section.

2.1. Create a Use Case Diagram using argoUML

In this project, you must use ArgoUML to draw a use case diagram. When you create a new project, it has a use case diagram created by default, named use case diagram 1.

2.2. Create Use Case Specification in argoUML

You must also use argoUML to document the behavior of each use case in your use case diagram. The specification of a use case should be described in the Documentation tab of the use case. The specification of each use case should contain the following items:
• Name. The name of the use case to which this relates.
• Goal. A one- or two-line summary of what this use case achieves for its actors.
• Actors. The actors involved in this use case, and any context regarding their involvement. Note: This should not be a description of the actor. That should be associated with the actor on the use case diagram.
• Pre-condition. These would be better named “pre-assumptions”, but the term used everywhere is pre-conditions. This is a statement of any simplifying assumptions we can make at the start of the use case.
• Basic Flow. The linear sequence of steps that describe the behavior of the use case in the “normal” scenario. Where a use case has a number of scenarios that could be normal, one is arbitrarily selected.
• Alternate Flows.
A series of linear sequences describing each of the alternative behaviors to the basic flow.
• Post-conditions. These would be better named "post-assumptions". This is a statement of any assumptions that we can make at the end of the use case.
• Requirements. In an ideal world, all of the vision document, use case diagrams, use case specifications, and supplementary requirements specification would form the requirements for a project.

2. Grading Criteria

2.1 (40 points) Use case diagram
1. (10 points) Actors
2. (10 points) Use cases in the diagram
3. (10 points) Relations among actors and use cases
4. (10 points) Relations among use cases

2.2 (50 points) Use case specification
1. (10 points) Name, goal, and actors in each use case
2. (20 points) Pre-conditions/post-conditions in each use case
3. (20 points) Basic Flows/Alternate Flows in each use case

2.3 (10 points) Submission
Please submit your project analysis through the Canvas system (e-mail submission will not be accepted). You just need to submit your analysis document as an ArgoUML compressed project file (*.zargo). The file name should be formatted as: Project4-firstName.zargo
Note: other formats (e.g., pdf, doc, txt) will not be accepted.

3. No Late Submission
• Late submissions will not be accepted and will receive a ZERO unless you have a valid excuse, in which case you should talk to Dr. Li to explain your situation.
• The GTA/Instructor will NOT accept any late submission caused by Internet latency.

4. Rebuttal Period
• You will be given a period of two business days to read and respond to the comments and grades on your homework or project assignment. The TA may use this opportunity to address any concerns and questions you have. The TA may also ask for additional information from you regarding your homework or project.

5. Sample Usage

What’s your name? Bob
===========================================================
| Welcome, Bob! |
===========================================================
1) Start a New Game of Dunstan and Dragons!
2) View top 10 High Scores
3) Quit
Please choose an option: 2

The top 5 High Scores are:
Win 1337
CaseyZZZ 625
JonnieKill 400
Bob 75
Daisy 33
-no more scores to show-

1) Start a New Game of Dunstan and Dragons!
2) View top 10 High Scores
3) Quit
Please choose an option: 1

Entering the Dungeon…
You have:
intelligence: 20
time: 25
money: $11.00
You are 20 steps from the goal. Time left: 25.
1) Move forward (takes time, could be risky…)
2) Read technical papers (boost intelligence, takes time)
3) Search for loose change (boost money, takes time)
4) View character
5) Quit the game
Please choose an action: 4

You have:
intelligence: 20
time: 25
money: $11.00
You are 20 steps from the goal. Time left: 25.
1) Move forward (takes time, could be risky…)
2) Read technical papers (boost intelligence, takes time)
3) Search for loose change (boost money, takes time)
4) View character
5) Quit the game
Please choose an action: 2

You read through some technical papers. You gain 3 intelligence, but lose 2 units of time.
You are 20 steps from the goal. Time left: 23.
1) Move forward (takes time, could be risky…)
2) Read technical papers (boost intelligence, takes time)
3) Search for loose change (boost money, takes time)
4) View character
5) Quit the game
Please choose an action: 1

You move forward one step, and…
NOTHING HAPPENS! You spent one unit of time.
You are 19 steps from the goal. Time left: 22.
1) Move forward (takes time, could be risky…)
2) Read technical papers (boost intelligence, takes time)
3) Search for loose change (boost money, takes time)
4) View character
5) Quit the game
Please choose an action: 1

You move forward one step, and…
YOU FIND SOME PAPERS TO GRADE. You spent two units of time, but gained $3.00!
You are 18 steps from the goal. Time left: 20.
You can move forward or backward.
1) Move forward (takes time, could be risky…)
2) Read technical papers (boost intelligence, takes time)
3) Search for loose change (boost money, takes time)
4) View character
5) Quit the game
Please choose an action: 1

You move forward one step, and…
PUZZLE: It’s a riddling imp. I hate riddling imps. But fine, he asks:
“Find the product of 8 and 8!”
1) 16
2) 64
3) 256
4) Uh…uh… no?
Choose wisely: 4

The imp cackles “Oh yes. Yes indeed. Now you die.”
TIME HAS FALLEN TO ZERO. YOU DIE.
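When writing your use case specifications, it helps to notice that the sample run above is a simple menu-driven state loop: the character carries four stats (intelligence, time, money, steps to the goal), each menu action mutates those stats, and the game ends when time runs out or the goal is reached. The following Python sketch illustrates that structure only — the class and function names are hypothetical, and the actual event selection, probabilities, and scoring rules are whatever your Project 4 implementation defines:

```python
from dataclasses import dataclass

@dataclass
class Character:
    # Starting stats taken from the sample run above.
    intelligence: int = 20
    time: int = 25
    money: float = 11.00
    steps_to_goal: int = 20

def read_papers(c: Character) -> str:
    # In the sample run, reading papers gives +3 intelligence and costs 2 time.
    c.intelligence += 3
    c.time -= 2
    return "You gain 3 intelligence, but lose 2 units of time."

def move_forward(c: Character, event: str = "nothing") -> str:
    # The real game would pick the event at random ("could be risky...");
    # it is an explicit parameter here to keep the sketch deterministic.
    c.steps_to_goal -= 1
    if event == "grading":
        c.time -= 2
        c.money += 3.00
        return "YOU FIND SOME PAPERS TO GRADE."
    c.time -= 1
    return "NOTHING HAPPENS! You spent one unit of time."

def game_over(c: Character) -> bool:
    # Lose when time falls to zero; win when the goal is reached.
    return c.time <= 0 or c.steps_to_goal <= 0
```

In a use case specification, the Basic Flow would narrate the prompt/choose/respond cycle around actions like these, while the Alternate Flows would cover the risky movement events and the puzzle encounter.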
