Objectives
● Implement an Account class.
● Implement a Bank class.
● Use loops.
● Use arrays.
● Write one of the classes from scratch (no template is given, unlike in previous labs).

Duration: one week.

Grading Scheme:
50% submitted source code
25% in-class demonstration and questions (during week 4 lab hours)
25% in-class quiz, held during the first 5 minutes of the lab class (during week 6 lab hours)

Overview
A classic application of object-oriented techniques is a bank account. In this lab, you will implement a simplified model of a bank account and of a bank itself. A bank Account will track the account number, the owner's name and the current balance. The only operations allowed (apart from trivial getters) will be withdrawing and depositing money. A user will not be allowed to make a deposit or withdrawal of a negative or zero value. Furthermore, they will not be allowed to withdraw more money than they have in the account. An account, of course, must belong to a specific bank; hence you will also implement the Bank class. A Bank object will keep track of all of the accounts it has; it will also disallow two accounts with the same account number.

The Account Class
No template is given for the Account class; you will have to create it from scratch following the design instructions given here. The Account class will consist of instance variables, a constructor and the following methods:
● String getName()
● double getBalance()
● int getNumber()
● boolean deposit(double amount)
● boolean withdraw(double amount)
In addition, you are provided with a toString() method below that you must not modify.

Account class toString() code

@Override
public String toString() { //DO NOT MODIFY
    return "(" + getName() + ", " + getNumber() + ", "
            + String.format("$%.2f", getBalance()) + ")";
}

The Constructor
The constructor's signature is:

public Account(String name, int number, double initialBalance)

In typical use, it could create an account for "Alice Jones" with an account number of 1234 and an initial balance of $100.00 with:

Account alice = new Account("Alice Jones", 1234, 100.0);

When you implement the constructor, you must also think about the instance variables you will need. (Hint: recall that a constructor often simply copies its arguments into instance variables. Also, if you have declared your instance variables, a Netbeans tool can be used to generate the code for the constructor automatically.)

The Getters
You should have little trouble implementing the getter methods (getName() returns the name of the account owner, getNumber() returns the account number and getBalance() returns the current balance). (Hint: the getters can be generated automatically in Netbeans as one of the Refactoring choices.)

The withdraw() and deposit() methods
Note that these methods return a boolean: true or false. If successful, they return true. However, an attempt to withdraw can fail if the balance is not big enough or if an attempt is made to withdraw zero or a negative value. A deposit can fail if the amount is negative or zero. Of course, a successful deposit or withdrawal should update the account balance.

The Bank Class
You should not attempt to implement the Bank class until you have an Account working properly. The Bank implementation will require that you use arrays and loops for the first time. Unlike the Account class, you are given a template for this class (Bank.java) so you don't have to start it from scratch.
And, to make your life a little easier and ensure that you start off the right way, the instance variables and the constructor have been given to you as well. (You should not modify the furnished constructor and you do not need any additional instance variables.)

Source Code
Bank.java, MainAccount.java and MainBank.java classes are provided with the lab. More details about these classes are in the instructions below.

Step 1: Create a Netbeans Project and Account class
1. Create a Netbeans project called BankAccounts.
2. Create a Java file (class library type) called Account and set the package to coe318.lab3. Note: you are not provided with a template this time; you have to write it from scratch. Do not modify the automatically generated statement: package coe318.lab3;
3. Determine your instance variables and implement the constructor.
4. Implement the other methods.
5. You are provided with a toString() method for the Account class (above) which you should copy and paste into your class.

Step 2: Test Account with MainAccount
1. Create a Java file (class library type) called MainAccount with package coe318.lab3 and copy and paste the provided source code from MainAccount.java.
2. You can run this main method either by invoking Run File or by changing the Netbeans configuration to specify the class containing the main method.
3. You should not continue until at least most of it works.

Correct output from MainAccount

(Bob, 789, $0.00)
(Alice, 123, $100.00)
(Alice, 123, $85.00)
(Alice, 123, $85.00)
(Alice, 123, $85.00)
(Alice, 123, $35.00)
(Bob, 789, $300.00)
(Bob, 789, $200.00)

Step 3: Create the Bank Class
1. Create a Java file called Bank and copy/paste the provided template code.
2. Create a Java file MainBank which will be used to test your code. Copy the provided MainBank.java code into this file. (Reset the main method in the Netbeans configuration to make this your main class.)
3. Fix the methods until it works.

Correct output from MainBank

Toronto Dominion: 0 of 3 accounts open
Toronto Dominion: 1 of 3 accounts open
(Charles, 234, $200.00)
td has account # 456: true
Toronto Dominion: 2 of 3 accounts open
(Charles, 234, $200.00)
(Dora, 456, $300.00)
Bank of Montreal: 1 of 5 accounts open
(Edward, 456, $400.00)

Step 4: Submit your lab
Please zip up your NetBeans project containing all source files and submit to the respective assignment folder on D2L.
Objectives
● Implement a ComplexNumber class.
● Learn how immutable objects work.
● Create a project with more than one class.

Overview
In mathematics, complex numbers combine two real numbers and can be thought of as specifying a point on a plane. The most common ways to express the two components of a complex number are:
● rectangular: where the two numbers represent the x- and y-components of the point;
● polar: where one number represents the distance between the origin and the point and the other represents the angle between the x-axis and the line connecting the origin and the point.
In this lab, we only consider the rectangular version.

The Design of ComplexNumber
The design of a class means specifying all of its public members. Usually, only some methods and constructors are public (and hence designed); it is highly unusual for instance variables to be public. In Java, a design is expressed in javadocs: specially formatted comments in the source code that describe the API for the class. You are given the design for the ComplexNumber class below. The design specifies methods such as getX(), add(ComplexNumber z), etc. The source code provided consists mainly of method stubs: methods that compile but produce dummy results. A notable exception is the method toString(); this method, which gives a String representation of a complex number, does work and should not be modified. (This method is implicitly invoked in the testing class that produces output.) The main objective of the lab requires that you fix the method stubs so that the program works.

Source Code
The skeleton code is provided with the lab handout in the file named ComplexNumber.java. Copy the code and paste it into your own ComplexNumber class. You are also provided with another class, ComplexTry.java, which includes a main method that you can use to test your implementation of ComplexNumber.java.

Step 1: Create a Netbeans Project
1. Create a Netbeans project called Lab2ComplexNumber.
2. Create a Java file (class library type) called ComplexNumber specifying the package as coe318.lab1 and copy and paste the provided source code.
3. Similarly, create the Java file ComplexTry. Ensure that Netbeans sets the package to coe318.lab1.
4. Generate the javadocs and compile and run the project.
5. It should compile correctly and produce output. Unfortunately, the output is incorrect and you have to fix it.

Step 2: Add instance variables and fix constructor and getters
1. Add instance variables for the two components of a complex number.
2. Modify the constructor so that they are properly initialized.
3. Fix the getReal() and getImaginary() methods so that they return the appropriate component.
4. Compile and run your project. The statement System.out.println("a = " + a) should now produce the correct output.

Step 3: Fix remaining methods
1. Fix the remaining methods.
2. Suggestion: fix them in the order they are used in ComplexTry.
3. Hint: subtract() and divide() are easier to write by using previously fixed methods.

Step 4: Submit your lab
Please zip up your NetBeans project containing all source files and submit to the respective assignment folder on D2L.
Objectives
● Implement a Counter class.
● Learn how objects can be linked together.
● Use an "if" statement.

Duration: one week.

Overview
In mathematics, a number is expressed in positional notation to a certain base, B. For example, the 3-digit number 123 in base 4 represents 16+8+3=27 (base 10). In this lab, each digit is represented as a Counter object. A Counter object has an optional left neighbour, which is also a Counter object. (The absence of a left neighbour is indicated with the keyword null.) The important methods to implement are getCount() and increment(). If there is no left neighbour, the count is the same as the digit. If there is a left neighbour, the count is the sum of the digit and the modulus times the count of the left neighbour. The increment() method increments the Counter's digit and, if it reaches its maximum (modulus) value, it is reset to zero. Furthermore, if there is a left neighbour and if the Counter has rolled over, its left neighbour should be incremented as well.

Source Code
Similar to the previous labs, Counter.java and CounterTry.java are provided with the handout.

Step 1: Create a Netbeans Project
1. Create a Netbeans project called Counter which should be placed in a folder called lab2 (all lowercase and no spaces). The lab2 folder should itself be in your 1coe318 folder.
2. Create a Java file (class library type) called Counter; set the package to coe318.lab2; then copy and paste the provided source code.
3. Similarly, create the Java file CounterTry. (Ensure you use the same coe318.lab2 package name.)
4. Generate the javadocs and compile and run the project.
5. It should compile correctly and produce output. Unfortunately, the output is incorrect and you have to fix it.

Step 2: Add instance variables and fix constructor and getters
1. Add instance variables for the two components of a counter.
2. Modify the constructor so that they are properly initialized.
3. Fix the remaining methods so that they work for a simple counter without a left neighbour.

Step 3: Fix remaining methods
1. Fix the remaining methods.

Step 4: Submit your lab
Please zip up your NetBeans project containing all source files and submit to the respective assignment folder on D2L.
Objectives
● Implement a Node class.
● Implement a Circuit class.
● Implement a Resistor class.
● Do a tutorial on debugging.

Overview
In this lab, you will model an electric circuit composed of an arbitrary number of resistors. Each of the two ends of a resistor will be connected to a Node. Each Resistor will be added to a Circuit at the time the resistor is created (i.e. within the constructor).

Introduction to IllegalArgumentException()
What can a programmer do when parameters passed to a constructor make no sense? For example, in the Lab 3 Counter class, a modulus of anything less than 2 would be senseless. Construction of such "senseless" objects can be aborted in the constructor by throwing an Exception. We will discuss Exceptions in much greater detail later in the course. For now, all you need to know is that constructors should use if-statements to detect illegal parameters and, if detected, a new IllegalArgumentException() should be thrown. This general technique is illustrated below where (for reasons that don't matter) a constructor of an E object must be passed an integer that cannot be negative and a String that cannot be null.

public class E {
    public E(int i, String s) {
        if (i
The goal of this homework is to introduce you to BERT, an LLM (Large Language Model) from Google. To get an overall sense of where BERT stands in relation to the other LLMs, like GPT-2 and GPT-3, that you have surely heard about, please read through at least the Preamble of your instructor's Week 15 slides: https://engineering.purdue.edu/DeepLearn/pdf-kak/LLMLearning.pdf

This assignment calls upon you to fine-tune a pre-trained BERT model. The focus will be on the downstream task of question answering (Q&A). To accomplish this, we will utilize the Hugging Face transformers framework. Hugging Face plays a pivotal role in the open-source machine learning/artificial intelligence ecosystem, providing developers with libraries that facilitate seamless interaction with pretrained models. The following sections will help you set up the environment and the usage of the transformers framework.

2 Getting Ready for This Homework
Before embarking on this homework, do the following:
1. Install the Hugging Face transformers library. You may use the following command:
pip install transformers
If you want to install from source or any specific variation, you are welcome to refer further to the official documentation at: https://huggingface.co/docs/transformers/en/installation
2. You will also need to install the datasets library:
pip install datasets
For further installation options, please refer to the official documentation at: https://pypi.org/project/datasets/
3. Download the pickle files provided on Brightspace which contain the train and test datasets.
4. Review slides 9-26 from the Week 15 lecture for an overview of the BERT architecture and the input to BERT.

2.1 Dataset
The data provided to you (the pickle files) is a subset of the Stanford Question Answering Dataset (SQuAD) [1]. SQuAD is a reading comprehension dataset, consisting of crowd-sourced questions on a set of Wikipedia articles. The answer to every question could be a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD v1.1 contains 100,000+ question-answer pairs on 500+ articles. However, we extracted only 10,000 samples and split them in a 70:20:10 ratio into train, evaluation and test datasets for the fine-tuning task. At this point, the reader may ask: why only a small portion of the data? The point of fine-tuning is to avoid the need for an extensive dataset to perform a specific downstream objective. For this homework, you need to download the dataset provided on Brightspace. The zip file contains 6 pickle files, two pickle files corresponding to each split: train, test and eval. The following snippet shows an example of how to load the pickle dictionaries:

import pickle

with open('train_dict.pkl', 'rb') as f:
    train_dict = pickle.load(f)
with open('test_dict.pkl', 'rb') as f:
    test_dict = pickle.load(f)
with open('eval_dict.pkl', 'rb') as f:
    eval_dict = pickle.load(f)

with open('train_data_processed.pkl', 'rb') as f:
    train_processed = pickle.load(f)
with open('test_data_processed.pkl', 'rb') as f:
    test_processed = pickle.load(f)
with open('eval_data_processed.pkl', 'rb') as f:
    eval_processed = pickle.load(f)

print(train_dict.keys())
print(test_dict.keys())
print(eval_dict.keys())

print(train_processed.keys())
print(test_processed.keys())
print(eval_processed.keys())

# dict_keys(['id', 'title', 'context', 'question', 'answers'])
# dict_keys(['id', 'title', 'context', 'question', 'answers'])
# dict_keys(['id', 'title', 'context', 'question', 'answers'])
# dict_keys(['input_ids', 'attention_mask', 'start_positions', 'end_positions'])
# dict_keys(['input_ids', 'attention_mask', 'start_positions', 'end_positions'])
# dict_keys(['input_ids', 'attention_mask', 'start_positions', 'end_positions'])
The *_dict.pkl files contain the raw data with context, question and answers, while the *_processed.pkl files contain the tokenized inputs (subword token IDs and attention masks) produced by the BERT tokenizer. Also, in the preprocessing steps, I have performed offset mapping to gather the start and end positions.

What is offset mapping? You may have noticed that word positions change, or that words get split, when you convert text to tokens. These tokens are used by BERT to understand the start and end of the sentence. However, when dealing with tasks such as Q&A or Named Entity Recognition, we need to know the position of the tokens in the original text. This is where offset mapping comes into play. Offset mapping is a list of tuples that map tokenized words to their character positions in the original text. For example, if the original text is "Hello, world!" and the tokenized text is ["Hello", ",", "world", "!"], then the offset mapping would be [(0, 5), (5, 6), (7, 12), (12, 13)]. This means that the word "Hello" starts at position 0 and ends at position 5 in the original text, and so on. This is useful when we want to extract the answer from the original text using the start and end token positions predicted by BERT. The 'start_positions' and 'end_positions' keys denote the start and end token positions of the answer in the tokenized text.
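If you want to see offset mapping in action yourself, the fast BERT tokenizer can return it directly. The short sketch below is only an illustration and is not needed to complete the homework, since the start and end positions have already been computed for you in the *_processed.pkl files.

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# encode a toy sentence and ask for the character offsets of each token
encoding = tokenizer("Hello, world!", return_offsets_mapping=True)

for token_id, (start, end) in zip(encoding['input_ids'], encoding['offset_mapping']):
    # special tokens such as [CLS] and [SEP] are mapped to the empty span (0, 0)
    print(tokenizer.convert_ids_to_tokens(token_id), (start, end))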
3.1 BERT for Q&A
This homework is an introduction to how to fine-tune a BERT model for downstream tasks, here, specifically, Question Answering. Fine-tuning a BERT model for any task involves the addition of extra layers to accommodate the final task of text classification, generation or question answering. Some methods freeze the backbone BERT entirely, partially, or not at all. The following are the steps to fine-tune your own BERT model for the Question Answering task:

1. First, initialize a model. We use BertForQuestionAnswering to initialise the model:

from transformers import BertForQuestionAnswering

model_name = 'bert-base-uncased'
model = BertForQuestionAnswering.from_pretrained(model_name)

print(model._modules)

# OrderedDict([('bert',
#   BertModel(
#     (embeddings): BertEmbeddings(
#       (word_embeddings): Embedding(30522, 768, padding_idx=0)
#       (position_embeddings): Embedding(512, 768)
#       (token_type_embeddings): Embedding(2, 768)
#       (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
#       (dropout): Dropout(p=0.1, inplace=False)
#     )
#     (encoder): BertEncoder(
#       (layer): ModuleList(
#         (0-11): 12 x BertLayer(
#           (attention): BertAttention(
#             (self): BertSelfAttention(
#               (query): Linear(in_features=768, out_features=768, bias=True)
#               (key): Linear(in_features=768, out_features=768, bias=True)
#               (value): Linear(in_features=768, out_features=768, bias=True)
#               (dropout): Dropout(p=0.1, inplace=False)
#             )
#             (output): BertSelfOutput(
#               (dense): Linear(in_features=768, out_features=768, bias=True)
#               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
#               (dropout): Dropout(p=0.1, inplace=False)
#             )
#           )
#           (intermediate): BertIntermediate(
#             (dense): Linear(in_features=768, out_features=3072, bias=True)
#             (intermediate_act_fn): GELUActivation()
#           )
#           (output): BertOutput(
#             (dense): Linear(in_features=3072, out_features=768, bias=True)
#             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
#             (dropout): Dropout(p=0.1, inplace=False)
#           )
#         )
#       )
#     )
#   )),
#  ('qa_outputs',
#   Linear(in_features=768, out_features=2, bias=True))])

2. Now, the next step is to set the training arguments for the fine-tuning task:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    use_mps_device=True,
    num_train_epochs=3,              # total number of training epochs, change this as you need
    per_device_train_batch_size=8,   # batch size per device during training, change this as you need
    per_device_eval_batch_size=8,    # batch size for evaluation, change this as you need
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

3. Finally, to train, we use the Trainer class from Hugging Face:

from transformers import Trainer
from datasets import Dataset
import pandas as pd

train_dataset = Dataset.from_pandas(pd.DataFrame(train_processed))
eval_dataset = Dataset.from_pandas(pd.DataFrame(eval_processed))
test_dataset = Dataset.from_pandas(pd.DataFrame(test_processed))

trainer = Trainer(
    model=model,                  # the instantiated Transformers model to be fine-tuned
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=eval_dataset     # evaluation dataset
)

trainer.train()

# {'loss': 2.3874, 'learning_rate': 3.3333333333333335e-05, 'epoch': 1.0}
# {'loss': 0.926, 'learning_rate': 1.6666666666666667e-05, 'epoch': 2.0}
# {'loss': 0.4081, 'learning_rate': 0.0, 'epoch': 3.0}
# {'train_runtime': 1347.7488, 'train_samples_per_second': 8.904, 'train_steps_per_second': 1.113, 'train_loss': 1.2405101318359375, 'epoch': 3.0}

In your report, dedicate a block showing the train output, as above, for the first 5 epochs (set num_train_epochs accordingly).
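Fine-tuning can take a while, so you may want to save the fine-tuned weights before moving on to the testing step below. This is optional and not required by the handout; the directory name used here is just an example. A minimal sketch, assuming you still have the trainer object from the previous step:

# save the fine-tuned model so you can reload it later without retraining
trainer.save_model('./results/bert-qa-finetuned')    # writes the config and weights to disk

from transformers import BertForQuestionAnswering
model = BertForQuestionAnswering.from_pretrained('./results/bert-qa-finetuned')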
4. Finally, to test the trained model:

import numpy as np

x = trainer.predict(test_dataset)
start_pos, end_pos = x.predictions
start_pos = np.argmax(start_pos, axis=1)
end_pos = np.argmax(end_pos, axis=1)

# 'tokenizer' below is the BERT tokenizer that was used to produce the processed data,
# e.g. BertTokenizerFast.from_pretrained(model_name)
for k, (i, j) in enumerate(zip(start_pos, end_pos)):
    tokens = tokenizer.convert_ids_to_tokens(test_processed['input_ids'][k])

    print('Question: ', test_dict['question'][k])
    print('Answer: ', ' '.join(tokens[i:j+1]))
    print('Correct Answer: ', test_dict['answers'][k]['text'])
    print('---')

How are the outputs? Qualitatively look at 10-20 answers and express in your own words how bad or relevant they are. You may need to run more epochs if the sentences make no sense.

3.2 Evaluation Metrics
Now, for quantitative metrics, we will use Exact Match and the F1 score. You may use the following snippets:

def compute_exact_match(prediction, truth):
    return int(prediction == truth)

def f1_score(prediction, truth):
    pred_tokens = prediction.split()
    truth_tokens = truth.split()

    # if either the prediction or the truth is no-answer then f1 = 1 if they agree, 0 otherwise
    if len(pred_tokens) == 0 or len(truth_tokens) == 0:
        return int(pred_tokens == truth_tokens)

    common_tokens = set(pred_tokens) & set(truth_tokens)

    # if there are no common tokens then f1 = 0
    if len(common_tokens) == 0:
        return 0

    prec = len(common_tokens) / len(pred_tokens)
    rec = len(common_tokens) / len(truth_tokens)

    return 2 * (prec * rec) / (prec + rec)

Calculate the average and median EM and F1-score and report them. For this, you may first calculate the individual scores on the test set and collect them in a list.
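For instance, collecting the per-sample scores and aggregating them could look like the minimal sketch below. It assumes you have gathered a list of decoded answer strings (pred_answers) and the corresponding reference strings (true_answers) while looping over the test set; those variable names are placeholders, not something provided by the handout.

import numpy as np

# pred_answers and true_answers are placeholder names for the decoded predictions
# and the ground-truth answers collected from the test loop above
em_scores = [compute_exact_match(p, t) for p, t in zip(pred_answers, true_answers)]
f1_scores = [f1_score(p, t) for p, t in zip(pred_answers, true_answers)]

print('EM -> mean: %.4f  median: %.4f' % (np.mean(em_scores), np.median(em_scores)))
print('F1 -> mean: %.4f  median: %.4f' % (np.mean(f1_scores), np.median(f1_scores)))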
3.3 Comparison
Let us now compare our fine-tuned model with another open-source fine-tuned model. For this, we will use the distilbert-base-cased-distilled-squad model from Hugging Face. This model is a distilled version of BERT fine-tuned on the SQuAD dataset. To load and extract outputs from the model, you may use the following snippet:

from transformers import pipeline

question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

for i in range(len(test_dict['question'][:2])):
    result = question_answerer(question=test_dict['question'][i], context=test_dict['context'][i])
    print('Question: ', test_dict['question'][i])
    print('Answer: ', result['answer'])
    print('Correct Answer: ', test_dict['answers'][i]['text'][0])
    print('Exact Match: ', compute_exact_match(result['answer'], test_dict['answers'][i]['text'][0]))
    print('F1 Score: ', f1_score(result['answer'], test_dict['answers'][i]['text'][0]))
    print('---')

# Question: Who does Beyonce describe as the definition of inspiration?
# Answer: Oprah Winfrey
# Correct Answer: Oprah Winfrey
# Exact Match: 1
# F1 Score: 1.0
# ---
# Question: Who is still looking for compensation and justice?
# Answer: the many families
# Correct Answer: many families
# Exact Match: 0
# F1 Score: 0.8
# ---
# Question: Discrepancy in what spec brought about a class action suit against Apple in 2003?
# Answer: College of Science
# Correct Answer: the College of Science
# Exact Match: 0
# F1 Score: 0.8571428571428571
# ---

Similar to Section 3.2, compute and report the average and median EM and F1 scores.

4 Submission Instructions
Include a typed report explaining how you solved the given programming tasks.
1. Make sure your submission zip file is under 10MB. Compress your figures if needed. Do NOT submit your network weights nor your dataset.
2. Your pdf must include a description of
• The figures and descriptions as mentioned in Sec. 3.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.
3. Turn in a zipped file; it should include (a) a typed self-contained pdf report with source code and results and (b) source code files (only .py files are accepted). Rename your .zip file as hw9.zip and follow the same file naming convention for your pdf report too.
4. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.
5. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading and will not accept related regrade requests.
6. The sample solutions from previous years are for reference only. Your code and final report must be your own work.
7. To help us better provide feedback to you, make sure to number your figures.

References
[1] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016.
The initial step in any Natural Language Processing (NLP) task involves text preprocessing, particularly tokenization. Tokenization breaks down a stream of text into meaningful units called tokens, such as words or sentences. This process is essential as it transforms unstructured text data into a format suitable for analysis. Tokenization is fundamental to NLP pipelines, as it divides text into discrete units, enabling their representation as vectors for machine learning. This conversion from raw text to numerical data facilitates further analysis and processing. Following tokenization, the next step is to extract embeddings for these tokens. Word embeddings, for instance, capture the semantic meaning of words in a numerical form, facilitating various NLP tasks.

2 Getting Ready for This Homework
Before embarking on this homework, do the following:
1. Carefully review Slides 46 through 59 of the Week 12 slides on "Recurrent Neural Networks for Text Classification and Data Prediction" [1]. Make sure you understand how gating is done in a GRU to address the problem of vanishing gradients that are caused by the long chains of feedback in a neural network.
2. Review the Week 13 slides on "Word Embeddings and Sequence-to-Sequence Learning" [2]. In particular, pay attention to Slides 39 through 49 on word2vec and fastText. Make yourself familiar with their use and their advantages over one-hot vector encoding.
3. Download the text dataset provided on Brightspace. The provided dataset is the "Financial Sentiment Analysis" dataset [4]. The dataset has three sentiments, namely ["positive", "neutral", "negative"]. This makes it a 3-class classification problem.
4. Install the transformers library into your conda environment, as it will be used to extract subword tokens and, subsequently, word embeddings.

3.1 Tokenization
Your first task in this HW is to tokenize the data. The steps are:
1. Depending on the specific task at hand, text can be tokenized at various levels, such as the character, subword, word, or sentence level. For this assignment, we will focus on tokenization at the word and subword levels.
2. To tokenize text at the word level, each word is separated at whitespace boundaries. This can be achieved using the built-in split() function. The following code snippet illustrates how this process can be implemented:

import csv

# this is an example of how to read a csv file line by line
# this snippet only shows the processing on the first 4 entries
sentences = []
sentiments = []
count = 0
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    # ignore the first line
    next(reader)
    for row in reader:
        count += 1
        sentences.append(row[0])
        sentiments.append(row[1])
        if count == 4:
            break

print(sentences)
# ["The GeoSolutions technology will leverage Benefon 's GPS solutions by providing Location Based Search Technology , a Communities Platform , location relevant multimedia content and a new and powerful commercial model .",
#  '$ESI on lows, down $1.50 to $2.50 BK a real possibility',
#  "For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .",
#  'According to the Finnish-Russian Chamber of Commerce , all the major construction companies of Finland are operating in Russia .']
print(sentiments)
# ['positive', 'negative', 'positive', 'neutral']

# tokenize the sentences word by word
word_tokenized_sentences = [sentence.split() for sentence in sentences]
print(word_tokenized_sentences[:2])
# [['The', 'GeoSolutions', 'technology', 'will', 'leverage', 'Benefon', "'s", 'GPS', 'solutions', 'by', 'providing', 'Location', 'Based', 'Search', 'Technology', ',', 'a', 'Communities', 'Platform', ',', 'location', 'relevant', 'multimedia', 'content', 'and', 'a', 'new', 'and', 'powerful', 'commercial', 'model', '.'],
#  ['$ESI', 'on', 'lows,', 'down', '$1.50', 'to', '$2.50', 'BK', 'a', 'real', 'possibility']]

# pad the sentences to the same length
# here I chose the max of all the sentences. You may set it to a hard number such as 64, 128 etc.
max_len = max([len(sentence) for sentence in word_tokenized_sentences])
padded_sentences = [sentence + ['[PAD]'] * (max_len - len(sentence)) for sentence in word_tokenized_sentences]
print(padded_sentences[:2])
# same token lists as above, with each sentence followed by '[PAD]' tokens up to max_len, e.g.
# [['The', 'GeoSolutions', ..., 'model', '.', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]'],
#  ['$ESI', 'on', 'lows,', 'down', '$1.50', 'to', '$2.50', 'BK', 'a', 'real', 'possibility', '[PAD]', ..., '[PAD]']]

3. Subword-level tokenization, also known as wordpiece tokenization and employed in models like BERT, offers the advantage of breaking down less frequent words into subwords that occur more frequently. Below is a code snippet demonstrating how one can perform subword tokenization as used in BERT:

from transformers import DistilBertTokenizer

model_ckpt = "distilbert-base-uncased"
distilbert_tokenizer = DistilBertTokenizer.from_pretrained(model_ckpt)

# bert encode returns the tokens as ids.
# i have set the max length to what we have padded the sentences to in word tokens
# you are free to choose any size but be consistent so that you may use the same model for training.
bert_tokenized_sentences_ids = [distilbert_tokenizer.encode(sentence, padding='max_length',
                                                            truncation=True, max_length=max_len)
                                for sentence in sentences]

print(bert_tokenized_sentences_ids[:2])
# [[101, 1996, 20248, 19454, 13700, 2015, 2974, 2097, 21155, 3841, 12879, 2239, 1005, 1055, 14658, 7300, 2011, 4346, 3295, 2241, 3945, 2974, 1010, 1037, 4279, 4132, 1010, 3295, 7882, 14959, 4180, 1998, 1037, 2047, 1998, 3928, 3293, 2944, 102],
#  [101, 1002, 9686, 2072, 2006, 2659, 2015, 1010, 2091, 1002, 1015, 1012, 2753, 2000, 1002, 1016, 1012, 2753, 23923, 1037, 2613, 6061, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
bert_tokenized_sentences_tokens = [distilbert_tokenizer.convert_ids_to_tokens(sentence)
                                   for sentence in bert_tokenized_sentences_ids]
print(bert_tokenized_sentences_tokens[:2])
# [['[CLS]', 'the', 'geo', '##sol', '##ution', '##s', 'technology', 'will', 'leverage', 'ben', '##ef', '##on', "'", 's', 'gps', 'solutions', 'by', 'providing', 'location', 'based', 'search', 'technology', ',', 'a', 'communities', 'platform', ',', 'location', 'relevant', 'multimedia', 'content', 'and', 'a', 'new', 'and', 'powerful', 'commercial', 'model', '[SEP]'],
#  ['[CLS]', '$', 'es', '##i', 'on', 'low', '##s', ',', 'down', '$', '1', '.', '50', 'to', '$', '2', '.', '50', 'bk', 'a', 'real', 'possibility', '[SEP]', '[PAD]', '[PAD]', ..., '[PAD]']]

4. You will observe that the tokens have extra [CLS] and [SEP] tokens when the BERT tokenizer is used, and that some words are split into subwords. Example: "solution" is split into "##sol" and "##ution".

3.2 Word Embeddings
Following tokenization, the next step involves generating word embeddings. Before proceeding further, it's important to note that while the BERT tokenizer provides token IDs, our word tokenization process yields only the words themselves. Therefore, before extracting embeddings, we need to create token IDs for the word tokens. The code snippet below shows one way of doing this:

vocab = {}
vocab['[PAD]'] = 0
for sentence in padded_sentences:
    for token in sentence:
        if token not in vocab:
            vocab[token] = len(vocab)

# print(vocab)

# convert the tokens to ids
padded_sentences_ids = [[vocab[token] for token in sentence] for sentence in padded_sentences]
print(padded_sentences_ids[:2])
# [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 16, 20, 21, 22, 23, 24, 17, 25, 24, 26, 27, 28, 29, 0, 0, 0, 0, 0, 0, 0],
#  [30, 31, 32, 33, 34, 35, 36, 37, 17, 38, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

Now, let us move on to extracting embeddings:

from transformers import DistilBertModel
import torch

model_name = 'distilbert/distilbert-base-uncased'
distilbert_model = DistilBertModel.from_pretrained(model_name)

# extract word embeddings
# we will use the last hidden state of the model
# you can use the other hidden states if you want
# the last hidden state is the output of the model
# after passing the input through the model
word_embeddings = []
# convert padded sentence tokens into ids
for tokens in padded_sentences_ids:
    input_ids = torch.tensor(tokens).unsqueeze(0)
    with torch.no_grad():
        outputs = distilbert_model(input_ids)

    word_embeddings.append(outputs.last_hidden_state)

print(word_embeddings[0].shape)
# torch.Size([1, 39, 768])

# subword embeddings extraction
subword_embeddings = []
for tokens in bert_tokenized_sentences_ids:
    input_ids = torch.tensor(tokens).unsqueeze(0)
    with torch.no_grad():
        outputs = distilbert_model(input_ids)

    subword_embeddings.append(outputs.last_hidden_state)

print(subword_embeddings[0].shape)
# torch.Size([1, 39, 768])

3.3 Sentiment Analysis Using torch.nn.GRU
In this task, we ask you to carry out the same sentiment prediction task but with PyTorch's GRU implementation. The steps are:
1. First, familiarize yourself with the documentation of the torch.nn.GRU module [3]. Also, you should go through Slides 67 through 87 of the Week 12 slides to understand how you may feed an entire sequence of embeddings at once when using torch.nn.GRU. Using the module in this manner may help you speed up training dramatically.
2. Before training, split your dataset into an 80:20 ratio for the train and test sets respectively. Create your custom dataloaders which return the word embeddings and the sentiment as a one-hot vector of 3 dimensions corresponding to each sentiment class (a rough sketch of one possible setup is given right after this list).
3. Perform the sentiment prediction experiment using torch.nn.GRU similar to that presented in the lecture slides. Perform a quantitative evaluation of the unidirectional RNN on the sequestered test set.
4. Now, repeat the above step with PyTorch's bidirectional GRU (i.e. with bidirectional=True). Note that you'll also need to adjust several other places in your RNN to accommodate the change of shape for the hidden state. Does using a bidirectional scan make a difference in terms of test performance?
5. In your report, report the overall accuracy and the confusion matrix produced by your RNN that is based on torch.nn.GRU, as well as by its bidirectional variant. Also, you should include plots of the training losses for the two RNNs. Write a paragraph comparing the test performances of the two RNN implementations.
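Here is a minimal sketch of what steps 2 and 3 above could look like. All of the names in it (SentimentDataset, GRUClassifier, the label mapping, the hidden size, and so on) are illustrative choices rather than requirements of the handout, and it assumes each sentence has already been converted to a (1, seq_len, 768) embedding tensor as in Section 3.2.

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# illustrative label mapping for the three sentiment classes
LABEL_TO_INDEX = {'positive': 0, 'neutral': 1, 'negative': 2}

class SentimentDataset(Dataset):
    """Wraps precomputed per-sentence embeddings and their sentiment strings."""
    def __init__(self, embeddings, sentiments):
        # embeddings: list of (1, seq_len, 768) tensors from Section 3.2
        # sentiments: list of strings such as 'positive'
        self.embeddings = [e.squeeze(0) for e in embeddings]   # -> (seq_len, 768)
        self.labels = [LABEL_TO_INDEX[s] for s in sentiments]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        one_hot = torch.zeros(3)
        one_hot[self.labels[idx]] = 1.0
        return self.embeddings[idx], one_hot

class GRUClassifier(nn.Module):
    """A small GRU-based classifier; the final hidden state feeds a linear layer."""
    def __init__(self, input_size=768, hidden_size=128, num_classes=3, bidirectional=False):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True,
                          bidirectional=bidirectional)
        num_directions = 2 if bidirectional else 1
        self.fc = nn.Linear(hidden_size * num_directions, num_classes)

    def forward(self, x):                      # x: (batch, seq_len, 768)
        output, hidden = self.gru(x)           # hidden: (num_directions, batch, hidden_size)
        hidden = hidden.permute(1, 0, 2).reshape(x.size(0), -1)
        return self.fc(hidden)                 # logits over the 3 classes

# sketch of a training loop; train_dataset would be a SentimentDataset instance
# loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
# model = GRUClassifier()
# criterion = nn.CrossEntropyLoss()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# for embeddings_batch, one_hot_batch in loader:
#     optimizer.zero_grad()
#     logits = model(embeddings_batch)
#     loss = criterion(logits, one_hot_batch.argmax(dim=1))
#     loss.backward()
#     optimizer.step()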
4 Extra Credit (25 points)
Repeat the above sentiment analysis on the text classification dataset on the course website. The link to download the dataset is below:
https://engineering.purdue.edu/kak/distDLS/text_datasets_for_DLStudio.tar.gz
1. First of all, extract the dataset and analyse the files inside. You may refer to Slides 11 through 15 of the Week 12 slides to familiarize yourself with how the datasets are organized.
2. Refer to Examples/text_classification_with_GRU_word2vec.py for how to create dataloaders for this dataset. You may report results on the embedding sizes 200 and 400 in the dataset (i.e. sentiment_dataset_train_200.tar.gz and sentiment_dataset_train_400.tar.gz, and the corresponding test datasets).
3. Train and test the unidirectional and bidirectional RNNs you created in Section 3. You may refer to the DLStudio code for how to create your training loop.
4. Repeat steps 3 and 5 from Section 3.

5 Submission Instructions
Include a typed report explaining how you solved the given programming tasks. For HW9, you need to submit the following:
1. Your pdf must include a description of
• The figures and descriptions as mentioned in Sec. 3, and Sec. 4 if you choose to do the extra credit.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.
2. Turn in a pdf file: a typed self-contained report with source code and results. Rename your .pdf file as hw9.pdf
3. Turn in a zipped file; it should include all source code files (only .py files are accepted). Rename your .zip file as hw9.zip.
4. Do NOT submit your network weights.
5. Do NOT submit your dataset.
6. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.
7. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading and will not accept related regrade requests.
8. The sample solutions from previous years are for reference only. Your code and final report must be your own work.
9. To help us better provide feedback to you, make sure to number your figures and tables.

References
[1] Recurrent Neural Networks for Text Classification and Data Prediction. URL https://engineering.purdue.edu/DeepLearn/pdf-kak/RNN.pdf.
[2] Word Embeddings and Sequence-to-Sequence Learning. URL https://engineering.purdue.edu/DeepLearn/pdf-kak/Seq2Seq.pdf.
[3] GRU. URL https://pytorch.org/docs/stable/generated/torch.nn.GRU.html.
[4] Pekka Malo, Ankur Sinha, Pyry Takala, Pekka Korhonen, and Jyrki Wallenius. Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts, 2013.
The goal of this homework is for you to compare the performance of a GAN with the approach based on diffusion for generative modeling of a dataset. You will be provided with a subset of the famous CelebA dataset. You will write your own code for a GAN to generate fake images using this dataset. As for the diffusion based modeling, since it is very compute intensive, you will be provided with a model trained by Aditya Chauhan in RVL, and your homework will only involve generating fake images using this model. If you have the computational resources at your disposal, we would be delighted if you train your own diffusion based model using the implementation in the latest addition to DLStudio in Version 2.4.2. The GAN implementation will only involve the approach based on DCGAN as explained in the Week 11 lecture slides. This homework will also make you familiar with the Fréchet Inception Distance (FID) for evaluating your generated images quantitatively.

2 Getting Ready for This Homework
Before embarking on this homework, do the following:
1. Download DLStudio version 2.4.2 from the following link:
https://engineering.purdue.edu/kak/distDLS/DLStudio2.4.2.tar.gz?download
2. You should already be familiar with the concept of Transpose Convolution. To review that material, go through Slides 29 through 62 of the Week 8 slide deck on Semantic Segmentation. Make sure you understand the relationship between the Kernel Size, Padding, and Output Size for Transpose Convolution. You also need to understand the example shown on Slide 48 in which a 4-channel 1 × 1 noise vector is expanded into a 2-channel 4 × 4 noise image. Understanding this example is foundational to understanding the working of a GAN.
3. Understand the GAN material on Slides 60 through 81 of the Week 11 slide deck. For additional depth, you are encouraged to read the original GAN paper by Goodfellow et al. [1].
4. When you are learning about a new type of neural network, playing with an implementation by varying its various parameters and seeing how that affects the results can often help you gain deep insights in a short time. If you believe in that philosophy, execute the following script in the ExamplesAdversarialLearning directory of DLStudio:
python dcgan_DG1.py
It uses the PurdueShapes5GAN dataset that is described on Slides 53 through 59 of the Week 11 slides. Instructions for downloading this dataset are on the main DLStudio webpage.
5. Since we are not asking you to write your own code for diffusion based modeling, it is sufficient for you to gain an understanding of just the top-level ideas on Slides 123 through 165. Ask yourself the following questions: 1) Why does diffusion modeling require two Markov Chains? 2) What is the difference between the forward q-chain and the reverse p-chain? Why does injecting Gaussian noise make it easier to train a diffusion based data modeler? etc.
6. Make yourself familiar with how to run the diffusion related code in DLStudio. To that end, go over the following page to understand how that code is organized:
https://engineering.purdue.edu/kak/distDLS/GenerativeDiffusion-2.4.2_CodeOnly.html
7. After you have downloaded and installed Version 2.4.2 of DLStudio, read the README in the ExamplesDiffusion directory for how to run the diffusion code in DLStudio.

The following are the programming tasks related to your GAN:
1. Ensure you have downloaded the subset of the CelebA dataset from BrightSpace that is associated with this homework. The dataset comprises 10,000 images for working with your two data modelers. All these images are of size 64 × 64. Figure 1 shows a sample of the images from the dataset.
2. Design your own generator and discriminator networks. Just like the previous homeworks, you have total freedom in how you design your networks. You can draw inspiration from the implementations in DLStudio. Your generator must be able to generate RGB celebrity images of size 64 × 64 from random noise vectors utilizing transpose convolutions.
3. Subsequently, you'll need to write your own adversarial training logic. You can refer to Slides 64 through 69 of the Week 11 slides to familiarize yourself with how it can be done. You only need to use nn.BCELoss for training. (A rough sketch of one training iteration is given right after this list.)
4. In your report, plot the adversarial losses over training iterations for both the generator and the discriminator in the same figure.
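To make item 3 above concrete, here is a minimal sketch of one iteration of DCGAN-style adversarial training with nn.BCELoss. The names netG, netD and nz, and the optimizer settings, are illustrative assumptions rather than something prescribed by the handout; consult the Week 11 slides and the DLStudio implementations for the reference version.

import torch
import torch.nn as nn

# assumed to exist already: netG (noise -> 64x64 RGB image) and netD (image -> probability in [0, 1])
criterion = nn.BCELoss()
optimizerD = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))
optimizerG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
nz = 100                                    # length of the input noise vector (illustrative)

def train_step(real_images):
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size)
    fake_labels = torch.zeros(batch_size)

    # (1) update the discriminator: push D(real) toward 1 and D(G(z)) toward 0
    optimizerD.zero_grad()
    d_loss_real = criterion(netD(real_images).view(-1), real_labels)
    noise = torch.randn(batch_size, nz, 1, 1)
    fake_images = netG(noise)
    d_loss_fake = criterion(netD(fake_images.detach()).view(-1), fake_labels)
    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    optimizerD.step()

    # (2) update the generator: push D(G(z)) toward 1
    optimizerG.zero_grad()
    g_loss = criterion(netD(fake_images).view(-1), real_labels)
    g_loss.backward()
    optimizerG.step()

    return d_loss.item(), g_loss.item()     # record these for the loss plots in item 4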
4 The Task Related to Diffusion
You will find the following scripts in the directory ExamplesDiffusion:
1. README
2. RunCodeForDiffusion.py
3. GenerateNewImageSamples.py
4. VisualizeSamples.py
You may want to start with reading the README file. The following instructions are for generating images using the network weights provided to you.

Figure 1: Sample images from the subset of the CelebA dataset provided.

1. As you have already read, you will need to run all three files to see diffusion from end to end. However, we understand that one may not have sufficient computational capacity to run multiple epochs of diffusion training at batch size 32. Therefore, we are providing the weights to you and you may skip running RunCodeForDiffusion.py.
2. To generate images using the pretrained diffusion model, first change the results directory in GenerateNewImageSamples.py to the directory into which you downloaded the weights, and run the file. NOTE: ensure you have installed the new version of DLStudio, 2.4.2, before running this code.
3. Generate 1000 fake images. This will create npy files which you can visualize using VisualizeSamples.py. Make sure you also change the directory locations accordingly in the VisualizeSamples.py file.
If you wish to train your own diffusion model, you will also need to run RunCodeForDiffusion.py before executing the generation and visualization scripts. Change the directory locations in RunCodeForDiffusion.py before executing the file.
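For the FID comparison in the next subsection, you will need the generated images on disk as individual image files. Below is a minimal sketch of converting .npy samples into .jpg files; the file name samples.npy, the assumed array shape (N, 64, 64, 3) with uint8 pixel values, and the output directory are all illustrative and may need to be adapted to whatever the DLStudio scripts actually write out.

import os
import numpy as np
from PIL import Image

# illustrative paths; adapt them to where GenerateNewImageSamples.py wrote its output
samples = np.load('samples.npy')           # assumed shape: (N, 64, 64, 3), dtype uint8
os.makedirs('fake_diffusion', exist_ok=True)

for idx, img in enumerate(samples):
    Image.fromarray(img).save(os.path.join('fake_diffusion', f'{idx}.jpg'))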
4.1 Evaluating Your GAN
You can visually analyze the outputs generated by your face generator. However, how does one quantitatively evaluate generated images? For evaluating generated images quantitatively, the Fréchet Inception Distance (FID) is used. Originally proposed in [2], the FID is a widely used metric for measuring both the quality and the diversity of GAN-generated images. More specifically, it does so by measuring how close the distribution of the fake images is to the distribution of the real images.
1. First, you should generate 1k fake images from randomly sampled noise vectors using your trained BCE-GAN generator.
2. To calculate the FID, one would first encode the set of real images (from the training data) into feature vectors using a pre-trained Inception network, and then model the resulting distribution of feature vectors using a multivariate Gaussian distribution. The same is carried out for the set of fake images. Once that is done, the FID is simply the Fréchet distance between the two multivariate Gaussian distributions.
3. For this homework, you will be using the pytorch-fid package [3] for calculating the FIDs. To install the package, use the command:
pip install pytorch-fid
Once installed, you can use the pytorch-fid package in a Python script as follows:

from pytorch_fid.fid_score import calculate_activation_statistics, calculate_frechet_distance
from pytorch_fid.inception import InceptionV3

# you have to write a script to populate the following path lists
real_paths = ['/real/0.jpg', '/real/1.jpg', ...]
fake_paths = ['/fake/0.jpg', '/fake/1.jpg', ...]

dims = 2048
block_idx = InceptionV3.BLOCK_INDEX_BY_DIM[dims]
model = InceptionV3([block_idx]).to(device)

m1, s1 = calculate_activation_statistics(real_paths, model, device=device)
m2, s2 = calculate_activation_statistics(fake_paths, model, device=device)
fid_value = calculate_frechet_distance(m1, s1, m2, s2)
print(f'FID: {fid_value:.2f}')

4. In your report, you will have to present both qualitative and quantitative results:
• Qualitative Evaluation: Display a 4 × 4 image grid, similar to what is shown in Figure 1, showcasing images randomly generated by your BCE-GAN. Repeat the same with images generated by your diffusion output. Describe in several lines the differences and similarities in the respective outputs.
• Quantitative Evaluation: Present the FID values for both the GAN and Diffusion variants. Finally, include a paragraph discussing your results: BCE-GAN vs. Diffusion, which is better?

5 Submission Instructions
Include a typed report explaining how you solved the given programming tasks.
1. Your pdf must include a description of
• The figures and descriptions as mentioned in Sec. 3.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.
2. Turn in a pdf file: a typed self-contained report with source code and results. Rename your .pdf file as hw8.pdf
3. Turn in a zipped file; it should include all source code files (only .py files are accepted). Rename your .zip file as hw8.zip.
4. Make sure your submission zip file is under 10MB. Compress your figures if needed.
5. Do NOT submit your network weights.
6. Do NOT submit your dataset.
7. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.
8. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading and will not accept related regrade requests.
9. The sample solutions from previous years are for reference only. Your code and final report must be your own work.
10. To help us better provide feedback to you, make sure to number your figures.

References
[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Networks. Communications of the ACM, 63(11):139-144, 2020.
[2] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
[3] Maximilian Seitzer. pytorch-fid: FID Score for PyTorch. https://github.com/mseitzer/pytorch-fid, August 2020. Version 0.3.0.
[pdf-embedder url="https://assignmentchef.com/wp-content/uploads/2025/01/hw7_v6.pdf"] This homework introduces you to semantic segmentation of images with neural networks. Semantic segmentation is of great importance in biomedical imaging where the anotomical features and the tissues affected by pathology can have highly irregular shapes.Most neural network architectures for semantic segmentation are based the Encoder-Decoder design as first proposed in [1]. That network is famously known as the U-Net. In this homework, you will learn about the basics of the Encoder-Decoder networks for semantic segmentation through DLStudio’s mUnet class. That class is a part of the larger SemanticSegmention class in DLStudio.The semantic segmentation code in DLStudio is based on nn.MSELoss. As you can imagine, such a loss function is not likely to be sensitive to the boundaries of the pixel blobs that you want your neural network to identify. To remedy this shortcoming of the code in DLStudio, this homework will also ask you to code up what is known as the Dice loss, which is also known as the Sørensen-Dice coefficient. It’s a popular choice in image segmentation, as it quantifies the overlap between the predicted and the target segmentation masks. Importantly, it provides a smooth and differentiable measure of segmentation accuracy. Additionally, the Dice loss is known to be particularly effective when you have imbalanced datasets.2 Getting Ready for This Homework 1. First of all review the slides 63-83 from the Week 8 lecture. Understand the structure of the mUnet and how it performs semantic segmentation. 2. Download the latest version (2.3.6) of DLStudio that has improved code for Semantic Segmentation from the website or the link below: https://engineering.purdue.edu/kak/distDLS/DLStudio2.3.6.tar.gz It is highly likely that the latest version of DLStudio in your computer is 2.3.5. What you need for this homework is 2.3.6. Perform the appropriate installation in your environment based on the instruction found at the link: https://engineering.purdue.edu/kak/distDLS/#1133. Locate the file named semantic segmentation.py in the main Example subdirectory in your installation of DLStudio. Make yourself as familiar as you can with the script semantic segmentation.py. This is the only script you will be running for this homework. 4. Download the image dataset for DLStudio main module from the website or from the below link: https://engineering.purdue.edu/kak/distDLS/datasets_for_ DLStudio.tar.gz 5. To extract the tar.gz dataset file, use the tar zxvf command as provided below: tar zxvf datasets_for_DLStudio.tar.gz You do NOT need to extract the internal PurdueShapes5MultiObject10000-train.gz and PurdueShapes5MultiObject-1000-test.gz. You only need to provide the pathname for the folder on your machine containing all the datasets. If done correctly, rest should be handled.The following are the programming tasks you must do for this homework: 1. Execute the semantic segmentation.py script and evaluate both the training loss and the test results. Provide a brief write-up of your understanding of mUnet and how it carries out semantic segmentation of an image. By “evaluate” we mean just record the running losses during training. One of the most commonly used tools for evaluating a semantic segmentation network is through the IoU loss. If you wish, you can write that code yourself. But that is not required for this homework.2. 
2. The run_code_for_training_for_semantic_segmentation function of the SemanticSegmentation class in DLStudio uses just the MSE loss. MSE loss may not adequately capture the subtleties of segmentation boundaries. To this end, we will implement our own Dice loss, augment it with the MSE loss and compare it against vanilla MSE.
3. What follows is a code snippet to help you create your own implementation of the Dice loss (a minimal worked sketch is also given after this task list). Make sure you set requires_grad=True wherever necessary to ensure backpropagation, therefore enabling model learning.

def dice_loss(preds: torch.Tensor, ground_truth: torch.Tensor, epsilon=1e-6):
    """
    inputs:
        preds: predicted mask
        ground_truth: ground truth mask
        epsilon (float): prevents division by zero

    returns:
        dice_loss
    """

    # implement your logic for dice loss

    # Step 1: Compute the Dice Coefficient.
    # For the numerator, multiply your prediction with the ground truth and
    # compute the sum of elements (in the H and W dimensions).
    # For the denominator, multiply the prediction with itself and sum the
    # elements (in the H and W dimensions), and multiply the ground truth by
    # itself and sum the elements (in the H and W dimensions).

    # Step 2: dice_coefficient = 2 * numerator / (denominator + epsilon)

    # Step 3: Compute dice_loss = 1 - dice_coefficient.

    return dice_loss

4. Plot the best- and the worst-case training loss vs. iterations using just the MSE loss, just the Dice loss, and a combination of the two. Provide insights into potential factors contributing to the observed variations in performance.
5. State your qualitative observations on the model test results for the MSE loss vs. the Dice+MSE loss.
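For reference, here is a minimal sketch of one way the steps in the comments above could be filled in. It assumes preds and ground_truth are batched tensors of shape (B, C, H, W) with values in [0, 1], and it averages the per-image, per-channel Dice coefficient; your own version may organize this differently.

import torch

def dice_loss(preds: torch.Tensor, ground_truth: torch.Tensor, epsilon=1e-6):
    # Step 1: numerator and denominator, summed over the H and W dimensions
    numerator = (preds * ground_truth).sum(dim=(-2, -1))
    denominator = (preds * preds).sum(dim=(-2, -1)) + (ground_truth * ground_truth).sum(dim=(-2, -1))

    # Step 2: Dice coefficient per image and channel, then averaged over the batch
    dice_coefficient = (2.0 * numerator / (denominator + epsilon)).mean()

    # Step 3: the loss is 1 minus the coefficient, so perfect overlap gives 0 loss
    return 1.0 - dice_coefficient

# quick sanity check on random "masks" that require gradients
if __name__ == '__main__':
    preds = torch.rand(2, 3, 64, 64, requires_grad=True)
    target = (torch.rand(2, 3, 64, 64) > 0.5).float()
    loss = dice_loss(preds, target)
    loss.backward()     # gradients flow back into preds
    print(loss.item())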
5 Submission Instructions

Include a typed report explaining how you solved the given programming tasks. For HW7, you need to submit the following:

1. Your pdf must include a description of
• The figures and descriptions as mentioned in Sec. 3, and Sec. 4 if you choose to do the extra credit.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.

2. Turn in a typed, self-contained pdf report with source code and results. Rename your .pdf file as hw7.pdf.

3. Turn in a zipped file; it should include all source code files (only .py files are accepted). Rename your .zip file as hw7.zip.

4. Do NOT submit your network weights.

5. Do NOT submit your dataset.

6. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.

7. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.

8. The sample solutions from previous years are for reference only. Your code and final report must be your own work.

9. To help us better provide feedback to you, make sure to number your figures and tables.

References

[1] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation, 2015.
In HW5, you worked with skip connections to improve the performance of a deep CNN classifier. Now we use these skip connections to construct an object-detector network. Your task in HW6 is to build your own object detector for cakes and other objects.

Because this is a more complex homework compared to what you have worked on so far, you are going to need more time for it. So we are going to help you pace your work by having you first submit a "Checkpoint" by Feb 21. Subsequently, the deadline for the homework will be March 1. You will only be allowed to upload your final submission if you submitted the checkpoint. The checkpoint will count for 10% of the overall grade for this homework.

1. For the Checkpoint due Feb 21, 2024, you will work on the problem of single-instance object detection. This will only require running a script from the Examples directory of DLStudio. That is, you will NOT be writing any new code for the Checkpoint submission.

2. For the final homework solution that is due March 1, 2024, you will implement multi-instance object detection using YOLO.

Single-instance object detection is based on the assumption that each image has only one object of interest (even when the image contains multiple objects). The goal in single-instance object detection is to detect and localize that object. Localization is carried out by predicting the coordinates of the bounding box for the detected object. Multi-instance object detection, on the other hand, is based on the assumption that an image may contain multiple objects of interest, and you want to predict their labels and the coordinates of their bounding boxes.

Before launching into this homework, make sure that you have downloaded the newly released Version 2.3.4 of DLStudio from

https://engineering.purdue.edu/kak/distDLS/DLStudio-2.3.4.html

It contains the updated versions of SkipBlock and the networks that use SkipBlock as a building block. You will be using one or more of those networks for the current homework.

2 Getting Ready for This Homework

2.1 For Submitting the Checkpoint

The main thing you have to do for submitting the checkpoint is to run the following script from the main Examples directory of DLStudio:

object_detection_and_localization.py

on the following training and testing datasets:

PurdueShapes5-10000-train.gz
PurdueShapes5-1000-test.gz

The integer value you see in the names of the datasets is the number of images in each. You can download these datasets by clicking on the link "Download the image datasets for the main DLStudio module" at the main webpage for DLStudio. This link will give you not only the above two datasets, but also the datasets you are going to need for some of the future homework assignments in our class. After you have downloaded the data archive into the Examples directory in your installation of DLStudio, you need to execute the following (Linux) command:

tar xvf datasets_for_DLStudio.tar.gz

This will create a subdirectory data in the Examples directory and deposit all the datasets in it. That will set you up to execute the previously mentioned script object_detection_and_localization.py in the Examples directory. This script uses the network class LOADnet2 from DLStudio. The acronym "LOAD" in "LOADnet2" stands for "LOcalization And Detection".
To understand the connection between the above-mentioned script and the LOADnet2 network class, search for the following string in the DLStudio.py file:

class DetectAndLocalize

As you will see, this class contains ALL of the DLStudio code dealing with single-instance object detection and localization. It defines multiple networks and different loss functions for the job. As you will see, the LOADnet2 class is very much like the BMEnet class you saw in your previous homework. It uses the same SkipBlock that you are already familiar with. The main difference between BMEnet and LOADnet2 is that the latter both predicts the class label for the detected object and estimates the coordinates of its bounding box. Predicting the coordinates is referred to as regression in DL.

For the Checkpoint submission, show the results you get with the DLStudio datasets mentioned above. The work you submit must also include a brief write-up on your understanding of the architecture of the LOADnet network.

2.2 For the Final Submission of HW6

NOTE: It would be best if you read the material in this section after the class on Tuesday, Feb 20.

The very first thing you would need to do is to beef up the SkipBlock you used for the Checkpoint submission. In order to do that, it would be best if you first become familiar with the logic of the ResNet as described in the paper:

https://arxiv.org/abs/1512.03385

and with the GitHub code for ResNet:

https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py

As you will see, ResNet has two different kinds of skip blocks, named BasicBlock and BottleNeck. BasicBlock is used as a building block in ResNet-18 and ResNet-34. The numbers 18 and 34 refer to the number of layers in these two networks. For deeper networks, ResNet uses the BottleNeck class.

For the final submission, you will also be comparing two different loss functions for the regression loss: the L2-norm based loss as provided by torch.nn.MSELoss, and the CIoU loss as provided by PyTorch's complete_box_iou_loss that is available at the link supplied in the Intro section. To prepare for this comparison, review the material on Slides 38 through 48 of the Week 7 slides on Object Detection.
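As a quick orientation, here is a minimal sketch of how the two regression losses might be invoked side by side; it assumes torchvision 0.13 or later (where complete_box_iou_loss lives in torchvision.ops) and boxes in corner format (x1, y1, x2, y2):

import torch
from torchvision.ops import complete_box_iou_loss

# hypothetical predicted and ground-truth boxes, for illustration only
pred_boxes = torch.tensor([[10.0, 10.0, 60.0, 60.0]], requires_grad=True)
gt_boxes   = torch.tensor([[12.0,  8.0, 64.0, 58.0]])

mse_loss  = torch.nn.MSELoss()(pred_boxes, gt_boxes)
ciou_loss = complete_box_iou_loss(pred_boxes, gt_boxes, reduction='mean')
ciou_loss.backward()   # differentiable, so it can drive the regression head

Unlike the elementwise MSE, the CIoU loss jointly accounts for the overlap, the center distance, and the aspect ratio of the two boxes.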
Your main goal for the final submission is to implement a multi-instance object detection framework with a YOLO network. The rest of this section details the steps you would need to go through for that.

1. Your first step would be to come to terms with the basic concepts of YOLO. As will be explained in the class next Tuesday, the YOLO logic is based on the notion of anchor boxes. You divide an image into a grid of cells and you associate N anchor boxes with each cell in the grid. Each anchor box represents a bounding box with a different aspect ratio.

Your first question is likely to be: Why divide the image into a grid of cells? The answer is that the job of estimating the exact location of an object is assigned to that cell in the grid whose center is closest to the center of the object itself. Therefore, in order to localize the object, all that needs to be done is to estimate the offset between the center of the cell and the center of the true bounding box for the object.

But why have multiple anchor boxes at each cell of the grid? As previously mentioned, anchor boxes are characterized by different aspect ratios. That is, they are candidate bounding boxes with different height-to-width ratios. In Prof. Kak's implementation in the RegionProposalGenerator module, he creates five different anchor boxes for each cell in the grid, these being for the aspect ratios [1/5, 1/3, 1/1, 3/1, 5/1]. The idea here is that the anchor box whose aspect ratio is closest to that of the true bounding box for the object will speak with the greatest confidence for that object.

2. You can deepen your understanding of the YOLO logic by looking at the implementation of image gridding and anchor boxes in Version 2.1.1 of Prof. Kak's RegionProposalGenerator module:

https://engineering.purdue.edu/kak/distRPG/

Go to the Examples directory and execute the script:

multi_instance_object_detection.py

and work your way backwards into the module code to see how it works. In particular, you should pay attention to how the notion of anchor boxes is implemented in the function:

run_code_for_training_multi_instance_detection()

To execute the script multi_instance_object_detection.py, you will need to download and install the following datasets:

Purdue_Dr_Eval_Multi_Dataset-clutter-10-noise-20-size-10000-train.gz
Purdue_Dr_Eval_Multi_Dataset-clutter-10-noise-20-size-1000-test.gz

Links for downloading the datasets can be found on the module's webpage. In the dataset names, a string like size-10000 indicates the number of images in the dataset, the string noise-20 means 20% added random noise, and the string clutter-10 means a maximum of 10 background clutter objects in each image. Follow the instructions on the main webpage for RegionProposalGenerator on how to unpack the image data archive that comes with the module and where to place it in your directory structure. These instructions will ask you to download the main dataset archive and store it in the Examples directory of the distribution.

3.1 Checkpoint Submission

The Checkpoint submission should NOT require any programming by you. All you have to do is to run the DLStudio script as described in Section 2.1 and submit your results. The document you submit should include a brief write-up on your understanding of the LOADnet network.

3.2 Final Submission of HW6

This section contains guidelines on how to extract images with one or more instances of the target objects from the COCO dataset. Finally, you will implement the YOLO logic to perform multi-instance detection.

3.2.1 How to Use the COCO Annotations

For this homework, you will need bounding boxes in addition to the labels from the COCO dataset. In this section, we go over how to access these annotations, as shown in Fig. 1. The code below is sufficient to introduce you to how to prepare your own dataset and write your dataloader for this homework.

Before we jump into the code, it is important to understand the structure of the COCO annotations. The COCO annotations are stored in a list of dictionaries, and what follows is an example of such a dictionary:

{
    "id": 1409619,         # annotation ID
    "category_id": 1,      # COCO category ID
    "iscrowd": 0,          # whether the segmentation is for a single object
                           # or for a group/cluster of objects
    "segmentation": [
        [86.0, 238.8, ..., 382.74, 241.17]
    ],                     # a list of polygon vertices around the object
                           # (x, y pixel positions)
    "image_id": 245915,    # integer ID for the COCO image
    "area": 3556.2197000000015,   # area measured in pixels
    "bbox": [86, 65, 220, 334]    # bounding box [top-left x, top-left y, width, height]
}
The following code (refer to the inline comments for details) shows how to access the required COCO annotation entries and display a randomly chosen image with the desired annotations for visual verification. After importing the required Python modules (e.g. cv2, skimage, pycocotools, etc.), you can run the given code and visually verify the output yourself.

# Input
input_json = 'instances_train2017.json'
class_list = ['cake', 'dog', 'motorcycle']

# Mapping from COCO label to class indices
coco_labels_inverse = {}
coco = COCO(input_json)
catIds = coco.getCatIds(catNms=class_list)
categories = coco.loadCats(catIds)
categories.sort(key=lambda x: x['id'])
print(categories)

for idx, in_class in enumerate(class_list):
    for c in categories:
        if c['name'] == in_class:
            coco_labels_inverse[c['id']] = idx
print(coco_labels_inverse)

# Retrieve image list
imgIds = coco.getImgIds(catIds=catIds)

# Display one random image with annotation
idx = np.random.randint(0, len(imgIds))
img = coco.loadImgs(imgIds[idx])[0]
I = io.imread(img['coco_url'])
# change from grayscale to color
if len(I.shape) == 2:
    I = skimage.color.gray2rgb(I)
# pay attention to the flag iscrowd being set to False
annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=False)
anns = coco.loadAnns(annIds)
fig, ax = plt.subplots(1, 1)
image = np.uint8(I)
for ann in anns:
    [x, y, w, h] = ann['bbox']
    label = coco_labels_inverse[ann['category_id']]
    image = cv2.rectangle(image, (int(x), int(y)),
                          (int(x + w), int(y + h)), (36, 255, 12), 2)
    image = cv2.putText(image, class_list[label], (int(x), int(y - 10)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (36, 255, 12), 2)
ax.imshow(image)
ax.set_axis_off()
plt.axis('tight')
plt.show()

3.2.2 Creating Your Own Multi-Instance Object Localization Dataset

In this exercise, you will create your own dataset based on the following steps:

1. You need to write a script similar to HW4 that filters through the images and annotations to generate your training and testing datasets, such that any image in your dataset meets the following criteria:

• It contains at least one foreground object. A foreground object must be from one of the three categories [’cake’, ’dog’, ’motorcycle’]. Additionally, the area of any foreground object must be larger than 64 × 64 = 4096 pixels. (You can use the area entry in the annotation dictionary instead of calculating it yourself.) Different from HW4, there can be multiple foreground objects in an image, since we are dealing with multi-instance object localization for this homework. If there is no such object, that image should be discarded.

• When saving your images to disk, resize them to 256 × 256. Note that you would also need to scale the bounding box parameters accordingly after resizing (a minimal sketch of this rescaling is given after this list).

• Again, use images from 2017 Train images for the training set and 2017 Val images for the testing set.

Figure 1: Sample COCO images with bounding box and label annotations for multiple instances.

Again, you have total freedom on how you organize your dataset as long as it meets the above requirements. If done correctly, you will end up with approximately 8000 train images and 300 test images.
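The box rescaling mentioned above can be as simple as the following sketch; the helper name and the COCO-style [x, y, w, h] box format are our assumptions:

import cv2

def resize_with_boxes(image, boxes, new_size=256):
    # image: H x W x 3 array; boxes: list of [x, y, w, h] in pixels
    h, w = image.shape[:2]
    sx, sy = new_size / w, new_size / h
    resized = cv2.resize(image, (new_size, new_size))
    # scale x and width by sx, y and height by sy
    scaled = [[x * sx, y * sy, bw * sx, bh * sy] for x, y, bw, bh in boxes]
    return resized, scaled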
2. In your report, make a figure of a selection of images from your created dataset. You should plot at least 3 images from each of the three classes, like what is shown in Fig. 1, with the annotations of all the present foreground objects.

3.2.3 Building Your Deep Neural Network

1. Once you have prepared the dataset, you now need to implement your deep convolutional neural network (CNN) for multi-instance object classification and localization. You can directly base your CNN architecture on LOADnet2, adjusting for the YOLO parameters. Again, you have total freedom on what specific architecture you choose. You will need to use a beefed-up SkipBlock, as discussed in Sec. 2.2.

2. The key design choice you'll need to make is on the organization of the predicted parameters by your network. As you have learned in Prof. Kak's tutorial on Multi-Instance Object Detection [1], for any input image, your CNN should output a yolo_tensor.

3. The exact shape of your predicted yolo_tensor depends on how you choose to implement the image gridding and the anchor boxes. It is highly recommended that, before starting your own implementation, you review the tutorial again and familiarize yourself with the notions of the yolo_vector, which is predicted for each and every anchor box, and the yolo_tensor, which stacks all yolo_vectors.

4. In your report, designate a code block for your network architecture.

5. Additionally, clearly state the shape of your output yolo_tensor and explain in detail how that shape results from your design parameters, e.g. the total number of cells and the number of anchor boxes per cell, etc.

3.3 Training and Evaluating Your Network

Now that you have finished designing your deep CNN, it is finally time to put your glorious multi-cake detector into action. What is described in this section is probably the most challenging part of the homework. To train and evaluate your YOLO framework, you should follow the steps below:

1. Write your own dataloader. While everyone's implementation will differ, it is highly recommended that the following items be returned by your __getitem__ method for multi-instance object localization:

(a) The image tensor;
(b) For each foreground object present in the image:
    i. the index of the assigned cell;
    ii. the index of the assigned anchor box;
    iii. the ground-truth yolo_vector.

The tricky part here is how to best assign a cell and an anchor box given a GT bounding box. For this part, you will have to implement your own logic. Typically, one would start by finding the best cell and, subsequently, choose the anchor box with the highest IoU with the GT bounding box. You will need to pass on the indices of the chosen cell and anchor box for the calculation of the losses explained later in this section. It is also worth reminding yourself that the part of a yolo_vector concerning the bounding box should contain four parameters: δx, δy, σw and σh. The first two, δx and δy, are simply the offsets between the GT box center and the anchor box center, while the last two, σw and σh, can be the "ratios" between the GT and anchor widths and heights:

w_GT = exp(σw) · w_anchor,    h_GT = exp(σh) · h_anchor.

2. Create your own training code (or adjust existing code) for training your network. This time, you'll need three different types of losses: a binary cross-entropy loss for detecting objects, a cross-entropy loss for classifying objects, and another loss for refining the bounding box positions. (A minimal sketch of such a composite loss is given below.)
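A composite-loss sketch, assuming the three prediction heads and their targets have already been gathered for the responsible anchor boxes; the function name, the equal weighting, and the choice of MSE for the box term are all our assumptions:

import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # objectness: does this anchor box contain an object?
ce  = nn.CrossEntropyLoss()    # class label of the detected object (integer targets)
mse = nn.MSELoss()             # bounding-box parameters (dx, dy, sigma_w, sigma_h)

def yolo_loss(pred_obj, gt_obj, pred_cls, gt_cls, pred_box, gt_box):
    # in practice the three terms are often weighted differently
    return bce(pred_obj, gt_obj) + ce(pred_cls, gt_cls) + mse(pred_box, gt_box)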
3. Develop your own evaluation code (or make modifications to existing code). Unlike evaluating single-instance detectors, assessing the performance of a multi-instance detector can be more complex and may be beyond the scope of this homework assignment, as discussed in Prof. Kak's tutorial [1]. Therefore, for this assignment, we only require you to present your multi-instance detection and localization results in a qualitative manner. This means that for each test image, you should display the predicted bounding boxes and their corresponding class labels alongside the ground-truth annotations. Specifically, you'll need to create your own method for translating the predicted yolo_tensor into bounding-box predictions and class-label predictions that can be visually represented. You have the flexibility to implement this logic according to your own approach.

4. In your report, write several paragraphs summarizing how you have implemented your dataloading, training and evaluation logic. Additionally, include a plot of all three losses over the training iterations (you should train your network for at least 10-15 epochs). For presenting the outputs of your YOLO detector, display your multi-instance localization and detection results for at least 8 different images from the test set. Again, for a given test image, you should plot the predicted bounding boxes and class labels along with the GT annotations for all foreground objects. You should strive to present your best multi-instance results in at least 6 images, while you can use the other 2 images to illustrate the current shortcomings of your multi-instance detector. Additionally, you should include a paragraph that discusses the performance of your YOLO detector.

4 Submission Instructions

Include a typed report explaining how you solved the given programming tasks. For Checkpoint I and the final submission of HW6, you need to submit the following:

1. Your pdf must include a description of
• The figures and descriptions as mentioned in Sec. 3.
• For the final submission only, your source code. Make sure that your source code files are adequately commented and cleaned up.

2. Turn in a typed, self-contained pdf report with source code and results. Rename your .pdf file as hw6.pdf.

3. Turn in a zipped file; it should include all source code files (only .py files are accepted). Rename your .zip file as hw6.zip.

4. There will be separate submission links for Checkpoint I and HW6 on Brightspace.

5. Do NOT submit your network weights.

6. Do NOT submit your dataset.

7. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.

8. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.

9. The sample solutions from previous years are for reference only. Your code and final report must be your own work.

10. To help us better provide feedback to you, make sure to number your figures and tables.

References

[1] Multi-Instance Object Detection – Anchor Boxes and Region Proposals. URL https://engineering.purdue.edu/DeepLearn/pdf-kak/MultiDetection.pdf.
As you surely realized in HW4, the classification accuracy did not improve significantly when you increased the number of convolutional layers in HW4's class Net3. An important reason for that is the problem of vanishing gradients in deep networks.

The goal of this homework is for you to acquire some preliminary experience with a new network building-block element, known as a SkipBlock or a ResBlock, for mitigating the problem of vanishing gradients. Your doing so will make it a bit more efficient for your instructor to provide you with a thorough presentation of the topic next Tuesday. If you wish to see some instructional material on SkipBlocks before the class on Tuesday, feel free to look over your instructor's Week 6 slides at the course website.

2 Getting Ready for This Homework

Go through the following steps in order to get ready for this homework:

1. Open the main module file DLStudio/DLStudio.py in your text editor and search for the string: class SkipConnections.

2. Read the doc string associated with the class definition for the class SkipConnections.

3. Go over the implementation of the inner class SkipBlock. This will serve as the building block for the network you will be creating for this homework.

4. Now go back to the installation directory for DLStudio and open the main Examples subdirectory. There you will find the following script that you will find helpful for this homework. This script uses BMEnet, which is defined right after SkipBlock in the main DLStudio module file:

playing_with_skip_connections.py

The code you will be creating for this homework will be along the lines of what you see in the class BMEnet in DLStudio. If you so wish, you can simply create a larger version of BMEnet for your homework solution.

3.1 Building Your Deep Convolutional Neural Network

1. Copy over the SkipBlock mentioned above into your code and use it as a building block for creating your network. (A generic sketch of this kind of residual block is given at the end of this subsection.) What that means is that you will NOT be directly using torch.nn components like nn.Conv2d in your neural network. Instead, your network will consist of layers of SkipBlock, in the same manner as BMEnet is built from SkipBlock components.

2. Make sure that your network has at least 40 learnable layers. Since each SkipBlock uses two instances of nn.Conv2d, your network will need to contain at least 20 instances of SkipBlock. You can check the number of layers in your network with:

num_layers = len(list(net.parameters()))

Make sure you have properly commented your code and cited the sources of any code fragments you have borrowed. The report must mention the total number of learnable layers in your network. As previously mentioned, you may also directly use BMEnet as a starter for your classification network.

BMEnet class path: DLStudio → SkipConnections → BMEnet
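For orientation, here is a generic residual-block sketch in the spirit of SkipBlock; this is not DLStudio's exact implementation, just the standard pattern of two convolutions plus an identity shortcut:

import torch.nn as nn

class ResBlockSketch(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.bn2 = nn.BatchNorm2d(ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # the skip connection: gradients can flow through the addition
        return self.relu(out + identity)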
3.2 Training and Evaluating Your Trained Network

• Use the same training and evaluation pipelines that you created for HW4.

• You may also adapt the training and evaluation routines provided in DLStudio for the classification task.

• Collect the training loss for at least two different learning rates and include the plots in your report.

• Report the accuracy numbers and the confusion matrix for the different learning rates on the test dataset.

• State your observations regarding the classification performance of HW5Net in comparison with what you achieved previously with Net3 in HW4. Also attach your confusion matrix of Net3 from HW4.

• Optional: You may also attach the confusion matrix generated for the CIFAR dataset. State your observations in comparison to the COCO dataset performance.

4 Submission Instructions

Include a typed report explaining how you solved the given programming tasks.

1. Your pdf must include a description of
• The figures and descriptions as mentioned in Sec. 3.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.

2. Turn in a typed, self-contained pdf report with source code and results. Rename your .pdf file as hw5.pdf.

3. Turn in a zipped file; it should include all source code files (only .py files are accepted). Rename your .zip file as hw5.zip.

4. Do NOT submit your network weights.

5. Do NOT submit your dataset.

6. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.

7. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.

8. The sample solutions from previous years are for reference only. Your code and final report must be your own work.

9. To help us better provide feedback to you, make sure to number your figures and tables.
The goal of this homework is for you to develop a greater appreciation for the step-size optimization logic that is ubiquitous in training deep neural networks. To that end, this homework will first ask you to execute the scripts in the Examples directory of your instructor's CGP class that are based on a vanilla implementation of SGD (Stochastic Gradient Descent). Subsequently, you will be asked to improve the learning performance that can be achieved with those scripts by replacing the basic SGD step-size calculation with your own implementation based on what's known as the Adam optimizer.

The Adam optimizer involves two parameters that are commonly denoted β1 and β2. A second goal of this homework is for you to carry out hyperparameter tuning with respect to these two parameters. What that means is that you need to search through a designated range of values for these two parameters in order to find the values that give you the best performance. That begs the question: How does one measure the performance of a network for any given values of β1 and β2? For now, just use the least value of the loss achieved with N iterations of training. For further information regarding the concepts described above, please refer to Prof. Kak's slides on Autograd [1].

2 Becoming Familiar with the Primer

1. Download the tar.gz archive and install version 1.1.3 of your instructor's ComputationalGraphPrimer. You will be notified via Piazza or Brightspace if there are any version updates. You may not want to "sudo pip install" the Primer, since that would not give you the Examples directory of the distribution that you are going to need for the homework. The main documentation page for the Primer can be accessed through the following link:

https://engineering.purdue.edu/kak/distCGP/

2. Execute the following scripts in the Examples directory:

python3 one_neuron_classifier.py
python3 multi_neuron_classifier.py

The final output of both these scripts is a display of the training loss versus the training iterations.

3. Now, execute the following script in the Examples directory:

python3 verify_with_torchnn.py

If you did not make changes to the script in the Examples directory, the loss-vs-iterations graph that you will see is for a network that is a torch.nn version of the handcrafted network you get through the script multi_neuron_classifier.py. Compare visually the output you get with the above call with what you saw for the second script in Step 2.

4. Now make appropriate changes to the file verify_with_torchnn.py in order to see the torch.nn based output for the one-neuron model. The changes you need to make are mentioned in the documentation part of the file verify_with_torchnn.py. Again, compare visually the loss-vs-iterations for the one-neuron case with the handcrafted network vis-a-vis the torch.nn based network.

5. Now comes the somewhat challenging part of this homework: If you look at the code for the one-neuron and multi-neuron models in the Primer, you will notice that the step-size calculations do not use any optimizations. [For the one-neuron case, you can also see the backprop and update code on Slide 59 and, for the multi-neuron case, on Slide 80 of the Week 3 slides.] The implemented parameter update steps are based solely on the current value of the gradient of the loss with respect to the parameter in question. That is,

p_{t+1} = p_t − lr ∗ g_{t+1}    (1)

where p_t denotes the learnable parameters from the previous time step (e.g. the layer weights at iteration t), and g_{t+1} denotes the corresponding gradient for the current time step t+1.
It is your job to improve the estimation of p_{t+1} using the ideas discussed on Slides 105 through 117 of the Week 3 slides. In order to fully appreciate what that means, it is recommended that you carefully review the material on those slides [1].

As you will see in the slides mentioned above, the two major components of step-size optimization are: (1) using momentum; and (2) adapting the step sizes to the gradient values of the different parameters. (The latter is also referred to as dealing with sparse gradients.) Adam (Adaptive Moment Estimation) incorporates both of these components and currently stands as the world's most popular step-size optimizer. However, in some cases, practitioners choose SGD+ over Adam. Feel free to consult your TA to understand the reasons behind this choice. Also, feel free to initiate a conversation on Piazza over the same topic. What follows is a brief description of the two choices for the optimizer in order to help you do your homework.

• SGD with Momentum (SGD+): In its simplest form, incorporating momentum involves retaining the step size from the previous iteration. The current step-size decision is then based on the current gradient value and the preceding step size. To invoke momentum for step optimization, separate step updates are computed for the individual learnable parameters. This approach facilitates determining the current step size by considering both its prior value and the current gradient value. The recursive update formulas for the step size (v) and the parameters are:

v_{t+1} = µ ∗ v_t + g_{t+1},
p_{t+1} = p_t − lr ∗ v_{t+1}.    (2)

In the formulas shown, v is the step size, and the first equation is its recursive update formula. v_0 is typically initialized with all zeros. When determining the step size for the current iteration t+1, only a fraction µ of its value from the previous iteration is utilized. The momentum scalar µ ∈ [0, 1] determines the weight assigned to the previous time step's update. If you set µ = 0, it corresponds to vanilla gradient descent. Since this exercise aims for you to comprehend what goes on under the hood, state what variable you think µ corresponds to in the torch implementation of SGD. You can find the torch documentation of SGD at the following link:

https://pytorch.org/docs/stable/generated/torch.optim.SGD.html

• Adaptive Moment Estimation (Adam): Adam is one of the most widely used step-size optimizers for SGD in deep learning, owing to its efficiency and robust performance, especially on large datasets. The key idea behind Adam is a joint estimation of the momentum term and the gradient-adaptation term in the calculation of the step sizes. To this end, it keeps running averages of both the first and second moments of the gradients, and takes both moments into account when calculating the step size. The equations below demonstrate the key logic:

m_{t+1} = β1 ∗ m_t + (1 − β1) ∗ g_{t+1},
v_{t+1} = β2 ∗ v_t + (1 − β2) ∗ (g_{t+1})²,
p_{t+1} = p_t − lr ∗ m̂_{t+1} / (√(v̂_{t+1}) + ϵ),    (3)

where the definitions of the bias-corrected moments m̂ and v̂ can be found on Slide 115 of [1]. In practice, β1 and β2, which control the decay rates for the moments, are generally set to 0.9 and 0.99, respectively. (A compact sketch of both update rules is given below.)
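To make Eqs. (2) and (3) concrete, here is a minimal sketch of the two update rules for a single flat parameter vector; the hyperparameter values are illustrative only:

import numpy as np

lr, mu, beta1, beta2, eps = 1e-3, 0.9, 0.9, 0.99, 1e-8

def sgd_plus_step(p, g, v):
    v = mu * v + g                       # Eq. (2): momentum-smoothed step
    return p - lr * v, v

def adam_step(p, g, m, v, t):
    # t is the 1-based iteration index, needed for the bias corrections
    m = beta1 * m + (1 - beta1) * g      # running first moment
    v = beta2 * v + (1 - beta2) * g * g  # running second moment
    m_hat = m / (1 - beta1 ** t)         # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

In the Primer, you would apply such an update separately to each learnable parameter inside the backprop-and-update step.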
• Hyperparameter Tuning for the Adam Optimizer: Hyperparameter tuning is crucial in deep learning, as it involves optimizing the settings that control the learning process, impacting model performance. The right hyperparameter values can significantly enhance a model's accuracy, generalization, and ability to extract meaningful patterns from data. Effective tuning ensures that a model adapts well to diverse datasets and problem domains, ultimately leading to more robust and reliable models. This exercise aims to provide insights into the sensitivity of the Adam optimizer to changes in the β1 and β2 values and to enhance your understanding of hyperparameter tuning in deep learning.

3 Programming Task

• Your main programming task is two-fold: implementing SGD+ and Adam based on the basic SGD you see in one_neuron_classifier.py and multi_neuron_classifier.py. As explained in Section 2, Steps 1-4 are for you to become familiar with Version 1.1.3 of the Primer. Prof. Kak's slides on Autograd explain the basic logic of the implementation code for one_neuron_classifier.py and multi_neuron_classifier.py. More specifically, your programming task is to create new versions of the one-neuron and multi-neuron classifiers that are based on SGD+ as well as Adam.

• Note that for the implementation of both SGD+ and Adam, modifying the main module file ComputationalGraphPrimer.py is NOT recommended. Instead, you should create subclasses that inherit the ComputationalGraphPrimer class provided by the module. In your subclasses, create or override any class methods as your implementation requires. Also, it should be stressed that you are not allowed to use PyTorch's built-in SGD optimizer.

• Do include your observations on why the results with torch.nn are better. Also, discuss the effect of the β values in Eq. (3).

• Fig. 1 shows an example of the comparative plots from the one-neuron classifier. This plot is shown just to give you an idea of the improvement achieved by SGD+ over SGD. Your results could vary based on your choice of the parameters, such as the learning rate, µ, batch size, number of iterations, etc.

• In this final exercise, you will explore how the performance of the Adam optimizer is affected by two hyperparameters: β1 (for the exponential decay of the first-moment estimates) and β2 (for the exponential decay of the second-moment estimates). Using your own implementation of the Adam optimizer, train your network with 3 different values each for β1 and β2. For example, you can set β1 to [0.8, 0.95, 0.99] and β2 to [0.89, 0.9, 0.95]. Pick a reasonable value for N; you may continue with the same number of iterations as in the previous exercises. Now tabulate the time taken and the final and minimum losses in these nine different configurations. Based on your observations, state your conclusions about the impact of β1 and β2 on the Adam optimizer's performance.

Figure 1: Sample comparative plot (SGD+ vs SGD) for the one-neuron network. Your results could vary depending on your choice of the training parameters. All the plot-formatting options are flexible.

4 Submission Instructions

Include a typed report explaining how you solved the given programming tasks.

1. Do NOT include the downloaded CGP Primer folder or any datasets. Only submit the .py files you have modified. If you have made any changes to the CGP Primer, your code won't run on our test scripts. Please be warned, and adhere to the instructions in Sec. 3 on modifying the main module files.

2. Your pdf must include
• A description of both SGD+ and Adam in your own words, with the key equations.
• For the one-neuron classifier, a plot of training loss vs. iterations comparing all three optimizers (SGD, SGD+, Adam).
Another two sets of the same plot, but with two different learning rates of your choice covering a good spectrum from low to high learning rates. What are your observations in terms of loss smoothness and convergence?
• The same comparative plots with 3 different learning rates for the multi-neuron classifier; state your observations.
• A discussion of your findings comparing the performance of the three optimizers, in one or two paragraphs.
• A discussion of your findings comparing the performance of the Adam optimizer under the 9 configurations, in one or two paragraphs.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.

3. Turn in a typed, self-contained pdf report with source code and results. Rename your .pdf file as hw3.pdf.

4. Turn in a zipped file; it should include all source code files (only .py files are accepted). Rename your .zip file as hw3.zip.

5. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.

6. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.

7. The sample solutions from previous years are for reference only. Your code and final report must be your own work.

8. To help us better provide feedback to you, make sure to number your figures and tables.

References

[1] Autograd for Automatic Differentiation and for Auto-Construction of Computational Graphs. URL https://engineering.purdue.edu/DeepLearn/pdf-kak/AutogradAndCGP.pdf.
The two main goals of this homework:

1. To introduce you to the competition-grade and very famous COCO dataset of images. A more correct name for the COCO dataset is MS-COCO; the acronym stands for Microsoft Common Objects in COntext. It is frequently used by researchers to showcase the power of their neural networks for solving problems in image segmentation, classification, object detection, etc. You will be tasked to create your own image classification dataset as a relatively small subset of COCO. The dataset you create will consist of the images and their annotations, as described later in this homework.

2. Your second goal will be to write PyTorch code for a CNN (Convolutional Neural Network) for image classification. You will train the CNN using the dataset you will extract from COCO as mentioned above. Since this will be your first exercise in image classification, this part of the homework will also suggest that you develop your classification insights quickly by experimenting with DLStudio's inner class ExperimentsWithCIFAR, using the far simpler CIFAR dataset that consists of just 32 × 32 images.

2 Background

2.1 About the COCO Dataset

Owing to its rich annotations, the COCO dataset, first published in 2014 [3], continues to be a most important resource in deep learning. A recent, very famous paper from Meta presented a powerful neural network for image segmentation called Segment Anything (SAM) [2]. It was trained using the COCO dataset. With its versatile annotations, the dataset can be used to train networks for all kinds of tasks, including image classification, object detection, self-supervised learning, pose estimation, and more. To understand the motivations behind its inception and to appreciate the challenges faced in constructing the dataset, see the original paper [3] on COCO. You should at least read the Introduction section of the paper. For this homework, you will download a part of the full COCO dataset and familiarize yourself with the COCO API, which provides a convenient interface to the otherwise complicated annotation files. Finally, you will create your own image dataset for classification using the downloaded COCO files and the COCO API.

2.2 About the Image Classification Network You Will Write Code For

DLStudio's inner class ExperimentsWithCIFAR will serve as a sandbox for quickly coming up to speed on this part of the homework. The network you have to create is likely to be very similar to the two examples — Net and Net2 — shown in that part of DLStudio. After installing DLStudio, play with these two networks by changing the parameters of the convolutional and the fully connected layers and see what that does to the classification accuracy. Regarding DLStudio, download its zip archive from its main doc page, install the module, and go to its Examples directory to get hold of the script named below:

python playing_with_cifar10.py

(The CIFAR-10 dataset will be downloaded automatically when you run the script playing_with_cifar10.py. The CIFAR image dataset, made available by the University of Toronto, is considered to be the fruit-fly of DL. The dataset consists of 32 × 32 images, 50,000 for training and 10,000 for testing, that can easily be processed on your laptop. Just Google "CIFAR-10 dataset" for more information regarding the dataset.)

As with the DLStudio example code mentioned above, the classification network you will create will use a certain number of convolutional layers and, at its top, will contain one or more fully connected (FC) layers (also known as Linear layers). The number of output nodes at the final layer is equal to the number of image classes you will be working with, here 10. In your experiments with the classification network, pay attention to the changing resolution of the image tensor as it is pushed up the resolution hierarchy of a CNN.
This is particularly important when you are trying to estimate the number of nodes you need in the first fully connected layer at the top of the network. Depending on the sizes of the convolutional kernels you will use, you may also need to pay attention to the role played by padding in the convolutional layers.

3.1 Using COCO to Create Your Own Image Classification Dataset

Note: Don't be concerned about the initially large size of the complete dataset you download. You will utilize only a subset of the entire dataset, which you will extract following the provided instructions. Make sure to execute these steps accurately, save the subset data, and keep it for future assignments. Through this exercise, you will create a custom dataset which is a subset of the COCO dataset:

1. The first step is to install the COCO API in your conda environment. The Python version of the COCO API — pycocotools — provides the necessary functionality for loading the annotation JSON files and accessing images by class names. The pycocoDemo.ipynb demo available on the COCO API GitHub repository [1] is a useful resource for familiarizing yourself with the COCO API. You can install the pycocotools package with the following command (this command may change based on your version of conda; please check for the appropriate conda/pip command to install pycocotools):

conda install -c conda-forge pycocotools

2. Now, you need to download the image files and their annotations. The COCO dataset comes in 2014 and 2017 versions. For this homework, you will be using the 2017 Train images. You can download them directly from this page:

https://cocodataset.org/#download

On the same page, you will also need to download the accompanying annotation files: 2017 Train/Val annotations. Unzip the two archives you just downloaded.

3. Your main task is to use those files to create your own image classification dataset. Note that you can access the class labels of the images stored in the instances_train2017.json file using the COCO API. You have total freedom in how you organize your dataset as long as it meets the following requirements (one way of doing this is sketched at the end of this subsection):

• It should contain 1600 training and 400 validation images for each of the following five classes: [’boat’, ’couch’, ’dog’, ’hotdog’, ’motorcycle’]. This will amount to 8000 training images and 2000 validation images in total, and there should be no duplicates. All images should be taken from the 2017 Train images set you just downloaded.

• When saving your images to disk, resize them to 64 × 64. You may use the opencv and PIL modules to perform the above operations.

4. In your report, make a figure of a selection of images from your created dataset. You should plot at least 3 images from each of the five classes. Do NOT submit any dataset, original or custom, to Brightspace.
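One possible shape for the extraction script is sketched below; the directory layout, the de-duplication strategy, and all names are our assumptions, not a prescribed solution:

import os
import cv2
from skimage import io
from pycocotools.coco import COCO

coco = COCO('instances_train2017.json')
class_list = ['boat', 'couch', 'dog', 'hotdog', 'motorcycle']
seen = set()                                    # guards against duplicate images
for cls in class_list:
    catId = coco.getCatIds(catNms=[cls])[0]
    imgIds = [i for i in coco.getImgIds(catIds=[catId]) if i not in seen][:1600]
    seen.update(imgIds)
    os.makedirs(f'./data/train/{cls}', exist_ok=True)
    for n, meta in enumerate(coco.loadImgs(imgIds)):
        I = io.imread(meta['coco_url'])         # fetches the image over HTTP
        if I.ndim == 2:                         # grayscale to RGB
            I = cv2.cvtColor(I, cv2.COLOR_GRAY2RGB)
        I = cv2.resize(I, (64, 64))
        cv2.imwrite(f'./data/train/{cls}/{n}.jpg',
                    cv2.cvtColor(I, cv2.COLOR_RGB2BGR))  # cv2 writes BGR

The validation split would be produced the same way with a quota of 400 images per class.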
3.2 Image Classification using CNNs – Training and Validation

Once you have prepared the dataset, you now need to implement and test the following CNN tasks:

CNN Task 1: In the following network, you will notice that we are constructing instances of torch.nn.Conv2d in the mode in which it only uses the valid pixels for the convolutions. But, as you now know based on the Week 5 lecture (and slides), this is going to cause the image to shrink as it goes up the convolutional stack. Your first task is to run the network as shown. Let's call this single-layer CNN Net1. You may also modify the code by using nn.Sequential to implement it.

class HW4Net(nn.Module):
    def __init__(self):
        super(HW4Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.fc1 = nn.Linear(XXXX, 64)
        self.fc2 = nn.Linear(64, XX)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Note that the value of XXXX will vary for each CNN architecture, and finding this parameter for each CNN is part of your homework task. XX denotes the number of classes. In order to experiment with a network like the one shown above, your training routine can be as simple as:

net = net.to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    net.parameters(), lr=1e-3, betas=(0.9, 0.99))
epochs = 7
for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(train_data_loader):
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if (i + 1) % 100 == 0:
            print("[epoch: %d, batch: %5d] loss: %.3f"
                  % (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

where the variable net is an instance of HW4Net.

CNN Task 2: In the HW4Net class as shown, we used the torch.nn.Conv2d class without padding. In this task, construct instances of this class with padding. Specifically, add a padding of one to all the convolutional layers. Now calculate the loss again and compare it with the loss for the case when no padding was used. This is the second CNN architecture, Net2, for this homework.

CNN Task 3: So far, both Net1 and Net2 can only be considered very shallow networks. Now, in this task, we would like you to experiment with a deeper network. Modify the HW4Net class to chain at least 10 extra convolutional layers between the second conv layer and the first linear layer. Each new convolutional layer should have 32 in-channels, 32 out-channels, a kernel size of 3, and a padding of 1. In the forward() method, the output of each conv layer should be fed through an activation function before being passed into the next layer. Note that you would also need to update the value of XXXX accordingly. The resulting network will be the third CNN architecture — Net3.

Before you proceed further, identify the number of parameters in each of your networks.
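A minimal helper for that last step (the function name is ours):

def count_parameters(net):
    # total number of learnable parameters across all layers
    return sum(p.numel() for p in net.parameters() if p.requires_grad)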
Figure 1: Sample output — (a) training loss for the three CNNs; (b) sample confusion matrix. The plotting options are flexible. Your results could vary based on your choice of hyperparameters. The confusion matrix shown is for a different dataset and is for illustration only.

Note that in order to train and evaluate your CNNs, you will need to implement your own torch.utils.data.Dataset and DataLoader classes for loading the images and labels. This is similar to what you implemented in HW2.

For evaluating the performance of your CNN classifier, you need to write your own code for calculating the confusion matrix. For the dataset that you created, your confusion matrix will be a 5 × 5 array of numbers, with both the rows and the columns standing for the 5 classes in the dataset. The numbers in each row should show how the test samples corresponding to that class were correctly and incorrectly classified. You might find the scikit-learn and seaborn Python packages useful for this task. Fig. 1 shows a sample plot of the training loss and a sample confusion matrix. It's important to note that your own plots could vary based on your choice of hyperparameters.
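If you prefer to compute the matrix yourself rather than use scikit-learn, the core logic is just a tally; here preds and labels are assumed to be integer class indices collected over the validation set:

import numpy as np

def confusion_matrix(preds, labels, num_classes=5):
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for p, t in zip(preds, labels):
        cm[t, p] += 1   # rows: ground-truth class, columns: predicted class
    return cm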
In your report, you should include a figure that plots the training losses of all three networks together. Further, include the confusion matrix for each of the three networks on the validation set. Add a table with the net name, the corresponding number of parameters, and the classification accuracy. Finally, include your answers to the following questions:

1. Does adding padding to the convolutional layers make a difference in classification performance?

2. As you may know, naively chaining a large number of layers can result in difficulties in training. This phenomenon is often referred to as the vanishing gradient problem. Do you observe something like that in Net3?

3. Comparing the classification results of all three networks, which CNN do you think is the best performer?

4. By observing your confusion matrices, which class or classes do you think are more difficult to correctly differentiate, and why?

5. What is one thing that you propose to make the classification performance better?

4 Submission Instructions

Include a typed report explaining how you solved the given programming tasks.

1. Your pdf must include a description of
• The figures and descriptions as mentioned in Sec. 3.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.

2. Turn in a typed, self-contained pdf report with source code and results. Rename your .pdf file as hw4.pdf.

3. Turn in a zipped file; it should include all source code files (only .py files are accepted). Rename your .zip file as hw4.zip.

4. Do NOT submit your network weights.

5. Do NOT submit your dataset.

6. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.

7. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.

8. Regrade requests regarding failures to follow the instructions are not accepted.

9. The sample solutions from previous years are for reference only. Your code and final report must be your own work.

10. To help us better provide feedback to you, make sure to number your figures.

References

[1] COCO API – http://cocodataset.org/. URL https://github.com/cocodataset/cocoapi.

[2] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment Anything, 2023.

[3] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common Objects in Context, 2015.
The goal of this homework is to introduce you to the pieces needed to implement an image dataloader for training or testing your deep neural networks. To that end, this homework will help you familiarize yourself with the image representations provided by the PIL, NumPy, and PyTorch libraries. It will also make you more familiar with the idea of data augmentation. Upon completing this homework, you will be able to construct your own image dataloader for training deep neural networks using the torchvision library. For more information regarding the image representations and the usage of the torchvision library, you can refer to Prof. Kak's tutorial [5]. Note that this homework contains a "theory" task (Sec. 2) followed by the programming tasks (Sec. 3).

2 Understanding Pixel Value Scaling and Normalization

As you know from the Week 2 lecture by Prof. Kak, image data fundamentally consists of integers in the range 0 to 255, whereas what a neural network likes to see at its input are floating-point numbers between -1.0 and 1.0. Pixel-value scaling refers to mapping the integer values to the floating-point (0, 1.0) range, and pixel-value normalization refers to a further transformation of the pixel values so that they span the floating-point (−1.0, 1.0) range.

Your goal here is to compare manual pixel-value scaling, using the call in Line (12) on Slide 26, with the more "automated" pixel-value scaling provided by tvt.ToTensor, as shown in Lines (15) and (16) on Slide 28. For this comparative study, create two different versions of a simulated batch of images, as shown at the bottom of Slide 19: one in which the pixel values are limited to the range 0 through 32, and another in which the pixel values span the full one-byte range. Your first batch would be a simulation of a color photo recorded under conditions of poor illumination, and your second batch would be more or less the same as on Slide 19. For each batch, compare the values you get with the manual approach against the values you get with tvt.ToTensor, and report your results. If you wish, in each case you can follow pixel-value scaling with pixel-value normalization using the statements shown on Slide 34.

2.1 Try it yourself

Load the .npy file provided to you. You may investigate the image by checking the minimum and maximum values. Next, print the maximum value in the image. Finally, divide the given image by the max value and by 255. What do you observe?

3 Programming Tasks

3.1 Setting Up Your Conda Environment

Before writing any code, you will first need to set up an Anaconda [1] environment, in which you will install PyTorch and the other necessary packages. You should familiarize yourself with the basics of using conda for package management. Nonetheless, what is outlined below will help you get started:

1. A very useful cheatsheet on the conda commands can be found here [2].

2. If you are used to using pip, execute the following to download Anaconda:

sudo pip install conda

For alternatives to pip, follow the instructions here [3] for installation.

3. Create your ECE60146 conda environment:

conda create --name ece60146 python=3.10

4. Activate your new conda environment:

conda activate ece60146

5. Install the necessary packages (e.g. PyTorch, torchvision) for your solutions:

conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=10.2 -c pytorch

Note that the command above is specifically for a GPU-enabled installation of PyTorch version 1.10 and is only an example.
Depending on your own hardware specifications and the drivers installed, the command will vary. You can find more about such commands for installing PyTorch here [4]. Most issues regarding installation can be resolved through Stack Overflow solutions. While GPU capabilities are not required for this homework, you will need them for later homeworks.

6. After you have created the conda environment and installed all the dependencies, use the following command to export a snapshot of the package dependencies in your current conda environment:

conda env export > environment.yml

7. Submit your environment.yml file to demonstrate that your conda environment has been properly set up.

3.2 Becoming Familiar with torchvision.transforms

This task is about the data augmentation material on Slides 37 through 47 of the Week 2 slides on torchvision. Review those slides carefully and execute the following steps:

1. Take a photo of a laptop on a desk with your cellphone camera while you are standing directly in front of the object and the camera is pointing straight at it. Alternatively, you can take a picture of any flat-surfaced object, such as a book, tablet, or portrait, placed vertically on a desk or against a wall.

2. Take another photo of the same object, but this time from a very oblique angle — you may either move just the camera or your entire self to create this effect. Example: I have chosen a whiteboard leaning against a wall; the images in Figures 1a and 1b correspond to the front and oblique views.

Figure 1: Example of images of a flat-surfaced object to take — (a) front image; (b) oblique image.

3. Now experiment with applying the callable instance tvt.RandomAffine and the function tvt.functional.perspective(), which are mentioned on Slides 46 and 47 of Week 2, to see if you can transform one image into the other.

4. Note that for measuring the similarity between the two images of the object, you can measure the distance between the two corresponding histograms, as explained on Slides 65 through 73.

5. One possible way of solving this problem is to keep trying different affine (or projective) parameters in a loop until you find the parameters that make one image look reasonably similar to the other.

6. In your report, first plot your front and oblique images side-by-side. Subsequently, display your best transformed image — that is, the one most similar to the target image — using either the affine or the projective parameters. Also plot the final histograms of the images and report the Wasserstein distance. Explain in one or two paragraphs how you solved this task.

3.3 Creating Your Own Dataset Class

Now that you have become familiar with implementing transforms using torchvision, the next step is to learn how to create a custom dataset class, based on the torch.utils.data.Dataset class, for your own images. Your custom dataset class will store the meta information about your dataset and implement the method that loads and augments your images. The code snippet below provides a minimal example of a custom dataset within the PyTorch framework:

import random
import torch

class MyDataset(torch.utils.data.Dataset):

    def __init__(self, root):
        super().__init__()
        # Obtain meta information (e.g. list of file names)
        # Initialize data augmentation transforms, etc.
        pass

    def __len__(self):
        # Return the total number of images
        # (the number here is a placeholder only)
        return 100

    def __getitem__(self, index):
        # Read an image at index and perform augmentations
        # Return the tuple: (augmented tensor, integer label)
        # (these dimension numbers are for illustrative purposes only)
        return torch.rand((3, 256, 256)), random.randint(0, 10)
3.3 Creating Your Own Dataset Class

Now that you have become familiar with implementing transforms using torchvision, the next step is to learn how to create a custom dataset class, based on the torch.utils.data.Dataset class, for your own images. Your custom dataset class will store the meta information about your dataset and implement the method that loads and augments your images. The code snippet below provides a minimal example of a custom dataset within the PyTorch framework:

import random
import torch

class MyDataset(torch.utils.data.Dataset):

    def __init__(self, root):
        super().__init__()
        # Obtain meta information (e.g. list of file names)
        # Initialize data augmentation transforms, etc.
        pass

    def __len__(self):
        # Return the total number of images
        # (the number below is a placeholder only)
        return 100

    def __getitem__(self, index):
        # Read an image at index and perform augmentations
        # Return the tuple: (augmented tensor, integer label)
        # (these dimension numbers are for illustrative purposes only)
        return torch.rand((3, 256, 256)), random.randint(0, 10)

Before proceeding, take ten images of any object with your cellphone camera; you may wish to continue with the same object. Store them together within a single folder. Now, based on the code snippet above, implement a custom dataset class that handles your own images. More specifically, your __getitem__ method should:

1. Read from disk the image corresponding to the input index as a PIL image.
2. Subsequently, assuming that you are using your custom dataset to train a classifier, augment your image with any three different transforms of your choice that you think will make your classifier more robust. Note that a suitable transform could be either color-related or geometry-related. Note also that you should use tvt.Compose to chain your augmentations into a single callable instance.
3. Finally, return a tuple, with the first item being the tensor representation of your augmented image and the second the class label. For now, you can just use a random integer as your class label.

The code below demonstrates the expected usage of your custom dataset class:

# Based on the previous minimal example
my_dataset = MyDataset('./path/to/your/folder')
print(len(my_dataset))  # 100
index = 5
print(my_dataset[index][0].shape, my_dataset[index][1])
# torch.Size([3, 256, 256]) 6
index = 50
print(my_dataset[index][0].shape, my_dataset[index][1])
# torch.Size([3, 256, 256]) 8

In your report, for at least three of your own images, plot the original version side-by-side with its augmented version. Also briefly explain the rationale behind your chosen augmentation transforms.

3.4 Generating Data in Parallel

For reasons that will become clear later in this class, training a deep neural network in practice requires the training samples to be fed in batches. Since calling __getitem__ will return a single training sample, you now need to build a dataloader class that will yield a batch of training samples per iteration. More importantly, by using a dataloader, the loading and augmentation of your training samples is done efficiently in a multi-threaded fashion. For the programming part, wrap an instance of your custom dataset class within the torch.utils.data.DataLoader class so that your images for training can be processed in parallel and are returned in batches. In your report, set your batch size to 4 and plot all 4 images together from the same batch as returned by your dataloader. Additionally, compare and discuss the performance gain from using the multi-threaded DataLoader vs. just using the Dataset. First, record the time needed to load and augment 1000 random images in your dataset (with replacement) by calling my_dataset.__getitem__ 1000 times. Then, record the time needed by my_dataloader to process 1000 random images. Note that for this comparison to work, you should set both your batch_size and num_workers to values greater than 1. You must report the times for at least 2 different batch sizes and 2 different numbers of workers.
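A rough sketch of this timing comparison is shown below. It assumes MyDataset is the class implemented above and that your ten images live in ./path/to/your/folder (a placeholder path); the batch sizes and worker counts are only example settings.

import random
import time
import torch

def time_dataset(dataset, n=1000):
    # Call __getitem__ n times on random indices (sampling with replacement).
    start = time.perf_counter()
    for _ in range(n):
        dataset[random.randint(0, len(dataset) - 1)]
    return time.perf_counter() - start

def time_dataloader(dataset, batch_size, num_workers, n=1000):
    # Iterate the multi-worker DataLoader until roughly n samples have been produced.
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         num_workers=num_workers, shuffle=True)
    start = time.perf_counter()
    seen = 0
    while seen < n:
        for images, labels in loader:
            seen += images.shape[0]
            if seen >= n:
                break
    return time.perf_counter() - start

if __name__ == "__main__":   # the guard is needed for multi-worker loading on some platforms
    my_dataset = MyDataset("./path/to/your/folder")   # the class sketched above
    print("Dataset only:", time_dataset(my_dataset))
    for bs in (4, 16):
        for nw in (2, 4):
            print(f"DataLoader bs={bs}, nw={nw}:", time_dataloader(my_dataset, bs, nw))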
In your report, tabulate your findings on the timings and experiment with different settings of the batch_size and num_workers parameters. 3.5 Random seed Reproducibility is crucial in deep learning to ensure consistent results. In this section, we will explore the impact of setting random seeds on the 6 behavior of data loaders. First, without setting the seed, set the batch size to 2 and plot all 2 images together from the same batch as returned by your data loader with shuffle set to true. Plot only one batch with two images and exit the batch iterator. Now, rerun the iterator. Do you see the same two images in the first iteration? Why or why not? 1 batch_size = 2 2 dataloader = DataLoader ( my_dataset , batch_size = batch_size , shuffle = True ) 3 4 # Plot the first batch of images 5 for batch in dataloader : 6 images , labels = batch 7 # Plot images ( only two images for brevity ) 8 break 9 10 # Rerun the iterator 11 for batch in dataloader : 12 images , labels = batch 13 # Check if the same two images are in the first iteration 14 break Next, at the top just below your import library statements, set your random seed to ’60146’ (See Week 2 lecture’s slides 72 and 73). Follow the previous exercise of printing the images in the first batch only in two different iterations. What do you see now?4 Submission Instructions Include a typed report explaining how did you solve the given programming tasks. 1. Turn in a zipped file, it should include (a) a typed self-contained pdf report with source code and results and (b) source code files (ONLY .py files are accepted) (c) .yaml file of your conda environment Rename your .zip file as hw2 .zip and follow the same file naming convention for your pdf report too. 2. For all homeworks, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code.3. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous 7 submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading and will not accept related regrade requests. 4. The sample solutions from previous years are for reference only. Your code and final report must be your own work. 5. Your pdf must include a description of • Your explanation to the theory question as described in Sec.2. • Your observations in Sec. 2.1 • The various plots and descriptions as instructed by the subsections in Sec. 3 and 3.5. • Your source code. Make sure that your source code files are adequately commented and cleaned up. • To help better provide feedback, make sure to number your figures, tables and refer them accordingly in your reports. References [1] Anaconda, . URL https://www.anaconda.com/. [2] Conda Cheat Sheet, . URL https://docs.conda.io/projects/conda/ en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/condacheatsheet.pdf. [3] Conda Installation, . URL https://conda.io/projects/conda/en/ latest/user-guide/install/index.html. [4] Installing Previous Versions of PyTorch. URL https://pytorch.org/ get-started/previous-versions/. [5] Torchvision and Random Tensors. URL https://engineering. purdue.edu/DeepLearn/pdf-kak/Torchvision.pdf. 8
The goal of this homework is to improve your understanding of Python Object-Oriented (OO) code in general, especially with regard to how it is used in PyTorch. This is the only homework you will get on general Python OO programming. Future homework assignments will be specific to using PyTorch classes directly or your own extensions of those classes for creating your DL solutions. Note that you should use Python 3.x and NOT Python 2.x for this and all future programming assignments. For the Python-related knowledge required in this homework, refer to Prof. Kak's tutorial on OO Python [1].

1. Create a class named Sequence with an instance variable named array as shown below:

class Sequence(object):
    def __init__(self, array):
        self.array = array

The input parameter array is expected to be a list of numbers, e.g. [0, 1, 2]. This class will serve as the base class for the subclasses later in this assignment.

2. Now, extend your Sequence class into a subclass called Arithmetic, with its __init__ method taking in two input parameters: start and step. These two values will serve as the start and step of the arithmetic sequence.

3. Further expand your Arithmetic class to make its instances callable. More specifically, after calling an instance of the Arithmetic class with an input parameter length, the instance variable array should store an arithmetic sequence of that length, with start as your initial value and increments of step. In addition, calling the instance should cause the computed sequence to be printed. Shown below is a demonstration of the expected behaviour described so far:

AS = Arithmetic(start=1, step=2)
AS(length=5)  # [1, 3, 5, 7, 9]

4. Modify your class definitions so that your Sequence instance can be used as an iterator. For example, when iterating through an instance of Arithmetic, the numbers should be returned one by one. The snippet below illustrates the expected behavior:

AS = Arithmetic(start=1, step=2)
AS(length=5)  # [1, 3, 5, 7, 9]
print(len(AS))  # 5
print([n for n in AS])  # [1, 3, 5, 7, 9]

5. Make another subclass of the Sequence class named Geometric. As the name suggests, the new class is identical to Arithmetic except that the array now stores a geometric sequence. Modify the class definition so that its instance is callable and can be used as an iterator. What is shown below illustrates the expected behavior:

GS = Geometric(start=1, ratio=2)
GS(length=8)  # [1, 2, 4, 8, 16, 32, 64, 128]
print(len(GS))  # 8
print([n for n in GS])  # [1, 2, 4, 8, 16, 32, 64, 128]

6. Finally, modify the base class Sequence such that two sequence instances of the same length can be compared by the operator ==. Invoking (A == B) should compare the two arrays element-wise and return the number of elements in A that are equal to the corresponding elements in B. If the two arrays are not of the same size, your code should throw a ValueError exception. Shown below is an example:

AS = Arithmetic(start=1, step=2)
AS(length=5)  # [1, 3, 5, 7, 9]
GS = Geometric(start=1, ratio=2)
GS(length=5)  # [1, 2, 4, 8, 16]
print(AS == GS)  # 1

GS(length=8)  # [1, 2, 4, 8, 16, 32, 64, 128]

print(AS == GS)  # will raise an error
# Traceback (most recent call last):
#   ...
# ValueError: Two arrays are not equal in length!
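The sketch below shows one possible way these classes could be organized; it is only an illustration under the behavior specified above, and your own design (e.g. where the iterator state lives, or how printing is handled) may reasonably differ.

# A minimal sketch of one possible arrangement of the three classes.
class Sequence(object):
    def __init__(self, array):
        self.array = array

    def __len__(self):
        return len(self.array)

    def __iter__(self):
        # Yield the stored numbers one by one.
        return iter(self.array)

    def __eq__(self, other):
        if len(self.array) != len(other.array):
            raise ValueError("Two arrays are not equal in length!")
        # Count the positions at which the two arrays agree (an int, per the task spec).
        return sum(a == b for a, b in zip(self.array, other.array))


class Arithmetic(Sequence):
    def __init__(self, start, step):
        super().__init__([])
        self.start, self.step = start, step

    def __call__(self, length):
        self.array = [self.start + i * self.step for i in range(length)]
        print(self.array)


class Geometric(Sequence):
    def __init__(self, start, ratio):
        super().__init__([])
        self.start, self.ratio = start, ratio

    def __call__(self, length):
        self.array = [self.start * self.ratio ** i for i in range(length)]
        print(self.array)

Note that returning a count from __eq__ (rather than a bool) is unusual Python, but it is what the comparison behavior above asks for.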
3 Submission Instructions

Include a typed report explaining how you solved the given programming tasks. You may refer to previous homeworks for an outline.

1. Turn in a zipped file; it should include (a) a typed self-contained pdf report with source code and results and (b) source code files (only .py files are accepted). Rename your .zip file as hw1 .zip and follow the same file naming convention for your pdf report too. Not adhering to the above naming convention will lead to an automatic zero.
2. For this homework, you are encouraged to use .ipynb for development and the report. If you use .ipynb, please convert it to .py and submit that as source code. Do NOT submit .ipynb notebooks.
3. You can resubmit a homework assignment as many times as you want up to the deadline. Each submission will overwrite any previous submission. If you are submitting late, do it only once on BrightSpace. Otherwise, we cannot guarantee that your latest submission will be pulled for grading, and we will not accept related regrade requests.
4. The sample solutions from previous years are for reference only. Your code and final report must be your own work.
5. Your pdf must include a description of
• Reproductions of the outputs for each of the provided snippets above with the given parameters.
• Correct outputs for each of the provided snippets above with input parameters of your choice.
• Your source code. Make sure that your source code files are adequately commented and cleaned up.

References
[1] Python OO for DL. URL https://engineering.purdue.edu/DeepLearn/pdf-kak/PythonOO.pdf.
In this assignment, you will work with a simple web crawler to measure aspects of a crawl, study the characteristics of the crawl, download web pages from the crawl and gather webpage metadata, all from pre-selected news websites.To begin we will make use of an existing open source Java web crawler called crawler4j. This crawler is built upon the open source crawler4j library which is located on github. For complete details on downloading and compiling see Also see the document “Instructions for Installing Eclipse and Crawler4j” located on the Assignments web page for help. Note: You can use any IDE of your choice. But we have provided installation instructions for Eclipse IDE onlyYour task is to configure and compile the crawler and then have it crawl a news website. In the interest of distributing the load evenly and not overloading the news servers, we have pre-assigned the news sites to be crawled according to your USC ID number, given in the table below. The maximum pages to fetch can be set in crawler4j and it should be set to 20,000 to ensure a reasonable execution time for this exercise. Also, maximum depth should be set to 16 to ensure that we limit the crawling.You should crawl only the news websites assigned to you, and your crawler should be configured so that it does not visit pages outside of the given news website! USC ID ends with News Sites to Crawl Ne wsS ite Na me Root URL 01~20 NY Times nytimes https://www.nytimes.com 21~40 Wall Street Journal wsj https://www.wsj.com 41~60 Fox News foxnews https://www.foxnews.com 61~80 USA Today usatoday https://www.usatoday.com 81~00 Los Angeles Times latimes https://www.latimes.com Limit your crawler so it only visits HTML, doc, pdf and different image format URLs and record the meta data for those file types 2Your primary task is to enhance the crawler so it collects information about: 1. the URLs it attempts to fetch, a two column spreadsheet, column 1 containing the URL and column 2 containing the HTTP/HTTPS status code received; name the file fetch_NewsSite.csv (where the name “NewsSite” is replaced by the news website name in the table above that you are crawling). The number of rows should be no more than 20,000 as that is our pre-set limit. Column names for this file can be URL and Status 2. the files it successfully downloads, a four column spreadsheet, column 1 containing the URLs successfully downloaded, column 2 containing the size of the downloaded file (in Bytes, or you can choose your own preferred unit (bytes,kb,mb)), column 3 containing the # of outlinks found, and column 4 containing the resulting content-type; name the file visit_NewsSite.csv; clearly the number of rows will be less than the number of rows in fetch_NewsSite.csv 3. all of the URLs (including repeats) that were discovered and processed in some way; a two column spreadsheet where column 1 contains the encountered URL and column two an indicator of whether the URL a. resides in the website (OK), or b. points outside of the website (N_OK). (A file points out of the website if its URL does not start with the initial host/domain name, e.g. when crawling USA Today news website all inside URLs must start with .)Name the file urls_NewsSite.csv. This file will be much larger than fetch_*.csv and visit_*.csv. 
For example for New York Times- the URL and the URL are both considered as residing in the same website whereas the following URL is not considered to be in the same website, http://store.nytimes.com/ Note1: you should modify the crawler so it outputs the above data into three separate csv files; you will use them for processing later; Note2: all uses of NewsSite above should be replaced by the name given in the column labeled NewsSite Name in the table on page 1.Note 3: You should denote the units in size column of visit.csv. The best way would be to write the units that you are using in column header name and let the rest of the size data be in numbers for easier statistical analysis. The hard requirement is only to show the units clearly and correctly. Based on the information recorded by the crawler in the output files above, you are to collate the following statistics for a crawl of your designated news website: ● Fetch statistics: o # fetches attempted: The total number of URLs that the crawler attempted to fetch. This is usually equal to the MAXPAGES setting if the crawler reached that limit; less if the website is smaller than that. o # fetches succeeded: The number of URLs that were successfully downloaded in their entirety, i.e. returning a HTTP status code of 2XX. o # fetches failed or aborted: The number of fetches that failed for whatever reason, including, but not limited to: HTTP 3 redirections (3XX), client errors (4XX), server errors (5XX) and other network-related errors.1 ● Outgoing URLs: statistics about URLs extracted from visited HTML pages o Total URLs extracted: The grand total number of URLs extracted (including repeats) from all visited pages o # unique URLs extracted:The number of unique URLs encountered by the crawler o # unique URLs within your news website: The number of unique URLs encountered that are associated with the news website, i.e. the URL begins with the given root URL of the news website, but the remainder of the URL is distinct o # unique URLs outside the news website: The number of unique URLs encountered that were not from the news website. ● Status codes: number of times various HTTP status codes were encountered during crawling, including (but not limited to): 200, 301, 401, 402, 404, etc.● File sizes: statistics about file sizes of visited URLs – the number of files in each size range (See Appendix A). o 1KB = 1024B; 1MB = 1024KB ● Content Type: a list of the different content-types encountered These statistics should be collated and submitted as a plain text file whose name is CrawlReport_NewsSite.txt, following the format given in Appendix A at the end of this document. Make sure you understand the crawler code and required output before you commence collating these statistics. For efficient crawling it is a good idea to have multiple crawling threads. You are required to use multiple threads in this exercise. crawler4j supports multi-threading and our examples show setting the number of crawlers to seven (see the line in the code int numberOfCrawlers = 7;).However, if you do a naive implementation the threads will trample on each other when outputting to your statistics collection files. Therefore you need to be a bit smarter about how to collect the statistics, and crawler4j documentation has a good example of how to do this. 
See both of the following links for details: and https://github.com/yasserg/crawler4j/blob/master/crawler4j-examples/crawler4j-examplesbase/src/test/java/edu/uci/ics/crawler4j/examples/localdata/LocalDataCollectorCrawler.java All the information that you are required to collect can be derived by processing the crawler output. 5. FAQ Q: For the purposes of counting unique URLs, how to handle URLs that differ only in the query string? For example: https://www.nytimes.com/page?q=0 and https://www.nytimes.com/page?q=1 1 Based purely on the success/failure of the fetching process. Do not include errors caused by difficulty in parsing content after it has already been successfully downloaded. 4 A: These can be treated as different URLs. Q: URL case sensitivity: are these the same, or different URLs? https://www.nytimes.com/foo and https://www.nytimes.com/FOO A: The path component of a URL is considered to be case-sensitive, so the crawler behavior is correct according to RFC3986.Therefore, these are different URLs. The page served may be the same because: ● that particular web server implementation treats path as case-insensitive (some server implementations do this, especially windows-based implementations) ● the web server implementation treats path as case-sensitive, but aliasing or redirect is being used. This is one of the reasons why deduplication is necessary in practice. Q: Attempting to compile the crawler results in syntax errors. A: Make sure that you have included crawler4j as well as all its dependencies. Also check your Java version; the code includes more recent Java constructs such as the typed collection List which requires at least Java 1.5.0.Q: I get the following warnings when trying to run the crawler: log4j: WARN No appenders could be found for logger log4j: WARN Please initialize the log4j system properly. A: You failed to include the log4j.properties file that comes with crawler4j. Q: On Windows, I am encountering the error: Exception_Access_Violation A: This is a Java issue. See: Q: I am encountering multiple instances of this info message: INFO [Crawler 1] I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond INFO [Crawler 1] Retrying request A: If you’re working off an unsteady wireless link, you may be battling network issues such as packet losses – try to use a better connection. If not, the web server may be struggling to keep up with the frequency of your requests.As indicated by the info message, the crawler will retry the fetch, so a few isolated occurrences of this message are not an issue. However, if the problem repeats persistently, the situation is not likely to improve if you continue hammering the server at the same frequency. Try giving the server more room to breathe: 5 /* * Be polite: Make sure that we don’t send more than * 1 request per second (1000 milliseconds between requests). */ config.setPolitenessDelay(2500); /* * READ ROBOTS.TXT of the website – Crawl-Delay: 10 * Multiply that value by 1000 for millisecond value */ Q: The crawler seems to choke on some of the downloaded files, for example: java.lang.StringIndexOutOfBoundsException: String index out of range: -2 java.lang.NullPointerException: charsetName A: Safely ignore those. 
We are using a fairly simple, rudimentary crawler and it is not necessarily robust enough to handle all the possible quirks of heavy-duty crawling and parsing.These problems are few in number (compared to the entire crawl size), and for this exercise we’re okay with it as long as it skips the few problem cases and keeps crawling everything else, and terminates properly – as opposed to exiting with fatal errors. Q: While running the crawler, you may get the following error: SLF4J: Failed to load class “org.slf4j.impl.StaticLoggerBinder”. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See for further details. A. Download slf4j-simple-1.7.25.jar from add this as an external JAR to the project in the same way as the crawler-4j JAR will make the crawler display logs now. Q: What should we do with URL if it contains comma ? A: Replace the comma with “-” or “_”, so that it doesn’t throw an error. Q: Should the number of 200 codes in the fetch.csv file have to exactly match with the number of records in the visit.csv? A: No, but it should be close, like within 2,000 of 20,000. If not then you may be filtering too much. Q: “CrawlConfig cannot be resolved to a type” ? A: import edu.uci.ics.crawler4j.crawler.CrawlConfig; make sure the external jars are added to ClassPath.ModulePath only contains the JRE. If it doesn’t work, check if standard JRE imports are working. Or using an alternative way: Using maven. Initialize a new project using maven and 6 then just add a crawler4j dependency in the pom.xml file (auto-generated by maven). The dependency is given in Crawler4j github page. Q: What’s the difference between aborted fetches and failed fetches? A: failed: Can be due to HTTP errors and other network related errors aborted: Client decided to stop the fetching. (ex: Taking too much time to fetch) You may sum up both the values and provide the combined result in the write up. Q: For some reason my crawler attempts 19,999 fetches, even though max pages is set to 20,000, does this matter? A: No, it doesn’t matter. It can occur because 20,000 is the limit that you will try to fetch (it may contain successful status code like 200 and other like 301). But the visit.csv will contain only the URL’s for which you are able to successfully download the files. Q: How to differentiate fetched pages and downloaded pages? A: In this assignment we do not ask you to save any of the downloaded files to the disk. Visiting a page means crawler4j processing a page (it will parse the page and extract relevant information like outgoing URLs ).That means all visited pages are downloaded. You must make sure that your crawler crawls both http and https pages of the given domain Q: How much time should it approximately take to crawl a website using n crawlers? A: (i) Depends on your parameters set for the crawler (ii) Depends on the politeness you set in the crawler program Your crawl time in hours = maxPagesToFetch / 3600 * politeness delay in seconds Example: a 20,000 page fetch with a politeness delay of 2 seconds will take 11.11 hours. That is assuming you are running enough threads to ensure a page fetch every 2 seconds. Therefore, it can vary for everyone. Q: For the third CSV file, urls_NewSite.csv, should the discovered URLs include redirect URLs? A: YES, if the redirect URL is the one that gets status code 300, then the URL that redirects the URL to point to will be added to the scheduler of the crawler and waits to be visited. Q: When the URL ends with “/”, what needs to be done? 
A: You should filter using content type. Please have a peek into Crawler 4j code located at You will get a hint on how to know the content type of the page, even if the extension is not explicitly mentioned in the URL Q: Eclipse keeps crashing after a few minutes of running my code. But when I reduce the no of pages to fetch, it works fine. A: Increase heap size for eclipse using this. Q: What if a URL has an unknown extension? A: Please check the content type of the page if it has an unknown extension Q: Why do some links return True in shouldVisit() but cannot be visited by Visit()? A: shouldVisit() function is used to calculate whether the page should be visited or not. It may or may not be a visitable page. 7 For example – If you are crawling the site http://viterbi.usc.edu/, the page http://viterbi.usc.edu/mySamplePage.html should be visited. but this page may return a 404 Not Found Error or it may be redirected to some other site like http://mysamplesite.com. In this case, shouldVisit() function would return true because the page should be visited but visit() will not be called because the page cannot be visited. Comment: has details on regular expressions that you need to take care. Comment: Since many newspaper websites dump images and other types of media on CDN, your crawl may only encounter html files. That is fine. Comment: File types css,js,json and others should not be visited. E.g. you can add .json to your pattern filter. If the extension does not appear, use !page.getContentType().contains(“application.json”) Comment: Some sites may have less than the 20,000 pages, but as long as the formula matches. i,e # fetches attempted = # fetches succeeded + # fetches aborted + # fetches failed your homework is ok. However, the variation should not be more than 10% away from the limit as it is an indication that something is wrong. Scenario: My visit.csv file has about 15 URLs lesser than the number of URLs with status code 200. It is fine if the difference is less than 10%. Comment: the homework description states that you only need to consider HTML, doc, pdf and different image format URLs . But you should also consider URL’s with no extension as they may return a file of one of the above types. Comment: The distinction between failed and aborted web pages. failed: Can be due to content not found, HTTP errors or other network related errors aborted: the client (the crawler) decided to stop the fetching. (ex: Taking too much time to fetch). You may sum up both the values and provide the combined result in the write up. Q: In the visit_NewsSite.csv, do we also need to chop “charset=utf-8” from content-type? Or just chop “charset=urf-8” in the report? A: You can chop Encoding part(charset=urf-8) in all places.Q: REGARDING STATISTICS A: #unique URLs extracted = #unique URLs within + #unique URLs outside #total urls extracted is the sum of #outgoing links. #total urls extracted is the sum of all values in column 3 of visit.csv For text/html files, find the number of out links. For non-text/html files, the number should be 0. Q: How to handle pages with NO Extension 8 A: Use getContentType() in visit() and don’t rely just on extension. If the content type returned is not one of the required content types for the assignment, you should ignore it for any calculation of the statistics. This will probably result in more rows in visit.csv, but it’s acceptable according to the grading guidelines. 
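As an aside to the statistics-related answers above: although the crawler itself is written in Java, collating the final report numbers is just post-processing of the three CSV files. The sketch below (in Python, purely as an illustration) assumes the column layouts described earlier (fetch: URL, Status; visit: URL, Size, # Outlinks, Content-Type; urls: URL, OK/N_OK indicator) and a header row in each file; adapt it to however your own crawler writes its output.

import csv
from collections import Counter

def read_rows(path):
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)            # skip the header row
        return list(reader)

fetch = read_rows("fetch_NewsSite.csv")
visit = read_rows("visit_NewsSite.csv")
urls = read_rows("urls_NewsSite.csv")

status_counts = Counter(row[1] for row in fetch)
succeeded = sum(c for s, c in status_counts.items() if s.startswith("2"))
print("# fetches attempted:", len(fetch))
print("# fetches succeeded:", succeeded)
print("# fetches failed or aborted:", len(fetch) - succeeded)

print("Total URLs extracted:", sum(int(row[2]) for row in visit))
unique = {row[0] for row in urls}
inside = {row[0] for row in urls if row[1] == "OK"}
print("# unique URLs extracted:", len(unique))
print("# unique URLs within News Site:", len(inside))
print("# unique URLs outside News Site:", len(unique - inside))

print("Status codes:", status_counts)
print("Content types:", Counter(row[3] for row in visit))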
Q: Clarification on “the URLs it attempts to fetch” A: “The URLs it attempts to fetch” means all the URLs crawled from start seed which reside in the news website and has the required media types. Note #1: Extracted urls do not have to be added to visit queue. Some of them which satisfy a requirement (e.g : content type, domain, not duplicate) will be added to visit queue. But others will be dumped by the crawler. However, as long as the grading guideline is satisfied, we will not deduct points.Note#2: : 303 could be considered aborted. 404 could be considered failed. To summarize: we consider a request to be aborted if the crawler decides to terminate that request. Client-side timeout is an example. Requests can fail due to reasons like content not found, server errors, etc. Note#3: Fetch statistics: # fetches attempted: The total number of URLs that the crawler attempted to fetch. This is usually equal to the MAXPAGES setting if the crawler reached that limit; less if the website is smaller than that. # fetches succeeded: The number of URLs that were successfully downloaded in their entirety, i.e. returning a HTTP status code of 2XX. # fetches failed or aborted: The number of fetches that failed for whatever reason, including, but not limited to: HTTP redirections (3XX), client errors (4XX), server errors (5XX) and other network-related errors. Note#4: Consider fetches failed and aborted as same similar to as mentioned in Note#3 Note#5: Hint on crawling pages other than html Look for how to turn ON the Binary Content in Crawling in crawler4j. Make sure you are not just crawling the html parsed data and not the binary data which includes file types other than html. Search on the internet on how to crawl binary data and I am sure you will get something on how to parse pages other than html types. There will be pages other than html in almost every news site so please make sure you crawl them properly. Q: Regarding the content type in visit_NewsSite.csv, should we display “text/html;charset=UTF8” or chop out the encoding and write “text/html” in the Excel sheet ?A: ONLY TEXT/HTML, ignore rest. Q: Should we limit the URLs that the crawler attempted to fetch within the news domain? e.g. if we encounter we should skip fetching by adding constraints in “shouldVisit()”? But do we need to include it in urls_NewsSite.csv? A: Yes, you need to include every encountered url in urls_NewsSite.csv. 9 Q: All 3xx,4xx, 5xx should be considered as aborted? A: YES Q: Are “cookie” domains considered as an original newsite domain ? A: NO, they should not be included as part of the newsite you are crawling.For details see https://web.archive.org/web/20200418163316/https://www.mxsasha.eu/blog/2014/03/04/definiti ve-guide-to-cookie-domains/ Q. More about statistics A: visit.csv will contain the urls which are succeeded i.e. 200 status code with known/ allowed content types. Fetch.csv will include all the urls which are been attempted to fetch i.e. with all the status codes. fetch.csv entries will be = visit.csv entries (with 2xx status codes) + entries with status codes other than 2XX visit.csv = entries with 2XX status codes. Also, you should not and it is not necessary to use customized status code. Just use the status code what the webpage returns to you. (Note:-> fetch.csv should have urls from news site domain only) Q: do we need to check content-type for all the extracted URLs, i.e. url.csv or just for visited URLs, e.g. those in visit.csv? 
A: only those in visit_NewsSite.csv Q: How to get the size of the downloaded file? A: It will be the size of the page. Ex – for an image or pdf, it will be the size of the image or the pdf, for the html files, it will be the size of the file. The size should be in bytes (or kb, mb etc.). (page.getContentData().length) Q: Change logging level in crawler4j? A: If you are using the latest version of Crawler4j, logging can be controlled through logback.xml. You can view the github issue thread for knowing more about the logback configurations – . Q: Crawling urls only yield text/html. I have only filtered out css|js|mp3|zip|gz, But all the visited urls have return type text/html is it fine? Or is there a problem? A: It is fine. Some websites host their asset files (images/pdfs) on another CDN, and the URL for the same would be different from www.newssite.com, so you might only get html files for that news site. Q: Eclipse Error: Provider class org.apache.tika.parser.external.CompositeExternalParser not in module I’m trying to follow the guide and run the boiler plate code, but eclipse gives this error when I’m trying to run the copy pasted code from the installation guide A: Please import crawler4j jars in ClassPath and not ModulePath, while configuring the build in Eclipse. Q: Illegal State Exception Error 10 A: 1) if you are using a newest java version, I would downgrade to 8 to use, there are some sort of a similar issue with the newest java version. 2) carefully follow the instructions on Crawler4jinstallation.pdf 3) make sure to add the jar file to the CLASS PATH 4) if any module is missing, download from google and add to the prj class path. Q: /data/crawl error Exception in thread “main” java.lang.Exception: couldn’t create the storage folder: /data/crawl does it already exist ? at edu.uci.ics.crawler4j.crawler.CrawlController.(CrawlController.java:84) at Controller.main(Controller.java:20) A: Replace the path /data/crawl in the Controller class code with a location on your machine Q: Do we need to remove duplicate urls in fetch.csv (if exists)? A: Crawler4j already handles duplication checks so you don’t have to handle it. It doesn’t crawl pages that have already been visited. Q: Error in Controller.java- “Unhandled exception type Exception” A: Make sure Exception Handling is taken care of in the code. Since CrawlController class throws exception, so it needs to be handled inside a try-catch block. Q: Crawler cannot stop – when I set maxFetchPage to 20000, my script cannot stop and keeps running forever. I have to kill it by myself. However, it looks like that my crawler has crawled all the 20000 pages but just cannot end. A:Set a reasonable maxDepthofCrawling, Politeness Delay, setSocketTimeout(), and Number of crawlers in the Controller class, and retry. Also ensure there are no System.out.print() statements running inside the Crawler code. Q: If you are in countries that have connection problems. A: We would suggest you to visit https://itservices.usc.edu/vpn/ for more information. Enable the VPN, clear the cache, restart the computer should help solve the problem.6. 
Submission Instructions ● Save your statistics report as a plain text file and name it based on the news website domain names assigned below: USC ID ends with Site 01~20 CrawlReport_nytimes.txt 21~40 CrawlReport_wsj.txt 41~60 CrawlReport_foxnews.txt 61~80 CrawlReport_usatoday.txt 11 81~00 CrawlReport_latimes.txt ● Also include the output files generated from your crawler run, using the extensions as shown above: o fetch_NewsSite.csv o visit_NewsSite.csv ● Do NOT include the output files o urls_NewsSite.csv where _NewSite should be replaced by the name from the table above. ● Do not submit Java code or compiled programs; it is not required. ● Compress all of the above into a single zip archive and name it: crawl.zip Use only standard zip format. Do NOT use other formats such as zipx, rar, ace, etc. For example the zip file might contain the following three files: 1. CrawlReport_nytimes.txt, (the statistics file) 2. fetch_nytimes.csv 3. visit_nytimes.csv ● Please upload your homework to your Google Drive CSCI572 folder, in the subfolder named hw2 Appendix A Use the following format to tabulate the statistics that you collated based on the crawler outputs.Note: The status codes and content types shown are only a sample. The status codes and content types that you encounter may vary, and should all be listed and reflected in your report. Do NOT lump everything else that is not in this sample under an “Other” heading. You may, however, exclude status codes and types for which you have a count of zero. Also, note the use of multiple threads. You are required to use multiple threads in this exercise. CrawlReport_NewsSite.txt 12 Name: Tommy Trojan USC ID: 1234567890 News site crawled: nytimes.com Number of threads: 7 Fetch Statistics ================ # fetches attempted: # fetches succeeded: # fetches failed or aborted: Outgoing URLs: ============== Total URLs extracted: # unique URLs extracted: # unique URLs within News Site: # unique URLs outside News Site: Status Codes: ============= 200 OK: 301 Moved Permanently: 401 Unauthorized: 403 Forbidden: 404 Not Found: File Sizes: =========== < 1KB: 1KB ~
This exercise is about comparing the search results from Google versus different search engines. Many search engine comparison studies have been done. All of them use samples of data, some small and some large, so no definitive conclusions can be drawn. But it is always instructive to see how two search engines match up, even on a small data set.The process you will follow is to issue a set of queries and to evaluate how closely the results of the two search engines compare. You will compare the results from the search engine that you are assigned to with the results from Google (provided by us on the class website).To begin, the class is divided into four groups. Students are pre-assigned according to their USC ID number, as given in the table below.Note: Please stick with the assigned dataset and search engine according to your ID number. PLEASE don’t work on another dataset and later ask for an exception. USC ID ends with Query Data Set Google Reference Dataset Assigned Search Engine 00~24 100QueriesSet1 Google_Result1.json Bing 25~49 100QueriesSet2 Google_Result2.json Yahoo! 50~74 100QueriesSet3 Google_Result3.json Ask 75~99 100QueriesSet4 Google_Result4.json DuckDuckGoTHE QUERIES The queries will be given to you in a text file, one query per line. Each file contains 100 queries. These are actual queries extracted from query log files of several search engines. Here is a sample of some of the queries: The European Union includes how many countries What are Mia Hamms accomplishments Which form of government is still in place in Greece When was the canal de panama built What color is the black box on commercial airplanesNote: Some of the queries will include misspellings; you should not alter the queries in any way as this accurately reflects the type of query data that search engines have to deal with REFERENCE GOOGLE DATASETA Google Reference JSON1 file is given which contains the Google results for each of the queries in your dataset. The JSON file is structured in the form of a query as the key and a list of 10 results as the value for that key (each a particular URL representing a result). The Google 1 JSON, JavaScript object Notation is a file format used to transmit data objects consisting of key-value pairs. It is programming language independent. https://www.softwaretestinghelp.com/json-tutorial/ 2 results for a specific query are ordered as they were returned by Google. Namely the 1st element in the list represents the top result that was scraped from Google, the 2nd element represents the second result, and so on. 
Example:

{
  "A two dollar bill from 1953 is worth what": [
    "http://www.antiquemoney.com/old-two-dollar-bill-value-price-guide/two-dollarbank-notes-pictures-prices-history/prices-for-two-dollar-1953-legal-tenders/",
    "https://oldcurrencyvalues.com/1953_red_seal_two_dollar/",
    "https://www.silverrecyclers.com/blog/1953-2-dollar-bill.aspx",
    "https://www.ebay.com/b/1953-A-2-Dollar-Bill/40033/bn_7023293545",
    "https://www.ebay.com/b/1953-2-US-Federal-Reserve-SmallNotes/40029/bn_71222817",
    "https://coinsite.com/why-the-1953-2-dollar-bill-has-a-red-seal/",
    "https://hobbylark.com/collecting/Value-of-Two-Dollar-Bills",
    "https://www.quora.com/What-is-the-value-of-a-2-dollar-bill-from-1953",
    "https://www.reference.com/hobbies-games/1953-2-bill-worth-c778780b24b9eb8a",
    "https://treasurepursuits.com/1953-2-dollar-bill-value-whats-it-worth/"
  ]
}

DETERMINING OVERLAP AND CORRELATION

Overlap: Since the Google results are taken as our baseline, it will be interesting to see how many identical results are returned by your assigned search engine, regardless of their position. Assuming Google's results are the standard of relevance, the percentage of identical results will act as a measure of the quality of your assigned search engine. Each of the queries in your dataset should be run on your assigned search engine. You should capture the top ten results. Only the resulting URL is required. For each of the top ten results for each query you should compute an overlap score between our reference Google answer dataset and your scraped results. The output format is described ahead.

Note: If you get fewer than 10 URLs for a particular query, you can just use those results to compare against the Google results. For example: if a query gets 6 results from a search engine, just use those 6 results to compare against the 10 results of the Google reference dataset and produce statistics for that particular query.

Note: For a given query, if the Google result has 10 URLs, but the other search engine has fewer results (e.g. 8), and there are 5 overlapping URLs, the percent overlap would be 5/10.

Correlation: In statistics, Spearman's rank correlation coefficient, or Spearman's rho, is a measure of the statistical dependence between the rankings of two variables. It assesses how well the relationship between two variables can be described. Intuitively, the Spearman correlation between two variables will be high when observations have a similar rank, and low when observations have a dissimilar rank. The rank coefficient rs can be computed using the formula

rs = 1 − (6 Σ di²) / (n(n² − 1))

where
● di is the difference in the two rankings, and
● n is the number of observations.

Note: The formula above, when applied to search results, yields a somewhat modified set of values that can be greater than one or less than minus one. However, the sign of the Spearman correlation indicates the direction of association between the two rank variables. If the rank results of one search engine are near the ranks of the other, then the Spearman correlation value is positive. If the rank of one is dissimilar to the rank of the other, then the Spearman correlation value will be negative.

Note: In the event that your search engine account enables personalized search, please turn this off before performing your tests.

Example 1.1: "Who discovered x-rays in 1885"

GOOGLE RESULTS
1. https://explorable.com/wilhelm-conrad-roentgen
2. https://www.the-scientist.com/foundations/the-first-x-ray-1895-42279
3. https://www.bl.uk/learning/cult/bodies/xray/roentgen.html
4. https://en.wikipedia.org/wiki/Wilhelm_R%C3%B6ntgen
5. https://www.wired.com/2010/11/1108roentgen-stumbles-x-ray/
6. https://www.history.com/this-day-in-history/german-scientist-discovers-x-rays
7. https://www.aps.org/publications/apsnews/200111/history.cfm
8. https://www.nde-ed.org/EducationResources/CommunityCollege/Radiography/Introduction/history.htm
9. https://www.dw.com/en/x-ray-vision-an-accidental-discovery-that-revolutionized-medicine/a-18833060
10. http://www.slac.stanford.edu/pubs/beamline/25/2/25-2-assmus.pdf

RESULTS FROM ANOTHER SEARCH ENGINE
1. https://explorable.com/wilhelm-conrad-roentgen
2. https://www.history.com/this-day-in-history/german-scientist-discovers-x-rays
3. https://www.coursehero.com/file/p5jkhl/Discovery-of-X-rays-In-1885-Wilhem-Rontgen-while-studying-the-characteristics/
4. http://www.nde-ed.org/EducationResources/HighSchool/Radiography/discoveryxrays.htm
5. https://www.answers.com/Q/Who_discovered_x-rays
6. https://www.aps.org/publications/apsnews/200111/history.cfm
7. https://www.answers.com/Q/Who_discovered_x-rays
8. https://www.coursehero.com/file/p5jkhl/Discovery-of-X-rays-In-1885-Wilhem-Rontgen-while-studying-the-characteristics/
9. https://www.wired.com/2010/11/1108roentgen-stumbles-x-ray/
10. http://time.com/3649842/x-ray/

RANK MATCHES FROM GOOGLE AND ANOTHER SEARCH ENGINE
1 and 1
5 and 9
6 and 2
7 and 6

We are now ready to compute Spearman's rank correlation coefficient.

Rank Google   Rank Other Srch Engine   di   di²
1             1                         0    0
5             9                        -4   16
6             2                         4   16
7             6                         1    1

The sum of di² = 33. The value of n = 4. Substituting into the equation:

1 − ((6 × 33) / (4 × 15)) = 1 − 3.3 = −2.30

Even though we have four overlapping results (40% overlap), their positions in the search result list produce a negative Spearman coefficient indicating that the overlapping results are uncorrelated. Clearly the two search engines are using different algorithms for weighting and ranking the documents they determine are most relevant to the query. Moreover their algorithms are emphasizing different ranking features.

Note: the value of n in the equation above refers to the number of URL matches (in this case, four) and does not refer to the original number of results (in this case, ten).

Note: If n = 1 (which means only one paired match), we deal with it in a different way:
1. if Rank in your result = Rank in Google result → rho = 1
2. if Rank in your result ≠ Rank in Google result → rho = 0

Task 1: Scraping results from your assigned search engine

In this task you need to develop a script (computer program) that could scrape the top 10 results from your assigned search engine. You may use any language of your choice.
Always incorporate random delay between 10 to 100 seconds while scraping multiple queries, else you may be blocked off by the search engine and they may not allow you to scrape results for several hours.For reference: ● https://pypi.org/project/beautifulsoup4, a python library for parsing HTML documents ● URLs for the search engines: ○ Bing: http://www.bing.com/search?q= ○ Yahoo!: http://www.search.yahoo.com/search?p= ○ Ask: http://www.ask.com/web?q= ○ DuckDuckGo: https://www.duckduckgo.com/html/?q= For each URL, you can add your query string after q= ● Selectors for various search engines, you grab links by looking for href in these selectors: ○ Bing: [“li”, attrs = {“class” : “b_algo”}] ○ Yahoo!: [“a”, attrs = {“class” : “ac-algo fz-l ac-21th lh-24”}] ○ Ask: [“div”, attrs = {“class” : “PartialSearchResults-item-title”}] ○ DuckDuckGo: [“a”, attrs = {“class” : “result__a”}]By executing this task you need to generate a JSON file which will store your results in the JSON format described above and repeated here. { Query1: [Result1, Result2, Result3, Result4, Result5, Result6, Result7, Result8, Result9, Result10], Query2: [Result1, Result2, Result3, Result4, Result5, Result6, Result7, Result8, Result9, Result10], …. Query100: [Result1, Result2, Result3, Result4, Result5, Result6, Result7, Result8, Result9, Result10] } 6 Here Result1 is the top result for that particular query. NOTE: In the JSON shown above, query string should be used as keys.Task2: Determining the Percent Overlap and the Spearman Coefficient For this task, you need to use the JSON file that you generated in Task 1 and the Google reference dataset which is provided by us and compare the results as shown in the Determining Correlation section above. The output should be a CSV file with the following information: 1. Use the JSON file that you generated in Task 1 and do the following steps on each query: 2. Determine the URLs that match with the given reference Google dataset, and their position in the search engine result list 3. Compute the percent of overlap. In Example1.1, above the percent overlap is 4/10 or 40%.4. Compute the Spearman correlation coefficient. In above Example1.1, the coefficient is -2.30. 5. Once you run all of the queries, collect all of the top ten URLs and compute the statistics, as shown in the following example:Note: The above example is a table with four columns, rows containing results for each of the queries, and averages for each of the columns. Of course the actual values above are only for demonstration purposes. The first column should contain “Query 1”, “Query 2” … “Query 100” and should not be replaced by actual queries.Points to note: ● Always incorporate a delay while scraping. We recommend that you use a random delay with a range of 10 to 100 seconds. ● You will likely be blocked off from the search engine if you do not implement some delay in your code. ● You should ignore the People Also Ask boxes and any carousels that may be included in the results. ● You should ignore Ads and scrape only organic resultsSUBMISSION INSTRUCTIONSPlease place your homework in your Google Drive CSCI572 folder that is shared with your grader, in the subfolder named hw1. You need to submit: ● JSON file generated in Task 1 while scraping your assigned search engine, call it hw1.json ● CSV file of final results after determining relevance between your assigned search engine and Google reference dataset provided by us, call it hw1.csv. 
Note: you need not format the numbers.

● TXT file stating why the assigned search engine performed either better/worse/same as Google, call it hw1.txt. For the txt file, we are just looking for a paragraph which states how similar your assigned search engine is to Google based on the Spearman coefficients and percent overlap. Make sure you clearly state the "average percent overlap" and the "average Spearman coefficient" over all queries in the file.

SAMPLE SCRAPING PROGRAM IN PYTHON

Here is a program you can use to help you get started:

from bs4 import BeautifulSoup
import time
import requests
from random import randint
from html.parser import HTMLParser

USER_AGENT = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

class SearchEngine:
    @staticmethod
    def search(query, sleep=True):
        if sleep:  # Prevents loading too many pages too soon
            time.sleep(randint(10, 100))
        temp_url = '+'.join(query.split())  # for adding + between words for the query
        url = 'SEARCHING_URL' + temp_url
        soup = BeautifulSoup(requests.get(url, headers=USER_AGENT).text, "html.parser")
        new_results = SearchEngine.scrape_search_result(soup)
        return new_results

    @staticmethod
    def scrape_search_result(soup):
        raw_results = soup.find_all("SEARCH SELECTOR")
        results = []
        # implement a check to get only 10 results and also check that URLs must not be duplicated
        for result in raw_results:
            link = result.get('href')
            results.append(link)
        return results

############# Driver code ############
SearchEngine.search("QUERY")
######################################

FAQs

1. What do I need to run Python on my Windows/Mac machine?
You can refer to the documentation for setup: https://docs.python.org/3.6/using/index.html
We encourage you to use Python 3.6. You can find many tutorials on Google.

2. Given that Python is installed, what lines of the sample program do I have to modify to get it to work on a specific search engine?
In the reference code, you need to:
● Supply the query variable
● Change SEARCHING_URL and SEARCH SELECTOR as per the search engine that is assigned to you
● Implement the code that extracts only the top 10 URLs and make sure that none of them is repeated
● Implement the main function

3. What to do if the query does not produce ten results.
You can modify the URLs to get 30 results on a single page:
– For Bing use count=30 – http://www.bing.com/search?q=test&count=30
– For Yahoo use n=30 – https://search.yahoo.com/search?p=test&n=30
– For Ask there does not appear to be a parameter which could produce n results on a single page, so instead you can update the URL in a manner that increments the page number
– For Ask use page=2 – https://www.ask.com/web?q=testn&page=2
– If, after trying the above hints, you are unable to get 10 results for a particular query, you can just use those results to compare against the Google results. For example: if a query gets 6 results from a search engine, just use those 6 results to compare against the 10 results of the Google reference dataset and produce statistics for that particular query.

4. Two URLs that differ only in the scheme (http versus https) can be treated as the same.

5. Metrics for similar URLs:
a. As browsers default to www when no host name is provided, xyz.com is identical to www.xyz.com
b. URLs that only differ in the scheme (http or https) are identical
c. www.xyz.com and www.xyz.com/ – you need to remove the slash (/) at the end of the URL
d. URLs should NOT be converted to lower case.

6. Value of rho:
a. If no overlap, rho = 0
b. If only one result matches:
   i. if Rank in your result = Rank in Google result → rho = 1
   ii. if Rank in your result ≠ Rank in Google result → rho = 0

7. Rho value may be negative
a. The maximum value of rho is 1, but it may have negative values that are smaller than -1.
b. How to calculate average rho? We calculate rho for each query and sum them up. Then we take the average.

8. Save order as JSON
a. You can save the dictionary in Python as JSON directly by importing the json library and calling json.dump(args).

9. What to do if a search engine blocks your IP:
a. Try to change USER_AGENT and try again.
b. Sometimes if you are hitting a URL in quick succession, it may block the IP. Put a sleep or wait after each query.
c. Run the queries in batches to prevent an IP ban.
d. Use a different WiFi or mobile hotspot.
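The sketch below shows one possible way to carry out Task 2. It assumes hw1.json (your scraped results) and the assigned Google reference file (Google_Result1.json is used as a placeholder name) are in the working directory, normalizes URLs per the rules in point 5 above (ignore the scheme, a leading "www.", and a trailing slash), applies the n = 0 and n = 1 special cases from point 6, and uses placeholder column headings in the output CSV.

import csv
import json

def normalize(url):
    # Ignore scheme, leading "www.", and a trailing slash when matching URLs.
    url = url.strip().rstrip("/")
    url = url.replace("https://", "").replace("http://", "")
    if url.startswith("www."):
        url = url[len("www."):]
    return url

with open("Google_Result1.json") as f:
    google = json.load(f)
with open("hw1.json") as f:
    mine = json.load(f)

with open("hw1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Queries", "Number of Overlapping Results",
                     "Percent Overlap", "Spearman Coefficient"])
    for i, query in enumerate(google, start=1):
        g = [normalize(u) for u in google[query]]
        m = [normalize(u) for u in mine.get(query, [])]
        # 1-based ranks of the URLs that appear in both result lists.
        pairs = [(g.index(u) + 1, m.index(u) + 1) for u in g if u in m]
        n = len(pairs)
        if n == 0:
            rho = 0.0
        elif n == 1:
            rho = 1.0 if pairs[0][0] == pairs[0][1] else 0.0
        else:
            d2 = sum((a - b) ** 2 for a, b in pairs)
            rho = 1 - (6 * d2) / (n * (n * n - 1))
        writer.writerow([f"Query {i}", n, 100.0 * n / len(g), rho])

On the Example 1.1 data above, this procedure reproduces the 40% overlap and the −2.30 coefficient.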
[1.a] (5 marks) Calculate and plot the convolution of x[n] and h[n] specified below:

x[n] = 1 for −3 ≤ n ≤ 3, and 0 otherwise
h[n] = 1 for −2 ≤ n ≤ 2, and 0 otherwise        (1)

[1.b] (5 marks) Calculate and plot the convolution of x[n] and h[n] specified below:

x[n] = 1 for −3 ≤ n ≤ 3, and 0 otherwise
h[n] = 2 − |n| for −2 ≤ n ≤ 2, and 0 otherwise        (2)

We define a system as something that takes an input signal, e.g. x(n), and produces an output signal, e.g. y(n). Linear Time-Invariant (LTI) systems are a class of systems that are both linear and time-invariant. In linear systems, the output for a linear combination of inputs is equal to the linear combination of the individual responses to those inputs. In other words, for a system T, signals x1(n) and x2(n), and scalars a1 and a2, system T is linear if and only if:

T[a1 x1(n) + a2 x2(n)] = a1 T[x1(n)] + a2 T[x2(n)]

Also, a system is time-invariant if a shift in its input merely shifts the output; i.e. if T[x(n)] = y(n), system T is time-invariant if and only if:

T[x(n − n0)] = y(n − n0)

[2.a] (5 marks) Consider a discrete linear time-invariant system T with discrete input signal x(n) and impulse response h(n). Recall that the impulse response of a discrete system is defined as the output of the system when the input is an impulse function δ(n), i.e. T[δ(n)] = h(n), where:

δ(n) = 1 if n = 0, and 0 otherwise.

Prove that T[x(n)] = h(n) ∗ x(n), where ∗ denotes the convolution operation. Hint: represent signal x(n) as a function of δ(n).

[2.b] (5 marks) Is Gaussian blurring linear? Is it time-invariant? Make sure to include your justifications.

[2.c] (5 marks) Is time reversal, i.e. T[x(n)] = x(−n), linear? Is it time-invariant? Make sure to include your justifications.

Vectors can be used to represent polynomials. For example, the 3rd-degree polynomial (a3 x³ + a2 x² + a1 x + a0) can be represented by the vector [a3, a2, a1, a0]. If u and v are vectors of polynomial coefficients, prove that convolving them is equivalent to multiplying the two polynomials they each represent. Hint: You need to assume proper zero-padding to support the full-size convolution.

The Laplace operator is a second-order differential operator in the n-dimensional Euclidean space, defined as the divergence (∇·) of the gradient (∇f). Thus if f is a twice-differentiable real-valued function, then the Laplacian of f is defined by:

∆f = ∇²f = ∇ · ∇f = Σ_{i=1}^{n} ∂²f / ∂x_i²

where the latter notations derive from formally writing:

∇ = (∂/∂x_1, . . . , ∂/∂x_n).

Now, consider a 2D image I(x, y) and its Laplacian, given by ∆I = I_xx + I_yy. Here the second partial derivatives are taken with respect to the directions of the variables x, y associated with the image grid for convenience. Show that the Laplacian is in fact rotation invariant. In other words, show that ∆I = I_rr + I_r′r′, where r and r′ are any two orthogonal directions. Hint: Start by using polar coordinates to describe a chosen location (x, y). Then use the chain rule.

Using the sample code provided in Tutorial 2, examine the sensitivity of the Canny edge detector to Gaussian noise. To do so, take an image of your choice and add i.i.d. Gaussian noise to each pixel. Analyze the performance of the edge detector as a function of the noise variance. Include your observations and three sample outputs (corresponding to low, medium, and high noise variances) in the report.
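A possible sketch of the Canny noise-sensitivity experiment is shown below, assuming OpenCV (cv2) as in Tutorial 2; the image path, the Canny thresholds, and the three noise levels are placeholders that you would replace with your own choices.

import cv2
import numpy as np

img = cv2.imread("your_image.jpg", cv2.IMREAD_GRAYSCALE)

for sigma in (5, 20, 50):   # low, medium, high noise standard deviations (variance = sigma**2)
    noise = np.random.normal(0.0, sigma, img.shape)
    noisy = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    edges = cv2.Canny(noisy, 100, 200)   # fixed thresholds so only the noise level changes
    cv2.imwrite(f"canny_sigma_{sigma}.png", edges)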
For each step (excluding Step I) you are supposed to test your implementation on the provided image, and also on one image of your own choice. Include the results in your report.

Step I – Gaussian Blurring (10 marks): Implement a function that returns a 2D Gaussian matrix for a given input size and scale σ. Please note that you should not use any of the existing libraries to create the filter, e.g. cv2.getGaussianKernel(). Moreover, visualize this 2D Gaussian matrix for two choices of σ with appropriate filter sizes. For the visualization, you may consider a 2D image with a colormap, or a 3D graph. Make sure to include the color bar or axis values.

Step II – Gradient Magnitude (10 marks): In the lectures, we discussed how partial derivatives of an image are computed. We know that the edges in an image come from sudden changes of intensity, and one way to capture that sudden change is to calculate the gradient magnitude at each pixel. The edge strength or gradient magnitude is defined as:

g(x, y) = |∇f(x, y)| = sqrt(gx² + gy²)

where gx and gy are the gradients of the image f(x, y) along the x- and y-axis directions, respectively. Using the Sobel operator, gx and gy can be computed as:

gx = [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]] ∗ f(x, y)    and    gy = [[−1, −2, −1], [0, 0, 0], [1, 2, 1]] ∗ f(x, y)

Implement a function that receives an image f(x, y) as input and returns its gradient magnitude g(x, y) as output using the Sobel operator. You are supposed to implement the convolution required for this task from scratch, without using any existing libraries.

Step III – Threshold Algorithm (20 marks): After finding the image gradient, the next step is to automatically find a threshold value so that edges can be determined. One algorithm to automatically determine an image-dependent threshold is as follows:

1. Let the initial threshold τ0 be equal to the average intensity of the gradient image g(x, y), as defined below:

τ0 = (Σ_{j=1}^{h} Σ_{i=1}^{w} g(i, j)) / (h × w)

where h and w are the height and width of the image under consideration.

2. Set the iteration index i = 0, and categorize the pixels into two classes, where the lower class consists of the pixels whose gradient magnitudes are less than τ0, and the upper class contains the rest of the pixels.

3. Compute the average gradient magnitudes mL and mH of the lower and upper classes, respectively.

4. Set the iteration i = i + 1 and update the threshold value as:

τi = (mL + mH) / 2

5. Repeat steps 2 to 4 until |τi − τi−1| ≤ ϵ is satisfied, where ϵ → 0; take τi as the final threshold and denote it by τ.

Once the final threshold is obtained, each pixel of the gradient image g(x, y) is compared with τ. A pixel with gradient magnitude higher than τ is considered an edge point and is represented as a white pixel; otherwise, it is designated as black. The edge-mapped image E(x, y) thus obtained is:

E(x, y) = 255 if g(x, y) ≥ τ, and 0 otherwise

Implement the aforementioned threshold algorithm. The input to this algorithm is the gradient image g(x, y) obtained from Step II, and the output is a black-and-white edge-mapped image E(x, y). (A minimal sketch of Steps I–III appears after Step IV below.)

Step IV – Test (10 marks): Use the image provided along with this assignment, and also one image of your choice, to test all the previous steps (I to III) and to visualize your results in the report. Convert the images to grayscale first. Please note that the input to each step is the output of the previous step.
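Referring back to Steps I–III: a minimal NumPy sketch of one possible structure. The function names (gaussian_kernel, conv2d, gradient_magnitude, iterative_threshold) are assumptions for illustration, not the required interface, and the naive convolution is written for clarity rather than speed:

import numpy as np

def gaussian_kernel(size, sigma):
    # Step I: build a size x size Gaussian filter from scratch (no cv2.getGaussianKernel).
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()                           # normalize so the filter sums to 1

def conv2d(image, kernel):
    # Naive from-scratch 2D convolution with zero padding ("same" size, odd kernels).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    flipped = np.flip(kernel)                    # convolution flips the kernel
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * flipped)
    return out

def gradient_magnitude(image):
    # Step II: Sobel gradients and their magnitude.
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
    gx = conv2d(image, sobel_x)
    gy = conv2d(image, sobel_y)
    return np.sqrt(gx**2 + gy**2)

def iterative_threshold(g, eps=1e-3):
    # Step III: start from the mean gradient and refine until the change is below eps.
    tau = g.mean()
    while True:
        lower, upper = g[g < tau], g[g >= tau]
        if lower.size == 0 or upper.size == 0:   # degenerate case: stop refining
            return tau
        new_tau = (lower.mean() + upper.mean()) / 2.0
        if abs(new_tau - tau) <= eps:
            return new_tau
        tau = new_tau

# Example usage (image would be a grayscale array loaded elsewhere):
# blurred = conv2d(image, gaussian_kernel(7, sigma=1.5))
# g = gradient_magnitude(blurred)
# tau = iterative_threshold(g)
# edge_map = np.where(g >= tau, 255, 0).astype(np.uint8)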
In a brief paragraph, discuss how the algorithm works for these two examples and highlight its strengths and/or weaknesses.

In Gaussian pyramids, the image at each level Ik is constructed by blurring the image at the previous level Ik−1 and downsampling it by a factor of 2. A Laplacian pyramid, on the other hand, consists of the difference between the image at each level (Ik) and the upsampled version of the image at the next level of the Gaussian pyramid (Ik+1).

Given an image of size 2^n × 2^n denoted by I0, and its Laplacian pyramid representation denoted by L0, …, Ln−1, show how we can reconstruct the original image using the minimum information from the Gaussian pyramid. Specify the minimum information required from the Gaussian pyramid and a closed-form expression for reconstructing I0. Hint: The reconstruction follows a recursive process; what is the base case that contains the minimum information?

Show that in a fully connected neural network with linear activation functions, the number of layers has effectively no impact on the network. Hint: Express the output of the network as a function of its inputs and its layer weights.

Consider a neural network that represents the following function:

ŷ = σ(w5 σ(w1 x1 + w2 x2) + w6 σ(w3 x3 + w4 x4))

where xi denotes the input variables, ŷ is the output variable, and σ is the logistic function:

σ(x) = 1 / (1 + e^(−x)).

Suppose the loss function used for training this neural network is the L2 loss, i.e. L(y, ŷ) = (y − ŷ)². Assume that the network has its weights set as:

(w1, w2, w3, w4, w5, w6) = (−0.65, −0.55, 1.74, 0.79, −0.13, 0.93)

[3.a] (5 marks) Draw the computational graph for this function. Define appropriate intermediate variables on the computational graph.

[3.b] (5 marks) Given an input data point (x1, x2, x3, x4) = (1.2, −1.1, 0.8, 0.7) with a true label of 1.0, compute the partial derivative ∂L/∂w3 using the back-propagation algorithm. Indicate the partial derivatives of your intermediate variables on the computational graph. Round all your calculations to 4 decimal places. Hint: For any vector (or scalar) x, we have ∂/∂x (||x||²₂) = 2x. Also, you do not need to write any code for this question! You can do it by hand. (If you want to sanity-check your hand calculation, see the finite-difference sketch below.)

In this problem, our goal is to estimate the computation overhead of CNNs by counting the FLOPs (floating point operations). Consider a convolutional layer C followed by a max pooling layer P. The input of layer C has 50 channels, each of which is of size 12×12. Layer C has 20 filters, each of which is of size 4×4. The convolution padding is 1 and the stride is 2. Layer P performs max pooling over each of C's output feature maps, with 3×3 local receptive fields and stride 1.

Given scalar inputs x1, x2, …, xn, we assume:
• A scalar multiplication xi · xj accounts for one FLOP.
• A scalar addition xi + xj accounts for one FLOP.
• A max operation max(x1, x2, …, xn) accounts for n − 1 FLOPs.
• All other operations do not account for FLOPs.

How many FLOPs do layers C and P conduct in total during one forward pass, with and without accounting for bias?

The following CNN architecture is one of the most influential architectures presented in the 90s. Count the total number of trainable parameters in this network. Note that the Gaussian connections in the output layer can be treated as a fully connected layer similar to F6.

For backpropagation in a neural network with a logistic activation function, show that, in order to compute the gradients, as long as we have the outputs of the neurons there is no need for the inputs. Hint: Find the derivative of a neuron's output with respect to its inputs.
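For [3.b], a minimal numerical cross-check (purely optional; the question asks for a hand calculation). It evaluates the forward function defined above and approximates ∂L/∂w3 with a central difference; the helper names (forward, loss) are illustrative:

import math

def forward(w, x):
    # y_hat = sigma(w5*sigma(w1*x1 + w2*x2) + w6*sigma(w3*x3 + w4*x4))
    sigma = lambda z: 1.0 / (1.0 + math.exp(-z))
    h1 = sigma(w[0] * x[0] + w[1] * x[1])
    h2 = sigma(w[2] * x[2] + w[3] * x[3])
    return sigma(w[4] * h1 + w[5] * h2)

def loss(w, x, y):
    # L2 loss: (y - y_hat)^2
    return (y - forward(w, x)) ** 2

w = [-0.65, -0.55, 1.74, 0.79, -0.13, 0.93]
x = [1.2, -1.1, 0.8, 0.7]
y = 1.0

# Central-difference approximation of dL/dw3 (index 2 holds w3).
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[2] += eps
w_minus[2] -= eps
dL_dw3 = (loss(w_plus, x, y) - loss(w_minus, x, y)) / (2 * eps)
print("Approximate dL/dw3:", round(dL_dw3, 4))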
One alternative to the logistic activation function is the hyperbolic tangent function:

tanh(x) = (1 − e^(−2x)) / (1 + e^(−2x)).

• (a) What is the output range of this function, and how does it differ from the output range of the logistic function?
• (b) Show that its gradient can be formulated as a function of the logistic function.
• (c) When do we want to use each of these activation functions?

In this question, we train (or fine-tune) a few different neural network models to classify dog breeds. We also investigate their dataset bias and cross-dataset performance. All the tasks should be implemented using Python with a deep learning package of your choice, e.g. PyTorch or TensorFlow.

We use two datasets in this assignment:
1. Stanford Dogs Dataset
2. Dog Breed Images

The Stanford Dogs Dataset (SDD) contains over 20,000 images of 120 different dog breeds. The annotations available for this dataset include class labels (i.e. dog breed names) and bounding boxes. In this assignment, we will only be using the class labels. Further, we will only use a small portion of the dataset (as described below) so you can train your models on Colab. Dog Breed Images (DBI) is a smaller dataset containing images of 10 different dog breeds.

To prepare the data for the implementation tasks, follow these steps:

1. Download both datasets and unzip them. There are 7 dog breeds that appear in both datasets:
• Bernese mountain dog
• Border collie
• Chihuahua
• Golden retriever
• Labrador retriever
• Pug
• Siberian husky

2. Delete the folders associated with the remaining dog breeds in both datasets. You can also delete the folders associated with the bounding boxes in the SDD.

3. For the 7 breeds that are present in both datasets, the names might be written slightly differently (e.g. Labrador Retriever vs. Labrador). Manually rename the folders so the names match (e.g. make them both labrador retriever).

4. Rename the folders to indicate that they are subsets of the original datasets (to avoid potential confusion if you later want to use them for another project), for example SDDsubset and DBIsubset. Each of these should now contain 7 subfolders (e.g. border collie, pug, etc.), and the names should match.

5. Zip the two folders (e.g. SDDsubset.zip and DBIsubset.zip) and upload them to your Google Drive.

You can find sample code working with the SDD on the internet. If you want, you are welcome to look at these examples (particularly the one linked here) and use them as your starting code, or use code snippets from them. You will need to modify the code, as our questions ask you to do different tasks, which are not the same as the ones in these online examples. But using and copying code snippets from these resources is fine. If you choose to use this (or any other online example) as your starting code, please acknowledge it in your submission. We also suggest that, before starting to modify the starting code, you run it as is on your data (e.g. DBIsubset) to 1) make sure your dataset setup is correct and 2) make sure you fully understand the starter code before you start modifying it.

Look at the images in both datasets, and briefly explain whether you observe any systematic differences between images in one dataset vs. the other.

Construct a simple convolutional neural network (CNN) for classifying the images in SDD.
For example, you can construct a network as follows:
• convolutional layer – 16 filters of size 3×3
• batch normalization
• convolutional layer – 16 filters of size 3×3
• max pooling (2×2)
• convolutional layer – 8 filters of size 3×3
• batch normalization
• convolutional layer – 8 filters of size 3×3
• max pooling (2×2)
• dropout (e.g. 0.5)
• fully connected (32)
• dropout (0.5)
• softmax

If you want, you can change these specifications; but if you do so, please specify them in your submission. Use ReLU as your activation function, and cross-entropy as your cost function. Train the model with the optimizer of your choice, e.g. SGD, Adam, RMSProp, etc. (A minimal PyTorch sketch of this example architecture appears at the end of this part.)

Use random cropping, random horizontal flipping, and random rotations for augmentation. Make sure to tune the parameters of your optimizer to get the best performance on the validation set. Plot the training and test accuracy over the first 10 epochs. Note that the accuracy is different from the loss function; the accuracy is defined as the percentage of images classified correctly.

Train the same CNN model again; this time, with dropout. Plot the training and test accuracy over the first 10 epochs, and compare them with the model trained without dropout. Report the impact of dropout on the training and its generalization to the test set.

[III.a] (15 marks) ResNet models were proposed in the “Deep Residual Learning for Image Recognition” paper. These models have had great success in image recognition on benchmark datasets. In this task, we use the ResNet-18 model to classify the images in the DBI dataset. To do so, use the ResNet-18 model from PyTorch, modify the input/output layers to match your dataset, and train the model from scratch; i.e., do not use the pretrained ResNet. Plot the training, validation, and testing accuracy, and compare those with the results of your CNN model.

[III.b] (10 marks) Run the trained model on the entire SDD dataset and report the accuracy. Compare the accuracy obtained on the (test set of) DBI vs. the accuracy obtained on the SDD. Which is higher? Why do you think that might be? Explain very briefly, in one or two sentences.

Similar to the previous task, use the following three models from PyTorch: ResNet18, ResNet34, and ResNeXt32. This time you are supposed to use the pre-trained models and fine-tune the input/output layers on the DBI training data. Report the accuracy of these fine-tuned models on the DBI test dataset, and also on the entire SDD dataset. Discuss the cross-performance of these trained models. For example, are there cases in which two different models perform equally well on the test portion of the DBI but have significant performance differences when evaluated on the SDD?

Train a model that – instead of classifying dog breeds – can distinguish whether a given image is more likely to belong to SDD or DBI. To do so, you first need to divide your data into training and test sets (and possibly a validation set if you need one for tuning the hyperparameters of your model). You need to either reorganize the datasets (to load the images using torchvision.datasets.ImageFolder) or write your own data loader function. Train your model on the training portion of the dataset. Include your network model specifications in the report, and make sure to include your justifications for that choice. Report your model's accuracy on the test portion of the dataset.
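One possible PyTorch translation of the example layer list given earlier in this part (this is the sketch referenced there). The input resolution (64×64), the number of classes, and the flattened feature size are assumptions you would adapt to your own preprocessing:

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    # Follows the example layer list; assumes 3-channel 64x64 inputs (adjust as needed).
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Flatten(),
            nn.Linear(8 * 16 * 16, 32), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(32, num_classes),
            # No explicit softmax: nn.CrossEntropyLoss applies log-softmax internally.
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example usage:
# model = SimpleCNN(num_classes=7)              # 7 breeds in the subsets described above
# logits = model(torch.randn(4, 3, 64, 64))     # batch of 4 dummy images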
The Laplacian of Gaussian operator is defined as:

∇²G(x, y, σ) = ∂²G(x, y, σ)/∂x² + ∂²G(x, y, σ)/∂y² = (1 / (πσ⁴)) ((x² + y²)/(2σ²) − 1) e^(−(x² + y²)/(2σ²)),

where the Gaussian filter G is:

G(x, y, σ) = (1 / (2πσ²)) e^(−(x² + y²)/(2σ²))

The characteristic scale is defined as the scale that produces the peak value (minimum or maximum) of the Laplacian response.

1. (10 marks) What scale (i.e. what value of σ) maximises the magnitude of the response of the Laplacian filter to an image of a black circle with diameter D on a white background? Justify your answer.

2. (5 marks) What scale should we use if we want to instead detect a white circle of the same size on a black background?

3. (10 marks) Experimentally find the value of σ that maximizes the magnitude of the response for a black square of size 100×100 pixels on a sufficiently large white background. Hint: You can simply implement this task and automatically test over a large set of samples. You may also want to generate the samples in the log domain first, to accurately locate the optimal value over a large spectrum.

For corner detection, we defined the Second Moment Matrix as follows:

M = Σ_x Σ_y w(x, y) [[Ix², Ix Iy], [Ix Iy, Iy²]]

Let us denote the 2×2 matrix used in the equation by N; i.e.:

N = [[Ix², Ix Iy], [Ix Iy, Iy²]]

1. (10 marks) Compute the eigenvalues of N, denoted by λ1 and λ2.
2. (15 marks) Prove that the matrix M is positive semi-definite.

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientations in localized portions of an image. This method is similar to scale-invariant feature transform (SIFT) descriptors and shape contexts (a similar technique we have not seen in class), but differs in the sense that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. Until deep learning, HOG was one of the long-standing top representations for object detection.

In this assignment, you will implement a variant of HOG. Given an input image, your algorithm will compute the HOG feature and visualize it as shown in Figure 1 (the line directions are perpendicular to the gradient to show edge alignment).

Figure 1: HOG features plotted on an example image.

The orientation and magnitude of the red lines represent the gradient components in a local cell. A HOG descriptor is formed at a specified image location as follows:

1. Compute image gradient magnitudes and directions over the whole image, thresholding small gradient magnitudes to zero. You should empirically set a reasonable value for the threshold for each of the input images.

2. Center a cell grid (m × n) on the image. To create this grid, assume the grid cells are square and have a fixed side length; let us call that size τ. For example, if your image size is 1021×975 and τ = 8, then you will have a grid of size (m = 127) × (n = 121). You can ignore the boundary of the image that cannot be fit into the grid (on either end), i.e. just consider the crop of the image that fits the grid perfectly, which is 1016×968 in this example.

3. For each cell, form an orientation histogram by quantizing the gradient directions and, for each such orientation bin, add the (thresholded) gradient magnitudes.
This process can be done in two steps. Imagine the gradient orientations are discretized into 6 bins: [−15°, 15°), [15°, 45°), [45°, 75°), [75°, 105°), [105°, 135°), [135°, 165°). Remember that 165° is equivalent to −15°, since the orientation is not directed. Now create a 3D array (m × n × 6) where element (i, j, k) of this array stores the accumulated gradient magnitudes over all the pixels in cell (i, j) with gradient orientations corresponding to bin k.

Another approach for constructing the HOG is to collect the number of occurrences in each bin rather than accumulating the magnitudes; i.e. in element (i, j, k) of the histogram, we store the number of pixels in cell (i, j) with gradient orientations corresponding to bin k.

Choose reasonable values for the threshold and cell size, and then visualize the resulting 3D arrays (using both approaches) on the given images, similar to the quiver plot of Figure 1. Briefly compare the two approaches by inspecting the visualizations. (15 marks)

Hint: You can use any package/function for creating the visualization in Figure 1. One way to do that is to superimpose 6 quiver plots (one for each bin), generated by the quiver function in the matplotlib package. For the remaining tasks, you can use either approach for constructing the HOG. Make sure to explicitly mention your choice in the report.

4. To account for changes in illumination and contrast, the gradient strengths must be locally normalized, which requires grouping the cells together into larger, spatially connected blocks (adjacent cells). Given the histogram of oriented gradients, you apply L2 normalization as follows:

• Build a descriptor for the first block by concatenating the HOG within the block. You can use block size = 2, i.e., a 2 × 2 block will contain 2 × 2 × 6 entries that are concatenated to form one long vector.

• Normalize the descriptor as follows:

ĥᵢ = hᵢ / sqrt(Σᵢ hᵢ² + e²)

where hᵢ is the i-th element of the vector and ĥᵢ is the normalized histogram. e is a normalization constant to prevent division by zero (e.g., e = 0.001).

• Assign the normalized histogram to the first cell of a new histogram array, i.e. cell (1,1).

• Move to the next block of the old histogram array with stride 1 and iterate the previous three steps to compute the next cell of the new histogram array.

The resulting new histogram array will have a size of (m − 1) × (n − 1) × 24. Compute the normalized histogram arrays for the provided images, and store each of them in a single-line text file where the data is stored row by row (first row, then second row, etc.). Submit both your code and the files generated by your code. Please note that each file should have the same name as the corresponding image (e.g. ‘image.jpg’ → ‘image.txt’). (15 marks) (A minimal sketch of the cell-histogram and block-normalization steps follows below.)

In addition to the provided images, use your own camera (e.g. smartphone camera) to capture two images of the same scene, one with flash and one without flash. Convert the images to gray-scale, and down-sample the images if needed to avoid excessive computation overhead. First, compute the original HOG arrays for these two images and visualize them similar to Figure 1. (5 marks) Second, compute the normalized histogram arrays for each of these two images, and store them in two txt files as instructed earlier. (5 marks) Third, by comparing the results, argue why or why not the normalization of HOG may be beneficial. Limit your discussion to a paragraph containing the main points.
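A minimal NumPy sketch of steps 3–4 above, using the magnitude-accumulation variant. The function names (cell_histograms, block_normalize) are illustrative assumptions; the gradient computation, thresholding, and quiver visualization are left out:

import numpy as np

def cell_histograms(mag, ang, tau=8, n_bins=6):
    # ang is the gradient orientation in degrees; fold into [-15, 165) since
    # orientations are undirected, then quantize into 30-degree bins.
    ang = (ang + 15.0) % 180.0 - 15.0
    bins = ((ang + 15.0) // 30.0).astype(int)        # bin index 0..5
    m, n = mag.shape[0] // tau, mag.shape[1] // tau
    hist = np.zeros((m, n, n_bins))
    for i in range(m):
        for j in range(n):
            cell_mag = mag[i*tau:(i+1)*tau, j*tau:(j+1)*tau]
            cell_bin = bins[i*tau:(i+1)*tau, j*tau:(j+1)*tau]
            for k in range(n_bins):
                hist[i, j, k] = cell_mag[cell_bin == k].sum()
    return hist

def block_normalize(hist, e=0.001):
    # Concatenate 2x2 blocks of cells (stride 1) and L2-normalize each descriptor.
    m, n, b = hist.shape
    out = np.zeros((m - 1, n - 1, 4 * b))
    for i in range(m - 1):
        for j in range(n - 1):
            v = hist[i:i+2, j:j+2, :].ravel()        # 2 x 2 x 6 = 24 entries
            out[i, j, :] = v / np.sqrt(np.sum(v**2) + e**2)
    return out

# Example usage (mag and ang would come from your thresholded gradient computation):
# hist = cell_histograms(mag, ang, tau=8)
# normalized = block_normalize(hist)
# np.savetxt("image.txt", normalized.ravel()[None, :])   # single-line text file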
You can compare the histograms visually, or you may want to define a quantifiable measure to compare the histograms for the pair of with-flash/no-flash images. If you choose to compare visually, provide the details of your visualization approach for the normalized HOG; alternatively, if you decide to compare the histograms quantitatively, include the function you used and your justification in the report. (20 marks)

Download two images (I1 and I2) of the Sandford Fleming Building taken under two different viewing directions:
• https://commons.wikimedia.org/wiki/File:University College, University of Toronto.jpg
• https://commons.wikimedia.org/wiki/File:University College Lawn, University of Toronto, Canada.jpg

1. Calculate the eigenvalues of the Second Moment Matrix (M) for each pixel of I1 and I2.

2. Show the scatter plot of λ1 and λ2 for all the pixels in I1 (5 marks) and the same scatter plot for I2 (5 marks). Each point shown at location (x, y) in the scatter plot corresponds to a pixel with eigenvalues λ1 = x and λ2 = y.

3. Based on the scatter plots, pick a threshold for min(λ1, λ2) to detect corners. Illustrate the detected corners on each image using the chosen threshold (10 marks).

4. Constructing the matrix M involves the choice of a window function w(x, y); often a Gaussian kernel is used. Repeat steps 1, 2, and 3 above using a significantly different Gaussian kernel (i.e. a different σ) than the one used before. For example, choose a σ that is significantly (e.g. 5 or 10 times) larger than the previous one (10 marks). Explain how this choice influenced the corner detection in each of the images (10 marks).

We have two images of a planar object (e.g. a painting) taken from different viewpoints and we want to align them. We have used SIFT to find a large number of point correspondences between the two images and visually estimate that at least 70% of these matches are correct, with only small potential inaccuracies. We want to find the true transformation between the two images with a probability greater than 99.5%.

1. (5 marks) Calculate the number of RANSAC iterations needed for fitting a homography.
2. (5 marks) Without calculating, briefly explain whether you think fitting an affine transformation would require fewer or more RANSAC iterations, and why.

Assume a plane passing through the point P⃗0 = [X0, Y0, Z0]ᵀ with normal ⃗n. The corresponding vanishing points of all the lines lying on this plane form a line called the horizon. In this question, you are asked to prove the existence of the horizon line by following the steps below:

1. (15 marks) Find the pixel coordinates of the vanishing point corresponding to a line L passing through the point P⃗0 and going along direction ⃗d. Hint: P⃗ = P⃗0 + t ⃗d are the points on line L, and

⃗p = [ωx, ωy, ω]ᵀ = K P⃗ = K [X0 + t dx, Y0 + t dy, Z0 + t dz]ᵀ

are the pixel coordinates of the same line in the image, where

K = [[f, 0, px], [0, f, py], [0, 0, 1]],

f is the camera focal length and (px, py) is the principal point.

2. (15 marks) Prove that the vanishing points of all the lines lying on the plane form a line. Hint: all the lines on the plane are perpendicular to the plane's normal ⃗n; that is, ⃗n · ⃗d = 0, or nx dx + ny dy + nz dz = 0.

Using homogeneous coordinates:

1. (15 marks) (a) Show that the intersection of the 2D lines l and l′ is the 2D point p = l × l′ (here × denotes the cross product).
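For part (a) just above, a tiny numerical illustration (not a proof) of the line–point duality using NumPy; the two example lines are arbitrary assumptions:

import numpy as np

# Lines in homogeneous form a*x + b*y + c = 0, written as [a, b, c].
l1 = np.array([1.0, -1.0, 0.0])     # the line y = x
l2 = np.array([0.0, 1.0, -2.0])     # the line y = 2

# The intersection point (in homogeneous coordinates) is the cross product.
p = np.cross(l1, l2)
p_cartesian = p[:2] / p[2]          # divide by the homogeneous coordinate
print(p_cartesian)                  # expected: [2. 2.], where y = x meets y = 2

# Dually, the line through two points is the cross product of the points (part (b)).
p1 = np.array([0.0, 0.0, 1.0])      # the origin
p2 = np.array([1.0, 1.0, 1.0])
print(np.cross(p1, p2))             # a multiple of [-1, 1, 0], i.e. the line y = x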
2. (15 marks) (b) Show that the line that goes through the 2D points p and p′ is l = p × p′.

You are given three images, hallway1.jpg, hallway2.jpg, and hallway3.jpg, which were shot with the same camera (i.e. the same internal camera parameters), but held at slightly different positions/orientations (i.e. with different external parameters).

Consider the homographies H, with

[w̃ x̃, w̃ ỹ, w̃]ᵀ = H [x, y, 1]ᵀ,

that map corresponding points of one image I to a second image Ĩ, for three cases:

A. The right wall of I = hallway1.jpg to the right wall of Ĩ = hallway2.jpg.
B. The right wall of I = hallway1.jpg to the right wall of Ĩ = hallway3.jpg.
C. The floor of I = hallway1.jpg to the floor of Ĩ = hallway3.jpg.

For each of these three cases:

1. (10 marks) Use a Data Cursor to select corresponding points by hand. Select more than four pairs of points. (Four pairs will give a good fit for those points, but may give a poor fit for other points.) Also, avoid choosing three (or more) collinear points, since these do not provide independent information. This is trickier for case C. Make two figures showing the gray-level images of I and Ĩ with a colored square marking each of the selected points. You can convert the image I or Ĩ to gray level using an RGB-to-grayscale function (or the formula gray = 0.2989 × R + 0.5870 × G + 0.1140 × B).

2. (10 marks) Fit a homography H to the selected points. Include the estimated H in the report, and describe its effect using words such as scale, shear, rotate, and translate, if appropriate. You are not allowed to use any homography estimation function from OpenCV or other similar packages. (A sketch of one standard way to set up this fit appears at the end of this question.)

3. (10 marks) Make a figure showing the Ĩ image with red squares that mark each of the selected (x̃, ỹ), and green squares that mark the locations of the estimated (x̃, ỹ); that is, use the homography to map the selected (x, y) to the (x̃, ỹ) space.

4. (25 marks) Make a figure showing a new image that is larger than the original one(s). The new image should be large enough that it contains the pixels of the I image as a subset, along with all the inverse-mapped pixels of the Ĩ image. The new image should be constructed as follows:

• RGB values are initialized to zero.
• The red channel of the new image must contain the rgb2gray values of the I image (for the appropriate pixel subset only).
• The blue and green channels of the new image must contain the rgb2gray values of the corresponding pixels (x̃, ỹ) of Ĩ. The correspondence is computed as follows: for each pixel (x, y) in the new image, use the homography H to map this pixel to the (x̃, ỹ) domain (not forgetting to divide by the homogeneous coordinate), and round the value so you get an integer grid location. If this (x̃, ỹ) location indeed lies within the domain of the Ĩ image, then copy the rgb2gray'ed value from that Ĩ(x̃, ỹ) into the blue and green channels of pixel (x, y) in the new image. (This amounts to an inverse mapping.)

If the homography is correct and if the surface were Lambertian∗, then corresponding points in the new image would have the same values of R, G, and B, and so the new image would appear gray at these pixels.

• Based on your results, what can you conclude about the relative 3D positions and orientations of the camera? Give only qualitative answers here. Also, what can you conclude about the surface reflectance of the right wall and the floor, namely are they more or less Lambertian? Limit your discussion to a few sentences.
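For fitting H in step 2 above (where OpenCV's estimator is not allowed), one standard approach is the direct linear transform (DLT) solved via SVD. A minimal NumPy sketch, offered only as a possible starting point; coordinate normalization, which usually improves conditioning, is omitted:

import numpy as np

def fit_homography(src, dst):
    # src, dst: (N, 2) arrays of corresponding points (x, y) and (x~, y~), N >= 4.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the DLT system A h = 0.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.array(rows, dtype=float)
    # h is the right singular vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                      # normalize so H[2, 2] = 1

def apply_homography(H, pts):
    # Map (N, 2) points through H and divide by the homogeneous coordinate.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Example usage with hand-selected correspondences (src from I, dst from the second image):
# H = fit_homography(np.array(selected_src), np.array(selected_dst))
# predicted = apply_homography(H, np.array(selected_src))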
(5 marks) Along with your writeup, hand in the program that you used to solve the problem. You should have a switch statement that chooses between cases A, B, and C.

∗ Lambertian reflectance is the property that defines an ideal “matte” or diffusely reflecting surface. The apparent brightness of a Lambertian surface to an observer is the same regardless of the observer's angle of view. Unfinished wood exhibits roughly Lambertian reflectance, but wood finished with a glossy coat of polyurethane does not, since the glossy coating creates specular highlights. Specular reflection, or regular reflection, is the mirror-like reflection of waves, such as light, from a surface. Reflections on still water are an example of specular reflection.

In Tutorial 10, we learned about mean shift and cam shift tracking. In this question, we first evaluate the performance of mean shift tracking in a single case, and then implement a small variation of the standard mean shift tracking. For both parts you can use the attached short video KylianMbappe.mp4 or, alternatively, you can record and use a short (2-3 second) video of yourself. You can use any OpenCV (or other) functions you want in this question.

1. (20 marks) Performance Evaluation

• Use the Viola-Jones face detector to detect the face in the first frame of the video. The default detector can detect the face in the first frame of the attached video. If you record a video of yourself, make sure your face is visible and facing the camera in the first frame (and throughout the video) so the detector can detect your face in the first frame.

• Construct the hue histogram of the detected face in the first frame, using appropriate saturation and value thresholds for masking. Use the constructed hue histogram and mean shift tracking to track the bounding box of the face over the length of the video (from frame #2 until the last frame). So far, this is similar to what we did in the tutorial.

• Also, use the Viola-Jones face detector to detect the bounding box of the face in each video frame (from frame #2 until the last frame).

• Calculate the intersection over union (IoU) between the tracked bounding box and the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis of the plot should be the frame number (from 2 until the last frame) and the y axis should be the IoU on that frame. (A small IoU helper sketch appears at the end of this question.)

• In your report, include a sample frame in which the IoU is large (e.g. over 50%) and another sample frame in which the IoU is low (e.g. below 10%). Draw the tracked and detected bounding boxes in each frame using different colors (and indicate which is which).

• Report the percentage of frames in which the IoU is larger than 50%.

• Look at the detected and tracked boxes in frames where the IoU is small (< 10%) and report which (the Viola-Jones detection or the tracked bounding box) is correct more often (we don't need a number, just eyeball it). Very briefly (1-2 sentences) explain why that might be.

2. (10 marks) Implement a Simple Variation

• In the examples in Tutorial 10 (and the previous part of this question) we used a hue histogram for mean shift tracking. Here, we implement an alternative in which a histogram of gradient direction values is used instead.

• After converting to grayscale, use blurring and the Sobel operator to first generate the image gradients in the x and y directions (Ix and Iy). You can then use cartToPolar (with angleInDegrees=True) to get the gradient magnitude and angle at each frame. You can use 24 histogram bins and [0, 360] (i.e. not [0, 180]) directions.
• When constructing hue histograms, we thresholded the saturation and value channels to create a mask. Here, you can threshold the gradient magnitude to create a mask. For example, you can mask out pixels in the region of interest in which the gradient magnitude is less than 10% of the maximum gradient magnitude in the RoI.

• Calculate the intersection over union (IoU) between the tracked bounding box and the Viola-Jones detected box in each frame. Plot the IoU over time. The x axis of the plot should be the frame number (from 2 until the last frame) and the y axis should be the IoU on that frame.

• In your report, include a sample frame in which the IoU is large (e.g. over 50%) and another sample frame in which the IoU is low (e.g. below 10%). Draw the tracked and detected bounding boxes in each frame using different colors (and indicate which is which).

• Report the percentage of frames in which the IoU is larger than 50%.
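A minimal IoU helper for the evaluation bullets above, assuming boxes in OpenCV's (x, y, w, h) convention as returned by detectMultiScale and meanShift; this is only a sketch:

def bbox_iou(box_a, box_b):
    # Boxes are (x, y, w, h). Convert to corner coordinates first.
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]

    # Intersection rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

# Example: IoU between a tracked box and a detected box on one frame.
# print(bbox_iou((100, 80, 60, 60), (110, 90, 60, 60)))   # ~0.53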
You are required to design a prototype movie recommendation program in Python as detailed in the assignment description. You will work individually. We will NOT be using Autolab in this class for submitting assignments. This means your program will only be graded once, after the submission deadline has passed. Make sure you abide by the DCS Academic Integrity Policy for Programming Assignments.

What to submit

Write all the required functions described below in the given template file named hw1.py. Fill in the function code for all the functions in this template. Note: Every function has a pass statement because Python does not allow empty functions. The pass statement does nothing; it's simply a filler. You can delete it when you write in your code. You may implement other helper functions as needed.

Make sure to test your programs on files other than the samples we have provided, to cover the various paths of logic in your code. Make sure you write ALL your test calls in the main() function. Do NOT write ANY code outside of any of the functions.

When you are done writing and testing, submit ONLY the filled-in hw1.py file to Canvas. Do NOT submit a Jupyter notebook; it will not be accepted for grading. You are allowed up to 5 submissions; only the last submission will be graded.

How to test your code

You can test your program by calling your functions in the hw1.py file from the main() function. All test code must be in the main() function ONLY. If you write ANY code outside of any of the functions, you will lose credit. In Terminal, execute your program like this:

> python hw1.py

This was explained in class: see jan22_notes.ipynb/jan22_notes.html between cells #23 and #24 – Writing and executing standalone (outside Jupyter notebook) Python programs.

Make sure the test files are in the same folder as the program. You may develop and test your code in a Jupyter notebook, but for submission you will need to move your code over to hw1.py and execute it as above to make sure it works correctly. The process of moving code from a Jupyter notebook to a Python file has been explained in class.

You can run your tests on the given ratings and movies files, but testing on only these files may not be sufficient. You should make your own test files as well, to make sure that you cover the various paths of logic in your functions. You are not required to submit any of your test files.

You may assume that all parameter values to your functions will be legitimate, so you are not required to check whether the parameter values are valid. In any function that requires the returned values to be sorted or ranked, ties may be broken arbitrarily between equal values. You may retain the main() function when submitting, but we will IGNORE it.

For this assignment only, grading will be done by an AUTOGRADER program that will call the functions in your hw1.py and check the returned values against the expected correct values. It will NOT call your main() function. The AUTOGRADER does not look at printed output, so anything you print in your program will be ignored. There will not be any manual inspection of code; credit is based solely on whether your functions return correct results.

Partial Credit

There is no partial credit for code structure, etc. Credit is given only when correct values are returned from your functions. However, each function will be tested on several cases. So for instance, if a function runs correctly on 2 out of 3 test cases, you will get full points for the 2 cases and zero for the third. (In this sense, there is partial credit for each function.)
Data Input

• Ratings file: A text file that contains movie ratings. Each line has the name (with year) of a movie, its rating (range 0-5 inclusive), and the id of the user who rated the movie. A movie can have multiple ratings from different users. A user can rate a particular movie only once. A user can, however, rate multiple movies. Here's a sample ratings file.

• Movies file: A text file that contains the genres of movies. Each line has a genre, a movie id, and the name (with year) of the movie. To keep it simple, each movie belongs to a single genre. Here's a sample movies file.

Note: A movie name includes the year, since it's possible that different movies have the same title but were made in different years. However, no two movies will have the same name in the same year. You may assume that input files will be correctly formatted, and data types will be as expected, so you don't need to write code to catch any formatting or data typing errors. For all computation of ratings, do not round up (or otherwise modify) the rating unless otherwise specified.

Implementation

1. [10 pts] Write a function read_ratings_data(f) that takes in a ratings file name and returns a dictionary. (Note: the parameter is a file name string such as "myratings.txt", NOT a file pointer.) The dictionary should have the movie as key and the list of all ratings for it as value. For example:

movie_ratings_dict = { "The Lion King (2019)": [6.0, 7.5, 5.1], "Titanic (1997)": [7] }

2. [10 pts] Write a function read_movie_genre(f) that takes in a movies file name and returns a dictionary. The dictionary should have a one-to-one mapping from movie to genre. For example:

{ "Toy Story (1995)": "Adventure", "Golden Eye (1995)": "Action" }

Watch out for leading and trailing whitespace in movie names and genre names, and remove it before storing in the dictionary.

1. [8 pts] Genre dictionary
Write a function create_genre_dict that takes as a parameter a movie-to-genre dictionary, of the kind created in Task 1.2. The function should return another dictionary in which a genre is mapped to all the movies in that genre. For example:

{ genre1: [m1, m2, m3], genre2: [m6, m7] }

2. [8 pts] Average Rating
Write a function calculate_average_rating that takes as a parameter a ratings dictionary, of the kind created in Task 1.1. It should return a dictionary where the movie is mapped to its average rating computed from the ratings list. For example:

{"Spider-Man (2002)": [3, 2, 4, 5]} ==> {"Spider-Man (2002)": 3.5}

1. [10 pts] Popularity based
In services such as Netflix and Spotify, you often see recommendations with the heading "Popular movies" or "Trending top 10". Write a function get_popular_movies that takes as parameters a dictionary of movie-to-average rating (as created in Task 2.2) and an integer n (default should be 10). The function should return a dictionary (movie: average rating, same structure as the input dictionary) of the top n movies based on the average ratings. If there are fewer than n movies, it should return all movies in ranked order of average ratings from highest to lowest.

2. [8 pts] Threshold Rating
Write a function filter_movies that takes as parameters a dictionary of movie-to-average rating (same as for the popularity based function above), and a threshold rating with a default value of 3. The function should filter movies based on the threshold rating, and return a dictionary with the same structure as the input.
For example, if the threshold rating is 3.5, the returned dictionary should have only those movies from the input whose average rating is equal to or greater than 3.5.

3. [12 pts] Popularity + Genre based
In most recommendation systems, the genre of the movie/song/book plays an important role. Often, features like popularity, genre, and artist are combined to present recommendations to a user. Write a function get_popular_in_genre that, given a genre, a genre-to-movies dictionary (as created in Task 2.1), a dictionary of movie: average rating (as created in Task 2.2), and an integer n (default 5), returns the top n most popular movies in that genre based on the average ratings. The return value should be a dictionary of movie-to-average rating of the movies that make the cut. If there are fewer than n movies, it should return all movies in ranked order of average ratings from highest to lowest. Genres will be from those in the movie:genre dictionary created in Task 1.2. The genre name will exactly match one of the genres in the dictionary, so you do not need to do any upper or lower case conversion.

One important analysis for content platforms is to determine ratings by genre. Write a function get_genre_rating that takes the same parameters as get_popular_in_genre above, except for n, and returns the average rating of the movies in the given genre.

Write a function genre_popularity that takes as parameters a genre-to-movies dictionary (as created in Task 2.1), a movie-to-average rating dictionary (as created in Task 2.2), and n (default 5), and returns the top-n rated genres as a dictionary of genre: average rating. If there are fewer than n genres, it should return all genres in ranked order of average ratings from highest to lowest. Hint: Use the get_genre_rating function above as a helper.

1. [10 pts] Read the ratings file to return a user-to-movies dictionary that maps a user ID to a list of the movies they rated, along with the rating they gave. Write a function named read_user_ratings for this, with the ratings file as the parameter. For example:

{ u1: [(m1, r1), (m2, r2)], u2: [(m3, r3), (m8, r8)] }

where ui is a user ID, mi is a movie, and ri is the corresponding rating. You can handle the user ID as an int or string type, but make sure you consistently use the same type everywhere in your code.

2. [12 pts] Write a function get_user_genre that takes as parameters a user id, the user-to-movies dictionary (as created in Task 4.1 above), and the movie-to-genre dictionary (as created in Task 1.2), and returns the top genre that the user likes based on the user's ratings. Here, the top genre for the user is determined by taking the genre-wise average rating of the movies the user has rated. If multiple genres have the same highest rating for the user, return any one of those genres (arbitrarily) as the top genre.

3. [12 pts] Recommend the 3 most popular (highest average rating) movies from the user's top genre that the user has not yet rated. Write a function recommend_movies for this, that takes as parameters a user id, the user-to-movies dictionary (as created in Task 4.1 above), the movie-to-genre dictionary (as created in Task 1.2), and the movie-to-average rating dictionary (as created in Task 2.2). The function should return a dictionary of movie-to-average rating. If fewer than 3 movies make the cut, then return all the movies that make the cut, in ranked order of average ratings from highest to lowest.
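A minimal sketch of how the first few functions above could fit together. The field delimiter and exact line parsing are assumptions, since the sample ratings file itself is not reproduced here; adapt them to the actual file format:

def read_ratings_data(f):
    # Task 1.1 sketch: movie -> list of ratings.
    # ASSUMPTION: each line looks like "Movie Name (Year)|rating|user_id";
    # adjust the split to match the real sample file.
    movie_ratings = {}
    with open(f) as fh:
        for line in fh:
            name, rating, _user = [part.strip() for part in line.split("|")]
            movie_ratings.setdefault(name, []).append(float(rating))
    return movie_ratings

def calculate_average_rating(movie_ratings):
    # Task 2.2 sketch: movie -> average of its ratings list.
    return {movie: sum(r) / len(r) for movie, r in movie_ratings.items()}

def get_popular_movies(movie_to_avg, n=10):
    # Task 3.1 sketch: top-n movies by average rating (ties broken arbitrarily).
    ranked = sorted(movie_to_avg.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])

def main():
    # Example test calls; remember the autograder ignores main().
    ratings = read_ratings_data("ratings.txt")
    averages = calculate_average_rating(ratings)
    print(get_popular_movies(averages, n=3))

if __name__ == "__main__":
    main()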
Given a CSV data file as represented by the sample file pokemonTrain.csv, perform the following operations on it.

1. [7 pts] Find out what percentage of "fire" type pokemons are at or above "level" 40. (This is the percentage over fire pokemons only, not all pokemons.) Your program should print the value as follows (replace … with the value):

Percentage of fire type Pokemons at or above level 40 = …

The value should be rounded off (not ceiling) using the round() function. So, for instance, if the value is 12.3 (less than or equal to 12.5) you would print 12, but if it was 12.615 (more than 12.5), you would print 13, as in:

Percentage of fire type Pokemons at or above level 40 = 13

Do NOT add % after the value (such as 13%); only print the number. Print the value to a file named "pokemon1.txt". If you do not print to a file, or your output file name is not exactly as required, you will get 0 points.

2. [10 pts] Fill in the missing "type" column values (given by NaN) by mapping them from the corresponding "weakness" values. You will see that typically a given pokemon weakness has a fixed "type", but there are some exceptions. So, fill in the "type" column with the most common "type" corresponding to the pokemon's "weakness" value. For example, most of the pokemons having the weakness "electric" are "water" type pokemons, but there are other types too that have "electric" as their weakness (exceptions in that "type"). But since "water" is the most common type for weakness "electric", it should be filled in. In case of a tie, use the type that appears first in alphabetical order.

3. [13 pts] Fill in the missing (NaN) values in the Attack ("atk"), Defense ("def") and Hit Points ("hp") columns as follows:
a. Set the pokemon level threshold to 40.
b. For a Pokemon having a level above the threshold (i.e. > 40), fill in the missing value for atk/def/hp with the average value of atk/def/hp of Pokemons with level > 40. So, for instance, you would substitute the missing "atk" value for Magmar (level 44) with the average "atk" value for Pokemons with level > 40. Round the average to one decimal place.
c. For a Pokemon having level equal to or below the threshold (i.e.