Answer the following questions:

1– Choose one variable, look at its distribution (mean, sd, median, min, max), or if it is categorical, create a simple table for it, and plot it with a histogram. Explain what you take away from looking at the variable.

2– Choose some continuous-ish variable, and calculate its mean and standard deviation by some grouping variable. Plot it using a box-plot. Explain what conclusion you draw from this analysis.

3– Choose two categorical-ish variables, and cross-tabulate them. Plot them using a stacked bar chart. Explain what conclusion you draw from this analysis.

Rubric:
Student chooses one variable and looks at its distribution. (10 pts Full Marks / 0 pts No Marks)
Student explains what they take away from looking at the variable. (10 pts Full Marks / 0 pts No Marks)
Student chooses a continuous-ish variable, and calculates its mean and standard deviation by some grouping variable. (10 pts Full Marks / 0 pts No Marks)
Student explains what conclusion they draw from this analysis. (10 pts Full Marks / 0 pts No Marks)
Student chooses two categorical-ish variables, and cross-tabulates them. (10 pts Full Marks / 0 pts No Marks)
Student explains what conclusion they draw from this analysis. (10 pts Full Marks / 0 pts No Marks)
Overall, the quality of the student’s report was well-done, without issues, followed the directions and reflects time, effort and thought. (40 pts Full Marks / 0 pts No Marks)
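For reference, a minimal pandas sketch of the three analyses, assuming a hypothetical file mydata.csv with placeholder columns age (continuous), region and employed (categorical); adapt the names to your own dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("mydata.csv")  # hypothetical file; substitute your dataset

# 1 - distribution of one variable
print(df["age"].describe())            # mean, sd, median (50%), min, max
df["age"].hist(bins=20)
plt.show()

# 2 - mean and sd of a continuous variable by a grouping variable, plus box plot
print(df.groupby("region")["age"].agg(["mean", "std"]))
df.boxplot(column="age", by="region")
plt.show()

# 3 - cross-tabulation of two categorical variables, plotted as a stacked bar chart
tab = pd.crosstab(df["region"], df["employed"])
print(tab)
tab.plot(kind="bar", stacked=True)
plt.show()
```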
**1– Recode 2 different variables into new categories. They can both be continuous-ish or both be nominal-ish, or one of each. Tell me what you did and explain the variable(s).**

**2– Use one (or both) of your recoded variables to do a cross-tabulation. Explain your results.**

**3– Run a correlation of one variable with another variable; make all of the recodes necessary to make the correlation as easy to interpret as possible; and explain your results.**

**4– Identify the most extreme cases on some variable. Interpret the results.**
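A minimal pandas sketch of tasks 1–4, assuming hypothetical columns income and age (continuous) and education (nominal); the cut points, labels and recode mapping are placeholders.

```python
import pandas as pd

df = pd.read_csv("mydata.csv")  # hypothetical file; substitute your dataset

# 1 - recode a continuous variable into categories and collapse a nominal one
df["income_cat"] = pd.cut(df["income"], bins=[0, 30_000, 70_000, float("inf")],
                          labels=["low", "middle", "high"])
df["educ_cat"] = df["education"].replace({"HS": "no degree", "some college": "no degree",
                                          "BA": "degree", "MA": "degree"})

# 2 - cross-tabulate the recoded variables (row-normalized proportions)
print(pd.crosstab(df["income_cat"], df["educ_cat"], normalize="index"))

# 3 - correlation between two numeric variables
print(df["income"].corr(df["age"]))

# 4 - most extreme cases on some variable
print(df.nlargest(5, "income"))
print(df.nsmallest(5, "income"))
```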
1. Collect some texts. Compare them in a number of ways.
2. You will likely want to have them be “bags of words.” Prepare the text by removing upper case, white space, and punctuation, and consider stemming the words, if appropriate for your purpose.
3. Generate relative word frequencies for each bag of words, and compare them to each other.
4. Articulate what differences (if any) you notice and whether this comports with a theory of why these bags of words should be similar or different.
5. Run statistical tests of association between the bags of words (correlation, cosine similarity, regression or Chi-squared), and explain what they indicate.
6. Do one more big thing: either a sentiment analysis of the bags of words; rerun your analysis but using bigrams and/or trigrams; consider the role of negation words (“not,” “no”, etc.) on your earlier analysis; run a parts-of-speech tagger; look at the temporal unfolding of your words; or do a topic modelling exercise. For whichever thing you choose, explain what you are doing and whether whatever you find makes sense in some way theoretically.
7. Extra credit: do some wordclouds of your texts.

Total Points: 100
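A minimal sketch of steps 2, 3 and 5, assuming two hypothetical text files; stemming, stop-word handling and the choice of statistical test are left to you.

```python
import re
from collections import Counter
import numpy as np

def bag_of_words(text):
    # lower-case, strip punctuation and whitespace, then tokenize
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

text_a = open("speech_a.txt").read()   # hypothetical file names
text_b = open("speech_b.txt").read()
bag_a, bag_b = bag_of_words(text_a), bag_of_words(text_b)

# relative word frequencies over a shared vocabulary
vocab = sorted(set(bag_a) | set(bag_b))
freq_a = np.array([bag_a[w] for w in vocab]) / sum(bag_a.values())
freq_b = np.array([bag_b[w] for w in vocab]) / sum(bag_b.values())

# association between the two bags of words
cosine = freq_a @ freq_b / (np.linalg.norm(freq_a) * np.linalg.norm(freq_b))
correlation = np.corrcoef(freq_a, freq_b)[0, 1]
print(f"cosine similarity = {cosine:.3f}, correlation = {correlation:.3f}")
```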
Choose one (1) of these 4 choices:

1. Run a multiple multinomial logistic regression. The outcome can be truly unordered or simply ordinal. Tell me how you think your independent variables will be related to your dependent variable. Interpret your results. Compare coefficients on your X variable of interest (not all of them) across different cuts of the multinomial outcomes, as we did in class (i.e., the Z test). For extra credit, generate some predicted probabilities. Tell me what you learned about your hypothesized relationship(s) from this exercise.
2. Run a multiple Poisson regression. Illustrate that a Poisson regression (or negative binomial, or zero-inflated negative binomial) is the appropriate model to use on your dependent variable. Tell me how you think your independent variables will be related to your dependent variable and why. Interpret your results. For extra credit, generate some predicted values. Tell me what you learned about your hypothesized relationship(s) from this exercise.
3. Run a Gamma regression. Illustrate that a Gamma regression is the appropriate model to use on your dependent variable. Tell me how you think your independent variables will be related to your dependent variable and why. Interpret your results. For extra credit, generate some predicted values. Tell me what you learned about your hypothesized relationship(s) from this exercise.
4. Run a Tobit regression. Illustrate that a Tobit regression is the appropriate model to use on your dependent variable. Tell me how you think your independent variables will be related to your dependent variable and why. Interpret your results. For extra credit, generate some predicted values. Tell me what you learned about your hypothesized relationship(s) from this exercise.

Total Points: 100

Rubric:
Student runs a multiple Poisson regression // a multiple multinomial logistic regression // a multiple Gamma regression // a multiple Tobit regression. (15 pts Full Marks / 0 pts No Marks)
Student illustrates that a Poisson regression (or negative binomial, or zero-inflated negative binomial), or a multinomial, Gamma, or Tobit regression, is the appropriate model to use on their dependent variable. (15 pts Full Marks / 0 pts No Marks)
Student tells me how they think their independent variables will be related to their dependent variable and why. (15 pts Full Marks / 0 pts No Marks)
Student interprets their results. (15 pts Full Marks / 0 pts No Marks)
Student tells me what they learned about their hypothesized relationship(s) from this exercise. (15 pts Full Marks / 0 pts No Marks)
Overall, the student presents a clear and well-organized lab report, with very few, if any, mistakes. (25 pts Full Marks / 0 pts No Marks)
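A minimal statsmodels sketch for options 1 and 2, assuming a hypothetical file mydata.csv with an outcome column y and predictors x1, x2; the formulas and the overdispersion check are illustrative only, and the Gamma and Tobit options would need their own model setup.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("mydata.csv")  # hypothetical file and column names (y, x1, x2)

# Option 2: multiple Poisson regression (count outcome)
poisson_fit = smf.poisson("y ~ x1 + x2", data=df).fit()
print(poisson_fit.summary())
# rough overdispersion check: compare the mean and variance of the outcome
print(df["y"].mean(), df["y"].var())

# Option 1: multiple multinomial logistic regression (unordered or ordinal outcome)
mnlogit_fit = smf.mnlogit("y ~ x1 + x2", data=df).fit()
print(mnlogit_fit.summary())

# extra credit: predicted values / predicted probabilities
print(poisson_fit.predict(df).head())
print(mnlogit_fit.predict(df).head())
```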
Theory (10 marks)

1. Consider the training set given below for determining whether a loan application should be approved or rejected. Draw the full decision tree obtained using entropy as the impurity measure. Show all steps and calculations clearly. Compute the training error of the decision tree.

Coding (40 marks)

The objective is to build a decision tree model using the provided real estate dataset, which involves price prediction. You will need to preprocess the data, deal with data imbalance, train the model, optimize it, and evaluate its performance. The dataset to be used for this question is provided. The dataset is already split into train.csv and test.csv. Use train.csv for training and validation and test.csv for testing your model.

NOTE: You can use Python libraries like NumPy, Pandas, Scikit-learn, Imbalanced-learn, Matplotlib, and Seaborn to perform the required tasks. Additionally, other libraries can be utilized if needed, and you are free to use inbuilt functions whenever applicable. Below are some links to help you gain a better understanding of Exploratory Data Analysis (EDA):
https://www.geeksforgeeks.org/exploratory-data-analysis-in-python/
https://www.geeksforgeeks.org/smote-for-imbalanced-classification-with-python/

2. Data Preprocessing and Exploratory Data Analysis (15 Marks)
1. Task 1: Understanding the Dataset (2 Marks)
(a) Provide a dataset overview, summarizing the unique values in each column. Perform a detailed statistical analysis on the numerical columns, including calculations for mean, standard deviation, minimum, maximum, and percentiles (25th, 50th, and 75th).
2. Task 2: Drop Irrelevant Columns (1 Mark)
(a) Remove the columns identified through correlation analysis with a correlation coefficient within the range of -0.1 to 0.1, as well as those that lack predictive power and do not contribute meaningfully to the target variable. Provide reasons for dropping each of these columns.
3. Task 3: Encoding Categorical Features (2 Marks)
(a) Use the label encoding technique to transform categorical columns. Discuss the impact of high cardinality on certain categorical variables and how to mitigate it.
4. Task 4: Feature Scaling (3 Marks)
(a) Scale numerical data using Standard Scaler and analyze its impact on model performance after training, particularly focusing on whether scaling affects the performance of Decision Tree models. (Analysis can be done after training.)
5. Task 5: Target Variable Imbalance Detection (4 Marks)
(a) Since this is a regression problem, first analyze the distribution of the target variable, ‘Price’, by plotting it in bins of size 10. After understanding the distribution, convert the target variable into categories by creating price brackets using fixed binning. Define four fixed price categories: ‘Low’, ‘Medium’, ‘High’, and ‘Very High’, based on specified price ranges. Analyze the distribution of properties across these categories and visualize it using histograms or bar charts. Finally, discuss the level of imbalance across the different price brackets.
6. Task 6: Handling Imbalanced Data (3 Marks)
(a) Use random undersampling and random oversampling techniques to address data imbalance. Explain the benefits and limitations of each method.

3. Building Decision Tree Model (15 Marks)
1. Task 1: Model Training (3 Marks)
(a) Train a Decision Tree Regressor using the training data.
(b) Visualize the decision tree and explain the model structure, including the depth and splitting decisions. (You can use the plot_tree function from the sklearn.tree module.)
2. Task 2: Feature Importance and Hyperparameter Tuning (4 Marks)
(a) Extract and plot the feature importances from the trained decision tree model.
(b) Discuss why certain features are more important than others and whether it matches your expectations.
(c) Perform any method for hyperparameter optimization (e.g., Grid Search or Randomized Search) to find the best hyperparameters for the decision tree. The focus should be on:
• max_depth
• min_samples_split
• min_samples_leaf
• max_features
Compare the performance of the tuned model with the default one.
3. Task 3: Pruning the Decision Tree (4 Marks)
(a) Prune the decision tree using pre-pruning/post-pruning techniques like minimal cost-complexity pruning.
(b) Visualize and discuss the difference between the pruned and unpruned trees.
4. Task 4: Handling Overfitting (4 Marks)
(a) Use cross-validation to assess model generalization and detect overfitting.
(b) Implement learning curves and evaluate overfitting by comparing training and validation errors.
(c) Discuss the role of cross-validation in controlling overfitting for Decision Trees.

4. Model Evaluation and Error Analysis (10 Marks)
1. Task 1: Model Evaluation (4 Marks)
(a) Evaluate the model (use the tuned model with the best parameters from the previous question) on test data using appropriate regression metrics:
• Mean Squared Error (MSE)
• Mean Absolute Error (MAE)
• R-squared (R²)
(b) Report and interpret the model’s performance on both the training and test datasets.
2. Task 2: Residual and Error Analysis (4 Marks)
(a) Analyze the residuals (difference between predicted and actual prices).
(b) Visualize the residuals to check for patterns. Are there groups of data where the model consistently underperforms? If yes, propose possible improvements for the model based on the error analysis.
3. Task 3: Feature Importance Based Analysis (2 Marks)
(a) Analyze how the top 3 important features affect the target variable Price individually. Calculate the RMSE.

5. Bonus Challenge (6 Marks)
1. Task 1: Advanced Imbalance Handling (3 Marks)
(a) Experiment with advanced data balancing techniques like ADASYN (Adaptive Synthetic Sampling). Compare it with SMOTE and discuss its effectiveness in handling imbalanced data.
2. Task 2: Ensemble Learning: Random Forest (3 Marks)
(a) Train a Random Forest Regressor on the same dataset. Compare the performance of the Random Forest model with your Decision Tree model. Discuss the tradeoffs between using a single decision tree versus an ensemble of trees in Random Forest.
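A minimal sketch of model training and hyperparameter tuning (Part 3, Tasks 1–2), assuming train.csv has already been preprocessed into numeric features plus a Price target; the column names and parameter grid are placeholders to adapt.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import mean_squared_error

train = pd.read_csv("train.csv")
X, y = train.drop(columns=["Price"]), train["Price"]   # assumes numeric, encoded features
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# default model, visualized to a limited depth for readability
tree = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)
plot_tree(tree, max_depth=2, feature_names=list(X.columns), filled=True)
plt.show()

# hyperparameter tuning with grid search over the listed parameters
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid={"max_depth": [4, 8, 12, None],
                "min_samples_split": [2, 10, 50],
                "min_samples_leaf": [1, 5, 20],
                "max_features": [None, "sqrt", 0.5]},
    scoring="neg_mean_squared_error", cv=5)
grid.fit(X_tr, y_tr)
best = grid.best_estimator_
print("best params:", grid.best_params_)

# compare default vs tuned model on the held-out validation split
for name, model in [("default", tree), ("tuned", best)]:
    mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"{name} validation MSE = {mse:.2f}")
```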
Theory (30 marks)

1. (10 marks) Based on the following dataset, answer the questions below:
• About 80.0% of people prefer to travel by air or train.
• Of the people who prefer air travel, 20% travel for business and 30% travel for leisure.
• Given that a person travels by train, the chance they are traveling for leisure is 0.400, rounded to 3 decimal places.
• About 25% of people prefer to travel by car and have reported feeling stressed during their travels.
• There is a 0.015 probability that a person prefers to travel by bus and has a low stress level.
• Given that a person prefers traveling by bus, the probability that the person is traveling for business is about 0.350, rounded to 3 decimal places.
• The probability that a person feels stressed and prefers air travel is 0.065.
• About 70% of people travel for either leisure or business.
• There is a 60% chance that a person prefers air travel given that they are feeling stressed.
• There is a 50% chance a person prefers train travel whether they travel for business or leisure.
(a) (2.5 marks) Compare direct sampling, rejection sampling, and Gibbs sampling in the context of estimating probabilities from the given travel dataset. Compare their strengths and weaknesses.
(b) (2.5 marks) Suppose you want to estimate the probability of a person traveling by train for leisure, where you know that the chance of leisure travel given train preference is 0.400. If you sample 100 people randomly and 30 of them prefer train travel, how many should you expect to accept as travelers for leisure based on this probability? Provide your calculations.
(c) (2.5 marks) Given that 80% of people prefer air travel, and of those, 20% travel for business, calculate the probability that a randomly selected person prefers air travel and travels for business. Show your calculations and round to three decimal places.
(d) (2.5 marks) How does increasing the sample size affect the accuracy and precision of estimates obtained through direct sampling? Discuss the implications for the given dataset.

2. (10 marks) Given the statements below, answer the questions. Round off the probability values computed to 3 decimal places. A statement may have more than one proposition.
• About 78.0% of people either read books or access academic journals regularly.
• Of the people who read books, 40% also access academic journals, and 60% only read books.
• Given that a person reads books, the probability that they participate in book clubs is 0.320.
• About 20% of people access academic journals but do not read books.
• There is a 0.090 probability that a person neither reads books nor accesses academic journals.
• Given that a person does not read books, the probability that they access academic journals is 0.850.
• The probability that a person participates in book clubs and accesses academic journals is 0.060.
• About 60.0% of people either participate in book clubs or access academic journals.
• There is a 40.0% chance that a person accesses academic journals, given that they participate in book clubs.
• There is a 50.0% chance that a person accesses academic journals, whether or not they read books.
(a) (2.5 marks) Identify the random variables in the statements above and write each statement using symbols for random variables, logical connectives where necessary, and conditional probability notation.
(b) (2.5 marks) Verify that these propositions create a valid probability distribution. List the set of axioms that they satisfy.
(c) (2.5 marks) Populate the full joint probability distribution table.
(d) (2.5 marks) Use the joint distribution table and check for conditional independence between all the random variables that you have identified.

3. (10 marks) In the context of adversarial machine learning, consider a machine learning model used for image classification. Two different types of adversarial attacks can cause the model to misclassify an input: adversarial perturbations (small, imperceptible modifications to input data) and backdoor attacks (where the model is trained to misclassify inputs that contain a specific trigger). Both types of attacks can trigger a misclassification, leading to a “misclassification alarm” being raised. Now, suppose you observe a misclassification alarm after querying the model with an input. Initially, adversarial perturbations and backdoor attacks are considered independent events. However, you come across a report that backdoor triggers have been increasingly present in recent datasets. How does this new information about the prevalence of backdoor attacks change your belief regarding the likelihood of adversarial perturbations causing the misclassification?
(a) (5 marks) Formulate this problem using Bayesian inference.
(b) (2.5 marks) Define the probabilities involved (prior, likelihood, and posterior).
(c) (2.5 marks) Explain how conditioning on the detection of a backdoor attack (from recent reports) changes your belief about the role of adversarial perturbations in causing the misclassification.
Hint: Consider how the observation of the common effect (the misclassification alarm) influences your belief about the independent causes (adversarial perturbations and backdoor attacks).

Coding (60 marks)

Use the provided requires.txt file to set up your environment for both the coding questions. Your code should execute without any errors in this environment; otherwise, you will not be marked. Follow the folder structure below for submission:

A3_RollNumber/
  Report.pdf
  HMM_Question/
    HMM_Question.py
    roomba_class.py
    estimated_paths.csv
  Bayesian_Question/
    boilerplate.py
    test_model.py
    dataset csv files
    .pkl files for all 3 models

4. (30 marks) Bayesian network for fare classification

In this assignment, you will develop a Bayesian network model for fare classification using a public transportation dataset. The dataset contains information about different bus routes, stops, distances, and fare categories. Your task is to build an initial Bayesian network, then apply pruning techniques to improve the model’s efficiency, and optimize it further using structure refinement methods. All 3 models need to be evaluated on the validation set provided. The models will be tested on the private test set for final evaluation. Return the .pkl files for each model along with your report. Boilerplate code and dataset can be accessed here.

Dataset Features
You are expected to use the following features for constructing the Bayesian network:
• Start Stop ID (S): The stop ID where the journey begins.
• End Stop ID (E): The stop ID where the journey ends.
• Distance (D): The distance between the start and end stops.
• Zones Crossed (Z): The number of fare zones crossed during the journey.
• Route Type (R): The type of route taken (e.g., standard, express).
• Fare Category (F): The fare category for the journey, classified as Low, Medium, or High.
The objective is to classify the fare for a journey between a given start and end stop as one of the following categories: Low, Medium, or High.
You will build a Bayesian network based on the other features and use it to predict the fare category. Hint: use the bnlearn library imported in the boilerplate code for constructing, training and testing the Bayesian network. Some examples showcasing how to use the library can be found here.

Testing
The code for evaluation is provided to you along with the boilerplate code in test_model.py. Test your models on the validation subset and report accuracies in the assignment report. You will have to write your own code for calculating runtimes for each network construction (initialization and training). For evaluation purposes, DO NOT FORGET TO RETURN the .pkl FILES along with your code and report.

Tasks
1. Task 1: Construct the initial Bayesian Network (A) for fare classification. (10 Marks)
(a) Build the Bayesian network using the provided features.
(b) Ensure that the structure includes dependencies between all possible feature pairs.
(c) You should provide a visualization of the initial Bayesian network in the assignment report.
2. Task 2: Prune the initial Bayesian Network (A) to enhance performance. (10 Marks)
(a) Apply pruning techniques such as Edge Pruning, Node Pruning, or simplifying Conditional Probability Tables (CPTs).
(b) Clearly explain the pruning method applied and how it improves the model’s efficiency (time taken to fit the data) and/or prediction accuracy.
(c) Provide a visualization of the pruned Bayesian Network (B) with fewer edges or a simplified structure.
3. Task 3: Optimize the Bayesian Network (A) by adjusting parameters or using structure refinement methods. (10 Marks)
(a) Apply optimization techniques such as structure learning (e.g., Hill Climbing) to refine the Bayesian network structure.
(b) Compare the performance of the optimized Bayesian network with the initial network (A) and explain how the optimization improves the model’s accuracy and/or efficiency.
(c) Provide a visualization of the optimized network.

5. (30 marks) Tracking a Roomba Using the Viterbi Algorithm

Imagine you have a Roomba robotic vacuum cleaner that autonomously cleans your home while you’re away. The Roomba operates based on specific movement policies specified in the Roomba class member functions. You’ve installed sensors that provide noisy observations of the Roomba’s location at discrete time intervals. Due to sensor limitations, these observations are not always accurate. Your goal is to model the Roomba’s movement using a Hidden Markov Model (HMM) and implement the Viterbi algorithm to track its most likely path based on the noisy sensor observations. Boilerplate code can be accessed here.
• Environment: (Present in boilerplate code)
– Your home is represented as a grid of size 10 × 10.
– Possible headings the Roomba can take are: North (N), East (E), South (S), West (W).
– The only obstacles are the 4 walls of your home.
• Roomba Movement Policy:
1. Random Walk Policy
– The Roomba takes one unit per time step in either direction.
– After each step it randomly selects a new heading from the available set of directions for the next step.
– It continues moving until it reaches the final destination or runs out of time.
2. Straight Until Obstacle Policy
– The Roomba moves one unit per time step in its current heading unless an obstacle blocks its path.
– Upon encountering an obstacle, it randomly selects a new heading from the available directions and continues moving.
– The movement is deterministic unless an obstacle triggers a heading change.
• Sensor Observations:
– At each time step, sensors provide a noisy observation of the Roomba’s location.
– Observations are modeled as the true position plus Gaussian noise with mean zero and standard deviation σ = 1.0.
• Given Code:
– The Roomba’s two movement policies are implemented inside the Roomba class. The path using both policies is simulated for T = 50 time steps and is already provided.
– Noisy observations for both policies based on the Roomba’s true positions are already calculated for you.
• Tasks:
(a) Model the problem as a Hidden Markov Model, specifying the state space, transition probabilities, and emission probabilities.
(b) Implement the Viterbi algorithm to estimate the most likely path of the Roomba given the observations.
(c) (10 marks per seed value) You can change the seed value in the setup_environment() function. This will generate new observations. For at least 3 different seed values, do the following:
– Use the given code to set up the environment and get the true path and noisy observations for the specified seed value.
– Estimate the Roomba’s path given the noisy observations generated previously using the Viterbi algorithm. Save the estimated path for evaluation.
– Compare the estimated path with the true path using the given evaluate_viterbi() function to compute the tracking accuracy.
– Analyze which policy is more accurate and why.
– Plot the true path, observed positions, and estimated path using the given plot_results() function. Include the plots in the assignment report.
(d) State your selected seed values clearly in the assignment report. Your code will be executed for multiple seed values for evaluation, so your Viterbi algorithm should work without any errors in the given environment. Any possible exceptions should be handled carefully.
(e) Create an estimated_paths.csv file with your chosen seed values and the corresponding estimated paths. This will be compared later with the estimated path generated by your code for evaluation, so DO NOT MAKE ANY FORMATTING CHANGES. The first column should be the seed value and the second column the estimated path variable.
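A minimal Viterbi sketch for this setting, assuming the hidden states are the 100 grid cells, a hand-built transition matrix for the random-walk policy, and isotropic Gaussian emissions around each cell; the interface to the provided Roomba class and observation format are assumptions to adapt to the boilerplate.

```python
import numpy as np

GRID = 10
STATES = [(r, c) for r in range(GRID) for c in range(GRID)]  # hidden states = grid cells

def transition_matrix():
    # random-walk policy: one step N/E/S/W, uniform over the legal moves from each cell
    T = np.zeros((len(STATES), len(STATES)))
    for i, (r, c) in enumerate(STATES):
        nbrs = [(r + dr, c + dc) for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                if 0 <= r + dr < GRID and 0 <= c + dc < GRID]
        for nb in nbrs:
            T[i, STATES.index(nb)] = 1.0 / len(nbrs)
    return T

def emission_logprob(obs, sigma=1.0):
    # log P(observation | state): Gaussian noise around the true cell position
    xy = np.array(STATES, dtype=float)
    d2 = ((xy - np.asarray(obs, dtype=float)) ** 2).sum(axis=1)
    return -d2 / (2 * sigma ** 2) - np.log(2 * np.pi * sigma ** 2)

def viterbi(observations, T, sigma=1.0):
    logT = np.log(T + 1e-12)
    delta = np.log(np.full(len(STATES), 1.0 / len(STATES))) + emission_logprob(observations[0], sigma)
    backpointers = []
    for obs in observations[1:]:
        scores = delta[:, None] + logT            # scores[i, j]: best path ending at i, then i -> j
        backpointers.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + emission_logprob(obs, sigma)
    path = [int(delta.argmax())]
    for bp in reversed(backpointers):             # trace back the most likely state sequence
        path.append(int(bp[path[-1]]))
    return [STATES[s] for s in reversed(path)]
```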
Theory (30 marks)

1. The traffic light at the busy intersection has a dramatic life—it’s always flipping between colors, and it needs your help to formalize its behavior. Here’s what we know:
• At any given moment, the traffic light is either green, yellow, or red. It’s never in more than one state at a time.
• The traffic light switches from green to yellow, yellow to red, and red to green. There’s no creative jumping around in the sequence—it sticks to its routine.
• The traffic light cannot remain in the same state for more than 3 consecutive cycles. It gets bored easily.
Represent these rules (highlighted in bold) clearly using Propositional Logic (PL). (1+2+2) = 5 marks

2. You’ve been hired as the official color master of a strange colored graph-based world where nodes dream of standing out. Some of them have already chosen their colors, while others are still undecided and waiting for your expert guidance!
Let {c1, …, ck} be a non-empty and finite set of colors. A partially colored directed graph is a structure ⟨N, R, C⟩ where
• N: A non-empty set of nodes, some of whom already have colors, while others are still figuring out their fashion choices.
• R: A set of directed edges representing connections between these nodes—basically, who’s linked to whom.
• C: A color palette, {c1, …, ck}, that the nodes can choose from.
However, not all the nodes are necessarily colored, and each node has at most one color. No outfit changes! But, like any good world, there are rules (and we all know rules make things more fun). Here’s what the nodes have agreed upon:
1. Connected nodes don’t have the same color. No node wants to be caught wearing the same outfit as their neighbor—it’s the ultimate faux pas!
2. Exactly two nodes are allowed to wear yellow. Yellow is rare, and only two nodes can pull it off.
3. Starting from any red node, you can reach a green node in no more than 4 steps. The red nodes have been told to keep a close eye on the greens.
4. For every color in the palette, there is at least one node with this color. No color should be left behind; each deserves at least one representative.
5. The nodes are divided into exactly |C| disjoint non-empty cliques, one for each color. Each color gets its own squad, and no clique is left empty.
Develop a First-Order Logic (FOL) language and set of axioms that formalize these rules. Make sure to represent each of the statements (highlighted in bold) precisely and help the nodes of this graph live in harmony, following the rules of their colorful world! (10 marks)

3. Our friends in the ocean are having an intellectual argument about reading, literacy, and intelligence. Here are the statements of interest:
• Whoever can read is literate (easy enough, right?).
• Dolphins, unfortunately, are not literate (they’ve tried, though).
• Some dolphins are intelligent (think of them as the Socrates of the sea).
• Some who are intelligent cannot read (cue the dolphin groans).
• There exists a dolphin who is both intelligent and can read (finally, a hero!), but for every intelligent dolphin, if it can read, it must be that it is not literate (plot twist!).
Represent these statements (highlighted in bold) using both PL and FOL. Define appropriate propositional variables and predicates to capture this dolphin debate. (2.5*2=5 marks)
But wait! We also need you to resolve a deep-sea conundrum: check whether the fourth and fifth statements are satisfiable using resolution refutation. Let the dolphins be your muse as you swim through logic!
(5*2=10 marks)
Note: to check the satisfiability of the fourth sentence, use only the first three sentences, and when you prove the fifth sentence, use the remaining four.

Computational (70 marks)

Ah, Delhi buses—the mighty chariots of commuters! In this assignment, you will be building a transit data application to navigate the complex network of routes, trips, and stops. The challenge? Ensure your application can effectively handle this data, providing clear and accurate transit information. For this, you are required to use the GTFS static data from Delhi’s Open Transit Data (OTD). The dataset consists of several static data files: routes.txt, trips.txt, stop_times.txt, stops.txt, and fare_rules.txt. For your convenience, the attributes you will use and the relationships between these files are illustrated in Figure 1.

Key terms:
• Start Stop: The stop ID where your journey begins.
• End Stop: The stop ID where your journey concludes.
• Route: A sequence of bus stops, identified by their stop IDs, that form a specific path from the Start Stop to the End Stop.
• Intermediate/Via Stop: A stop ID where the user is required to pause during the journey. This stop is essential for the route and may also serve as a point for transferring between routes.
• Interchange: The process of switching from one route to another at a specific stop ID. An interchange can occur at an Intermediate/Via Stop but is not limited to it.

Figure 1: Static data file structure for Delhi buses

The boilerplate code can be downloaded from this link. The output of a few public test cases will be provided to you. The final scoring will be done using multiple public and private test cases.

1. Data Loading and Knowledge Base Creation (10 marks)
Think of this part as preparing the bus system’s “brain”—organizing everything so that the application can later reason and plan effectively.
• Load the provided OTD static data. Ensure all data is stored in well-structured Python data types. Convert data types as necessary (e.g., time as datetime objects, IDs as strings).
• Set up the knowledge base (KB) for reasoning and planning tasks. For this you would need to create the following dictionaries: route_to_stops = {route_id: [list of stop_ids]}, trip_to_route = {trip_id: [list of route_ids]}, stop_trip_count = {stop_id: count of trips stopping there}.
• If your KB is correctly set up, then you should be able to answer the following:
(a) Top 5 busiest routes based on the number of trips.
(b) Top 5 stops with the most frequent trips.
(c) The top 5 busiest stops based on the number of routes passing through them.
(d) The top 5 pairs of stops (start and end) that are connected by exactly one direct route, sorted by the combined frequency of trips passing through both stops.
Additionally, create a graph representation using plotly for the knowledge base you created, and use the route_to_stops mapping for the same.

2. Reasoning (30 marks)
Now that you have organized the data, it’s time to put that knowledge to good use. Your goal is to implement a function DirectRoute(start_stop, end_stop) that takes a start stop and end stop as inputs and returns all the direct routes between them (i.e., no interchanges).
Input Format: (start_stop_id, end_stop_id); Output Format: [list of route_ids]
You are required to implement this using the following methods:
1. Brute-Force Approach: Develop a straightforward reasoning algorithm from scratch. Note: The logic is procedural. We systematically outline the steps for reasoning.
For example:

    if directRoute(x, y):
        adddirectRoute(x, y)
    else:
        dontAdd(x, y)

2. FOL Library-Based Reasoning: Utilize the PyDatalog library to implement it. For this, you need to create terms, define the predicates, add facts to the above knowledge base, and then define query functions. Note: The logic is declarative. We define relationships (facts) and rules, and the system infers the rest. For example:

    create_terms(AddToKB, ValidRoute, Fun1, Fun2, X, Y, Z)
    DirectRoute(X, Y)
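A minimal pyDatalog sketch of the declarative approach, assuming the route_to_stops dictionary from Part 1; the predicate names, the toy facts, and the query shown here are illustrative, not the required API.

```python
from pyDatalog import pyDatalog

pyDatalog.create_terms('RouteHasStop, DirectRoute, R, X, Y')

# facts: RouteHasStop(route_id, stop_id), normally loaded from the knowledge base
route_to_stops = {101: [10, 11, 12], 202: [11, 13]}   # toy stand-in for the real KB
for route_id, stops in route_to_stops.items():
    for stop_id in stops:
        + RouteHasStop(route_id, stop_id)

# rule: a route is a direct route between X and Y if it visits both stops
DirectRoute(R, X, Y) <= RouteHasStop(R, X) & RouteHasStop(R, Y) & (X != Y)

# query: all direct routes between stop 10 and stop 12
print(DirectRoute(R, 10, 12))        # prints the matching values of R
```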
1. Consider the search space below, where an edge from node x to node y means that y can be generated from x by an operator. The edges are labeled with the actual path cost and the estimated costs to a goal (h-values) are provided inside the nodes. You are required to find the shortest path from S to goal G1 using the following algorithms:
(a) A* (5)
(b) Uniform cost search (5)
(c) Iterative deepening A* (5)
List the order of nodes expanded (not generated) by these algorithms together with the f-values and total path costs. Show detailed steps of each algorithm in a table with node expansions, frontier nodes, and explored nodes (where applicable) for each algorithm.
Note: Explore nodes in counter-clockwise direction (left to right) when no other criterion is specified. Also, if two nodes are at the same cost, pick nodes alphabetically for tiebreaking.

2. Consider the following game tree. Note that each square signifies a Max node and each circle is a Min node.
(a) Part A
• Use the min-max algorithm to determine best play for both players, i.e. best moves at all levels of the tree for both players. Use arrows to represent the best moves. Show all legitimate moves. (2)
• Alpha-beta pruning is a directional search algorithm, i.e. it explores children from left to right. Apply alpha-beta pruning to the given game tree. Cross out the branches where pruning can be done and mark them with alpha or beta depending on its type. (3)
(b) Part B
• Best case: Rearrange the leaf nodes (and internal nodes if necessary) of the given tree so that the maximum pruning is achieved by alpha-beta pruning that explores branches from left to right. Justify your answer. (4)
• Worst case: Rearrange the leaf nodes again to make the worst case for alpha-beta pruning. Justify your answer. (4)
(c) Part C
Based on your best case, briefly explain why the best-case complexity is O(b^(d/2)), where b and d are the branching factor and look-ahead depth, respectively. (2)

3. Consider a graph G = (V, E) representing our institute (Fig 1), where V is the set of nodes and E is the set of edges. Each node u ∈ V represents a unique geographical location and each edge e_uv ∈ E represents a road connecting the locations u and v. The graph G is given in the form of an n × n adjacency matrix A, where A_uv > 0 indicates the travel cost between locations u and v along edge e_uv, whereas A_uv = 0 means that locations u and v are not directly connected, implying an infinite cost for direct travel between them. The locations are named {0, · · · , n−1} and all the nodes in the graph are attributed with (x, y) denoting the latitude and longitude of the corresponding location. The dataset and the boilerplate code can be downloaded from this link.
The primary objective of this programming question is to find a path (sequence of nodes traversed, including u and v) from a source node u to destination v for a given graph G = (V, E) using various algorithms and compare them. The output of a few ‘public’ test cases is provided in the shared code. The final scoring will be done automatically using ‘multiple private’ test cases. For this purpose, you must not change the method definitions provided in the boilerplate code; otherwise, your submission will not be graded.
(a) Implement the following uninformed search algorithms: (10 + 10 = 20)
• Iterative Deepening Search
• Bidirectional Breadth-First Search
(b) For each public test case, analyze whether the path obtained to travel from source u to destination v using both algorithms is the same.
If you find that the path obtained is identical (or different), comment whether it will always be identical (or different) for any pair of nodes in the given graph G. (2.5 + 2.5 = 5)
(c) Obtain the path between all pairs of nodes using both algorithms. Compare the memory usage and time of execution for both algorithms. (2.5 + 2.5 = 5)
(d) Let dist(u, v) be a function that calculates the Euclidean distance between two nodes u and v using the node attributes (x, y). Use the heuristic function given below to implement the following informed search algorithms. (10 + 10 = 20)
h(w) = dist(u, w) + dist(w, v)
• A* Search
• Bidirectional A* Search

Figure 1: A graph representing IIIT Delhi

(e) Repeat the exercises (b) and (c) using both these informed search algorithms. (2.5 + 2.5 + 2.5 + 2.5 = 10)
(f) Analyze the results obtained using all the uninformed and informed search algorithms. Using scatter plots, compare and contrast the efficiency (in terms of space and time) and optimality (in terms of cost of traveling) of all the algorithms. Explain how the metric to generate the scatter plots was obtained. Also, comment on the benefits and drawbacks of using informed search algorithms over uninformed ones, supported via the empirical analysis. (5 + 5 = 10)
(g) (BONUS) There might be geographical locations that are only connected via one road. Removal of such a road will disconnect the geographical locations. For example, India and Sri Lanka were only connected via The Ram Sethu bridge, but due to its submersion the cities in India are no longer connected to those of Sri Lanka via roads. In order to reduce the vulnerability of such road networks, the minister of road transport and highways wants to construct new roads within our country. For this purpose, the ministry has asked you to develop an algorithm that can be used to identify all vulnerable roads. Specifically, for the given graph G, identify all the edges whose removal would increase the number of disconnected components. (10)
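A minimal A* sketch over the adjacency-matrix representation described above, using the stated heuristic h(w) = dist(u, w) + dist(w, v); the function signature is illustrative and should be adapted to the method definitions in the boilerplate.

```python
import heapq
import math

def a_star(adj, coords, u, v):
    """adj: n x n matrix where adj[a][b] > 0 is the edge cost and 0 means no edge.
    coords: list of (x, y) per node. Returns a path [u, ..., v] or None."""
    def dist(a, b):
        return math.hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])

    def h(w):                       # heuristic given in the assignment
        return dist(u, w) + dist(w, v)

    n = len(adj)
    g = {u: 0.0}
    parent = {u: None}
    frontier = [(h(u), 0.0, u)]     # priority-queue entries are (f, g, node)
    while frontier:
        f, g_node, node = heapq.heappop(frontier)
        if g_node > g.get(node, math.inf):
            continue                # stale queue entry; a cheaper path was found already
        if node == v:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in range(n):
            if adj[node][nb] > 0:
                new_g = g_node + adj[node][nb]
                if new_g < g.get(nb, math.inf):
                    g[nb] = new_g
                    parent[nb] = node
                    heapq.heappush(frontier, (new_g + h(nb), new_g, nb))
    return None
```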
1. (12 points) CLIP (Contrastive Language-Image Pretraining) is a multi-modal deep learning model developed by OpenAI that enables zero-shot learning for vision tasks. It learns to associate images and text by training on a vast dataset of image-text pairs collected from the internet. Following CLIP, a recent work, CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions, improves zero-shot vision tasks. For the following questions, you need to perform a detailed study and a comparative analysis.
1. (2 points) Refer to the GitHub repository and follow the readme to install the required dependencies for CLIP. Alternatively, you can use Hugging Face's transformers library for the same.
2. (1 point) Download the CLIP pretrained weights (clip-vit-base-patch32) and load the CLIPModel with the pretrained weights.
3. (2 points) For the given sample image of a human and a dog, choose any 10 random textual descriptions and generate their similarity scores (a minimal scoring sketch is included at the end of this section).
4. (2 points) Refer to the CLIPS GitHub repo and follow README.md to install the required dependencies.
5. (1 point) Load the pretrained weights for the CLIPS-Large-14-224 model.
6. (2 points) For the previous image of the human and dog, calculate the similarity scores for the previous captions using CLIPS.
7. (2 points) Comment on the results obtained by both CLIP and CLIPS.

2. (5 points) Visual question answering: Visual Question Answering is the task of answering open-ended questions based on an image. Such models output natural language responses to natural language questions.
1. (2 points) Refer to the paper BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, and its GitHub repository. Follow README.md to install all the dependencies and download the pre-trained weights for answering visual questions.
2. (1 point) For the previous sample image of the human and dog, generate an answer to the question “Where is the dog present in the image?”.
3. (1 point) For the same image, generate an answer to the question “Where is the man present in the image?”.
4. (1 point) Comment on the output and accuracy of the answers for the previous two questions.

3. (10 points) BLIP vs CLIP: BLIP can also be used for generating image captions. For the following questions, you will need to do a comparative analysis between BLIP and CLIP.
1. (2 points) For BLIP, load the pretrained weights for image captioning.
2. (2 points) For the given sample of images, generate a caption for each image using the pretrained BLIP model.
3. (2 points) Use CLIP to evaluate the semantic accuracy of the BLIP-generated captions. Compute and interpret the similarity score between the image and the generated caption.
4. (2 points) Use CLIPS to evaluate the scores as asked in the question above.
5. (2 points) Discuss different metrics that can be used to quantify alignment between CLIP and BLIP outputs. Provide examples of when each metric would be most useful.

4. (8 points) Referring Image Segmentation (RIS) is a vision-language task where a model segments an object in an image based on a natural language description. Unlike traditional segmentation methods that rely on predefined categories, RIS understands contextual and relational descriptions (e.g., “the cat sitting on the sofa” instead of just “cat”).
1. (2 points) Refer to the LAVT paper and its GitHub repository. Follow the README to install the required libraries and download the pre-trained weights.
2. (2 points) For each of the images in the sample folder, we also provide a reference text in this file.
Show the segmented image using the given references.
3. (2 points) Also plot the Y1 feature map obtained for each model, as shown in Figure 2 of the paper.
4. (2 points) For each of the images, provide your own reference texts and show the failure segmentation results for the given images. Show both the reference text and the segmentation results.

5. (5 points) Image as reference: one-shot segmentation using an image reference. Unlike large language models that excel at directly tackling various language tasks, vision foundation models require a task-specific model structure followed by fine-tuning on specific tasks. For this question, we will use Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks. Matcher can segment anything by using an in-context example without training.
1. (2 points) Refer to the GitHub repo for Matcher. Follow the README to install the required libraries and download the necessary pre-trained weights.
2. (3 points) We will evaluate Matcher on simple images. Using the reference images, show the segmentation results for each of the images provided in this folder. Each subfolder contains two images. Take one image as a reference, one at a time, and the other as the input image. So, each subfolder will give 2 segmentation results. Show the segmentation results obtained.
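A minimal sketch of the CLIP similarity scoring from Question 1, using Hugging Face's transformers; the image path sample.jpg and the candidate captions are placeholders, and the CLIPS side would follow its own repo instructions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("sample.jpg")                      # the provided human-and-dog image
captions = ["a man walking a dog", "a cat on a sofa", "a dog sitting next to a person",
            "an empty street", "a plate of food"]     # extend to 10 descriptions of your choice

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits_per_image                     # image-text similarity scores
probs = logits.softmax(dim=1)                         # normalized across the candidate captions
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```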
1. (15 points) Consider a vector (3, −1, 4)^T, which undergoes the following transformations in sequence:
1. A rotation of −π/6 about the Y-axis.
2. A rotation of π/4 about the X-axis.
3. A reflection across the XZ-plane.
4. Finally, a translation by (1, 0, −2)^T.
(1) (4 points) Determine the overall coordinate transformation matrix (including both rotation and reflection).
(2) (3 points) Compute the new coordinates of the given vector under this transformation. Additionally, determine where the origin of the initial frame of reference is mapped.
(3) (4 points) Calculate the direction of the axis of the combined rotation (excluding reflection) in the original frame of reference and find the angle of rotation about this axis.
(4) (4 points) Using Rodrigues’ formula, show that the rotation matrix obtained from the two rotations matches the matrix derived by direct computation.

2. (10 points) The image formation process can be summarized in the equation x = K[R|t]X, where K is the intrinsic parameter matrix, [R|t] are the extrinsic parameters, X is the 3D point and x is the image point in the homogeneous coordinate system. Consider a scenario where there are two cameras (C1 and C2) with intrinsic parameters K1 and K2 and corresponding image points x1 and x2, respectively. Assume that the first camera frame of reference is known and is used as the world coordinate frame. The second camera orientation is obtained by a pure 3D rotation R applied to the first camera’s orientation. Show that the homogeneous coordinate representations of the image points x1 and x2 of C1 and C2, respectively, are related by an equation x1 = Hx2, where H is an invertible 3×3 matrix. Find H in terms of K1, K2 and R.

3. (40 points) Camera Calibration: Refer to the following tutorials on camera calibration: Link1 and Link2. You are required to perform the camera calibration separately both for the 25 images provided to you in the dataset link and for 25 images that you will click yourself. Place your camera (laptop or mobile phone) stationary on a table. Take a printout of a chessboard calibration pattern as shown in the links above and stick it on a hard, planar surface. Click ∼25 pictures of this chessboard pattern in many different orientations. Be sure to cover all degrees of freedom across the different orientations and positions of the calibration pattern. Make sure that each image fully contains the chessboard pattern. Additionally, the corners in the chessboard pattern should be detected automatically and correctly using appropriate functions in the OpenCV library. Include the final set of images that you use for the calibration in your report. For the dataset provided, submit a JSON file containing the estimated intrinsic parameters, extrinsic parameters and radial distortion coefficients. The JSON file should follow the specified format given here.
1. (5 points) Report the estimated intrinsic camera parameters, i.e., focal length(s), skew parameter and principal point, along with error estimates if available.
2. (5 points) Report the estimated extrinsic camera parameters, i.e., rotation matrix and translation vector, for the first 2 images.
3. (5 points) Report the estimated radial distortion coefficients. Use the radial distortion coefficients to undistort 5 of the raw images and include them in your report. Observe how straight lines at the corners of the images change upon application of the distortion coefficients. Comment briefly on this observation.
4. (5 points) Compute and report the re-projection error using the intrinsic and extrinsic camera parameters for each of the 25 selected images. Plot the error using a bar chart. Also report the mean and standard deviation of the re-projection error.
5. (10 points) Plot figures showing the corners detected in the image along with the corners after re-projection onto the image for all 25 images. Comment on how the re-projection error is computed.
6. (10 points) Compute the checkerboard plane normals n_i^C, i ∈ {1, ..., 25}, for each of the 25 selected images in the camera coordinate frame of reference (O_c).

4. (40 points) Panorama Generation
Download the dataset from the link. The dataset consists of three sets of images, which will be used to create three distinct panoramas. Since the images are mixed, you will need to separate them using K-means clustering based on their color histograms or Visual Bag of Words. Visually inspect the results to determine which method provides more accurate separation and go ahead with that. The code for clustering should be present in the notebook. For steps 1 to 5, use only the first two images from the entire set (named image1 and image2). In step 6, perform stitching on all three sets to generate the three complete panoramas.
1. (5 points) Keypoint detection: Extract the keypoints and descriptors from the first two images using the SIFT algorithm. SIFT (Scale-Invariant Feature Transform) is a computer vision algorithm used for feature detection and description. After extracting the keypoints and descriptors, draw them overlaid on the original images to visualize and verify their correctness.
2. (5 + 5 points) Feature matching: Match the extracted features using two different algorithms: BruteForce and FlannBased. BruteForce is a simple algorithm that matches features by comparing all the descriptors of one image with all the descriptors of the other image. FlannBased (Fast Library for Approximate Nearest Neighbors) is a more efficient algorithm that uses a hierarchical structure to speed up the matching process. After performing the matching, display the matched features by drawing lines between them.
3. (5 points) Homography estimation: Compute the homography matrix using RANSAC. Save and submit the matrix as a csv file. RANSAC (Random Sample Consensus) is an iterative algorithm used for robust estimation of parameters in a mathematical model. The homography matrix is used to align the two images so that they can be stitched together to form a panorama.
4. (5 points) Perspective warping: Perspective warping is a process that transforms the perspective of an image so that it appears as if it was taken from a different viewpoint. Warp the first two images (with overlapping field of view) using their respective homography matrices and display image1 and image2 side-by-side. These warped images will be part of your first panorama. Display the images without cropping or stitching them (stitching is asked in the next part).
5. (5 points) Stitching: The two images need to be stitched together to form a panorama. Display the final panorama without any cropping or blending, along with the panorama obtained after cropping and blending.
6. (10 points) Multi-Stitching: Perform multi-stitching for all the images in the folder and display the final result. Multi-stitching has to be performed on each of the three sets of images obtained after clustering. The output should be three panoramas. (Hint: Use the function implemented for Stitching.)
5. (20 points) [BONUS] Point Cloud Registration
Download the dataset from the link. The dataset contains multiple sequentially recorded point cloud (.pcd) files. These files were recorded by mounting a 3D LiDAR on a TurtleBot and capturing the point clouds during its motion. Complete the following steps to estimate the TurtleBot trajectory and visualize the registered point clouds.
1. (5 points) Run the point-to-point ICP (Iterative Closest Point) registration algorithm on any 2 consecutive point clouds with hyperparameters of your choice. You can use open3d (link) for this task. The output of ICP will be a “learnt” transformation matrix. Report the fitness and inlier RMSE for the initial and estimated transformation matrices between the 2 point clouds.
Hint: While making an initial guess of the T-matrix for the ICP algorithm, make sure that it has valid Rotation and Translation components, i.e., the matrix should be orthonormal. You may refer to the ortho_group.rvs() function. Also, make sure that the original T-guess that you make isn’t the same as the ground truth transformation matrix. If found to be the same, 0 marks will be awarded for this and the following parts.
2. (8 points) Run multiple experiments with different hyperparameter settings to improve the performance of the model in terms of error and fitness of the transformation matrix. Compare different threshold values and different initial guesses of the transformation matrix (random orthogonal matrix and RANSAC-based initial guess). Also compare the error between the initial and “learned” transformation matrices. Make sure to add these experiments in tabular form in your final report and highlight the best hyperparameters for which you get the minimum error. For estimating normals (if required) you can refer to the open3d function estimate_normals(). Also mention the estimated T-matrix in your report.
3. (2 points) With the best hyperparameter settings, transform your source point cloud using the estimated transformation matrix. Visualize the result and give reasons for the results that you get.
4. (5 points) Repeat steps 1 through 3 of the point-to-point ICP algorithm for all the point clouds and report the global registered point cloud. Also report and plot the estimated 3D trajectory of the TurtleBot. Save the trajectory in a csv file and submit it along with your report.
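A minimal open3d sketch of step 1 (point-to-point ICP between two consecutive clouds), assuming hypothetical file names cloud_000.pcd and cloud_001.pcd; the threshold and the random initial guess are illustrative hyperparameters.

```python
import numpy as np
import open3d as o3d
from scipy.stats import ortho_group

source = o3d.io.read_point_cloud("cloud_000.pcd")   # hypothetical file names
target = o3d.io.read_point_cloud("cloud_001.pcd")

# random but valid initial guess: proper rotation plus a small translation
R = ortho_group.rvs(3)
if np.linalg.det(R) < 0:           # ortho_group can return reflections; flip to det = +1
    R[:, 0] = -R[:, 0]
T_init = np.eye(4)
T_init[:3, :3] = R
T_init[:3, 3] = [0.1, 0.0, 0.0]

threshold = 0.5   # max correspondence distance; tune as a hyperparameter
init_eval = o3d.pipelines.registration.evaluate_registration(source, target, threshold, T_init)
print("initial   fitness:", init_eval.fitness, "inlier RMSE:", init_eval.inlier_rmse)

result = o3d.pipelines.registration.registration_icp(
    source, target, threshold, T_init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
print("estimated fitness:", result.fitness, "inlier RMSE:", result.inlier_rmse)
print("estimated transformation:\n", result.transformation)

# visualize the aligned source against the target
source.transform(result.transformation)
o3d.visualization.draw_geometries([source, target])
```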
1. (14 points) Theory
1. (4 points) Consider a classification problem with K classes, where the true class label is represented as a one-hot vector y ∈ {0, 1}^K, ||y||_1 = 1, and the predicted probability distribution over classes is q = [q_1, q_2, . . . , q_K], where Σ_{i=1}^K q_i = 1, q_i ≥ 0. Label smoothing is a regularization technique used in classification models to reduce the extent to which models become overconfident in their predictions. Label smoothing modifies the target distribution y by assigning a small probability ϵ/K to each incorrect class and 1 − ϵ + ϵ/K to the correct class.
(a) (2 points) Cross-entropy between two arbitrary distributions p and q for a random variable X is defined as H(p, q) = E_p[− log q(X)]. Derive the cross-entropy loss with label smoothing, H(y, q), as an expectation over the smoothed target distribution in terms of ϵ, K, and the predicted probabilities q_i.
(b) (2 points) Discuss the effect of label smoothing on the loss and its interpretation.
2. (5 points) Consider two univariate Gaussian distributions, p(x) = N(µ_p, σ_p²) and q(x) = N(µ_q, σ_q²):
(a) (1 point) Write the expression for the cross-entropy between p(x) and q(x) as an expectation.
(b) (2 points) Evaluate the expectation H(p, q) in terms of µ_p, σ_p², µ_q, σ_q.
(c) (2 points) For σ_p = σ_q = σ, simplify H(p, q) and interpret the result.
3. (5 points) Atrous convolutions, or dilated convolutions, allow convolutional neural networks to capture a larger receptive field without increasing the number of parameters or downsampling the feature map. The dilation factor (or rate) r specifies the spacing between kernel elements.
(a) (2 points) Derive the effective receptive field size for a 1D convolution with a kernel size k, dilation factor r, and L layers of stacked convolutions. Show that the receptive field grows exponentially with respect to r if the dilation factor increases layer-by-layer (e.g., r = 1, 2, 4, . . .).
(b) (2 points) Generalize the result to 2D convolutions with a k × k kernel and explain how the receptive field changes with r and L.
(c) (1 point) Compare the computational complexity of a standard k × k convolution with a dilated k × k convolution for a fixed feature map size. Derive expressions for the number of multiply-add operations in each case.

2. (43 points) Image Classification
1. (5 points) Refer to the Russian Wildlife Dataset.
(a) (1 point) Download the dataset and use the following mapping as the class labels: {’amur leopard’: 0, ’amur tiger’: 1, ’birds’: 2, ’black bear’: 3, ’brown bear’: 4, ’dog’: 5, ’roe deer’: 6, ’sika deer’: 7, ’wild boar’: 8, ’people’: 9}. Perform a stratified random split of the data in the ratio 0.8:0.2 to get the train and validation sets. Create a custom Dataset class for the data. Initialize Weights & Biases (WandB) (Video Tutorial).
(b) (2 points) Create data loaders for all the splits (train and validation) using PyTorch.
(c) (2 points) Visualize the data distribution across class labels for the training and validation sets.
2. (13 Points) Training a CNN from scratch (Tutorial):
(a) (3.5 points) Create a CNN architecture with 3 Convolution Layers having a kernel size of 3×3 and padding and stride of 1. Use 32 feature maps for the first layer, 64 for the second and 128 for the last convolution layer. Use a Max pooling layer having a kernel size of 4×4 with stride 4 after the first convolution layer and a Max pooling layer having a kernel size of 2×2 with stride 2 after the second and third convolution layers.
Finally, flatten the output of the final Max pooling layer and add a classification head on top of it. Use ReLU activation functions wherever applicable.
(b) (3 points) Train the model using the Cross-Entropy Loss and Adam optimizer for 10 epochs. Use wandb to log the training and validation losses and accuracies.
(c) (0.5 points) Look at the training and validation loss plots and comment whether the model is overfitting or not.
(d) (3 points) Report the Accuracy and F1-Score on the validation set. Also, log the confusion matrix using wandb.
(e) (3 points) For each class in the validation set, visualize any 3 images that were misclassified along with the predicted class label. Analyze why the model could possibly be failing in these cases. Is this due to the fact that the image does not contain the ground truth class, or that it looks more similar to the predicted class, or something else? Can you think of any workaround for such samples?
3. (10 points) Fine-tuning a pretrained model
(a) (3.5 points) Train another classifier with a fine-tuned ResNet-18 (pre-trained on ImageNet) architecture using the same strategy used in Question 2.2.(b) and again use wandb for logging the loss and accuracy.
(b) (0.5 points) Look at the training and validation loss plots and comment whether the model is overfitting or not.
(c) (3 points) Report the Accuracy and F1-Score on the validation set. Also, log the confusion matrix using wandb.
(d) (3 points) For deep neural networks, typically, the backbone is the part of a model (initial layers) that is used to extract feature representations (or simply features) from the raw input data, which can then be used for classification or some other related task. These features are expressed as an n-dimensional vector, also known as a feature vector, and the corresponding vector space is referred to as the feature space. As the training progresses and the classifier learns to classify the input, the data samples belonging to the same class lie closer to each other in the feature space than other data samples. For input samples from the training and validation sets, extract the feature vectors using the backbone (ResNet-18 in this case) and visualize them in the feature space using a tSNE plot in a 2-D space. Also, visualize the tSNE plot of the validation set in a 3-D space.
4. (10 points) Data augmentation techniques
(a) (3.5 points) Use any 3 (or more) data augmentation techniques that are suitable for this problem. Remember that data augmentation techniques are used for synthetically adding more training data so that the model can train on a greater variety of data samples. Visualize 4-5 augmented images and add them to your report.
(b) (3 points) Follow the same steps as in Question 2.3.(a) to train the model.
(c) (0.5 points) Look at the training and validation loss plots now and comment whether the problem of overfitting is getting resolved or not.
(d) (3 points) Report the Accuracy and F1-Score on the validation set. Also, log the confusion matrix using wandb.
5. (5 points) Compare and comment on the performance of all three models.

3. (35 points) Image Segmentation
1. (5 points) Download the CamVid dataset.
(a) (1 point) Download the dataset and write the dataloader. The image size in the dataset is (960, 720); you need to resize the images to (480, 360) and normalize the input images by applying mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225].
(b) (2 points) Visualize the class distribution across the provided dataset.
(c) (2 points) Visualize two images along with their mask for each class.
Figure 1: Encoder-decoder architecture for segmentation.
2. (15 points) Train the SegNet decoder from scratch:
(a) (6 points) We provide a pre-trained SegNet encoder, implemented in the given model classes.py file, which has been trained on the CamVid dataset. Additionally, the file includes a skeleton structure for the decoder that you will need to implement from scratch. Your task is to complete the decoder implementation by following the provided instructions. Once implemented, train the decoder on top of the pre-trained SegNet encoder using the cross-entropy loss function, with a Batch Normalization momentum of 0.5, and optimize the model using the Adam optimizer. Also, log your training loss on wandb.
(b) (4 points) Report the class-wise performance on the test set in terms of pixel-wise accuracy, dice coefficient, IoU (Intersection over Union), and also mIoU. Additionally, report precision and recall. Use IoU thresholds within the range [0, 1] with an interval size of 0.1 for the computation of the above metrics. You may refer to this article to learn more about the evaluation of segmentation models. Include all your findings in the submitted report.
(c) (5 points) For any three classes in the test set, visualize any three images with IoU ≤ 0.5 along with the predicted and ground-truth masks; the visualization of masks should use a proper color code. Comment on why the model could possibly be failing in these cases with the help of the IoU visualizations. Is the object occluded, is it being misclassified, or is it due to the environment (surroundings) in which the object is present?
3. (15 points) Fine-tuning DeepLabV3 on the CamVid dataset.
(a) (6 points) Use a pre-trained DeepLabv3 model trained on the Pascal VOC dataset and fine-tune it on the CamVid dataset using the cross-entropy loss function and the Adam optimizer. Also, log your training loss on wandb.
(b) (4 points) Report the class-wise performance on the test set in terms of pixel-wise accuracy, dice coefficient, IoU (Intersection over Union), and also mIoU. Additionally, report precision and recall. Use IoU thresholds within the range [0, 1] with an interval size of 0.1 for the computation of the above metrics and compare them with your implemented SegNet model. Include all your findings in the submitted report.
(c) (5 points) For any three classes in the test set, visualize any three images with IoU ≤ 0.5 along with the predicted and ground-truth masks; the visualization of masks should use the same color code as in part 1 of this question. Comment on why the model could possibly be failing in these cases with the help of the IoU visualizations. Is the object occluded, is it being misclassified, or is it due to the environment (surroundings) in which the object is present?
4. (28 points) Object Detection and Multi-Object Tracking
1. (18 points) The object detection problem involves two tasks – localizing the objects, i.e. determining the coordinates of the valid bounding boxes in the image, and classifying the objects within those boxes from a vocabulary of object classes. By completing this task, you will learn how to use modern state-of-the-art detection models and interpret their outputs. More importantly, you will learn how to perform a deep analysis of prediction errors.
(a) (1 point) You will use the Ultralytics API to detect objects in the COCO 2017 validation set. Familiarize yourself with the API and download the COCO 2017 validation set.
Note that the Ultralytics repo has a script to automatically download the data – you may refer to their documentation. You will not need the train and test splits; only the val split is required for this task.
(b) (2 points) Use a COCO-pretrained YOLOv8 model to make predictions on COCO val2017. Save your predictions in a JSON file in the standard COCO format. You can find this format specification on the COCO website. Report the mAP (Mean Average Precision) of your model predictions.
(c) (2 points) Check out the TIDE Toolbox. This tool analyzes detector predictions to classify and quantify the specific types of errors. Use your saved predictions to compute the TIDE statistics on the val set. Moreover, read the TIDE paper and understand what these errors mean. Comment on the error analysis of your model as per your understanding.
(d) (5 points) Modern deep neural networks often have poor confidence calibration – that is, their predicted probability estimates are not a good representative of the true correctness likelihood. Expected Calibration Error (ECE) is one metric that summarizes this miscalibration into a scalar statistic. Please read Section 2 of this paper to learn how to compute ECE (Eqn. 3). Compute this metric on your COCO val2017 predictions. Report and comment on the significance of the error obtained for your predictions. (A minimal computational sketch of ECE is given after this question.)
(e) (5 points) The COCO API reports performance for three scales of objects – small, medium and large. These objects are classified based on the area of the bounding box of each object – small objects are defined as those with area < 36 · 36, medium as 36 · 36 ≤ area < 96 · 96, and large as area ≥ 96 · 96. Compute the TIDE statistics at each of the three object scales. For this, you can filter your saved predictions file based on the three scales and save a new file for each scale, and then match each of them against the ground-truth annotations file with the TIDE API. Also, compute the Expected Calibration Error at each of these scales.
(f) (3 points) Answer each of the following points separately.
• (1 point) What can you infer from these observations?
• (1 point) Comment on your observations across each of the three scales.
• (1 point) Compare these statistics with the relevant metrics computed with all objects, as you computed in 4.(c) and 4.(d).
2. [BONUS] (10 points) Refer to the MOT17 Dataset for object tracking evaluation. You need to evaluate ByteTrack using the training dataset.
(a) (2.5 points) Download the MOT17 dataset. Visualize the first frame of the first video in the training set. Visualize the data distribution using the number of frames per video. Visualize the first frame of any one video.
(b) (1 point) ByteTrack is a simple, fast and strong multi-object tracker. Use ByteTrack to track the detections provided in any one video and visualize the output video with bounding boxes.
(c) (4 points) Implement a simple IoU tracker, track the detections provided in any one video, and visualize the output video with bounding boxes.
(d) (2.5 points) Compare the performance of your custom-implemented IoU tracker with ByteTrack. To know more about MOT metrics, click this link.
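The following is a minimal computational sketch of ECE, referenced from part 1.(d) above. It assumes you have already matched each saved prediction against the ground truth, giving a confidence score and a correct/incorrect flag per detection; the function name, the 10 equal-width bins, and the toy inputs at the bottom are illustrative choices, not part of the assignment.

```python
import numpy as np

def expected_calibration_error(confidences, correctness, n_bins=10):
    """ECE as in Guo et al. (2017), Eqn. 3: a weighted average of the absolute
    gap between mean confidence and accuracy within each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correctness = np.asarray(correctness, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correctness[in_bin].mean()    # fraction of correct detections in this bin
            conf = confidences[in_bin].mean()   # mean predicted confidence in this bin
            ece += (in_bin.sum() / n) * abs(acc - conf)
    return ece

# Toy usage: detector scores paired with 1/0 flags for whether each detection matched GT.
print(expected_calibration_error([0.9, 0.8, 0.55, 0.3], [1, 1, 0, 0]))
```

Once the per-detection (confidence, correct) pairs are available, the same function can be reused on each object-scale subset for part 1.(e).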
1. (10 points) Section A (Theoretical)
(a) (3 points) Consider a regression problem where you have an MLP with one hidden layer (ReLU activation) and a linear output (refer to the attached Figure 1). Train the network using the mean squared error loss. Given a dataset with inputs [1, 2, 3] and corresponding targets [3, 4, 5], perform a single training iteration and update the weights. Assume appropriate initial weights and biases and a learning rate of 0.01.
Figure 1: Figure for Question 1.(a)
(b) (4 points) You have the following dataset:
Class | x1 | x2 | Label
+ | 0 | 0 | +
+ | 1 | 0 | +
+ | 0 | 1 | +
– | 1 | 1 | –
– | 2 | 2 | –
– | 2 | 0 | –
Table 1: Table for Question 1.(b)
(a) Are the points linearly separable? Support your answer by plotting the points.
(b) Find the weight vector corresponding to the maximum-margin hyperplane. Also find the support vectors present.
(c) (3 points) Consider the dataset with features x1 and x2 and 2 classes, y = −1 and y = +1.
Training dataset:
Sample No. | x1 | x2 | y
1 | 1 | 2 | +1
2 | 2 | 3 | +1
3 | 3 | 3 | −1
4 | 4 | 1 | −1
Table 2: Table for Question 1.(c)
The SVM formulates a decision boundary of the form $w \cdot x + b = 0$. Given the values $w_1 = 1$, $w_2 = -1$, $b = 0$, solve the following parts:
(a) Calculate the margin of the classifier.
(b) Identify the support vectors (given samples 1 to 4).
(c) Predict the class of a new point: x1 = 1, x2 = 3.
2. (15 points) Section B (Scratch Implementation)
1. (5.5 points) Implement a class named NeuralNetwork with the following parameters during initialization:
(a) N: number of layers in the network.
(b) A list of size N specifying the number of neurons in each layer.
(c) lr: learning rate.
(d) Activation function (the same activation function is used in all layers of the network except the last layer).
(e) Weight initialization function.
(f) Number of epochs.
(g) Batch size.
The NeuralNetwork class should also implement the following functions:
(a) fit(X, Y): trains a model on input data X and labels Y.
(b) predict(X): gives the prediction for input X.
(c) predict_proba(X): gives the class-wise probability for input X.
(d) score(X, Y): gives the accuracy of the trained model on input X and labels Y.
2. (2 points) Implement the following activation functions (along with their gradient functions): sigmoid, tanh, ReLU, Leaky ReLU and softmax (only used in the last layer).
3. (1.5 points) Implement the following weight initialization functions: zero init, random init, and normal init (Normal(0, 1)). Choose appropriate scaling factors.
4. (6 points) Train the implemented network on the MNIST dataset. Perform appropriate preprocessing and use an 80:10:10 train-validation-test split. Use the following configurations for training the network:
(a) Number of hidden layers = 4.
(b) Layer sizes = [256, 128, 64, 32].
(c) Number of epochs = 100 (can be less if computation is taking too long).
(d) Batch size = 128 (or any other appropriate batch size if taking too long).
(e) Learning rate = 2e-5.
Plot training loss vs. epochs and validation loss vs. epochs for each activation function and weight initialization function, and report your findings in the report (such as which function combination performed the best and where it performed suboptimally). Also, save all 12 trained models as .pkl files. You will be asked to run them during the demo to reproduce your results on the test set.
OR
3. (15 points) Section C (Algorithm implementation using packages)
For this question, you will need to download the Fashion-MNIST dataset. It contains a train.csv and a test.csv.
You need to take the first 8000 images from the train data and the first 2000 from the test data. These will be your training and testing splits.
1. (1 point) Perform appropriate preprocessing on the data (e.g., normalization) and visualize any 10 samples from the test dataset.
2. (4 points) Train an MLP Classifier from sklearn's neural network module on the training dataset. The network should have 3 layers of size [128, 64, 32] and should be trained for 100 iterations using the 'adam' solver with a batch size of 128 and a learning rate of 2e-5. Train it using all 4 activation functions, i.e. 'logistic', 'tanh', 'relu' and 'identity'. For each activation function, plot the training loss vs. epochs and validation loss vs. epochs curves and comment in the report on which activation function gave the best performance on the test set. (A minimal configuration sketch is given after this section.)
3. (3 points) Perform a grid search using the best activation function from part 2 to find the best hyperparameters (e.g., solver, learning rate, batch size) for the MLP classifier and report them in the report.
4. (4 points) For this part, you need to train an MLPRegressor from sklearn's neural network module on a regeneration task:
(a) This means you will need to design a 5-layer neural network with layer sizes following the format [c, b, a, b, c] where c > b > a.
(b) By regeneration task, it is meant that you will try to regenerate the input image using your designed neural network and plot the training and validation losses per epoch to see if your model is training correctly.
(c) Train 2 neural networks on the task above: one using the 'relu' activation and the other using the 'identity' activation function. Set the solver as adam and use a constant learning rate of 2e-5.
(d) After training both networks, visualize the regenerated outputs for the 10 test samples you visualized in part 1 and describe your observations in the report.
(4 points, 1 for each part)
5. (3 points) Lastly, from the two neural networks trained above, extract the feature vector of size 'a' for the train and test data samples. Using this vector as your new set of image features, train two new smaller MLP Classifiers with 2 layers, each of size 'a', on the training dataset and report accuracy metrics for both these classifiers. Train them for 200 iterations with the same solver and learning rate as part 2. Contrast this with the MLP Classifier you trained in part 2 and report possible reasons why this method still gives you a decent classifier.
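A minimal sketch of the Section C, part 2 configuration follows. It assumes the Fashion-MNIST CSVs have the label in the first column and the 784 pixel values in the remaining columns; the file names, column layout and random seed are assumptions, and the per-epoch plotting and validation tracking asked for in the assignment are left out.

```python
import pandas as pd
from sklearn.neural_network import MLPClassifier

# Hypothetical file names and column layout: first column = label, rest = 784 pixels.
train = pd.read_csv("train.csv").iloc[:8000]
test = pd.read_csv("test.csv").iloc[:2000]
X_train, y_train = train.iloc[:, 1:].to_numpy() / 255.0, train.iloc[:, 0].to_numpy()
X_test, y_test = test.iloc[:, 1:].to_numpy() / 255.0, test.iloc[:, 0].to_numpy()

for act in ["logistic", "tanh", "relu", "identity"]:
    clf = MLPClassifier(hidden_layer_sizes=(128, 64, 32), activation=act,
                        solver="adam", batch_size=128, learning_rate_init=2e-5,
                        max_iter=100, random_state=0)
    clf.fit(X_train, y_train)
    print(act, clf.score(X_test, y_test))
    # clf.loss_curve_ holds the per-iteration training loss, useful for the loss plots.
```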
1. (10 points) Section A (Theoretical)
(a) Consider the forward pass of a convolutional layer in a neural network architecture, where the input is an image of dimensions M × N (where min(M, N) ≥ 1) with P channels (P ≥ 1) and a single kernel of size K × K (1 ≤ K ≤ min(M, N)).
(a) (1 point) Given a stride of 1 and no padding, determine the dimensions of the resulting feature map.
(b) (1 point) Compute the number of elementary operations (multiplications and additions) required to compute a single output pixel in the resulting feature map.
(c) (3 points) Now, consider the scenario where there are Q kernels (Q ≥ 1) of size K × K. Derive the computational time complexity of the forward pass for the entire image in Big-O notation as a function of the relevant dimensions. Additionally, provide another Big-O notation assuming min(M, N) ≫ K.
(b) (5 points) Explain the Assignment Step and Update Step in the K-Means algorithm. Discuss any one method that helps in determining the optimal number of clusters. Can we randomly assign cluster centroids and arrive at the global minimum?
2. (15 points) Section B (Scratch Implementation)
You are tasked with implementing the K-Means clustering algorithm from scratch using Python. Use the Euclidean distance as the distance function, where k is chosen as 2. The initial centroids for the 2 clusters are given as $u_1 = (3.0, 3.0)$ and $u_2 = (2.0, 2.0)$.
The matrix X consists of the following data points:
(5.1, 3.5), (4.9, 3.0), (5.8, 2.7), (6.0, 3.0), (6.7, 3.1), (4.5, 2.3), (6.1, 2.8), (5.2, 3.2), (5.5, 2.6), (5.0, 2.0), (8.0, 0.5), (7.5, 0.8), (8.1, −0.1), (2.5, 3.5), (1.0, 3.0), (4.5, −1.0), (3.0, −0.5), (5.1, −0.2), (6.0, −1.5), (3.5, −0.1), (4.0, 0.0), (6.1, 0.5), (5.4, −0.5), (5.3, 0.3), (5.8, 0.6)
(a) Implement the k-means clustering algorithm from scratch. Ensure that your implementation includes:
(a) (1 point) Initialization: Use the given centroids as starting points.
(b) (2 points) Assignment: Assign each data point to the nearest centroid based on the Euclidean distance.
(c) (2 points) Update: Recalculate the centroids after each assignment by computing the mean of all points assigned to each centroid.
(d) (1 point) Convergence check: Terminate the algorithm if the centroids do not significantly change between iterations, or after a maximum of 100 iterations. Use a convergence threshold of 1e-4.
(b) (2 points) Find the values of the final centroids after the algorithm converges. Plot the two clusters at the start of the process and at the end.
(c) (2 points) Compare the results using the provided initial centroids versus using random initialization of centroids.
(d) (5 points) Determine the optimal number of clusters, M, using the Elbow method. Plot the Within-Cluster Sum of Squares (WCSS) against different values of k to find the elbow point. Randomly initialize M centroids, perform clustering, and plot the resulting clusters.
OR
3. (15 points) Section C (Algorithm implementation using packages)
For this question, you are expected to work with the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60,000 32×32 RGB images of 10 classes, with 6,000 images per class. You are expected to work with 3 classes of your choice from the available classes. Hence, you should have roughly 18,000 images in your curated dataset, with 15,000 images in the train and 3,000 in the test dataset, respectively. (Note: No additional marks will be provided for working with more classes than required.)
1. (3 points) Data Preparation: Use PyTorch to load the CIFAR-10 dataset and perform a stratified random split in the ratio of 0.8:0.2 for the training and validation datasets.
Here, the 15,000 images from the training dataset are split into train-val via a 0.8:0.2 split, and 3,000 images (1,000 per class) are retained as the testing data from the original test dataset of CIFAR-10. Create a custom Dataset class for the data and create data loaders for all the dataset splits – train, val, and test.
2. (0.5 points) Visualization: Load the dataset and visualize 5 images of each class from both the training and validation datasets.
3. (2.5 points) CNN Implementation: Create a CNN architecture with 2 convolutional layers (using in-built PyTorch implementations): the first layer with a kernel size of 5 × 5, 16 channels, and padding and stride of 1, and the second layer with a kernel size of 3 × 3, 32 channels, stride of 1, and padding of 0. Use a max-pooling layer with a kernel size of 3 × 3 and stride 2 after the first convolutional layer, and a max-pooling layer with a kernel size of 3 × 3 and stride 3 after the second convolutional layer. After the second max-pooling layer, flatten the output and feed it to a multi-layer perceptron, with 16 neurons in the first layer and the classification head as the second layer. Use the ReLU activation function after each layer other than the last layer (the classification head layer). (A minimal architecture sketch is given after this question.)
4. (2.5 points) Training the model: Train the model using the cross-entropy loss function with the Adam optimizer for 15 epochs. Log the training and validation loss and accuracy after each epoch. Save the trained models as .pth files, which are to be submitted along with the code for the assignment.
5. (1.5 points) Testing: Observe the training and validation plots for loss and accuracy and comment on your understanding of the results. Report the accuracy and F1-score on the test dataset. Plot the confusion matrix for the train, val and test datasets.
6. (3 points) Training an MLP: Create an MLP model with 2 fully connected layers, the first layer with 64 neurons and the second layer as the classification head. Flatten the image before passing it into the MLP. Use a ReLU layer after the first fully connected layer, and use the cross-entropy loss function with the Adam optimizer to train the model for 15 epochs. Log the training and validation loss and accuracy after each epoch. Save the models as .pth files, which must be submitted along with the assignment.
7. (2 points) Infer and Compare: Compute the test accuracy and F1-score and plot the confusion matrix for the MLP model. Now, compare the results and plots obtained from both models and comment on their performance and differences.
Note: During the demos, students will be expected to reproduce the evaluation results on the test dataset for both the CNN and MLP models. Hence, it is critical to submit the .pth files of the trained models.
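The following is a minimal PyTorch sketch of the CNN described in part 3 above, assuming 32×32 RGB inputs and 3 chosen classes; the class name and the num_classes default are illustrative, and the training loop, logging and .pth saving are omitted.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Sketch of the two-conv-layer CNN from part 3, for 32x32 RGB inputs and 3 classes."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=1),   # 32x32 -> 30x30
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 30x30 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=0),  # 14x14 -> 12x12
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),                  # 12x12 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 16),
            nn.ReLU(),
            nn.Linear(16, num_classes),  # classification head: logits for cross-entropy loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Quick shape check on a dummy batch.
print(SmallCNN()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 3])
```

The in-line shape comments assume the 32×32 CIFAR-10 input size; the flattened feature size (32 · 4 · 4 = 512) follows from the stated kernel, stride and padding choices.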
1. (10 points) Section A (Theoretical)
(a) (2 marks) You are developing a machine-learning model for a prediction task. As you increase the complexity of your model, for example, by adding more features or by including higher-order polynomial terms in a regression model, what is most likely to occur? Explain in terms of bias and variance, with suitable graphs as applicable.
(b) (3 marks) You're working at a tech company that has developed an advanced email filtering system to ensure users' inboxes are free from spam while safeguarding legitimate messages. After the model has been trained, you are tasked with evaluating its performance on a validation dataset containing a mix of spam and legitimate emails. The results show that the model successfully identified 200 spam emails. However, 50 spam emails managed to slip through, being incorrectly classified as legitimate. Meanwhile, the system correctly recognised most of the legitimate emails, with 730 reaching the users' inboxes as intended. Unfortunately, the filter mistakenly flagged 20 legitimate emails as spam, wrongly diverting them to the spam folder. You are asked to assess the model by calculating an average of its overall classification performance across the different categories of emails.
(c) (3 marks) Consider the following data, where y (units) is related to x (units) over a period of time:
x | y
3 | 15
6 | 30
10 | 55
15 | 85
18 | 100
Table 1: Table of x and y values
Find the equation of the regression line and, using the regression equation obtained, predict the value of y when x = 12.
(d) (2 marks) Given a training dataset with features X and labels Y, let $\hat{f}(X)$ be the prediction of a model f and $L(\hat{f}(X), Y)$ be the loss function. Suppose you have two models, $f_1$ and $f_2$, and the empirical risk for $f_1$ is lower than that for $f_2$. Provide a toy example where model $f_1$ has a lower empirical risk on the training set but may not necessarily generalize better than model $f_2$.
2. (15 points) Section B (Scratch Implementation)
Implement Logistic Regression on the given dataset. You need to implement Gradient Descent from scratch, meaning you cannot use any libraries for training the model (you may use libraries like NumPy for other purposes, but not for training the model). Split the dataset into 70:15:15 (train : test : validation). The loss function to be used is the cross-entropy loss. Dataset: Heart Disease
(a) (3 marks) Implement Logistic Regression using Batch Gradient Descent. Plot training loss vs. iteration, validation loss vs. iteration, training accuracy vs. iteration, and validation accuracy vs. iteration. Comment on the convergence of the model. Compare and analyze the plots. (A minimal sketch of the batch gradient-descent update is given at the end of this question.)
(b) (2 marks) Investigate and compare the performance of the model with different feature scaling methods: min-max scaling and no scaling. Plot the loss vs. iteration for each method and discuss the impact of feature scaling on model convergence.
(c) (2 marks) Calculate and present the confusion matrix for the validation set. Report precision, recall, F1 score, and ROC-AUC score for the model based on the validation set. Comment on how these metrics provide insight into the model's performance.
(d) (3 marks) Implement and compare the following optimisation algorithms: Stochastic Gradient Descent and Mini-Batch Gradient Descent (with varying batch sizes, at least 2). Plot and compare the loss vs. iteration and accuracy vs. iteration for each method. Discuss the trade-offs in terms of convergence speed and stability between these methods.
(e) (2 marks) Implement k-fold cross-validation (with k = 5) to assess the robustness of your model. Report the average and standard deviation for accuracy, precision, recall, and F1 score across the folds. Discuss the stability and variance of the model's performance across different folds.
(f) (3 marks) Implement early stopping in your best gradient descent method to avoid overfitting. Define and use appropriate stopping criteria. Experiment with different learning rates and regularization techniques (L1 and L2). Plot and compare the performance with and without early stopping. Analyze the effect of early stopping on overfitting and generalization.
OR
3. (15 points) Section C (Algorithm implementation using packages)
Split the given dataset into 80:20 (train : test) and perform the following tasks. Dataset: Electricity Bill Dataset
(a) (2.5 marks) Perform EDA by creating pair plots, box plots, violin plots, count plots for categorical features, and a correlation heatmap. Based on these visualizations, provide at least five insights on the dataset.
(b) (1 mark) Use the Uniform Manifold Approximation and Projection (UMAP) algorithm to reduce the data dimensions to 2 and plot the resulting data as a scatter plot. Comment on the separability and clustering of the data after dimensionality reduction.
(c) (2.5 marks) Perform the necessary pre-processing steps, including handling missing values and normalizing numerical features. For categorical features, use LabelEncoding. Apply Linear Regression on the preprocessed data. Report the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R² score, Adjusted R² score, and Mean Absolute Error (MAE) on the train and test data.
(d) (2 marks) Perform Recursive Feature Elimination (RFE) or correlation analysis on the original dataset to select the 3 most important features. Train the regression model using the selected features. Compare the results (MSE, RMSE, R² score, Adjusted R² score, MAE) on the train and test datasets with the results obtained in part (c).
(e) (2 marks) Encode the categorical features of the original dataset using One-Hot Encoding and perform Ridge Regression on the preprocessed data. Report the evaluation metrics (MSE, RMSE, R² score, Adjusted R² score, MAE). Compare the results with those obtained in part (c).
(f) (2 marks) Perform Independent Component Analysis (ICA) on the one-hot encoded dataset and choose the appropriate number of components (try 4, 5, 6, and 8 components). Compare the results (MSE, RMSE, R² score, Adjusted R² score, MAE) on the train and test datasets.
(g) (1.5 marks) Use ElasticNet regularization (which combines L1 and L2) while training a linear model on the preprocessed dataset from part (c). Compare the evaluation metrics (MSE, RMSE, R² score, Adjusted R² score, MAE) on the test dataset for different values of the mixing parameter (alpha).
(h) (1.5 marks) Use the Gradient Boosting Regressor to perform regression on the preprocessed dataset from part (c). Report the evaluation metrics (MSE, RMSE, R² score, Adjusted R² score, MAE). Compare the results with those obtained in parts (c) and (g).
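The following is a minimal sketch of the batch gradient-descent update referenced in Section B, part (a), run here on synthetic data rather than the Heart Disease dataset; the function and variable names are illustrative, and validation tracking, feature scaling, and the other parts of Section B are not included.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the cross-entropy loss.
    X: (n, d) feature matrix, y: (n,) labels in {0, 1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    losses = []
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)                 # predicted probabilities
        eps = 1e-12                            # avoid log(0)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)
        grad_w = X.T @ (p - y) / n             # gradient of the loss w.r.t. w
        grad_b = np.mean(p - y)                # gradient of the loss w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses

# Toy usage on synthetic data; with the real dataset, X and y come from the train split.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w, b, losses = train_logistic_regression(X, y)
print(losses[0], losses[-1])  # the loss should decrease across iterations
```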
Question-1
Use the MNIST dataset from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz for this question and select two digits – 0 and 1. Label them as -1 and 1. In this exercise you will be implementing AdaBoost.M1. Perform the following tasks.
• Divide the train set into train and val sets. Keep 1000 samples from each class for val. Note that the val set should be used to evaluate the performance of the classifier; it must not be used in obtaining the PCA matrix.
• Apply PCA and reduce the dimension to p = 5. You can use the train set of the two classes to obtain the PCA matrix. For the remaining parts, use the reduced-dimension dataset.
• Now learn a decision tree using the train set. You need to grow a decision stump. For each dimension, find the unique values and sort them in ascending order. The splits to be evaluated will be the midpoints of two consecutive unique values. Find the best split by minimizing the weighted misclassification error. Denote this as $h_1(x)$. Note that as we are dealing with real numbers, each value may be unique, so just sorting them and taking midpoints of consecutive values may still result in a similar tree. [2]
• Compute $\alpha_1$ and update the weights.
• Now build another tree $h_2(x)$ using the train set but with the updated weights. Compute $\alpha_2$ and update the weights. Similarly grow 300 such stumps. (A minimal sketch of this boosting loop is given after Q2 below.)
• After every iteration find the accuracy on the val set and report it. You should show a plot of accuracy on the val set vs. number of trees. Use the ensemble size (number of trees) that gives the highest accuracy and evaluate that model on the test set. Report the test accuracy. [2]
Q2. Consider the above as a regression problem. Apply gradient boosting using the absolute loss and report the MSE between the predicted and actual values of the test set.
• Divide the train set into train and val sets. Keep 1000 samples from each class for val. Note that the val set should be used to evaluate the performance of the model; it must not be used in obtaining the PCA matrix.
• Apply PCA and reduce the dimension to p = 5. You can use the train set of the two classes to obtain the PCA matrix. For the remaining parts, use the reduced-dimension dataset.
• Now learn a decision tree using the train set. You need to grow a decision stump. For each dimension, find the unique values and sort them in ascending order. The splits to be evaluated will be the midpoints of two consecutive unique values. Find the best split by minimizing the SSR. Denote this as $h_1(x)$. [1]
• Compute the residuals using $y - 0.01\,h_1(x)$.
• Now build another tree $h_2(x)$ using the train set but with the updated labels. Note that you now have to update the labels the way labels are updated for the absolute loss; that is, the labels will be the negative gradients. Compute the residuals using $y - 0.01\,h_1(x) - 0.01\,h_2(x)$. [1]
• Similarly grow 300 such stumps. Note that the labels are updated every iteration based on the negative gradients.
• After every iteration find the MSE on the val set and report it. You should show a plot of MSE on the val set vs. number of trees. Use the ensemble size that gives the lowest MSE and evaluate that model on the test set. Report the test MSE. [1]
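The following is a minimal sketch of the AdaBoost.M1 loop from Question 1. It uses an exhaustive stump search over midpoints of consecutive unique values and one common ±1-label weight update with α = ½ log((1 − err)/err); the helper names are illustrative, the PCA step and train/val bookkeeping are omitted, and the exhaustive search is slow but mirrors the stated procedure.

```python
import numpy as np

def fit_stump(X, y, w):
    """Find the decision stump (feature, threshold, polarity) with minimum
    weighted misclassification error; thresholds are midpoints of sorted unique values."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature index, threshold, polarity)
    for j in range(d):
        vals = np.unique(X[:, j])
        for t in (vals[:-1] + vals[1:]) / 2.0:
            for polarity in (1, -1):
                pred = polarity * np.where(X[:, j] > t, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(X, j, t, polarity):
    return polarity * np.where(X[:, j] > t, 1, -1)

def adaboost_m1(X, y, n_rounds=300):
    """AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = max(err, 1e-10)                      # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, t, pol)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble

def ensemble_predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.sign(score)
```

For the accuracy-vs-number-of-trees plot, evaluate ensemble_predict on the val set using the first m stumps for m = 1, ..., 300.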
Question-1
Use the MNIST dataset from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz for this question and select classes 0, 1 and 2. Note that you are not allowed to use libraries which can take the data, fit the model, predict the classes and give the accuracy. Perform the following tasks.
• Apply PCA and reduce the dimension to p = 10. You can use the entire train set of these 3 classes to obtain the PCA matrix. For the remaining parts, use the reduced-dimension dataset.
• Now learn a decision tree using the train set. You need to grow a decision tree with 3 terminal nodes. This is similar to what we did in the baseball salary example. For the first split, consider all p dimensions. For each dimension, consider one split which will divide the space into two regions and find the total Gini index. Similarly find the total Gini index for all p dimensions. Find the best split by searching for the minimum Gini index. (A minimal sketch of this split search is given at the end of this question.) Suppose you split along the 10th dimension. Choose one of the two resulting regions and repeat the steps to find the best split there. Once you find it, the entire p-dimensional space is divided into three regions. [2]
• Find the class of all samples in the test set of these 3 classes. For a particular test sample, check which region of the segmented space the sample lies in. The class for a particular sample is the majority class among the training samples in the region to which the test sample belongs. Report the accuracy and class-wise accuracy for the testing dataset. [1]
• Now use bagging: develop 5 different datasets from the original dataset and learn trees for all these datasets. For test samples, use majority voting (at least 3 trees should predict the same class) to find the class of a given sample. In case there is a tie, that is, two trees predict one class and two other trees predict another class, you can choose either of the classes. Report the total accuracy and class-wise accuracy. [1]
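The following is a minimal sketch of the Gini-based split search from Question 1; the helper names are illustrative. Growing the tree to 3 terminal nodes would amount to calling best_split once on the full data and once more on one of the two resulting regions.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Search every dimension and candidate threshold for the split that
    minimizes the total (size-weighted) Gini index of the two regions."""
    n, d = X.shape
    best_score, best_dim, best_thr = np.inf, None, None
    for j in range(d):
        vals = np.unique(X[:, j])
        for thr in (vals[:-1] + vals[1:]) / 2.0:
            left = y[X[:, j] <= thr]
            right = y[X[:, j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best_dim, best_thr = score, j, thr
    return best_dim, best_thr, best_score
```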
Question-1
Use the MNIST dataset from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz for this question and perform the following tasks.
• It has in all 60K train samples from 10 classes and 10K test samples. The 10 classes are the digits 0-9. The labels (classes) for all samples in the train and test sets are available.
• Visualize 5 samples from each class in the train set in the form of images.
• Images are of size 28×28. Vectorize them to make them 784-dimensional. Apply QDA on the given dataset. For each of the 10 classes you need to compute its mean vector and covariance matrix. Use the QDA expression derived in the lecture; your code should clearly contain this expression. Note that the mean and covariance are to be computed from the train set only; the test set is not seen at this stage. (A minimal sketch of the QDA discriminant is given after Question 2 below.)
• Find the class of all samples in the test set. Report the accuracy and class-wise accuracy for the testing dataset. Accuracy is the ratio of the total number of samples correctly classified to the total number of samples tested; the total number of samples tested is 10K. Similarly, report the accuracy for each class. Note that the labels (classes) for each sample are given in the dataset.
Question-2
Use the same downloaded dataset from Question 1 and perform the following tasks.
• Choose 100 samples from each class and create a 784×1000 data matrix. Let this be X.
• Remove the mean from X.
• Apply PCA on the centralized X. You need to compute the covariance $S = XX^\top/999$, then find its eigenvectors and eigenvalues. You can use any library for this. Sort them in descending order and create the matrix U.
• Perform $Y = U^\top X$ and reconstruct $X_{\text{recon}} = UY$. Check the MSE between X and $X_{\text{recon}}$, where $\text{MSE} = \sum_{i,j} \left(X(i,j) - X_{\text{recon}}(i,j)\right)^2$. This should be close to 0.
• Now choose p = 5, 10, 20 eigenvectors from U to form $U_p$. For each p, obtain the reconstruction $U_p Y_p$ (with $Y_p = U_p^\top X$), add the mean that was removed from X, reshape each column to 28×28, and plot the image. You should see that as p increases, the reconstructed images look more like their original counterparts. Plot 5 images from each class.
• Let the test set be $X_{\text{test}}$. Find $Y = U_p^\top X_{\text{test}}$. For each value of p, find Y and apply the QDA from Question 1 on Y. Obtain the accuracy on the test set as well as the per-class accuracy. As p increases, the accuracy should increase.
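The following is a minimal sketch of the QDA fit and predict steps from Question 1. The small ridge term added to each class covariance is an assumption for numerical stability (raw 784-dimensional class covariances can be close to singular, which is also why Question 2 moves to a PCA-reduced space); the function names are illustrative.

```python
import numpy as np

def fit_qda(X_train, y_train, reg=1e-3):
    """Per-class mean and covariance estimated from the train set only.
    reg * I is an assumed regularizer for a stable inverse, not part of the assignment."""
    params = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        mu = Xc.mean(axis=0)
        Sigma = np.cov(Xc, rowvar=False) + reg * np.eye(X_train.shape[1])
        params[c] = (mu, np.linalg.inv(Sigma), np.linalg.slogdet(Sigma)[1],
                     np.log(len(Xc) / len(X_train)))
    return params

def qda_predict(params, X):
    """Pick the class maximizing the QDA discriminant
    g_c(x) = -0.5*log|Sigma_c| - 0.5*(x-mu_c)^T Sigma_c^{-1} (x-mu_c) + log P(c)."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, Sinv, logdet, logprior = params[c]
        diff = X - mu
        maha = np.einsum("ij,jk,ik->i", diff, Sinv, diff)  # Mahalanobis term per sample
        scores.append(-0.5 * logdet - 0.5 * maha + logprior)
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]
```

Class-wise accuracy then follows by comparing qda_predict's output with the test labels one class at a time.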
Q1. Consider two Cauchy distributions in one dimension,
$$p(x \mid \omega_i) = \frac{1}{\pi b}\,\frac{1}{1 + \left(\frac{x - a_i}{b}\right)^2}, \qquad i = 1, 2.$$
Assume $P(\omega_1) = P(\omega_2)$. Find the total probability of error. Note that you need to first obtain the decision boundary using $p(\omega_1 \mid x) = p(\omega_2 \mid x)$, then determine the regions where error occurs, and then use $p(\text{error}) = \int_x p(\text{error} \mid x)\,p(x)\,dx$. Plot the conditional likelihoods $p(x \mid \omega_i)P(\omega_i)$ and mark the regions where error will occur. This can be a rough hand-drawn sketch. As $p(x)$ is the same on both sides when equating posteriors, we can simply compare $p(x \mid \omega_i)P(\omega_i)$. [1] (A worked sketch of this derivation is given after Q4 below.)
Q2. Compute the unbiased covariance matrix of
$$X = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},$$
where X is given in $\mathbb{R}^{d \times N}$ form (columns are samples). [0.5]
Q3.a. In the multi-category case, the probability of error $p(\text{error})$ is given as $1 - p(\text{correct})$, where $p(\text{correct})$ is the probability of being correct. Consider a case of 3 classes or categories. Draw a rough sketch of $p(x \mid \omega_i)P(\omega_i)$ for all $i = 1, 2, 3$. Give an expression for $p(\text{error})$. Assume equi-probable priors for simplicity. [1]
b. Mark the regions if the three conditional likelihoods are Gaussians $p(x \mid \omega_i) \sim \mathcal{N}(\mu_i, 1)$ with $\mu_1 = -1$, $\mu_2 = 0$, $\mu_3 = 1$. Find $p(\text{error})$ in terms of the CDF of the standard normal distribution. [1]
Q4. Find the likelihood ratio test for the following Cauchy pdf:
$$p(x \mid \omega_i) = \frac{1}{\pi b}\,\frac{1}{1 + \left(\frac{x - a_i}{b}\right)^2}, \qquad i = 1, 2.$$
Assume $P(\omega_1) = P(\omega_2)$ and 0-1 loss. [1]
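The following is a worked sketch for Q1, assuming equal priors and, without loss of generality, $a_1 < a_2$: the decision boundary falls at the midpoint of the two location parameters, and the error is a Cauchy tail probability.

```latex
% Decision boundary: with equal priors, equate the two class-conditional densities.
\left(\frac{x-a_1}{b}\right)^2 = \left(\frac{x-a_2}{b}\right)^2
\;\Longrightarrow\; x^{*} = \frac{a_1 + a_2}{2}.

% Each class contributes the tail of its Cauchy density beyond x^{*}:
P(\text{error})
= \tfrac{1}{2}\int_{x^{*}}^{\infty} p(x \mid \omega_1)\,dx
+ \tfrac{1}{2}\int_{-\infty}^{x^{*}} p(x \mid \omega_2)\,dx
= \frac{1}{2} - \frac{1}{\pi}\arctan\!\left(\frac{|a_2 - a_1|}{2b}\right).
```

Note that as $|a_2 - a_1| \to 0$ the error tends to $1/2$ (indistinguishable classes), and as the separation grows relative to $b$ it tends to $0$, which is a useful sanity check on the sketch of the two likelihoods.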