Objective

This program provides an opportunity to practice manipulating array-based lists of objects.

Overview & Instructions

Define a context that requires storing a collection of related data. Ultimately, you will implement this entity in the form of a class. Your attributes can be numbers and/or strings. Include at least four attributes, with one serving as the key that uniquely defines the object.

Create a text file to contain the data for at least 10 of these objects. Data for one object should occupy one line of the file. You may keep your file simple, with minimal white space or comma delimiters separating the attributes.

Create a driver program that includes an array of objects (no use of the Java ArrayList class, please). Your program should be an interactive list manager, driven by a simple menu that offers the user options for managing the elements of the list (stored as the array of objects). Your interface should be a formal frame-based GUI that includes text fields, text areas, buttons, etc. Button clicks could prompt for the various list actions, but you are free to design your own (user-friendly) interface within these constraints.

Your program should include the following features:

- Read information from the input file
- Add a new element
- Delete an existing element (using the key to search and delete)
- Sort ascending relative to one field of data
- Sort descending relative to a different field of data
- Randomize the list
- Write the information back to the file in the same format

Design your application using guidelines to maximize modularity, reusability, and maintainability.

Deliverables

Deliver the following to the online course management system Assignment dropbox:

- Upload your source code (.java) files

Notice

This is an individual assignment. You must complete this assignment on your own. You may not discuss your work in detail with anyone except the instructor.
You may not acquire, from any source (e.g., another student or an internet site), a partial or complete solution to a problem or project that has been assigned. You may not show another student your solution to an assignment. You may not have another person (current student, former student, tutor, friend, anyone) “walk you through” how to solve the assignment.
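The core array operations above (delete by key, sort on one field, randomize) can be sketched as follows. The `Item` class here is a hypothetical stand-in for your own class, which must carry at least four attributes; the partial-fill pattern (a `count` alongside a larger array) is one common way to support add and delete without ArrayList.

```java
import java.util.Random;

public class ListManagerSketch {
    // Hypothetical record type; your own class will have four or more attributes.
    static class Item {
        String key; int value;
        Item(String key, int value) { this.key = key; this.value = value; }
    }

    // Delete by unique key: find the element, shift the tail left, shrink the count.
    // Returns the new logical size, or the old size if the key is absent.
    static int deleteByKey(Item[] a, int count, String key) {
        for (int i = 0; i < count; i++) {
            if (a[i].key.equals(key)) {
                for (int j = i; j < count - 1; j++) a[j] = a[j + 1];
                a[--count] = null;
                return count;
            }
        }
        return count; // key not found; size unchanged
    }

    // Selection sort ascending on one field (no ArrayList involved).
    static void sortByValueAsc(Item[] a, int count) {
        for (int i = 0; i < count - 1; i++) {
            int min = i;
            for (int j = i + 1; j < count; j++)
                if (a[j].value < a[min].value) min = j;
            Item t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }

    // Fisher-Yates shuffle for the "randomize" feature.
    static void shuffle(Item[] a, int count, Random rng) {
        for (int i = count - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            Item t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
}
```

Descending sort on a different field follows the same pattern with the comparison reversed and a different attribute compared.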
Objective

To practice drawing with Java graphics.

Overview & Instructions

Build a Java application to draw the following four graphic objects in the application window. Utilize a simple user interface that allows the user to select one or more of the objects to view or hide.

- A stoplight with left turn arrows.
- A personal logo of your own design. Use your initials, or some image that depicts you. Include the use of font control for text sizing.
- A weather icon including all of the features you see (sun, clouds, lightning, and rain). Use at least two colors.
- Finally, if you were tasked with implementing a battle game, draw an image of a bad guy, an enemy fighter, or some other sinister image. (Just one image)

Deliverables

Deliver the following to the online course management system dropbox as your final product:

- Upload your source code (.java) file

Notice

This is an individual assignment. You must complete this assignment on your own. You may not discuss your work in detail with anyone except the instructor. You may not acquire, from any source (e.g., another student or an internet site), a partial or complete solution to a problem or project that has been assigned. You may not show another student your solution to an assignment. You may not have another person (current student, former student, tutor, friend, anyone) “walk you through” how to solve the assignment.
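One common structure for drawings like those above is a custom `JPanel` that overrides `paintComponent`. This sketch draws a bare stoplight body (no turn arrows, no show/hide wiring — those are part of your design); the coordinates and the `lightColor` helper are illustrative choices, not part of the assignment.

```java
import javax.swing.JPanel;
import java.awt.Color;
import java.awt.Graphics;

public class StoplightPanel extends JPanel {
    // Lamp 0 = red, 1 = yellow, 2 = green (top to bottom).
    static Color lightColor(int i) {
        switch (i) {
            case 0:  return Color.RED;
            case 1:  return Color.YELLOW;
            default: return Color.GREEN;
        }
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        g.setColor(Color.DARK_GRAY);
        g.fillRect(40, 20, 60, 170);             // stoplight housing
        for (int i = 0; i < 3; i++) {
            g.setColor(lightColor(i));
            g.fillOval(55, 35 + 55 * i, 30, 30); // the three lamps
        }
    }
}
```

Each of the four objects can be its own panel (or its own draw method), which makes the view/hide selection straightforward to wire to checkboxes or buttons.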
Objective

To build a complete working Java program that offers practice with a more involved Java graphical user interface, multiple classes, and a large data file.

Overview & Instructions

Write a Java application to simulate a shipping tool for Michigan zip codes. Your company maintains several shipping centers around Michigan. You need to build an application to calculate the shipping cost between one of the shipping centers and any other community in the state. Your shipping centers are in:

- University Center (48710)
- Mackinaw City (49701)
- Grand Rapids (49501)
- Marquette (49855)
- Traverse City (49684)

From post offices at these locations, your company can ship to any other post office in Michigan (except, of course, itself).

Build a user interface that allows the user to enter a shipping center and a zip code for the product destination. Then, calculate and provide the shipping cost for the order. Create a drop-down list (JComboBox) that includes the small list of shipping center cities. Also include an empty default choice in the drop-down list for an initial setting. Next, build a numerical keypad as an interface to key in the digits of the zip code. Include a (non-editable) text field to display the digits of the zip code as they are keyed in. Finally, include buttons to Calculate, Clear, and Quit. The Calculate button should determine the shipping cost, and the Clear button should empty the zip code text field and reset the drop-down list to the empty “non-choice”.

Upon entering a valid zip code, the user should receive a dialog box containing the post office name, the distance from the shipping source (see below), and the cost to ship. For simplicity, assume it requires only 5 cents per mile to ship a product.

For error checking, be sure that the zip code entered (1) has five digits, (2) exists in Michigan, and (3) does not match the selected shipping center. A simple message dialog can be used to display an error to the user.
Your application requires a file zipMIcity.txt that contains all zip codes in Michigan, including the location (latitude, longitude) and name of the post office. A sample line of input from the file would be:

48706 43.60880 -83.95300 MI Bay City

(note: Western Hemisphere longitudes are negative).

Your application should be object oriented, containing:

- The main application GUI
- At least one class that will act as a “data manager” for the list of zip code information
- Consider additional classes to manage a zip code, a county, and/or a county list

The data manager class should perform the following tasks “behind the scenes”:

- Read the raw zip code information from the provided data files.
- Store the information in one or more arrays within your class(es).
- Search for the name, latitude, and longitude for the zip code entered.
- Perform required distance calculations.
- Combine zip code and county information as needed for given user input.

To calculate the distance from the shipping source, you will need to integrate the great circle distance formula. This formula is provided and demonstrated for you in the example DistanceCalc.java.

Deliverables

Deliver the following to the online course management system dropbox as your final product:

- Upload your source code (.java) files

Notice

This is an individual assignment. You must complete this assignment on your own. You may not discuss your work in detail with anyone except the instructor. You may not acquire, from any source (e.g., another student or an internet site), a partial or complete solution to a problem or project that has been assigned. You may not show another student your solution to an assignment. You may not have another person (current student, former student, tutor, friend, anyone) “walk you through” how to solve the assignment.
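The distance calculation the data manager must perform can be sketched with the haversine form of the great-circle formula. This is an assumption about the formula's exact form — the course's DistanceCalc.java example may use an equivalent variant — and the Earth radius constant is an approximation in miles.

```java
public class GreatCircle {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius, approximate

    // Great-circle distance in miles between two (latitude, longitude) points,
    // using the haversine formula. Longitudes west of Greenwich are negative.
    static double distanceMiles(double lat1, double lon1,
                                double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLam = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(p1) * Math.cos(p2)
                 * Math.sin(dLam / 2) * Math.sin(dLam / 2);
        return EARTH_RADIUS_MILES * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    // Shipping cost at the stated rate of 5 cents per mile.
    static double shippingCost(double miles) {
        return 0.05 * miles;
    }
}
```

The data manager would call `distanceMiles` with the coordinates of the chosen shipping center and the destination zip code looked up from the file, then pass the result to `shippingCost`.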
Objective

To build a complete working Java program that includes a variety of user interface components.

Overview & Instructions

Write a Java application that acts as a “front-end” GUI to set preferences for an ordering form at a pizza restaurant. Include the following components for user data entry:

- Text field to enter the name of the server. Include appropriate labels.
- Radio button group for the choice of size (Small, Medium, Large, X-Large).
- Drop-down list identifying a list of specialty pizzas to choose from. Include at least five types of pizza (e.g. “Supreme”, “Meat Lovers”, etc.)
- Check boxes for add-on ingredients to the standard specialties (e.g. extra cheese, extra sauce, etc.)
- Text area for entering any special instructions.
- Slider bar to allow the customer to select the “spiciness level” (1-20) of the dipping sauce that accompanies all orders.
- Button to “submit” or “send the information”

There is much room for creativity within these specifications, so feel free to embellish as you wish. The context of this assignment is to build the “front-end” GUI for what is likely a far more complex application.

To capture the information, design a simple class that includes required constructors, set/get methods, and a toString() method (that returns all collected info as one String object), but nothing else. When the button is clicked, collect the input from the interface, “set” the data into one object of your class, and then produce a summary of the entire order via a call to the toString() method, displayed in a simple output dialog box.

Arrange the GUI components the best that you can by managing the order in which they are added to the window as well as the window size itself. You may choose any layout management scheme you would like for this program.
Finally, be sure your interface/class is set up to handle an immediate user button click. Have default values or settings included to avoid any runtime exceptions from this action.
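A minimal sketch of the data-capture class is below. The field names, defaults, and layout of the toString() output are illustrative choices — the point is that every field has a non-null default, so an immediate button click produces a valid summary rather than an exception.

```java
public class PizzaOrder {
    // Defaults chosen so an immediate "submit" click cannot cause a
    // NullPointerException; your own defaults may differ.
    private String server = "";
    private String size = "Medium";
    private String specialty = "Supreme";
    private String addOns = "";
    private String instructions = "";
    private int spiciness = 1;

    public void setServer(String s)       { server = s; }
    public void setSize(String s)         { size = s; }
    public void setSpecialty(String s)    { specialty = s; }
    public void setAddOns(String s)       { addOns = s; }
    public void setInstructions(String s) { instructions = s; }
    public void setSpiciness(int n)       { spiciness = n; }

    // Returns all collected info as one String, ready for a dialog box.
    @Override
    public String toString() {
        return "Server: " + server
             + "\nSize: " + size
             + "\nSpecialty: " + specialty
             + "\nAdd-ons: " + addOns
             + "\nInstructions: " + instructions
             + "\nSpiciness: " + spiciness;
    }
}
```

The button handler then just calls the setters with the current component values and hands `order.toString()` to `JOptionPane.showMessageDialog`.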
Objective

To build a complete working Java program that applies arrays and list processing.

Overview & Instructions

Write a menu-driven program to analyze population data. You are provided with a (significantly large) comma-delimited file countyPopData1017.txt that includes the following fields:

{FIPScode} {county} {state} {8 more fields with 2010-2017 county populations}

Your program should read the entire contents of the file into either parallel arrays or one array of objects. Create a simple menu-driven interface driven by layers of dialog boxes. Offer the user choices for analyzing population data and trends. Offer the following choices:

Search for…
- County population by year
- County population change
- State population by year
- State population change
- U.S. population by year

by prompting for…
- FIPS code, year
- FIPS code, startYear, endYear
- Two-char state code, year
- Two-char state code, startYear, endYear
- Year

For all county searches, be sure to include the county name in the output dialog presented to the user.

Allow the user to go back to a “main menu” when a query is completed. They can then be offered the option to submit another request. To enable this efficiently, be sure to load the data from the file into the array(s) only once at the launch of the program. Use the speed of array searches to retrieve the info instead of reloading the file for each transaction.

Be sure to build in error messages for incorrect FIPS codes or state codes. Years must also be in the range of 2010 to 2017. Finally, be sure to consider modularity in this program, whether using a procedural approach or an object oriented approach.

Deliverables

Deliver the following to the online course management system dropbox as your final product:

- Upload your source code (.java) file

Notice

This is an individual assignment. You must complete this assignment on your own. You may not discuss your work in detail with anyone except the instructor.
You may not acquire, from any source (e.g., another student or an internet site), a partial or complete solution to a problem or project that has been assigned. You may not show another student your solution to an assignment. You may not have another person (current student, former student, tutor, friend, anyone) “walk you through” how to solve the assignment.
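If you take the array-of-objects route, one record per file line can be sketched as below. The field order follows the format string given above ({FIPScode} {county} {state} then eight population fields); the sample line in the test is invented data, not taken from the real file.

```java
public class CountyRecord {
    String fips, county, state;
    long[] pop = new long[8];   // populations for 2010..2017, in year order

    // Parse one comma-delimited line of countyPopData1017.txt into a record.
    static CountyRecord parse(String line) {
        String[] f = line.split(",");
        CountyRecord r = new CountyRecord();
        r.fips = f[0];
        r.county = f[1];
        r.state = f[2];
        for (int y = 0; y < 8; y++)
            r.pop[y] = Long.parseLong(f[3 + y]);
        return r;
    }

    // Population for a calendar year, with the required 2010-2017 range check.
    long populationIn(int year) {
        if (year < 2010 || year > 2017)
            throw new IllegalArgumentException("year must be 2010-2017");
        return pop[year - 2010];
    }
}
```

Loading the whole file once into a `CountyRecord[]` at launch then reduces every menu query to a linear scan over that array, never a re-read of the file.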
Objective

To build a complete working Java program that offers practice with basic Java graphical user interface components and the interaction of a GUI with an object of a class.

Overview & Instructions

Write a Java program that will determine the risk of severe weather at a given weather station by utilizing measurements taken from a weather balloon. Weather balloons are used to observe upper air measurements. There are several severe weather indexes used by meteorologists; two are included in this assignment. Each includes a simple arithmetic formula and is defined to calculate values based on patterns and conditions likely to produce severe weather. They offer a forecaster a quick number that can be referenced to assist in judging weather risks on a given day.

These measurements taken via weather balloons are not taken at standard heights, but instead at standard pressure levels (in the unit of millibars). The severe weather indexes your program will calculate require the following values (all in °C):

- T850: temperature at 850 mb
- Td850: dew point at 850 mb
- T700: temperature at 700 mb
- Td700: dew point at 700 mb
- T500: temperature at 500 mb

For validation, assume that 850 mb values must be between -40 °C and 40 °C, 700 mb values must be between -60 °C and 10 °C, and 500 mb values must be between -50 °C and 0 °C. Note also that the dew point values can never exceed the temperature values.
Below are the indexes your program needs to calculate:

Total Totals Index

TT = T850 + Td850 - 2(T500)    in degrees Celsius

The value produced can then be interpreted to produce the following forecasts:

Total Totals Index    Severe Weather Risk
Less Than 43          Thunderstorms Unlikely
44 to 45              Isolated Moderate Thunderstorms
46 to 47              Scattered Moderate, Few Heavy Thunderstorms
48 to 49              Scattered Moderate, Few Heavy, Isolated Severe Thunderstorms
50 to 51              Scattered Heavy, Few Severe Thunderstorms, Isolated Tornadoes
52 to 55              Scattered to Numerous Heavy, Few to Scattered Severe Thunderstorms, Isolated Tornadoes
Greater Than 55       Numerous Heavy, Scattered Severe Thunderstorms, Few to Scattered Tornadoes
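The formula and table above reduce to a two-method sketch like the one below. Note one assumption: the handout's table leaves values between 43 and 44 unassigned, so this sketch treats everything below 44 as "Thunderstorms Unlikely" — adjust if your instructor specifies otherwise.

```java
public class TotalTotals {
    // TT = T850 + Td850 - 2(T500), all inputs in degrees Celsius.
    static double tt(double t850, double td850, double t500) {
        return t850 + td850 - 2.0 * t500;
    }

    // Map a TT value to the handout's risk wording. The 43-44 gap in the
    // handout table is folded into the "unlikely" category here (assumption).
    static String risk(double tt) {
        if (tt < 44)  return "Thunderstorms Unlikely";
        if (tt < 46)  return "Isolated Moderate Thunderstorms";
        if (tt < 48)  return "Scattered Moderate, Few Heavy Thunderstorms";
        if (tt < 50)  return "Scattered Moderate, Few Heavy, Isolated Severe Thunderstorms";
        if (tt < 52)  return "Scattered Heavy, Few Severe Thunderstorms, Isolated Tornadoes";
        if (tt <= 55) return "Scattered to Numerous Heavy, Few to Scattered Severe Thunderstorms, Isolated Tornadoes";
        return "Numerous Heavy, Scattered Severe Thunderstorms, Few to Scattered Tornadoes";
    }
}
```

The GUI layer then only needs to validate the five inputs against the stated ranges before calling these methods.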
Objective

To build a complete working Java program that offers practice with Java string manipulation.

Overview & Instructions

Write a Java program that performs encryption of a given prioritized message. The chosen encryption routine requires a one-word key known by both the sender and receiver. The letters of the key are used to shift the characters of the message. For example, if the key is MATH and the message is "Delta College is open.", the encryption would proceed as follows:

DELTACOLLEGEISOPEN    Capitalize the message and remove spaces and punctuation.
MATHMATHMATHMATHMA    Repeat the key as needed and align with the characters of the message.
PEEAMCHSXEZLUSHWQN    The encrypted message.

Note that ‘A’ shifts zero positions, ‘B’ shifts one position, and so on through ‘Z’, which shifts 25 positions. If the shifting rolls past the end of the alphabet, then it must “wrap around” (i.e. ‘Z’ shifts next to ‘A’, etc.)

You are only responsible for the encryption routine. Assume another team is handling the decryption.

The behavior of your program should include a simple input via a dialog box. This could be an example:

Message: P, Delta College is open.
Key: MATH

All messages should be prefaced with one of four possible priority code characters:

Z - FLASH
O - IMMEDIATE
P - PRIORITY
R - ROUTINE

These decrease in criticality as you read down the list. FLASH implies life-or-death urgency, while ROUTINE of course deals with messages in accordance with its name. Output from the example above would be:

PRIORITY
PEEAMCHSXEZLUSHWQN

A legitimate message can only include one of the four valid priority characters, exactly one comma, and then one or more characters following the comma. Also, the key must be one capitalized word at least four characters in length. Be sure to include an error message if these basic formatting requirements are not followed for the input strings.

Finally, design your solution using an object oriented approach.
This implies a Message class that will include the ability to validate and manage the priority code, and to encrypt the message. A suggested design would be to pass or “set” the user entry into the Message object as it is being constructed. Your main driver application class can then be narrowed to managing the user interaction as well as method calls to the one Message object in the solution. Utilize dialog boxes for both input and output.

Deliverables

Deliver the following to the online course management system dropbox as your final product:

- Upload your source code (.java) files

Notice

This is an individual assignment. You must complete this assignment on your own. You may not discuss your work in detail with anyone except the instructor. You may not acquire, from any source (e.g., another student or an internet site), a partial or complete solution to a problem or project that has been assigned. You may not show another student your solution to an assignment. You may not have another person (current student, former student, tutor, friend, anyone) “walk you through” how to solve the assignment.
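The encryption step described above (capitalize, strip non-letters, shift each letter by the repeated key) can be sketched as a single static method. In your solution this logic would live inside the Message class; the key is assumed here to already be validated as one capitalized word, per the formatting rules.

```java
public class Encryptor {
    // Capitalize, strip non-letters, then shift each letter by the
    // corresponding (repeated) key letter: 'A' shifts 0, ..., 'Z' shifts 25,
    // wrapping around past 'Z'. The key is assumed to be all capital letters.
    static String encrypt(String message, String key) {
        StringBuilder clean = new StringBuilder();
        for (char c : message.toUpperCase().toCharArray())
            if (c >= 'A' && c <= 'Z') clean.append(c);

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < clean.length(); i++) {
            int shift = key.charAt(i % key.length()) - 'A';          // 0..25
            char enc = (char) ('A' + (clean.charAt(i) - 'A' + shift) % 26);
            out.append(enc);
        }
        return out.toString();
    }
}
```

Running the worked example through this routine reproduces the handout's ciphertext: `encrypt("Delta College is open.", "MATH")` yields `PEEAMCHSXEZLUSHWQN`.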
Objective

To build a complete working Java program that applies methods and basic object oriented programming.

Overview & Instructions

Write a Java application that will manage a car rental transaction. Your solution should include two files: one containing the CarRental class, including the data and method definitions, and a second file to contain the “driver” application. Build your solution such that all interaction with the user is contained within the driver application.

Design your CarRental class to meet the following specifications:

Data
- Customer classification
- Days vehicle rented
- Odometer mileage at start
- Odometer mileage at end

Methods
- No-argument constructor
- Parameterized constructor
- set/get methods
- Validate data
- Calculate rental cost

Your application should essentially do the following:

1. Read the input from the user via dialog-based input and “set” it into one object of the CarRental class.
2. Validate the input. Be sure to utilize the class member designated for error checking. This method should return a boolean value back to the driver class if any of the values “set” in the object are invalid. For invalid input, you can either terminate the program or loop to offer the user another opportunity to start again.
3. If all data are valid, build the contents of a summary output statement into one output dialog. Include the following:
   - Miles driven
   - Days rented
   - Rental base charge
   - Rental mileage charge
   - Total rental charge

Consider the following specifications for the program: Actual vehicle odometers measure to the 1/10 of a mile. Your program should accommodate this for both odometer inputs. Then, when calculating the miles driven, always “round up” to the next highest mile for the tenths value of the difference.
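The tenths-of-a-mile round-up rule is the one spot where floating-point arithmetic can bite, since odometer readings like 120.3 are not exactly representable as doubles. One way to make the ceiling reliable (a design choice, not mandated by the handout) is to convert both readings to whole tenths first:

```java
public class MileageCalc {
    // Odometers read to tenths of a mile. Any fractional mile in the
    // difference rounds UP to the next whole mile: 100.1 -> 120.3 is
    // 20.2 miles on the odometer, billed as 21 miles.
    static int milesDriven(double startOdo, double endOdo) {
        // Convert to whole tenths before subtracting, so binary rounding
        // error in values like 120.3 cannot tip the ceiling the wrong way.
        long tenths = Math.round(endOdo * 10) - Math.round(startOdo * 10);
        return (int) ((tenths + 9) / 10);   // integer ceiling of tenths/10
    }
}
```

A naive `Math.ceil(endOdo - startOdo)` usually works, but an exact difference such as 20.0 can come out as 20.000000000000004 and get billed as 21; the tenths conversion avoids that.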
Objective

To build a complete working Java program that applies control structures and file processing.

Overview & Instructions

As a banker, you are responsible for reviewing loan applications. Input consists of a file of raw loan data (named loandata.txt) in the following format:

{name} {principal} {term} {annualRate} {creditRating} {fee-optional}

Example data could be:

SMITH 20000 5 4.5 750
JOHNSON 18000 4 4.2 560 2.0

The principal amount is the loan request in dollars. The term is the duration of the loan in years. Note that some loan applicants with a low credit score (under 580) will be required to pay an up-front fee. In the example above, the customer JOHNSON would need to pay 2.0% of the $18,000 requested principal for the added risk to the bank for taking on the risky loan. For any loan applications without this field, simply list the fee as $0.00.

The formula for calculating the monthly payment for a loan is:

P = r(PV) / (1 - (1 + r)^-n)

where:
P = Payment
PV = Present Value
r = rate per period
n = number of periods

Note that the “periods” of the formula above are months. Therefore, be sure to convert n to months and furthermore define the rate r as a monthly rate (i.e. annualRate / 12.0). The loan payoff amount will be calculated as the monthly payment multiplied by the number of overall months of the loan. This is the amount the bank receives in total back from the borrower.

A typical credit score ranges from 300 to 850. A bank will rate a loan applicant based on this number. From the raw credit score in the data file, enter the credit rating description (i.e. Fair, Good, etc.) in your output report.

Credit Score Range    Credit Rating
300-579               Very Poor
580-669               Fair
670-739               Good
740-799               Very Good
800-850               Exceptional

Your output should appear as an organized, formal financial report summarizing each loan in detail. You are free to write your output directly to the Java console, or you may choose to write the report to an external file.
This will assist in maintaining the text formatting. An example format could be:

Customer    Principal    Rate    Years    Payment     Payoff       Fee       Credit Rating
xxxxxxxxx   $xxxxx.xx    x.x%    xx       $xxxx.xx    $xxxxx.xx    $xxx.xx   xxxxxxxxx
xxxxxxxxx   $xxxxx.xx    x.x%    xx       $xxxx.xx    $xxxxx.xx    $xxx.xx   xxxxxxxxx
... and so on ...
TOTALS      $xxxxx.xx                     $xxxx.xx    $xxxxx.xx    $xxx.xx

After completing all output lines for each of the loans in the input file, write totals at the bottom of your report for the principal, monthly payment, loan payoff, and fee columns. This implies accumulating the totals as you process them within the processing loop.

Deliverables

Deliver the following to the online course management system dropbox as your final product:

- Upload your source code (.java) file

Notice

This is an individual assignment. You must complete this assignment on your own. You may not discuss your work in detail with anyone except the instructor. You may not acquire, from any source (e.g., another student or an internet site), a partial or complete solution to a problem or project that has been assigned. You may not show another student your solution to an assignment. You may not have another person (current student, former student, tutor, friend, anyone) “walk you through” how to solve the assignment.
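The per-loan arithmetic above can be sketched directly from the formula and the two tables. One assumption here: annualRate in the file is a percentage (4.5 means 4.5%), as the example data suggests, so it is divided by 100 before the monthly-rate conversion.

```java
public class LoanCalc {
    // Monthly payment P = r(PV) / (1 - (1 + r)^-n), where r is the monthly
    // rate (annualRate / 12) and n is the term in months (years * 12).
    // annualRatePct is assumed to be a percentage, e.g. 4.5 for 4.5%.
    static double monthlyPayment(double principal, int years, double annualRatePct) {
        double r = annualRatePct / 100.0 / 12.0;
        int n = years * 12;
        return r * principal / (1 - Math.pow(1 + r, -n));
    }

    // Up-front fee: a percentage of the principal, owed only when the
    // credit score is under 580; otherwise $0.00.
    static double fee(double principal, int creditScore, double feePct) {
        return creditScore < 580 ? principal * feePct / 100.0 : 0.0;
    }

    // Credit rating description from the handout's score table.
    static String creditRating(int score) {
        if (score < 580) return "Very Poor";
        if (score < 670) return "Fair";
        if (score < 740) return "Good";
        if (score < 800) return "Very Good";
        return "Exceptional";
    }
}
```

The payoff column is then `monthlyPayment(...) * years * 12`, and the totals row is just four accumulators updated inside the file-processing loop.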
Objective

To build a complete working Java program to apply basic data types, arithmetic operations, and dialog-based input/output.

Overview & Instructions

In the world of Harry Potter, a pub owner needs to replenish their stocks of ale and must work with muggles to make the deal. The negotiations conclude, and a price for a number of hogsheads of ale is agreed upon for a given price in U.S. dollars. Your program should summarize the transaction.

Input should include:

- Number of hogsheads of ale (integer)
- Dollar amount agreed upon per hogshead (decimal)

Output for the program should be a summary of various totals in different units (both muggle and wizarding). Here is the expected output:

Amount: xx hogsheads, xxx.x gallons
Cost: xxxx.xx U.S. Dollars
      xxxx.xx Euros
      xxxx.xx British Pounds
      xx galleons, xx sickles, xx knuts

Your general formatting can vary, but be sure to include all of the demonstrated values. Format gallons to one decimal place, monetary amounts to two decimal places, and be sure that the hogsheads and wizarding money values are integers.

Required facts include the following:

- There are 54 U.S. gallons in one hogshead
- The monetary exchange rate between the wizarding and muggle worlds is: 1 Galleon = $25.50
- 1 Dollar = 0.86 Euros = 0.76 Pounds
- 1 Galleon = 17 Sickles
- 1 Sickle = 29 Knuts

Include exclusive use of dialog-based input and output for this solution. You are free, of course, to integrate use of the Java console for testing, but be sure the final solution utilizes dialog boxes. Design your input dialogs to include clear instructions and make sure your output dialog is organized, understandable, and includes proper units. Be sure to consider documentation, code structure, neatness, and clarity in your final solution.

Deliverables

Deliver the following to the online course management system dropbox as your final product:

- Upload your source code (.java) file

Notice

This is an individual assignment.
You must complete this assignment on your own. You may not discuss your work in detail with anyone except the instructor. You may not acquire, from any source (e.g., another student or an internet site), a partial or complete solution to a problem or project that has been assigned. You may not show another student your solution to an assignment. You may not have another person (current student, former student, tutor, friend, anyone) “walk you through” how to solve the assignment.
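The only non-obvious arithmetic in the ale summary is breaking a dollar total into whole galleons, sickles, and knuts. One clean approach is to convert everything to knuts (the smallest unit) and then divide back out; rounding to the nearest knut is an assumption this sketch makes, since the handout only says the wizarding values must be integers.

```java
public class AleSummary {
    static final double GALLONS_PER_HOGSHEAD = 54.0;
    static final double DOLLARS_PER_GALLEON  = 25.50;
    static final double EUROS_PER_DOLLAR     = 0.86;
    static final double POUNDS_PER_DOLLAR    = 0.76;
    static final int    SICKLES_PER_GALLEON  = 17;
    static final int    KNUTS_PER_SICKLE     = 29;

    static double gallons(int hogsheads) {
        return hogsheads * GALLONS_PER_HOGSHEAD;
    }

    // Convert a dollar total into whole {galleons, sickles, knuts}.
    // Works in knuts to avoid fractional coin counts; rounding to the
    // nearest knut is a design choice, not stated in the handout.
    static int[] toWizardMoney(double dollars) {
        int knutsPerGalleon = SICKLES_PER_GALLEON * KNUTS_PER_SICKLE; // 17 * 29 = 493
        long totalKnuts = Math.round(dollars / DOLLARS_PER_GALLEON * knutsPerGalleon);
        int galleons = (int) (totalKnuts / knutsPerGalleon);
        int sickles  = (int) (totalKnuts % knutsPerGalleon) / KNUTS_PER_SICKLE;
        int knuts    = (int) (totalKnuts % KNUTS_PER_SICKLE);
        return new int[] { galleons, sickles, knuts };
    }
}
```

The euro and pound lines are single multiplications by `EUROS_PER_DOLLAR` and `POUNDS_PER_DOLLAR`, formatted to two decimal places.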
Homework 4 EE 559

1. The pdf of a $\Gamma(2, 1)$ random variable is $p(z) = z \exp(-z)$, $z > 0$, and the pmf of a Poisson random variable $X$ is $p_X(x) = \lambda^x e^{-\lambda}/x!$, $\lambda > 0$, $x = 0, 1, \dots$. Assuming that $X_1, X_2, \dots, X_n$ is an i.i.d. Poisson sample given that $\lambda$ has a $\Gamma(2, 1)$ prior distribution, find the MAP estimate of $\lambda$ and prove that what you find is actually a value that maximizes the posterior. (10 pts)

2. Assume that you have an i.i.d. sample from a population with Poisson pmf, i.e. $p_X(x) = \lambda^x e^{-\lambda}/x!$, $\lambda > 0$, $x = 0, 1, \dots$. Calculate the MLE of $\lambda$ and its asymptotic distribution by calculating the Fisher information, and compare the results with those of the Central Limit Theorem. (10 pts)

3. Assume that $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$. Show that the MLE and least squares estimates of the $\beta$ vector are the same, which means the MLE is also BLUE according to Gauss-Markov. Remember that the log-likelihood function $l(\beta_0, \beta_1, \dots, \beta_p)$ here is based on the conditional density $p(Y \mid X_1, \dots, X_p)$. (10 pts)

4. Find the MAP estimate of $\beta$ under the assumption that $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$, and that the prior distribution of the (independent) $\beta_i$, $i = 1, 2, \dots, p$ is $N(0, \sigma^2/\lambda)$. Interpret your results. (15 pts)

5. Find the MAP estimate of $\beta$ under the assumption that $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$, and that the prior distribution of the (independent) $\beta_i$, $i = 1, 2, \dots, p$ is $\mathrm{Lap}(0, \sigma^2/\lambda)$. Interpret your results. (15 pts)

6. In the regularized least squares problem, assume that the singular value decomposition of $X$ is $U \Sigma V^T$.

(a) Show that the vector of predicted values is: (10 pts)

$$\hat{y} = X \hat{\beta}_{\text{Ridge}} = \sum_{j=1}^{p} u_j \frac{\sigma_j^2}{\sigma_j^2 + \lambda} u_j^T y$$

where $u_j$ are the columns of $U$. Conclude that a greater amount of shrinkage is applied to basis vectors $u_j$ that have smaller singular values $\sigma_j$, for a fixed $\lambda \geq 0$.
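As a starting point for part (a), the identity follows in a few lines, assuming the standard closed-form ridge solution $\hat{\beta}_{\text{Ridge}} = (X^T X + \lambda I)^{-1} X^T y$ and a full-rank $n \times p$ design so that $V$ is $p \times p$ orthogonal ($V V^T = V^T V = I$):

```latex
\begin{aligned}
\hat{y} &= X (X^T X + \lambda I)^{-1} X^T y \\
        &= U \Sigma V^T \left( V \Sigma^2 V^T + \lambda V V^T \right)^{-1} V \Sigma U^T y \\
        &= U \Sigma V^T \, V (\Sigma^2 + \lambda I)^{-1} V^T \, V \Sigma U^T y \\
        &= U \Sigma (\Sigma^2 + \lambda I)^{-1} \Sigma \, U^T y
         = \sum_{j=1}^{p} u_j \, \frac{\sigma_j^2}{\sigma_j^2 + \lambda} \, u_j^T y
\end{aligned}
```

Since $\sigma_j^2 / (\sigma_j^2 + \lambda)$ is increasing in $\sigma_j$ for fixed $\lambda \geq 0$, the components of $y$ along directions $u_j$ with small singular values are shrunk the most, which is exactly the conclusion the problem asks for.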
(b) Use SVD to show that (10 pts)

$$\mathrm{tr}\!\left[ X (X^T X + \lambda I)^{-1} X^T \right] = \sum_{j=1}^{p} \frac{\sigma_j^2}{\sigma_j^2 + \lambda}$$

This quantity is equal to the degrees of freedom $p$ when $\lambda = 0$ and is called the effective degrees of freedom of the Ridge-regularized model.

7. Time Series Classification Part 1: Feature Creation/Extraction

An interesting task in machine learning is classification of time series. In this problem, we will classify the activities of humans based on time series obtained by a Wireless Sensor Network.

(a) Download the AReM data from: https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29 . The dataset contains 7 folders that represent seven types of activities. In each folder, there are multiple files, each of which represents an instance of a human performing an activity.1 Each file contains 6 time series collected from activities of the same person, which are called avg rss12, var rss12, avg rss13, var rss13, avg rss23, and var rss23. There are 88 instances in the dataset, each of which contains 6 time series, and each time series has 480 consecutive values.

(b) Keep datasets 1 and 2 in folders bending1 and bending2, as well as datasets 1, 2, and 3 in the other folders, as test data, and the other datasets as train data.

(c) Feature Extraction

Classification of time series usually needs extracting features from them. In this problem, we focus on time-domain features.

i. Research what types of time-domain features are usually used in time series classification and list them (examples are minimum, maximum, mean, etc). 

ii. Extract the time-domain features minimum, maximum, mean, median, standard deviation, first quartile, and third quartile for all of the 6 time series in each instance. You are free to normalize/standardize features or use them directly.2 (20 pts)

Your new dataset will look like this:

Instance  min1  max1  mean1  median1  ...  1st quart6  3rd quart6
1
2
3
...
88

where, for example, 1st quart6 means the first quartile of the sixth time series in each of the 88 instances.

iii. Estimate the standard deviation of each of the time-domain features you extracted from the data. Then, use Python’s bootstrapped or any other method to build a 90% bootstrap confidence interval for the standard deviation of each feature. (10 pts)

iv. Use your judgement to select the three most important time-domain features (one option may be min, mean, and max).

v. Assume that you want to use the training set to classify bending from other activities, i.e. you have a binary classification problem. Depict scatter plots of the features you specified in 7(c)iv extracted from time series 1, 2, and 6 of each instance, and use color to distinguish bending vs. other activities. (See p. 129 of the ISLR textbook.)3 (10 pts)

Footnotes:
1 Some of the data files need very minor cleaning. You can do it by Excel or Python.
2 You are welcome to experiment to see if they make a difference.
3 You are welcome to repeat this experiment with other features as well as with time series 3, 4, and 5 in each instance.

8. Time Series Classification Part 2: Binary and Multiclass Classification

Important Note: You will NOT submit this part with Homework 4. It will be the programming assignment of Homework 5. However, because it uses the features you extracted from time series data in Homework 4, and because some of you may want to start using your features to build models earlier, you are provided with the instructions of the next programming assignment. Thus, you may want to submit the code for Homework 4 with Homework 5 again, since it might need the feature creation code. Also, since this part involves building various models, you are strongly recommended to start as early as you can.

(a) Binary Classification Using Logistic Regression4

i. Break each time series in your training set into two (approximately) equal-length time series.
Now instead of 6 time series for each of the training instances, you have 12 time series for each training instance. Repeat the experiment in 7(c)v, i.e. depict scatter plots of the features extracted from both parts of the time series 1, 2, and 6. Do you see any considerable difference in the results compared with those of 7(c)v?

ii. Break each time series in your training set into l ∈ {1, 2, . . . , 20} time series of approximately equal length and use logistic regression5 to solve the binary classification problem, using time-domain features. Remember that breaking each of the time series does not change the number of instances. It only changes the number of features for each instance. Calculate the p-values for your logistic regression parameters in each model corresponding to each value of l and refit a logistic regression model using your pruned set of features.6 Alternatively, you can use backward selection using sklearn.feature_selection or glm in R. Use 5-fold cross-validation to determine the best value of the pair (l, p), where p is the number of features used in recursive feature elimination. Explain what the right way and the wrong way are to perform cross-validation in this problem.7 Obviously, use the right way! Also, you may encounter the problem of class imbalance, which may make some of your folds not have any instances of the rare class. In such a case, you can use stratified cross-validation. Research what it means and use it if needed. In the following, you can see an example of applying Python’s Recursive Feature Elimination, which is a backward selection algorithm, to logistic regression.

Footnotes:
4 Some logistic regression packages have a built-in L2 regularization. To remove the effect of L2 regularization, set λ = 0 or set the budget C → ∞ (i.e. a very large value).
5If you encounter instability of the logistic regression problem because of linearly separable classes, modify the max_iter parameter in logistic regression to stop the algorithm early and prevent instability.
6R calculates the p-values for logistic regression automatically. One way of calculating them in Python is to call R within Python. There are other ways to obtain the p-values as well.
7This is an interesting problem in which the number of features changes depending on the value of the parameter l that is selected via cross-validation. Another example of such a problem is Principal Component Regression, where the number of principal components is selected via cross-validation.

# Recursive Feature Elimination
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# load the iris dataset
dataset = datasets.load_iris()
# create a base classifier used to evaluate a subset of attributes
model = LogisticRegression()
# create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
rfe = rfe.fit(dataset.data, dataset.target)
# summarize the selection of the attributes
print(rfe.support_)
print(rfe.ranking_)

iii. Report the confusion matrix and show the ROC and AUC for your classifier on the train data. Report the parameters of your logistic regression, the βi's, as well as the p-values associated with them.
iv. Test the classifier on the test set. Remember to break the time series in your test set into the same number of time series into which you broke your training set. Remember that the classifier has to be tested using the features extracted from the test set.
Compare the accuracy on the test set with the cross-validation accuracy you obtained previously.
v. Do your classes seem to be well-separated enough to cause instability in calculating logistic regression parameters?
vi. From the confusion matrices you obtained, do you see imbalanced classes? If yes, build a logistic regression model based on case-control sampling and adjust its parameters. Report the confusion matrix, ROC, and AUC of the model.
(b) Binary Classification Using L1-Penalized Logistic Regression
i. Repeat 8(a)ii using L1-penalized logistic regression,8 i.e. instead of using p-values for variable selection, use L1 regularization. Note that in this problem, you have to cross-validate for both l, the number of time series into which you break each of your instances, and λ, the weight of the L1 penalty in your logistic regression objective function (or C, the budget). Packages usually perform cross-validation for λ automatically.9
ii. Compare the L1-penalized model with variable selection using p-values. Which one performs better? Which one is easier to implement?
(c) Multi-class Classification (The Realistic Case)
i. Find the best l in the same way as you found it in 8(b)i to build an L1-penalized multinomial regression model to classify all activities in your training set.10 Report your test error. Research how confusion matrices and ROC curves are defined for multiclass classification and show them for this problem if possible.11
ii. Repeat 8(c)i using a Naïve Bayes' classifier. Use both Gaussian and Multinomial pdfs and compare the results.
iii. Create p Principal Components from the features you extracted from the l time series. Cross-validate on the (l, p) pair to build a Naïve Bayes' classifier based on the PCA features to classify all activities in your data set.
8For L1-penalized logistic regression, you may want to use normalized/standardized features.
9Using the package Liblinear is strongly recommended.
Report your test error and plot the scatterplot of the classes in your training data based on the first and second principal components you found from the features extracted from the l time series, where l is the value you found using cross-validation. Show confusion matrices and ROC curves.
iv. Which method is better for multi-class classification in this problem?
10New versions of scikit-learn allow using an L1 penalty for multinomial regression.
11For example, the pROC package in R does the job.
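The L1-penalized multinomial model of 8(c)i can be sketched as follows, assuming scikit-learn (footnote 9 recommends Liblinear, which handles the binary case; for a multinomial L1 objective the saga solver is one option). The data below is a synthetic stand-in; in the real problem you would also cross-validate over l:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(88, 36))   # stand-in: 88 instances, e.g. 6 features x 6 time series
y = np.arange(88) % 7           # stand-in labels for the 7 activities

# cross-validate the L1 budget C (sklearn uses stratified folds for classifiers)
best = None
for C in [0.01, 0.1, 1.0, 10.0]:
    clf = LogisticRegression(penalty="l1", solver="saga", C=C, max_iter=5000)
    score = cross_val_score(clf, X, y, cv=5).mean()
    if best is None or score > best[1]:
        best = (C, score)
print("best C:", best[0], "cv accuracy:", round(best[1], 3))
```

Since the stand-in labels are unrelated to the features, the cross-validation accuracy here hovers near chance (1/7); on the real features it should be much higher.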
Homework 3 EE 559
1. Assume that in a c-class classification problem, we have k features X1, X2, . . . , Xk that are independent conditioned on the class label and Xj|ωi ∼ Gamma(pi, λj), i.e.
pXj|ωi(xj|ωi) = (1/Γ(pi)) λj^pi xj^(pi−1) e^(−λj xj),   pi, λj > 0. (30 pts)
(a) Determine the Bayes' optimal classifier's decision rule making the general assumption that the prior probabilities of the classes are different.
(b) When are the decision boundaries linear functions of x1, x2, . . . , xk?
(c) Assuming that p1 = 4, p2 = 2, c = 2, k = 4, λ1 = λ3 = 1, λ2 = λ4 = 2, and that the prior probabilities of each class are equal, classify x = (0.1, 0.2, 0.3, 4).
(d) Assuming that p1 = 3.2, p2 = 8, c = 2, k = 1, λ1 = 1, and that the prior probabilities of each class are equal, find the decision boundary x = x*. Also, find the probability of type-1 and type-2 errors.
(e) Assuming that p1 = p2 = 4, c = 2, k = 2, λ1 = 8, λ2 = 0.3, and P(ω1) = 1/4, P(ω2) = 3/4, find the decision boundary f(x1, x2) = 0.
2. Assume that in a c-class classification problem, there are k conditionally independent features and Xi|ωj ∼ Lap(mij, λi), i.e.
pXi|ωj(xi|ωj) = (λi/2) e^(−λi|xi−mij|),   λi > 0,  i ∈ {1, 2, . . . , k},  j ∈ {1, 2, . . . , c}.
Assuming that the prior class probabilities are equal, show that the minimum error rate classifier is also a minimum weighted Manhattan distance (or weighted L1-distance) classifier. When does the minimum error rate classifier become the minimum Manhattan distance classifier? (15 pts)
3. The class-conditional density functions of a discrete random variable X for four pattern classes are shown below: (20 pts)
x   p(x|ω1)  p(x|ω2)  p(x|ω3)  p(x|ω4)
1   1/3      1/2      1/6      2/5
2   1/3      1/4      1/3      2/5
3   1/3      1/4      1/2      1/5
The loss function λ(αi|ωj) is summarized in the following table, where action αi means decide pattern class ωi:
     ω1  ω2  ω3  ω4
α1   0   2   3   4
α2   1   0   1   8
α3   3   2   0   2
α4   5   3   1   0
Assume P(ω1) = 1/10, P(ω2) = 1/5, P(ω3) = 1/2, P(ω4) = 1/5.
(a) Compute the conditional risk for each action as: R(αi|x) = Σ_{j=1}^{4} λ(αi|ωj) P(ωj|x).
(b) Compute the overall risk R as: R = Σ_{i=1}^{3} R(α(xi)|xi) p(xi), where α(xi) is the decision rule minimizing the conditional risk for xi.
4. The following data set was collected to classify people who evade taxes:
Tax ID  Refund  Marital Status  Taxable Income  Evade
1       Yes     Single          122 K           No
2       No      Married         77 K            No
3       No      Married         106 K           No
4       No      Single          88 K            Yes
5       Yes     Divorced        210 K           No
6       No      Single          72 K            No
7       Yes     Married         117 K           No
8       No      Married         60 K            No
9       No      Divorced        90 K            Yes
10      No      Single          85 K            Yes
Considering the relevant features in the table (only one feature is not relevant), assume that the features are conditionally independent. (25 pts)
(a) Estimate the prior class probabilities.
(b) For continuous feature(s), assume conditional Gaussianity and estimate the class-conditional pdfs p(x|ωi). Use Maximum Likelihood Estimates.
(c) For each discrete feature X, assume that the number of instances in class ωi for which X = xj is nji and the number of instances in class ωi is ni. Estimate the probability mass pX|ωi(xj|ωi) = P(X = xj|ωi) as nji/ni for each discrete feature. Is this a valid estimate of the pmf?
(d) There is an issue with using the estimate you calculated in 4c. Explain why the Laplace correction (nji + 1)/(ni + l), where l is the number of levels X can assume,1 solves the problem with the estimate given in 4c. Is this a valid estimate of the pmf?
(e) Estimate the minimum error rate decision rule for classifying tax evasion using the Laplace correction.
5. Programming Part: Breast Cancer Prognosis
The goal of this assignment is to determine the prognosis of breast cancer patients using the features extracted from digital images of Fine Needle Aspirates (FNA) of a breast mass. You will work with the Wisconsin Prognostic Breast Cancer data set, WPBC.
There are 35 attributes in the data set: the first attribute is a patient ID, the second is an outcome variable that shows whether the cancer recurred after two years or not (N for Non-recurrent, R for Recurrent), and the third is also an outcome variable that shows the time to recurrence. The other 30 attributes are the features that you will work with to build a diagnosis tool for breast cancer. Ten real-valued features are calculated for each nucleus in the digital image of the FNA of a breast mass.2 They are:
• radius (mean of distances from center to points on the perimeter)
• texture (standard deviation of gray-scale values)
• perimeter
• area
• smoothness (local variation in radius lengths)
• compactness (perimeter^2 / area − 1.0)
• concavity (severity of concave portions of the contour)
• concave points (number of concave portions of the contour)
• symmetry
• fractal dimension ("coastline approximation" − 1)
The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, to represent each image using 3 × 10 features. For instance, field 4 is Mean Radius, field 14 is Radius SE, and field 24 is Worst Radius. Additionally, the diameter of the excised tumor in centimeters and the number of positive axillary lymph nodes are also given in the data set as attributes 34 and 35.
Important Note: Time to recurrence (the third attribute) should not be used for classification; otherwise, you will be able to classify perfectly!
There are 198 instances in the data set, 151 of which are non-recurrent, and 47 are recurrent.
(a) Download the WPBC data from: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Prognostic).
(b) Select the first 130 non-recurrent cases and the first 37 recurrent cases as your training set. Add record #197 in the data set to your training set as well.
1For example, if X ∈ {apple, orange, pear, peach, blueberry}, then l = 5.
(10 pts)
(c) There are four instances in your training set that are missing the lymph node feature (denoted as ?). This is not a very severe issue, so replace the missing values with the median of the lymph node feature in your training set. (5 pts)
(d) Binary Classification Using Naïve Bayes' Classifiers
i. Solve the problem using a Naïve Bayes' classifier.3 Use Gaussian class-conditional distributions. Report the confusion matrix, ROC, precision, recall, F1 score, and AUC for both the train and test data sets. (10 pts)
ii. This data set is rather imbalanced. Balance your data set using SMOTE, by downsampling the common class in the training set to 90 instances and upsampling the uncommon class to 90 instances. Use k = 5 nearest neighbors in SMOTE. Remember not to change the balance of the test set. Report the confusion matrix, ROC, precision, recall, F1 score, and AUC for both the train and test data sets. Does SMOTE help? (10 pts)
(e) (Extra practice, will not be graded) Solve the regression problem of estimating the time to recurrence (the third attribute) using the next 32 attributes. You can use KNN regression. To do it in a principled way, select 20% of the data points in each class of your training set to choose the best k ∈ {1, 2, . . . , 20}, and use the remaining 80% as the new training set. Report your MSE on the test set using the k you found and the whole training set (not only the new training set!). For simplicity, use Euclidean Distance. Repeat this process when you apply SMOTE to your new training set to only upsample the rare class and make the data completely balanced. Does SMOTE help in reducing the MSE?
2For more details see: https://www.researchgate.net/publication/2512520_Nuclear_Feature_Extraction_For_Breast_Tumor_Diagnosis.
3You can drop the patient ID, since it is an irrelevant feature.
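The SMOTE step in 5(d)ii can be done with the imbalanced-learn package; as a sanity check on what it computes, here is a minimal plain-numpy sketch of the core SMOTE interpolation (sizes here are illustrative stand-ins, not the actual WPBC data): pick a minority sample, pick one of its k = 5 nearest minority neighbors, and interpolate a random fraction of the way between them.

```python
import numpy as np

def smote_upsample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]       # k nearest minority neighbors
    synth = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X_min))        # a random minority sample
        j = nn[i, rng.integers(k)]          # one of its k nearest neighbors
        lam = rng.random()                  # interpolation fraction in [0, 1)
        synth[t] = X_min[i] + lam * (X_min[j] - X_min[i])
    return synth

# upsample a 37-instance minority class to 90 instances (53 synthetic points)
X_min = np.random.default_rng(1).normal(size=(37, 32))
new_pts = smote_upsample(X_min, 90 - len(X_min), k=5)
X_balanced = np.vstack([X_min, new_pts])
print(X_balanced.shape)  # (90, 32)
```

In practice imblearn.over_sampling.SMOTE does this (plus bookkeeping for the labels); the sketch only shows the interpolation idea.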
Homework 2 EE 559
1. Prove the Gauss-Markov Theorem, i.e. show that the least squares estimate in linear regression is the BLUE (Best Linear Unbiased Estimate), which means Var(a^T β̂) ≤ Var(c^T y), where c^T y is any unbiased estimator of a^T β. (20 pts)
2. (Linear Regression with Orthogonal Design) Assume that the columns x0, . . . , xp of X are orthogonal. Express β̂j in terms of x0, x1, . . . , xp and y. (10 pts)
3. (The Minimum Norm Solution) When X^T X is not invertible, the normal equations X^T Xβ = X^T y do not have a unique solution. Assume that X ∈ R^(n×(p+1)) has rank r. Assume that the SVD of X is UΣV^T, where U ∈ R^(n×r) satisfies U^T U = I_r, V ∈ R^((p+1)×r) satisfies V^T V = I_r, and Σ = diag(σ1, . . . , σr) is the diagonal matrix of positive singular values.
(a) Show that β_mns = VΣ^(−1)U^T y is a solution to the normal equations. (5 pts)
(b) Show that for any other solution β to the normal equations, ‖β‖ ≥ ‖β_mns‖. [Hint: one way (and not the only way) of doing this is to show that β = β_mns + b.] (15 pts)
(c) Is VΣ^(−1)U^T the pseudo-inverse of X? (Hint: you can prove or disprove using the so-called Penrose properties.) (10 pts)
4. Programming Part: Combined Cycle Power Plant Data Set
The dataset contains data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. Features consist of hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH), and Exhaust Vacuum (V), used to predict the net hourly electrical energy output (EP) of the plant.
(a) Download the Combined Cycle Power Plant data1 from: https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant
(b) Exploring the data: (5 pts)
i. How many rows are in this data set? How many columns? What do the rows and columns represent?
ii. Make pairwise scatterplots (a scatter matrix) of all the variables in the data set, including the predictors (independent variables) and the dependent variable.
Describe your findings.
iii. What are the mean, the median, the range, the first and third quartiles, and the interquartile range of each of the variables in the dataset? Summarize them in a table.
(c) For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions. Are there any outliers that you would like to remove from your data for each of these regression tasks? (10 pts)
(d) Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis H0 : βj = 0? (10 pts)
(e) How do your results from 4c compare to your results from 4d? Create a plot displaying the univariate regression coefficients from 4c on the x-axis, and the multiple regression coefficients from 4d on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis. (5 pts)
(f) Is there evidence of nonlinear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form2
Y = β0 + β1 X + β2 X^2 + β3 X^3 + ε
(g) Is there evidence of association of interactions of predictors with the response? To answer this question, run a full linear regression model with all pairwise interaction terms and state whether any interaction terms are statistically significant. (5 pts)
(h) Can you improve your model using possible interaction terms or nonlinear associations between the predictors and the response?
1There are five sheets in the data. All of them are shuffled versions of the same dataset. Work with Sheet 1.
Train the regression model on a randomly selected 70% subset of the data with all predictors. Also, run a regression model involving all possible interaction terms Xi Xj as well as quadratic nonlinearities Xj^2, and remove insignificant variables using p-values (be careful about interaction terms). Test both models on the remaining points and report your train and test MSEs. (10 pts)
(i) KNN Regression:
i. Perform k-nearest neighbor regression for this dataset using both normalized and raw features. Find the value of k ∈ {1, 2, . . . , 100} that gives you the best fit. Plot the train and test errors in terms of 1/k. (10 pts)
(j) Compare the results of KNN Regression with the linear regression model that has the smallest test error and provide your analysis. (5 pts)
2https://scikit-learn.org/stable/modules/preprocessing.html#generating-polynomial-features
Homework 1 EE 559
1. Assume that you are given the following sample. Estimate the weight of people whose heights are 150, 155, 165, and 190 cm, using KNN with k = 3:
ŷKNN = (y1 + y2 + · · · + yk)/k
where y1, y2, . . . , yk are the labels of the k nearest neighbors to your test instance. (10 pts)
Person  Height (cm)  Weight (kg)
1       171          80
2       168          78
3       191          100
4       182          80
5       150          65
6       178          83
2. Repeat 1, but instead of using the simple average of the labels of the k nearest neighbors, use the following weighted average:
ŷKNN = (w1 y1 + w2 y2 + · · · + wk yk)/(w1 + w2 + · · · + wk)
where the weight wi for the label yi of instance i is determined as 1/di, where di is the distance between instance i and the test instance. (10 pts)
3. Assume that J(x) = x^T Qx + d^T x + c, where Q = Q^T ∈ R^(n×n), x, d ∈ R^n, and c ∈ R. Show that ∇x J(x) = 2Qx + d and H = ∂²J/(∂x ∂x^T) = 2Q, where Hij = ∂²J/(∂xi ∂xj) and H is called the Hessian matrix of J. (10 pts)
4. Write down the prediction ŷ for a test row vector x0 (1 × p) made by a linear regression model in terms of y, the vector of labels of the training set, and X (n × (p+1)), the (augmented) feature matrix, and explain why ŷ can be viewed as a special case of KNN regression. (10 pts)
5. Show that for y ∈ R^n, ŷ = X(X^T X)^(−1) X^T y is a member of the column space of X, i.e. it is a linear combination of the columns of X ∈ R^(n×(p+1)). (10 pts)
6. Show that in linear regression, if β̂ minimizes RSS(β), then y − ŷ is orthogonal to the column space of X. (10 pts)
7. Programming Part: Vertebral Column Data Set
This biomedical data set was built by Dr. Henrique da Mota during a medical residence period in Lyon, France. Each patient is represented in the data set by six biomechanical attributes derived from the shape and orientation of the pelvis and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius, and grade of spondylolisthesis.
The following convention is used for the class labels: DH (Disk Hernia), Spondylolisthesis (SL), Normal (NO), and Abnormal (AB). In this exercise, we only focus on a binary classification task with NO = 0 and AB = 1.1
(a) Download the Vertebral Column Data Set from: https://archive.ics.uci.edu/ml/datasets/Vertebral+Column.
(b) Pre-Processing and Exploratory data analysis: (10 pts)
i. Make scatterplots of the independent variables in the dataset. Use color to show Classes 0 and 1.
ii. Make boxplots for each of the independent variables. Use color to show Classes 0 and 1 (see ISLR p. 129).
iii. Select the first 70 rows of Class 0 and the first 140 rows of Class 1 as the training set and the rest of the data as the test set.
(c) Classification using KNN on Vertebral Column Data Set (20 pts)
i. Write code for k-nearest neighbors with Euclidean metric (or use a software package).
ii. Test all the data in the test database with k nearest neighbors. Take decisions by majority polling. Plot train and test errors in terms of k for k ∈ {208, 205, . . . , 7, 4, 1} (in reverse order). You are welcome to use smaller increments of k. Which k* is the most suitable k among those values? Calculate the confusion matrix, true positive rate, true negative rate, precision, and F1-score when k = k*.2
iii. Since the computation time depends on the size of the training set, one may only use a subset of the training set. Plot the best test error rate,3 which is obtained by some value of k, against the size of the training set, for training-set sizes N ∈ {10, 20, 30, . . . , 210}.4 Note: for each N, select your training set by choosing the first ⌊N/3⌋ rows of Class 0 and the first N − ⌊N/3⌋ rows of Class 1 in the training set you created in 7(b)iii. Also, for each N, select the optimal k from a set starting at k = 1 and increasing by 5. For example, if N = 200, the optimal k is selected from {1, 6, 11, . . . , 196}. This plot is called a Learning Curve.
Let us further explore some variants of KNN.
(d) Replace the Euclidean metric with the following metrics5 and test them. Summarize the test errors (i.e., when k = k*) in a table. Use all of your training data and select the best k from {1, 6, 11, . . . , 196}. (10 pts)
i. Minkowski Distance:
A. which becomes Manhattan Distance with p = 1.
B. with log10(p) ∈ {0.1, 0.2, 0.3, . . . , 1}. In this case, use the k* you found for the Manhattan distance in 7(d)iA. What is the best log10(p)?
C. which becomes Chebyshev Distance with p → ∞.
ii. Mahalanobis Distance.6
(e) The majority polling decision can be replaced by a weighted decision, in which the weight of each point in voting is inversely proportional to its distance from the query/test data point. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away. Use weighted voting with Euclidean, Manhattan, and Chebyshev distances and report the best test errors when k ∈ {1, 6, 11, 16, . . . , 196}. (10 pts)
(f) What is the lowest training error rate you achieved in this homework? (5 pts)
1Make sure that you convert the labels to 0 and 1, otherwise you may not obtain correct answers.
2We will learn in the lectures what these mean; for now, research how they are computed and compute them.
3Obviously, use the test data you created in 7(b)iii.
4For extra practice, you are welcome to choose smaller increments of N.
5You can use sklearn.neighbors.DistanceMetric. Research what each distance means.
6Mahalanobis Distance requires inverting the covariance matrix of the data. When the covariance matrix is singular or ill-conditioned, the data live in a linear subspace of the feature space. In this case, the features have to be transformed into a reduced feature set in the linear subspace, which is equivalent to using a pseudoinverse instead of an inverse.
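The two estimators in problems 1 and 2 can be checked with a few lines of numpy, using the height/weight table given above (the 1/d weighting assumes the query point does not coincide with a training point):

```python
import numpy as np

heights = np.array([171, 168, 191, 182, 150, 178], dtype=float)
weights = np.array([80, 78, 100, 80, 65, 83], dtype=float)

def knn_predict(x0, k=3, weighted=False):
    d = np.abs(heights - x0)          # 1-D Euclidean distance to the query height
    idx = np.argsort(d)[:k]           # indices of the k nearest neighbors
    if weighted:
        w = 1.0 / d[idx]              # weight = 1/distance (problem 2)
        return np.sum(w * weights[idx]) / np.sum(w)
    return weights[idx].mean()        # simple average (problem 1)

print(knn_predict(155, k=3))                  # simple 3-NN average
print(knn_predict(155, k=3, weighted=True))   # distance-weighted 3-NN average
```

For the query height 155, the 3 nearest neighbors are the people with heights 150, 168, and 171, so the simple average is (65 + 78 + 80)/3.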
1. Disasters involving explosions and fires pose a substantial threat to human life and property. Managing chemical fires is complex and requires accurate assessment of fuel sources. Martinka et al. [1] demonstrate a CNN-based approach to discriminate burning liquids using a static flame image. In this assignment you will use transfer learning to fine-tune a residual CNN to predict the same.
Dataset
Download the burning liquid dataset from https://doi.org/10.1007/s10973-021-10903-2 – Supplementary Information, File #2. The dataset consists of 3000 high-resolution flame images of burning ethanol, pentane, and propanol.
(a) Extract the images into a data folder. Then split the images into subfolders named ethanol, pentane, and propanol based on their filenames. This structure (shown below) allows you to use the standard PyTorch ImageFolder dataset class for the custom dataset. See the torchvision documentation for more detail: https://pytorch.org/vision/stable/datasets.html.
data/
  ethanol/
    […]
  pentane/
    […]
  propanol/
    […]
(b) Split training, validation, and testing sets using a ratio appropriate for the dataset size.
(c) Use PyTorch DataSet transforms to resize the images to the model's input dimension of 224 × 224 × 3. See: https://pytorch.org/vision/stable/transforms.html. Then apply normalization and/or contrast enhancement to adjust the intensity range. Include reasonable data augmentations such as rotations, flips, and scaling.
Model
Load the pretrained torchvision ResNet-34 model. Replace the final classification output layer to match the number of burning liquid classes.
import torch.nn as nn
from torchvision.models import resnet34

# https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py
# find: ResNet::forward, self.fc
num_classes = 3
model = resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)
Training
Fine-tune the pretrained model using the custom dataset.
Experiment with reasonable learning rates, batch sizes, and training epochs. Freeze layers so that the initially large classifier training gradients do not propagate through the pretrained feature extractor. Use the property requires_grad = False to freeze model parameters. For example, to freeze all layers except the new classifier output layer:
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
Begin by freezing all layers except the fully connected classifier layer(s). Train the model with a small learning rate (e.g., 1e-4) over several epochs. Then progressively unfreeze layers to adapt the pretrained model to the new dataset. Experiment with smaller learning rates (e.g., 1e-5) as performance plateaus. Plot learning and accuracy curves for the training and validation sets. Include comments and/or annotate the figures to indicate when you adjusted layer freezing and changed the learning rate.
Layer visualization
Visualize the feature maps of several convolutional layers within the model. Activation intensity can provide insight into and explainability of the internal representation. Create feature maps of the first convolutional layer and a selection of layers from the middle of the network. The following snippet shows how to add a PyTorch hook to programmatically capture layer outputs. Refer to the torchvision ResNet source code [2] or print a model summary to identify specific layers by name. Then use torchvision.utils.make_grid or similar to produce an image grid showing output activations for all Cout filters in a layer.
def visualize_hook(module, input, output):
    plt.figure(figsize=(15, 15))
    for i in range(output.size(1)):
        plt.subplot(8, 8, i + 1)
        plt.imshow(output[0, i].detach().cpu().numpy(), cmap="gray")
        plt.axis("off")
    plt.show()

# Choose a specific layer and register the hook
layer_to_visualize = model.conv1
hook = layer_to_visualize.register_forward_hook(visualize_hook)

# Run a single image through the model
image = torch.randn(1, 3, 224, 224)  # Replace this with a real image from the dataset
_ = model(image)
hook.remove()  # Remove the hook
Analysis
• Report the accuracy of the fine-tuned model on the testing set. Compare the accuracy to the baseline vanilla pretrained ResNet-34 model.
• Generate a confusion matrix to show inter-class error rates.
• The sklearn.metrics module provides several loss, score, and utility functions to measure classification performance: https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics. Create a precision-recall curve for each class. Precision-recall curves show the trade-off between the positive predictive value (precision) and the true positive rate (recall) as the discrimination threshold T varies from 0 to 1. They are a standard metric to compare binary classifiers. See: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html. Calculate the precision and recall for each class by treating its prediction as a binary classification (i.e., one-vs-rest). Then plot the P-R curves on the same plot. You may use preprocessing.label_binarize and metrics.precision_recall_curve from sklearn.
References
[1] Martinka, J., Nečas, A., Rantuch, P. The recognition of selected burning liquids by convolutional neural networks under laboratory conditions. J Therm Anal Calorim 147, 5787-5799 (2022).
[2] https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py
The softmax function h(·) takes an M-dimensional input vector s and outputs an M-dimensional output vector a as
a = h(s) = (1 / Σ_{m=1}^{M} e^{s_m}) [e^{s_1}, e^{s_2}, . . . , e^{s_M}]^T
and the multiclass cross-entropy cost is given by
C = − Σ_i y_i ln a_i
where y is a vector of ground truth labels. Define the error (vector) of the output layer as:
δ = ∇_s C = Ȧ ∇_a C
where Ȧ is the matrix of derivatives of softmax, Ȧ = dh(s)/ds, whose (i, j) entry is ∂a_j/∂s_i (denominator convention with the left-handed chain rule). Show that δ = a − y if y is one-hot.
2. Logistic regression
The MNIST dataset of handwritten digits is one of the earliest and most used datasets to benchmark machine learning classifiers. Each datapoint contains 784 input features – the pixel values from a 28 × 28 image – and belongs to one of 10 output classes – represented by the numbers 0-9. This problem continues your logistic regression experiments from the previous Homework. Use only Python standard library modules, numpy, and matplotlib for this problem.
(a) Logistic "2" detector
Previous HW.
(b) Softmax classification: gradient descent (GD)
In this part you will use softmax to perform multi-class classification instead of distinct "one against all" detectors. The target vector is
[Y]_l = 1 if x is an "l", 0 otherwise,   for l = 0, . . . , K − 1.
You can alternatively consider a scalar output Y equal to the value in {0, 1, . . . , K − 1} corresponding to the class of input x. Construct a logistic classifier that uses K separate linear weight vectors w_0, w_1, . . . , w_{K−1}. Compute estimated probabilities for each class given input x and select the class with the largest score among your K predictors:
P[Y = l | x, w] = exp(w_l^T x) / Σ_{i=0}^{K−1} exp(w_i^T x)
Ŷ = argmax_l P[Y = l | x, w].
Note that the probabilities sum to 1. Use log-loss and optimize with batch gradient descent.
The (negative) log-likelihood function on a training set of N samples is:
L(w) = −(1/N) Σ_{i=1}^{N} log P[Y = y^(i) | x^(i), w]
where the sum is over the N points in our training set. Submit answers to the following.
i. Compute (by hand) the derivative of the log-likelihood of the softmax function. Write the derivative in terms of conditional probabilities, the vector x, and indicator functions (i.e., do not write this expression in terms of exponentials). You need this gradient in subsequent parts of this problem.
ii. Implement batch gradient descent. What learning rate did you use?
iii. Plot the log-loss (i.e., learning curve) of the training set and test set on the same figure. On a separate figure, plot the accuracy of your model on the training set and test set against the iteration number. Plot each as a function of the iteration number.
iv. Compute the final loss and final accuracy for both your training set and test set.
(c) Softmax classification: stochastic gradient descent
In this part you will use stochastic gradient descent (SGD) in place of the (deterministic) gradient descent above. Test your SGD implementation using single-point updates and a mini-batch size of 100. You may need to adjust the learning rate to improve performance. You can either modify the rate by hand or according to some decay scheme, or you may choose a single learning rate. You should get a final predictor comparable to that in the previous question. Submit answers to the following.
i. Implement SGD with a mini-batch size of 1 (i.e., compute the gradient and update weights after each sample). Record the log-loss and accuracy of the training set and test set every 5,000 samples. Plot the sampled log-loss and accuracy values on the same (respective) figures against the batch number. Your plots should start at iteration 0 (i.e., include the initial log-loss and accuracy). Your curves should show performance comparable to batch gradient descent.
How many iterations did it take to achieve comparable performance with batch gradient descent? How does this number depend on the learning rate (or learning rate decay schedule, if you have a non-constant learning rate)?
ii. Compare the total computational complexity to reach a comparable accuracy on your training set (relative to batch gradient descent). Note that each iteration of batch gradient descent costs an extra factor of N operations, where N is the number of data points.
iii. Implement SGD with a mini-batch size of 100 (i.e., compute the gradient and update weights with the accumulated average after every 100 samples). Record the log-loss and accuracies as above (every 5,000 samples, not 5,000 batches) and create similar plots. Your curves should show performance comparable to batch gradient descent. How many iterations did it take to achieve comparable performance with batch gradient descent? How does this number depend on the learning rate (or learning rate decay schedule, if you have a non-constant learning rate)?
iv. Compare the computational complexity to reach comparable performance among the 100-sample mini-batch algorithm, the single-point mini-batch, and batch gradient descent.

Submit your trained weights to Autolab. Save your weights and bias to an hdf5 file. Use keys W and b for the weights and bias, respectively. W should be a 10 × 784 numpy array and b should be a length-10 numpy array with shape (10,). The code to save the weights is the same as in (a), substituting W for w. Note: you will not be scored on your model's overall accuracy, but a low score may indicate errors in training or poor optimization.
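A sketch of the mini-batch SGD update for the softmax classifier (function and variable names, and the constant learning rate, are assumptions for illustration; the gradient uses the indicator form you derive in (b)-i):

```python
import numpy as np

def softmax_rows(S):
    # row-wise softmax with max-subtraction for numerical stability
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def sgd_epoch(X, y, W, b, lr=0.1, batch=100, rng=None):
    """One pass of mini-batch SGD over the data.
    X: (N, 784) inputs, y: (N,) integer labels in {0,...,9},
    W: (10, 784) weights, b: (10,) biases.  Returns updated W, b."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        i = idx[start:start + batch]
        P = softmax_rows(X[i] @ W.T + b)    # (batch, 10) class probabilities
        Y = np.eye(10)[y[i]]                # one-hot targets for the batch
        # averaged gradient over the mini-batch: (P - Y)^T X / batch
        W -= lr * (P - Y).T @ X[i] / len(i)
        b -= lr * (P - Y).mean(axis=0)
    return W, b
```

Setting `batch=1` gives the single-point update of part (c)-i, and `batch=len(X)` recovers batch gradient descent.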
Consider an MLP with three input nodes, two hidden layers, and three outputs. The hidden layers use the ReLU activation function and the output layer uses softmax. The weights and biases for this MLP are:

    W^{(1)} = \begin{bmatrix} 1 & -2 & 1 \\ 3 & 4 & -2 \end{bmatrix}, \quad b^{(1)} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}

    W^{(2)} = \begin{bmatrix} 1 & -2 \\ 3 & 4 \end{bmatrix}, \quad b^{(2)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}

    W^{(3)} = \begin{bmatrix} 2 & 2 \\ 3 & -3 \\ 2 & 1 \end{bmatrix}, \quad b^{(3)} = \begin{bmatrix} 0 \\ -4 \\ -2 \end{bmatrix}

(a) Feedforward Computation: Perform the feedforward calculation for the input vector x = [+1, -1, +1]^T. Fill in the following table. Follow the notation used in the slides, i.e., s^{(l)} is the linear activation, a^{(l)} = h(s^{(l)}), and \dot{a}^{(l)} = \dot{h}(s^{(l)}).

    l:              1    2    3
    s^{(l)}:
    a^{(l)}:
    \dot{a}^{(l)}:            (not needed)

(b) Backpropagation Computation: Apply standard SGD backpropagation for the input, assuming a multi-category cross-entropy loss function and the one-hot labeled target y = [0, 0, 1]^T. Follow the notation used in the slides, i.e., \delta^{(l)} = \nabla_{s^{(l)}} C. Enter the delta values in the table below and provide the updated weights and biases assuming a learning rate \eta = 0.5.

    l:              1    2    3
    \delta^{(l)}:
    W^{(l)}:
    b^{(l)}:

The MNIST dataset of handwritten digits is one of the earliest and most used datasets to benchmark machine learning classifiers. Each datapoint contains 784 input features (the pixel values from a 28 × 28 image) and belongs to one of 10 output classes, represented by the numbers 0-9. In this problem you will use numpy to classify input images using logistic regression. Use only Python standard library modules, numpy, and matplotlib for this problem.

(a) Logistic “2” detector

In this part you will use the provided MNIST handwritten-digit data to build and train a logistic “2” detector:

    y = \begin{cases} 1 & x \text{ is a ``2''} \\ 0 & \text{else.} \end{cases}

A logistic classifier takes a learned weight vector w = [w_1, w_2, \ldots, w_L]^T and the unregularized offset (bias) b ≜ w_0 to estimate the probability that an input vector x = [x_1, x_2, \ldots, x_L]^T is a “2”:

    p(x) = P[Y = 1 \mid x, w] = \frac{1}{1 + \exp\left(-\left(\sum_{k=1}^{L} w_k x_k + w_0\right)\right)} = \frac{1}{1 + \exp\left(-(w^T x + w_0)\right)}.
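Returning to the MLP problem above, the feedforward pass in part (a) can be checked numerically. This is a sketch only (the table values are for you to derive by hand), with ReLU on the hidden layers and softmax on the output, as the problem states:

```python
import numpy as np

# weights and biases from the problem statement
W1 = np.array([[1., -2., 1.], [3., 4., -2.]]);  b1 = np.array([1., -2.])
W2 = np.array([[1., -2.], [3., 4.]]);           b2 = np.array([1., 0.])
W3 = np.array([[2., 2.], [3., -3.], [2., 1.]]); b3 = np.array([0., -4., -2.])

relu = lambda s: np.maximum(s, 0.0)
def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

x = np.array([1., -1., 1.])
s1 = W1 @ x + b1        # linear activation of layer 1
a1 = relu(s1)
s2 = W2 @ a1 + b2       # linear activation of layer 2
a2 = relu(s2)
s3 = W3 @ a2 + b3       # output-layer linear activation
a3 = softmax(s3)        # output layer uses softmax

print(s1, s2, s3)       # → [ 5. -5.] [ 6. 15.] [ 42. -31.  25.]
```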
Train a logistic classifier to find weights that minimize the binary log-loss (also called the binary cross-entropy loss):

    \ell(w) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(x^{(i)}) + (1 - y_i) \log\left(1 - p(x^{(i)})\right) \right],

where the sum is over the N samples in the training set. Train your model until convergence according to some metric you choose. Experiment with variations of \ell_1- and/or \ell_2-regularization to stabilize training and improve generalization. Submit answers to the following.

i. How did you determine a learning rate? What values did you try? What was your final value?
ii. Describe the method you used to establish model convergence.
iii. What regularizers did you try? Specifically, how did each impact your model or improve its performance?
iv. Plot the log-loss (i.e., learning curve) of the training set and test set on the same figure. On a separate figure, plot the accuracy of your model on the training set and test set. Plot each as a function of the iteration number.
v. Classify each input to the binary output “digit is a 2” using a 0.5 threshold. Compute the final loss and final accuracy for both your training set and test set.

Submit your trained weights to Autolab. Save your weights and bias to an hdf5 file. Use keys w and b for the weights and bias, respectively. w should be a length-784 numpy vector/array and b should be a numpy scalar. Use the following as guidance:

    with h5py.File(outFile, 'w') as hf:
        hf.create_dataset('w', data=np.asarray(weights))
        hf.create_dataset('b', data=np.asarray(bias))

Note: you will not be scored on your model's overall accuracy, but a low score may indicate errors in training or poor optimization.
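A minimal batch-gradient-descent sketch for the binary detector (function names, the learning rate, and the \ell_2 strength `lam` are assumptions; \ell_2 is shown as one regularization option among those you should try):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, lam=1e-4, iters=500):
    """Batch gradient descent on the binary log-loss with l2 regularization.
    X: (N, L) inputs, y: (N,) labels in {0, 1}.  Returns (w, b)."""
    N, L = X.shape
    w, b = np.zeros(L), 0.0
    for _ in range(iters):
        p = sigmoid(X @ w + b)            # p(x) = P[Y = 1 | x, w]
        g = X.T @ (p - y) / N + lam * w   # log-loss gradient plus l2 term
        w -= lr * g
        b -= lr * (p - y).mean()          # bias is left unregularized
    return w, b
```

After training, predictions follow the 0.5 threshold of item v: `sigmoid(X @ w + b) > 0.5`.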
Consider the problem of estimating a scalar random variable Y from a vector observation X ∈ R^n. We want to find the linear MMSE estimator \hat{Y} = w^T X that minimizes the mean squared error (MSE), E[(Y - \hat{Y})^2].

(a) Given two zero-mean jointly Gaussian random variables X and Y with covariance matrix

    K = \begin{bmatrix} 5 & 2 \\ 2 & 4 \end{bmatrix},

find the linear MMSE estimator \hat{Y} = w^* X for Y given X. That is, find the optimal weight w^* that minimizes the MSE.
(b) Calculate the minimum mean squared error (MMSE) achieved by the optimal estimator \hat{Y} = w^* X.
(c) Show that for jointly Gaussian random variables, the linear MMSE estimator found in part (a) is equivalent to the conditional expectation E[Y|X]. In other words, prove that w^* X = E[Y|X].
(d) Now suppose X and Y are not jointly Gaussian but have the same covariance matrix K as above. Find the linear MMSE estimator \hat{Y} = \tilde{w} X in this case. Is the MMSE achieved by \tilde{w} different from the jointly Gaussian case in part (b)? Explain why or why not.

2. Eigenanalysis of Covariance Matrix and PCA. Consider a zero-mean random vector X ∈ R^3 with covariance matrix

    K = \begin{bmatrix} 4 & -1 & 2 \\ -1 & 5 & -1 \\ 2 & -1 & 3 \end{bmatrix}.

(a) Find the eigenvalues \lambda_k and orthonormal eigenvectors e_k of K.
(b) Show that the covariance matrix K can be expressed in terms of its eigenvalues and eigenvectors using the spectral decomposition (this is a special case of Mercer's theorem):

    K = \sum_{k=1}^{3} \lambda_k e_k e_k^T.

(c) Express X using its Karhunen-Loève (KL) expansion, i.e.,

    X = \sum_{k=1}^{3} Z_k e_k,

where the Z_k are uncorrelated random variables with zero mean and variance equal to the corresponding eigenvalues \lambda_k. This expansion is closely related to Principal Component Analysis (PCA), where the eigenvectors of the covariance matrix are called principal components and the eigenvalues represent the variance, often interpreted as “power,” captured by each component.
(d) Suppose you want to approximate X using only its two dominant eigenmodes (i.e., the two principal components with the largest eigenvalues).
Write the approximation \tilde{X} in terms of the eigenvectors and eigenvalues of K. This is an example of dimensionality reduction using PCA.
(e) What is the mean squared error (MSE) of the approximation in (d), i.e., E[\|X - \tilde{X}\|^2]? Express your answer in terms of the eigenvalues. This MSE is related to the concept of reconstruction error in PCA and the total variance captured by the selected principal components.

3. The secant method is an iterative root-finding algorithm. It uses a sequence of secant-line roots to approximate c such that f(c) = 0 for a continuous function f. Unlike Newton's method, it does not require knowledge or evaluation of the derivative f'. The secant method is defined by the recurrence

    x_n = x_{n-1} - f(x_{n-1}) \, \frac{x_{n-1} - x_{n-2}}{f(x_{n-1}) - f(x_{n-2})}.

Write a Python script that uses the secant method to approximate the root of a continuous function f in the interval [a, b]. You may assume that f has at most one root in [a, b]. Use |x_{k+1} - x_k| < 10^{-10} as the convergence criterion. Let N be the number of iterations to reach convergence. Output N followed by the three root approximations x_{N-2}, x_{N-1}, x_N. Output each number on its own line and use precision sufficient to show convergence. Import the function f from a file named func.py in the same directory as your script, i.e., from func import f. You may assume that f is continuous on [a, b] and that func.f(x) returns a scalar float for all x ∈ [a, b]. Your script should accept a and b as two numeric command line arguments, e.g., python hw3p1.py "1.1" "1.4". Your script must validate that a and b are numeric, verify that a < b, and check that f(a)f(b) < 0 (see Bolzano's Theorem). Write "Range error" to STDERR (standard error) if any of these three conditions fail and immediately terminate. Your script should not produce any output except as described above.

4. Raman spectroscopy is a technique that uses inelastic scattering of light to identify unknown chemical substances.
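The core iteration of the Problem 3 secant script can be sketched as follows. This sketch takes f as an argument for illustration; the actual script must instead import f from func.py and perform the command-line validation described above:

```python
def secant(f, a, b, tol=1e-10, max_iter=100):
    """Secant root search started from the interval endpoints.
    Returns (N, [x_{N-2}, x_{N-1}, x_N])."""
    xs = [a, b]
    for n in range(max_iter):
        x0, x1 = xs[-2], xs[-1]
        # secant recurrence: x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        xs.append(x2)
        if abs(x2 - x1) < tol:          # convergence criterion |x_{k+1} - x_k|
            return n + 1, xs[-3:]
    raise RuntimeError("secant method did not converge")

# example: root of x^2 - 2 in [1, 2], i.e., sqrt(2)
N, last3 = secant(lambda x: x * x - 2.0, 1.0, 2.0)
print(N)
for x in last3:
    print(f"{x:.12f}")                  # precision sufficient to show convergence
```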
Spectral “peaks” indicate vibrational and rotational modes and are of special importance because they act like a chemical fingerprint. Raman spectroscopy measures photon intensity vs. Raman shift. The Raman shift relates the frequencies of the exciting laser and the scattered photons, and is often reported as a wavenumber, i.e., the frequency difference expressed in inverse centimeters (cm^{-1}).

• Generate a molecular fingerprint using the spectroscopic data in raman.rod. The file contains intensity vs. wavenumber data for an unknown chemical sample. A Raman Open Database (ROD) file includes content in addition to the raw intensity data:

    # content
    more content
    _raman_spectrum.intensity
    wavenumber1 intensity1
    wavenumber2 intensity2
    ...
    wavenumbern intensityn

Use string matching to ignore all lines before _raman_spectrum.intensity. Load valid (wavenumber, intensity) pairs until the first invalid intensity line (or upon reaching the end of the file). Use the method below to estimate the wavenumbers of all spectral peaks. You may use any standard NumPy or SciPy packages or experiment with your own algorithms.

• First detect peaks in the raw spectral data. Use the peak locations to focus on regions of interest within the spectrum. For instance, if you detect peaks at x_1 cm^{-1} and x_2 cm^{-1}, use regions of interest [x_1 - n_1, x_1 + n_1] and [x_2 - n_2, x_2 + n_2]. Experiment to find “good” widths n_1, n_2, etc. Then use a spline to interpolate intensity within each region of interest. Calculate zero-crossings of the derivative to estimate the wavenumbers with maximum intensity.

(a) Print the wavenumber estimates for the eight largest spectral peaks to STDOUT, sorted by magnitude (largest first).
(b) Create a figure that shows the Raman data (intensity vs. wavenumber) and mark each of the maximum intensity values.
(c) Produce a “zoomed-in” figure for the “regions of interest” corresponding to the four largest peaks. Plot the raw spectral data and overlay your interpolating function.
Use a marker to show the wavenumber with maximal intensity.
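The detect-then-refine pipeline above can be sketched on synthetic data (the two peak locations, the ROI half-width, and the height threshold here are assumptions for illustration, standing in for the raman.rod contents and the widths you must tune):

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.interpolate import CubicSpline

# synthetic spectrum standing in for the raman.rod data:
# two Gaussian peaks at 520 and 1340 cm^-1
wn = np.linspace(200, 1800, 1600)                      # wavenumbers, cm^-1
inten = (np.exp(-((wn - 520) / 8) ** 2)
         + 0.6 * np.exp(-((wn - 1340) / 10) ** 2))

# step 1: coarse peak detection on the raw data
idx, _ = find_peaks(inten, height=0.1)

# step 2: spline each region of interest, then refine the peak location
# via zero-crossings of the spline's derivative
half_width = 30.0                                      # ROI half-width, tuned by hand
estimates = []
for i in idx:
    roi = (wn > wn[i] - half_width) & (wn < wn[i] + half_width)
    spline = CubicSpline(wn[roi], inten[roi])
    roots = spline.derivative().roots()                # zero-crossings of d(intensity)/d(wn)
    roots = roots[(roots > wn[roi][0]) & (roots < wn[roi][-1])]
    estimates.append(roots[np.argmax(spline(roots))])  # keep the maximum-intensity root

print(sorted(estimates))
```

For the real data, sort the estimates by peak intensity (largest first) and keep the top eight, per part (a).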