This project is intended for you to apply programming concepts you learned in CSCI 111 (decision statements, loop statements) in a simple Java program, using some of the classes covered in lecture and lab. In addition, you will submit the project through Blackboard to make sure it is clear how to do that. We will look at your coding style, documentation (comments) and, of course, that the project works. Check out the grading criteria in the Projects folder in Blackboard. Ignore Javadoc for this project.

Write a Java program that will:
1. Ask the user to type in a sentence, using a JOptionPane.showInputDialog().
2. Examine each letter in the string and count how many times the upper-case letter 'E' appears and how many times the lower-case letter 'e' appears. The key here is to use the charAt method in class String.
3. Using a JOptionPane.showMessageDialog(), tell the user how many upper- and lower-case e's were in the string.
4. Repeat this process until the user types the word "Stop". (Check out the method equalsIgnoreCase in class String to cover all upper/lower case possibilities of the word "STOP".)

Program Submission: The name of your file must be Project0.java with upper/lower case exact. Your program (the .java file, not the .class file) should be submitted through Blackboard by uploading the file. In the "Comments" field of the assignment put your name and lab section. Note that Blackboard lets you submit the assignment only once. If you make a mistake, you will have to ask Dr. Lord to clear the project, and you may lose time. The program is due by midnight on the date indicated at the top of this handout, and will not be accepted after the cutoff date. The date of submission to Bb is the official date for your project.
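The assignment itself is in Java, but the counting logic is the same in any language. The following is a hedged Python sketch of step 2 (indexing with sentence[i] plays the role of String.charAt(i)); the function names count_es and is_stop are illustrative, not part of the assignment.

```python
def count_es(sentence):
    """Count upper-case 'E' and lower-case 'e' by examining each character."""
    upper = lower = 0
    for i in range(len(sentence)):   # analogue of looping over charAt(i) in Java
        ch = sentence[i]
        if ch == 'E':
            upper += 1
        elif ch == 'e':
            lower += 1
    return upper, lower

def is_stop(word):
    # the outer loop would repeat until this is true; in Java this is
    # word.equalsIgnoreCase("Stop")
    return word.lower() == "stop"
```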
The final robot projects will use the same Lego EV3 robot. For the final project, groups have a choice among several different projects. If you would like to modify a project or propose your own, you need to have the modified project approved by the instructor first.

The goal of this project is to design a robot that can move through a room environment similar to the one used for the second project (again, its layout is unknown beforehand) and clean up a number of colored blocks into the appropriate locations. For this task, a number of blocks (approximately 5cm x 5cm x 5cm in size) of two different colors (red and blue) will be distributed in random locations in a room environment. Blocks will initially be on top (and in the center) of a colored floor tile. In addition, there will be two corners in the environment that are color coded (red and blue) that serve as the deposit areas (and whose locations are known beforehand). The task of the robot is to find the blocks, bring them to the matching corner, and leave them there. The following figure shows an example environment. [figure omitted]

The goal of this project is to build and program a robot arm that can draw polygons using a marker. Here you should build a robot arm that can move a marker mounted at its end across a piece of paper to draw arbitrary polygons given to it (at compile time). Given that polygon, the robot should move the marker to the first corner (without drawing a line) and then trace the shape on the paper. Drawing will be limited to a 15cm x 10cm area that can be located based on the kinematic characteristics of the robot constructed. The following shows an example scenario. [figure omitted]

Stair Climbing Robot

The goal of this project is to build and program a robot that can climb up and down a set of stairs with 10cm steps. The depth of a step can be variable, but its height will be fixed. In addition, steps can be winding upward (i.e. each step can be at an angle with respect to the previous one).
The ends of each step will be marked in black.

For this project it is necessary that at least 2 teams choose it so they can play against each other. The goal of the project is to build a robot that can play a simplified version of soccer using an IR ball and IR seeker sensors. Two teams will play against each other on a field that has 3 zones: two defense zones that only the robot of the defensive team can be in, and a middle zone that both teams' robots can be in. The goal for each team is to have the ball cross the other team's base line in order to score a goal. Once a goal is scored, the robots are moved into their teams' defense zones and the ball is placed in the center. Then the game is started again, with the team that has been scored on getting a 1s head start. Each participating team will receive an IR seeker sensor that provides a direction signal towards the ball, which is equipped with a set of IR LEDs. Each robot has to fit within 0.75 ft x 0.75 ft.
The goal of this project is to design a behavior-based object finding and removal robot that is able to move from an unknown position in an indoor environment to look for an object, raise an alarm, and push the object off its location on the floor. The robot should use a set of behaviors, including "wander" (search) and "wall following". The walls of the rooms will be marked with 2" wide (1.88" is ok) blue painters tape, and the object will be an empty soda can sitting on top of a red square made with red painters tape. There will be no "door" openings to the outside of the house, and the robot is to find the object, move towards it, indicate when it is within 1 foot of the object (using either a tone or a light), and then push it off the marking it is sitting on. The following figure shows an example environment. [figure omitted]

The behavioral repertoire of your robot should include "wander", i.e. a behavior that enables the robot to move in freespace looking for either a wall or the object; "wall following", which should permit the robot to move along the wall to be able to get to all the rooms in the environment (you might want to implement only one direction, i.e. clockwise or counterclockwise wall following); "goal finding", which should allow you to detect the object and move to it; and a "clearing" behavior, which can remove the can from the mark on the floor. As the walls are represented by blue painters tape, it is ok for the wall detection sensor (the color sensor) to cross on top of the wall. However, the center of the robot is never allowed to move on top of the blue tape. All walls will be either horizontal or vertical (i.e. all angles in the environment will be right angles), any piece of wall will be at least 1 foot long, and the marking on the floor under the can will be an area of red painters tape covering a 1 square foot area.
At the end of the project each group has to hand in a report, the code, and a recording of your system, and give a short demonstration of their robot. During this demonstration you should provide a short description of the robot and of the details of your behavior-based control system.

1. Build a mobile robot for this task. Using the parts in your robot kit, build a mobile robot for the task. (In this assignment the robot has to be able to detect and follow "walls" and to detect the object. Robot localization, on the other hand, is not important since the start location of the robot and the location of the object will not be known. One way to perform "wall following" in the given environment would be to use the color sensor to keep track of the wall.) Your project report should include a short description of your robot design (including the critical design choices made).

2. Implement "wander", "wall following", "goal finding", and "clearing" on the robot. To address the given task you have to implement a "wander" (search), a "wall following", a "goal finding" and identification, and a "clearing" behavior for your robot. "Wander" is intended here to move the robot through freespace to a wall, "wall following" is intended to permit the robot to move between rooms, "goal finding" is intended to locate the object, and "clearing" is intended to move the object off of its current location on the floor. To integrate these behaviors you also have to implement a behavior coordination mechanism (e.g. subsumption, weighted averaging, etc.). Once the object has been found and your system has moved closer than 1 foot, your robot should indicate this by starting an alarm, and it should then attempt to clear the object from the location (the easiest would simply be to push it for at least 1 foot to make sure it clears the area it is standing on). Your report should contain a description of the important components of your control system.
The submission should also contain the actual code for the robot and a recording of the system performing the task.
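As a sketch of the behavior coordination mechanism mentioned above, here is a minimal priority-based (subsumption-style) arbiter in Python. The behavior names mirror the assignment, but the sensor dictionary fields ("wall", "goal", "dist") and the returned command tuples are assumptions for illustration only, not part of the handout.

```python
# Priority-based (subsumption-style) behavior arbitration sketch.
# Sensor fields ("wall", "goal", "dist" in feet) are hypothetical placeholders.

def clearing(s):      # push the can off its mark once within 1 foot
    return ("push", 0.3) if s["goal"] and s["dist"] <= 1.0 else None

def goal_finding(s):  # approach a detected object
    return ("approach", 0.5) if s["goal"] and s["dist"] > 1.0 else None

def wall_follow(s):   # track a wall to move between rooms
    return ("follow_wall", 0.5) if s["wall"] else None

def wander(s):        # default: search free space
    return ("forward", 1.0)

BEHAVIORS = [clearing, goal_finding, wall_follow, wander]  # highest priority first

def arbitrate(sensors):
    """Return the command of the highest-priority behavior that fires."""
    for behavior in BEHAVIORS:
        cmd = behavior(sensors)
        if cmd is not None:
            return cmd
```

Because "wander" always fires, the arbiter always returns some command; higher-priority behaviors subsume it whenever their trigger conditions hold.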
The objective of this assignment is to navigate a mobile robot through an obstacle course to a goal location. The start position of the robot as well as the locations of all obstacles and of the goal are given before the robot is started.

The workspace for the robot is a rectangular area, 4.88 m x 3.05 m in size (this corresponds to exactly 16 x 10 floor tiles in the lab). Obstacles are black cardboard squares 0.305 m x 0.305 m in size (the size of 1 floor tile) which will be placed in the workspace. The goal is a black circle with a radius of 0.305 m. To simplify experiments, the centers of the goal area, of the obstacles, and of the start will coincide with the intersection point of four floor tiles, and their orientation will be aligned with the tiles. An example of such an obstacle course is shown in the figure below. [figure omitted]

Obstacles, goal, and start location of the robot will be selected arbitrarily with the restriction that if a path exists, then there will always be a path which is at least 0.61 m (2 tiles) wide. The locations of obstacles, start, and goal will be provided at compile time, so it is not necessary to write interactive input routines. One possibility is to include them as a header file by providing the coordinates of their centers.
For the obstacle course shown above this could look as follows:

    #define MAX_OBSTACLES 25            /* maximum number of obstacles */

    int num_obstacles = 13;             /* number of obstacles */

    double obstacle[MAX_OBSTACLES][2] = /* obstacle locations */
        {{0.61, 2.743},  {0.915, 2.743}, {1.219, 2.743}, {1.829, 1.219},
         {1.829, 1.524}, {1.829, 1.829}, {1.829, 2.134}, {2.743, 0.305},
         {2.743, 0.61},  {2.743, 0.915}, {2.743, 2.743}, {3.048, 2.743},
         {3.353, 2.743},
         {-1,-1}, {-1,-1}, {-1,-1}, {-1,-1}, {-1,-1}, {-1,-1},
         {-1,-1}, {-1,-1}, {-1,-1}, {-1,-1}, {-1,-1}, {-1,-1}};

    double start[2] = {0.305, 1.219};   /* start location */
    double goal[2]  = {3.658, 1.829};   /* goal location */

At the end of the project each group has to hand in a report and give a short demonstration of their robot. During this demonstration you should provide a short description of the robot and navigation system, and be prepared to answer some basic questions.

1. Build a mobile robot for this task. Using the parts in your robot kit, build a mobile robot for the navigation task. (Since the robot has to be able to navigate through the course, an important design criterion might be that the robot is able to keep track of its position and maybe to detect the goal once it is reached.) Your project report should include a short description of your robot design (including the critical design choices made).

2. Implement a navigation strategy to address the task. Implement a navigation strategy which will permit your robot to accomplish the navigation task with arbitrary obstacle, goal, and start configurations (subject to the constraints described above). The speed of your robot and the length of the path generated by your robot are not of major importance here. The main objective is reaching the goal while not hitting any obstacles (i.e. without crossing over one of the obstacle tiles).
In addition, the robot has to stay within the assigned workspace. Your report should contain a description of the important components of your navigation strategy and the actual code for the robot.

Words of Caution: Dead reckoning (i.e. keeping track of the position of the robot using only internal sensors) for the Lego robots can be relatively imprecise, and navigation strategies might therefore sometimes fail. Don't get discouraged if your robot does not succeed all the time. Also, moving at slower speeds can improve the overall precision.
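Since obstacles, start, and goal are all known at compile time, one simple planning approach (a sketch, not the required method) is breadth-first search over the 16 x 10 tile grid; converting the metric coordinates above to tile indices (0.305 m per tile) is left out for brevity.

```python
from collections import deque

def bfs_path(blocked, start, goal, cols=16, rows=10):
    """Shortest 4-connected path over the tile grid, avoiding blocked tiles.

    blocked: set of (col, row) obstacle tiles; returns a list of tiles
    from start to goal, or None if no path exists.
    """
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < cols and 0 <= nxt[1] < rows
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    if goal not in prev:
        return None
    path, cell = [], goal
    while cell is not None:          # walk the predecessor chain backwards
        path.append(cell)
        cell = prev[cell]
    return path[::-1]                # start ... goal
```

The robot would then follow the returned tile sequence with dead reckoning, which is where the words of caution above apply.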
Purpose / Assignment: Computers sometimes use a format called binary coded decimal, or 'BCD'. Good background information is available from https://en.wikipedia.org/wiki/Binary-coded_decimal, but it is far deeper than you need for this assignment (read at your own risk; the section marked "Basics" covers all you need to know at this point in time). BCD encodes each decimal digit into 4 binary bits. A byte holds 2 BCD digits. A 32 bit word can contain 8 packed BCD digits.

Example: The decimal number 4660, expressed as hex, is 0x1234. In BCD, it is 0100 0110 0110 0000, i.e. 4660 BCD, which is 18016 when those bits are read as a plain binary number. The decimal number 9999, as hex, is 0x270F. In BCD, it is 1001 1001 1001 1001, i.e. 9999 BCD, or 39321 when read as plain binary.

Write a program that accepts an integer input (base 10) on stdin. This input value is in packed BCD format. Using the packed BCD format, display the number as a sequence of characters on stdout. The sequence of characters is a simulation of a calculator's display (each digit 5 characters wide and 5 characters tall, with a space between digits).

Grading Criteria / Example Output:

    $ gcc -Wall -std=c99 -g hw4.c
    $ ./a.out
    4660

[The original handout then shows the inputs 4660, 305419896, and 39321, each rendered as a row of 5x5-character seven-segment-style digits; the ASCII art does not survive reproduction here.]
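To see how packed BCD unpacks, here is a small Python sketch of the digit-extraction step (the helper name bcd_digits is illustrative; the assignment itself is a C program). Each 4-bit nibble of the input holds one decimal digit.

```python
def bcd_digits(value):
    """Unpack a packed-BCD integer into its decimal digits, most significant first."""
    if value == 0:
        return [0]
    digits = []
    while value:
        digits.append(value & 0xF)  # low nibble = one BCD digit
        value >>= 4                 # shift the next digit into place
    return digits[::-1]
```

For example, the base-10 input 18016 is 0x4660, so its nibbles decode to the digits 4 6 6 0, and 39321 (0x9999) decodes to 9 9 9 9, matching the examples above; the C program would then render each digit as a 5x5 character block.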
Purpose / Assignment: Write a program that will calculate and display the mean, median, sum, max, and min of a provided sequence of 5 numbers. The return types of these functions should be consistent with the result. Example: the sum of two integers is always an integer, so an 'int' would be the correct return type for the "sum" function. It is acceptable for functions to be used by other functions you have written for this assignment.

Example output:

    25 5 0 0 2
    min = 0 max = 25 median = 2 sum = 32 mean = 6.400000

    0 0 0 0 0
    min = 0 max = 0 median = 0 sum = 0 mean = 0.000000

    100 100 100 100 100
    min = 100 max = 100 median = 100 sum = 500 mean = 100.000000

    1 2 3 4 5
    min = 1 max = 5 median = 3 sum = 15 mean = 3.000000

    5 4 3 2 1
    min = 1 max = 5 median = 3 sum = 15 mean = 3.000000

Grading Criteria:
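A minimal sketch of the five functions in Python (the assignment is a C program; the int/float return-type distinction there corresponds to Python's int vs. float results here). The trailing-underscore names are illustrative, used only to avoid shadowing built-ins.

```python
def min_(nums):
    smallest = nums[0]
    for n in nums[1:]:
        if n < smallest:
            smallest = n
    return smallest

def max_(nums):
    largest = nums[0]
    for n in nums[1:]:
        if n > largest:
            largest = n
    return largest

def sum_(nums):
    total = 0
    for n in nums:
        total += n
    return total                # stays an int for integer input

def median_(nums):
    return sorted(nums)[len(nums) // 2]   # middle of the 5 sorted values

def mean_(nums):
    return sum_(nums) / len(nums)         # float result
```

For [25, 5, 0, 0, 2] these give min 0, max 25, median 2, sum 32, and mean 6.4, matching the first example above; note that mean_ reuses sum_, which the handout explicitly allows.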
Purpose / Assignment: Write a program that demonstrates the correct operation of the described sin_() function:

    float sin_(float input_angle)

The calculation of this trigonometric value will be performed in a multi-stage process: first, a look-up table (array) will be built storing the sin() for all values between 0 degrees and 359 degrees using a Taylor series (see https://en.wikipedia.org/wiki/Taylor_series for background and the algorithm to be used). This lookup table is created once and then used whenever the function is called, for the life of the program. The sin_() function will use the lookup table to linearly interpolate between two values of the lookup table to return the answer (see https://en.wikipedia.org/wiki/Linear_interpolation for background and the equation to be used for linear interpolation). The advantage of this design is that the function call is very fast, requiring only some addition and division. Slower calculations, like computing the sin values in the lookup table, are done only one time during initialization.

An example of using a look-up table and interpolation, where the function being interpolated is Y = X*X and the table holds entries at X = 5 (Y = 25) and X = 10 (Y = 100). To solve for X = 7:

    Y = (25*(10-7) + 100*(7-5)) / (10-5)
    Y = (75 + 200) / 5
    Y = 55

An X value of 7 would actually give 49, so the error in using the lookup table and interpolation is:

    Error = Correct - Measured
    Error = 49 - 55 = -6

You will find the sin() function calculated as described above will be much closer to the actual value.

Specific Requirements for the sin_() function: Specific Requirements for the program: Grading Criteria:

Example Output (note: the numerical accuracy may differ depending on your code, the specific compiler used, and the size of the operands used). Note: these examples are not exhaustive and do not test all of the functionality described for this homework assignment.
    [bdavis@localhost hw2]$ ./a.out
    -1
    [bdavis@localhost hw2]$ ./a.out
    0
      0.000000   0.000000   0.000000   0.000000
    90
     90.000000   1.000004   1.000000  -0.000004
    180
    180.000000  -0.006925  -0.000000   0.006925
    270
    270.000000  -1.000004  -1.000000   0.000004
    359
    359.000000  -0.017453  -0.017453  -0.000000
    -1
    [bdavis@localhost hw2]$ ./a.out ofile
    [bdavis@localhost hw2]$ gnuplot
    gnuplot> plot "ofile" using 1, "ofile" using 2, "ofile" using 3, "ofile" using 4
    gnuplot> quit
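The two-stage design (Taylor-series table built once, linear interpolation per call) can be sketched in Python as follows. The table size, number of Taylor terms, and wrap-around handling are assumptions, since the handout's specific requirements are not reproduced here.

```python
import math

def _taylor_sin(x, terms=12):
    """sin(x) via its Taylor series: sum of (-1)^n * x^(2n+1) / (2n+1)!."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
    return total

# Stage 1: build the 360-entry lookup table once, at "initialization".
TABLE = [_taylor_sin(math.radians(deg)) for deg in range(360)]

def sin_(angle_deg):
    """Stage 2: linearly interpolate between the two adjacent table entries."""
    a = angle_deg % 360.0
    lo = int(a)
    hi = (lo + 1) % 360          # wrap 359 -> 0
    frac = a - lo                # fractional degree between the entries
    return TABLE[lo] + frac * (TABLE[hi] - TABLE[lo])
```

Because the table is sampled every degree (rather than every 5 units as in the Y = X*X example), the interpolation error here is far smaller than the -6 seen above.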
Data Description: The Metropolitan Museum of Art presents over 5,000 years of art from around the world for everyone to experience and enjoy. The Museum lives in three iconic sites in New York City: The Met Fifth Avenue, The Met Breuer, and The Met Cloisters. Millions of people also take part in The Met experience online. Since it was founded in 1870, The Met has always aspired to be more than a treasury of rare and beautiful objects. Every day, art comes alive in the Museum's galleries and through its exhibitions and events, revealing both new ideas and unexpected connections across time and across cultures. The Metropolitan Museum of Art provides select datasets of information on more than 470,000 artworks in its Collection for unrestricted commercial and noncommercial use.

Critical Details and Instructions:

iii. For problems 1-5, you can manipulate the data-frames/dictionaries as you see fit, using whatever functions/libraries you want. However, it is critically important that your end results for each problem match the provided variable name (ex: the result of problem 1 is called df_init) so that they are accessible for grading. You should upload your exam via the File Response dialogue through the Blackboard exam, but if you cannot do so, email it to me ASAP. Note that if you are submitting a .py file you are highly encouraged to include a README to explain what should be run to produce the required structures for problems 1-5 and graphs for problem 6.
1. (Machine Learning (Classification))
a. Choose one of the toy classification datasets bundled with sklearn other than the digits dataset.
b. Train three distinct sklearn classification estimators for the chosen dataset and compare the results to see which one performs the best when using 2-fold cross-validation. Note that you should use three distinct classification models here (not just tweak underlying parameters). A relatively complete listing of the available estimators can be found here (https://scikit-learn.org/stable/supervised_learning.html), but make sure you only use classifiers! Unless you have an inclination to do otherwise, I recommend using the model default parameters when available.
c. Repeat b. for 20-fold cross-validation. Explain in a paragraph the difference in your results when using 20-fold vs 2-fold cross-validation (if any).
d. Construct a confusion matrix for your most accurate model among the three estimators and two cross-fold options. Which class in your dataset is most accurately predicted to have the correct label by the best classifier, and which is most likely to be confused with one or more of the wrong classes?

2 (Option I). (Trends, Searches, and Sentiment)
a. Use the Twitter Trends API to determine the available trending topics for a city of your choice, assigning a tweet volume of 5000 to any trend with no volume provided.
b. After sorting the trends in descending order by volume, create a bar graph with each (sorted) trend on the x-axis against its volume on the y-axis.
c. Use the Twitter Search API to find 20 tweets for each of the three most popular trends in the chosen city, and preprocess their associated tweet text (preferring extended tweet text, if available) in a manner appropriate for tweets.
d. Use TextBlob to determine the sentiment for each set of 20 tweets.
 i. Do you notice a substantial difference in the proportion of positive and negative sentiment for the three trends? Try to theorize why or why not.
 ii.
Do you believe the sentiment analysis to be reliable for any or all of the trends? Explain why or why not.

2 (Option II). (Machine Learning (Regression))
a. Locate a non-proprietary, small-scale dataset suitable for regression online. There are countless sources and repositories that you can use in this task, but if you have trouble finding one, I recommend starting via Kaggle (https://www.kaggle.com/code/rtatman/datasets-for-regression-analysis/notebook). Explain briefly what the dataset represents, what target variable you will be using, and what other features are present. You may want or need to apply preprocessing to your data to ensure it can be used properly with the regression models (e.g. making every feature numeric through transformation or by dropping some).
b. Train three distinct sklearn regression estimators for the chosen dataset and compare the results to see which one performs the best when using 10-fold cross-validation, utilizing the R-squared score to gauge performance. Note that you should use three distinct regression models here (not just tweak underlying parameters). A relatively complete listing of the available estimators can be found here (https://scikit-learn.org/stable/supervised_learning.html), but make sure you only use regression models! Unless you have an inclination to do otherwise, I recommend using the model default parameters when available.
c. Repeat part b utilizing the Mean Square Error to gauge performance. Briefly research the difference between the two metrics (MSE and R2), and explain in a paragraph or two i. the difference between them and ii. when each one is the preferable metric to use.
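A minimal sketch of comparing classifiers under k-fold cross-validation, assuming the iris toy dataset and three common estimators (any three distinct models would satisfy the assignment); default parameters are used as recommended, except max_iter, which is raised so logistic regression converges.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Mean 2-fold cross-validated accuracy per model; swap cv=2 for cv=20 in part c.
scores = {name: cross_val_score(model, X, y, cv=2).mean()
          for name, model in models.items()}
best = max(scores, key=scores.get)
```

For the regression option, the same shape works with regression estimators and cross_val_score's scoring="r2" or scoring="neg_mean_squared_error".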
This assignment deals with using TextBlob and other open-source libraries to perform NLP-based analysis on documents using Python. All parts should use the same three documents (as outlined in Part 1 below). In addition to your .ipynb and/or .py files, you must submit a report document (in .doc or .pdf format) that answers the various questions below.

Part 1: Select and download three texts of your choosing that represent different media or writing formats (for example, you could choose i. a novel, movie script, and play script, or ii. a short story, poem, and novel, etc.). Make sure you briefly describe your documents and explain the differences between them in a paragraph.

Part 2: (a) Compute word counts for each of your documents after excluding English stop words (and optionally, performing lemmatization). (b) Create and display a bar plot for each document that includes word counts for the 25 most frequent words (after the above processing). (c) Create and display a word cloud for each document (using a mask image of your choice) that includes only the 100 most frequent words. Note that you'll likely want to use the approach outlined in Session 25 that utilizes the fit_words method, since you will want data consistent with those for part (b). (d) Do you see any notable differences between the documents wrt (b) and/or (c) above? Try to explain why or why not, and whether you would expect such a difference.

Part 3: (a) Use Textatistic to compute the average of the Flesch-Kincaid, Gunning Fog, SMOG, and Dale-Chall scores for each document. (b) Are there noticeable differences among your documents' readability scores, and do you suspect any difference is present (or should be present)?

Part 4: (a) Use spaCy to compute the pairwise similarity between your documents (i.e. doc. 1 to doc. 2, doc. 1 to doc. 3, doc. 2 to doc. 3). (b) Do any of these similarity scores seem higher or lower than you would expect? Explain your response.
Part 5: (a) Use spaCy to find the named entities in your documents. (b) Produce a bar plot for each document that includes the count for the 20 most common named entities (by name). (c) Produce a second bar plot per document based on the counts of every named entity type (PERSON, ORG, etc.). (d) Do you notice any meaningful differences (or similarities) among the documents wrt these plots? If so, explain what they are.
A small but important aspect of text mining and natural language processing is measuring word frequency. This assignment deals with a heavily boiled-down exercise in loading a text file into Python and computing word frequency statistics. It requires usage of text files, strings, and dataframes, so it is heavily encouraged that you take a look at the relevant sessions (14-17) if you have not already done so.

(a) Locate a movie script, play script, poem, or book of your choice in .txt format. You are free to choose nearly any novel, movie script, or play that you like, with the qualification that your chosen document must have a minimum of 5 chapters, scenes, and/or acts that distinguish one portion of the document's narrative from another. For example, the novel "Great Expectations" has 59 chapters, the script for "Jaws" has about 27 scenes, and all or almost all Shakespearean plays have exactly five acts. It is important for part (e) of the assignment that these segments exist in your document. Project Gutenberg is a great resource for this if you're not sure where to start.

(b) Load the words of this structure in sequential order of appearance into a one-dimensional Python list (i.e. the first word should be the first element in the list, while the last word should be the last element) in a case-insensitive way. It's up to you how to deal with special characters: you can remove them manually, ignore them during the loading process, or even count them as words, for example. Make sure you have this list clearly assigned to a variable, so we can evaluate it during grading.

(c) Use your list to create and print a two-column pandas data-frame with the following properties: i. the first column for each index should represent the word in question at that index; ii. the second column should represent the number of times that particular word appears in the text; iii. the rows of the data-frame should be ordered according to the first occurrence of each word; iv.
It's up to you whether or not your data-frame will include an index per row. Make sure you have this data-frame clearly assigned to a variable, so we can evaluate it during grading. Ex: if the first word in your text is "the", which occurs 500 times, and the second is "balcony", which only appears twice, your data-frame should begin like the following:

    word     count
    the      500
    balcony  2

(d) Stop-words are commonly used words in a given language that often fail to communicate useful summative information about its content. The attached stop_words.py file has a simple list of common stop words assigned to a variable. For this part of the assignment, you are to create a modified copy of the data-frame from (c) with the following modifications: i. all stop words have been removed from the data-frame, and ii. the data-frame rows have been sorted in decreasing order of frequency counts. Again, make sure you have this data-frame clearly assigned to a variable, so we can evaluate it during grading.

(e) While total word counts can provide a useful measure of the content of a document, they cannot reveal much about its underlying trends. In the context of document analysis, the term trend implies a direction (in terms of theme, mood, etc.) in which the content changes throughout the narrative. For example, some works of fiction begin with a comedic tone and take on a more serious tone in later stages, or vice versa. For the last part of your assignment, you are going to modify the approach taken in part (d) to address individual segments of the document. More specifically, you are to divide the raw document into partitions according to the chapters, acts, etc. that are present, and then produce a list of data-frames, where each list element is a single data-frame containing word frequencies for a single segment with the same format as the data-frame from part (d) outlined above.
You are free to use whatever means you prefer in splitting the text into chapters and constructing the list of data-frames, but one option is to use regular expressions on the raw document. Once again, you must ensure your list is readily accessible to us in the form of a variable.

You can use .py files, .ipynb files, or a combination of the two in your solution. Zip these file(s) along with a simple README telling me what to run to generate the list and data-frames into a zip file with the name , where LN is your last-name and FN is your first-name, and submit this file to Blackboard.
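Parts (c) and (d) can be sketched on a toy word list as follows; the column names word and count, the toy list, and the one-word stop list are illustrative assumptions (the real assignment uses your loaded text and the list in stop_words.py).

```python
import pandas as pd

words = ["the", "balcony", "the", "door", "the"]  # toy stand-in for the loaded text

counts = pd.Series(words).value_counts()          # frequency of each word
first_order = list(dict.fromkeys(words))          # unique words in first-occurrence order

# part (c): rows ordered by each word's first occurrence
df = pd.DataFrame({"word": first_order,
                   "count": [int(counts[w]) for w in first_order]})

# part (d): drop stop words, then sort by decreasing frequency
stop_words = ["the"]                              # stand-in for stop_words.py's list
df_filtered = (df[~df["word"].isin(stop_words)]
               .sort_values("count", ascending=False)
               .reset_index(drop=True))
```

For part (e), the same construction would be applied once per chapter/act segment, yielding a list of such data-frames.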
For this programming assignment, you will be producing a heavily boiled-down (American) football simulator and play visualization using Python. The process should give you experience with built-in and custom functions in the language, process simulation, and very simple depiction of data. There are four interrelated problems with a cumulative value of 100%, and the optional components for problems 1-3 can be completed for 2-3 extra credit points each (with the max grade not exceeding 106%). For full credit, you must include at least brief documentation for your code, including a very simple README (which should indicate, for example, the optional components you completed and any changes from the norm in your code) and a short (line or two) comment for each function you implement, indicating what it does.

1) (20 pts) In football terminology, a down is the period in which a single play (successful or failed) is executed. You are to write the function down(successprct, yardrange), where successprct is a number between 0-100 indicating the probability of offensive success (i.e. because passes can be incomplete or plays otherwise unsuccessful), and yardrange is a tuple with two values representing the minimum and maximum number of yards gained. Your function should return a number of yards according to the following rules: i. a random number between 1 and 100 is generated; if it exceeds successprct then the play "fails" and 0 is returned; otherwise ii. the number of yards returned is equal to a random number between the min and max according to yardrange. Optional: if you like, you can include "sacks" or "penalties" in your code, presumably with an attached percentage chance of happening, and incorporate them into the down function.

2) (30 pts) A drive represents a series of downs that result in either a touchdown or a turnover (assume no punting or field goal is possible).
You are to write the function drive(yards_to_TD, successprct, yardrange), where yards_to_TD is the number of yards a team must move the ball to achieve a touchdown, and successprct and yardrange are identical in form to their down counterparts. The drive function must do the following: up to four downs will be executed in sequence using the down function above with successprct and yardrange as arguments, with the following details: i. the number of yards generated by the down function will be subtracted from yards_to_TD on each call; ii. if yards_to_TD ever reaches zero or below, the team scores (see below); iii. if four sequential plays take place and the team doesn't score, the ball is turned over to the other team with zero points scored (see below). The output of drive will be a tuple with two elements representing i. points scored and ii. field position for the other team, respectively. If the team scores (yards_to_TD reaches zero or below), then the first element will be 7 (90% chance of the extra point attempt succeeding) or 6 (10% chance of the attempt failing); otherwise the first value will be 0. If the team scores, then the second element will be 80 (reflecting a kick-off/touchback position for the other team); otherwise the second element will be calculated as 100-yards_to_TD (reflecting the other team moving the ball in the opposite direction). Optional: if you like, you can include the notion of "first downs" in your code: in this case, the down counter "resets" to down 1 (first down) whenever ten or more positive yards are gained cumulatively within the four sequential downs.

3) (30 pts) Here you are to create a simple visual depiction of a drive. The function drive_depicted(yards_to_TD, successprct, yardrange) is identical in form and return to the drive function, with the exception that it must also provide visual details of every down within the drive, including the progress of the side on offense (i.e.
how many yards from their own end zone) and the yards remaining to touchdown (i.e. how many yards from the opponent's end zone). The nature of the visualization is flexible, but the simplest approach is an ASCII-like depiction as in the examples below:

Ex 1: Successful drive
O----|---->----|----|----|----|----|----|----|----X 1st Down, 80 Yds to Go
O----|----|----|----|--->|----|----|----|----|----X 2nd Down, 51 Yds to Go
O----|----|----|----|----|----|----|>---|----|----X 3rd Down, 28 Yds to Go
O----|----|----|----|----|----|----|----|----|----T TD Scored! Xtra Pt Made

Ex 2: Failed drive
O----|---->----|----|----|----|----|----|----|----X 1st Down, 80 Yds to Go
O----|----|----|--->|----|----|----|----|----|----X 2nd Down, 61 Yds to Go
O----|----|----|--->|----|----|----|----|----|----X 3rd Down, 61 Yds to Go
O----|----|----|----|----|----|>---|----|----|----X 4th Down, 38 Yds to Go
O----|----|----|----|----|----|----|---Q|----|----X Turnover, 21 Yds to Go

Note that in the above examples yards-to-go is being rounded to the closest even yard for depiction purposes; this is not essential, but may make the visualization a bit less clunky in practice.

Optional: If you like, you can use any of the Python figure/image/animation libraries to make a more sophisticated visualization of drives. But be aware that this can become a very complicated and difficult effort if you don't have previous experience!

4) (20 pts) Your last step is to create a football game simulation. While normal football is based on quarters and time limits, you can assume a game has a fixed number of alternating drives between the two teams, and each team has a different success-rate/yard-gain as input to the drive function. More specifically, you are to write the function simulategame(num_drives, prctT1, yrangeT1, prctT2, yrangeT2), where num_drives is the total number of drives played in the game by each team, prctT1 and yrangeT1 correspond to the successprct and yardrange for the first team (T1), and prctT2 and yrangeT2 correspond to the successprct and yardrange for the second team (T2). The function will do the following: i.
initialize scores for both teams at 0 and yards_to_TD at 80, ii. call the drive function with yards_to_TD and team 1's successprct/yardrange values, iii. increment T1's score according to drive's first return value and adjust yards_to_TD according to the second (see problem 2 above), iv. call the drive function for team 2 using its parameters, v. adjust T2's score if relevant (and yards_to_TD from the second return value), and vi. repeat steps ii-v num_drives times in total. Your return should be a 2-element tuple, with the first element being T1's final score and the second being T2's final score.

It is highly advised that you test that your functions work correctly by calling them from an outside script.

Your submission to Blackboard should be a single .zip file with the name <lastname>_<firstname>_1.zip, where <lastname> and <firstname> are your last name and first name respectively. As outlined above, the file should include your Jupyter notebook and/or Python file(s), plus a README file for documentation and to guide execution.
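The down/drive/simulategame pipeline described above can be sketched roughly as follows. This is one possible structure, not the required one; any names or details beyond the spec (e.g. the params list) are assumptions:

```python
import random

def down(successprct, yardrange):
    """Simulate one down: return yards gained (0 on a failed play)."""
    if random.randint(1, 100) > successprct:  # play fails
        return 0
    lo, hi = yardrange
    return random.randint(lo, hi)

def drive(yards_to_TD, successprct, yardrange):
    """Run up to four downs; return (points_scored, opponent_field_position)."""
    for _ in range(4):
        yards_to_TD -= down(successprct, yardrange)
        if yards_to_TD <= 0:  # touchdown
            points = 7 if random.randint(1, 100) <= 90 else 6  # 90% extra point
            return (points, 80)  # kick-off/touchback for the other team
    return (0, 100 - yards_to_TD)  # turnover on downs

def simulategame(num_drives, prctT1, yrangeT1, prctT2, yrangeT2):
    """Alternate num_drives drives per team; return (T1_score, T2_score)."""
    scores = [0, 0]
    yards_to_TD = 80
    params = [(prctT1, yrangeT1), (prctT2, yrangeT2)]
    for _ in range(num_drives):
        for team in (0, 1):
            pts, yards_to_TD = drive(yards_to_TD, *params[team])
            scores[team] += pts
    return tuple(scores)
```

A quick sanity check from an outside script: simulategame(3, 100, (80, 80), 0, (1, 5)) should give team 1 at least 18 points (a score on every drive) and team 2 exactly 0.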
Data Description: The four attached json files (savedtweets_americalatina.json, savedtweets_machinelearning.json, savedtweets_superleague.json, savedtweets_weibo.json) represent four separate classes of 100 tweets collected using a search query matching the appropriate suffix. For example, savedtweets_americalatina.json has 100 tweets with the query "América Latina." Each tweet has up to seven characteristics (stored as key-value pairs): screen_name, text, location, lang, retweet_count, latitude*, and longitude*.

* Many tweets are missing these characteristics: see instructions below.

Instructions (six parts in total):

Part 1. Load each json file into Python (obtaining a list of dictionaries for each) and perform the following: a. discard any tweets that lack latitude (those without latitude will also lack longitude, and vice-versa), b. use the tweet-preprocessor package to clean the text for each tweet using all available (default) options. For each collection, save the modified list of tweets back into a new json file with the name prep_tweets_class#.json, where # matches the order of the json files cited above (0 = americalatina, 1 = machinelearning, 2 = superleague, 3 = weibo). You should have files prep_tweets_class0.json, prep_tweets_class1.json, prep_tweets_class2.json, and prep_tweets_class3.json at the end of the process.

Part 2. For each modified collection of tweets (i.e. after the transformation from Part 1), calculate the number of tweets with positive, negative, and neutral sentiment and depict these on a simple bar plot. You should have 3 bars per plot (one for positive, one for negative, one for neutral) and 4 plots total (one per tweet query class).

Part 3. Pool together all modified tweets into a single list, but maintain a combined secondary list of equal size that dictates the class (0, 1, 2, or 3) to which each tweet belongs.
Ex: If there are 44 América Latina tweets at the beginning of the pooled list of tweets, the first 44 elements of the secondary list should be 0.

Part 4. Assume your combined lists each have a length of n. Your next goal is to construct an n x 5 numpy feature array suited for machine learning, where each row matches the corresponding index in your lists, and the 5 columns represent the features for the tweet at that position as follows: Feature 1: the length of the tweet's text. Feature 2: the tweet's retweet count. Feature 3: the tweet's latitude. Feature 4: the tweet's longitude. Feature 5: one of two values: 0 if the tweet is in English, or 100 otherwise. For example, the first row in your feature array may look like the below: [80., 1., 46.2380576, 6.15323095, 100.]

Part 5. Convert your secondary list of classes into an array, and then perform 10-fold cross-validation using three distinct classification estimators (either the ones we used in class, or those of your own choosing) to determine the accuracy available in using the features from Part 4 to predict the class of tweets.

Part 6. Use the t-SNE estimator to compress your features into 2 dimensions, and visualize the tweets on a scatter plot with 4 different colors for the 4 different classes. Briefly comment (inline code comments are fine) on where you see distinct clusters of classes on the plot, and where you do not see any distinction.
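The Part 4 feature layout can be sketched as follows. The build_features helper and the two sample tweet dictionaries here are hypothetical; the real rows come from the cleaned tweets of Part 1:

```python
import numpy as np

def build_features(tweets):
    """Return an n x 5 float array: [text length, retweets, lat, lon, lang flag]."""
    rows = []
    for t in tweets:
        rows.append([
            len(t["text"]),                        # Feature 1: length of the text
            t["retweet_count"],                    # Feature 2: retweet count
            t["latitude"],                         # Feature 3: latitude
            t["longitude"],                        # Feature 4: longitude
            0.0 if t["lang"] == "en" else 100.0,   # Feature 5: English vs. not
        ])
    return np.array(rows)

# Two made-up tweet dictionaries standing in for the cleaned data:
tweets = [
    {"text": "x" * 80, "retweet_count": 1, "latitude": 46.2380576,
     "longitude": 6.15323095, "lang": "fr"},
    {"text": "hi", "retweet_count": 0, "latitude": 0.0,
     "longitude": 0.0, "lang": "en"},
]
X = build_features(tweets)
# X[0] matches the example row in the text: [80., 1., 46.2380576, 6.15323095, 100.]
```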
Objectives:
1. Apply K-Means and Agglomerative clustering algorithms to real data
2. Analyze and optimize the parameters of each clustering method
3. Analyze and compare the clustering results

K-Means clustering:
a) Use the K-Means algorithm to cluster the provided data. Vary the number of clusters from 2 to 20 and select the optimal number. Justify your choice based on the SSE vs. number-of-clusters plot.
b) Using the number of clusters selected in (a), generate the silhouette plot.
c) Using the silhouette coefficients, identify 5 samples that are at the core of each cluster and 2 samples that are at the boundary of any two clusters (if they exist). Display the original images associated with these samples and comment on the results.

Agglomerative clustering:
a) Use the hierarchical agglomerative algorithm, with Ward's method to compute the distance between two clusters, to cluster the provided data. Generate the dendrogram and use it to identify the optimal number of clusters. Justify your choice.
b) Using the number of clusters selected in (a), generate the silhouette plot.
c) Repeat (a) and (b) using single-link and complete-link. Compare the silhouette plots of the 3 methods and identify the best distance for this data. Justify your choice.
d) Using the silhouette coefficients of the best method identified in (c), identify 5 samples that are at the core of each cluster and 2 samples that are at the boundary of any two clusters (if they exist). Display the original images associated with these samples and comment on the results.

(15 points) For each clustering method (K-Means, Agglomerative), compute the adjusted Rand index (ARI) by comparing the generated clusters to the provided ground truth (this should be the only time you use the ground truth). Using these ARIs and the visualizations generated for each problem, identify the best clustering method for this application. Justify your choice.

What to submit?
• A report that
o Describes your experiments, the parameters considered for each method, etc.
o Summarizes, explains (using concepts covered in lectures) and compares the results (using plots, tables, figures)
• Do not submit your source code
• Your report needs to be a single file (MS Word or PDF)
• Your report cannot exceed 10 pages using a font size of 12
• Assign numbers to all your figures/tables/plots and use these numbers to reference them in your discussion
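A minimal sketch of the SSE-vs-k sweep from K-Means part (a), assuming scikit-learn is used. The synthetic three-blob data here stands in for the provided image dataset:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 2))
               for c in [(0, 0), (4, 4), (0, 4)]])

# Vary k from 2 to 20 and record SSE (KMeans exposes it as inertia_:
# the sum of squared distances of samples to their closest centroid).
sse = {}
for k in range(2, 21):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = km.inertia_

# Plot sse.keys() vs sse.values() and look for the "elbow" where the
# curve flattens; for this stand-in data that should occur at k = 3.
```

The silhouette plots of parts (b) and (c) can be built the same way from sklearn.metrics.silhouette_samples on the fitted labels.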
Objectives:
1. Apply Kernel SVM and MLP classification algorithms to the fashion-MNIST dataset
2. Use k-fold cross-validation to identify the best way to rescale and preprocess the data
3. Use k-fold cross-validation to identify the parameters that optimize performance (generalization) for each method
4. Compare the accuracy and identify correlation between the outputs of the two methods

For this homework, you will apply the following classification methods to the fashion-MNIST classification data:
1. Kernel Support Vector Machines
2. Multilayer Perceptrons

• Apply 4-fold cross-validation to the provided training data subset to train your classifiers and identify their optimal parameters. In addition to the classifiers' parameters (e.g. regularization, kernel, number of layers/nodes, learning rate, etc.), you should also consider the following 4 ways to preprocess and rescale the data: a) no preprocessing, b) StandardScaler, c) RobustScaler, d) MinMaxScaler.
• After fixing the classifiers' parameters, apply each method to the provided testing data subset to predict and analyze your results. Compare the accuracy obtained during training (average of the cross-validation folds) to that of the test data and comment on the results (overfitting, underfitting, etc.)
• Analyze the correlation between the outputs of the 2 classifiers by displaying the predict_proba of the SVM vs. the predict_proba of the MLP (using the test data). Using these scatter plots (one per class), identify (if present) the following 3 groups:
• G-1: Samples that are easy to classify correctly by the SVM, but hard to classify by the MLP
• G-2: Samples that are easy to classify correctly by the MLP, but hard to classify by the SVM
• G-3: Samples that are hard to classify correctly by both methods
For each group, display a few samples (as images) and identify any common features among them.

What to submit?
• A report that
o Describes your experiments, the parameters considered for each method, etc.
o Summarizes, explains (using concepts covered in lectures) and compares the results (using plots, tables, figures)
• Do not submit your source code
• Your report needs to be a single file (MS Word or PDF)
• Your report cannot exceed 10 pages using a font size of 12
• Assign numbers to all your figures/tables/plots and use these numbers to reference them in your discussion
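One way to search over both the preprocessing choice and the classifier parameters in a single 4-fold cross-validation, assuming scikit-learn. This is a sketch only: load_digits stands in for fashion-MNIST, the grid values are illustrative, and an analogous grid would be built for MLPClassifier:

```python
from sklearn.datasets import load_digits        # stand-in for fashion-MNIST
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X, y = X[:400], y[:400]                         # small slice to keep the sketch fast

# The "scale" step itself is a searchable hyperparameter; "passthrough"
# corresponds to the "no preprocessing" option.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
param_grid = {
    "scale": ["passthrough", StandardScaler(), RobustScaler(), MinMaxScaler()],
    "clf__C": [1, 10],                          # regularization
    "clf__kernel": ["rbf", "linear"],           # kernel choice
}
search = GridSearchCV(pipe, param_grid, cv=4).fit(X, y)
# search.best_params_ holds the chosen scaler and SVC parameters;
# search.best_score_ is the mean 4-fold accuracy for that combination.
```

After fixing the parameters, refit on the full training subset and compare search.best_score_ against the test-set accuracy as asked above.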
Objectives:
1. Apply various classification algorithms to the movie reviews dataset
2. Use k-fold cross-validation to identify the parameters that optimize performance (generalization) for each method
3. Compare the accuracy and explainability of each method

Problem #1
For this homework, you will apply the following classification methods to the movie reviews classification data (available in Blackboard):
1. Multinomial Naïve Bayes
2. Random Forest
3. Gradient Boosted Regression Trees

• Apply 4-fold cross-validation to the provided training data subset to train your classifiers and identify their optimal parameters.
• After fixing the classifiers' parameters, apply each method to the provided testing data subset to predict and analyze your results. Compare the accuracy obtained during training (average of the cross-validation folds) to that of the test data and comment on the results (overfitting, underfitting, etc.)
• Analyze the results of each method by inspecting the feature importance (if applicable) and a few misclassified samples.
• Select the best algorithm and justify your choice based on accuracy, explainability, time required to train/test, etc.

What to submit?
• A report that
o Describes your experiments,
o Summarizes, explains (using concepts covered in lectures) and compares the results (using plots, tables, figures)
o Identifies the best method for each dataset.
• Do not submit your source code
• Do not submit raw output generated by your code!
• Your report needs to be a single file (MS Word or PDF)
• Your report cannot exceed 10 pages using a font size of 12
• Assign numbers to all your figures/tables/plots and use these numbers to reference them in your discussion
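A minimal sketch of the Naive Bayes branch, assuming scikit-learn and a tiny made-up review set (the real data comes from Blackboard). The vectorizer step is needed because these estimators expect numeric features, not raw text:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set: 1 = positive review, 0 = negative review.
train_texts = ["a wonderful heartfelt film", "great acting and a great plot",
               "a dull boring mess", "terrible pacing and awful dialogue"]
train_labels = [1, 1, 0, 0]

# Bag-of-words features feeding a Multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)
pred = model.predict(["great film", "awful boring plot"])
```

For the other two methods, swap the final estimator for RandomForestClassifier or GradientBoostingClassifier; the tree-based models expose feature_importances_, which supports the feature-importance inspection asked for above.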
Objectives

Problem #1
For this problem, you will use the Wine Quality database (posted in Blackboard). Use the provided training data subset to train your model and the testing subset to predict and analyze your results.

What to submit?
1. Build and analyze simple classification algorithms based on KNN and linear models
2. Use k-fold cross-validation (k=5) to identify the parameters that optimize performance (generalization) for each method
3. Identify cases of underfitting and overfitting
4. Select parameters that optimize performance (generalization)
5. Compare the accuracy and explainability of each method

Problem #1
For this homework, you will apply the following classification methods to the SPAM e-mail data (available in Blackboard):
a) KNN binary classifier. Vary the parameter K
b) Logistic Regression classifier. Vary the regularization parameter C
c) Linear Support Vector Machines classifier. Vary the regularization parameter C

• Apply 5-fold cross-validation to the provided training data to train your classifiers and identify their optimal parameters.
• After fixing the classifiers' parameters, apply each method to the provided testing data to predict and analyze your results. Compare the accuracy obtained during training (average of the cross-validation folds) to that of the test data and comment on the results (overfitting, underfitting, etc.)
• Analyze the results of each method by inspecting the feature importance (if applicable) and a few misclassified samples.
• Select the best algorithm and justify your choice based on accuracy, explainability, time required to train/test, etc.

What to submit?
• A report that
o Describes your experiments,
o Summarizes, explains (using concepts covered in lectures) and compares the results (using plots, tables, figures)
o Identifies the best method for each dataset.
• Do not submit your source code
• Do not submit raw output generated by your code!
• Your report needs to be a single file (MS Word or PDF)
• Your report cannot exceed 10 pages using a font size of 12
• Assign numbers to all your figures/tables/plots and use these numbers to reference them in your discussion
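The three 5-fold parameter sweeps can be sketched as follows, assuming scikit-learn; make_classification stands in for the SPAM data and the candidate K/C values are illustrative only:

```python
from sklearn.datasets import make_classification  # stand-in for the SPAM data
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Mean 5-fold CV accuracy for each candidate parameter value.
knn_scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=5).mean()
              for k in (1, 3, 5, 9, 15)}
logreg_scores = {C: cross_val_score(LogisticRegression(C=C, max_iter=1000),
                                    X, y, cv=5).mean()
                 for C in (0.01, 0.1, 1, 10)}
svc_scores = {C: cross_val_score(LinearSVC(C=C, max_iter=5000),
                                 X, y, cv=5).mean()
              for C in (0.01, 0.1, 1, 10)}

# Pick the parameter with the highest mean CV accuracy for each method,
# then refit on the full training set and evaluate on the held-out test data.
best_k = max(knn_scores, key=knn_scores.get)
```

Plotting each score dictionary against its parameter values also makes the underfitting/overfitting regimes (very small K or very large C, etc.) easy to point at in the report.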