
[SOLVED] ITI 1120 Lab #3: Boolean expressions, if statements, debugging, and …

What is in this lab?
1. This lab has 3 tasks (the 3rd task has several parts). You should try to do these parts at home. Tasks 2 and 3 are in the interactive textbook. Watch the video I provided (follow the link on page 7) before doing these two tasks.
2. And 5 programming exercises (0 to 4).
The slides that are included here but do not explain either the tasks or the programming exercises are there as a reminder of some relevant material that is needed for this lab.

Lab 3 overview
• Open a browser and log into Brightspace.
• On the left-hand side under the Labs tab, find the lab 3 material contained in either the lab3-students.zip or lab3-students.pdf file.
• Download that file to the Desktop and open it.
• Read the slides one by one and follow the instructions explaining what to do. When stuck, ask a TA for help. Ideally you should try to complete some, or all, of this at home and then use the lab as a place where you can get help with things you had difficulties with.

Before starting, always make sure you are running Python 3
This slide is applicable to all labs, exercises, assignments, etc. ALWAYS MAKE SURE FIRST that you are running Python 3. That is, when you click on IDLE (or start Python any other way), look at the first line that the Python shell displays. It should say Python 3 (and then some extra digits). If you do not know how to do this, read the material provided with Lab 1. It explains it step by step.

div (//) and mod (%) operators in Python
If uncertain, here is how to compute a//b (i.e. integer division) and a%b (i.e. the remainder):
1. Compute x = a/b.
2. a//b is then equal to the whole (i.e. integer) part of x. More precisely, a//b is the integer that is closest to a/b but not bigger than a/b.
3. a%b is equal to a - (a//b) * b.

Task 1
What is the type and value of each of the following expressions in Python? Do this in your head (and/or on paper) first. Then check both columns in the Python shell. E.g., you can test what kind of value the first expression returns by typing type(13*0.1) in the Python shell. (The first row is done for you; fill in the rest.)

Expression      Type    Value
13 * 0.1        float   1.3
int(13) * 0.1
13 * int(0.1)
int(13 * 0.1)
13 % 7
7 % 13
6 % 3
0 % 15
6 // 2.5
6 % 2.5

String comparison examples in the shell:
>>> 'A' == 'A'
True
>>> 'Anna' == 'Anna'
True
>>> 'Anna' == 'anna'
False
>>> a = 'June'
>>> a == 'june'
False
>>> a == 'June'
True
>>> b = 'Ju' + 'ne'
>>> a == b
True
P.S. Do not copy-paste the above into the Python shell. It will likely give you syntax errors, since quotes do not copy/paste correctly from Word.

Examples of compound boolean expressions:
• This is how you would test if age is at least 18 and at most 65: age >= 18 and age <= 65
• If age refers to 15, then age > 16 evaluates to False, and not(age > 16) evaluates to True.
• Suppose day refers to a string which is a day of the week. Here is how you would test if day is a weekend: day == "Saturday" or day == "Sunday"
• Here are two ways to test if age is less than 18 or greater than 65. Think about the 2nd one:
– 1st way: age < 18 or age > 65
– 2nd way: not(age >= 18 and age <= 65)

Programming exercises: Question 1
Write a function called pay that takes an hourly wage and the number of hours worked in a week and returns the weekly pay.
Example tests:
>>> pay(10, 35)
350
>>> pay(10, 45)
475.0
>>> pay(10, 61)
720.0
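The full statement of Question 1 is cut off above, so the following is only a sketch consistent with the three example runs: it assumes regular pay up to 40 hours, time-and-a-half for hours 40 to 60, and double time beyond 60. These rates are inferred from the outputs, not taken from the question text.

def pay(wage, hours):
    # regular time up to 40 hours (inferred from pay(10, 35) == 350)
    if hours <= 40:
        return wage * hours
    # overtime at 1.5x for hours 40-60 (inferred from pay(10, 45) == 475.0)
    elif hours <= 60:
        return 40 * wage + (hours - 40) * 1.5 * wage
    # double time past 60 hours (inferred from pay(10, 61) == 720.0)
    else:
        return 40 * wage + 20 * 1.5 * wage + (hours - 60) * 2 * wage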
Programming exercises: Question 2
Rock, Paper, Scissors is a two-player game in which each player chooses one of three items. If both players choose the same item, the game is tied. Otherwise, the rules that determine the winner are: (a) Rock always beats Scissors (Rock crushes Scissors); (b) Scissors always beat Paper (Scissors cut Paper); (c) Paper always beats Rock (Paper covers Rock). Write a function called rps that takes the choice 'R', 'P', or 'S' of player 1 and the choice of player 2, and returns -1 if player 1 wins, 1 if player 2 wins, or 0 if there is a tie. Note that I did not give you either the number of parameters or the names of the parameters. You will have to figure that out on your own. Looking at the example runs below should help too.
Example tests:
>>> rps('R', 'P')
1
>>> rps('R', 'S')
-1
>>> rps('S', 'S')
0

Programming exercises: Question 3a
Open a new file with IDLE. Write a program that has a function called is_divisible.
• The function is_divisible has two input parameters that are integers n and m, and returns True if n is divisible by m and False otherwise.
• Outside of that function, your program should interact with the user to get two integers. To determine if the 1st is divisible by the 2nd, it should call the is_divisible function. It should print a message explaining the result.
Two example tests:
>>>
Enter 1st integer: 9
Enter 2nd integer: 3
9 is divisible by 3
>>>
Enter 1st integer: 8
Enter 2nd integer: 3
8 is not divisible by 3

Programming exercises: Question 3b
Open a new file with IDLE. Write a program that has two functions, one called is_divisible and the other called is_divisible23n8.
• The function is_divisible is the same as in the previous question, so you can copy/paste it to the beginning of the new file.
• The function is_divisible23n8 has one input parameter, an integer. It should return the string "yes" if the given number is divisible by 2 or 3 but not 8. Otherwise it should return the string "no". Your function is_divisible23n8 must use, i.e. make a call to, is_divisible.
• Outside of that function, your program should interact with the user to get one integer. It should call the is_divisible23n8 function to determine if the number the user gave is divisible by 2 or 3 but not 8. It should print a message explaining the result. (A sketch is given after the bonus link below.)
>>>
Enter an integer: 18
18 is divisible by 2 or 3 but not 8
>>>
Enter an integer: 16
It is not true that 16 is divisible by 2 or 3 but not 8
>>>
Enter an integer: 3
3 is divisible by 2 or 3 but not 8

Bonus programming exercises: For those who are done and want more programming exercises for the lab or for home, follow this link and complete any, or ideally all, exercises there: https://runestone.academy/ns/books/published/thinkcspy/Selection/Exercises.html
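Here is a minimal sketch of Question 3b, matching the message wording in the example runs above; the exact prompt strings are otherwise up to you.

def is_divisible(n, m):
    # True if n is divisible by m, i.e. the remainder is zero
    return n % m == 0

def is_divisible23n8(n):
    # "yes" if n is divisible by 2 or 3 but not by 8; must call is_divisible
    if (is_divisible(n, 2) or is_divisible(n, 3)) and not is_divisible(n, 8):
        return "yes"
    return "no"

num = int(input("Enter an integer: "))
if is_divisible23n8(num) == "yes":
    print(num, "is divisible by 2 or 3 but not 8")
else:
    print("It is not true that", num, "is divisible by 2 or 3 but not 8")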


[SOLVED] ITI 1120 Lab #2: Numbers, expressions, assignment statements, functions … bit of strings

What is in this lab?
The objective of this lab is to get familiar with Python's expressions, function calls, assignment statements and function design via:
1. 2 tasks (each with a number of questions). You should try to do these tasks at home on paper.
2. And 4 programming exercises.
The slides that are included here but do not display either the tasks or the programming exercises are there for you as a reminder of some relevant material that is needed for this lab.

Starting Lab 2
• Open a browser and log into Brightspace.
• On the left-hand side under the Labs tab, find the lab 2 material contained in the lab2-students.zip file.
• Download that file and unzip it.

Before starting, always make sure you are running Python 3
This slide is applicable to all labs, exercises, assignments, etc. ALWAYS MAKE SURE FIRST that you are running Python 3. That is, when you click on IDLE (or start Python any other way), look at the first line that the Python shell displays. It should say Python 3. If you do not know how to do this, read the material provided with Lab 1. It explains it step by step.

div (//) and mod (%) operators in Python
// is called the div operator in Python. It computes integer division. % is called the mod operator in Python. It computes the remainder of integer division. If uncertain, here is how to compute a//b and a%b:
1. First compute x = a/b.
2. a//b is then equal to the whole (i.e. integer) part of x. More precisely, a//b is the integer that is closest to a/b but not bigger than a/b.
3. a%b is equal to a - (a//b) * b.

Task 1
• Open the pdf file called Task1-lab2.pdf.
• Read the instructions and complete all the exercises.
Note: If you have not printed this document beforehand or do not have a tablet with a pen, just take a piece of blank paper and write your answers on that paper.

Task 2
• Go to the Coursera webpage and log in.
• Go to this link: https://www.coursera.org/learn/learn-to-program/home/welcome
• Go to Week 1, Assessments, and complete Quiz 1 (online).
If you have issues signing up for free to this Coursera course, the steps below helped resolve it for most students (as shared by TA David Worley):
1. Signing up with a personal email instead of a school email
2. Signing up with email/password instead of logging in through Google
3. Trying on a different browser/device
Some students had no issues using their uOttawa email, but for students who did have issues, the above seemed to help in most cases. If you still have problems, see the next page.

Task 2 (in case Coursera does not work)
• Only do this if the Coursera web page is giving you difficulties. The quiz in the following file, Task2-lab2.pdf, is the same as the one on the Coursera webpage.
• Open the pdf file called Task2-lab2.pdf.
• Read the instructions and complete all the questions.

Strings
In addition to integer, float (i.e. number) and boolean objects, Python has string objects. (For now, think of objects as just values.)
• A string is a sequence of characters between single quotes, double quotes or triple quotes: 'This is a string'. Note that these are also strings: " " is a string comprised of one blank space, and '257' is a string, unlike 257, which is an integer.
• Strings can be assigned to variables. Examples: s1 = 'Harry' and s2 = "Potter"
• There are many operations that can be applied to strings. For example, when the + operator is applied to two strings, the result is a string that is the concatenation of the two.
For example, s1+s2 would result in the string 'HarryPotter'. Note that "The year is " + 2525 would cause an error, since the + operator can be applied to two numbers or two strings but not to a mix of the two. This, however, is a valid expression: "The year is " + "2025". Python also has a * operator for strings. It can be applied to a string and an integer. E.g., 4 * "la" gives 'lalalala'.

Programming Exercises
Pretend that the following 4 programming questions are your Assignment 1. Write all your solutions to the following 4 questions in one file called lab2_prog_solutions.py. You will be instructed to do a similar thing in your Assignment 1.
IMPORTANT NOTE for this LAB and the ASSIGNMENT(s): If a question specifies the function name and the names of its parameters, then that same function name and those parameter names must be used when programming your functions. That will be the case in every question in your Assignment 1. For example, in the question on the next page, your function definition MUST start with:
def repeater(s1, s2, n):
as that is specified as a part of the question.

Programming exercise 1
Write a function called repeater(s1, s2, n) that, given two strings s1 and s2 and an integer n, returns a string that starts with an underscore, then s1 and s2 alternating n times, then ends with an underscore. (For those who know loops: you may not use loops to solve this question.)

Programming exercise 2
Read the first paragraph of this page on the quadratic equation and finding its roots (i.e. solutions): https://en.wikipedia.org/wiki/Quadratic_equation
Write a function called roots(a, b, c) that, given three coefficients a, b and c, prints a nicely formatted message displaying the equation and its two roots (the two roots may be the same number). You may assume that a is a non-zero number, and that a, b and c are such that b^2 - 4ac is a positive number. (Do you know why we are making this assumption?)

Programming exercise 3
Think back on the previous question. Write a function called real_roots(a, b, c) that returns True if the quadratic equation with the given three coefficients a, b and c has real roots. Otherwise it returns False. Recall that the roots of a quadratic equation are real if and only if b^2 - 4ac is a non-negative number. (Do not use if statements or loops.)

Programming exercise 4
Write a function called reverse(x) that, given a two-digit positive integer x, returns the number with reversed digits. (You may assume that x is a two-digit positive integer.) (Do not use if statements or loops.) Hints: think of the mod and div operators and how they can help. What number should you div x by to get the 1st digit? (A sketch follows.)
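As a sketch of Programming exercise 4, the hints lead directly to a one-line solution: for a two-digit number, x // 10 is the first digit and x % 10 is the second.

def reverse(x):
    # swap the digits of a two-digit positive integer, e.g. 25 -> 52
    return (x % 10) * 10 + x // 10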


[SOLVED] Assignment 6: the goal of this assignment is to learn and practice sets, dictionaries, objects and recursion.

The goal of this assignment is to learn and practice sets, dictionaries, objects and recursion. This assignment has three parts: the first about dictionaries and sets (worth 40 points), the second about objects (worth 60 points), and the third about recursion (worth 15 points).

Put all the required documents into a folder called a6_xxxxxx, where you change xxxxxx to your student number, zip that folder and submit it as explained in Lab 1. In particular, the folder should have the following files: a6_part1_xxxxxx.py, a6_part2_xxxxxx.py, a6_part2_testing_xxxxxx.txt, a6_part3_xxxxxx.py and references-YOUR-FULL-NAME.txt. As always, you can make multiple submissions, but only the last submission before the deadline will be graded.

As always, each of your programs must run without syntax errors. In particular, when grading your assignment, TAs will first open your file a6_part1_xxxxxx.py with IDLE and press Run Module. The same will be done with a6_part2_xxxxxx.py and a6_part3_xxxxxx.py. If pressing Run Module causes any syntax error, the grade for that part becomes zero. Furthermore, for each function whose code is missing, I have provided below one or more tests to test your functions with. To obtain a partial mark for these functions, your solutions do not necessarily have to give the correct answer on these tests. But if your function gives any kind of Python error when run on the tests provided, that function will be marked with zero points. Finally, each function has to be documented with docstrings. Using global variables inside of functions is not allowed. Using either keyword break or continue is not allowed.

About the references-YOUR-FULL-NAME.txt file: The file must be a plain text file. The file must contain references to any code you used that you did not write yourself, including any code you got from a friend, the internet, AI engines like chatGPT, social media/forums (including Stack Overflow and Discord) or any other source or person. The only exclusion from that rule is the code that we did in class, the code done as part of the lab work, or the code in your textbook. So here is what needs to be written in that file. For every question where you used code from somebody else:
• Write the question number.
• Copy-paste all parts of the code that were written by somebody else. That includes the code you found/were given and then slightly modified.
• Source of the copied code: the name of the person or the place on the internet/in a book where you found it.
While you may not get points for copied parts of the question, you will not be in the position of being accused of plagiarism. Any student caught in plagiarism will receive zero for the whole assignment and will be reported to the dean. Showing/giving any part of your assignment code to a friend also constitutes plagiarism and the same penalties will apply. If you have nothing to declare/reference, then just write a sentence stating that and put your first and last name under that sentence in your references-YOUR-FULL-NAME.txt file. Not including the references-YOUR-FULL-NAME.txt file will be taken as you declaring that all the code in the assignment was written by you. Recall, though, that not submitting that file comes with a grade penalty.

For part 1, I provided you with starter code in a file called a6_part1_xxxxxx.py. Begin by replacing xxxxxx in the file name with your student number. Then open the file. Your solution (code) for the assignment must go into that file in the clearly indicated spaces. The file has a part of the main precoded for you.
It also has some functions completely precoded for you. Your task will be to code the remaining functions. You are not allowed to delete or comment out any parts of the provided code. The only exception to that rule is the keyword pass. Some functions have that keyword. You can remove it once you are done coding that function. You also must follow the instructions given in comments and implied by docstrings. You are, however, allowed to add your own additional (helper) functions. In fact, you must add at least THREE more functions. If you are running out of ideas, here are the names of some of the extra functions my solution has: remove_punctuation(words), process_lines(ls), make_dict(lsw), is_valid(D,query), is_word(word). I have provided 5 text files to test and debug your code with, as explained in the next section.

Now to the problem. For this part, you will write a Python program that solves the co-existence problem. The co-existence problem is stated as follows. We have a file containing English sentences, one sentence per line. Given a list of query words, your program should output the line numbers of the lines that contain all those words. While there are many ways to do this, the most efficient way is to use sets and dictionaries.

Here is one example. Assume that the following is the content of the file. Line numbers are included for clarity; the actual file doesn't have the line numbers.
1. Try not to become a man of success, but rather try to become a man of value.
2. Look deep into nature, and then you will understand everything better.
3. The true sign of intelligence is not knowledge but imagination.
4. We cannot solve our problems with the same thinking we used when we created them.
5. Weakness of attitude becomes weakness of character.
6. You can't blame gravity for falling in love.
7. The difference between stupidity and genius is that genius has its limits.
(These are attributed to Albert Einstein.)

If we are asked to find all the lines that contain this set of words: {"true", "knowledge", "imagination"}, the answer will be line 3, because all three words appear in line 3. If they appear in more than one line, your program should report all of them. For example, the co-existence of {"the", "is"} will be lines 3 and 7.

IMPORTANT: You should download a text file version of the book War and Peace from here: https://www.dropbox.com/s/pg4p9snzv60rp5v/WarAndPiece.txt?dl=0
Download it and save it in the same directory as your program. Your solution should be near-instantaneous on that book, i.e. your program should produce the required dictionary in 1 or 2 seconds and it should answer questions about any co-existence instantaneously.

Python Implementation: You need to implement the following functions:

1) open_file()
The open_file function will prompt the user for a file name and try to open that file. If the file exists, it will return the file object; otherwise it will re-prompt until it can successfully open the file. This feature must be implemented using a while loop and a try-except clause. A sketch follows.
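A minimal sketch of open_file using the required while loop and try-except; the prompt and error message follow the test runs shown later.

def open_file():
    """Prompt for a file name until the file opens; return the file object."""
    while True:
        name = input("Enter the name of the file: ")
        try:
            return open(name)
        except OSError:
            print("There is no file with that name. Try again.")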
2) read_file(fp)
This function has one parameter: a file object (such as the one returned by the open_file() function). This function will read the contents of that file line by line, process them and store them in a dictionary. The dictionary is returned. Consider the following string pre-processing:
1. Make everything lowercase.
2. Split the line into words.
3. Remove all punctuation, such as ",", ".", "!", etc.
4. Remove apostrophes and hyphens, e.g. transform "can't" into "cant" and "first-born" into "firstborn".
5. Remove the words that are not all alphabetic characters (do not remove "cant", because you have transformed it to "cant"; similarly for "firstborn").
6. Remove the words with fewer than 2 characters, like "a".
Hints for the string pre-processing mentioned above: To find punctuation for removal, you can import the string module and use string.punctuation, which has all the punctuation. To check for words with only alphabetic characters, use the isalpha() method.
Furthermore, after pre-processing, you add the words into a dictionary with the key being the word and the value being a set of line numbers where the word has appeared. For example, after processing the first line, your dictionary should look like:
{'try': {1}, 'not': {1}, 'to': {1}, 'become': {1}, 'man': {1}, 'of': {1}, 'success': {1}, 'but': {1}, 'rather': {1}, 'value': {1}}
This should be repeated for all the lines; new keys are added to the dictionary, and if a key already exists, its value is updated. At the end of processing all these 7 lines, the value in the dictionary associated with key "the" will be the set {3, 4, 7}. (Note: the line numbers start from 1.)

3) find_coexistance(D, query)
The first parameter is the dictionary returned by read_file; the second one is a string called query. The query contains zero or more words separated by white space. You need to split them into a list of words and find the line numbers for each word. To do that, use the intersection or union operation on the sets from D (you need to figure out which operation is appropriate). Then convert the resulting set to a sorted list, and return the sorted list. (Hint: for the first word simply grab the set from D; for subsequent words use the appropriate set operation: intersection or union.)

4) main
The main part of the program should call the three functions above. Loop, prompting the user to enter space-separated words. Use that input to find the co-existence and print the results. Continue prompting for input until "q" or "Q" is entered.

Very important consideration: every time you want to look up a key in a dictionary, you first need to make sure that the key exists. Otherwise it will result in an error. So, always use an if statement before looking up a key:
if key in data_dict:
    ## the key exists in the dictionary, so it is safe to use data_dict[key]
After you complete the program, see how it works for the two files provided: einstein.txt and gettysburg.txt. (A sketch of find_coexistance follows.)
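A sketch of find_coexistance, assuming D maps each word to the set of line numbers where it occurs (as built by read_file). Whether the "Word ... not in the file." message is printed here or in the main is a design choice; the starter code's docstrings are authoritative.

def find_coexistance(D, query):
    """Return a sorted list of the line numbers on which every query word appears."""
    result = None
    for word in query.lower().split():
        if word not in D:           # guard the lookup, as advised above
            print("Word '" + word + "' not in the file.")
            return []
        if result is None:
            result = D[word]        # first word: start from its set of lines
        else:
            result = result & D[word]   # later words: intersect
    return sorted(result) if result is not None else []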
1.1 Testing Part 1

1.2 Testing with the einstein.txt file
Enter the name of the file: b.txt
There is no file with that name. Try again.
Enter the name of the file: grrrr
There is no file with that name. Try again.
Enter the name of the file: einstein.txt
Enter one or more words separated by spaces, or 'q' to quit: the
The one or more words you entered coexisted in the following lines of the file: 3 4 7
Enter one or more words separated by spaces, or 'q' to quit: the is
The one or more words you entered coexisted in the following lines of the file: 3 7
Enter one or more words separated by spaces, or 'q' to quit: true knowledge imagination
The one or more words you entered coexisted in the following lines of the file: 3
Enter one or more words separated by spaces, or 'q' to quit: bla
Word 'bla' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: can't
The one or more words you entered coexisted in the following lines of the file: 6
Enter one or more words separated by spaces, or 'q' to quit:
Word '' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: ?
Word '' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: a
Word 'a' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: nature
The one or more words you entered coexisted in the following lines of the file: 2
Enter one or more words separated by spaces, or 'q' to quit: THE
The one or more words you entered coexisted in the following lines of the file: 3 4 7
Enter one or more words separated by spaces, or 'q' to quit: tHe
The one or more words you entered coexisted in the following lines of the file: 3 4 7
Enter one or more words separated by spaces, or 'q' to quit: man becomes
The one or more words you entered does not coexist in a same line of the file.
Enter one or more words separated by spaces, or 'q' to quit: harry potter
Word 'harry' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: harry man
Word 'harry' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: man harry
Word 'harry' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: man one two
Word 'one' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: q

More test runs on file einstein.txt:
Enter the name of the file: (Vida: instead of a file name, PRESS CTRL-C) Then:
>>> f=open_file()
Enter the name of the file: ah.txt
There is no file with that name. Try again.
Enter the name of the file: einstein.txt
>>> f
>>> d=read_file(f)
>>> d
{'try': {1}, 'not': {1, 3}, 'to': {1}, 'become': {1}, 'man': {1}, 'of': {1, 3, 5}, 'success': {1}, 'but': {1, 3}, 'rather': {1}, 'value': {1}, 'look': {2}, 'deep': {2}, 'into': {2}, 'nature': {2}, 'and': {2, 7}, 'then': {2}, 'you': {2, 6}, 'will': {2}, 'understand': {2}, 'everything': {2}, 'better': {2}, 'the': {3, 4, 7}, 'true': {3}, 'sign': {3}, 'intelligence': {3}, 'is': {3, 7}, 'knowledge': {3}, 'imagination': {3}, 'we': {4}, 'cannot': {4}, 'solve': {4}, 'our': {4}, 'problems': {4}, 'with': {4}, 'same': {4}, 'thinking': {4}, 'used': {4}, 'when': {4}, 'created': {4}, 'them': {4}, 'weakness': {5}, 'attitude': {5}, 'becomes': {5}, 'character': {5}, 'cant': {6}, 'blame': {6}, 'gravity': {6}, 'for': {6}, 'falling': {6}, 'in': {6}, 'love': {6}, 'difference': {7}, 'between': {7}, 'stupidity': {7}, 'genius': {7}, 'that': {7}, 'has': {7}, 'its': {7}, 'limits': {7}}
>>> find_coexistance(d," the has")
[7]
>>> find_coexistance(d," the is ")
[3, 7]

1.3 Testing with the gettysburg.txt file
Enter the name of the file: gettysburg.txt
Enter one or more words separated by spaces, or 'q' to quit: nation
The one or more words you entered coexisted in the following lines of the file: 2 6 9 23
Enter one or more words separated by spaces, or 'q' to quit: here dead
The one or more words you entered coexisted in the following lines of the file: 14 22
Enter one or more words separated by spaces, or 'q' to quit: It is
The one or more words you entered coexisted in the following lines of the file: 10 17 19
Enter one or more words separated by spaces, or 'q' to quit: 4you
Word '4you' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: Q

1.4 Testing with the WarAndPiece.txt file
Enter the name of the file: WarAndPiece.txt
Enter one or more words separated by spaces, or 'q' to quit: hard life
The one or more words you entered coexisted in the following lines of the file: 33953 49922 60869
Enter one or more words separated by spaces, or 'q' to quit: 2013
Word '2013' not in the file.
Enter one or more words separated by spaces, or 'q' to quit: VIII
The one or more words you entered coexisted in the following lines of the file: 52 110 154 194 228 274 328 356 402 450 530 600 634 674 714 756 790 2079 8264 13577 17689 20153 23726 27877 30215 33840 38274 45012 51021 53356 55805 58145 61010 63871
Enter one or more words separated by spaces, or 'q' to quit: black-eyed
The one or more words you entered coexisted in the following lines of the file: 2682 49686 61292
Enter one or more words separated by spaces, or 'q' to quit: black-eyed wide-mouthed
The one or more words you entered coexisted in the following lines of the file: 2682
Enter one or more words separated by spaces, or 'q' to quit: What's the good of denying it, my dear?
The one or more words you entered coexisted in the following lines of the file: 2900
Enter one or more words separated by spaces, or 'q' to quit: q

PART 2 and 3 are on the NEXT TWO PAGES.

For this part, you are provided with 3 files: a6_part2_xxxxxx.py, a6_part2_testing_given.txt and drawings_part2.pdf. The file a6_part2_xxxxxx.py already contains a class Point that we developed in class. For this part, you will need to develop and add two more classes to a6_part2_xxxxxx.py: class Rectangle and class Canvas. To understand how they should be designed and how they should behave, you must study in detail the test cases provided in a6_part2_testing_given.txt. These tests are your main resource for understanding what methods your two classes should have and what their input parameters are. I will explain a few methods below in detail, but only those whose behaviour may not be clear from the test cases.

As in Assignment 1 and Assignment 2, for part 2 of this assignment you will also need to submit your own text file, called a6_part2_testing_xxxxxx.txt, demonstrating that you tested your two classes and their methods (in particular, demonstrating that you tested them by running all the calls made in a6_part2_testing_given.txt).

Details about the two classes:

Class Rectangle represents a 2D (axis-parallel) rectangle that a user can draw on a computer screen. Think of a computer screen as a plane where each position has an x and a y coordinate. The data (i.e. attributes) that each object of type Rectangle should have (and that should be initialized in the constructor, i.e. the __init__ method of the class Rectangle) are:
• two Points: the first representing the bottom-left corner of the rectangle and the second representing the top-right corner; and
• the color of the rectangle.
Note that the two points (bottom left and top right) completely determine the (axis-parallel) rectangle and its position in the plane. There is no default rectangle. (See the drawings_part2.pdf file for some helpful illustrations.)

The __init__ method of Rectangle (which is called by the constructor Rectangle) will take two objects of type Point as input and a string for the color.
You may assume that the first Point (passed to the constructor, i.e. __init__) will always have an x coordinate smaller than or equal to the x coordinate of the second Point, and a y coordinate smaller than or equal to the y coordinate of the second Point.

Class Rectangle should have 13 methods. In particular, in addition to the constructor (i.e. the __init__ method) and three methods that override Python's object methods (and make your class user-friendly, as suggested by the test cases), your class should contain the following 9 methods: get_bottom_left, get_top_right, get_color, reset_color, get_perimeter, get_area, move, intersects, and contains. Here is a description of the three of those methods whose job may not be obvious from the test cases.
• Method move: given numbers dx and dy, this method moves the calling rectangle by dx in the x direction and by dy in the y direction. This method should not directly change the coordinates of the two corners of the calling rectangle, but must instead call the move method from the Point class.
• Method intersects: returns True if the calling rectangle intersects the given rectangle and False otherwise. Definition: two rectangles intersect if they have at least one point in common; otherwise they do not intersect.
• Method contains: given an x and a y coordinate of a point, this method tests whether that point is inside the calling rectangle. If yes, it returns True, and otherwise False. (A point on the boundary of the rectangle is considered to be inside.)

Class Canvas represents a collection of Rectangles. It has 8 methods. In addition to the constructor (i.e. the __init__ method) and two methods that override Python's object methods (and make your class user-friendly, as suggested by the test cases), your class should contain 5 more methods: add_one_rectangle, count_same_color, total_perimeter, min_enclosing_rectangle, and common_point. Here is a description of those methods whose job may not be obvious from the test cases.
• Method total_perimeter: returns the sum of the perimeters of all the rectangles in the calling canvas. Do not compute the perimeter of an individual rectangle in the body of the total_perimeter method; instead use the get_perimeter method from the Rectangle class.
• Method min_enclosing_rectangle: calculates the minimum enclosing rectangle that contains all the rectangles in the calling canvas. It returns an object of type Rectangle of any color you prefer. To find the minimum enclosing rectangle you will need to find the minimum x coordinate over all rectangles, the maximum x coordinate, the minimum y coordinate and the maximum y coordinate.
• Method common_point: returns True if there exists a point that intersects all rectangles in the calling canvas. To test this (for axis-parallel rectangles like ours), it is enough to test whether every pair of rectangles intersects (by Helly's theorem for axis-aligned rectangles: http://en.wikipedia.org/wiki/Helly's_theorem).
Finally, recall from the beginning of the description of this assignment that each of your methods should have a type contract. A sketch of the two geometric tests follows.
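Here is a sketch of the two geometric tests. The attribute names (bottom_left, top_right) and the minimal Point stand-in are assumptions for illustration; your starter file's Point class and the test cases in a6_part2_testing_given.txt take precedence.

class Point:
    # minimal stand-in for the Point class developed in class
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Rectangle:
    def __init__(self, bottom_left, top_right, color):
        self.bottom_left = bottom_left   # Point: bottom-left corner
        self.top_right = top_right       # Point: top-right corner
        self.color = color

    def contains(self, x, y):
        # boundary points count as inside
        return (self.bottom_left.x <= x <= self.top_right.x and
                self.bottom_left.y <= y <= self.top_right.y)

    def intersects(self, other):
        # two axis-parallel rectangles share a point exactly when their
        # x-intervals overlap and their y-intervals overlap
        return (self.bottom_left.x <= other.top_right.x and
                other.bottom_left.x <= self.top_right.x and
                self.bottom_left.y <= other.top_right.y and
                other.bottom_left.y <= self.top_right.y)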
In part 3 you will have to implement two functions, one called digit_sum(n) and one called digital_root(n). Both functions must have a recursive implementation. In particular, you cannot use any kind of loop in either of the two functions. The first function, digit_sum(n), needs to solve the following problem recursively: given a nonnegative integer n, return the sum of all the digits of n. For example, if n is 69701, the function should return 23, since 6+9+7+0+1 = 23. Then, implement a recursive function called digital_root(n) to compute the digital root of a given nonnegative n. Your digital_root function must use the digit_sum function. The digital root of a number is calculated by taking the sum of all of the digits in the number, and repeating the process with the resulting sum until only a single digit remains. For example, if you start with 1969, you must first add 1+9+6+9 to get 25. Since the value 25 has more than a single digit, you must repeat the operation to obtain 7 as the final answer. Place both functions in the same file a6_part3_xxxxxx.py.
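Both part 3 functions follow directly from the definitions; a sketch:

def digit_sum(n):
    # base case: a single digit is its own digit sum
    if n < 10:
        return n
    # last digit plus the digit sum of the remaining digits
    return n % 10 + digit_sum(n // 10)

def digital_root(n):
    # repeat digit_sum until a single digit remains
    if n < 10:
        return n
    return digital_root(digit_sum(n))

For example, digit_sum(69701) returns 23 and digital_root(1969) returns 7, matching the examples above.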


[SOLVED] Assignment 5: the goal of this assignment is to learn and practice the concepts covered thus far. In particular you will get more practice with (2D)

The goal of this assignment is to learn and practice the concepts covered thus far. In particular, you will get more practice with (2D) lists and functions. For one of the functions you will also need to know how to prevent runtime errors, i.e. crashes, by using the try/except concept for handling exceptions. We have learnt that before. Given the goals of this assignment, you cannot use: dictionaries, sets, or the deque and bisect modules. You can, though, and in fact should, use the .sort or sorted functions.

For this assignment, I provided you with starter code in a file called a5_xxxxxx.py. Begin by replacing xxxxxx in the file name with your student number. Then open the file. Your solution (code) for the assignment must go into that file in the clearly indicated spaces. The file has the main completely coded for you. Nothing else will go into the main. It also has some functions completely precoded for you. Your task will be to code the remaining functions. You are not allowed to delete or comment out any parts of the provided code. The only exception to that rule is the keyword pass. Some functions have that keyword. You can remove it once you are done coding that function. You also must follow the instructions given in comments and implied by docstrings. You are, however, allowed to add your own additional (helper) functions. In fact, you must add at least one more function. I have provided 5 text files to test and debug your code with, as explained in the next section.

To submit the assignment, create a folder called a5_xxxxxx where (as usual) you must replace xxxxxx with your student number. Place the following two files into the folder:
1. a5_xxxxxx.py
2. references-YOUR-FULL-NAME.txt
Zip the folder and submit it. Do not use WinRAR to create a .rar file instead of a .zip file. (No need to submit a5_xxxxxx.txt as proof that you tested your functions. By now we trust that you learnt and understand the need for and importance of testing your functions and code in general.) As always, you can make multiple submissions, but only the last submission before the deadline will be graded.

As always, your program must run without syntax errors. In particular, when grading your assignment, TAs will first open your file a5_xxxxxx.py with IDLE and press Run Module. If pressing Run Module causes any syntax error, the grade for the assignment becomes zero. Furthermore, for each function whose code is missing, I have provided below one or more tests to test your functions with. To obtain a partial mark for these functions, your solutions do not necessarily have to give the correct answer on these tests. But if your function gives any kind of Python error when run on the tests provided, that function will be marked with zero points. Finally, each function has to be documented with docstrings. There is also an a5-more-example-runs.txt file, giving additional example runs to those given in the next section. The behaviour of all example runs below and in a5-more-example-runs.txt should be considered an implied requirement for the assignment, as always.

Using global variables inside of functions is not allowed. In particular, inside of your functions you can only use variables that are created in that function. For example, the following code fragment would not be allowed, since variable x is not a parameter of function a_times(a), nor is it a variable created in function a_times(a). It is a global variable created outside of all functions.
def a_times(a):
    result = x * a
    return result

x = float(input("Give me a number: "))
print(a_times(10))

About the references-YOUR-FULL-NAME.txt file: The file must be a plain text file. The file must contain references to any code you used that you did not write yourself, including any code you got from a friend, the internet, AI engines like chatGPT, social media/forums (including Stack Overflow and Discord) or any other source or person. The only exclusion from that rule is the code that we did in class, the code done as part of the lab work, or the code in your textbook. So here is what needs to be written in that file. For every question where you used code from somebody else:
• Write the question number.
• Copy-paste all parts of the code that were written by somebody else. That includes the code you found/were given and then slightly modified.
• Source of the copied code: the name of the person or the place on the internet/in a book where you found it.
While you may not get points for copied parts of the question, you will not be in the position of being accused of plagiarism. Any student caught in plagiarism will receive zero for the whole assignment and will be reported to the dean. Showing/giving any part of your assignment code to a friend also constitutes plagiarism and the same penalties will apply. If you have nothing to declare/reference, then just write a sentence stating that and put your first and last name under that sentence in your references-YOUR-FULL-NAME.txt file. Not including the references-YOUR-FULL-NAME.txt file will be taken as you declaring that all the code in the assignment was written by you. Recall, though, that not submitting that file comes with a grade penalty.

1 Social Networks: friend recommendations and more – 100 points

Have you ever wondered how social networks, such as Facebook, recommend friends to you? Most social networks use highly sophisticated algorithms for this, but for this assignment you will implement a fairly naive algorithm to recommend the most likely new friend to users of a social network. In particular, you will recommend the most probable user to befriend based upon the intersection of common friends. In other words, the user that you will suggest to Person A is the person who has the most friends in common with Person A, but who currently is not friends with Person A.

Five text files have been provided for you to run your program with. Each represents a social network. Three are small test files containing a made-up set of users and their friendships (these files are net1.txt, net2.txt and net3.txt). The other two are a subset of a real Facebook dataset, which was obtained from: https://snap.stanford.edu/data/egonets-Facebook.html

The format of all five files is the same. The first line of the file is an integer representing the number of users in the given network. The following lines are of the form:
user_u user_v
where user_u and user_v are the (non-negative integer) IDs of two users who are friends. In addition, user_u is always less than user_v. For example, here is a very small file that has 5 users in the social network:
5
0 1
1 2
1 8
2 3
The above is a representation of a social network that contains 5 users.
User ID=0 is friends with User IDs: 1
User ID=1 is friends with User IDs: 0, 2, 8
User ID=2 is friends with User IDs: 1, 3
User ID=3 is friends with User IDs: 2
User ID=8 is friends with User IDs: 1
Spend time studying the above small example to understand the model.
For example, notice that friendship is a symmetric relationship in the social networks in this assignment: if user_u is friends with user_v, then user_v is also friends with user_u. Such "duplicate" friendships are not present in the file. In particular, each friendship is listed once, in such a way that user_u < user_v.

Also note that, while you can assume that user IDs are sorted, you cannot assume that they are consecutive integers differing by one. For example, the user IDs above are: 0, 1, 2, 3, 8. You can also assume that in each file the users are sorted from smallest to largest (in the above example you see the pairs appear as: 0 1, then 1 2, and so on). Specifically, the friendships of user_u appear before the friendships of user_v if and only if user_u < user_v. Also, for each user its friends appear sorted; for example, for user 1 the friendship with friend 2 appears before the friendship with friend 4.

To complete the assignment you will have to code the following 9 functions. I strongly recommend you code them in the order given below and do not move on to coding a function until you have completed all the ones before it. The function descriptions, including what they need to do, are given in a5_xxxxxx.py.

1. create_network(file_name) (35 points)
This is the most important (and possibly the most difficult) function to solve. The function needs to read a file and return a list of tuples representing the social network from the file. In particular, the function returns a list of tuples where each tuple has 2 elements: the first is an integer representing the ID of a user and the second is the list of integers representing his/her friends. In a5_xxxxxx.py I refer to the list that the create_network function returns as a 2D list for the friendship network (although one can argue that it is a 3D list). In addition, the 2D list that create_network returns must be sorted by ID, and the list of friends in each tuple must also be sorted. So for the example above, this function should return the following 2D list for the friendship network:
[(0, [1]), (1, [0,2,8]), (2, [1,3]), (3, [2]), (8, [1])]
(A sketch of one possible approach appears after the example runs below.) More examples:
>>> net1=create_network("net1.txt")
>>> net1
[(0, [1, 2, 3]), (1, [0, 4, 6, 7, 9]), (2, [0, 3, 6, 8, 9]), (3, [0, 2, 8, 9]), (4, [1, 6, 7, 8]), (5, [9]), (6, [1, 2, 4, 8]), (7, [1, 4, 8]), (8, [2, 3, 4, 6, 7]), (9, [1, 2, 3, 5])]
>>> net2=create_network("net2.txt")
>>> net2
[(0, [1, 2, 3, 4, 5, 6, 7, 8, 9]), (1, [0, 4, 6, 7, 9]), (2, [0, 3, 6, 8, 9]), (3, [0, 2, 8, 9]), (4, [0, 1, 6, 7, 8]), (5, [0, 9]), (6, [0, 1, 2, 4, 8]), (7, [0, 1, 4, 8]), (8, [0, 2, 3, 4, 6, 7]), (9, [0, 1, 2, 3, 5])]
>>> net3=create_network("net3.txt")
>>> net3
[(0, [1, 2, 3, 4, 5, 6, 7, 8, 9]), (1, [0, 4, 6, 7, 9]), (2, [0, 3, 6, 8, 9]), (3, [0, 2, 8, 9]), (4, [0, 1, 6, 7, 8]), (5, [0, 9]), (6, [0, 1, 2, 4, 8]), (7, [0, 1, 4, 8]), (8, [0, 2, 3, 4, 6, 7]), (9, [0, 1, 2, 3, 5]), (100, [112]), (112, [100, 114]), (114, [112])]
>>> net4=create_network("big.txt")
>>> net4[500:502]
[(500, [348, 353, 354, 355, 361, 363, 368, 373, 374, 376, 378, 382, 388, 391, 392, 396, 400, 402, 404, 408, 409, 410, 412, 414, 416, 417, 421, 423, 428, 431, 438, 439, 444, 445, 450, 452, 455, 463, 465, 474, 475, 483, 484, 487, 492, 493, 497, 503, 506, 507, 513, 514, 517, 519, 520, 521, 524, 525, 527, 531, 537, 538, 542, 546, 547, 548, 553, 555, 556, 557, 560, 563, 565, 566, 580, 591, 601, 604, 614, 637, 645, 651, 683]), (501, [198, 348, 364, 393, 399, 441, 476, 564])]
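The sketch promised above. It is deliberately naive (quadratic in the number of edges) since dictionaries and sets are off-limits; it is fine on the small nets, but you may want a faster approach (e.g. one exploiting the sorted file order) for big.txt.

def create_network(file_name):
    # read every whitespace-separated token; the first is the user count
    with open(file_name) as f:
        tokens = f.read().split()
    nums = [int(t) for t in tokens[1:]]
    edges = [(nums[i], nums[i + 1]) for i in range(0, len(nums), 2)]

    # distinct user IDs in sorted order, without using a set
    all_ids = sorted([u for u, v in edges] + [v for u, v in edges])
    ids = []
    for i in all_ids:
        if ids == [] or ids[-1] != i:
            ids.append(i)

    # gather each user's friends from both directions of each edge
    network = []
    for i in ids:
        friends = []
        for u, v in edges:
            if u == i:
                friends.append(v)
            if v == i:
                friends.append(u)
        friends.sort()
        network.append((i, friends))
    return network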
2. getCommonFriends(user1, user2, network) (15 points)
>>> getCommonFriends(3,1,net1)
[0, 9]
>>> getCommonFriends(0,112,net3)
[]
>>> getCommonFriends(217,163,net4)
[0, 100, 119, 150]

3. recommend(user, network) (15 points)
Read the docstrings to understand how this function should work. Understand why the given friends are recommended in the examples below, including why no friend is recommended for 0 in net2 and for 112 in net3.
>>> recommend(6,net1)
7
>>> recommend(4,net2)
2
>>> recommend(0,net2)
>>> recommend(114, net3)
100
>>> recommend(112,net3)
>>> recommend(217,net4)
163

4. k_or_more_friends(network, k) (5 points)
>>> k_or_more_friends(net1, 5)
3
>>> k_or_more_friends(net2, 8)
1
>>> k_or_more_friends(net3, 12)
0
>>> k_or_more_friends(net4, 70)
33

5. maximum_num_friends(network) (5 points)
>>> maximum_num_friends(net1)
5
>>> maximum_num_friends(net2)
9
>>> maximum_num_friends(net3)
9
>>> maximum_num_friends(net4)
347

6. people_with_most_friends(network) (5 points)
>>> people_with_most_friends(net1)
[1, 2, 8]
>>> people_with_most_friends(net2)
[0]
>>> people_with_most_friends(net3)
[0]
>>> people_with_most_friends(net4)
[0]

7. average_num_friends(network) (5 points)
>>> average_num_friends(net1)
3.8
>>> average_num_friends(net2)
5.0
>>> average_num_friends(net3)
4.153846153846154
>>> average_num_friends(net4)
19.78

8. knows_everyone(network) (5 points)
>>> knows_everyone(net1)
False
>>> knows_everyone(net2)
True
>>> knows_everyone(net3)
False
>>> knows_everyone(net4)
False

9. get_uid(network) (10 points)
>>> get_uid(net1)
Enter an integer for a user ID: alsj
That was not an integer. Please try again.
Enter an integer for a user ID: twenty
That was not an integer. Please try again.
Enter an integer for a user ID: 9aslj
That was not an integer. Please try again.
Enter an integer for a user ID: 100000
That user ID does not exist. Try again.
Enter an integer for a user ID: 4.5
That was not an integer. Please try again.
Enter an integer for a user ID: -10
That user ID does not exist. Try again.
Enter an integer for a user ID: -1
That user ID does not exist. Try again.
Enter an integer for a user ID: 7
7
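A sketch of getCommonFriends under the same no-sets restriction: look up the two friend lists in the network, then keep the elements they share. Since both lists are sorted, the result comes out sorted.

def getCommonFriends(user1, user2, network):
    def friends_of(user):
        # linear scan of the (id, friends) tuples returned by create_network
        for uid, friends in network:
            if uid == user:
                return friends
        return []
    f1 = friends_of(user1)
    f2 = friends_of(user2)
    return [f for f in f1 if f in f2]

For example, with net1 from above, getCommonFriends(3, 1, net1) returns [0, 9].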


[SOLVED] Assignment 4 the goal of this assignment is to learn and practice the concepts covered thus far: function design

Assignment 4
The goal of this assignment is to learn and practice the concepts covered thus far: function design, lists and loops. In this assignment you may not use any of the following:
– dictionaries, sets
– keywords break and continue
– global variables in bodies of functions, as explained below.
Using any of these in a solution to a question constitutes changing that question. Consequently, that question will not be graded.

Your grade will partially be determined by automatic (unit) tests that will test your functions. All the specified requirements are mandatory (including function names, and behaviour implied by test cases and demo videos). Any requirement that is specified and not met may/will result in a deduction of points. Submit your assignment by the deadline via Brightspace (as instructed and practiced in the first lab). You can make multiple submissions, but only the last submission before the deadline will be graded. For this assignment you must submit 6 files as described below. For each missing file there will be a grade deduction. For this assignment, you do not need to submit a4_xxxxxx.txt as proof that you tested your functions. By now we trust that you learnt and understand the need for and importance of testing your functions and code in general.

The assignment has two parts. Each part explains what needs to be submitted. In part 1 you will implement 4 shorter programs. In part 2, you will implement a card game. Put all six required documents (including your declaration file) into a folder, zip the folder, and submit the resulting a4_xxxxxx.zip as explained in Lab 1. In particular, the folder (and thus your submission) should have the following files:
Part 1: a4_Q1_xxxxxx.py, a4_Q2_xxxxxx.py, a4_Q3_xxxxxx.py, a4_Q4_xxxxxx.py
Part 2: a4_GAME_xxxxxx.py
Plus: references-YOUR-FULL-NAME.txt

All programs must run without syntax errors. In particular, when grading your assignment, TAs will first open your file, e.g. a4_GAME_xxxxxx.py, with IDLE and press Run Module. If pressing Run Module causes any syntax error, the grade for Part 2 becomes zero. The same applies to Part 1. Furthermore, for each of the functions (in Part 1 and Part 2), I have provided one or more tests to test your functions with. To obtain a partial mark, your function does not necessarily have to give the correct answer on these tests. But if your function gives any kind of Python error when run on the given tests, that question will be marked with zero points. Some test cases are given inside docstrings, for example: remove_pairs in the provided starter file a4_GAME_xxxxxx.py; and clean_up and is_rigorous in a4_Q4_xxxxxx.py. To determine your grade, your functions will be tested both with the examples provided for Part 1 and Part 2 and with some other examples. Thus you too should test your functions with more examples than what I provided.

Global variables in bodies of functions are not allowed. If you do not know what that means, for now interpret this to mean that inside of your functions you can only use variables that are created in that function. For example, this is not allowed, since variable x is not a parameter of function a_times(a), nor is it a variable created in function a_times(a). It is a global variable created outside of all functions.

def a_times(a):
    result = x * a
    return result

x = float(input("Give me a number: "))
print(a_times(10))

About the references-YOUR-FULL-NAME.txt file: The file must be a plain text file.
The file must contain references to any code you used that you did not write yourself, including any code you got from a friend, the internet, AI engines like chatGPT, social media/forums (including Stack Overflow and Discord) or any other source or person. The only exclusion from that rule is the code that we did in class, the code done as part of the lab work, or the code in your textbook. So here is what needs to be written in that file. For every question where you used code from somebody else:
• Write the question number.
• Copy-paste all parts of the code that were written by somebody else. That includes the code you found/were given and then slightly modified.
• Source of the copied code: the name of the person or the place on the internet/in a book where you found it.
While you may not get points for copied parts of the question, you will not be in the position of being accused of plagiarism. Any student caught in plagiarism will receive zero for the whole assignment and will be reported to the dean. Showing/giving any part of your assignment code to a friend also constitutes plagiarism and the same penalties will apply. If you have nothing to declare/reference, then just write a sentence stating that and put your first and last name under that sentence in your references-YOUR-FULL-NAME.txt file. Not including the references-YOUR-FULL-NAME.txt file will be taken as you declaring that all the code in the assignment was written by you. Recall, though, that not submitting that file comes with a grade penalty.

For this part of the assignment, you are required to write four short programs, submitted as 4 files: a4_Q1_xxxxxx.py, a4_Q2_xxxxxx.py, a4_Q3_xxxxxx.py and a4_Q4_xxxxxx.py.

1.1 Question 1: (5 points)
Implement a Python function named number_divisible that takes a list of integers and an integer n as input parameters and returns the number of elements in the list that are divisible by n. Then, in the main, your program should ask the user to input integers for the list and an integer for n; it should then call the function number_divisible and print the result. In this question you may assume that the user will follow your instructions and enter a sequence of integers separated by spaces for the list and an integer for n. You can use the str methods .strip and .split to handle the user input. Here is a way to ask a user for a list:
raw_input = input("Please input a list of numbers separated by space: ").strip().split()
But now raw_input is a list of strings that look like integers, so you need to create a new list that is a list of the equivalent integers.
Function call example:
>>> number_divisible([6, 10, 2, 3, 4, 5, 6, 0], 3)
4
An example run of the program:
Please input a list of integers separated by spaces: 1 2 3 0 5 -6 995
Please input an integer: 2
The number of elements divisible by 2 is 3
A sketch of the whole program follows.
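The sketch promised above; the prompt wording follows the example run.

def number_divisible(numbers, n):
    # count the elements of the list that leave no remainder modulo n
    count = 0
    for num in numbers:
        if num % n == 0:
            count = count + 1
    return count

raw_input = input("Please input a list of integers separated by spaces: ").strip().split()
numbers = [int(s) for s in raw_input]   # convert the digit-strings to integers
n = int(input("Please input an integer: "))
print("The number of elements divisible by", n, "is", number_divisible(numbers, n))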
1.2 Question 2: (5 points)
A run is a sequence of consecutive repeated values. Implement a Python function called two_length_run that takes a list of numbers as an input parameter and returns True if the given list has at least one run (of length at least two), and False otherwise. Make sure the function is efficient (i.e. it stops as soon as the answer is known). Then, in the main, your program should ask the user to input the list; it should then call the two_length_run function and print the result. You can obtain a list of numbers from the user as explained in Question 1.
Four examples of program runs:
Please input a list of numbers separated by space: 1 4 3 3 4
True
Please input a list of numbers separated by space: 1 2 3 3 3 4.5 6 5
True
Please input a list of numbers separated by space: 1.0 2 3.7 4 3 2
False
Please input a list of numbers separated by space: 7.7
False
Function call examples:
>>> two_length_run([2.7, 1.0, 1.0, 0.5, 3.0, 1.0])
True
>>> two_length_run([1.0, 1])
True

1.3 Question 3: (10 points)
As mentioned, a run is a sequence of consecutive repeated values. Implement a Python function called longest_run that takes a list of numbers and returns the length of the longest run. For example, in the sequence 2, 7, 4, 4, 2, 5, 2, 5, 10, 12, 5, 5, 5, 5, 6, 20, 1 the longest run has length 4. Then, in the main, your program should ask the user to input the list; it should then call the longest_run function and print the result. You can obtain a list of numbers from the user as explained in Question 1. (A sketch of longest_run is given at the end of Part 1, below.)
Five examples of program runs:
Please input a list of numbers separated by space: 1 1 2 3.0 3 3 3 3 6 5
5
Please input a list of numbers separated by space: 6 6 7 1 1 1 1 4.5 1
4
Please input a list of numbers separated by space: 6 2.4 4 8 6
1
Please input a list of numbers separated by space: 3
1
Please input a list of numbers separated by space:
0
Function call example:
>>> longest_run([6, 6, 7, 1.0, 1.0, 1.0, 1, 4.5, 1])
4

1.4 Question 4: (10 points)
In this question you are provided with starter code a4_Q4_xxxxxx.py and 5 files: file1.txt, …, file5.txt. As usual, you cannot modify the given parts of the code. For this question you need to code the two missing functions clean_up and is_rigorous. What these functions should do is described in the docstrings. Some test cases for these two functions can also be found in the docstrings. The provided program asks for the name of a file. The files your program will be tested with will have one character per line (like bo1.txt). The function read_raw then returns a list of characters from the file. Here are some example runs:
RUN 1:
Enter the name of the file: file1.txt
Before clean-up: ['D','F','B','G','$','$','$','A','A','C','G','D','A','$','C','*','P','E','D','*','D','D','E','B','$','#','D','D']
After clean-up: ['$', '$', '$', '$', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'D', 'E', 'E', 'G', 'G']
This list has no * but is not rigorous and it has 20 characters.
RUN 2:
Enter the name of the file: file2.txt
Before clean-up: ['A', '*', '$', 'C', '*', '*', 'P', 'E', 'D', 'D', '#', 'D', 'E', 'B', '$', '#']
After clean-up: ['#', '#', '$', '$', 'D', 'D', 'E', 'E']
This list is now rigorous; it has no * and it has 8 characters.
RUN 3:
Enter the name of the file: file3.txt
Before clean-up: ['A', 'B', '*', 'C', '*', 'D', '*', '*', '*', 'E']
After clean-up: []
This list is now rigorous; it has no * and it has 0 characters.
RUN 4:
Enter the name of the file: file4.txt
Before clean-up: ['A', 'A', 'A', 'A', 'A', 'A', 'A']
After clean-up: ['A', 'A', 'A', 'A', 'A', 'A']
This list has no * but is not rigorous and it has 6 characters.
RUN 5:
Enter the name of the file: file5.txt
Before clean-up: []
After clean-up: []
This list is now rigorous; it has no * and it has 0 characters.
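The longest_run sketch promised in Question 3. It keeps a counter for the current run and the best seen so far, and uses no break or continue, per the assignment rules.

def longest_run(values):
    # empty list: no run at all
    if values == []:
        return 0
    longest = 1
    current = 1
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            current = current + 1   # the run continues
        else:
            current = 1             # a new run starts here
        longest = max(longest, current)
    return longest

Note that 1.0 == 1 is True in Python, so longest_run([6, 6, 7, 1.0, 1.0, 1.0, 1, 4.5, 1]) returns 4, matching the example above.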
For this part, you will program a card game described here: http://www.classicgamesandpuzzles.com/Old-Maid.html
You will implement the two-player version. One player will be the computer (i.e. your program) and the other a user of your program. In what follows, let's refer to the computer player as Robot and the user player as Human. You may assume that Robot will always deal the cards.

As part of this assignment I provided the file called a4_GAME_xxxxxx.py. Replace xxxxxx in the file name with your student number. You should open that file and run it to see what it does already. All of your code must go inside that file. The file already has some functions that are fully coded for you, and other functions for which only docstrings and partial or no code are provided. Designing your program by decomposing it into smaller subproblems (to be implemented as functions) makes programming easier and less prone to errors, and makes your code more readable. No part of the given code can be changed. Your code must go into the clearly indicated places. No code can be added to the main. You can design some extra functions of your own if you like.

Functions make_deck, wait_for_player and shuffle_deck are already fully coded for you. You need to develop the remaining functions: deal_cards, remove_pairs, print_deck, get_valid_input, and play_game. The functions must meet the requirements as specified in their docstrings (and as implied by the example program runs below and by the video instructions). The main bulk of your code (the game-playing part) will go into the function called play_game. That function should use/call the other functions that you are required to develop (i.e. deal_cards, remove_pairs, print_deck, and get_valid_input).

When developing the function get_valid_input you may assume that Human will enter an integer when asked for an integer, but you may not assume that it will be in the correct range. The function get_valid_input gets the input from Human about which face-down card of Robot it wants. When it is Robot's turn to play, you must implement it such that Robot takes a random card from Human. Also recall that what Human calls the 3rd card, for example, is at position/index 2 in Robot's deck (as it is represented by a list). Study the example of the program run below carefully to understand how your program should behave. The behaviour of the program that you see in the run is required, in all aspects of it. Also watch the video I made to get an even better idea of how your program must behave. The video demo can be found here: https://youtu.be/mMBApSkvHyM

Some suggestions:
• Study the provided code and understand what it does and how it should be used.
• Spend some time thinking about the various parts of the game that need to be implemented. For example, it needs to be able to display Human's deck to Human, it needs to be able to ask Human for what card she wants, it needs to be able to remove pairs from either Human's or Robot's deck, etc. The provided functions do quite a bit of that job for you.
• When you are coding individual functions, recall that you can test each function in the shell without finishing the remaining functions. For example, when implementing the function remove_pairs you can test it in the shell by typing something like:
>>> remove_pairs(['10♣', '2♣', '5q', '6♣', '9♣', 'Aq', '10q'])
The shell should display (with the cards not necessarily in this order):
['2♣', '5q', '6♣', '9♣', 'Aq']
Thus you can code and test your functions one by one (without completing the other parts).
• The game alternates between Robot and Human. Think about how you can represent whose turn it is to play in your program. One way is to have a variable that you set to zero when it is Robot's turn and to one when it is Human's turn.
You also need to figure out what to test to see if the game is over.

2.1 Testing Part 2
Test runs for Part 2 are in the file entitled A4-game-runs.pdf. The behaviour implied by the test runs should be considered as required specifications, in addition to what is explained above.
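For the shell test of remove_pairs shown above, here is a minimal sketch; it assumes each card is a string whose last character is the suit, so the rank is everything before it:

def remove_pairs(deck):
    """Return the deck with same-rank pairs removed.

    Cards are strings like '10♣'; the rank is card[:-1].
    """
    by_rank = {}
    for card in deck:
        by_rank.setdefault(card[:-1], []).append(card)
    kept = []
    for rank, cards in by_rank.items():
        if len(cards) % 2 == 1:   # one odd card survives its pairs
            kept.append(cards[0])
    return kept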


[SOLVED] Cpts 475/575: data science assignment 5 – part 2: classification

In order to classify the text effectively, you will need to split your text into tokens. It is common practice when doing this to reduce your words to their stems so that conjugations produce less noise in your data. For example, the words "speak", "spoke", and "speaking" are all likely to denote a similar context, and so a stemmed tokenization will merge all of them into a single stem. R has several libraries for tokenization, stemming, and text mining. Examples of such libraries that you may want to use as a starting point are tokenizers, SnowballC, and tm, respectively. Alternatively, some of you may want to consider using quanteda, which will handle these functionalities along with others needed in building your model in the next step. Similarly, Python has libraries such as sklearn and nltk for processing text.

You will need to produce a document-term matrix from your stemmed tokenized data. This will create a large feature set (to be reduced in the following step) where each word represents a feature, and each article is represented by the number of occurrences of each word. Before representing the feature set in a non-compact storage format (such as a plain matrix), you will want to remove any word which appears in too few documents. For this assignment, you will remove the 15% of words corresponding to the least frequent words in the documents; i.e., only 85% of the terms should be kept. To demonstrate your completion of this part, print the feature vector of the words that appear 4 or more times in the 2205th article in the dataset. Your output should show the words and the number of occurrences in the article.

For this part of the assignment, you will build and test a Multinomial Naïve Bayes classifier and a Multinomial Logistic Regression classifier to handle the multiple classes in the dataset. First, reduce the feature set using a feature selection method, such as removing highly correlated features. The caret package in R, or similar libraries in Python like sklearn, can be used to help reduce the number of features and improve model performance. You may wish to try several different feature selection methods to see which produces the best results.

Next, split your data into a training set and a test set. Your training set should comprise approximately 80% of your articles and your test set the remaining 20%. In splitting your data into training and test sets, ensure that the five categories are nearly equally represented in both sets. Experiment with other split percentages (than 80-20) to ensure a balanced representation of the five categories, and use the split that gives you the best result for the required classifiers below.

Next, build a Multinomial Naïve Bayes classifier using your training data. In R, you can use the multinomial_naive_bayes() function from the naivebayes package, and in Python, the MultinomialNB class from sklearn can be used. After building the model, use it to predict the categories of your test data. Once you have produced a model that generates the best predictions you can get, print a confusion matrix of the results to demonstrate your completion of this task. For each class, give scores for precision (TruePositives / (TruePositives + FalsePositives)) and recall (TruePositives / (TruePositives + FalseNegatives)).

Finally, build a Multinomial Logistic Regression classifier using the same training and test sets and compare the results, using a confusion matrix as well as precision and recall scores for each class, with those from the Multinomial Naïve Bayes classifier.
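If you work in Python, here is a minimal sklearn sketch of the split/train/evaluate steps; docs and labels stand for your article strings and their five category labels (assumed inputs, not part of the assignment files):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, classification_report

def evaluate_nb(docs, labels):
    # docs: list of (stemmed) article strings; labels: their categories
    X = CountVectorizer(min_df=2).fit_transform(docs)  # document-term matrix
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=0)
    pred = MultinomialNB().fit(X_tr, y_tr).predict(X_te)
    print(confusion_matrix(y_te, pred))
    print(classification_report(y_te, pred))  # per-class precision and recall

The stratify argument is what keeps the five categories nearly equally represented in both sets.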


[SOLVED] Cpts 475/575: data science assignment 5 – part 1: linear regression & logistic regression

1) (18 points) This question involves the use of multiple linear regression on the red wine (winequality-red.csv) data set, available on Canvas in the Datasets for Assignments module. This is the same dataset used in Assignment 2.
a. (6 points) Perform a multiple linear regression with pH as the response and all other variables except citric_acid as the predictors. Show a printout of the result (including coefficient, error, and t-values for each predictor). Comment on the output by answering the following questions:
i) Which predictors appear to have a statistically significant relationship to the response? How do you determine this?
ii) What does the coefficient for the free_sulfur_dioxide variable suggest, in simple terms?
b. (6 points) Produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?
c. (6 points) Fit at least 3 linear regression models (exploring interaction effects) with alcohol as the response and some combination of other variables as predictors. Do any interactions appear to be statistically significant?

2) (30 points) This problem involves the Boston data set, which can be loaded from the library MASS in R and is also made available in the Datasets for Assignments module on Canvas (boston.csv). We will now try to predict the per capita crime rate (crim) using the other variables in this data set. In other words, the per capita crime rate is the response, and the other variables are the predictors.
a. (6 points) For each predictor, fit a simple linear regression model to predict the response. Include the code, but not the output, for all the models in your solution.
b. (6 points) In which of the models is there a statistically significant association between the predictor and the response? Considering the meaning of each variable, discuss the relationship between crim and each of the predictors nox, chas, rm, dis and medv. How do these relationships differ?
c. (6 points) Fit a multiple regression model to predict the response using all the predictors. Describe your results. For which predictors can we reject the null hypothesis H0: βj = 0?
d. (6 points) How do your results from (a) compare to your results from (c)? You can present this comparison as a plot, a table, or any other form of comparison you deem fit.
e. (6 points) Is there evidence of a non-linear association between the predictors age and tax and the response crim? To answer this question, for each predictor (age and tax), fit a model of the form Y = β0 + β1*X + β2*X^2 + β3*X^3 + ε. Hint: use the poly() function in R. Use the model to assess the extent of the non-linear association.

3) (12 points) Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, X3 = PSQI score (a sleep quality index), and Y = receive an A. We fit a logistic regression and produce estimated coefficients β0 = −8, β1 = 0.1, β2 = 1, β3 = −0.04.
a. (4 points) Estimate the probability that a student who studies for 32 h, has a PSQI score of 11, and has an undergrad GPA of 3.0 gets an A in the class. Show your work.
b. (4 points) How many hours would the student in part (a) need to study to have a 65% chance of getting an A in the class? Show your work.
c. (4 points) How many hours would a student with a 3.0 GPA and a PSQI score of 3 need to study to have a 60% chance of getting an A in the class? Show your work.
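For question 3, the fitted logistic model gives P(Y = A) = 1 / (1 + exp(−(β0 + β1*X1 + β2*X2 + β3*X3))); a minimal Python sketch of part (a), with parts (b) and (c) obtained by inverting the same formula for X1:

import math

def prob_A(hours, gpa, psqi, b=(-8, 0.1, 1, -0.04)):
    # logistic model: p = 1 / (1 + exp(-(b0 + b1*X1 + b2*X2 + b3*X3)))
    z = b[0] + b[1] * hours + b[2] * gpa + b[3] * psqi
    return 1 / (1 + math.exp(-z))

print(prob_A(32, 3.0, 11))  # part (a): roughly 0.096
# For (b) and (c), set p to the target chance and solve
# log(p / (1 - p)) = b0 + b1*hours + b2*gpa + b3*psqi for hours.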


[SOLVED] Cpts 475/575: data science assignment 4: joins (relational data) and visualization

Problem 1 (50 pts). This problem will involve the Lahman dataset (including the tables Batting, Teams, Salaries, and Managers). It is available in R by loading the Lahman library using the following command: library(Lahman)
Alternatively, you can download the csv files from the Modules page on Canvas. The files are Batting.csv, Teams.csv, Salaries.csv, and Managers.csv. You can use Lahman_Desc.txt (also from Modules) to check the column descriptions for each dataset. We will first use joins to search and manipulate the dataset, then we will produce a visualization.
a) (10 pts) Filter the dataset (using a left join) to display the playerID, yearID, teamID, stint, G (games played), HR (home runs), and salary for all players who hit more than 30 home runs in a single season and played for a team in New York (teamID "NYA" or "NYN") between 2010 and 2020. How many players match these criteria?
b) (10 pts) What is the difference between the following two joins? Do not show the result of these anti_joins in your submission.
anti_join(Salaries, Batting, by = c("playerID" = "playerID"))
anti_join(Batting, Salaries, by = c("playerID" = "playerID"))
What is the difference between semi_join and anti_join? Provide an example using the Salaries and Batting tables. (A sketch of the distinction appears at the end of this listing.)
c) (10 pts) Select the teamID, yearID, and the total number of runs batted in (RBI) for each team in the American League (AL) for the year 2015 (using one or more inner joins with the Teams and Batting tables). How many total home runs were hit by American League teams in 2015?
d) (10 pts) Using the Managers and Teams tables, determine the number of seasons each manager managed a team. Use group_by and count to get the number of unique managerID and teamID combinations. How many unique combinations of managerID and teamID are present? Are there any managers with an unusually high number of years as a manager?
e) (10 pts) Using the provided template as a start, produce a horizontal bar plot that shows the number of wins for the top 10 teams in 2019. Adjust the axis labels to clearly represent the teams and the number of wins. Add a meaningful title to the plot, and include the number of wins as text on each bar for clarity.
Teams %>%
  filter(yearID == 2019) %>%
  select(teamID, W) %>%
  ggplot(aes(x = reorder(teamID, W), y = W)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip()

Problem 2 (30 pts). The goal of this problem is to create a visualization of the US map showing the states/territories and the number of presidential votes received during an election year. For this task, you will work with the us-presidents.csv dataset. The dataset can be found on the Modules page on Canvas. The dataset consists of 612 observations of the variables year, state, state_po, office, and totalvotes. For this question, you will create two visualizations of the US map for two presidential years of your choice, coloring the states or sizing the point/marker for each state according to the number of total votes received from that state in the presidential election. Compare both maps and comment on any observations. You are free to choose any mapping tool you wish to produce this visualization. Try to make your visualization as nice looking as possible. You can use the state column directly to visualize the observations, or you could get the coordinates for each state (depending on the tool and your visualization). Research how this can be done and use what you find. The dataplusscience.com website has some blogs about mapping that you may find useful. After you have coordinates, you can use different methods for mapping. You can use packages available in R or Python. Another simple method is probably through https://batchgeo.com/features/map-coordinates/ . However, you can also use d3 to map the locations, if you want to learn something that you could use for other projects later.

Problem 3 (20 pts). Create a word cloud for an interesting (relatively short, say a couple of pages) document of your own choice. Examples of suitable documents include: a summary of a recent project you are working on or have worked on; your own recent Statement of Purpose or Research Statement; or some other similar document. You can create the word cloud in R using the package called wordcloud, or you can use another tool outside of R such as Wordle. If you do this in R, you will first need to install wordcloud (using install.packages("wordcloud")) and then load it (using library(wordcloud)). Then look up the documentation for the function called wordcloud in the package with the same name to create your cloud. Note that this function takes many arguments, but you would be mostly fine with the default settings. Only providing the text of your words may suffice for a minimalist purpose. You are welcome (and encouraged) to take the generated word cloud and manipulate it using other software to enhance its aesthetics. If you have used Wordle instead of R, Wordle gives you functionality to play with the look of the word cloud you get. Experiment until you get something you like most. Your submission for this would include the figure (cloud) and a brief caption that describes the text for the cloud. For example, it could be something like "Jenneth Joe's Essay on Life During Pandemic, written in June 2021."
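If you want to sanity-check the semi_join/anti_join distinction from Problem 1(b) outside of R, here is a minimal pandas sketch on toy tables (hypothetical data, not the real Lahman tables):

import pandas as pd

# Toy frames standing in for Salaries and Batting (hypothetical rows).
salaries = pd.DataFrame({"playerID": ["a", "b", "c"], "salary": [1, 2, 3]})
batting = pd.DataFrame({"playerID": ["b", "c", "d"], "HR": [10, 20, 30]})

# semi join: rows of salaries whose playerID appears in batting
semi = salaries[salaries["playerID"].isin(batting["playerID"])]
# anti join: rows of salaries whose playerID does NOT appear in batting
anti = salaries[~salaries["playerID"].isin(batting["playerID"])]
print(semi, anti, sep="\n")

Swapping the two tables, as in 1(b), changes which table's unmatched rows you get, which is why the two anti_joins differ.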


[SOLVED] Cpts 475/575: data science assignment 3: data transformation and tidying

Question 1. (50 pts total) For this question you will be using either the dplyr package from R or the Pandas library in Python to manipulate and clean up a dataset called NBA_Stats_23_24.csv (available in the Modules section on Canvas under the folder Datasets for Assignments). This data was pulled from the https://www.nba.com/stats website. The dataset contains information about the Men's National Basketball Association games in 2023-2024. It has 735 rows and 30 variables. Here is a description of the variables:
Rk: Rank
Player: Player's name
Pos: Position
Age: Player's age
Tm: Team
G: Games played
GS: Games started
MP: Minutes played per game
FG: Field goals per game
FGA: Field goal attempts per game
FG%: Field goal percentage
3P: 3-point field goals per game
3PA: 3-point field goal attempts per game
3P%: 3-point field goal percentage
2P: 2-point field goals per game
2PA: 2-point field goal attempts per game
2P%: 2-point field goal percentage
eFG%: Effective field goal percentage
FT: Free throws per game
FTA: Free throw attempts per game
FT%: Free throw percentage
ORB: Offensive rebounds per game
DRB: Defensive rebounds per game
TRB: Total rebounds per game
AST: Assists per game
STL: Steals per game
BLK: Blocks per game
TOV: Turnovers per game
PF: Personal fouls per game
PTS: Points per game
Load the data into R or Python, and check for missing values (NaN). All the tasks in this assignment can be hand coded, but the goal is to use the functions built into dplyr or Pandas to complete the tasks. Suggested functions for Python are shown in blue while suggested R functions are shown in red. Note: if you are using Python, be sure to load the data as a Pandas DataFrame. Below are the tasks to perform. Before you begin, print the first few values of the columns with a header containing the string "FG". (head(), head())
a) (5 pts) Count the number of players with Free Throws per game greater than 0.5 and Assists per game greater than 0.7. (filter(), query())
b) (10 pts) Print the Player, Team, Field goals per game, Turnovers per game, and Points per game of the players with the 10 highest points, in descending order of points. (select(), arrange(), loc(), sort_values()). Which player has the seventh highest points?
c) (10 pts) Add two new columns to the dataframe: FGP (in percentage) is the ratio of FG to FGA, and FTP (in percentage) is the ratio of FT to FTA. Note that the unit should be expressed in percentage (ranging from 0 to 100) and rounded to 2 decimal places (e.g., for Jamal Cain, FGP is 43.33). (mutate(), assign()). What are the FGP and FTP for Josh Giddey?
d) (10 pts) Display the average, min and max Offensive rebounds per game for each team, in descending order of the team average. (group_by(), summarise(), groupby(), agg()). You can exclude NAs for this calculation. Which team has the max Offensive rebounds per game?
e) (15 pts) In question 1c, you added a new column called FTP. Impute the missing (or NaN) FTP values as the FGP (also added in 1c) multiplied by the average FTP for that team. Make a second copy of your dataframe, but this time impute missing (or NaN) FTP values with just the average FTP for that team. What assumptions do these data-filling methods make? Which is the best way to impute the data, or do you see a better way, and why? You may impute or remove other variables as you find appropriate. Briefly explain your decisions. (group_by(), mutate(), groupby(), assign())
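If you choose the Pandas route, here is a minimal sketch of tasks (a) through (d); the local file path is an assumption:

import pandas as pd

df = pd.read_csv("NBA_Stats_23_24.csv")  # assumed local path

# (a) players with FT > 0.5 and AST > 0.7
print(len(df.query("FT > 0.5 and AST > 0.7")))

# (b) top 10 scorers, selected columns, descending by PTS
cols = ["Player", "Tm", "FG", "TOV", "PTS"]
print(df.sort_values("PTS", ascending=False).loc[:, cols].head(10))

# (c) percentage ratios rounded to 2 decimal places
df = df.assign(FGP=(df["FG"] / df["FGA"] * 100).round(2),
               FTP=(df["FT"] / df["FTA"] * 100).round(2))

# (d) per-team offensive rebound stats, sorted by the team average
print(df.groupby("Tm")["ORB"].agg(["mean", "min", "max"])
        .sort_values("mean", ascending=False))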
Question 2. (50 pts total) For this question, you will first need to read section 5.3.1 in the R for Data Science book (https://r4ds.hadley.nz/data-tidy#sec-billboard). Grab the dataset "billboard" from the tidyr package (tidyr::billboard), and tidy it as shown in the case study before answering the following questions. The dataset is also available on the Modules page under Datasets for Assignments on Canvas. Note: if you are using Pandas you can perform these same operations by just replacing the pivot_longer() function with melt() and the pivot_wider() function with pivot().
a) (5 pts) Explain why the line mutate(week = parse_number(week)) is necessary to properly tidy the data. What happens if you skip this line?
b) (5 pts) How many entries are removed from the dataset when you set values_drop_na to true in the pivot_longer command (in this dataset)?
c) (5 pts) Explain the difference between an explicit and an implicit missing value, in general. Can you find any implicit missing values in this dataset? If so, where?
d) (5 pts) Looking at the features (artist, track, date.entered, week, rank) in the tidied data, are they all appropriately typed? Are there any features you think would be better suited to a different type? Why or why not?
e) (5 pts) Generate an informative visualization which shows something about the data. Give a brief description of what it shows, and why you thought it would be interesting to investigate.
f) (5 pts) Generate a line plot showing the rank progression of a specific song over time. You can choose the song you like best from the dataset. (Hint: higher ranks are better, so reverse your axis appropriately.) Briefly describe what the plot shows.
g) (8 pts) Produce a barplot to show the count of songs per artist in the dataset. Limit the plot to the top 15 artists by number of songs. What are your thoughts about this top 15 list? Were you surprised by the presence of any particular artist?
h) (12 pts) Suppose you have the following dataset called RevQtr (you can download this dataset from the Modules page, under Datasets for Assignments, on Canvas):
Group Year Qtr.1 Qtr.2 Qtr_3 Qtr.4
1 2022 61 24 81 70
1 2023 30 92 96 84
1 2024 84 97 33 12
2 2022 31 62 11 97
2 2023 39 47 11 73
2 2024 69 30 42 85
3 2022 67 31 98 58
3 2023 68 51 69 89
3 2024 24 71 71 56
4 2022 71 60 64 73
4 2023 12 60 16 30
4 2024 82 48 27 13
The table consists of 6 columns. The first shows the Group code, the second shows the year, and the last four columns provide the revenue for each quarter of the year. Re-structure this table and show the code you would write to tidy the dataset (using gather()/pivot_longer() and separate()/pivot_wider(), or melt() and pivot()) such that the columns are organized as: Group, Year, Interval_Type, Interval_ID and Revenue. Note: here the entire Interval_Type column will contain the value 'Qtr' since the dataset provides revenue for every quarter. The Interval_ID will contain the quarter number. Below is an instance of a row of the re-structured table:
Group Year Interval_Type Interval_ID Revenue
1 2022 Qtr 1 61
How many rows does the new dataset have?
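For 2(h), here is a minimal pandas sketch of the melt-based restructuring; the file name RevQtr.csv is an assumption:

import pandas as pd

rev = pd.read_csv("RevQtr.csv")  # assumed file name for the RevQtr dataset

# Pivot the four quarter columns into rows, then split the "Qtr.1" style
# names into Interval_Type ("Qtr") and Interval_ID (the quarter number).
long = rev.melt(id_vars=["Group", "Year"],
                var_name="Interval", value_name="Revenue")
long[["Interval_Type", "Interval_ID"]] = (
    long["Interval"].str.extract(r"([A-Za-z]+)[._](\d)"))
long = long.drop(columns="Interval")[
    ["Group", "Year", "Interval_Type", "Interval_ID", "Revenue"]]
print(long.shape)  # 12 (Group, Year) rows x 4 quarters

The regex separator class [._] handles both the "Qtr.1" and the "Qtr_3" naming in the raw columns.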


[SOLVED] Cpts 475/575: data science assignment 2: r basics and exploratory data analysis

1 (50 points). This exercise relates to the Red Wine Quality dataset (winequality-red.csv), which can be found under the Datasets modules in Canvas. The dataset contains a number of physicochemical test variables for 1599 different red wine variants of the Portuguese "Vinho Verde" wine. The variables are
• fixed_acidity
• volatile_acidity
• citric_acid
• residual_sugar
• chlorides
• free_sulfur_dioxide
• total_sulfur_dioxide
• density
• pH
• sulphates
• alcohol
• quality (the output variable, based on sensory data; a score between 0 and 10)
Before reading the data into R or Python, you can view it in Excel or a text editor. For each of the following questions, include the code you used to complete the task as your response, along with any plots or numeric outputs produced. You may omit outputs that are not relevant (such as dataframe contents), but still include all of your code.
(a, 6 points) Use the read.csv() function to read the data into R, or the csv library to read in the data with Python. In R you will load the data into a dataframe. In Python you may store it as a list of lists or use the pandas dataframe to store your data. Call the loaded data red_wine_data. Ensure that your column headers are not treated as a row of data.
(b, 8 points) Find the median quality of all the wine samples. Then find the mean alcohol level for all the wine samples.
(c, 8 points) Produce a scatterplot that shows the relationship between wine density and volatile_acidity. Ensure it has appropriate axis labels and a title. Briefly state if you see any effect of volatile_acidity on density.
(d, 10 points) Create a new qualitative variable, called ALevel, by binning the alcohol variable into two categories (High and Medium). Specifically, divide the data into two groups based on whether the alcohol level exceeds 10.5 or not (alcohol greater than 10.5 is considered High; otherwise it is considered Medium). Now produce side-by-side boxplots of the ratio of sulphates to chlorides (hint: create a new variable that calculates sulphates / chlorides) for each of the two ALevel categories. There should be two boxes on your figure, one for High and one for Medium. How many samples are in the High category?
(e, 8 points) Produce a histogram showing the citric_acid numbers for both High and Medium (ALevel) wine samples. You may choose to show both on a single plot (using side-by-side bars) or produce one plot for High samples and one for Medium samples. Ensure whatever figures you produce have appropriate axis labels and a title.
(f, 10 points) Continue exploring the data, producing two new plots of any type, and provide a brief (one- to two-sentence) summary of your hypotheses and what you discover. Feel free to think outside the box on this one, but if you want something to point you in the right direction, look at the summary statistics for various features, and think about what they tell you. Perhaps try plotting various features from the dataset against each other and see if any patterns emerge.
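A minimal pandas sketch of parts (b) and (d), assuming the Canvas file is comma-separated and uses the underscore column names listed above:

import pandas as pd
import matplotlib.pyplot as plt

red_wine_data = pd.read_csv("winequality-red.csv")  # the UCI original may need sep=";"

# (b) median quality and mean alcohol level
print(red_wine_data["quality"].median(), red_wine_data["alcohol"].mean())

# (d) bin alcohol into High/Medium and boxplot the sulphates/chlorides ratio
red_wine_data["ALevel"] = red_wine_data["alcohol"].gt(10.5).map(
    {True: "High", False: "Medium"})
red_wine_data["ratio"] = red_wine_data["sulphates"] / red_wine_data["chlorides"]
red_wine_data.boxplot(column="ratio", by="ALevel")
plt.suptitle("Sulphates/chlorides ratio by alcohol level")
plt.show()
print(red_wine_data["ALevel"].value_counts())  # sample count per category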
2 (50 points). This exercise involves the Bike Sharing dataset (bikes.csv), which can be found under the Datasets modules in Canvas. The features of the dataset are:
• date: Date of the observation
• season: Season (1: winter, 2: spring, 3: summer, 4: fall)
• holiday: Whether the day is a holiday (1: yes, 0: no)
• workingday: Whether the day is a working day (1: yes, 0: no)
• weather: Weather situation (1: clear, 2: misty/cloudy, 3: light snow/rain, 4: heavy rain/snow)
• temp: Temperature in degrees Celsius
• atemp: "Feels like" temperature in degrees Celsius
• humidity: Relative humidity in %
• windspeed: Wind speed (km/h)
• count: Count of total rental bikes
(a, 6 points) Specify which of the predictors are quantitative (measuring numeric properties such as size or quantity) and which are qualitative (measuring non-numeric properties such as type, category, boolean variable, etc.). Keep in mind that a qualitative variable may be represented as a quantitative type in the dataset, or the reverse. Adjust the types of your variables based on your findings if necessary.
(b, 8 points) What are the range, mean, and standard deviation of each quantitative predictor? Which season has the highest average bike rental count?
(c, 8 points) Produce boxplots of bike rental counts by weather condition. Your figure should have a boxplot for each weather condition (1 through 4). Which weather condition has the highest median bike rental count?
(d, 10 points) Produce a bar plot showing the count of rentals for each month of the year. (Hint: you can extract the month from the date variable using the format function in R.) Which month has the highest rentals?
(e, 10 points) Using the full dataset, investigate the relationships between predictors graphically, using scatterplots, correlation scores, or other tools of your choice. Create a correlation matrix for the relevant quantitative variables.
(f, 8 points) Suppose that we wish to predict the total count of bike rentals based on the other variables. Which, if any, of the other variables might be useful in predicting the bike rental count? Justify your answer based on the prior correlations.
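A minimal pandas sketch for parts (b), (c) and (e), assuming bikes.csv is in the working directory:

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv("bikes.csv")

# (b) which season has the highest average rental count
print(bikes.groupby("season")["count"].mean().sort_values(ascending=False))

# (c) boxplots of rental counts by weather condition
bikes.boxplot(column="count", by="weather")
plt.show()

# (e) correlation matrix of the quantitative variables
quant = ["temp", "atemp", "humidity", "windspeed", "count"]
print(bikes[quant].corr())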


[SOLVED] Cpts 475/575: data science assignment 1: create data science

As I explained in class, the purpose of this task is to create a visual data science profile of yourself. Specifically, you will create two instances of profiles. The first will show the way you see yourself now. The second will show how you would like to see yourself by the end of the course. The profile is simple. On the horizontal axis you will have seven "areas of skills" that could generally be regarded as important to Data Science: 1) Computer Science. 2) Math. 3) Statistics. 4) Machine Learning. 5) Domain expertise. 6) Data visualization. 7) Communication and presentation skills. On the vertical axis you will have a relative scale (think percentage) of your skill level in each of these areas. The area in which you have the strongest skill will be close to 100, and the area in which you think you have very little skill would be close to zero.

As an example, see the slide in the lecture slides of Aug 23 that shows the data science profile of the author of the book "Doing Data Science". For context, the author, Rachel Schutt, has a PhD in Statistics and has held several senior and executive-level Data Science positions in industry. Your task is to create your own profile (two, to be exact): one showing current and the other projected. You are still a student and you may not feel you have a lot of skill in some of these areas. Allow yourself a generous interpretation of skill level and keep in mind that this is on a relative scale. Also, keep in mind that it is perfectly okay to have zero skill level in some of these areas. For example, if you are a computer science major, it is natural that "Domain expertise" would be the area in which you have the lowest skill level among the seven, and it is okay for it to be close to zero. That said, if you have had an internship somewhere, or an interest/hobby that you think has helped you acquire some expertise in an area, you could take that into account in deciding the level of your "Domain expertise". In any event, make sure to mention what the domain is if you indicate your domain expertise to be non-zero. You can use any tool (Excel, R, Python, etc.) you wish to make the plots; a simple sketch appears at the end of this listing. Here are a few associated presentation considerations and discussion points you are asked to address as part of this task.

1.a. (40 points) The areas on the horizontal axis could be ordered in a number of different ways. What ordering in your opinion would be most effective (and aesthetically pleasing), and why? Create your profile in the order you chose.

1.b. (10 points) Is there a skill (bucket) you think should be added to this data science profile? A skill you think should be removed? Specify and justify briefly.

Task 2 (50 points total) As you recall, we briefly discussed the article "Data Science and Prediction" by Vasant Dhar in class in connection with the topic "what is data science?" A link to a copy of the article is posted on Canvas. Read the article and briefly answer the following questions.

2.a. (15 points) The author identifies a few ways in which data science differs from statistics. What are those ways?

2.b. (25 points) In the section of the article headed "Knowledge Discovery" (pages 70 to 72 of the article), the author makes a distinction between domains in terms of the predictive power of their theories (models). Specifically, the author points out that models in the physical sciences are generally expected to be "complete", whereas in the social sciences they are generally "incomplete". The author discusses ways in which "big data" could potentially put domains on both ends of this spectrum on firmer ground in terms of theory development. Give a brief summary of the ways the author identifies. Do you see any additional ways beyond what the author sees? (If the discussion in this section of the article resonated in some ways with your own research or work you do, feel free to incorporate that in your answer.)

2.c. (10 points) Imagine you were asked to write a "headline" (as you see in newspapers) for this article, followed by two or three very telling summary sentences. What would your headline and the summary sentences be?

Weight: Task 1 carries 50%, broken down as 40% for 1.a and 10% for 1.b. Task 2 carries 50%, broken down as 15% for 2.a, 25% for 2.b and 10% for 2.c.
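One simple way to draw the two Task 1 profiles is a grouped bar chart; a minimal matplotlib sketch with purely illustrative numbers (substitute your own self-assessment):

import matplotlib.pyplot as plt

areas = ["CS", "Math", "Stats", "ML", "Domain", "Visualization", "Communication"]
current = [80, 60, 50, 40, 20, 55, 70]    # illustrative values only
projected = [85, 65, 70, 65, 30, 75, 80]  # illustrative values only

x = range(len(areas))
plt.bar([i - 0.2 for i in x], current, width=0.4, label="Now")
plt.bar([i + 0.2 for i in x], projected, width=0.4, label="End of course")
plt.xticks(list(x), areas, rotation=30)
plt.ylabel("Relative skill level (%)")
plt.legend()
plt.tight_layout()
plt.show()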


[SOLVED] Aiml 231 assignment 3: regression, clustering and nns

This question involves using linear regression techniques to predict the fuel efficiency, measured in miles per gallon (MPG), of vehicles based on various attributes in the Auto MPG dataset. This analysis will help understand the influence of different vehicle characteristics, such as engine size, weight, and horsepower, on fuel economy. The Auto MPG dataset describes city-cycle fuel consumption in MPG with several car attributes such as car weight, displacement, horsepower, etc. You will use the Auto MPG dataset provided by seaborn.load_dataset.
Data Loading and Preprocessing
• Load the Auto MPG dataset using seaborn.load_dataset('mpg').
• Use the function train_test_split from sklearn.model_selection to perform an 80/20 split to form the training and test sets. Set the random seed to 231 for reproducibility.
(a) Conduct exploratory data analysis (EDA) to visualize and summarize the training set. You should include histograms to show the distributions of variables and scatter plots to understand the relationships between each pair of variables. Highlight important patterns in the report.
(b) Data preprocessing before linear regression:
• Examine the dataset to find any missing values. Implement an appropriate imputation technique to manage missing data. Clearly document the method you choose for imputation.
• Examine the dataset to find any categorical variables. Review the encoding techniques discussed in our lectures and select the most appropriate method for each categorical variable in the dataset, considering factors such as the number of categories and the ordinal nature of the data. Document the chosen encoding methods and provide a brief justification for your choice.
(c) Construct a linear regression model to predict the MPG of a vehicle using the dataset's features. Report the coefficients and the training and test performance of your model using R-squared and mean squared error (MSE).

Explore and compare the performance of K-means clustering and hierarchical clustering on a synthetic dataset to identify natural groups within the data.
Data Preparation
• Use make_blobs from sklearn.datasets to generate a dataset with four features, three clusters, and 300 samples. Leave the other parameters at their default values and set the random seed to 231 for reproducibility.
(a) Implement K-means clustering using sklearn.cluster.KMeans on the dataset. Determine the best K by evaluating the silhouette scores for various K values ranging from 2 to 5. (A minimal sketch of this sweep appears at the end of this question.)
(b) Visualisation of your clusters with Principal Component Analysis (PCA): utilize PCA to reduce the dimensionality of your dataset. Specifically, project the data onto the first two principal components, which will serve as the new axes for visualisation. Construct and present a scatter plot using these two principal components. Each cluster should have a different color.
(c) Apply hierarchical clustering to the same dataset using sklearn.cluster.AgglomerativeClustering with the following linkage methods: single, complete, and average. Create dendrograms using scipy.cluster.hierarchy.dendrogram to visually represent the clusters. Ensure that each dendrogram is clearly labeled. Compare the effect of these linkage methods on creating clusters in this scenario.
(d) Discuss the advantages and disadvantages of hierarchical clustering compared with K-means clustering in this scenario.
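A minimal sketch of the silhouette sweep for (a) above, under the stated data-generation settings:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, y = make_blobs(n_samples=300, n_features=4, centers=3, random_state=231)

# Sweep K from 2 to 5 and report the silhouette score for each K;
# the best K is the one with the highest score.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=231, n_init=10).fit_predict(X)
    print(k, silhouette_score(X, labels))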
This question is to show a basic understanding of neural networks by implementing a multilayer perceptron (MLP) model in PyTorch to classify handwritten digits from the Digits dataset. The dataset contains 1,797 images of handwritten digits, each image being an 8×8 pixel grayscale image of a digit (0-9). Each image is represented as a 64-feature input vector, corresponding to the grayscale values of the pixels. As part of this question, there will be compulsory in-person marking for your code part, which will contribute 10 out of the total 60 marks. You will be required to demonstrate the neural network model you have developed. During the in-person marking session, you will present your code and explain your decision-making process regarding the building and training of your neural network model.
Data Loading and Preprocessing
• Load the Digits dataset using sklearn.datasets.load_digits().
• Split the data into a training set (80%) and a testing set (20%). Set the random seed to 231 for reproducibility.
• Normalize the images by scaling the pixel values to a range of 0 to 1.
• Convert the datasets into PyTorch tensors and create DataLoader objects for both the training and testing sets.
(a) Define a neural network class by extending torch.nn.Module. The network should have one input layer, one hidden layer with 128 neurons, and one output layer. Use the ReLU activation function for the hidden layer, and the softmax activation function for the output layer. Determine the number of neurons in the input and output layers and justify your answer. Implement your neural network class accordingly. (A minimal sketch of such a class appears after the expected outputs below.)
(b) Use torch.nn.CrossEntropyLoss for your loss function, choose an optimizer from torch.optim.SGD and torch.optim.Adam, and set an appropriate learning rate; provide justifications for your choices. Train the model for 15 epochs. After each epoch, print the training loss and accuracy.
(c) After training, evaluate the model on the test set to measure its accuracy. Print the test accuracy and show five example predictions along with their actual labels.
(d) Evaluate and compare the effectiveness of different activation functions, including Sigmoid and Tanh, in place of ReLU.
(e) Considering the network architecture, how would adding more hidden layers or changing the number of neurons in a layer affect the model's performance?
Expected Outputs (remember to put the outputs in your report)
• Training output: at the end of each training epoch, your program should display the average loss and the training accuracy for the epoch. Example output after each epoch:
Epoch 1: Loss = 2.302, Accuracy = 11%
Epoch 2: Loss = 1.904, Accuracy = 32%
...
Epoch 15: Loss = 0.312, Accuracy = 90%
• Testing output: after the model has been trained, report the overall accuracy on the test dataset. Also, include a few example images from the test set alongside their predicted and actual labels to visually demonstrate the model's performance. This can be presented in a table or as image plots with captions. For example:
Test Accuracy: 88%
Test Image 1: Predicted Label = 3, Actual Label = 3
Test Image 2: Predicted Label = 7, Actual Label = 7
...
Test Image 5: Predicted Label = 4, Actual Label = 9
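A minimal sketch of the network class for (a), referenced above; note that torch.nn.CrossEntropyLoss applies log-softmax internally, so the forward pass here returns raw logits, with softmax applied explicitly only when probabilities are needed:

import torch
import torch.nn as nn

class DigitsMLP(nn.Module):
    """One hidden layer with 128 neurons, as the question specifies."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(64, 128)   # 64 input features (8x8 pixels)
        self.out = nn.Linear(128, 10)      # 10 digit classes

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        # Return raw logits; CrossEntropyLoss handles the softmax step.
        # Use torch.softmax(self.out(x), dim=1) when probabilities are needed.
        return self.out(x)

model = DigitsMLP()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumed choice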
Assessment Format: You can use any font to write the report, with a minimum of single spacing and 11-point size (handwriting is not permitted unless approved by the lecturers). Reports are expected to be at most 8 pages covering all the questions described above.
Late Penalty: Late submissions for assignments will be managed under the "Three Late Day Policy". You will have three automatic extension days, which can be applied to any assignments throughout the course. No formal application is required; instead, any remaining late hours will be automatically deducted when submitting assignments after the due date. You have the flexibility to use only a portion of your late days and retain the remainder for future use. Please note that these three days are for the whole course, not for each assignment. The penalty for assignments that are handed in late without prior arrangement (or use of "late days") is one grade reduction per day. Assignments that are more than one week late will not be marked. If you require an extension due to exceptional circumstances (like medical), you need to email the course coordinator.
Submission: You are required to submit a .pdf report and your source code files as a Jupyter notebook (.ipynb) and/or a Python code (.py) file through the web submission system on the AIML231 course website by the due time.


[SOLVED] Aiml 231 assignment 2: machine learning pipeline

The goal of this assignment is to help you understand data manipulation and visualisation tools for machine learning. The purpose is to implement common data handling methods on real-world observations. To validate the effectiveness of the implemented methods, you are also required to perform data analysis tasks to draw useful conclusions. In particular, the following topics should be reviewed:
• CRISP-DM
• Machine Learning Pipeline
• Exploratory Data Analysis (EDA)
• Data Preprocessing
• Feature Selection and Feature Construction
The assignment requires the use of Python, numpy, matplotlib, scipy, and scikit-learn, and serves as an introduction to all those tools. You can run Python based on the template Jupyter notebook and Python code templates provided. In this assignment, your task is to build a machine learning pipeline to predict customers' credit risks (good or bad) for a German bank. The dataset Credit and the feature descriptions can be downloaded from the Assignment page.

The first part of this assignment is to explore the data and to define the machine learning task. You should:
1. Perform EDA as an initial step to analyse the Credit dataset. The analyses should be conducted on the whole dataset, i.e., on the "Data.csv" in the data folder. The analyses should explore the data from four different aspects:
• Describe the summary statistics of the data. This should include the number of instances and the number of features. Report the number of categorical and numerical features separately.
• Identify the top three numerical features with the highest correlation with the target variable Credit Risk according to the Pearson correlation, and report their correlation values.
• Plot the distributions of the three numerical features identified in the previous question and the target variable using histograms, one histogram for each feature/variable. Describe how to determine the number of bins to draw the histograms. Based on the histograms, describe the shape of their distributions (i.e., positive, negative, or zero) with respect to their skewness and kurtosis (use Scipy for obtaining the skewness and kurtosis values).
• Check for missing values. Write a paragraph to briefly summarise how many features contain missing values and the percentage of missing values for each incomplete feature.
2. Among the three machine learning tasks (classification, regression, and clustering), which one does this problem belong to? Justify your answer. Provide answers to the above questions in your report. Submit your Jupyter Notebook file (.ipynb) or your Python file (.py) that shows how you get the answers.

It is crucial to partition the data prior to preprocessing in any supervised learning task to prevent data leakage. Therefore, we must split the whole dataset into a training set and a test set. Any preprocessing model must be trained on the training set only. The trained preprocessing model can then be applied to process both the training set and the test set. You should determine the appropriate approaches to perform the following preprocessing steps:
• Encoding categorical data to numerical data.
• Handling missing data.
• Normalising/standardising the data.
Describe your chosen approaches and your rationale for selecting them in your report. Show the preprocessing steps in the preprocess() function, which takes the original training and test sets as its input and outputs the processed training and test sets.
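A minimal sketch of a leakage-safe preprocess(), with median imputation, standardisation, and one-hot encoding as assumed choices (substitute your own documented choices):

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

def preprocess(train, test):
    """Fit all preprocessing on the training set only to avoid leakage."""
    train, test = train.copy(), test.copy()
    num_cols = train.select_dtypes("number").columns
    cat_cols = train.columns.difference(num_cols)

    # Impute numeric missing values with the training-set median.
    imp = SimpleImputer(strategy="median").fit(train[num_cols])
    train[num_cols] = imp.transform(train[num_cols])
    test[num_cols] = imp.transform(test[num_cols])

    # Standardise numeric features with training-set statistics.
    sc = StandardScaler().fit(train[num_cols])
    train[num_cols] = sc.transform(train[num_cols])
    test[num_cols] = sc.transform(test[num_cols])

    # One-hot encode categoricals, aligning test columns to the training set.
    train = pd.get_dummies(train, columns=list(cat_cols))
    test = pd.get_dummies(test, columns=list(cat_cols))
    test = test.reindex(columns=train.columns, fill_value=0)
    return train, test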
All following questions (Parts 4 and 5) should use the processed sets.

A straightforward feature selection approach is to rank features based on their relevance to the target output. Then, we can select the top-ranked features for use in our machine learning task. In this part of the assignment, you will use Mutual Information to rank features. The higher the mutual information score, the better the feature. You must use sklearn.feature_selection.mutual_info_classif to calculate the mutual information between each feature and the target variable. You should:
1. Implement the feature_ranking() method, which takes a training set as its input and outputs a feature subset containing the top five features.
2. Write a short report that includes:
• The top five features selected by the aforementioned feature selection process.
• An evaluation and comparison of the performance on the test set using the subset containing the top five features versus the original feature set. Determine which one is better and provide your justification.
• A heatmap showing the Pearson correlation between the top five features. In your report, you should show your heatmap, provide an analysis of the visualisation, and interpret how the features relate to each other.

Sequential Forward Feature Selection (SFFS) is a well-known feature selection method. The task of this part is to implement the SFFS algorithm in Python based on the provided code template and then examine the selected features. The pseudocode of the algorithm can be seen in Algorithm 1.
Algorithm 1 Sequential Forward Feature Selection
1: Input: Training set with D features, number of selected features d
2: Output: A selected feature set S
3: Initialize the set of selected features: S ← ∅
4: Initialize the set of remaining features: R ← {f1, f2, ..., fD}
5: for i = 1 to d do
6:   Find the feature f* in R that achieves the best score when combined with S
7:   Remove f* from R
8:   Add f* to S
9: end for
10: return S
5.1 Implementation
You will need to complete two methods, sequential_feature_selection() and sequential_score() (a minimal sketch of both appears after the report items below):
• sequential_feature_selection() is a method that takes a training set and the number of features d that you wish to select. The method starts with an empty set, and iteratively adds one feature at a time to the set until d features are selected. The method finally outputs the selected feature set. sequential_feature_selection() will use sequential_score() to determine which feature to add at each step.
• sequential_score() is a method that takes a feature subset S and a training set. It uses 10-fold cross-validation to evaluate the classification performance of the feature subset S. In this part, we will use KNN (K = 3) as the classifier.
5.2 Report
Write a concise report that includes:
1. Is the implemented SFFS algorithm a filter, embedded, or wrapper feature selection approach? Justify your answer.
2. Is the implemented SFFS algorithm a feature ranking or a feature subset selection approach? Justify your answer.
3. Set the number of selected features d to five and run the sequential selection algorithm.
• Report the five selected features and the testing performance achieved when using these five features.
• Use a heatmap to show the Pearson correlation between the five selected features. Provide an analysis of the visualisation, interpreting how the features relate to each other.
• Compare the testing performance of the implemented SFFS algorithm with the testing performance of the feature ranking algorithm. Discuss which one is better and provide your justification.
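A minimal sketch of the two methods referenced in 5.1, assuming X_train is a pandas DataFrame of features and y_train the target column:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def sequential_score(S, X_train, y_train):
    # 10-fold CV accuracy of KNN (K = 3) on the feature subset S
    knn = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(knn, X_train[list(S)], y_train, cv=10).mean()

def sequential_feature_selection(X_train, y_train, d):
    selected, remaining = [], list(X_train.columns)
    for _ in range(d):
        # Add the feature whose inclusion gives the best CV score (Alg. 1, line 6).
        best = max(remaining,
                   key=lambda f: sequential_score(selected + [f], X_train, y_train))
        remaining.remove(best)
        selected.append(best)
    return selected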
Assessment
• Format: You can use any font to write the report, with a minimum of single spacing and 11-point size (handwriting is not permitted without approval from the course coordinator). Reports are expected to be at most 8 pages covering all the questions described above.
• Late Penalties: Late submissions for assignments will be managed under the "Three Late Day Policy". You will have three automatic extension days, which can be applied to any assignments throughout the course. No formal application is required; instead, any remaining late hours will be automatically deducted when submitting assignments after the due date. You have the flexibility to use only a portion of your late days and retain the remainder for future use. Please note that these three days are for the whole course, not for each assignment. The penalty for assignments that are handed in late without prior arrangement (or use of "late days") is one grade reduction per day. Assignments that are more than one week late will not be marked. If you require an extension due to exceptional circumstances (like medical), you need to email the course coordinator.


[SOLVED] ALY3015 Assignment 4 Web

ALY3015 Assignment 4
For the "mtcars" dataset within R:
i) Perform PCA on the centered dataset without the "mpg" variable. Which variables would you exclude from the dataset prior to doing PCA? Plot the screeplot and the biplot of the PCA results. Describe the findings from both plots. (15 points)
ii) What fraction of the total variation is explained by the 1st PC? Which variables have the top 3 positive and negative loadings on the 1st PC? Describe your findings. (10 points)
iii) Calculate the correlation between the "mpg" variable and all PCs. State one conclusion from your findings from questions ii and iii. (5 points)
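If you prototype outside R, here is a minimal Python sketch of i) and ii); loading mtcars via statsmodels is an assumption that requires internet access:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from statsmodels.datasets import get_rdataset

mtcars = get_rdataset("mtcars").data       # one way to load mtcars in Python
X = mtcars.drop(columns=["mpg"])           # exclude the response variable
X = (X - X.mean()) / X.std()               # center (and scale) the data

pca = PCA().fit(X)
plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         pca.explained_variance_ratio_, marker="o")  # screeplot
plt.xlabel("Principal component")
plt.ylabel("Variance explained")
plt.show()
print(pca.explained_variance_ratio_[0])   # fraction explained by the 1st PC
print(pd.Series(pca.components_[0], index=X.columns).sort_values())  # PC1 loadings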


[SOLVED] ECE 219 Winter 2025 Project 2 Data Representations and Clustering Python

Large-Scale Data Mining: Models and Algorithms
ECE 219 Winter 2025
Project 2: Data Representations and Clustering
Due February 07, 2025 by 11:59 pm

Introduction
Machine learning algorithms are applied to a wide variety of data, including text and images. Before applying these algorithms, one needs to convert the raw data into feature representations that are suitable for downstream algorithms. In Project 1, we studied feature extraction from text data, and the downstream task of classification. We also learned that reducing the dimension of the extracted features often helps with a downstream task. In this project, we explore the concepts of feature extraction and clustering together. In an ideal world, all we need are data points (encoded using certain features) and AI should be able to find what is important to learn, or more specifically, determine what the underlying modes or categories in the dataset are. This is the ultimate goal of General AI: the machine is able to bootstrap a knowledge base, acting as its own teacher and interacting with the outside world, exploring so as to be able to operate autonomously in an environment.

We first explore this field of unsupervised learning using textual data, which is a continuation of concepts learned in Project 1. We ask if a combination of feature engineering and clustering techniques can automatically separate a document set into groups that match known labels.

Next we focus on a new type of data, i.e. images. Specifically, we first explore how to use "deep learning" or "deep neural networks (DNNs)" to obtain image features. Large neural networks have been trained on huge labeled image datasets to recognize objects of different types from images. For example, networks trained on the ImageNet dataset can classify more than one thousand different categories of objects. Such networks can be viewed as comprising two parts: the first part maps a given RGB image into a feature vector using convolutional filters, and the second part then classifies this feature vector into an appropriate category, using a fully-connected multi-layered neural network (we will study such NNs in a later lecture). Such pre-trained networks could be considered experienced agents that have learned to discover features that are salient for image understanding. Can one use the experience of such pre-trained agents in understanding new images that the machine has never seen before? It is akin to asking a human expert on forensics to explore a new crime scene. One would expect such an expert to be able to transfer their domain knowledge to a new scenario. In a similar vein, can a pre-trained network for image understanding be used for transfer learning? One could use the output of the network in the last few layers as expert features. Then, given a multi-modal dataset consisting of images from categories that the DNN was not trained for, one can use feature engineering (such as dimensionality reduction) and clustering algorithms to automatically extract unlabeled categories from such expert features.

For both the text and image data, one can use a common set of multiple evaluation metrics to compare the groups extracted by the unsupervised learning algorithms to the corresponding ground-truth human labels.

Clustering Methods
Clustering is the task of grouping a dataset in such a way that data points in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
Thus, there is an inherent notion of a metric that is used to compute similarity among data points, and different clustering algorithms differ in the type of similarity measure they use, e.g., Euclidean vs Riemannian geometry. Clustering algorithms are considered "unsupervised learning", i.e. they do not require labels during training. In principle, if two categories of objects or concepts are distinct from some perspective (e.g. visual or functional), then data points from these two categories, when properly coded in a feature space and augmented with an associated distance metric, should form distinct clusters. Thus, if one can perform perfect clustering, then one can discover and obtain computational characterizations of categories without any labeling. In practice, however, finding such optimal choices of features and metrics has proven to be a computationally intractable task, and any clustering result needs to be validated against tasks for which one can measure performance. Thus, we use labeled datasets in this project, which allows us to evaluate the learned clusters by comparing them with ground-truth labels. Below, we summarize several clustering algorithms.

K-means: K-means clustering is a simple and popular clustering algorithm. Given a set of data points {x_1, ..., x_N} in multidimensional space, and a hyperparameter K denoting the number of clusters, the algorithm finds the K cluster centers such that each data point belongs to exactly one cluster. This cluster membership is found by minimizing the sum of the squares of the distances between each data point and the center of the cluster it belongs to. If we define μ_k to be the "center" of the k-th cluster, and r_nk ∈ {0, 1} to be the indicator that is 1 exactly when data point x_n is assigned to cluster k, then our goal is to find the r_nk's and μ_k's that minimize

J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_nk ||x_n − μ_k||^2.

The approach of the K-means algorithm is to repeatedly perform the following two steps until convergence:
1. (Re)assign each data point to the cluster whose center is nearest to the data point.
2. (Re)calculate the position of the centers of the clusters: set the center of each cluster to the mean of the data points that are currently within the cluster.
The center positions may be initialized randomly.

Hierarchical Clustering: Hierarchical clustering is a general family of clustering algorithms that builds nested clusters by merging or splitting them successively. This hierarchy of clusters is represented as a tree (or dendrogram). A flat clustering result is obtained by cutting the dendrogram at a level that yields a desired number of clusters.

DBSCAN: DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, finds core samples of high density and expands clusters from them. It is a density-based, non-parametric clustering algorithm: given a set of points, the algorithm groups together points that are closely packed (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away).

HDBSCAN: HDBSCAN extends DBSCAN by converting it into a hierarchical clustering algorithm, and then using an empirical technique to extract a flat clustering based on the stability of clusters (similar to the elbow method in K-means). The resulting algorithm gets rid of the hyperparameter "epsilon", which is necessary in DBSCAN (see here for more on that).

Common Clustering Evaluation Metrics
In order to evaluate a clustering pipeline, one can use the ground-truth class labels and compare them with the cluster labels.
Common Clustering Evaluation Metrics

In order to evaluate a clustering pipeline, one can use the ground-truth class labels and compare them with the cluster labels. This analysis determines the quality of the clustering algorithm in recovering the underlying ground-truth labels. It also indicates whether the adopted feature extraction and dimensionality reduction methods retain enough information about the ground-truth classes. Below we describe several evaluation metrics available in sklearn.metrics. Note that for the clustering sub-tasks, you do not need to separate your data into training and test sets. Only Question 25 requires you to split the data.

Homogeneity is a measure of how "pure" the clusters are. If each cluster contains only data points from a single class, homogeneity is satisfied.

Completeness indicates how many of the data points of a class are assigned to the same cluster.

V-measure is the harmonic mean of the homogeneity score and the completeness score.

Adjusted Rand Index is similar to accuracy; it computes a similarity between the clustering labels and the ground-truth labels by counting all pairs of points that fall either in the same cluster and the same class, or in different clusters and different classes.

Adjusted mutual information score measures the mutual information between the cluster label distribution and the ground-truth label distribution.

Dimensionality Reduction Methods

In Project 1, we studied SVD/PCA and NMF as linear dimensionality reduction techniques. Here, we consider some additional non-linear methods.

Uniform Manifold Approximation and Projection (UMAP): The UMAP algorithm constructs a graph-based representation of the high-dimensional data manifold and learns a low-dimensional representation space based on the relative inter-point distances. UMAP allows more choices of distance metric besides the Euclidean distance. In particular, we are interested in the "cosine distance" for text data because, as we shall see, it bypasses the magnitude of the vectors, meaning that the length of the documents does not affect the distance metric.

Autoencoders: An autoencoder is a special type of neural network that is trained to copy its input to its output. For example, given an image of a handwritten digit, an autoencoder first encodes the image into a lower-dimensional latent representation, then decodes the latent representation back to an image. An autoencoder learns to compress the data while minimizing the reconstruction error. Further details can be found in chapter 14 of [4].

Part 1 - Clustering on Text Data

In this part of the project, we will work with the "20 Newsgroups" dataset, a collection of approximately 20,000 documents partitioned (nearly) evenly across 20 different newsgroups (newsgroups are discussion groups like forums, which originated during the early age of the Internet), each corresponding to a different topic. Use the fetch_20newsgroups function from scikit-learn to load the dataset. Detailed usage of the dataset and sample code can be found at this link.

To get started with a simple clustering task, we work with a well-separable subset of samples from the larger dataset. Specifically, we define two classes comprising the following categories.

Clustering with Sparse Text Representations

1. Generate sparse TF-IDF representations: Following the steps in Project 1, transform the documents into TF-IDF vectors. Use min_df=3, exclude the stopwords (no need to do stemming or lemmatization), and remove the headers and footers. No need to do any additional data cleaning. A minimal loading sketch is given below.

QUESTION 1: Report the dimensions of the TF-IDF matrix you obtain.
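A minimal sketch of step 1, assuming the vectorizer settings above; the two class-defining category lists from the spec are not reproduced in this copy, so the categories value below is a placeholder to be filled in:

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder: substitute the two groups of categories given in the spec.
# categories=None would load all 20 newsgroups instead.
categories = None

docs = fetch_20newsgroups(subset='all', categories=categories,
                          remove=('headers', 'footers'),
                          shuffle=True, random_state=0)
vectorizer = TfidfVectorizer(min_df=3, stop_words='english')
X_tfidf = vectorizer.fit_transform(docs.data)  # sparse (n_docs, n_terms)
print(X_tfidf.shape)  # Question 1: dimensions of the TF-IDF matrix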
2. Clustering: Apply K-means clustering with k = 2 using the TF-IDF data. Note that the KMeans class in sklearn has parameters named random_state, max_iter and n_init. Please use random_state=0, max_iter >= 1000 and n_init >= 30. You can refer to sklearn - Clustering text documents using k-means for a basic workflow.

(a) Given the clustering result and ground-truth labels, the contingency table A is the matrix whose entry A_ij is the number of data points that belong to the i-th class and the j-th cluster.

QUESTION 2: Report the contingency table of your clustering result. You may use the provided plotmat.py to visualize the matrix. Does the contingency matrix have to be square-shaped?

QUESTION 3: Report the 5 clustering measures explained in the introduction for K-means clustering.

Clustering with Dense Text Representations

As you may have observed, high-dimensional sparse TF-IDF vectors do not yield a good clustering result, especially when K-means clustering is used. One of the reasons is that in a high-dimensional space, the Euclidean distance is no longer a good metric, in the sense that the distances between data points tend to be almost the same (see [1]).

K-means clustering has other limitations. Since its objective is to minimize the sum of within-cluster l2 distances, it implicitly assumes that the clusters are isotropically shaped, i.e., round-shaped. When the clusters are not round-shaped, K-means may fail to identify them properly. Even when the clusters are round, the K-means algorithm may also fail when the clusters have unequal variances. A direct visualization of these problems can be found at sklearn - Demonstration of k-means assumptions.

In this part we try to find a "better" representation tailored to the performance of the downstream task of K-means clustering. Towards finding a better representation, we can reduce the dimension of our data with different methods before clustering.

1. Generate dense representations for better K-means clustering

(a) First we want to find the effective dimension of the data through inspection of the top singular values of the TF-IDF matrix, and see how many of them are significant in reconstructing the matrix with the truncated SVD representation. A guideline is to see what ratio of the variance of the original data is retained after the dimensionality reduction.

QUESTION 4: Report the plot of the percentage of variance that the top r principal components retain vs. r, for r = 1 to 1000.

Hint: use the explained_variance_ratio_ attribute of TruncatedSVD objects. See the sklearn documentation.

(b) Now, use the following two methods to reduce the dimension of the data. Sweep over the dimension parameters for each method, and choose one that yields better results in terms of clustering purity metrics (see the sketch after Question 7).

• Truncated SVD / PCA. Note that you don't need to perform SVD multiple times: performing SVD with r = 1000 gives you the data projected on all of the top 1000 principal components, so for smaller r's you just need to exclude the least important features.

• NMF

QUESTION 5: Let r be the dimension that we want to reduce the data to (i.e. n_components). Try r = 1-10, 20, 50, 100, 300, and plot the 5 measure scores vs. r for both SVD and NMF. Report a good choice of r for SVD and NMF respectively.

Note: In the choice of r, there is a trade-off between information preservation and the better performance of K-means in lower dimensions.

QUESTION 6: How do you explain the non-monotonic behavior of the measures as r increases?

QUESTION 7: Are these measures on average better than those computed in Question 3?
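A minimal sketch of the clustering-and-scoring loop for Questions 3 and 5, assuming X_tfidf and docs from the loading sketch above; the five metrics are those listed in the introduction, and only a subset of the r values is swept for brevity:

from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD, NMF
from sklearn import metrics

def cluster_scores(X, labels_true, k=2):
    km = KMeans(n_clusters=k, random_state=0, max_iter=1000, n_init=30)
    labels_pred = km.fit_predict(X)
    return (metrics.homogeneity_score(labels_true, labels_pred),
            metrics.completeness_score(labels_true, labels_pred),
            metrics.v_measure_score(labels_true, labels_pred),
            metrics.adjusted_rand_score(labels_true, labels_pred),
            metrics.adjusted_mutual_info_score(labels_true, labels_pred))

# Question 3: the five measures on the sparse TF-IDF matrix directly
print(cluster_scores(X_tfidf, docs.target))

# Question 5: sweep r for SVD (one factorization, then truncate) and NMF
svd_1000 = TruncatedSVD(n_components=1000, random_state=0).fit_transform(X_tfidf)
for r in [1, 2, 5, 10, 20, 50, 100, 300]:   # subset of the spec's r values
    svd_scores = cluster_scores(svd_1000[:, :r], docs.target)
    nmf_r = NMF(n_components=r, random_state=0, max_iter=400).fit_transform(X_tfidf)
    nmf_scores = cluster_scores(nmf_r, docs.target)
    print(r, svd_scores, nmf_scores)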
2. Visualize the clusters

We can visualize the clustering results by projecting the dimension-reduced data points onto a 2-D plane, by once again using SVD, and coloring the points according to:

• the ground-truth class label;
• the clustering label, respectively.

QUESTION 8: Visualize the clustering results for:

• SVD with your optimal choice of r for K-means clustering;
• NMF with your choice of r for K-means clustering.

To recap, you can accomplish this by first creating the dense representations and then once again projecting these representations onto a 2-D plane for visualization.

QUESTION 9: What do you observe in the visualization? How are the data points of the two classes distributed? Is the distribution of the data ideal for K-means clustering?

3. Clustering of the Entire 20 Classes

We have been dealing with a relatively simple clustering task with only two well-separated classes. Now let's face a more challenging one: clustering the entire 20 categories of the 20newsgroups dataset.

QUESTION 10: Load the documents with the same configuration as in Question 1, but for ALL 20 categories. Construct the TF-IDF matrix, reduce its dimensionality using BOTH NMF and SVD (specify the settings you choose and why), and perform K-means clustering with k=20. Visualize the contingency matrix and report the five clustering metrics (DO BOTH NMF AND SVD).

There is a mismatch between cluster labels and class labels. For example, cluster #3 may correspond to class #8. As a result, the high-value entries of the 20 × 20 contingency matrix can be scattered around, making it messy to inspect even if the clustering result is not bad. One can use scipy.optimize.linear_sum_assignment to identify the best-matching cluster-class pairs, and permute the columns of the contingency matrix accordingly. See below for an example:

import numpy as np
from plotmat import plot_mat  # using the provided plotmat.py
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(labels, clustering_labels)
rows, cols = linear_sum_assignment(cm, maximize=True)
plot_mat(cm[rows[:, np.newaxis], cols], xticklabels=cols,
         yticklabels=rows, size=(15, 15))

4. UMAP

The clustering performance is poor for the 20-category data. To see if we can improve this performance, we consider UMAP for dimensionality reduction. UMAP can use cosine distances to compare representations. Consider two documents that are about the same topic and are similar, but one is very long while the other is short. The magnitude of the TF-IDF vector will be high for the long document and low for the short one, even though the orientation of their TF-IDF vectors might be very close. In this case, the cosine distance adopted by UMAP will correctly identify the similarity, whereas the Euclidean distance might fail.

QUESTION 11: Reduce the dimension of your dataset with UMAP, as in the sketch below. Consider the following settings: n_components = [5, 20, 200], metric = "cosine" vs. "euclidean". If the "cosine" metric fails, please look at the FAQ at the end of this spec. Report the permuted contingency matrix and the five clustering evaluation metrics for the different combinations (6 combinations).

QUESTION 12: Analyze the contingency matrices. Which setting works best and why? What about for each metric choice?
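A minimal sketch for Question 11, assuming the umap-learn package; X_tfidf20 and labels20 are placeholder names for the 20-category TF-IDF matrix and its ground-truth labels:

import umap

for n_comp in [5, 20, 200]:
    for metric in ["cosine", "euclidean"]:
        reducer = umap.UMAP(n_components=n_comp, metric=metric, random_state=0)
        X_umap = reducer.fit_transform(X_tfidf20)
        # Cluster and score exactly as before, e.g. with cluster_scores
        # from the earlier sketch, adapted to k=20, then permute and
        # plot the contingency matrix as shown above.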
QUESTION 13: So far, we have attempted K-means clustering with 4 different representation learning techniques (sparse TF-IDF representation, PCA-reduced, NMF-reduced, UMAP-reduced). Compare and contrast the clustering results across the 4 choices, and suggest an approach that is best for the K-means clustering task on the 20-class text data. Use any choice of clustering metrics for your comparison.

Clustering Algorithms that do not explicitly rely on a Gaussian distribution per cluster

While we have successfully shown in the previous section that some representation learning techniques perform better than others for the task of K-means clustering on this text dataset, this sweep only covers half of the end-to-end solution for representation learning. What if we changed the clustering method? In this section we introduce 2 additional clustering algorithms; a sketch covering both follows Question 18.

1. Agglomerative Clustering

The AgglomerativeClustering object performs a hierarchical clustering using a bottom-up approach: each observation starts in its own cluster, and clusters are successively merged together. There are 4 linkage criteria that determine the merge strategy.

QUESTION 14: Use UMAP to reduce the dimensionality properly, and perform Agglomerative clustering with n_clusters=20. Compare the performance of the "ward" and "single" linkage criteria. Report the five clustering evaluation metrics for each case.

2. HDBSCAN

QUESTION 15: Apply HDBSCAN on the UMAP-transformed 20-category data. Use min_cluster_size=100. Vary the min_cluster_size among 20, 100, 200 and report your findings in terms of the five clustering evaluation metrics - you will plot the best contingency matrix in the next question. Feel free to try modifying other parameters in HDBSCAN to get better performance.

QUESTION 16: Plot the contingency matrix for the best clustering model from Question 15. How many clusters are given by the model? What does "-1" mean for the clustering labels? Interpret the contingency matrix considering the answers to these questions.

QUESTION 17: Based on your experiments, which dimensionality reduction technique and clustering method worked best together for the 20-class text data, and why? Follow the table below. If UMAP takes too long to converge, consider running it once and saving the intermediate results in a pickle file.

Hint: DBSCAN and HDBSCAN do not accept the number of clusters as an input parameter, so pay close attention to how the different clustering metrics are being computed for these methods.

QUESTION 18: Extra credit: If you can find creative ways to further enhance the clustering performance, report your method and the results you obtain.
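A minimal sketch for Questions 14-15, assuming X_umap is a UMAP-reduced matrix from the previous step and that the hdbscan package is used (recent scikit-learn versions also ship an equivalent sklearn.cluster.HDBSCAN):

from sklearn.cluster import AgglomerativeClustering
import hdbscan

# Question 14: compare ward vs. single linkage
for linkage in ["ward", "single"]:
    agg = AgglomerativeClustering(n_clusters=20, linkage=linkage)
    agg_labels = agg.fit_predict(X_umap)
    # score agg_labels against the ground truth with the five metrics

# Question 15: vary min_cluster_size for HDBSCAN
for mcs in [20, 100, 200]:
    clusterer = hdbscan.HDBSCAN(min_cluster_size=mcs)
    hdb_labels = clusterer.fit_predict(X_umap)  # label -1 marks noise points
    # score hdb_labels against the ground truth with the five metrics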
Part 2 - Deep Learning and Clustering of Image Data

In this part, we aim to cluster the images of the tf_flowers dataset, which consists of images of five types of flowers. Explore this link to see actual samples of the data.

Extracting meaningful features from images has a long history in computer vision. Instead of considering the raw pixel values as features, researchers have explored various hand-engineered feature extraction methods, e.g. [5]. With the recent rise of "deep learning", these methods have been replaced by appropriate neural networks. In particular, one can adopt a neural network already trained to classify another large dataset of images. These pre-trained networks have been trained to morph the highly non-smooth scatter of images in the high-dimensional space into smooth lower-dimensional manifolds.

In this project, we use a VGG network [6] pre-trained on the ImageNet dataset [7]. We provide a helper codebase (check Week 4 in BruinLearn), which guides you through the necessary steps for loading the VGG network and using it for feature extraction.

QUESTION 19: In a brief paragraph discuss: if the VGG network is trained on a dataset with perhaps totally different classes as targets, why would one expect the features derived from such a network to have discriminative power for a custom dataset?

Use the helper code to load the flowers dataset and extract their features. To perform computations on deep neural networks fast enough, GPU resources are often required. GPU resources can be freely accessed through "Google Colab".

QUESTION 20: In a brief paragraph, explain how the helper code base is performing feature extraction.

QUESTION 21: How many pixels are there in the original images? How many features does the VGG network extract per image, i.e., what is the dimension of each feature vector for an image sample?

QUESTION 22: Are the extracted features dense or sparse? (Compare with the sparse TF-IDF features in text.)

QUESTION 23: In order to inspect the high-dimensional features, t-SNE is a popular off-the-shelf choice for visualizing vision features. Map the features you have extracted onto 2 dimensions with t-SNE. Then plot the mapped feature vectors along the x and y axes, color-coding the data points with the ground-truth labels. Describe your observation. (A minimal sketch follows Question 25.)

While PCA is a powerful method for dimensionality reduction, it is limited to "linear" transformations, which might not be particularly good if a dataset is distributed non-linearly. An alternative approach is the use of an "autoencoder" or UMAP. The helper code has implemented an autoencoder which is ready to use.

QUESTION 24: Report the best result (in terms of rand score) within the table below. For HDBSCAN, introduce a conservative parameter grid over min_cluster_size and min_samples.

Lastly, we can conduct an experiment to ensure that the VGG features are rich enough in information about the data classes. In particular, we can train a fully-connected neural network classifier to predict the labels of the data. For this task, you may use the MLP module provided in the helper code base.

QUESTION 25: Report the test accuracy of the MLP classifier on the original VGG features. Report the same when using the reduced-dimension features (you have freedom in choosing the dimensionality reduction algorithm and its parameters). Does the performance of the model suffer with the reduced-dimension representations? Is it significant? Does the success in classification make sense in the context of the clustering results obtained for the same features in Question 24?
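A minimal sketch for Question 23, assuming vgg_features is the (n_samples, d) feature matrix produced by the helper code and y holds the ground-truth flower labels (both names are placeholders):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

emb = TSNE(n_components=2, random_state=0).fit_transform(vgg_features)
plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=5)
plt.colorbar(label="ground-truth class")
plt.title("t-SNE of VGG features")
plt.show()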
Part 3 - Clustering using both image and text

In Parts 1 and 2, we have practiced the art of clustering text and images separately. However, can we map image and text to the same space? In the Pokemon world, the Pokedex catalogs Pokemon's appearances and various metadata. We will build our own Pokedex from the image dataset link and the metadata link. Fortunately, the ECE 219 Gym kindly provides new Pokemon trainers with helper code for data preprocessing and inference. Please find the code in the BruinLearn modules, Week 4.

Each Pokémon may be represented by multiple images and up to two types (for example, Bulbasaur is categorized as both Grass and Poison types). In this section, we will focus on the first image (named 0.jpg) in each folder for our analysis.

We will use the pre-trained CLIP model [8] to illustrate the idea of multimodal clustering. CLIP (Contrastive Language-Image Pretraining) is an innovative model developed by OpenAI, designed to understand and connect concepts from both text and images. CLIP is trained on a vast array of internet-sourced text-image pairs. This extensive training enables the model to understand a broad spectrum of visual concepts and their textual descriptions.

Figure 1: CLIP training summary

CLIP consists of two primary components: a text encoder and an image encoder. The text encoder processes textual data, converting sentences and phrases into numerical representations. Simultaneously, the image encoder transforms visual inputs into a corresponding set of numerical values. These encoders are trained to map both text and images into a shared embedding space, allowing the model to compare and relate the two different types of data directly. The training employs a contrastive learning approach, where the model learns to match corresponding text-image pairs against numerous non-matching pairs. This approach helps the model accurately associate images with their relevant textual descriptions, and vice versa.
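To make the shared embedding space concrete, here is a minimal sketch of extracting CLIP image and text embeddings with the Hugging Face transformers library. The model name, file path and prompts are illustrative assumptions; the project's own helper code may do this differently:

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("0.jpg")  # first image in a Pokemon folder (assumed path)
texts = ["a Grass type Pokemon", "a Poison type Pokemon"]  # illustrative prompts

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# Both embeddings live in the same space, so cosine similarity relates them;
# the same embeddings can also be fed to the clustering pipelines above.
sim = torch.nn.functional.cosine_similarity(image_emb, text_emb)
print(sim)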


[SOLVED] MH6812 Advanced Natural Language Processing with Deep Learning

MH6812: Advanced Natural Language Processing with Deep Learning
Project Proposal Instructions

(a) Team Size: Each team should generally contain 3-5 students. But, if the project is significant enough, then more people may be allowed; please confirm with the instructor.

(c) Proposal: Some information you should think about when determining the topic:

• Goals/Objectives: Describe the goals of your project in terms of a scientific question you are trying to answer – e.g., your goal may be to investigate whether a particular model or technique performs well at a certain task, or whether you can improve a particular model by adding some new variant, or (for theoretical/analytical projects) you might have some particular hypothesis that you seek to confirm or disprove. Otherwise, your goal may simply be to successfully implement a complex neural model and show that it performs well on a given task. Briefly motivate why you chose this goal – why do you think it is important, interesting, challenging and/or likely to succeed? If you have any secondary or stretch goals (i.e. things you will do if you have time), please also describe them.

• NLP tasks: What NLP tasks/applications you intend to consider for your model. Describe the task clearly (i.e., give an example of an input and an output, if applicable).

• Data: The dataset(s) you will use. What kind of preprocessing they need. If you plan to collect your own data, describe how you will do that and how long you expect it to take.

• Neural Models: Describe the models and/or techniques you plan to use. Make it clear which parts you plan to implement yourself, and which parts you will download from elsewhere. If any part of your planned method is original, make it clear.

• Baseline(s): What baselines will you use to compare your model with? Make it clear if these will be implemented by you, downloaded from elsewhere, or if you will just compare with previously published scores.

• Evaluation: How will you evaluate your results? Specify at least one well-defined, numerical, automatic evaluation metric you will use for quantitative evaluation. What existing scores will you be comparing against for this metric? For example, if you're reimplementing or extending a method, state what score(s) the original method achieved; if you're applying an existing method to a new task, mention the state-of-the-art performance on the new task, and say something about how you expect your method to perform compared to other approaches. If you have any particular ideas about the qualitative evaluation you will do, you can describe that too.

• Possible Submission (optional): Do you plan to submit the work to a conference or journal in your field or in NLP? When is the deadline?


[SOLVED] 4CCEIMCP Design: Making a Connection (Matlab)

4CCEIMCP Design: Making a Connection Individual Coursework Project Autonomous Control Design of a Ship for Environmental Clean-up Throughout 4CCE1MCP    Design:Making     a     Connection, you  will  investigate  innovative solutions for the collection of floating marine debris.You will combine simulation-driven analysis with hardware experimentation to deliver a scaled-down prototype of your design at the end of the  semester.We  encourage  you  to   explore  multiple   design  solutions  and  reflect   on  what improves your design. In  this  document,you  are  provided  with  information  to  complete  the  individual  coursework assessment for the course.This short project will be an opportunity to gain intuition about the system you will build and to compete with your classmates on a control design solution. Context Problem:   Each  year,millions  of tonnes  of waste  and  pollutants  enter  the  ocean(The   Ocean Clean-up 2021).This waste causes significant environmental impact as it washes on our beaches, deposits on the ocean floor,and gets consumed by marine wildlife.A substantial portion of waste floats  and  accumulates  in  rotating  ocean  currents(NOAA 2021).The  biggest  accumulation  of marine debris is the Great Pacific Garbage Patch that is located between Hawaii and California (see Figure  1)and expands over  1.6 million square kilometres(The Ocean Clean-up 2021).   Figure   1:A close-up photograph of the Great Pacific Garbage Patch. Source:  Forbes (2019) Proposed     Engineering      Solutions:Several organisations are examining how to minimise and remove non-degradable waste that is being inappropriately  disposed  of in the  ocean(Oceana, The  Ocean  Cleanup,River  Cleaning,Ichthion).While  international  waste   directives  recognise the necessity of reducing and collecting waste at source(EU Waste Directive 2018,UK Waste Directive Amendments 2020),these organisations recognise that the clean-up of existing debris is an important strategy to safeguarding the health of oceans and marine wildlife. Popular collection strategies involve collecting waste directly at accumulation points(see Figure 2).River estuaries have been identified as a viable collection point(The Ocean Cleanup 2021).  Debris collects in river catchment areas and acts as a conveyor belt that disperses waste into the ocean.Waste is intercepted and sorted at the estuary. Floating waste dispersed into the ocean can sink to the floor or float away into oceanic currents. While most of the floating debris gets deposited on the coastline,a portion of the floating debris collects in rotating ocean currents.The resulting waste deposits contain millions of tonnes of waste(see Figure 1)and have also been identified as a viable collection point. Figure   2:(left)The  interceptor  collects  effluents  from  a  river,(right)waste  collection  in Ocean Source: The Ocean Cleanup 2021 Autonomous        Operation: It  is  desirable  to  have machines that can replace humans in dull, dirty,difficult,and   dangerous   tasks.While   automation   has    served   this   purpose    since   the Industrial Revolution,in recent years there has been a surging interest in intelligent,autonomous systems that can operate with little to no human intervention. Autonomous machines are playing an increasingly important role in environmental restoration. 
We are seeing new designs that allow robots to sort and recycle household waste, drones that can re-plant forests by dropping seeds over unprecedentedly large areas, and autonomous tractors that can farm the land. In this project, you will investigate how to incorporate autonomy into the ship collection system that you design.

Brief

For the individual coursework assessment, you will develop an autonomous control algorithm and test the algorithm in simulation for the clean-up scenario provided (see Figure 2). This will help you prepare for the design, build and test group project. Starter files can be downloaded from the KEATS module page.

Figure 2: Simulation scenario ship_template.slx

You are responsible for:

• designing a digital motor controller that can autonomously collect all floating objects in the provided scenario and return them to the home collection point (see Figure 2);
• optimising design dimensions (length and width of components) for the ship collector relative to your controller design;
• testing the effectiveness of your design through simulation;
• submitting a version of your design for assessment;
• competing for the quickest clean-up time for the provided scenario.

Your controller must work for at least three randomised scenarios, which you can select using the features illustrated below:

Figure 2: Choose object location scenarios using the mask of the System block. ship_template.slx

Deliverables

You should submit:

• a 1-page report (excluding supporting figures) on your control design which includes:
  - a section that explains the rationale for your controller design
  - a section that discusses the benefits and drawbacks of using a feedback vs. feedforward strategy for this control design (an illustrative sketch of the distinction follows the marking table below)
  - an analysis of your controller system response and its effectiveness at collecting the floating objects, including plots of motor commands and ship response
• a screenshot of your control logic, as implemented in a Stateflow chart, MATLAB function or equivalent method
• a simulation submission file that is compliant with the specification outlined in the Instructions for Submission section
• a video recording of the ship simulation as it collects the objects, recorded at a playback speed of x8 (see instructions below)
• a zipped folder where the project model and data can be shared with assessors.

Your report should be a maximum of 1 written page + 2 pages of appendix that contains your worked solutions, screenshots of code and images of any plots you generate. The page limit includes figures; all plots should have a title, labelled axes and units. You should use a 12pt Sans Serif font, e.g. Calibri.

Learning objectives

• Implement a logic-driven controller to define operating modes for the ship and to sweep an area
• Implement and tune an open-loop controller to control ship motion and steering
• Specify component parameters for the collector based on a simulation-driven design analysis
• Test controller design in simulation

Marking Criteria

The individual coursework submission accounts for 25% of the module grade.

Simulation Performance: 40% of your grade will be awarded based on how quickly you managed to complete the scenario in simulated time.
Metric                                                                     Weight   Marks
Return five or more objects to base in less than 120 s of simulated time   25%      10
Return seven or more objects to base in less than 90 s of simulated time   25%      10
Return all objects to base in less than 45 s of simulated time             25%      10
Return all nine objects to base in less than 30 s of simulated time        25%      10
Total                                                                      100%     40
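Purely as an illustration of the feedback-vs-feedforward distinction discussed in the report deliverable (the coursework itself is implemented in Simulink/Stateflow; the names, gain and turn schedule below are hypothetical), a minimal Python sketch of heading control:

# Feedforward: command a precomputed rudder action, ignoring measurements.
def feedforward_rudder(t):
    # hypothetical schedule: turn hard for 2 s, then go straight
    return 1.0 if t < 2.0 else 0.0

# Feedback: correct the rudder from the measured heading error.
def feedback_rudder(heading, heading_ref, kp=0.8):
    error = heading_ref - heading  # disturbances (waves, drift) show up here
    return kp * error              # proportional controller

Feedback rejects disturbances such as currents at the cost of requiring a heading measurement; feedforward is simple and sensor-free but drifts whenever the model of the ship is wrong.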


[SOLVED] ECON 584 Spring 2023 Homework 3 Java

ECON 584 Spring 2023
Homework #3
Due Feb 8 before class

1. The first part of this homework is to replicate the regression analysis for the admail demand model in Chapter 1. To begin, use SST or Excel (or another program) and the dataset from Chapter 1, "master.xlsx", to replicate the demand regression. You will use "newmaster.xlsx" for part #6 of this assignment.

2. Replicate the regression in Table 1.1 of the text. Note the estimation period is all periods ending in January 1996 (142 observations). Please double-check the estimation period against the book. For the assignment, please copy and paste the regression output and highlight the regression coefficients (which should match the ones in the book). HINT: You'll need to create some variables; you'll also have to create other variables by dividing some of the ones already there by CPI or taking the natural log of others. (A sketch of this preprocessing is given after the question list.)

3. Use the regression coefficients to create a model that predicts demand and express it like "Y = -20.1338 - 0.32071*ln(pi_mnppr/cpi) + …". Graph predicted and actual demand for the regression model in Step 1 using Excel.

4. Interpret the regression coefficients of ln(pi_mnppr/cpi), ln(all_ad_p/cpi), and ln(ret_m_al/cpi) from Question 2. Also interpret the meaning of R^2 in your regression result.

5. Consider an alternative model with only trend and seasonal dummy variables and higher-order trend variables. Write out your model in the same format as used in part 3 (e.g. Y = -281.1470 - 5.1326*trend + …). Does your model fit better or worse? Show your model in a graph. Copy and paste the regression output from this model. Do you prefer it? Why? HINT: to answer the "why" question, don't just look at R^2.

6. Use your model and the model in Table 1.1 to forecast demand in the period June 1996 through July 1997. How do the forecasts compare? Graph the forecast and the actual data in one plot. You will be doing an ex-post forecast only, i.e. pretending as if you knew the values in the period June 1996 through July 1997 before they actually occurred.
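The homework targets SST or Excel, but the same preprocessing and regression can be sketched in Python with pandas and statsmodels. Only the column names quoted in the questions (pi_mnppr, all_ad_p, ret_m_al, cpi) come from the assignment; the dependent-variable column name and the sheet layout of master.xlsx are assumptions:

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("master.xlsx")  # dataset from Chapter 1

# Deflate the nominal series by CPI and take natural logs, as the hint suggests.
for col in ["pi_mnppr", "all_ad_p", "ret_m_al"]:
    df["ln_" + col] = np.log(df[col] / df["cpi"])

# Dependent variable: log of admail volume. The actual column name in
# master.xlsx is not given here, so "volume" is a placeholder.
df["ln_volume"] = np.log(df["volume"])

# OLS on the deflated log prices (restrict df to the estimation period first).
X = sm.add_constant(df[["ln_pi_mnppr", "ln_all_ad_p", "ln_ret_m_al"]])
model = sm.OLS(df["ln_volume"], X).fit()
print(model.summary())  # coefficients to compare against Table 1.1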
