Q1 [40 points] Collecting and visualizing The Movie DB (TMDb) data

Q1.1 [25 points] Collecting Movie Data

You will use "The Movie DB" (TMDb) API to: (1) download data about movies and (2) for each movie, download its 5 similar movies. You will write Python 3 code (not Python 2.x) in script.py for this question. You will need an API key to use the TMDb data. Your API key will be an input to script.py so that we can run your code with our own API key to check the results. Running the following command should generate the CSV files specified in part b and part c:

    python3 script.py <API_KEY>

Please refer to this tutorial to learn how to parse command line arguments. Please DO NOT leave your API key written in the code.

Note: The Python Standard Library and the requests library are allowed. Python wrappers (or modules) for the TMDb API may NOT be used for this assignment. Pandas also may NOT be used. We are aware that it is a useful library to learn; however, to make grading more manageable and to enable our TAs to provide better, more consistent support to our students, we have decided to restrict the libraries to the more "essential" ones mentioned above.

a. How to use TheMovieDB API:
● Create a TMDb account and request an API key at https://www.themoviedb.org/account/signup. Refer to this document for detailed instructions.
● Refer to the API documentation at https://developers.themoviedb.org/3/getting-started/introduction as you work on this question.

Note: The API allows you to make 40 requests every 10 seconds. Set appropriate delay (sleep) intervals between requests in your code so you stay under this limit. We recommend you estimate how long your script will run when solving this question, so you will complete it on time. The API endpoint may return different results for the same request.

b. [10 points] Search for movies in the "Comedy" genre released in the year 2000 or later. Retrieve the 300 most popular movies in this genre. The movies should be sorted from most popular to least popular. Hint: Sorting based on popularity can be done in the API call.
● Documentation for retrieving movies: https://developers.themoviedb.org/3/discover/movie-discover and https://developers.themoviedb.org/3/genres/get-movie-list
● Save the results in movie_ID_name.csv. Each line in the file should describe one movie, in the following format (NO space after the comma, and do not include any column headers): movieID,moviename
For example, a line in the file could look like:
353486,Jumanji: Welcome to the Jungle

Note: You may need to make multiple API calls to retrieve all 300 movies. For example, the results may be returned in "pages," so you may need to retrieve them page by page (a hedged sketch of this appears below). Please use the "primary_release_date" parameter instead of the "release_date" parameter in the API when retrieving movies released in the year 2000 or later. The "release_date" parameter will incorrectly return a movie if any of its release dates fall within the years listed.
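For illustration, here is a minimal, hedged sketch of part b's paginated retrieval, using only the requests library and the Python Standard Library. It assumes the v3 /discover/movie endpoint and hard-codes 35 as the Comedy genre ID for brevity; in your own script you would look the ID up via the genre list endpoint, and all parameter choices remain yours to verify against the API documentation.

    import csv
    import sys
    import time
    import requests

    api_key = sys.argv[1]  # API key passed as a command line argument
    movies, page = [], 1
    while len(movies) < 300:
        resp = requests.get(
            "https://api.themoviedb.org/3/discover/movie",
            params={
                "api_key": api_key,
                "with_genres": "35",                      # assumed: Comedy genre ID
                "primary_release_date.gte": "2000-01-01",
                "sort_by": "popularity.desc",
                "page": page,
            },
        )
        results = resp.json().get("results", [])
        if not results:
            break              # fewer than 300 matching movies are available
        movies.extend(results)
        page += 1
        time.sleep(0.3)        # stay safely under 40 requests per 10 seconds

    with open("movie_ID_name.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for m in movies[:300]:
            writer.writerow([m["id"], m["title"]])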
c. [15 points] For each of the 300 movies, use the API to find its 5 similar movies. If a movie has fewer than 5 similar movies, the API will return as many as it can find. Your code should be flexible enough to work with however many movies the API returns.
● Documentation for obtaining similar movies: https://developers.themoviedb.org/3/movies/get-similar-movies
● Save the results in movie_ID_sim_movie_ID.csv. Each line in the file should describe one pair of similar movies (NO space after the comma, and do not include any column headers): movieID,similarmovieID

Note: You should remove all duplicate pairs after the similar movies have been found. That is, if both the pairs A,B and B,A are present, only keep A,B where A < B. For example, if movie A has three similar movies X, Y and Z, and movie X has two similar movies A and B, then there should only be four lines in the file:
A,X
A,Y
A,Z
X,B
You do not need to fetch additional similar movies for a given movie if one or more of its pairs were removed due to duplication. (A small sketch of this deduplication appears at the end of this question.)

Deliverables: Place all the files listed below in the Q1 folder.
● movie_ID_name.csv: The text file that contains the output to part b.
● movie_ID_sim_movie_ID.csv: The text file that contains the output to part c.
● script.py: The Python 3 (not Python 2.x) script you write that generates both movie_ID_name.csv and movie_ID_sim_movie_ID.csv.

Note: Q1.2 builds on the results of Q1.1. Specifically, Q1.2 asks that the header "Source,Target" be added to the resulting file from Q1.1. If you have completed both Q1.1 and Q1.2, your CSV will have the header row; please submit this file. If you have completed only Q1.1 but not Q1.2 (for any reason), then please submit the CSV file without the header row.

Q1.2 [15 points] Visualizing Movie Similarity Graph

Using Gephi, visualize the network of similar movies obtained. You can download Gephi here. Ensure your system fulfills all the requirements for running Gephi.

a. Go through the Gephi quickstart guide.

b. [2 points] Insert Source,Target as the first line in movie_ID_sim_movie_ID.csv. Each line now represents a directed edge with the format Source,Target. Import all the edges contained in the file using Data Laboratory in Gephi.
Note: Remember to check the "create missing nodes" option while importing, since we do not have an explicit nodes file.

c. [8 points] Using the following guidelines, create a visually meaningful graph:
● Keep edge crossings to a minimum and avoid as much node overlap as possible.
● Keep the graph compact and symmetric if possible.
● Whenever possible, show node labels. If showing all node labels creates too much visual complexity, try showing labels only for the "important" nodes.
● Use nodes' spatial positions to convey information (e.g., "clusters" or groups).
Experiment with Gephi's features, such as graph layouts, changing node size and color, edge thickness, etc. The objective of this task is to familiarize yourself with Gephi, so it is fairly open-ended.

d. [5 points] Using Gephi's built-in functions, compute the following metrics for your graph:
● Average node degree (run the function called "Average Degree")
● Diameter of the graph (run the function called "Network Diameter")
● Average path length (run the function called "Avg. Path Length")
Briefly explain the intuitive meaning of each metric in your own words. You will learn about these metrics in the "graphs" lectures.

Deliverables: Place all the files listed below in the Q1 folder.
● For part b: movie_ID_sim_movie_ID.csv (with Source,Target as its first line).
● For part c: an image file named "graph.png" (or "graph.svg") containing your visualization and a text file named "graph_explanation.txt" describing your design choices, using no more than 50 words.
● For part d: a text file named "metrics.txt" containing the three metrics and your intuitive explanation for each of them, using no more than 100 words.
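As referenced in Q1.1 part c, here is a small, hedged sketch of the duplicate-pair rule, assuming a hypothetical list `pairs` of (movieID, similarMovieID) tuples in the order they were fetched. A pair is dropped when the same two movies were already written in either order; because each movie's similar list is fetched in sequence, keeping the first occurrence reproduces the A,X / A,Y / A,Z / X,B behavior of the worked example.

    seen, deduped = set(), []
    for a, b in pairs:              # e.g. [(A, X), (A, Y), (A, Z), (X, A), (X, B)]
        key = frozenset((a, b))     # unordered pair, so (A, X) and (X, A) collide
        if key not in seen:
            seen.add(key)
            deduped.append((a, b))  # first occurrence wins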
Q2 [35 pt] SQLite

The following questions will help refresh your memory about SQL and get you started with SQLite, a lightweight, serverless, embedded database that can easily handle up to multiple GBs of data. As mentioned in class, SQLite is the world's most popular embedded database. It is convenient to share data stored in an SQLite database: just one cross-platform file, with no parsing needed (unlike CSV files).

You will modify the given Q2.SQL.txt file to add SQL statements and SQLite commands to it. We will autograde your solution by running the following command, which generates Q2.db and Q2.OUT.txt (assuming the current directory contains the data files):

    $ sqlite3 Q2.db < Q2.SQL.txt > Q2.OUT.txt

We will generate Q2.OUT.txt using the above command. You may not receive any points if we are unable to generate the 2 output files. You may also lose points if you do not strictly follow the output format specified in each question below. The output format corresponds to the headers/column names for your SQL command output.

We have added some lines of code in the Q2.SQL.txt file for the purpose of autograding. DO NOT REMOVE/MODIFY THESE LINES. You may not receive any points if these statements are modified in any way (our autograder will check for changes). There are clearly marked regions in the Q2.SQL.txt file where you should add your code. We have also provided a Q2.OUT.SAMPLE.txt which gives an example of what your final Q2.OUT.txt should look like after running the above command. Please avoid printing unnecessary items in your final submission, as it may affect autograding and you may lose points.

The purposes of some lines of code in Q2.SQL.txt are as follows:
● .headers off: After each question, an output format has been given with a list of column names/headers. This command ensures that such headers are not displayed in the output.
● .separator ',': Specifies that the input file and the output are comma-separated.
● select '';: Prints a blank line. After each question's query, this command ensures that there is a new line between each result in the output file.

WARNING: Do not copy and paste any code/command from this PDF for use in the sqlite command prompt, because PDFs sometimes introduce hidden/special characters, causing SQL errors. Manually type out the commands instead.

Note: For the questions in this section, you must use only INNER JOIN when you perform a join between two tables. Other types of join may result in incorrect outputs.

Note: Do not use .mode csv in your Q2.SQL.txt file. This will cause quotes to be printed in the output of each select ''; statement.

a. Create tables and import data.
i. [2 points] Create the following two tables ("movies" and "cast") with columns having the indicated data types:
● movies
○ id (integer)
○ name (text)
○ score (integer)
● cast
○ movie_id (integer)
○ cast_id (integer)
○ cast_name (text)
ii. [1 point] Import the provided movie-name-score.txt file into the movies table, and movie-cast.txt into the cast table. You can use SQLite's .import command for this. Please use relative paths while importing files, since absolute/local paths are specific locations that exist only on your computer and will cause the autograder to fail right at the beginning.

b. [2 points] Create indexes. Create the following indexes, which speed up subsequent operations (the improvement in speed may be negligible for this small database, but is significant for larger databases):
i. scores_index for the score column in the movies table
ii. cast_index for the cast_id column in the cast table
iii. movie_index for the id column in the movies table

c. [2 points] Calculate the average score. Find the average score of all movies having a score >= 5.
Output format: average_score

d. [3 points] Find poor movies. List the five worst movies (lowest scores). Sort your output by score from lowest to highest, then by name in alphabetical order.
Output format: id,name,score

e. [4 points] Find laid-back actors. List ten cast members, sorted alphabetically by cast_name, who have exactly two movie appearances.
Output format: cast_id,cast_name,movie_count

f. [6 points] Get high-scoring actors. Find the top ten cast members who have the highest average movie scores. Sort your output by average score (from high to low). In case of a tie in the score, sort the results by the name of the cast member in alphabetical order. Skip movies with score < 50 when computing the averages.
Output format: cast_id,cast_name,average_score

g. Creating views. Create a view (virtual table) named good_collaboration that lists pairs of cast members who have appeared in movies together, along with the number of such movies and the average score of those movies. The view should have the format:
good_collaboration(cast_member_id1, cast_member_id2, movie_count, average_movie_score)
For symmetrical or mirror pairs, only keep the row in which cast_member_id1 has the lower numeric value. For example, for the ID pairs (1, 2) and (2, 1), keep the row with IDs (1, 2). There should not be any self pairs (cast_member_id1 == cast_member_id2). Full points will only be awarded for queries that use joins. Remember that creating a view will not produce any output, so you should test your view with a few simple select statements during development. One such test has already been added to the code as part of the autograding.
Optional Reading: Why create views?

h. [4 points] Find the best collaborators. Get the five cast members with the highest average scores from the good_collaboration view made in the last part, and call this score the collaboration_score. This score is the average of the average_movie_score corresponding to each cast member, including actors appearing as cast_member_id1 as well as cast_member_id2. Sort your output in descending order of this score (and alphabetically in case of a tie).
Output format: cast_id,cast_name,collaboration_score

i. SQLite supports simple but powerful Full Text Search (FTS) for fast text-based querying (FTS documentation). Import the movie overview data from movie-overview.txt into a new FTS table called movie_overview with the schema:
movie_overview (id integer, name text, year integer, overview text, popularity decimal)
NOTE: Create the table using fts3 or fts4 only. Also note that keywords like NEAR, AND, OR and NOT are case sensitive in FTS queries.
1. [1 point] Count the number of movies whose overview field contains the word "fight".
Output format: count_overview
2. [2 points] List the ids of the movies that contain the terms "love" and "story" in the overview field with no more than 5 intervening terms in between.
Output format: id

Deliverables: Place all the files listed below in the Q2 folder.
● Q2.SQL.txt: Modified file additionally containing all the SQL statements and SQLite commands you have used to answer questions a to i in the appropriate sequence.
● Q2.OUT.txt: Output of the queries in Q2.SQL.txt. Check above for how to generate this file.
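Since part i's FTS syntax can be fiddly, here is a hedged sketch that prototypes both queries with Python's standard sqlite3 module (assuming your SQLite build ships with FTS4, which is typical); the graded answers must still be plain SQL in Q2.SQL.txt, and the one-row insert is made-up demo data.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE VIRTUAL TABLE movie_overview USING fts4(
                       id integer, name text, year integer,
                       overview text, popularity decimal)""")
    con.execute("INSERT INTO movie_overview VALUES (1, 'Demo', 2001, "
                "'a love that became a legendary story', 1.0)")

    # i.1: count overviews containing the word "fight"
    print(con.execute("SELECT count(*) FROM movie_overview "
                      "WHERE overview MATCH 'fight'").fetchone()[0])

    # i.2: "love" and "story" with at most 5 intervening terms
    for (mid,) in con.execute("SELECT id FROM movie_overview "
                              "WHERE overview MATCH 'love NEAR/5 story'"):
        print(mid)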
Q3 [15 pt] D3 Warmup and Tutorial
● Go through the D3 tutorial here before attempting this question.
● Complete steps 01 to 16 (complete through "16. Axes").
● This is a simple and important tutorial which lays the groundwork for Homework 2.
Note: We recommend using Mozilla Firefox or Google Chrome, since they have relatively robust built-in developer tools.

Deliverables: Place all the files/folders listed below in the Q3 folder.
● A folder named d3 containing the file d3.v3.min.js (download)
● index.html: When run in a browser, it should display a scatterplot with the following specifications:

a. [12 pt] Generate and plot 60 objects: 30 upward-pointing equilateral triangles and 30 crosses. Each object's X and Y coordinates should be a random integer between 0 and 100 inclusively (i.e., [0, 100]). An object's X and Y coordinates should be independently computed. Each object's size will be a value between 5 and 50 inclusively (i.e., [5, 50]). You should use the "symbol.size()" function of D3 to adjust the size of the object. Use the object's X coordinate to determine the size of the object. You should use a linear scale for the size, mapping the domain of X values to the range [5, 50]; objects with larger X coordinate values should have larger sizes. This link explains how size is interpreted by symbol.size(). You may want to look at this example for the usage of the "symbol.size()" function. All objects with size greater than the average size of all scatterplot objects should be colored blue, and all other objects should be colored green. All these objects should be filled (please see the figure below), and the entire graph should fit in the browser window (no scrolling).

b. [2 pt] The plot must have visible X and Y axes that scale according to the generated objects. The ticks on these axes should adjust automatically based on the randomly generated scatterplot objects.

c. [1 pt] Your full name (in upper case) should appear above the scatterplot. Set the HTML title tag (
Q1 [20 pts] Implementation of the PageRank Algorithm

In this question, you will implement the PageRank algorithm in Python for a large dataset. The PageRank algorithm was first proposed to rank web search results, so that more "important" web pages are ranked higher. It works by considering the number and "importance" of links pointing to a page, to estimate how important that page is. PageRank outputs a probability distribution over all web pages, representing the likelihood that a person randomly surfing the web (randomly clicking on links) would arrive at those pages.

As mentioned in the lectures, the PageRank values are the entries in the dominant eigenvector of the modified adjacency matrix in which each column's values add up to 1 (i.e., "column normalized"), and this eigenvector can be calculated by the power iteration method, which iterates through the graph's edges multiple times, updating the nodes' probabilities ('scores' in pagerank.py) in each iteration. In each iteration, the PageRank computation for each node v_j is:

    PR(v_j) = (1 - d) * Pd(v_j) + d * sum over all nodes v_i linking to v_j of [ PR(v_i) / out_degree(v_i) ]

where d is the damping factor, Pd is the teleportation (restart) probability vector, and n is the number of nodes in the graph.

You will be using the Wikipedia adminship election dataset, which has almost 7K nodes and 100K edges. You may find the dataset under hw4-skeleton/Q1 as "soc-wiki-elec.edges".

In the provided pagerank.py, you will be asked to implement the simplified PageRank algorithm, where Pd(v_j) = 1/n, and you need to submit the output for 10- and 25-iteration runs. To help you verify your implementation, we are providing the sample output of 5 iterations of simplified PageRank. For personalized PageRank, the Pd() vector will be assigned values based on your 9-digit GTID (e.g., 987654321), and you are asked to submit the output for 10- and 25-iteration runs.

Deliverables:
pagerank.py [12 pts]: your modified implementation
simplified_pagerank_{n}.txt: 2 files (as given below) containing the top 10 node IDs and their simplified PageRank scores for n iterations
simplified_pagerank10.txt [2 pts]
simplified_pagerank25.txt [2 pts]
personalized_pagerank_{n}.txt: 2 files (as given below) containing the top 10 node IDs and their personalized PageRank scores for n iterations
personalized_pagerank10.txt [2 pts]
personalized_pagerank25.txt [2 pts]
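To make the update above concrete, here is a minimal, hedged sketch of simplified PageRank by power iteration over an edge list. The function and variable names are illustrative, not the skeleton's API; the damping factor d = 0.85 is an assumed typical value, and dangling-node handling is omitted.

    from collections import defaultdict

    def simplified_pagerank(edges, n_iter=10, d=0.85):
        """edges: iterable of (source, target) node pairs."""
        out_deg = defaultdict(int)
        in_nbrs = defaultdict(list)
        nodes = set()
        for src, tgt in edges:
            out_deg[src] += 1
            in_nbrs[tgt].append(src)
            nodes.update((src, tgt))
        n = len(nodes)
        pd = {v: 1.0 / n for v in nodes}   # simplified: uniform restart vector
        scores = dict(pd)                  # initial scores
        for _ in range(n_iter):
            scores = {
                v: (1 - d) * pd[v]
                   + d * sum(scores[u] / out_deg[u] for u in in_nbrs[v])
                for v in nodes
            }
        # top 10 node IDs by score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10]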
Q2 [50 pts] Random Forest Classifier

Q2.1 – Random Forest Setup [45 pts]

Note: You must use Python 3.7.x for this question.

You will implement a random forest classifier in Python. The performance of the classifier will be evaluated via the out-of-bag (OOB) error estimate, using the provided dataset.

Note: You may only use the modules and libraries provided at the top of the .py files included in the skeleton for Q2 and modules from the Python Standard Library. Python wrappers (or modules) for random forests may NOT be used for this assignment. Pandas may NOT be used; while we understand that it is a useful library to learn, completing this question is not critically dependent on its functionality. In addition, to make grading more manageable and to enable our TAs to provide better, more consistent support to our students, we have decided to restrict the libraries accordingly.

The dataset you will use is the Predicting a Pulsar Star dataset. Each record consists of the parameters of a pulsar candidate. The dataset has been cleaned to remove missing attributes. The data is stored in a comma-separated file (csv) in your Q2 folder as pulsar_stars.csv. Each line describes an instance using 9 columns: the first 8 columns represent the attributes of the pulsar candidate, and the last column is the class, which tells us if the candidate is a pulsar or not (1 means it is a pulsar, 0 means it is not a pulsar).

Note: The last column should not be treated as an attribute.
Note 2: Do not modify the dataset.

You will perform binary classification on the dataset to determine if a pulsar candidate is a pulsar or not.

Essential Reading

Decision Trees. To complete this question, you need to develop a good understanding of how decision trees work. We recommend you review the lecture on decision trees. Specifically, you need to know how to construct decision trees using Entropy and Information Gain to select the splitting attribute and the split point for the selected attribute. These slides from CMU (also mentioned in lecture) provide an excellent example of how to construct a decision tree using Entropy and Information Gain.

Random Forests. To refresh your memory about random forests, see Chapter 15 in the Elements of Statistical Learning book and the lecture on random forests. Here is a blog post that introduces random forests in a fun way, in layman's terms.

Out-of-Bag Error Estimate. In random forests, it is not necessary to perform explicit cross-validation or use a separate test set for performance evaluation. The out-of-bag (OOB) error estimate has been shown to be reasonably accurate and unbiased. Below, we summarize the key points about OOB described in the original article by Breiman and Cutler.

Each tree in the forest is constructed using a different bootstrap sample from the original data. Each bootstrap sample is constructed by randomly sampling from the original dataset with replacement (usually, a bootstrap sample has the same size as the original dataset). Statistically, about one-third of the records are left out of the bootstrap sample and not used in the construction of the kth tree. Each record left out of the construction of the kth tree can be assigned a class by that tree. As a result, each record will have a "test set" classification by the subset of trees that treat the record as an out-of-bag sample. The majority vote for that record will be its predicted class. The proportion of times that a record's predicted class differs from its true class, averaged over all records, is the OOB error estimate.

Starter Code

We have prepared starter code written in Python for you to use. This will help you load the data and evaluate your model. The following files are provided for you:
util.py: utility functions that will help you build a decision tree
decision_tree.py: a decision tree class that you will use to build your random forest
random_forest.py: a random forest class and a main method to test your random forest

What you will implement

Below, we have summarized what you will implement to solve this question. Note that you MUST use information gain to perform the splitting in the decision tree. The starter code has detailed comments on how to implement each function. (A hedged sketch of entropy and information gain follows this list.)
util.py: implement the functions to compute entropy, information gain, and perform splitting.
decision_tree.py: implement the learn() method to build your decision tree using the utility functions above.
decision_tree.py: implement the classify() method to predict the label of a test record using your decision tree.
random_forest.py: implement the methods _bootstrapping(), fitting(), voting()
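As referenced above, here is a hedged sketch of the two core utility computations; the function names and signatures are illustrative and need not match the skeleton's exact API.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels, e.g. [0, 1, 1, 0] -> 1.0."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(parent_labels, split_label_lists):
        """Entropy reduction when parent_labels is partitioned into the given lists."""
        n = len(parent_labels)
        children = sum(len(s) / n * entropy(s) for s in split_label_lists if s)
        return entropy(parent_labels) - children

    # Example: a split that separates the classes perfectly has maximal gain:
    # information_gain([0, 0, 1, 1], [[0, 0], [1, 1]]) == 1.0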
Note: You must achieve a minimum accuracy of 75% for the random forest.
Note 2: Your code must take no more than 5 minutes to execute.
Note 3: Remember to remove all of your own print statements from the code. Nothing other than the existing print statements in main() should be printed on the console. Failure to do so may result in point deductions. Do not remove the existing print statements in main() in random_forest.py.

As you solve this question, you will need to think about multiple parameters in your design; some may be more straightforward to determine, and some may not be (hint: study the lecture slides and essential reading above). For example:
Which attributes to use when building a tree?
How to determine the split point for an attribute?
When do you stop splitting leaf nodes?
How many trees should the forest contain?

Note that, as mentioned in lecture, there are other approaches to implementing random forests. For example, instead of information gain, other popular choices include the Gini index and random attribute selection (e.g., PERT – Perfect Random Tree Ensembles). We decided to ask everyone to use an information-gain-based approach in this question (instead of leaving it open-ended) to help standardize students' solutions and accelerate our grading efforts.

Q2.2 – forest.txt [5 pts]

In forest.txt, report the following: What is the main reason to use a random forest versus a decision tree? (
Q1 [15 points] Analyzing a Graph with Hadoop/Java

Imagine that your boss gives you a large dataset which contains an entire email communication network from a popular social network site. The network is organized as a directed graph where each node represents a person's email address and an edge between two nodes (e.g., address A and address B) has a weight stating how many times A has written to B. You have been tasked with finding the person that each person has written to the most, along with that count (see the example below for more clarification).

Your task is to write a MapReduce program in Java to report, for each node X (the "source", or "src" for short) in the graph, the person Y (the "target", or "tgt" for short) that X has written to the most, and the number of times X has written to Y (the outbound "weight", from X to Y). If a person has written to multiple targets exactly the same (largest) number of times, return the target with the smallest node ID.

First, go over the Hadoop word count tutorial to familiarize yourself with Hadoop and some Java basics. You will be able to complete this question with only some knowledge about Java. You should have already loaded the two graph files into the HDFS file system in your VM. Each file stores a list of edges as tab-separated values. Each line represents a single edge consisting of three columns: (source node ID, target node ID, edge weight), each of which is separated by a tab (\t). Node IDs and weights are positive integers.

Below is a small toy graph, for illustration purposes (on your screen, the text may appear out of alignment):

    src    tgt    weight
    10     110    3
    10     200    1
    200    150    30
    100    110    10
    110    130    15
    110    200    67
    10     70     3

Your program should not assume the edges to be sorted or ordered in any way (i.e., your program should work even when the edge ordering is random).

Your code should accept two arguments. The first argument (args[0]) will be a path for the input graph file on HDFS (e.g., data/graph1.tsv), and the second argument (args[1]) will be a path for the output directory on HDFS (e.g., data/q1output1). The default output mechanism of Hadoop will create multiple files in the output directory, such as part-00000 and part-00001, which will be merged and downloaded to a local directory by the supplied run script. Please use the run1.sh and run2.sh scripts for your convenience.

The format of the output: each line contains a node ID, followed by a tab (\t), and the expected "target node ID,weight" tuple (without the quotes; note there is no space character before or after the comma). Lines do not need to be sorted. Please exclude nodes that do not have outgoing edges (e.g., those email addresses which have not sent out any communication). For the toy graph above, the output is as follows:

    10     70,3
    200    150,30
    100    110,10
    110    200,67

(For node 10, targets 110 and 70 tie at weight 3, so the smaller node ID, 70, is reported. A sketch of this per-source reduction follows the deliverables list.)

Deliverables
[5 points] Your Maven project directory including Q1.java. Please see the detailed submission guide at the end of this document. You should implement your own MapReduce procedure and should not import an external graph processing library.
[5 points] q1output1.tsv: the output file of processing graph1.tsv by run1.sh.
[5 points] q1output2.tsv: the output file of processing graph2.tsv by run2.sh.
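Before writing the Java MapReduce, it can help to pin down the per-source reduction it must perform. Here is a hedged Python sketch of that logic (not a substitute for the required Java program), assuming a local tab-separated copy of the edge list:

    from collections import defaultdict

    best = {}  # src -> (weight, target): the current winner for each source
    with open("graph1.tsv") as f:       # assumed local copy of the edge list
        for line in f:
            src, tgt, w = line.split()
            tgt, w = int(tgt), int(w)
            # prefer the larger weight; on a tie, prefer the smaller target ID
            if src not in best or (w, -tgt) > (best[src][0], -best[src][1]):
                best[src] = (w, tgt)

    for src, (w, tgt) in best.items():
        print(f"{src}\t{tgt},{w}")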
Q2 [25 pts] Analyzing a Large Graph with Spark/Scala on Databricks

Tutorial: First, go over this Spark on Databricks Tutorial to get the basics of creating Spark jobs, loading data, and working with data.

You will analyze mathoverflow.csv[3] using Spark and Scala on the Databricks platform. This graph is a temporal network of interactions on the stack exchange website MathOverflow. The dataset has three columns in the following format: ID of the source node (a user), ID of the target node (a user), Unix timestamp (seconds since the epoch).

Your objectives:
1. Remove the pairs where the questioner and the answerer are the same person. Very important: all subsequent operations must be performed on this filtered dataframe.
2. List the top 3 answerers who answered the highest number of questions, sorted in descending order of questions-answered count. If there is a tie, list the individual with the smaller node ID first.
3. List the top 3 questioners who asked the highest number of questions, sorted in descending order of questions-asked count. If there is a tie, list the individual with the smaller node ID first.
4. List the top 5 most common answerer-questioner pairs, sorted in descending order of pair count. If there is a tie, list the pair with the smaller answerer node ID first. If there is still a tie, use the questioner node ID as the tie-breaker by listing the smaller questioner ID first.
5. List, by month, the number of interactions (questions asked/answered) from September 1, 2010 (inclusive) to December 31, 2010 (inclusive).
6. List the top 3 individuals with the most overall activity (i.e., highest total of questions asked and questions answered).

You should perform this task using the DataFrame API in Spark. Here is a guide that will help you get started on working with dataframes in Spark.

A template Scala notebook, q2-skeleton.dbc, has been included in the HW3-Skeleton; it reads in a sample graph file, examplegraph.csv. In the template, the input data is loaded into a dataframe, inferring the schema using reflection (refer to the guide above).

Note: You must use only Scala DataFrame operations for this task. You will lose points if you use SQL queries, Python, or R to manipulate a dataframe. You may find some of the following DataFrame operations helpful: toDF, join, select, groupBy, orderBy, filter.

Upload the data file examplegraph.csv and q2-skeleton.dbc to your Databricks workspace before continuing. Follow the Databricks Setup Guide for further instructions.

Consider the following directed graph example, where we show the result of achieving the objectives on examplegraph.csv:

    +--------+----------+----------+
    |answerer|questioner| timestamp|
    +--------+----------+----------+
    |       1|         4|1283296645|
    |       3|         4|1283297908|
    |       1|         4|1283298280|
    |       2|         2|1283298467|
    |       3|         4|1283299092|
    |       1|         2|1283300824|
    |       2|         1|1283300967|
    |       1|         2|1283301485|
    |       3|         1|1283302207|
    |       2|         4|1283303844|
    |       4|         1|1283304158|
    |       3|         1|1283304547|
    |       1|         2|1283305468|
    |       2|         1|1283305625|
    |       1|         7|1283306548|
    |       3|         3|1283306910|
    |       7|         2|1283309524|
    |       4|         1|1283310284|
    |       1|         4|1283310295|
    |       2|         7|1283310872|
    +--------+----------+----------+

1. Remove the pairs where the questioner and the answerer are the same person. For example, the instances of the edges answerer: 2 / questioner: 2 and answerer: 3 / questioner: 3 should be removed from examplegraph.

    +--------+----------+----------+
    |answerer|questioner| timestamp|
    +--------+----------+----------+
    |       1|         4|1254192988|
    |       3|         4|1254194656|
    |       1|         4|1254202612|
    |       2|         2|1254232804|
    |       3|         4|1254263166|
    …
    |       1|         7|1254392595|
    |       3|         3|1254395022|
    |       7|         2|1254396925|
    …
    |       2|         7|1254436186|
    +--------+----------+----------+

2. List the top 3 answerers who answered the highest number of questions, sorted in descending order of question count. If there is a tie, list the individual with the smaller node ID first.
For the above examplegraph, the results would look like this:

    +--------+------------------+
    |answerer|questions_answered|
    +--------+------------------+
    |       1|                 7|
    |       2|                 4|
    |       3|                 4|
    +--------+------------------+

3. List the top 3 questioners who asked the highest number of questions, sorted in descending order of question count. If there is a tie, list the individual with the smaller node ID first. For the above examplegraph, the results would look like this:

    +----------+---------------+
    |questioner|questions_asked|
    +----------+---------------+
    |         1|              6|
    |         4|              6|
    |         2|              4|
    +----------+---------------+

4. List the top 5 most common answerer-questioner pairs, sorted in descending order of pair count. If there is a tie, list the pair with the smaller answerer node ID first. If there is still a tie, use the questioner node ID as the tie-breaker by listing the smaller questioner ID first. For the above examplegraph, the results would look like this:

    +--------+----------+-----+
    |answerer|questioner|count|
    +--------+----------+-----+
    |       1|         2|    3|
    |       1|         4|    3|
    |       2|         1|    2|
    |       3|         1|    2|
    |       3|         4|    2|
    +--------+----------+-----+

5. List, by month, the number of interactions (questions asked/answered) from September 1, 2010 (inclusive) to December 31, 2010 (inclusive). The month of September is represented by the number 9, the month of October by 10, and so on. For the above examplegraph, the results would look like this:

    +-----+------------------+
    |month|total_interactions|
    +-----+------------------+
    |    9|                14|
    +-----+------------------+

6. List the top 3 individuals with the most overall activity, i.e., the highest total of questions asked and questions answered. For the above examplegraph, the results would look like this:

    +------+--------------+
    |userID|total_activity|
    +------+--------------+
    |     1|            13|
    |     2|             8|
    |     4|             8|
    +------+--------------+

We have provided you with a walkthrough of the steps you need to perform on an example graph. Now it is your turn to replace examplegraph with the mathoverflow graph and list the results of your experiments in the provided q2_results.csv file.

Deliverables
[10 pts] q2.dbc: your solution as a Scala notebook archive file (.dbc) exported from Databricks, and q2.scala: your solution as a Scala source file exported from Databricks. See the Databricks Setup Guide on creating an exportable archive and an exportable source file for details. Note: You should export your solution as both a .dbc and a .scala file.
[15 pts] q2_results.csv: the output file of processing mathoverflow.csv from the q2 notebook file. You must copy the output of the display()/show() function into the relevant sections of the file titled q2_results.csv.
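One detail worth checking before tackling objective 5 above on the real data is how Unix timestamps map to months. A quick Python sanity check (the graded solution must still use Scala DataFrame operations in Databricks), with an assumed in-window timestamp:

    from datetime import datetime, timezone

    ts = 1285000000  # hypothetical timestamp; Sep 20, 2010 in UTC
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    start = datetime(2010, 9, 1, tzinfo=timezone.utc)
    end = datetime(2011, 1, 1, tzinfo=timezone.utc)  # exclusive upper bound
    print(dt.month, start <= dt < end)               # 9 True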
Q3 [35 points] Analyzing a Large Amount of Data with Pig on AWS

You will try out Apache Pig for processing Amazon review data on Amazon Web Services (AWS). This is a fairly simple task, and in practice you may be able to tackle it using commodity computers (e.g., consumer-grade laptops or desktops). However, we would like you to use this exercise to learn and solve it using distributed computing on Amazon EC2, and to gain experience (very helpful for your career), so you will be prepared to tackle problems that are more complex. The services you will primarily be using are Amazon S3 storage, Amazon Elastic Cloud Computing (EC2) virtual servers in the cloud, and the Amazon Elastic MapReduce (EMR) managed Hadoop framework. For this question, you will only use up a very small fraction of your $100 credit. AWS allows you to use up to 20 instances in total (that means 1 master instance and up to 19 core instances) without filling out a "limit request form". For this assignment, you should not exceed this quota of 20 instances. Refer to details about instance types, their specs, and pricing. In the future, for larger jobs, you may want to use AWS's pricing calculator.

AWS Guidelines
Please read the provided AWS Setup Guidelines to set up your AWS account.

Datasets
In this question, you will use a dataset of over 130 million customer reviews from Amazon Customer Reviews. (Further details on this dataset are available here.) You will perform your analysis on two datasets based off of this data, which we have prepared for you: a small one (~1GB) and a large one (~32GB). VERY IMPORTANT: Both datasets are in the US East (N. Virginia) region. Using machines in other regions for computation would incur data transfer charges. Hence, set your region to US East (N. Virginia) in the beginning (not Oregon, which is the default). This is extremely important; otherwise your code may not work and you may be charged extra.

The files in these two S3 buckets are stored in a tab ('\t') separated format. An example of their structure is available here.

Goal
Output the top 15 product categories having the highest average star rating per review, along with their corresponding averages, in tab-separated format, sorted in descending order. Only consider entries whose review bodies are 100 characters or longer, that have 30 or more total votes, and that were verified purchases. If multiple product categories have the same average, order them alphabetically by product category. Refer to the example and calculations below:

    product_category    review_id        star_rating
    Toys                RDIJS7QYB6XNR    3
    Toys                BAHDID7JDNAK0    4
    Toys                YHFKALWPEOFK4    2
    Books               BZHDKTLJYBDNE    4
    Books               ZZUDNFLGNALDA    3

    Books: (4 + 3) / 2 = 3.5
    Toys: (3 + 4 + 2) / 3 = 3.0

Note: The small dataset only includes 5 categories, so you will only get 5 rows for its output.

Sample Output
To help you evaluate the correctness of your output, we provide you with the output for the small dataset. Note: Please strictly follow the formatting requirements for your output, as shown in the small dataset output file. You can use https://www.diffchecker.com/ to make sure the formatting is correct. Improperly formatted output may not receive any points.

Using PIG (Read these instructions carefully)
There are two ways to debug PIG on AWS (all instructions are in the AWS Setup Guidelines):
1. Use the interactive PIG shell provided by EMR to perform this task from the command line (grunt). Refer to Section 8: Debugging in the AWS Setup Guidelines for a detailed step-by-step procedure. You should use this method if you are using PIG for the first time, as it is easier to debug your code. However, as you need to have a persistent ssh connection to your cluster until your task is complete, this is suitable only for the smaller dataset.
2. Upload a PIG script with all the commands that computes the output, and direct the output from the command line into a separate file. Once you verify the output on the smaller dataset, use this method for the larger dataset. You don't have to ssh or stay logged into your account. You can start your EMR job and come back when the job is complete!

Note: In summary, verify the output for the smaller dataset with Method 1 and submit the results for the bigger dataset using Method 2.
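The Pig script itself is your deliverable, but the aggregation logic can be prototyped locally. Here is a hedged Python sketch of the goal's computation over a hypothetical tab-separated sample file, with column positions taken from the 15-column schema in the LOAD commands below, and 'Y' assumed as the verified_purchase flag value:

    from collections import defaultdict

    totals = defaultdict(lambda: [0.0, 0])      # category -> [rating sum, count]
    with open("reviews_sample.tsv") as f:       # hypothetical local sample
        for line in f:
            cols = line.rstrip("\n").split("\t")
            category, star = cols[6], float(cols[7])
            total_votes, verified, body = int(cols[9]), cols[11], cols[13]
            if len(body) >= 100 and total_votes >= 30 and verified == "Y":
                totals[category][0] += star
                totals[category][1] += 1

    averages = [(s / c, cat) for cat, (s, c) in totals.items()]
    # descending by average; alphabetical by category on ties
    for avg, cat in sorted(averages, key=lambda t: (-t[0], t[1]))[:15]:
        print(f"{cat}\t{avg}")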
Sample Commands: Loading data in PIG
To load data for the small example, use the following command:

    grunt> reviews = LOAD 's3://amazon-reviews-pds/tsv/amazon_reviews_us_M*' AS (marketplace:chararray, customer_id:chararray, review_id:chararray, product_id:chararray, product_parent:chararray, product_title:chararray, product_category:chararray, star_rating:int, helpful_votes:int, total_votes:int, vine:chararray, verified_purchase:chararray, review_headline:chararray, review_body:chararray, review_date:chararray);

To load data for the large example, use the following command:

    grunt> reviews = LOAD 's3://amazon-reviews-pds/tsv/*' AS (marketplace:chararray, customer_id:chararray, review_id:chararray, product_id:chararray, product_parent:chararray, product_title:chararray, product_category:chararray, star_rating:int, helpful_votes:int, total_votes:int, vine:chararray, verified_purchase:chararray, review_headline:chararray, review_body:chararray, review_date:chararray);

Note: Refer to other commands such as LOAD, USING PigStorage, FILTER, GROUP, ORDER BY, FOREACH, GENERATE, LIMIT, STORE, etc. Copying the above commands directly from the PDF and pasting into the console/script file may lead to script failures due to stray characters and spaces from the PDF file. Your script will fail if your output directory already exists. For instance, if you run a job with the output folder s3://cse6242oan-/output-small, the next job which you run with the same output folder will fail. Hence, please use a different folder for the output of every run. You might also want to change the input data type for star_rating to handle floating-point values. While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data (the whole data is over 100 GB). Once you believe your PIG commands are working as desired, you can use them on the complete data and wait, since it will take some time.

Deliverables
pig-script.txt: The PIG script for the question (using the larger dataset).
pig-output.txt: Output (tab-separated, using the larger dataset).

Note: Please strictly follow the guidelines below, otherwise your solution may not be graded.
Ensure that file names (case sensitive) are correct.
Ensure file extensions (.txt) are correct.
The size of each pig-script.txt and pig-output.txt file should not exceed 5 KB.
Double check that you are submitting the correct set of files; we only want the script and output for the larger dataset. Also double check that you are writing the right dataset's output to the right file.
You are welcome to store your script's output in any bucket you choose, as long as you can download and submit the correct files.
Ensure that unnecessary new lines, brackets, commas, etc. aren't in the file.
Please do not make any manual changes to the output files.

Q4 [35 points] Analyzing a Large Graph using Hadoop on Microsoft Azure

VERY IMPORTANT: Use Firefox or Chrome in incognito/private browsing mode when configuring anything related to Azure (e.g., when using the Azure portal), to prevent issues due to browser caches. Safari sometimes loses connections.

Goal
The goal is to analyze a graph using a cloud computing service, Microsoft Azure. Your task is to write a MapReduce program to compute the distribution of a graph's node degree differences (see the example below). Note that this question shares some similarities with Question 1 (e.g., both analyze graphs). Question 1 can be completed using your own computer. This question is to be completed using Azure.
We recommend that you first complete Question 1. Please carefully read the following instructions.

You will use two data files in this question:
small.tsv[4] (zipped as ~3MB small.zip; ~11MB when unzipped)
large.tsv[5] (zipped as 247MB large.zip; ~1GB when unzipped)

Each file stores a list of edges as tab-separated values. Each line represents a single edge consisting of two columns: (Source, Target), each of which is separated by a tab. Node IDs are positive integers and the rows are already sorted by Source.

    Source    Target
    0         0
    0         1
    1         1
    1         2
    2         3

Your code should accept two arguments upon running. The first argument (args[0]) will be a path for the input graph file, and the second argument (args[1]) will be a path for the output directory. The default output mechanism of Hadoop will create multiple files in the output directory, such as part-00000 and part-00001, which will have to be merged and downloaded to a local directory (instructions on how to do this are provided below).

The format of the output should be as follows. Each line of your output is of the format

    diff    count

where (1) diff is the difference between a node's out-degree and in-degree (i.e., out-degree minus in-degree); and (2) count is the number of nodes that have that value of difference. The out-degree of a node is the number of edges where that node is the Source. The in-degree of a node is the number of edges where that node is the Target. diff and count must be separated by a tab (\t), and the lines do not have to be sorted. When the source and target are the same node, such as [0, 0] in the example, node "0" should be counted in both its in-degree and its out-degree.

The following result is computed based on the graph above:

    -1    1
    0     2
    1     1

The explanation of the above example result:

    Output    Explanation
    -1  1     There is 1 node (node 3) whose degree difference is -1.
    0   2     There are 2 nodes (node 1 and node 2) whose degree difference is 0.
    1   1     There is 1 node (node 0) whose degree difference is 1.

Hint: One way of doing this is to use the MapReduce procedure twice: the first pass finds the difference between out-degree and in-degree for each node, and the second calculates the node count of each degree difference (see the sketch below). You will have to make changes to the skeleton code for this.
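As the hint notes, the computation is two passes. Here is a hedged Python sketch of that logic (not the required Hadoop job), assuming a local copy of the edge file:

    from collections import Counter

    out_deg, in_deg = Counter(), Counter()
    with open("small.tsv") as f:        # assumed local copy of the edge list
        for line in f:
            src, tgt = line.split()
            out_deg[src] += 1           # pass 1: per-node out- and in-degrees
            in_deg[tgt] += 1

    diff_counts = Counter(out_deg[n] - in_deg[n]
                          for n in set(out_deg) | set(in_deg))
    for diff, count in diff_counts.items():  # pass 2: count nodes per difference
        print(f"{diff}\t{count}")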
In the Q4 folder of the hw3-skeleton, you will find the following files we have prepared for you:
The src directory contains a main Java file that you will work on. We have provided some code to help you get started. Feel free to edit it and add your own files in the directory, but the main class should be called "Q4".
pom.xml contains the necessary dependencies and compile configurations for the question.

To compile, you can run the following command in the directory which contains pom.xml:

    mvn clean package

This command will generate a single JAR file in the target directory (i.e., target/q4-1.0.jar).

Creating Clusters in HDInsight using the Azure portal
Azure HDInsight is an Apache Hadoop distribution. This means that it can handle large amounts of data on demand. The next step is to use Azure's web-based management tool to create a Linux cluster. Follow the recommended steps shown here (or the full Azure documentation here) to create a new cluster. At the end of this process, you will have created and provisioned a new HDInsight cluster and storage (the provisioning will take some time depending on how many nodes you chose to create).

Please record the following important information (we also recommend that you take screenshots) so you can refer to it later:
Cluster login credentials
SSH credentials
Resource group
Storage account
Container credentials

VERY IMPORTANT: HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. To conserve your credit, you should delete your cluster when it is no longer in use. You can find all the clusters and storage you have created under "All resources" in the left panel. Please refer here for how to delete an HDInsight cluster.

Uploading data files to HDFS-compatible Azure Blob storage
We have listed the main steps from the documentation for uploading data files to your Azure Blob storage here (placeholders in angle brackets stand for your own values):
1. Follow the documentation here to install the Azure CLI.
2. Open a command prompt, bash, or other shell, and use the az login command to authenticate to your Azure subscription. When prompted, enter the username and password for your subscription.
3. The az storage account list command will list the storage accounts for your subscription.
4. The az storage account keys list --account-name <storage-account-name> --resource-group <resource-group> command should return "key1" and "key2". Copy the value of "key1" because it will be used in the next steps.
5. The az storage container list --account-name <storage-account-name> --account-key <key1-value> command will list your blob containers.
6. The az storage blob upload --account-name <storage-account-name> --account-key <key1-value> --file <local-file> --container-name <container-name> --name <folder>/<file> command will upload the source file to your blob storage container, where <folder> is the folder name you will create and where you would like to upload the tsv files to. If the file is uploaded successfully, you should see "Finished [#####] 100.0000%".

Using these steps, upload small.tsv and large.tsv to your blob storage container. The uploading process may take some time. After that, you can find the uploaded files in the storage blobs at Azure (portal.azure.com) by clicking on "Storage accounts" in the left side menu and navigating through your storage account (<storage account> -> Blobs -> <container> -> <folder> -> <file>). For example, "jonDoeStorage" -> "Blobs" -> "jondoecluster-xxx" -> "jdoeSmallBlob" -> "small.tsv". After that, write your Hadoop code locally and convert it to a JAR file using the steps mentioned above.

Uploading your JAR file to HDFS-compatible Azure Blob storage
Azure Blob storage is a general-purpose storage solution that integrates with HDInsight. Your Hadoop code should directly access files on Azure Blob storage.

Upload the JAR file created in the first step to Azure storage using the following command:

    scp <local-path>/q4-1.0.jar <ssh-username>@<cluster-name>-ssh.azurehdinsight.net:

You will be asked to agree to connect by typing "yes" and to enter your cluster login password. Then you will see that q4-1.0.jar is uploaded 100%.

SSH into the HDInsight cluster using the following command:

    ssh <ssh-username>@<cluster-name>-ssh.azurehdinsight.net

<ssh-username> is what you created during the cluster-creation flow. Note: if you see the warning "REMOTE HOST IDENTIFICATION HAS CHANGED", you may clean /home/<local-username>/.ssh/known_hosts by using the command rm ~/.ssh/known_hosts. Please refer to host identification.

Run the ls command to make sure that the q4-1.0.jar file is present. To run your code on the small.tsv file, run the following command:

    yarn jar q4-1.0.jar edu.gatech.cse6242.Q4 wasbs://<container>@<storage-account>.blob.core.windows.net/<folder>/small.tsv wasbs://<container>@<storage-account>.blob.core.windows.net/smalloutput

Command format: yarn jar jarFile packageName.ClassName dataFileLocation outputDirLocation

Note: if "Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException…" occurs, you need to delete the output folder and files from your Blob. You can do this at portal.azure.com.
Click "All resources" in the left panel; you will see your storage and cluster. Click your storage, then "Blobs", and then your container; you will see all folders, including the folder you created and the "smalloutput" folder. You may need to click "Load more" at the bottom to see all folders/files. You need to delete the output in two different places: 1. smalloutput; 2. user/sshuser/*

The output will be located in the directory: wasbs://<container>@<storage-account>.blob.core.windows.net/smalloutput

If there are multiple output files, merge the files in this directory using the following command:

    hdfs dfs -cat wasbs://<container>@<storage-account>.blob.core.windows.net/smalloutput/* > small.out

Command format: hdfs dfs -cat location/* > outputFile

Then you may exit to your local machine using the command: exit

You can download the merged file to the local machine (this can be done either from the Azure Portal or by using the scp command from the local machine). Here is the scp command for downloading this output file to your local machine:

    scp <ssh-username>@<cluster-name>-ssh.azurehdinsight.net:/home/<ssh-username>/small.out .

Using the above command from your local machine will download the small.out file into the current local directory. Repeat this process for large.tsv. Make sure your output file has exactly two columns of values, as shown above. Your output files should open and be readable in text editors like Notepad++.

Deliverables
[15pt] Q4.java & q4-1.0.jar: your Java code and the compiled JAR file. You should implement your own map/reduce procedure and should not import an external graph processing library.
[10pt] small.out: the output file generated after processing small.tsv.
[10pt] large.out: the output file generated after processing large.tsv.

Q5 [10 points] Regression: Automobile price prediction, using Azure ML Studio

Note: Create and use a free workspace instance on Azure ML Studio instead of your Azure credit for this question. Please use your Georgia Tech username (e.g., jdoe3) to log in.

Goal
The main purpose of this question is to introduce you to Microsoft Azure Machine Learning Studio and familiarize you with its basic functionality and a typical machine learning workflow. Go through the "Automobile price prediction" tutorial and complete the tasks below. You will modify the given file, results.csv, by adding your results for each of the tasks below. We will autograde your solution, therefore DO NOT change the order of the questions or anything else. Report the exact numbers that you get in your output; DO NOT round the numbers.

1. [3 points] Repeat the experiment mentioned in the tutorial and report the values of the metrics as mentioned in the 'Evaluate Model' section of the tutorial.
2. [3 points] Repeat the same experiment, change the 'Fraction of rows in the first output' value in the split module to 0.8 (originally set to 0.75), and report the corresponding values of the metrics.
3. [4 points] Evaluate the model with 5-fold cross-validation (CV); select the parameters in the 'Partition and Sample' module (see figure below). Report the values of Root Mean Squared Error (RMSE) and Coefficient of Determination for each fold (the 1st fold corresponds to fold number 0, and so on).

Figure: Property Tab of Partition and Sample Module

Specifically, you need to do the following:
Import the entire dataset (Automobile Price Data (Raw))
Clean the missing data by removing rows that have any missing values
Partition and Sample the data
Create a new model (Linear Regression)
Finally, perform cross-validation on the dataset
Visualize/Report the values
Deliverables
[10pt] results.csv: a csv file containing results for all three parts.

Important: folder structure of the zip file that you submit
You are submitting a single zip file HW3-GTUsername.zip (e.g., HW3-jdoe3.zip, where "jdoe3" is your GT username), which must unzip to the following directory structure (i.e., a folder "HW3-jdoe3" containing folders "Q1", "Q2", etc.). The files to be included in each question's folder have been clearly specified at the end of each question's problem description above.

HW3-GTUsername/
    Q1/
        src/main/java/edu/gatech/cse6242/Q1.java
        pom.xml
        run1.sh
        run2.sh
        q1output1.tsv
        q1output2.tsv
        (do not attach the target directory)
    Q2/
        q2.dbc
        q2.scala
        q2_results.csv
    Q3/
        pig-script.txt
        pig-output.txt
    Q4/
        src/main/java/edu/gatech/cse6242/Q4.java
        pom.xml
        q4-1.0.jar (from the target directory)
        small.out
        large.out
        (do not attach the target directory)
    Q5/
        results.csv

Version 0

[1] Graph1 is a modified version of data derived from the LiveJournal social network dataset, with around 30K nodes and 320K edges.
[2] Graph2 is a modified version of data derived from the LiveJournal social network dataset, with around 300K nodes and 69M edges.
[3] Graph derived from the Stanford Large Network Dataset Collection.
[4] Subset of the Pokec social network data.
[5] Subset of the Friendster data.
Q1 [10 points] Designing a good table. Visualizing data with Tableau.

Imagine you are a data scientist working with data that documents population distribution according to ethnic group, age, and gender across years.

a. [5 points] Good table design. You want to help your organization analyze the data for the years 2017 and 2018. Create a well-designed table to visualize the data contained in age-distribution.csv. You can use any tool (e.g., Excel, HTML) to create the table. For each year, and for each ethnic group (treat "Other Ethnic Groups" as an ethnic group), your table should clearly communicate:
The total number of males (across all ages)
The total number of females (across all ages)
The total population (across all ages)
The percentage of people that are 65 years and over, rounded to 2 decimal places (you will need to calculate this percentage)

Save the table as table.png. You may decide on the most meaningful column names to use, the number of columns, and the column order. Keep suggestions from lecture in mind when designing your table. You are not required to use only the techniques described in lecture. For OMS students, the online lecture video pertaining to this topic is Week 4 – Fixing Common Visualization Issues – Fixing Bar Charts, Line Charts. For campus students, please review slide 43 and onwards of the lecture slides.

b. [5 points] Tableau. You want to help your organization better understand the yearly trend in population growth (in a city) and the contribution of each ethnicity to that growth. Visualize the data in population.csv[1] as a stacked bar chart. Your chart should display years (1960 to 1970, inclusive) on the vertical axis and the total population on the horizontal axis. (Optional reading: the effectiveness of stacked bar charts is often debated; sometimes, they can be confusing, difficult to understand, and may make data series comparison challenging.)

Our main goal here is for you to try out Tableau, a popular information visualization tool. Thus, we keep this part more open-ended, so you can practice making design decisions. We will accept most designs. We show one possible design in the figure below, based on the tutorial from Tableau, and you are not limited to the techniques presented there. Please follow the instructions below:
Your design should visualize the values of the categories Total Malays, Total Indians, Total Chinese, and Other Ethnic Groups (Total) for each year.
Your design should utilize a stacked bar chart to show the count for each of the aforementioned columns.
Your design should have clearly labeled axes and a clear chart title. Include a legend for your chart.

Save the chart as barchart.png.

Tableau has provided us with student licenses for Tableau Desktop, available for Mac and Windows. Go to tableau activation and select "Tableau Desktop". After the installation, you will be asked to provide an activation key, which you can find on the Canvas page for this assignment. This key is for your use in this course only. Do not share the key with anyone. If you do not have access to a Mac or Windows machine, please use the 14-day trial version of Tableau Online:
1. Visit https://www.tableau.com/trial/tableau-online
2. Enter your information (name, email, GT details, etc.)
3. You will then receive an email to access your Tableau Online site
4. Go to your site and create a workbook

One final option, if neither of the above methods works, is to take advantage of Tableau for Students.
Follow the link and select "Get Tableau For Free". You should be able to receive an activation key which offers you one year of Tableau Desktop at no cost by providing a valid Georgia Tech email. Note that it is unclear whether Tableau intends for these licenses to be renewable, so you may only be eligible to receive one in the event that you have never used a Tableau for Students license before.

Figure 1: Example of a stacked bar chart

Q1 Deliverables: The directory structure should be as follows:
Q1/
    table.png
    barchart.png
    age-distribution.csv
    population.csv

table.png – An image/screenshot of the table in Q1.a (png format only).
barchart.png – An image of the chart in Q1.b (png format only; Tableau workbooks will not be graded!). The image should be clear and of high quality.
age-distribution.csv and population.csv – the datasets.

Q2 [15 points] Force-directed graph layout

You will experiment with many aspects of D3 for graph visualization. To help you get started, we have provided the graph.html file (in the Q2 folder).
Note: You are welcome to split graph.html into graph.html, graph.css, and graph.js. Please also make certain that any paths in your code are relative paths. Non-functioning code will result in a five-point deduction.

a. [3 points] Adding node labels: Modify graph.html to show a node label (the node name, i.e., the source) at the top right of each node. If a node is dragged, its label must move with it.

b. [3 points] Styling edges: Style the edges based on the "value" field in the links array. Assign the following styles:
If the value of the edge is equal to 0, the edge should be black, thin, and dashed.
If the value of the edge is equal to 1, the edge should be green, thick, and solid.

c. [3 points] Scaling nodes:
[1.5 points] Scale the radius of each node in the graph based on the degree of the node (you may try a linear or squared scale, but you are not limited to these choices). Note: Regardless of which scale you decide to use, you should avoid extreme node sizes (e.g., nodes that are mere points, barely visible, or of huge size); failure to do so will result in a poor-quality visualization. Note: D3 v4 (and above) does not support d.weight (which was the typical approach to obtain node degree in D3 v3). You may need to calculate node degrees yourself. Example relevant approach: https://stackoverflow.com/questions/43906686/d3-node-radius-depends-on-number-of-links-weight-property
[1.5 points] The degree of each node should be represented by varying colors. Pick a meaningful color scheme (hint: color gradients). The number of color gradations is up to you, but it must be visually evident that nodes with higher degree are colored a darker/deeper color and nodes with lower degree are colored lighter.

d. [6 points] Pinning nodes (fixing node positions):
[2 points] Modify the code so that when you double click a node, it pins the node's position such that it will not be modified by the graph layout algorithm (note: pinned nodes can still be dragged around by the user, but they will remain at their positions otherwise). Node pinning is an effective interaction technique to help users spatially organize nodes during graph exploration.
[2 points] Mark pinned nodes to visually distinguish them from unpinned nodes, e.g., pinned nodes are shown in a different color or border thickness, or are visually annotated with an asterisk (*), etc.
[2 points] Double clicking a pinned node should unpin (unfreeze) its position and unmark it.

Figure 2a: Example Visualization
Q2 Deliverables: The directory structure should be as follows:
Q2/
  graph.(html / js / css)
● graph.(html / js / css) – the html file created, and the js / css files if not included in graph.html

Q3 [15 points] Line Charts
Use the dataset[2] provided in the file earthquakes.csv (in the Q3 folder) to create line charts. Refer to the line chart tutorial here.
Note: You will create four plots in this question, which should be placed one after the other on a single HTML page, similar to the example image below (Figure 3). Note that your design need NOT be identical to the example.

a. [5 points] Creating a line chart. Create a line chart that visualizes the number of earthquakes worldwide from 2000 to 2015 (inclusively), for the four magnitude ranges: ['5_5.9', '6_6.9', '7_7.9', '8.0+']. Use the color scheme provided below for the magnitude ranges. Add a legend at the top right corner of the chart showing the magnitude-color mapping.
● Chart title: Worldwide Earthquake stats 2000-2015
● Horizontal axis label: Year (use scaleTime like you did in HW1 Q3)
● Vertical axis label: Num of Earthquakes (use a linear scale for part a)
● Color scheme: {'5_5.9': '#FFC300', '6_6.9': '#FF5733', '7_7.9': '#C70039', '8.0+': '#900C3F'}

b. [4 points] Adding symbols and scaling symbol sizes. Create a line chart for this part (append it to the HTML page) whose design is a variant of what you created in part a. Start with your chart from part a, then modify the code to visualize each data point in the chart as a solid circle whose size is proportional to "Estimated Deaths". Use a good scaling coefficient (your choice) to make the chart legible, visually attractive and meaningful. Keep the legend.
● Chart title: Worldwide Earthquake stats 2000-2015 with symbols

c. [6 points] Axis scales in D3. Create two line charts for this part (append them to the HTML page) to try out two axis scales in D3. Start with your chart from part b, then modify the vertical axis scale for each chart: the first chart uses the square root scale for its vertical axis (only), and the second chart uses the log scale for its vertical axis (only). Keep the legend and symbols. In explanation.txt, explain when we may want to use nonlinear scales such as the square root scale and the log scale in charts, in no more than 50 words.
Note: the horizontal axes should be kept in linear scale; only the vertical axes are affected.
Hint: You may need to carefully set the scale domain to handle the 0s in the data.
First chart
● Chart title: Worldwide Earthquake stats 2000-2015 square root scale
● This chart uses the square root scale for its vertical axis (only).
● Other features should be the same as part b.
Second chart
● Chart title: Worldwide Earthquake stats 2000-2015 log scale
● This chart uses the log scale for its vertical axis (only).
● Other features should be the same as part b.

Figure 3a: Example line chart
Figure 3b: Example line chart with symbols
Figure 3c-1: Example line chart using square root scale
Figure 3c-2: Example line chart using log scale

Q3 Deliverables: The directory structure should be organized as follows:
Q3/
  earthquakes.csv
  linecharts.(html / js / css)
  linecharts.pdf
  explanation.txt
● earthquakes.csv – the dataset.
● linecharts.(html / js / css) – the html file created, and the js / css files if not included in linecharts.html
● linecharts.pdf – a PDF document showing the screenshots of the four line charts created above (one for Q3.a, one for Q3.b and two for Q3.c).
You should print the HTML page as a PDF file, and each PDF page should show one plot (hint: use CSS page breaks). Clearly title the plots as instructed (see examples in Figure 3).
● explanation.txt – the text file explaining your observations for Q3.c.

Q4 [15 points] Heatmap and Select Box
Example: 2D Histogram, Select Options
Use the dataset provided in earthquakes.csv (in the Q4 folder), which describes the earthquake counts for different US states from 2010 to 2015. Visualize the data using D3 heatmaps.
● [3 points] Create a file named heatmap.html. Within this file, create a heatmap of the earthquakes for different states from year 2010 to 2015 (inclusively). Place the state name on the heatmap's horizontal axis and the year on its vertical axis.
● [1 point] A heatmap's color scheme is a very important design element that has a direct impact on the heatmap's effectiveness. Colorize the earthquake counts for each state using a meaningful 9-gradation color gradient of your choice.
● [3 points] Add axis labels and a legend to the chart. Place the year (2010, 2011, 2012, etc.) on the vertical axis (i.e., top → bottom: 2010 → 2015). Place the state name ("Alabama", "Arizona", "Arkansas", etc.) on the horizontal axis in alphabetical order (i.e., left → right: A → Z).
● [6 points] Create a drop-down select box with D3 based on the total counts (from 2010 to 2015) of earthquakes of a state. The selections are "0 to 9", "10 to 99", "100 to 499", and "500 or above". When the user selects a different range in this select box, the heatmap and the legend should both be updated with values corresponding to the selected range. Note the differences in the horizontal axes and legends for "0 to 9" and "500 or above" in Figure 4a and Figure 4b below: while the 9 color gradations in the legend remain the same, the threshold values are different. The default category when the page loads should be "0 to 9".
● [2 points] Implement a mouseover effect. When the mouse cursor is on a heatmap cell, the value of that cell will be displayed between the chart title and the heatmap.
Note:
● The earthquake statistics are from USGS, with some modifications. The data provided in earthquakes.csv needs to be "reshaped" so that it can produce the expected output. All data reshaping must only be performed in javascript; you must not modify earthquakes.csv. That is, your code should first read the data from the earthquakes.csv file as is; then you may reshape that loaded data using javascript, and then use it to create the heatmap.
● The threshold values should not be hardcoded. They do not necessarily have to match the ones provided in the screenshots below.
● The screenshots provided below serve as an example only. You are not expected to produce an exact copy of the screenshots. Please feel free to experiment with fonts, placement, color, etc., as long as the output looks reasonable for a heatmap and meets the functional requirements mentioned above.

Figure 4a: Counts of earthquakes in the states that have 0-9 earthquakes in total from 2010 to 2015. When the mouse is placed on the cell (Tennessee, 2012), the value 9 shows up.
Figure 4b: Counts of earthquakes in the states that have 500 or more earthquakes in total from 2010 to 2015. When the mouse is placed on the cell (California, 2014), the value 191 shows up.
Q4 Deliverables: The directory structure should look like:
Q4/
  heatmap.(html / js / css)
  earthquakes.csv
● heatmap.(html / js / css) – the html file created, and the js / css files if not included in heatmap.html
● earthquakes.csv – the dataset

Q5 [20 points] Interactive Visualization
Use the dataset state-year-earthquakes.csv provided in the Q5 folder to create an interactive line chart and sub-chart. This dataset[3] contains the earthquake counts by U.S. state and region in the years 2010 to 2015 (inclusively). In the data sample below, each row under the header represents a state, its region, the year, and the count of earthquakes:
state,region,year,count
Hawaii,West,2010,17
Hawaii,West,2011,34

[3 points] Create a line chart. Summarize the data by displaying the count of earthquakes by region for each year. You will need to sum the count of earthquakes by year for all states in their respective regions. Then, display one line for each of the 4 regions in the dataset.
Axes: All axes should automatically adjust based on the data; do not hard-code any values.
● The vertical axis will represent the total count of earthquakes for a region. Display these values using a linear scale.
● The horizontal axis will represent the years. Display these values using a time scale.

[3 points] Line styling, legend, and title.
● Lines: Each line should use a different color of your choosing to differentiate between regions. Display a dot shape over each data point in the line chart (i.e., a line should have one dot displayed for each year).
● Legend: Display a legend on the right-hand portion of the chart that maps the line color to the name of the region.
● Title: Display the title "US Earthquakes by Region 2010-2015" at the top of the plot.
The line chart should be similar in appearance to the chart provided in Figure 5b.
Note: The data provided in state-year-earthquakes.csv requires some processing for aggregation. All aggregation must only be performed in javascript; you must not modify state-year-earthquakes.csv. That is, your code should first read the data from the .csv file as is; then you may process the loaded data using javascript.

Figure 5b. Line chart representing the count of earthquakes by year for each region

Interactivity and sub-chart. In the next few parts of this question, you will create event handlers to detect mouseover and mouseout events over each dot shape added above, so that when hovering over a dot, a horizontal bar chart representing the earthquake count for each state in a region will be shown below the line chart (for the year of that dot). For example, hovering over the dot for the West region in 2011 will display the bar chart for all states in the Western region and their individual earthquake counts in 2011. See Figure 5c for an example.

Figure 5c. Bar chart representing the count of earthquakes for the Western region in 2011

[5 points] Create a bar chart. Use a horizontal design for the bar chart, with one bar per state in the selected region. Each bar represents the count of earthquakes for one state in the selected year.
Axes: All axes should automatically adjust based on the data; do not hard-code any values.
● The vertical axis represents the states in a region. The state names should be sorted in ascending order on the vertical axis, where the state with the lowest count of earthquakes is at the bottom and the state with the highest count of earthquakes is at the top.
Note: If a region has multiple states with an equal count of earthquakes, order those state names in ascending alphabetical order. e.g., Alabama, Delaware, and Florida have 0 earthquakes in 2013; they will be ordered (bottom to top) as:
…
Florida
Delaware
Alabama
● The horizontal axis represents the count of earthquakes for the selected year. Display these values using a linear scale.

[3 points] Bar styling and title.
● Bars: All bars should have the same color and a fixed bar width.
● Title: Display a title with the format "<region>ern Region Earthquakes <year>" at the top of the plot, where <region> and <year> are variables set by hovering over a dot in the line chart. e.g., if displaying earthquakes for the South in 2012, the title would read: "Southern Earthquakes 2012"

[3 points] Mouseover event handling.
● The bar chart and its title should only be displayed during mouseover events for a dot in the line chart.
● The dot in the line chart should change to a larger size during mouseover to emphasize that it is the selected point.

[3 points] Mouseout event handling.
● The bar chart and its title should be hidden from view on mouseout, and the dot previously mouseovered should return to its original size (i.e., the size of the dot in the line chart should be reset).
● The graph should exhibit interactivity similar to the .gif in Figure 5f.

Figure 5f. Line chart + bar chart demonstrating interactivity

Q5 Deliverables: The directory structure should be as follows:
Q5/
  interactive.(html / js / css)
  state-year-earthquakes.csv
● interactive.(html / js / css) – the html, javascript, and css to render the visualization in Q5.
● state-year-earthquakes.csv – the dataset used to show the information of each state.

Q6 [20 points] Choropleth Map of State Data
Example of a choropleth map: Unemployment rates
Use the datasets[4] provided in the files state-earthquakes.csv and states-10m.json (in the Q6 folder) and visualize them as a choropleth map. Each record in state-earthquakes.csv represents a state and is of the form State,Region,2010,…,2015,Total Earthquakes, where:
● State: the name of the state, e.g., Alabama.
● Region: the region which the state belongs to, e.g., South.
● 2010,…,2015: the number of earthquakes in that state in 2010, …, 2015, respectively.
● Total Earthquakes: the total number of earthquakes in that state during 2010-2015 (the numbers of earthquakes in the state-earthquakes.csv file have been slightly modified from the original values and do not represent the official figures).
The states-10m.json file is a TopoJSON topology containing two geometry collections: states and nation.

a. [15 points] Create a choropleth map using the provided datasets; use Figure 6 below as reference.
● [10 points] The color of each state should correspond to the log of the total earthquakes in that state (the Total Earthquakes field in state-earthquakes.csv), i.e., darker colors correspond to higher total earthquakes and lighter colors correspond to lower total earthquakes, in log scale. Use gradients of only one particular hue. Use promises (part of the d3.v5.min.js file present in the lib directory; there is no need to download or install anything) to easily load data from multiple files into a function. Use topojson (present in the lib folder) to draw the choropleth map.
● [5 points] Add a vertical legend showing how colors map to the total number of earthquakes. (In the example shown in Figure 6, there are 7 color gradations, but you must use exactly 9 in your submission.)

b. [5 points] Add a tooltip using the d3-tip.min library (in the lib folder).
On hovering over a state, the tooltip should show the following information, each on its own line: (1) the state name, (2) the region, and (3) the total earthquakes. The tooltip should appear when the mouse hovers over the state; on mouseout, the tooltip should disappear. Use Figure 6 below as reference. We recommend that you position the tooltip some distance away from the mouse cursor, which will prevent the tooltip from "flickering" as you move the mouse around quickly (the tooltip disappears when your mouse leaves a state and enters the tooltip's bounding box). Please ensure that the tooltip is fully visible (i.e., not clipped, especially near the page edges).
Note: You must create the tooltip by only using d3-tip.min.js in the lib folder.

Figure 6. Reference example for choropleth maps

Q6 Deliverables: The directory structure should be organized as follows:
Q6/
  choropleth.(html / js / css)
  state-earthquakes.csv
  states-10m.json
● choropleth.(html / js / css) – the html/js/css files to render the visualization.
● state-earthquakes.csv – the dataset used to show the information of each state.
● states-10m.json – the dataset needed to draw the map.

Q7 [5 points] Pros and Cons of Visualization Tools
This question has two parts. The first part is optional and WILL NOT be graded; the second part is required and WILL be graded.

a. [OPTIONAL – NO points] Line chart using R. Use R to create a line chart that looks the same as the 4th line chart in Q3, i.e., the line chart in Q3.c with a log scale y-axis.

b. [5 points] Comparison write-up. If you did part a, you may use your experience with R from part a to complete this comparison write-up, comparing R with Tableau and D3 in the following aspects. If you did not do part a, pick a visualization system/tool/library/framework that you are familiar with (R, R Shiny, Python, Plotly, Excel, JMP, Matlab, Mathematica, Julia, etc.), and compare it with Tableau and D3 in the following aspects. Your write-up for each comparison aspect should be within the word limits specified.
● Ease of development for developers [40 words]
● Ease of maintaining the visualization for developers (e.g., difficulty of maintenance as the requirements change, the data changes, the hosting platform changes, etc.) [40 words]
● Usability of the developed visualization for end users [40 words]
● Scalability of the visualization to "large" datasets [40 words]
● System requirements to run the visualization (e.g., browsers, OS, software licensing) for end users [40 words]
Your answer will depend on what you have learned from working through the questions in this assignment, and on your personal experience.
Note: Your claims should be well justified and supported with compelling reasons. Simply stating that a tool is better (or worse) than D3 without justification will receive a low (or no) score. We recommend formatting your answers as bullet lists for better readability. For example:
1. Ease to develop
   R: …
   Tableau: …
   D3: …
2. Ease to maintain the visualization
   R: …
   Tableau: …
   D3: …
…
Text used mainly for organizing your answers (e.g., "Ease to develop", "D3:" above) does not count towards the word limit.

Q7 Deliverables: The directory structure should be as follows:
Q7/
  linechart.jpg (optional)
  analysis.txt
● linechart.jpg – the line chart you created using R (note: this is optional and will not be graded).
● analysis.txt – your comparison of R (or your chosen tool) with Tableau and D3.
Important: folder structure of the zip file that you submit
You are submitting a single zip file HW2-GTUsername.zip (e.g., HW2-jdoe3.zip, where "jdoe3" is your GT username), which must unzip to the following directory structure (i.e., a folder "HW2-jdoe3", containing folders "Q1", "Q2", etc.). The files to be included in each question's folder have been clearly specified at the end of each question's problem description above.
HW2-GTUsername/
  lib/
    d3.v5.min.js
    d3-tip.min.js
    d3-scale-chromatic.v1.min.js
    topojson.v2.min.js
    d3-dsv.min.js
    d3-fetch.min.js
  Q1/
    table.png
    barchart.png
    age-distribution.csv
    population.csv
  Q2/
    graph.(html / js / css)
  Q3/
    linecharts.(html / js / css)
    linecharts.pdf
    earthquakes.csv
    explanation.txt
  Q4/
    heatmap.(html / js / css)
    earthquakes.csv
  Q5/
    interactive.(html / js / css)
    state-year-earthquakes.csv
  Q6/
    choropleth.(html / js / css)
    state-earthquakes.csv
    states-10m.json
  Q7/
    linechart.jpg (optional)
    analysis.txt
Version 2
[1] Source: here
[2] Source: USGS https://earthquake.usgs.gov/earthquakes/browse/stats.php
[3] Source: USGS https://earthquake.usgs.gov/earthquakes/browse/stats.php
[4] Source: USGS https://earthquake.usgs.gov/earthquakes/browse/stats.php
Q1 [40 pts] Scalable single-PC PageRank on a 70M-edge graph
In this question, you will learn how to use your computer's virtual memory to implement the PageRank algorithm so that it scales to graph datasets with as many as billions of edges using a single computer (e.g., your laptop). As discussed in class, a standard way to work with larger datasets has been to use computer clusters (e.g., Spark, Hadoop), which may involve steep learning curves, may be costly (e.g., paying for hardware and personnel), and, importantly, may be "overkill" for smaller datasets (e.g., a few tens or hundreds of GBs). The virtual-memory-based approach offers an attractive, simple solution that allows practitioners and researchers to more easily work with such data (visit the NSF-funded MMap project's homepage to learn more about the research).
The main idea is to place the dataset in your computer's (unlimited) virtual memory, as it is often too big to fit in RAM. When running algorithms on the dataset (e.g., PageRank), the operating system will automatically decide when to load the necessary data (a subset of the whole dataset) into RAM. This technique of putting data into your machine's virtual memory space is called "memory mapping", and it allows the dataset to be treated as if it were an in-memory dataset. In your (PageRank) program, you do not need to know whether the data that you need is stored on the hard disk or kept in RAM. Note that memory-mapping a file does NOT cause the whole file to be read into memory; instead, data is loaded and kept in memory only when needed (determined by strategies like least-recently-used paging and anticipatory paging).
You will use the Python modules mmap and struct to map a large graph dataset into your computer's virtual memory. The mmap() function does the "memory mapping", establishing a mapping between a program's (virtual) memory address space and a file stored on your hard drive; we call this file a "memory-mapped" file. Since memory-mapped files are viewed as a sequence of bytes (i.e., a binary file), your program needs to know how to convert bytes to and from numbers (e.g., integers). struct supports such conversions via "packing" and "unpacking", using format specifiers that represent the desired endianness and the data type to convert to/from.

Q1.1 Set up PyPy
● Ubuntu: sudo apt-get install pypy
● MacOS: install Homebrew, then run brew install pypy
● Windows: download the package and then install it.
Run the following command in the Q1 directory to learn more about the helper utility that we have provided to you for this question:
$ pypy q1_utils.py --help

Q1.2 Warm Up (15 pts)
Get started with memory-mapping concepts using the code-based tutorial in warmup.py. You should study the code and modify parts of it as instructed in the file. You can run the tutorial code as-is (without any modifications) to test how it works (run "python warmup.py" in the terminal to do this). The warmup code is set up to pack the integers from 0 to 63 into a binary file, and to unpack them back via a memory-map object. You will need to modify this code to do the same thing for all odd integers in the range 1 to 42. The lines that need to be updated are clearly marked.
Note: You must not modify any other parts of the code.
When you are done, you can run the following command to test whether it works as expected:
$ python q1_utils.py test_warmup out_warmup.bin
It prints True if the binary file created after running warmup.py contains the expected output.
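To make the pack/unpack mechanics concrete, here is a minimal, self-contained sketch of the same idea; it is not the warmup solution, and the file name demo.bin and the "<q" (little-endian 8-byte integer) format are illustrative assumptions, since warmup.py defines its own file name and format:

```python
import mmap
import struct

# Pack the integers 0..63 into a binary file, 8 bytes each
# ("<q" = little-endian signed 64-bit integer).
with open("demo.bin", "wb") as f:
    for i in range(64):
        f.write(struct.pack("<q", i))

# Memory-map the file; the OS pages bytes in only when they are accessed.
with open("demo.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    # Unpack the 10th integer directly from the mapped bytes,
    # without reading the whole file into RAM.
    value = struct.unpack("<q", mm[10 * 8:11 * 8])[0]
    print(value)  # prints 10
    mm.close()
```

The key point is that slicing the mmap object looks like indexing an in-memory byte string, even though the bytes live on disk until touched.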
Q1.3 Implementing and running PageRank (25 pts)
You will implement the PageRank algorithm using the power iteration method, and run it on the LiveJournal dataset (an online community where millions of users maintain journals and blogs). You may want to revisit the MMap lecture slides (slides 9-10) to refresh your memory about the PageRank algorithm and the data structures and files that you may need to memory-map. (For more details, read the MMap paper.) You will perform three steps (subtasks), as described below.

Step 1: Download the LiveJournal graph dataset (an edge list file)
The LiveJournal graph contains almost 70 million edges. It is available on the SNAP website. We are hosting the graph on our course homepage, to avoid high traffic bombarding their site.

Step 2: Convert the graph's edge list to binary files (you only need to do this once)
Since memory mapping works with binary files, you will convert the graph's edge list into its binary format by running the following command at the terminal/command prompt:
$ python q1_utils.py convert
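For intuition about the computation itself, here is a minimal sketch of one power-iteration pass over a memory-mapped binary edge list. It assumes edges are stored as consecutive little-endian 4-byte (source, target) integer pairs and that out-degrees have been precomputed; the actual binary layout and node numbering are defined by q1_utils.py, so treat every name below as illustrative rather than as the required implementation:

```python
import mmap
import struct

DAMPING = 0.85

def pagerank_pass(edge_mm, num_edges, out_degree, scores):
    """One power-iteration pass: every node's score flows along its
    out-edges (damped), plus a uniform teleport term."""
    n = len(scores)
    new_scores = [(1.0 - DAMPING) / n] * n
    for i in range(num_edges):
        # Each edge occupies 8 bytes: two little-endian 4-byte ints.
        src, dst = struct.unpack("<ii", edge_mm[i * 8:(i + 1) * 8])
        new_scores[dst] += DAMPING * scores[src] / out_degree[src]
    return new_scores

# Illustrative driver: map the binary edge file, then run a fixed
# number of passes (hypothetical file name and counts).
# with open("edges.bin", "rb") as f:
#     edge_mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
#     scores = [1.0 / n] * n
#     for _ in range(10):
#         scores = pagerank_pass(edge_mm, num_edges, out_degree, scores)
```

Because edge_mm is memory-mapped, each pass streams over the edges while the OS decides which pages of the file to keep in RAM.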
Q1 [25 pts] Analyzing a Graph with Hadoop/Java

a) [15 pts] Writing your first simple Hadoop program
Imagine that your boss gives you a large dataset which contains an entire email communication network from a popular social network site. The network is organized as a directed graph where each node represents an email address and the edge between two nodes (e.g., Address A and Address B) has a weight stating how many times A wrote to B. Your boss wants to find out the people most frequently contacted by others. Your task is to write a MapReduce program in Java to report, for each node, the largest weight among all of its inbound edges. Consider the following toy graph, where each line represents a directed edge in the (tab-separated) format source, target, weight:
117	51	1
194	51	1
299	51	3
230	151	51
194	151	79
51	130	10
Your code should accept two arguments upon running. The first argument (args[0]) will be a path for the input graph file on HDFS (e.g., /user/cse6242/graph1.tsv), and the second argument (args[1]) will be a path for the output directory on HDFS (e.g., /user/cse6242/q1output1). The default output mechanism of Hadoop will create multiple files in the output directory, such as part-00000, part-00001, which will be merged and downloaded to a local directory by the supplied run script. Please use the run scripts for your convenience.
The format of the output should be such that each line represents a node ID and the largest weight among all its inbound edges. The ID and the largest weight must be separated by a tab (\t). Lines do not need to be sorted. Please exclude nodes that do not have incoming edges (e.g., those email addresses that never get contacted by anybody). For the toy graph above, the output is as follows:
51	3
151	79
130	10
Test your program on graph1.tsv and graph2.tsv. To demonstrate how your MapReduce procedure works, use the inline example above and trace the input and output of your map and reduce functions. That is, given the above graph as the input, describe the input and output of your map and reduce function(s) and how the functions transform/process the data (provide examples whenever appropriate). Write down your answers in description.pdf. Your answers should be written in 12pt font with at least 1" margins on all sides. Your pdf (with answers for both parts a and b) should not exceed 2 pages. You are welcome to explain your answers using a combination of text and images.

b) [10 pts] Designing a MapReduce algorithm (and thinking in MapReduce)
Design a MapReduce algorithm that accomplishes the following task: given an unordered collection of two kinds of records, the algorithm will join (combine) record pairs from the collection. In practice, if you need to join a lot of data on Hadoop, you would typically use higher-level tools like Hive and Pig, instead of writing code to perform the joins yourself. Learning how to write the code here will help you gain a deeper understanding of the inner workings of join operations, and will help you decide what kinds of joins to use in different situations. We highly recommend that you read this excellent article about joins using MapReduce, which will give you some hints to correctly complete this question.
NOTE: You only need to submit pseudocode, a brief explanation of your algorithm, and the trace of the input and output of your map and reduce functions for the example given below. No coding is required.
Input of your algorithm:
Student, Alice, 1234
Student, Bob, 1234
Department, 1123, CSE
Department, 1234, CS
Student, Joe, 1123
The input is an unordered collection of records of two types: Student and Department. The Student record is of the form
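Since part b only asks for pseudocode, here is one possible reduce-side join sketch, written as Python-flavoured pseudocode; the Student and Department record layouts are inferred from the example input above, and your own pseudocode and trace may of course differ:

```python
# Map: key every record by the department id it mentions, and tag it
# with its record type so the reducer can tell the two kinds apart.
def map_fn(record):
    kind = record[0]
    if kind == "Student":            # ("Student", name, dept_id)
        _, name, dept_id = record
        yield dept_id, ("S", name)
    else:                            # ("Department", dept_id, dept_name)
        _, dept_id, dept_name = record
        yield dept_id, ("D", dept_name)

# Reduce: all records sharing one department id arrive together;
# pair each student with the department's name (a reduce-side join).
def reduce_fn(dept_id, tagged_values):
    names = [v for tag, v in tagged_values if tag == "S"]
    dept_names = [v for tag, v in tagged_values if tag == "D"]
    for dept_name in dept_names:
        for name in names:
            yield name, dept_id, dept_name
```

Tracing the sample input: the map phase groups ("S", Alice), ("S", Bob) and ("D", CS) under key 1234, so the reducer for that key emits (Alice, 1234, CS) and (Bob, 1234, CS); the reducer for key 1123 emits (Joe, 1123, CSE).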
Q1 [10 pts] Designing a good table. Visualizing data with Tableau.
Imagine you are a data scientist working with the United Nations High Commissioner for Refugees (UNHCR). Perform the following tasks to aid UNHCR's understanding of persons of concern.

a. [5 pts] Good table design. Create a table to display the details of the refugees (Total Population) in the year 2005 from the data provided in unhcr_persons_of_concern.csv. You can use any tool (e.g., Excel, HTML) to create the table. Keep suggestions from class in mind when designing your table (see lecture slides, specifically slide 43, for what to do, but you are not limited to the techniques described). Describe your reasons for choosing the techniques you use in explanation.txt, in no more than 50 words.

b. [5 pts] Tableau: Visualize the demographic attributes (age, sex, country of origin, asylum-seeking country) in the file unhcr_popstats_demographics.csv (in the folder Q1) for any given year in one chart. Tableau is a popular InfoViz tool and the company has provided us with student licenses. Go to tableau activation and select "Get Started". On the form, enter your Georgia Tech email address for "Business email" and "Georgia Institute of Technology" for "Organization". The Desktop Key for activation is available in T-Square Resources as "Tableau Desktop Key". This key is for your use in this course only. Do not share the key with anyone. Provide a rationale for your design choices in this step in the file explanation.txt, in no more than … words.

c. [3 pts] Scaling node sizes:
1. Scale the radius of each node in the graph based on the degree of the node.
2. In explanation.txt, using no more than 40 words, discuss the scaling method you have used and explain why you think it is a good choice. There are many possible ways to scale, e.g., scale the radii linearly, by the square root of the degree, etc.

d. [6 pts] Pinning nodes (fixing node positions):
1. Modify the html so that when you double click a node, it pins the node's position such that it will not be modified by the graph layout algorithm (note: pinned nodes can still be dragged around by the user, but they will remain at their positions otherwise). Node pinning is an effective interaction technique to help users spatially organize nodes during graph exploration.
2. Mark pinned nodes to visually distinguish them from unpinned nodes, e.g., pinned nodes are shown in a different color or border thickness, or are visually annotated with an asterisk (*), etc.
3. Double clicking a pinned node should unpin (unfreeze) its position and unmark it.

Q2 Deliverables: The directory structure should be as follows:
Q2/
  graph.html
  explanation.txt
  graph.js, graph.css (if not included in graph.html)

Q3 [15 pts] Scatter plots
Tutorial: Making a scatter plot
Use the dataset provided in the file data.tsv (in the folder Q3) to create two scatter plots.[1]

a. [8 pts] Create a scatter plot with the distribution feature on the Y-axis and the body mass feature on the X-axis. Use different symbols and colors to indicate the different species:
● Red circles for Lagomorpha
● Blue squares for Didelphimorphia
● Green triangles for Dasyuromorphia

b. [2 pts] Add a legend to the scatter plot to show how species names map to the colored symbols.

c. [3 pts] Create another scatter plot using the same data, symbols, and legend as above, but use the log scale instead for both axes.
Note: The two scatter plots should be placed on a single html page, one after the other, as shown in Figure 1; your plots' visual design can be different from what is shown.

d. [2 pts] Explain in no more than 50 words, in explanation.txt, when we may want to use log scales in charts (e.g., in scatter plots).

[1] Derived from source: https://www.esapubs.org/archive/ecol/E084/094/#data

Figure 1. Example of how the two scatter plots should be arranged on a single HTML page. First show the plot from part a, then the one from part c.

Q3 Deliverables: The directory structure should be organized as follows:
Q3/
  scatterplot.(html / js / css)
  explanation.txt
  scatterplot.(pdf / png)
  data.tsv
● scatterplot.(html / js / css) – the html/js/css files created.
● explanation.txt – the text file with your answer for 3d.
● scatterplot.(pdf / png) – a screenshot (png or pdf format) showing the two scatter plots created above.
● data.tsv – the dataset

Q4 [15 pts] Heatmap and Select Box
Example: 2D Histogram, Select Options
Use the dataset provided in heatmap.csv (in the folder Q4), which describes power usage (kWh) across multiple zip codes in Los Angeles,[2] and visualize it using D3 heatmaps.

a. [6 pts] Create a heatmap of the power usage over time for zip code 90077. Place the month on the heatmap's horizontal axis and the year on its vertical axis. Power readings will be represented by colors in the heatmap.

b. [3 pts] Add axes and legends to both charts, similar to the 2D Histogram example. Instead of placing the month number on the horizontal axis, place the name of the month ("Jan", "Feb", "Mar", etc.). Use d3.axis()'s member function .tickFormat() to provide a custom format for each tick value on the axis.

c. [6 pts] Now create a drop-down select box with D3 that is populated with the unique zip codes in ascending order. When the user selects a different zip code in this select box, the heatmap of power usage should be updated with the values corresponding to the selected zip code. The default zip code when the page loads should be 90077.

[2] Source: https://catalog.data.gov/dataset/water-and-electric-usage-from-2005-2013-83298

Q4 Deliverables: The directory structure should look like (remember to include the d3 library):
Q4/
  heatmap.(html / js / css)
  heatmap.(png / pdf)
  heatmap.csv
● heatmap.(html / js / css) – the html / js / css files created.
● heatmap.(png / pdf) – a screenshot (png or pdf format) of the plots created in Q4.b
● heatmap.csv – the dataset

Q5 [25 pts] Sankey Chart
Example: Sankey diagram from formatted JSON
Formula One racing is a championship sport in which race drivers represent teams to compete for points over several races (also called Grands Prix) in a season. The team with the most points at the end of a season wins the prestigious Formula One World Constructors' Championship award. You will visualize the flow of points for the races held in 2016.[3] The drivers win points according to their final standing in each race, and these points finally get added to their respective team's total.
Note: The implementation of certain parts in this question may be quite challenging.

Figure 2. Example Sankey Chart visualizing the flow of points for the 2015 season

[3] Source: https://ergast.com/mrd/

a. [15 pts] Create a Sankey Chart using the provided datasets (races.csv and teams.csv) in the Q5 folder. The chart should visualize the flow of points in the order: race → driver → team. You must use the sankey.js provided in the lib folder. You can keep the blocks' vertical positions static.
Your chart should look similar to the example Sankey Chart for the 2015 season shown in the image above.
Note: For this part, you will have to read in the csv files and combine the data into a format that can be passed to the sankey library. To accomplish this, you may find the following javascript functions useful: d3.nest(), array.filter(), array.map()

b. [6 pts] Use the d3-tip library to add tooltips as shown in the image above. You are welcome to make your own visual style choices using css properties.
Note: You must create the tooltip by only using d3.tip.v0.6.3.js present in the lib folder.

c. [4 pts] From the visualization you have created, determine the following:
1. [1 pt] Which driver won the Grand Prix 2016?
2. [1 pt] Which team won the Grand Prix 2016?
3. [1 pt] Which driver won the Spanish Grand Prix?
4. [1 pt] Which team has the maximum number of players?
Put your answers in observations.txt. Modify the template provided to you (in the Q5 folder) by replacing team_name/driver_name with your answers.
Sample observations.txt:
1.driver_name
2.team_name
3.driver_name
4.team_name

Q5 Deliverables: The directory structure should be as follows:
Q5/
  races.csv
  teams.csv
  viz.(html/js/css)
  observations.txt
● races.csv and teams.csv – the datasets (unmodified)
● viz.(html/js/css) – the html, javascript, and css to render the visualization in Q5.a and b.
● observations.txt – your answers for Q5.c.

Q6 [20 pts] Interactive visualization
Mr. Fluke runs a small company named FooBar. His company manufactures eight products around the year. He wants you to create an interactive visualization report using D3 so that he can see the total revenue generated per product type and the revenue breakdown across product types for the four quarters in 2015. Use the dataset provided in the Q6 folder. Integrate the dataset provided in dataset.txt directly in an array variable in the script. Example:
Q1 [45 pts] Collecting and visualizing Twitter data
1. [30 pts] You will use the Twitter REST API to retrieve (1) followers, (2) followers of followers, (3) friends and (4) friends of friends of a user on Twitter (a Twitter friend is someone you follow and a Twitter follower is someone who follows you).
a. The Twitter REST API allows developers to retrieve data from Twitter. It uses the OAuth mechanism to authenticate developers who request access to data. Here's how you can set up your own developer account to get started:
● Twitter: Create a Twitter account, if you don't already have one.
● Authentication: You need to get API keys and access tokens that uniquely authenticate you. Sign in to Twitter Apps with your Twitter account credentials. Click 'Create New App'. While requesting access keys and tokens, enter:
Name dva_hw1_
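To ground the retrieval workflow, here is a minimal sketch using the requests and requests_oauthlib libraries against the historical v1.1 followers/ids endpoint. The endpoint URL, parameter names, cursor convention, and the requests_oauthlib dependency are assumptions based on Twitter's v1.1 REST API documentation rather than part of this brief; substitute whatever the assignment's setup instructions specify:

```python
import requests
from requests_oauthlib import OAuth1

# Credentials from your Twitter app page (placeholders only; never
# hardcode real keys in submitted code).
auth = OAuth1("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

def get_follower_ids(screen_name, max_pages=1):
    """Fetch follower IDs for a user, following Twitter's cursor-based
    pagination (next_cursor == 0 signals the last page)."""
    ids, cursor = [], -1
    for _ in range(max_pages):
        resp = requests.get(
            "https://api.twitter.com/1.1/followers/ids.json",
            auth=auth,
            params={"screen_name": screen_name, "cursor": cursor},
        )
        data = resp.json()
        ids.extend(data.get("ids", []))
        cursor = data.get("next_cursor", 0)
        if cursor == 0:
            break
    return ids
```

Followers of followers can then be gathered by calling the same function on each returned ID, subject to the API's rate limits.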
MATH5905 Statistical Inference — Term One 2025 — Assignment One
University of New South Wales, School of Mathematics and Statistics
Given: Friday 28 February 2025. Due date: Sunday 16 March 2025.

Instructions: This assignment is to be completed collaboratively by a group of at most 3 students. Every effort should be made to join or initiate a group. (Only in the case that you were unable to join a group may you present it as an individual assignment.) The same mark will be awarded to each student within the group, unless I have good reason to believe that a group member did not contribute appropriately. This assignment must be submitted no later than 11:59 pm on Sunday, 16 March 2025. The first page of the submitted PDF should be this page. Only one of the group members should submit the PDF file on Moodle, with the names, student numbers and signatures of the other students in the group clearly indicated on this cover page. By signing this page you declare that:
I/We declare that this assessment item is my/our own work, except where acknowledged, and has not been submitted for academic credit elsewhere. I/We acknowledge that the assessor of this item may, for the purpose of assessing this item, reproduce this assessment item and provide a copy to another member of the University; and/or communicate a copy of this assessment item to a plagiarism checking service (which may then retain a copy of the assessment item on its database for the purpose of future plagiarism checking). I/We certify that I/We have read and understood the University Rules in respect of Student Academic Misconduct.
Name / Student No. / Signature / Date

Problem One
a) Suppose that $X$ and $Y$ are two components of a continuous random vector with density
$f_{X,Y}(x, y) = 12xy^3$, for $0 < x < y$, $0 < y < c$ (and zero elsewhere). Here $c$ is unknown.
i) Find $c$.
ii) Find the marginal density $f_X(x)$ and $F_X(x)$.
iii) Find the marginal density $f_Y(y)$ and $F_Y(y)$.
iv) Find the conditional density $f_{Y|X}(y|x)$.
v) Find the conditional expected value $a(x) = E(Y \mid X = x)$.
Make sure that you show your working, and do not forget to always specify the support of the respective distribution.
b) In the zoom meeting problem from the lecture, show that if there are 40 participants in the meeting, then the chance that two or more share the same birthday is very close to 90 percent.

Problem Two
A certain river floods every year. Suppose that the low-water mark is set at 1 and the high-water mark $X$ has distribution function
$F_X(x) = P(X \le x) = 1 - \dfrac{1}{x^3}$, for $x \ge 1$.
…

Problem Three
… For $\alpha > 0$ and $\beta > 0$, the beta function $B(\alpha, \beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx$ satisfies
$B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$, where $\Gamma(\alpha) = \int_0^\infty e^{-x} x^{\alpha - 1}\,dx$.
A Beta$(\alpha, \beta)$ distributed random variable $X$ has density $f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1}(1 - x)^{\beta - 1}$, $0 < x < 1$, with $E(X) = \alpha/(\alpha + \beta)$.
iii) Seven observations from this distribution were obtained: 2, 3, 5, 3, 5, 4, 2. Using zero-one loss, what is your decision when testing $H_0: \theta \le 0.80$ against $H_1: \theta > 0.80$? (You may use the integrate function in R or any favourite programming package to answer the question.)

Problem Four
A manager of a large fund has to make a decision about investing or not investing in a certain company stock, based on its potential long-term profitability. He uses two independent advisory teams of experts. Each team should provide him with an opinion about the profitability.
The random outcome $X$ represents the number of teams recommending investing in the stock (based on their belief in its profitability). If the investment is not made and the stock is not profitable, or if the investment is made and the stock turns out profitable, nothing is lost. In the manager's own judgement, if the stock turns out to be not profitable and the decision is made to invest in it, the loss is equal to four times the cost of not investing when the stock turns out profitable. The two independent expert teams have a history of forecasting profitability as follows: if a stock is profitable, each team will independently forecast profitability with probability 5/6 (and no profitability with probability 1/6); on the other hand, if the stock is not profitable, then each team predicts profitability with probability 1/2. The fund manager will listen to both teams and then make his decision based on the random outcome $X$.
a) There are two possible actions in the action space $A = \{a_0, a_1\}$, where action $a_0$ is to invest and action $a_1$ is not to invest. There are two states of nature $\Theta = \{\theta_0, \theta_1\}$, where $\theta_0 = 0$ represents "profitable stock" and $\theta_1 = 1$ represents "stock not profitable". Define the appropriate loss function $L(\theta, a)$ for this problem.
b) Compute the probability mass function (pmf) of $X$ under both states of nature.
c) The complete list of all the non-randomized decision rules $D$ based on $x$ is given by:
        d1  d2  d3  d4  d5  d6  d7  d8
x = 0   a0  a1  a0  a1  a0  a1  a0  a1
x = 1   a0  a0  a1  a1  a0  a0  a1  a1
x = 2   a0  a0  a0  a0  a1  a1  a1  a1
For the set of non-randomized decision rules $D$, compute the corresponding risk points.
d) Find the minimax rule(s) among the non-randomized rules in $D$.
e) Sketch the risk set of all randomized rules $\bar{D}$ generated by the set of rules in $D$. You might want to use R (or your favourite programming language) to make the sketch precise.
f) Suppose there are two decision rules $d$ and $d'$. The decision rule $d$ strictly dominates $d'$ if $R(\theta, d) \le R(\theta, d')$ for all values of $\theta$, and $R(\theta, d) < R(\theta, d')$ for at least one value of $\theta$. Hence, given a choice between $d$ and $d'$, we would always prefer to use $d$. Any decision rule that is strictly dominated by another decision rule is said to be inadmissible. Correspondingly, if a decision rule $d$ is not strictly dominated by any other decision rule, then it is admissible. Indicate on the risk plot the set of randomized decision rules that correspond to the fund manager's admissible decision rules.
g) Find the risk point of the minimax rule in the set of randomized decision rules $\bar{D}$ and determine its minimax risk. Compare the two minimax risks of the minimax decision rule in $D$ and in $\bar{D}$. Comment.
h) Define the minimax rule in the set $\bar{D}$ in terms of rules in $D$.
i) For which prior on $\{\theta_0, \theta_1\}$ is the minimax rule in the set $\bar{D}$ also a Bayes rule?
j) Prior to listening to the two teams, the fund manager believes that the stock will be profitable with probability 1/2. Find the Bayes rule and the Bayes risk with respect to his prior.
k) For a small positive $\epsilon = 0.1$, illustrate on the risk set the risk points of all rules which are $\epsilon$-minimax.

Problem Five
The length of life $T$ of a computer chip is a continuous non-negative random variable with a finite expected value $E(T)$. The survival function is defined as $S(t) = P(T > t)$.
a) Prove that the expected value satisfies $E(T) = \int_0^\infty S(t)\,dt$.
b) The hazard function $h_T(t)$ associated with $T$ is defined as
$h_T(t) = \lim_{\delta \downarrow 0} \dfrac{P(t \le T < t + \delta \mid T \ge t)}{\delta}.$
(In other words, $h_T(t)$ describes the rate of change of the probability that the chip survives a little past time $t$, given that it survives to time $t$.)
i) Denoting by $F_T(t)$ and $f_T(t)$ the cdf and the density of $T$ respectively, show that
$h_T(t) = \dfrac{f_T(t)}{1 - F_T(t)} = -\dfrac{d}{dt}\log(1 - F_T(t)) = -\dfrac{d}{dt}\log S(t).$
ii) Prove that $S(t) = e^{-\int_0^t h_T(x)\,dx}$.
iii) Verify that the hazard function is constant when $T$ is exponentially distributed, i.e.,
Academic Year: 2024-2025
Course: BEng Electronic Engineering, BEng Robotic Engineering, MEng Robotic Engineering
Module Code: EL3105
Module Title: Computer Vision
Title of the Brief: Video Stabilisation
Type of assessment: Assignment

Introduction
This Assessment Pack consists of a detailed assignment brief, guidance on what you need to prepare, and information on how class sessions support your ability to complete successfully. The tutor responsible for this coursework will be available on 28th of January 2025 to answer questions related to this assessment. You'll also find information on this page to guide you on how, where, and when to submit. If you need additional support, please make a note of the services detailed in this document.

Submission details: how, when, and where to submit
Assessment release date: 21/01/2025.
Assessment deadline date and time: 28/03/2025 (23:59). Please note that this is the final time you can submit, not the time to submit! You should aim to submit your assessment in advance of the deadline.
The Turnitin submission link on Blackboard will be visible to you on: 03/03/2025.
Feedback will be provided by: 06/05/2025.

This assignment constitutes 50% of the total module assessment mark. You should write a report for this assignment documenting your solutions for the tasks defined in the assignment brief given below. The report should include a very short introduction describing the problem, a description of your adopted solutions, a more extensive description of the results, and a conclusions section summarising the results. The report should be approximately 1500 words long, plus relevant materials (references and appendices). You should use the Harvard referencing system for this report. The report should be submitted electronically to the "Video Stabilisation" Turnitin link through Blackboard.
You should also submit documented matlab/python code solving the given tasks. The code should be self-contained, i.e., it should be able to run as it is, without the need for any additional tools/libraries. In case there are multiple files, please create a single zip code archive containing all the files. The code should be submitted separately from the report into the Blackboard EL3105 assignment area denoted "Video Stabilisation Code".
Note: If you have any valid mitigating circumstances that mean you cannot meet an assessment submission deadline and you wish to request an extension, you will need to apply online, via MyUCLan, with your evidence prior to the deadline. Further information on Mitigating Circumstances is available via this link.
We wish you all success in completing your assessment. Read this guidance carefully, and if you have any questions, please discuss them with your Module Leader.

Teaching into assessment
The tutor responsible for this coursework will be available on 28/01/2025 between 14:00 and 16:00 to answer questions related to this assessment. All the algorithmic aspects necessary for the successful completion of the assignment have been covered during the lectures, tutorials, and laboratory sessions; these include feature detection, descriptor calculation, robust matching, estimation of a transformation aligning matched features, tracking, and image warping.

Additional Support
All links are available through the online Student Hub.
1. Our Library resources link can be found in the library area of the Student Hub.
2. Support with your academic skills development (academic writing, critical thinking and referencing) is available through WISER on the Study Skills section of the Student Hub.
3. For help with Turnitin, see Blackboard and Turnitin Support on the Student Hub.
4. If you have a disability, specific learning difficulty, long-term health or mental health condition, and have not yet advised us, or would like to review your support, Inclusive Support can assist with reasonable adjustments and support. To find out more, you can visit the Inclusive Support page of the Student Hub.
5. For mental health and wellbeing support, please complete our online referral form, or email [email protected]. You can also call 01772 893020, attend a drop-in, or visit our UCLan Wellbeing Service Student Hub pages for more information.
6. For any other support query, please contact Student Support via [email protected].
7. For consideration of academic integrity, please refer to the detailed guidelines in our policy document. All assessed work should be genuinely your own work, and all resources fully cited.
8. For this assignment, you are not permitted to use any category of AI tools.

Assignment Brief
This assignment is designed to give you an insight into selected aspects of computer vision applied to image feature extraction, feature matching, and motion compensation. You are asked to solve various tasks, including detection of image features and their robust matching, write computer vision software, and test your solution and interpret the results.
This assignment will enable you to:
• Deepen your understanding of feature/keypoint detection, robust matching between features, and image transformation and warping models.
• Recognise software design challenges behind implementations of computer vision algorithms.
• Design and optimise software to meet specified requirements.
• Acquire a hands-on understanding of image-based camera motion compensation.
(These correspond to points 1, 2, 4 and 5 of the module's learning outcomes. Module learning outcomes are provided in the Module Descriptor.)
The assignment consists of two main tasks. The first task is to explain and justify your selected methodology for video stabilisation, specifically focusing on camera motion jitter compensation. This should include a discussion of jitter compensation for a moving camera, handling moving objects within the scene, depth of field, and the ability to operate in real time.
The second task is to implement the selected method using MATLAB and/or Python. You are provided with two pre-recorded videos that increase in scene complexity. The first video features a static scene with a jittering, but otherwise static, camera. The second video contains a scene with moving objects. The two videos, video_seq_1.avi and video_seq_2.avi, are available on the Computer Vision Blackboard site.
You are expected to write a MATLAB (and/or Python) program that removes the apparent scene motion caused by the camera jitter in the video sequences. Your algorithm should be designed to process the images sequentially, meaning that when estimating the current image correction, it should only use the current and preceding frames. Additionally, the algorithm should be optimised to work in real time, with computational complexity that does not depend on the length of the sequence. A sketch of one possible pipeline is shown after this section.

Late work
If the report and/or code are submitted after the deadline, they will be automatically flagged as late. Except where an extension of the hand-in deadline has been approved, lateness penalties will be applied in accordance with the University policies.
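As a concrete starting point, here is a minimal Python/OpenCV sketch of one possible stabilisation pipeline: track corners frame to frame, robustly estimate the inter-frame similarity transform (RANSAC down-weights correspondences on moving objects), accumulate it, and warp each frame back toward the first frame's coordinates. This is a sketch under stated assumptions, not the required method: locking to the first frame suits the static-camera sequence, whereas an intentionally moving camera would call for trajectory smoothing instead, and all parameter values are illustrative.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("video_seq_1.avi")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
acc = np.eye(3)  # accumulated current-frame -> first-frame transform

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Sequential processing: only the previous frame is used, so the
    # per-frame cost is constant regardless of sequence length.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=20)
    if pts is None:
        prev_gray = gray
        continue
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_new = nxt[status.flatten() == 1]
    good_old = pts[status.flatten() == 1]
    # Robust similarity transform mapping the current frame onto the
    # previous one; RANSAC rejects outliers from moving objects.
    m, _ = cv2.estimateAffinePartial2D(good_new, good_old,
                                       method=cv2.RANSAC)
    if m is not None:
        acc = acc @ np.vstack([m, [0.0, 0.0, 1.0]])
    h, w = gray.shape
    stabilised = cv2.warpAffine(frame, acc[:2], (w, h))
    cv2.imshow("stabilised", stabilised)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
    prev_gray = gray

cap.release()
cv2.destroyAllWindows()
```

Replacing the hard accumulation with a smoothed (e.g., Kalman-filtered) camera trajectory, as in the Litvin et al. reference below, is the natural extension for the moving-camera discussion in task one.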
Marking scheme
Your report should contain the following elements; it will be marked in accordance with the following marking scheme:

Item                                                            Weight (%)
1. Justification of the adopted video stabilisation approach        30
2. Software implementation                                          40
3. Evaluation of the results                                        15
4. Presentation of the report                                       15
Total                                                              100

References
Wang, Y., Huang, Q., Jiang, C., Liu, J., Shang, M. and Miao, Z. (2023) Video stabilization: A comprehensive survey. Neurocomputing, 516, pp. 205-230.
Wang, A., Zhang, L. and Huang, H. (2018) High-Quality Real-Time Video Stabilization Using Trajectory Smoothing and Mesh-Based Warping. IEEE Access, 6, pp. 25157-25166.
Souza, M.R. and Pedrini, H. (2018) Digital Video Stabilization Based on Adaptive Camera Trajectory Smoothing. EURASIP Journal on Image and Video Processing, 2018:37.
Litvin, A., Konrad, J. and Karl, W.C. (2003) Probabilistic Video Stabilization Using Kalman Filtering and Mosaicking. IS&T/SPIE Symposium on Electronic Imaging, Image and Video Communications and Processing.
Tordoff, B. and Murray, D.W. (2002) Guided Sampling and Consensus for Motion Estimation. European Conference on Computer Vision.

Feedback guidance: reflecting on feedback and how to improve
From the feedback you receive, you should understand:
• The grade you achieved.
• The best features of your work.
• Areas you may not have fully understood.
• Areas you are doing well in but could develop your understanding of.
• What you can do to improve in the future (feedforward).
Use the WISER Academic Skills Development service. WISER can review feedback and help you understand your feedback. You can also use the WISER Feedback Glossary.
Next steps:
• List the steps you have taken to respond to previous feedback.
• Summarise your achievements.
• Evaluate where you need to improve (keep this handy for future work).

Disclaimer: The information provided in this assessment brief is correct at time of publication. In the unlikely event that any changes are deemed necessary, they will be communicated clearly via e-mail and a new version of this assessment brief will be circulated.
An Individual or Group Essay - Written Business Plan (100%)
Date due: Week 21 (Summer term), Monday 31st March 2024 by 1200 hrs.
2500-3000 words (plus or minus 10%), not including references and bibliography.

This assignment is designed to increase your understanding of the entrepreneurial process: by researching, developing and effectively communicating a new business idea of your own choice, you will then present this idea in the form of a written Business Plan (BP).
All entrepreneurs need to be able to capture the essence of their business idea and to clearly articulate this to a variety of interested parties, for example potential investors. Your written business plan needs to succinctly paint a picture that allows the reader to understand the concept of the business. Remember that if you pick a café in Lancaster, or selling cakes on campus, your BP will need to be brilliant! If you select a business idea that is more complex, less certain, or less understood, there is room for manoeuvre. Ambition, endeavour and bravery are required for any business to be successful in today's business world; make sure this comes through in your BP.
This essay will require you to have thought through the business plan and its components in order to produce a pitch that brings your idea to life. Your first draft will likely be 4500 words or more; the skill in good writing is to condense the words, and to continue condensing them until you reach your word count without losing the goodness from your idea. This is not an easy skill to learn, so do not leave this assignment to the last minute and expect to produce an outstanding elevator pitch in one draft.
Each week in the lectures and workshops we will provide you with the knowledge and tools to help you build your business start-up, and therefore your written business plan. We will, as with so many elements of business, 'start with the end in mind' (Stephen Covey).
Individual assignment with a maximum of 2500-3000 words excluding references.

Criteria
Presentation (10%)
With a short word count, presenting a business idea in full will require you to be creative with your presentation; it should be professional, concise and to the point.
Business Analysis (50%)
Demonstrate a clear understanding of the micro and macro environmental factors bearing down on your business plan. This should include the principal aspects you consider key to your future organisation: organisational structure, culture, management, sales, finance, operations and the geographical coverage you intend to cover. You will take into account risk factors, and you may risk-rate these factors in priority order. You may consider being innovative, or you may pick a good safe bet that you know works and generates cash. You will also include a section on the future business: what it will look like in three years' time.
Identification and Application of UN SDGs (15%)
Demonstrate a clear understanding of the SDGs that this start-up will impact in terms of their positive and negative effects. Clearly articulate how you will mitigate negative effects and how you have built positive effects into your planning. Explicitly state how you will measure and monitor these in your plan (SDG KPIs).
Probability of Success (25%)
Demonstrate that your business will work in practice; if it does not work in practice, there is no point in doing it! You will need to demonstrate realism in the planning, application, projections, finance and cash considerations throughout your business plan.
Failure to submit any of these documents will incur late penalties per day/hour, be considered a non-submission, and can be awarded a mark of 0 (zero).
110.109 Introductory Financial Accounting
Assignment 2 Booklet — Distance / Internal, Semester 1, 2025

IMPORTANT INFORMATION
Assignment 2 contributes 15% towards your final grade. Assessment 2 is due before 11:00 PM on Monday 31st March (New Zealand time). To complete the assignment successfully, you are required to submit three (3) PDF files:
(i) Prepare and export a 'Journal Report' for the month of January 2025;
(ii) Prepare and export a 'Profit and Loss Statement' for the period from 1 January 2025 to 31 January 2025; and
(iii) Prepare and export a 'Balance Sheet Statement' as at 31 January 2025.
Assignment 2 covers material from weeks 1 to 4 inclusive and mainly relates to the following learning outcomes:
1. Demonstrate an understanding of the financial reporting framework for general-purpose financial statements for commercial enterprises.
2. Identify, measure, record and communicate economic transactions and events of commercial enterprises' operations using fundamental accounting concepts, including the double-entry accounting system.
Before attempting this assignment, it is strongly recommended that you read the instructions on the following pages and the "Assignment Details" thoroughly. You should also study the relevant material in the text and make sure you understand the concepts covered by practising the weekly review and workshop questions before or as you complete this assignment. Assessment 2 (Xero) requires you to sign up for a Xero free trial account to complete it. Details for signing up for and using Xero are provided in the "Assessment 2 Details" on the following page. Please remember that all 110.109 assignments must be your own work. Discussion on STREAM or in study groups is fine, but comparing or suggesting answers, as opposed to concepts, may lead to marks being deducted, to the extent of receiving zero marks if answers are too similar.

ASSIGNMENT 2 GENERAL INSTRUCTIONS
How to Get Started with This Assessment
This assessment requires you to sign up for a Xero free trial account [https://www.xero.com/nz]. You need a valid email ID and may need a contact phone number (to authenticate your log-in ID). Once you have signed up for a Xero account, you can create a trial organisation. Please note that you can create as many trial organisations as you want, but each trial organisation lasts only 30 days. This means that after you have set up the organisation required to complete this assessment, you will have only 30 days of access to process transactions and prepare journal reports and financial statements. To complete this assessment, you must follow the 'Xero Accounting Student Manual' and a 'pre-recorded webinar training session', both available in the 'Assignment 2 Files' folder.

1. How to Download This Assignment from STREAM
There are two folders under "Assignment 2 Files". One folder includes the 'Assignment 2 Booklet (PDF format)'; the other folder includes two preparatory materials, the "Recorded training webinar (MP4 video format)" and the "XERO Student Manual (PDF format)", to be downloaded to complete this assignment. Please note that the available XERO Student Manual is not the latest version (ver. 2019), and therefore you may find some minor differences compared to the current XERO platform.
Assignment 2 Booklet (this book)
Procedure to download the Assignment 2 Booklet:
• Log on to STREAM for course 110.109.
• Go to the Assessment section, scroll down to the Assignment 2 subheading, click on the folder “Assignment 2 Files”, and then click on the relevant file (PDF file) to download, i.e. the “Assignment 2 Booklet”.
• Choose “Save File” and give the destination where you would like to keep the file. You can download the XERO Student Manual and Recorded Training Webinar following the same process. Remember to include your name and ID when submitting Assignment 2. FOLLOW THE ‘ASSIGNMENT 2 DETAILS’ as mentioned below. Contact the Course Coordinator if you have any problems downloading the files from STREAM.
2. How to Complete Assignment 2
Follow the Assignment 2 Booklet for further instructions. XERO Accounting: relevant instructions are available within the Booklet in the later section “Assignment 2 Details”.
3. How to Submit Assignment 2
Assignment 2 is time-stamped when received by Stream; the deadline for on-time submission is 11 PM on the due date [31st March 2025]. Please note it is your responsibility to ensure that your assignment has been received. With the number of students involved, it is impossible for lecturers to know whether a missing assignment is the result of a student pulling out of the course, deciding to forgo the assignment marks, or a potential gremlin in the system. Murphy’s Law applies and things can go wrong, so make certain you check the submission dropbox after you have submitted your assignment. You should be able to see your file on the left-hand side of the screen.
Procedure to upload your Assignment 2: please refer to the instructions under “Submitting Assessments to Stream Assignment boxes” on Stream (by dropping down the “Assessment” icon). Contact the teaching staff if you have any queries regarding the above instructions. The assignment submission drop boxes will remain open until the due date.
4. Assignment queries
Please feel free to keep in touch with the 110.109 team regarding this assignment, preferably through the Assignment 2 forum. That way all students benefit, and often we have found that the best way of learning is through discussion with your peers as well as teaching staff. Please note that this assignment is at the individual, not group, level. Discussion is fine, but do not post your answers to the assignment on STREAM discussion forums, as that will lead to penalties. If you are unsure about this issue, you are welcome to use Private Conversations. The teaching staff are involved with a large number of students in this course, but each of you is individually very important to us. We value each student and will try to provide appropriate guidance to the best of our ability.
ASSIGNMENT 2 DETAILS
1. Start a new organisation.
Name of your organisation: “Your name + ID number Ltd” (e.g., Tom_Kirk 2500789 Ltd)
Industry: Choose “Gardening Service” from the options
Do you have Employees: Select “Yes”
Are you registered for GST: Select “Yes”
Then Click: “Start Trial”
2. Create a bank account.
Bank name: Northpac Bank Ltd.
Account name: Bank-your name (e.g., Bank-Tom Kirk)
Account type: Other
Account number: 1092025
3. Create the following account using the ‘Chart of Accounts’ option on the XERO website. Similarly, update the account (Bank – Tom Kirk) with an account code ‘601’.
4.
Process the following events and transactions (… due on 25 March 2025). All transactions include GST where relevant unless otherwise stated.
January 1: Invested $25,000.00 cash in the business. Note: Receive money entry; Account is 971 Capital-your name.
January 3: Appointed a new administrator … fortnightly. Oliver will …
… for the purchase of new office equipment $10,… Office Equipment. [Note: You may need to verify the company account; accept the notification at the email address used for your XERO login.]
… Gardening Service provided $1,287.50. Note: New sales invoice entry; Amounts are: Tax Inclusive; Account is 200 Sales. [Note: You may need to verify the company account; accept the notification at the email address used for your XERO login.]
… $….00. Note: Spend money entry; Account is 477 Salaries.
January 15: Cash customers paid for Garden Services using mobile … Note: Receive money entry; Amounts are: Tax Inclusive … $138.00. Note: … Tax Inclusive; Account is 260 Other Revenue.
January 23: Paid CAR FIX Ltd. for Motor vehicle expenses, $552.00, by E… Note: Spend Money; Amounts are: Tax Inclusive; Account is 449 Motor Vehicle Expenses.
January 27: Paid for a family dinner out at BAY Restaurant with the business EFTPOS card, $184.00. Note: …gs.
January 29: Cash customers paid for Garden cleaning using mobile EFTPOS … Amounts are: Tax Inclusive; Account …
… company Barcourt-Johnson Ltd, which is building 50 new homes. Under this … gardening and landscaping services for Barcourt-Johnson Ltd. The total expected revenue under the contract is $50,000 (exclusive …) … gardening and landscaping service from February 7, 202…
Requirements:
(i) Prepare the ‘Journal Report’ for the month of January 2025 and export it to a PDF file. Save the file and name it “surname_ID#_JR” (the ID# being your student ID number, e.g., KIRK_2500789_JR). You can generate the ‘Journal Report’ ordered by ‘Date’. (60 marks)
(ii) Prepare and export a ‘Profit and Loss Statement’ for the period from 1 January 2025 to 31 January 2025. Export and save the PDF file and name it “surname_ID#_PL” (the ID# being your student ID number, e.g., KIRK_2500789_PL). (20 marks)
(iii) Prepare and export a ‘Balance Sheet’ as at 31 January 2025. Export and save the PDF file and name it “surname_ID#_BS” (the ID# being your student ID number, e.g., KIRK_2500789_BS). (20 marks)
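(Not part of the booklet: a minimal Python sketch for checking your Journal Report figures by hand. It assumes the standard NZ GST rate of 15% and splits a tax-inclusive amount into its net and GST components; the function name and the example amount are illustrative only, not a Xero API.)

GST_RATE = 0.15  # standard NZ GST rate (assumed)

def split_tax_inclusive(gross):
    """Return (net_amount, gst_component) for a GST-inclusive amount."""
    gst = round(gross * GST_RATE / (1 + GST_RATE), 2)  # equivalently gross * 3/23
    return round(gross - gst, 2), gst

# Example using the gardening services invoice of $1,287.50 (tax inclusive):
net, gst = split_tax_inclusive(1287.50)
print(net, gst)  # -> 1119.57 167.93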
Real Estate Finance FINN3061 2024/2025 Module Information
Details of the Module Outline, Prerequisites, Co-requisites, Overall Aim(s) of the module and Learning Objectives, including details of key skills this module will help students to acquire, are available at the link given below:
Programme and Module Handbook: Undergraduate Programme and Module Handbook 2024-2025 – FINN3061 Real Estate Finance (dur.ac.uk)
Real Estate Finance Module Outline 2024-25
Teaching Methods and Contact Hours
This module requires 200 hours of study. This includes a combination of lectures, workshops and independent study as follows:
Activity | Number | Frequency | Duration | Total Hours
Lectures | 20 | 1 per week | 1 hour | 20
Workshops | 8 | 4 in term 1, 4 in term 2 | 1 hour | 8
Preparation and Reading | – | – | – | 172
Total | – | – | – | 200
The hours of study include all formal contact hours (lectures and seminars), the time devoted to background reading, and all preparation and reading time associated both with the formal contact hours and with the formative and summative assessments (including essays and examinations). Please note that attendance at workshops is compulsory and is monitored. Students will be allocated to seminar groups at the start of term.
Formative Assessment
The main aim of the formative assessment is to help you, in a structured way, to understand the material and its applications, consolidate your knowledge and further develop relevant skills. It does not count towards the overall mark for the module but is compulsory. The formative assessment for this module will consist of:
• Weekly online quizzes
Further guidance can be found in the Student Information Hub on matters such as writing essays, the submission process, assessment criteria and grade descriptors.
Summative Assessment
The summative assessments constitute the formal assessment of a student’s performance and count towards the overall mark for the module. Summative assessment will take the form of unseen written online tests and an assignment. The online tests will occur in the last week of each term (Week 10 in the Michaelmas term (Test 1) and Week 20 in the Epiphany term (Test 2)). The submission deadline for the summative assignment will be in May/June.
Component: Online Test (Component Weighting: 30%)
Element | Length / duration | Element Weighting | Resit Opportunity
Online test 1 | 45 minutes | 50% | same
Online test 2 | 45 minutes | 50% | same
Component: Assignment (Component Weighting: 70%)
Element | Length / duration | Element Weighting | Resit Opportunity
Assignment | 1500 words max | 100% | same
The summative assignment details will be made available by the Learning and Teaching Team via your Learn Ultra Programme Site in due course. Further guidance can be found via the Student Information Hub on SharePoint, including information on referencing, assessment criteria and grade descriptors. Unless clearly identified as group assessments, summative assessments (e.g. exams, assignments, presentations) have to be a student’s individual piece of work. Collaboration between students in writing individual summative assessments is not permitted.
Past Examination Questions
Copies of the examination papers for recent years are available at the University library’s exam depository here. (Note: the assessment for this module currently does not have an examination paper component.)
Assessment Criteria
Performance in the formative and summative assessments for this module is judged against the following criteria:
• Relevance to question(s)
• Organisation, structure and presentation
• Depth of understanding
• Analysis and discussion
• Use of sources and referencing
• Overall conclusions
Seeking Help
You should always feel welcome to talk to staff whenever you wish to discuss any aspect of the module. Please do keep in touch with us. A small misunderstanding can turn into a big problem if it is not dealt with in a timely manner. The first port of call for any queries relating to your understanding of the module material and readings should be the lecturer who taught the relevant session. Please direct questions about administrative issues, such as the module outline, the exam structure and formative assessment, to the module leader. If you have problems that relate more generally to your studies across this and other modules, please contact your academic adviser or year tutor without delay. In serious cases you should normally also see the Programme Director. Full details of the support mechanisms that are in place are available via the Student Information Hub on SharePoint. The Student Information Hub is also a valuable source of information for any questions you might have, such as how to access various University services, how Business School processes work, and where to find University policies.
Detailed Syllabus and Reading List
The following pages give details of the topics covered by the module. Further information will be provided on Learn Ultra as the module progresses. Private study of recommended reading material is an integral part of the module. The list of recommended reading given for each topic in the module syllabus is divided into ‘essential’ (ER) and ‘further’ reading. ‘Essential’ readings are primarily intended to reinforce your understanding of the core lecture material. Where more than one such item is listed, these should usually be regarded as alternatives. ‘Further’ reading is intended to promote greater depth and breadth of understanding. All sources listed will be available electronically via the University library or the internet. To make it easier for you to access journal articles and books online, please consider installing the viaDurham button in your browser. Most of the sources used in this module will be journal articles, which students will be able to read online or download. Many of the journal articles listed are accessible directly or indirectly via the EBSCO or Science Direct databases. If you cannot find an article there, please look up the journal directly at: University Library : Online resources : electronic journals - Durham University. Recommended reading in the form of online books can be accessed via the Reading List on Learn Ultra, which you will find under the “Start Here – Module Information” link. To ensure that all students can access the recommended reading at any time, please download the recommended reading in online books as chapters where possible, rather than reading them online, as the latter limits the number of students who can simultaneously access the source.
INFT2051 – Mobile Application Development Project Progress Presentation
However, if you work as a pair or trio, the expected level of complexity and polish will be increased due to the number of people.
… Visual Studio Code … Files … Video duration – Individuals: … (+/- 1 minute); Groups: 12 minutes (+/- 1 minute) … vary depending …
… For Individuals: A1_c1234567; For Groups: A1_c1234567c7654321
Assessment Brief 2024/2025
Please make sure you carefully read and understand the question or task. If you have unanswered questions, please post these on the course Moodle Discussion Forum, and we’ll respond.
Assignment Information
Course Code: MGT5343
Course Title: Human Resource Development
Weighting: 50%
Question release date: 13 January 2025
Submission date: 17 March 2025
Grades and Feedback to be released on: 7 April 2025
Word limit: 3000 (+/- 10%)
Action to be taken if word limit is exceeded: Use academic judgement to adjust the grade to reflect failure to adhere to the word limit
1. QUESTION / DESCRIPTION OF ACTIVITY
This is an individual written assignment. Acting as an HRD consultant, you are asked to design a leadership or management development intervention with an emphasis on sustainability for an organisation of your choice. The organisation must have a clear sustainability agenda. The intervention must NOT draw on an existing programme/intervention. Your assignment should address/include the following:
· Work through ALL the stages of the HRD cycle and explain how each stage will be designed and implemented.
· Justify how your intervention is aligned to the sustainability strategy/goals of your chosen organisation.
· Cost the proposal and make a case for the resources required.
· Use the academic literature to frame, develop, justify and critically evaluate your approach.
· Write a short critical reflection as a conclusion on the key skills and roles needed by an HRD practitioner in the realisation of this intervention within the specific organisational context, and the implications for their CPD.
2. ADDITIONAL INFORMATION
Additional guidelines
· Choose your organisation carefully: it must be an organisation with a stated sustainability agenda/strategy/goals. It can be of any nationality but must have an English-language website. Include a link to the website in the assignment.
· Find a statement on the website or in other documentation relating to its sustainability agenda.
· Think about the HRD implications of this agenda.
· Analyse what the learning needs might be for leaders or managers to lead/manage this strategy. You probably won’t have access to organisational members, so you will have to deduce their needs in accordance with espoused organisational strategy.
· Think about the internal and external contexts of the organisation – how will these affect HRD needs?
· How will the training intervention be aligned to the organisational sustainability agenda (vertically, horizontally or both)?
· Lay out the learning design – think about the format (face-to-face, online, blended, etc.) and justify your choice(s) according to the learning aims/objectives, context and logistics. You can put sample materials in an appendix if you wish.
· How does this design allow for human diversity (think about learning styles, preferences, etc.) and possible cultural differences (for example, think about the countries this organisation operates in and the needs of expatriate leaders/managers)?
· State how the training is going to be delivered (who is going to do it, where and when it will take place, etc.).
· Anticipate any problems which might arise and how you would overcome them.
· Draw up a budget for your intervention and justify the cost. State how you will ‘sell’ this plan to senior management (especially if it goes over budget or includes some expensive items); might there be any unexpected/hidden costs to account for?
· Explain how you will evaluate the intervention – think about what you (and your organisation) need to know from the evaluation. This should be aligned to the evaluation theories you draw on, the types of evaluation you employ and when it is conducted.
· Who would you feed back the results to, and why?
· What measures would you put in place for continuing improvement?
· What key skills would an HRD practitioner need to have, and what roles would they need to play, to realise this intervention within the specific organisational context? What might be most challenging, and how might they hone and develop these through CPD along the way?
Additional Points and Tips
· Refer to theory throughout!
· You can draw on lecture material and set readings, but ensure you evidence appropriate further reading (suggested and beyond) throughout as well.
· You must refer to the four-stage HRD cycle model; however, if you wish to also refer to some of the more complex models, you can.
· Please engage critically with the theory and models you employ, including the HRD cycle model itself! How have they been critiqued? What are their strengths and shortcomings for your purposes? How can the shortcomings be addressed?
· Please do NOT use the same organisation as your classmates.
· Report: Please make sure that your submission includes an introduction, main body and conclusion. Include a cover page with the assignment topic and word count detailed. Use headings, sections and paragraphs, and avoid bullet points.
· Language: your project should be written in British, not American, English.
· Referencing style: All pieces of assessment should use the Harvard style of referencing throughout.
· Be as creative as you like!
3. ASSESSMENT RUBRIC / CRITERIA
1) The organisation is clearly presented and the link to its sustainability strategy/goals is well articulated
· Excellent: The organisation is very carefully selected, with interesting/unusual/challenging sustainability agendas. Excellent concise summary of the organisation and its operations.
· Very Good: Very carefully selected, interesting organisation. Very good, clear description of the organisation’s sustainability goals and general operations.
· Good: Thoughtful, well-justified choice of an organisation with a well-stated sustainability agenda/challenge. Good, clear description of organisational operations.
· Satisfactory: Reasonable, if rather obvious, choice of organisation. Some justification of the choice given and a competent description of the organisation.
· Weak: Uninspired/random choice of organisation. Little justification for the choice and little or no link to sustainability strategy. Vague and/or incomplete description of the organisation.
2) Intervention design is strategically aligned to the organisation’s sustainability strategy/objectives and organisational training needs
· Excellent: Excellent, fully detailed, creative and imaginative intervention design. Fully in alignment with sustainability agendas and wider organisational strategy. Design underpinned by and justified in terms of a comprehensive TNA strategy.
· Very Good: Very good, interesting intervention design. Very clearly underpinned by a clear TNA strategy and strategic organisational needs.
· Good: Good attempt at developing an informed and competent intervention design. Clear evidence that a TNA was conducted and good alignment to organisational needs.
· Satisfactory: A reasonable attempt at intervention design with some attempt at alignment to organisational strategy.
Some evidence that a TNA was conducted and has influenced the design, although details are rather vague in places.
· Weak: Some attempt at intervention design, albeit rather lacking in imagination, details and justification. Little evidence of a TNA or of alignment to organisational sustainability agendas or wider strategy.
3) The delivery plan is clear and contextually appropriate, with a well-justified evaluation plan
· Excellent: Excellent, fully detailed, creative and imaginative delivery plan showing clear understanding of the contextual and cultural setting. Full understanding of possible issues and a clear contingency plan. Excellently presented and justified comprehensive evaluation plan.
· Very Good: Very good, detailed, well-thought-through, contextually and culturally appropriate delivery plan with understanding of potential problems and solutions to them. Well-justified, wide-ranging evaluation plan.
· Good: Good attempt at developing an appropriate and competent delivery plan. Good, appropriate evaluation plan.
· Satisfactory: A reasonable attempt at developing a delivery plan showing some understanding of contextual and cultural issues and the resulting challenges. Evaluation plan in place, although details might be vague in places.
· Weak: Some attempt at developing a delivery plan, albeit lacking in details and justification. Little consideration of cultural and contextual issues. Evaluation plan attempted but lacking in depth and details.
4) The budget is well presented and justified and the case for resources clearly made
· Excellent: Excellent, fully detailed, comprehensive and clearly presented budget. Thorough, well-evidenced case presented with links clearly made to organisational strategy.
· Very Good: Very good, very clearly presented budget with few omissions. Compelling case for support with some links to organisational strategy.
· Good: A competent, well-put-together budget with a few omissions. Justification for expenditure generally well presented in support of organisational needs.
· Satisfactory: A reasonable budget, but one that lacks clarity and justification. Some attempt at aligning the case for support to organisational strategy.
· Weak: Budget weakly constructed and presented. Numerous omissions. Justifications lacking or unclear. Case for support rather half-hearted and/or missing supporting evidence.
5) Critical engagement with the existing literature and understanding of the strengths and weaknesses of chosen activities and learning approaches clearly demonstrated
· Excellent: Excellent critical engagement with a wide range of literature and resources (many self-sourced). Thorough knowledge of the strengths and weaknesses of chosen approaches and activities, fully articulated and balanced.
· Very Good: Very good critical engagement with a range of literature and evidence of reading beyond course materials. Very good realisation of the strengths and weaknesses of chosen approaches and activities and clear attempts to balance them.
· Good: Good engagement with literature and resources with some evidence of criticality and wider reading. Clear efforts to complement and balance the strengths and weaknesses of different activities and approaches.
· Satisfactory: A reasonable attempt at engagement with course materials and resources but with little criticality or evidence of further reading. Some attempts to critique chosen approaches.
· Weak: Some attempt at engagement with course materials and resources but generally uncritical and descriptive. Little or no attempt to reach beyond the course materials. Chosen approaches unjustified and uncritically applied.
6) Critical understanding of the key skills and roles needed by HRD practitioners in order to realise the intervention, along with CPD implications
· Excellent: Extremely comprehensive analysis of all key skills and roles needed. Excellent justification given for the choice of most challenging skills/roles, demonstrating excellent understanding of the challenges of the specific organisational context. Detailed and clear suggestions given for CPD.
· Very Good: Comprehensive analysis of the key skills and roles required, well linked to the specific organisational context. The most challenging skills/roles clearly justified and sensible suggestions given for CPD.
· Good: Good identification of the key skills and roles needed. Some analysis of the demands of the organisational context. Some challenging skills/roles identified, and good suggestions given for CPD.
· Satisfactory: Able to identify some of the key skills and roles needed, but with some omissions. Attempts an analysis of the demands of the organisational context and gives some suggestions for CPD, albeit incomplete in places.
· Weak: Little attempt to identify the key skills and roles needed. Very little attention paid to the demands of the organisational context, and/or suggestions for CPD not given or very cursory.
7) Structure and logic of answer/argument; overall standard of presentation and written expression
· Excellent: Answer exemplary in terms of logic and structure of answer/argument. Will normally show greatest skill and judgement in synthesising various strands of argument in the conclusion. Tone is appropriate and consistent. Language is academic and writing is clear, concise and persuasive. Major arguments are effectively made, and ideas always relate back to the main line of argument.
· Very Good: Answer is clearly structured with clear and consistent logic of argument. There may be some minor errors. Will normally have strong and consistent conclusions that provide insight into the issue. Tone is appropriate and consistent. Language is academic and writing is clear and concise. Thoughtful progression of ideas and details. Major arguments are effectively made.
· Good: Structure of answer and logic of argument are generally sound but with some errors and/or unevenness. Conclusions here may be more inconsistent, summary-like and/or less insightful. Tone is appropriate, language is academic, and writing is clear and effective, with few lapses. Very little or no unclear sentence phrasing. The progression of ideas could be more thoughtful. Most of the time, ideas are connected well and relate back to the main arguments.
· Satisfactory: Clear attempt to structure the answer and develop logical arguments, but likely to be unconvincing and/or confused. Will normally have made a clear attempt to conclude the piece, but this may be confused, inconsistent and/or lacking insight. Contains informal language or a conversational tone. Writing style could be more effective. Some unclear sentence phrasing. Could weave main ideas more effectively into an overarching argument in response to the question(s) posed.
· Weak: Answer cannot be readily followed, with illogical claims made. Answer has no clear structure. Conclusions may be absent, perfunctory or too confused to demonstrate reasonable attainment. Non-academic writing style, lacking in clarity, precision and/or persuasiveness. Error-ridden in terms of spelling or grammar. Poor syntax, making sentences and paragraphs difficult to understand.
COMP 6211 Biometrics Coursework
A system is required to recognise people from images of their body shape. You are provided with images of clothed subjects standing on a (static) treadmill in our gait laboratory. This is part of the Large Southampton Gait Database [Shutler et al 2004]. You are required to design a biometric system that can be used to recognise these subjects as individuals. The dataset is in two folders: in the first folder, two images are available for each subject; the second folder repeats some, but not all, of these subjects, again in two views. You should not use the subject identification number for recognition; it is there to aid development. You should also not use a face recognition system for your biometric system, since the system you are expected to develop should be based on body shape. However, you are allowed to use a face detection algorithm to detect faces and use the face location relative to other body parts in your analysis. The main objectives are to derive measures that can:
i) show histograms of intra- and inter-class variations of subjects;
ii) present Correct Classification Rates for subject recognition; and
iii) compute the Equal Error Rate (EER) for subject verification/recognition, and then present the Correct Classification Rate for subject verification at the computed EER.
You also need to present all necessary performance measures for your system. You can use any implementation system, though reward will be given for sophistication. A basic system could use manual measurement and an Excel spreadsheet, and compare recognition and verification performances with a random result. More sophisticated versions can include computer vision and machine learning approaches, aiming for automated recognition and verification of these subjects from their body image.
You are required to write a report of at most 2000 words describing your approaches and your results. The report should be submitted zipped together with any operational code. The format of the report should follow the IEEE formats and should include Abstract, Introduction, Method, Results, Discussion, Conclusions and References (if any). The zipped file should be submitted to the ECS Handin system by 28th March 2025 at 4pm.
The marking scheme is:
Report presentation: 25%
Recognition performance: 25%
Selection and justification of approaches used for feature extraction: 25%
Analysis of the overall method (advantages, disadvantages and future directions): 25%
Total: 100%
This coursework will constitute 30% of the total assessment for COMP 6211 Biometrics.
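(For objective iii, the following is a minimal Python sketch of one common way to estimate the EER: sweep a decision threshold over genuine (intra-class) and impostor (inter-class) distance scores until the false accept rate and false reject rate meet. The score arrays and function name here are hypothetical placeholders, not part of the coursework materials; in your system the scores would come from whatever body-shape features you extract.)

import numpy as np

# Hypothetical distance scores (lower = more similar); replace with the
# distances your feature extractor produces on the gait-database images.
genuine = np.array([0.12, 0.18, 0.25, 0.30, 0.45])    # same-subject pairs
impostor = np.array([0.28, 0.41, 0.48, 0.55, 0.62])   # different-subject pairs

def equal_error_rate(genuine, impostor, n_steps=1000):
    """Sweep a threshold; return (EER, threshold) where FAR is closest to FRR."""
    lo = min(genuine.min(), impostor.min())
    hi = max(genuine.max(), impostor.max())
    best_gap, best = np.inf, (None, None)
    for t in np.linspace(lo, hi, n_steps):
        frr = np.mean(genuine > t)    # false rejects: genuine pairs refused
        far = np.mean(impostor <= t)  # false accepts: impostor pairs accepted
        if abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), ((far + frr) / 2, t)
    return best

eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.3f} at threshold {threshold:.3f}")

(The same two score arrays, plotted as histograms, give the intra-/inter-class variation plots asked for in objective i.)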
Assessment Brief – Academic Year 2025
Module Code: ARTD2108
Assessment Type: 3000-word illustrated report, to include a 15-page creative ideation that can incorporate generative AI outcomes
Module Title: MAJOR PROJECT (FMM)
Weighting: 100%
Launch Date: WC 27/01/2025
Word Count: 100% – 3000-word illustrated report, to include a 15-page creative ideation that can incorporate generative AI outcomes
Summative Submission Date: Thurs 15th May 2025 by 1600 UK time
Feedback to Students: No earlier than 4 working weeks thereafter
Method of Submission: Illustrated report (100%). Upload a PDF to Blackboard via Turnitin.
Introduction
The learning outcomes of this module are defined in the module profile, found on Blackboard. Please constantly refer to these, as they are what you are assessed against. You are expected to demonstrate more independent working skills than in part 1. Your summative submission should demonstrate critical understanding of fashion retailing and future developments in the changing retail landscape. It should show you understand the operational practices of fashion retail management, retail distribution and their inter-relationships. It will demonstrate strategic thinking about contemporary issues impacting the fashion retail environment today, and finally an understanding of the role technology will play in that environment. Within the module you will investigate the principles of retail theory and understand how the application of retail strategy will enhance a brand.
A Future Retail Concept focussing on Disability, Neurodivergence and/or Wellbeing.
"Physical retail's evolution demands a fundamental shift from traditional transaction-focused spaces to experiential destinations. Successful retailers must create immersive, technology-enabled environments that deliver personalized experiences while addressing growing consumer demands for sustainability and community connection" Deloitte (2024) 'Future of Retail: The Phygital Revolution', Harvard Business Review Digital Articles, January 2024, pp. 1-8.
"The post-pandemic retail landscape requires physical stores to serve as multifunctional spaces that merge digital innovation with human connection. Successful retailers will prioritize experiential retail, focusing on personalization, sustainability, and community engagement to create meaningful customer experiences" (McKinsey, 2024). McKinsey & Company (2024) 'Retail's Next Chapter: Beyond Digital Integration', Harvard Business Review, 102(1), pp. 112-120.
Project Requirement
You are to produce an Illustrated Report – 3000 words with a 15-page visual ideation of a future retail space that places particular awareness and accountability towards disability, neurodivergence and/or wellbeing. Each week your lectures and seminars will cover a separate topic from the list below. Each section must be covered in your Illustrated Report. Your task is to investigate and analyse the current state of retail and ideate the future of retail with an understanding of how retail considers disability, neurodivergence and/or wellbeing. You must present your findings in an Illustrated Report structure. You must determine and choose the market level (luxury, mid-market, high street, value, etc.) and type of store (Department Store, Speciality Store, Direct Retailer, Concept Store, Sneaker, Boutique, High Street, Denim, Skate, Discount, Luxury, Outlet, Children’s, Fast Fashion, Lingerie, Sport, etc.). You must also consider the geographic location of your store. So, for example, you could choose a LUXURY SNEAKER STORE in NEW YORK.
(The above examples are not exhaustive.) You must also give consideration to the following topics:
· Covid in Context: The current retail environment. The impact that the pandemic has had on the retail store, specifically in relation to disability, neurodivergence and wellbeing.
· Conflict Theory: Historical changes to the store. Disruptive theory and challenges.
· Community: How neighbourhood and localisation play a role in store design.
· Commerce and Connectivity: Frictionless trading and in-store technology.
· Central Place Theory: Location, type and density of stores.
· Content: In-store product, supply chain and distribution models.
· Competitive & Communicative Strategies: The evolution of store design.
· Curation: Experiential retail, innovation, VM and exhibition.
The 15-page ideation can be aligned to a brand (this can be a UK, EU or international brand) that will best align to the type of retail store from your report. Informed by your research, the illustrated 3000-word report and visualisation must use Adobe Creative Suite and visually consider and present the extensive ideation and illustrated research conclusions from your report. These can include, but are not limited to, the following:
· Location
· Accessibility
· Access
· Overall mood
· Navigation
· Health
· Mental stimulation
· Inclusive spaces
· Lighting, colour
· Assistive technology
· VM, product display, store window
· Materials, textures, audio and gamification
· Floor plan, product
· Technologies and in-store innovation used
· Geographic location, distribution & supply chain infrastructure
Please note: Your choice of brand will determine the overall presentation of your illustrated concept, and this should be evident in your photography, illustrations and depictions. Utilise imagery that best demonstrates the overall ideation of your new future retail project. You can annotate and further articulate your ideas, but the onus is on visualising your ideas. You can use generative AI, but you must show all of your prompts and/or reference all of the images you use.
Assessment 1 Brief: Challenges Analysis
DDES9905 | Immersive Design, Complexity and Wicked Problems
Assessment Task 1: Challenges Analysis
WEIGHT: 40%
ASSESSMENT TYPE: Report
GROUP WORK: No
DUE DATE: Week 5, Friday 11.55 pm AEST
SUBMISSION REQUIREMENTS: What to Submit – An electronic PDF containing a design report of 1200 words (maximum). Where to Submit – Moodle Submission Portal. The final submission will be checked for originality through Turnitin when you submit. Penalties may apply for late submission as indicated in the course outline.
Assessment Description
Describe two challenges facing society or industry, one that you consider complex and one that you consider wicked, and outline the differences. Describe the stakeholders and interdependencies in each challenge, including how you would go about researching, identifying and drawing boundaries around them. Finally, consider how you would begin to approach or intervene in these challenges to scope for interventions that use an immersive technique. Describe the purpose and rationale behind your choice.
How to Complete this Assessment
1. Identify and describe two different types of challenges.
2. Contextualise the characteristics that make them complex or wicked.
3. Describe the stakeholders and any potential interdependencies – political, economic, cultural, etc. How would you research or identify these elements?
4. Explain how you might intervene in these challenges to scope potential solutions using an immersive technique(s).
5. Justify your choices, explaining the purpose behind your approach. Highlight any drawbacks or biases you think you might encounter.
Course Learning Outcomes addressed in this task
1. Recognise complexity and analyse wicked problems facing industry and society.
2. Use visualisation, simulation and immersive practices to sketch and rapidly prototype design solutions to complex and wicked problems.