Syllabus
Fundamentals of Machine Learning (CS-UA 473)
Spring Semester 2025

§0.0 Purpose, design & philosophy (PDP): As data and computational resources become ever more abundant, the ability to leverage both has an increasingly transformational impact on the economy, society and civilization, from prediction to generative AI. “Machine Learning” is an umbrella term for the algorithms, tools and approaches that drive this development. This class is a survey course intended to give an overview of all major flavors of Machine Learning that are in common use in the first quarter of the 21st century. Importantly, we will place a particular emphasis on understanding the foundations that machine learning algorithms rest on, as we enter the 4th age of human development. The ultimate purpose of this class is for you to be able to apply these fundamental machine learning approaches to solve real-world problems with both confidence and competence.

§1.0 Instructor: Pascal Wallisch, PhD [teaches the lecture]
Office: 60 Fifth Avenue, Room 210
Phone: (212) 998-8430
Email: [email protected]
Office hours:
Tu 2.15-3.15 pm (Walk-ins welcome, first come, first served - take a fox stick)
We 1.00-2.00 pm (Walk-ins welcome, first come, first served - take a fox stick)
Th 3.00-4.00 pm (Walk-ins welcome, first come, first served - take a fox stick)

§1.1 Teaching Assistants (email: [email protected]):
Course assistant [teaches the lab]: Umang Sharma. OH: Thu 12.30-1.30 pm in 60 5th Ave, Room 402
Tutor [teaches one on one]: Hamza Alshamy. Schedule one-on-one sessions via Calendly
Section leader [teaches the recitations]: Zhe Zeng. OH: Fr 12.00-1.00 pm in 60 5th Ave, Room 340
Graders [grade assignments]: Several, anonymous, no contact (teaching vs. grading firewall)

§1.2 Lecture times: Mo & We 11:00 am - 12:15 pm

§1.3 Lecture space: GCASL, C95, mirrored in https://nyu.zoom.us/j/93881706252

§1.4 Session content: There are 3 kinds of sessions per week.
On Monday, lectures introduce new course content each week, focusing on high-level goals, concepts and algorithms. (Usually) on Wednesday, the lab focuses on the practical implementation of the lecture content in code, using real and synthetic data. On Friday, the recitation section focuses on implementation and practice of course materials. Sometimes, we will also feature guest speakers who will provide an industry perspective on class concepts.

§1.5 Section: Fridays, 9.30-10.45 am and 2.00-3.15 pm in 31 Washington Pl (Silver), Room 405

§1.6 Prerequisites: Linear Algebra, Data Structures, Probability & Statistics

§1.7 Scope: 0 to 1. The language of instruction is Python; we index from 0.

§1.8 Materials (none of these is required; they are recommended depending on your background):
Concepts: “Pattern Recognition and Machine Learning” by Bishop
Linear Algebra: “Linear Algebra and Learning from Data” by Strang
Math: “Mathematics for Machine Learning” by Deisenroth, Faisal & Ong
Coding: “Introduction to Machine Learning with Python” by Müller & Guido
Machine Learning overview: “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems” by Géron

§2.0 Course grading: The total grade is calculated as follows (out of 256 points):
A) After Action Appraisals (12): 1 point / AAA, 12 points total
B) Basic course logistics quiz (1): 4 points / quiz, 4 points total
C) Capstone project (1): 64 points / project, 64 points total
D) Deceptive AI output (1): 4 points / output, 4 points total
E) Exit survey (1): 4 points / survey, 4 points total
F) Final interview & emergency skills test (1): 64 points / FIST, 64 points total
G) Groundstone survey: 4 points / survey, 4 points total
H) Homeworks (5): 20 points / HW, 100 points total
Total: 256 points

§2.1 Grade cutoffs:
A 243-256
A- 230-242
B+ 220-229
B 210-219
B- 200-209
C+ 190-199
C 180-189
C- 170-179
D+ 150-169
D 128-149
F 64-127
I 0-63

§2.2 Attendance and Participation: You
are responsible for the material covered in this course. Consistent attendance is critical, and whereas all lectures will be recorded, this class has been optimized for live attendance/performance. You’ll get the most out of it that way. To incentivize attendance, we assign an attendance and participation grade with the AAA assignments. There are 14 weeks, and you need to complete 12 of these AAA assignments to get a full participation score. Each Wednesday, we will open up an assignment (“After Action Appraisal”, AAA) on Brightspace. To avoid confusion, this assignment needs to be completed BEFORE the lecture on Monday of the following week. By completing this reflection and digestion assignment, you affirm that you engaged with the class sessions. Slides and code are provided to aid note-taking. They are no substitute for attending the actual class. Basing an AAA on slides instead of class attendance is an academic integrity violation.

§2.3 Homeworks: These are designed to build skills and conceptual proficiency. There are no shortcuts. Immersion is key. Thus, there are 6 assignments, due every few weeks. Please allow yourself enough time to complete them by getting started early. Note that whereas there are 6 homeworks, we will only count the scores on the highest 5 towards your course grade. In addition, each homework contains some extra credit questions, which count towards the homework grade.

§2.4 Capstone project: This will be something that, hopefully, sparks joy and that ties together the skills you learned in this class. We’ll release a spec sheet describing what it entails at a suitable time in the course, around April 1st. This project will allow you to gauge whether you enjoy solving problems with Machine Learning methods and whether the class imparted the skills to do so competently.

§2.5 FIST: Whereas we anticipate, and even encourage, that you will use generative AI (like ChatGPT, GitHub Copilot or Sparrow) to do the weekly assignments and the capstone project, you need more than skill at prompting the AI to succeed in this field, for instance during a technical interview. There are also skills that someone claiming ML expertise is simply expected to have on tap, particularly in an emergency. For scalable realism, we simulate these demands in a final, cumulative and comprehensive test, asking true/false questions with a modest attempt & incorrect (a&i) penalty. For peace of mind, you can bring any notes you want, but *all* electronic devices (computer, iPad, phone, smartwatch, etc.) are banned. Note: As this is an in-person final that happens during finals week, be sure to make travel arrangements accordingly. If you miss it, you will get an incomplete.

§2.6 Deceptive AI output: Prompt a generative AI to say something about ML that is incorrect.

§2.7 Surveys and quizzes (B, E, G): These are low-stakes assignments that will help us calibrate the class, fine-tuning it to match your needs, wants, and competencies optimally.

CHEM1057 Analytical Challenge - 2025

The first assessment task for CHEM1057 will require the completion of a report on an analytical challenge problem. For the challenge which you have been set, you will need to answer a set of questions and prepare a report of no more than four A4 sides in length. Note that in answering these questions, you are encouraged to do some research, looking for published studies and/or spectra as appropriate. Whilst using ‘Google’ can get you some way in your research, you should find chemspider.com, chemicalbook.com, the NIST Chemistry WebBook, SciFinder-n, and wok.mimas.ac.uk more useful. You should provide references to any papers or websites to which you refer in your report. A reference manager (Mendeley, EndNote, etc.) can be useful in managing your references. The references do not contribute to the page count.

The report needs to be submitted through e-assignments by the deadline published on BlackBoard. Note that late penalties will apply and high standards of academic integrity are expected. Early submissions are encouraged.

Marking criteria (marks available)

Chemical structure and physical properties of target analytes: 24
Please provide a table of the chemical structures of the target molecules (drawn in ChemDraw) and any relevant physical properties that are necessary to evaluate the use of the analytical techniques.

Evaluation of 5 classes of analytical techniques: 40
For each of the 5 classes of analytical technique (chromatography, infrared, UV-Vis, NMR spectroscopy, and mass spectrometry), explain how the technique may be used in a qualitative and/or quantitative analysis to determine the presence of the target analytes, highlighting any key parameters and spectral features, as appropriate. You may wish to include sample spectra.

Met the specific challenge question asked: 16
Make a recommendation as to the most appropriate method or methods of analysis so as to meet the challenge posed, including recommended sample handling or processing.

Use of references and their attribution: 10
Correctly presented references from the primary scientific literature, books, or databases/websites. Consistent use of reference style.

Quality of presentation - completed to a professional level: 10
Layout of report, correct use of scientific terminology, correct grammar and spelling, use of a professional writing style.

Analytical Challenge Problem 3. Quality Control of Green Teas

Tea is the most widely consumed drink in the world and consists of three major types: green tea (unfermented), oolong tea (semi-fermented), and black tea (fermented). Green tea contains a variety of products such as polyphenols, caffeine, theanine, and vitamins [1,2]. Many characteristics are considered when judging tea quality, and this is reflected in the price of the tea. The quality of a green tea is mainly assessed through its colour, flavour, and aroma and depends on the content of several compounds.

A local green tea importer is concerned about the grading of the tea purchased from a new supplier and suspects that the teas are not of the quality reported. The importer has employed you to analyse the green tea samples, and you are tasked with determining suitable analytical methods for the six compounds listed: caffeine, theobromine, gallic acid, epigallocatechin gallate, epicatechin-3-O-gallate and epicatechin.

To answer the client’s brief, you will need to confirm which of these compounds are present in the imported teas and identify suitable analytical methods for the quantification of the key marker molecules listed above. In your report, provide a chemical structure and any physical properties that would be important in selecting the analytical methods.

For each of the analytical techniques available to you, listed below, determine whether the method is an appropriate choice to provide the analysis of the green tea samples. In each case, justify your answer. You must complete this for all of the techniques, not just your preferred method(s).
a. Chromatography (explaining the most appropriate method for the sample or samples of interest)
b. Infrared spectroscopy
c. UV-Vis spectroscopy
d. NMR spectroscopy
e. Mass spectrometry

After selecting the most appropriate analytical technique (or combination of techniques), describe the factors you would need to consider when preparing samples of the green tea for analysis.

References
1. K-H. Choi, J.S. Park, H.S. Kim, Y.H. Choi, J.H. Jeon, J-H. Lee. J. Korean Magn. Reson. Soc. 2017, 21, 119-125.
2. Z. Han, M. Wen, H. Zhang, L. Zhang, X. Wan, C-T. Ho. Food Chem. 2022, 374, 131796.

SEMESTER 2 2024/25 INDIVIDUAL COURSEWORK BRIEF:
Module Code: Assessment: Individual Coursework - Assignment Weighting: 30% Module Title: Managing Digital Design and Web Development Module Leader: Submission Due Date: @ 16:00 08 MAY 2025 Word Count: 1000

This assessment relates to the following module learning outcomes:
A. Knowledge and Understanding
A1. Be able to identify and have a basic understanding of practices and challenges involved in managing digital design and web development;
A2. Have gained an appreciation of the technologies required to develop and operate websites and other web or mobile applications.
B. Subject Specific Intellectual and Research Skills
B2. Have an appreciation of complexity in real-world systems;
B3. Be able to apply basic digital design and/or coding skills.
C. Transferable and Generic Skills
C3. Programming skills

Coursework Brief: Activity

Your task is to develop one or more scripted web pages that are meant to be part of a web application for an eco-friendly travel company dedicated to promoting sustainable tourism through adventure packages that explore breathtaking natural locations. The company offers three carefully designed packages tailored to different experience levels and durations: Diamond, Silver and Gold. The Diamond is a 4-day adventure priced at £1,200 per adult; the Silver is a 7-day journey for £2,500 per adult; and the Gold is a premium 10-day experience for £4,500 per adult. Families benefit from a 30% discount for children on all packages, making these trips more affordable. Additionally, a 15% VAT is applied to the total cost. At the moment, the business does not have a name, logo, or brand colours, so it is your job to decide on these. Your web page(s) should be appealing, easy to use, and provide a package quote calculator.

Specifically, your solution should contain:
• HTML pages – the Home Page will introduce the brand with captivating visuals of nature and adventure activities, accompanied by a brief description of the company’s ethos of eco-tourism. The Packages Page will provide detailed information about each adventure package, including its duration, cost, and the unique experiences it offers, helping users make informed decisions. The Booking Page will feature an interactive quote calculator that allows users to select their preferred package, specify the number of adults and children, and view the total cost dynamically. This includes applying the appropriate child discounts and VAT. A Contact Page will offer a simple form to collect user details, including full name, email, phone number, and message, with a dummy "Send" button for demonstration purposes.
• Functionality/scripting – The website will feature dynamic functionality supported by a script that fetches package details from a MySQL database (the database specification will be provided in a separate file). This script will calculate the total cost based on the number of adults and children, applying the 30% discount for children and adding the 15% VAT. For example, if a family of two adults and one child selects the Silver package, the script will calculate the total cost by multiplying the cost per adult by the number of adults, adding the cost for the child with the child discount applied, and then adding the VAT. The result will be displayed on the booking page, providing clear and transparent pricing.
• Styling (usability and visual aesthetics) – user-friendly and visually appealing styling that is suitable for and well applied to the problem setting, with appropriate positioning of all elements. The layout will feature clear navigation, modern typography for readability, and high-quality images that capture the essence of eco-tourism.
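
As a worked illustration of the quote calculation described in the Functionality/scripting bullet above, the following sketch works through the example of two adults and one child on the Silver package. It assumes (the brief does not spell this out) that a child is charged the adult price less 30% and that VAT is charged on the discounted subtotal; the symbols p, a and c are introduced here purely for illustration.

```latex
\begin{align*}
p &= 2500 \quad (\text{Silver, \pounds{} per adult}), \qquad a = 2, \qquad c = 1 \\
\text{subtotal} &= a\,p + c\,p\,(1 - 0.30) = 5000 + 1750 = 6750 \\
\text{VAT} &= 0.15 \times 6750 = 1012.50 \\
\text{total} &= 6750 + 1012.50 = 7762.50
\end{align*}
```

Under these assumptions the quote shown on the Booking Page for this family would be £7,762.50; a different reading of how the child discount or VAT applies would change the figure.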

Importantly, to avoid any possible confusion, every page you include in your solution must clearly include the following disclaimer: “Note that this is a fictitious website that was developed by a student as part of a programming assignment. None of the content on this page is meant to be genuine nor should it be taken as such”. Also, please do not make any attempt to submit your pages to a search index or to provide any external link to them.

To develop your solution, you will need to use HTML, JavaScript, and PHP server-side scripting. Use of CSS for styling is strongly encouraged; some may wish to use Bootstrap for this purpose. Importantly, your server-side script(s) must be designed to run under the existing Web server configuration used to host your personal web file store (https://student-lamp.soton.ac.uk/~your_username/). Solutions that require different PHP versions, customised server configurations, etc. will attract low marks; you may wish to verify compatibility in the early stages of your work.

Furthermore, you are asked to produce a written report (max. 1,000 words) which discusses and justifies your main design decisions (e.g. usability considerations taken into account, visual aesthetics, choice of framework/starting template, etc.). This report should also reference any sources of information or of existing code you used, and how you applied or further customised these. It should demonstrate how you reflected on the most relevant elements of your solution; you are encouraged to justify certain choices based on further research/reading.

Database specifications
The database specification will be provided in due course. Necessary rights will be granted to access the database and select records from the required tables.

Important note on Academic Responsibility and Conduct
This is an individual assignment so your markup code and scripts must be your own work: you are not allowed to copy from other students.
You are of course encouraged to look for useful information sources to support your design choices and reference them in your code. Also, you are allowed to make use of existing templates or frameworks and development environments to speed up development, or you may look for scripting code examples on the Web, in books, etc., and adapt and incorporate individual chunks of scripting code provided you acknowledge their use and the sources in code comments.

2024 EXAMINATIONS
COMPUTING AND COMMUNICATIONS - In-Person, Written Exam [2.5 hrs]
SCC.311 Distributed Systems

Candidates are asked to answer THREE questions from FOUR; each question is worth a total of 25 marks. Use a separate answer book for each question.

Question 1

The vector clock algorithm is used in distributed systems to capture the partial ordering of events across multiple nodes. It allows nodes to maintain a logical timestamp called a vector clock, which consists of an array of integers. Each element in the array corresponds to a node in the distributed system. Here's the step-by-step algorithm for updating the vector clocks:
1. Initialisation: When a node Ni starts, it initialises its local vector clock T to all zeros.
2. Event Occurrence: Whenever an event occurs at node Ni (e.g., sending or receiving a packet), it increments the element T[i] by 1. This represents the occurrence of a local event at node Ni.
3. Sending a Message: When node Ni wants to send a message m, it sends the message along with its current vector clock (T) to the destination node via the network.
4. Receiving a Message: When node Ni receives a message (m, T') from another node via the network, it updates its own vector clock according to the received vector clock (T') as follows. For each index j in the vector clock, Ni takes the maximum value between its current timestamp T[j] and the corresponding timestamp T'[j] from the received vector clock. Additionally, Ni increments the element T[i] by 1 to reflect the occurrence of the receive event.

1.a. Assign the vector clocks of each event (shown with a dot) and the clocks sent in the messages in the diagram below. [13 marks]

1.b. Based on the vector clocks of the events in the diagram, indicate which events happen before C2 based on the happens-before relationship. (Note that this question is negatively marked for selecting incorrect options.) [3 marks]

1.c.
Based on the vector clocks of the events, indicate which events happen after the event B2 based on the happens-before relationship. (Note that this question is negatively marked for selecting incorrect options.) [2 marks]

1.d. Write Java code for a function happensBefore() which takes two vector timestamps, T1 and T2, as arguments and returns true if T1 happens-before T2 and false otherwise. Hint: T1.length is the size of the array T1 in Java.
    public class VectorTimestamp {
        public static boolean happensBefore(int[] T1, int[] T2) {
        }
    }
[4 marks]

1.e. In terms of the happens-before relationship, what can you conclude about the order of the following pairs of events? Justify your answer with the vector timestamp of each event.
i. A3 and C2
ii. C1 and A2
iii. B2 and A4
[3 marks]

[Total 25 marks]

Question 2

2.a. Which of the following statements are true concerning the detection of crash failures? Note that points will be deducted for selecting incorrect choices, with the score not falling below zero.
1. It is possible to determine whether a remote host has crashed accurately on the Internet.
2. Systems should be specifically designed to accommodate erroneous failure detections.
3. If a node has actually crashed, all nodes which interact with it will eventually be able to detect that crash.
4. A failure detector usually operates by sending a message to a remote node and waiting a fixed amount of time before reporting that node as having failed.
[2 marks]

2.b.
i. In a system designed to be tolerant of Byzantine failures, where there are three malicious nodes, what is the minimum number of nodes in total that must be present to detect the failure accurately? [2 marks]
ii. How many communication rounds are needed within this group of nodes to detect the problem using Lamport’s algorithm? [2 marks]

2.c. The diagram below shows a form of service replication.
Figure 1
i. State which kind of replication you think the diagram in Figure 1 represents. [2 marks]
ii.
What role does node A have in this system? [1 mark]
iii. What role does node B have in this system? [1 mark]
iv. Is the illustrated protocol a correct implementation of this kind of replication? Justify your answer. [5 marks]

2.d.
i. A distributed hash table is a decentralised approach to storing files. If every host issues one request for a random file every second, and we increase the number of hosts which are part of the network while the total number of files being stored stays the same, how would you expect the volume of traffic from the perspective of an individual node to change? Explain your answer. Note: The number of files is much larger than the number of hosts. [2 marks]
ii. In a client-server system to store files, a fixed number of hosts in the system are replica servers storing the files, while the rest are clients, each requesting one random file from a randomly selected server. In this scenario, let’s assume the number of clients increases with a fixed number of files; how would the volume of traffic change from the perspective of a server? Explain your answer. Note: The number of files is much larger than the number of clients. [2 marks]

2.e. Is it possible to implement a reliable failure detector using a reliable communication channel? Explain your answer. [3 marks]

2.f. Discuss why indirect communication is appropriate for mobile environments where network coverage can be poor in some areas. (2-3 sentences) [2 marks]

2.g. Name one way of achieving scalability in a distributed system. [1 mark]

[Total 25 marks]

Question 3

Answer the multiple-choice questions below. Note that points will be deducted for selecting incorrect choices, with the individual score for each question not falling below zero. Unless otherwise stated, each question can have multiple correct answers.

3.a. Among the following, which are potential use cases for a consensus protocol? Please select each option that applies.
a) Implementing state machine replication when there are no failures
b) Maximizing network throughput between a leader and backup replicas
c) Implementing FIFO group communication
d) Ensuring total order delivery of messages in a distributed system prone to failures
e) Ensuring consistency of state across replicas in the presence of failures
[2 marks]

3.b. Please select from the options below that apply to the Paxos consensus protocol.
a) It is a Peer-to-Peer approach
b) It involves a two-phase commit protocol
c) It provides Byzantine fault tolerance
d) It provides eventual consistency across replicas
e) It provides Crash fault tolerance
[1 mark]

3.c. What role does the leader node play in the Raft consensus protocol? Please select each option that applies.
a) Initiating elections
b) Proposing entries to be appended to the distributed log
c) Distributing votes to other nodes
d) Monitoring the health of other (i.e., follower) nodes
e) Receiving requests from clients, appending them to its log and sending them to followers
[2 marks]

3.d. What are the crucial requirements for achieving consensus in distributed systems? Please select each option that applies.
a) A quorum of replicas must reach an agreement
b) Complete absence of network partitions, meaning that any two replicas must be able to communicate at any given time
c) Asynchronous communication between replicas
d) All participating replicas must unanimously agree on a decision
e) Replicas must not crash due to a fault
[2 marks]

3.e. Which of the following protocols is NOT applicable to build a fault-tolerant, replicated state machine? Please select each option that applies.
a) Paxos
b) Raft
c) PBFT
d) Two-phase commit
e) Proof-of-Work
[2 marks]

3.f. Which of the following is a disadvantage of using traditional consensus protocols like Paxos or Raft in blockchain networks? Please select each option that applies.
a) Not Sybil-resistant
b) Low fault tolerance
c) Limited decentralisation
d) Not designed to tolerate Byzantine faults
e) Not designed to tolerate crash faults
[2 marks]

3.g. What is the primary purpose of a distributed log in consensus protocols like Raft or Paxos? Please select each option that applies.
a) Storing the state of each replica at the leader
b) Recording committed transactions for which consensus is reached
c) Caching frequently accessed transaction data
d) Maintaining network routing information
e) Recording client requests that are being replicated
[2 marks]

3.h. Which of the following statements are correct about Lamport's algorithm for logical clocks? Please select each option that applies.
a) Lamport clocks ensure that events are totally ordered.
b) Lamport clocks rely on global synchronisation for accuracy.
c) Lamport clocks can accurately capture potential causality among events.
d) Lamport clocks assign unique timestamps to each event.
e) Lamport clocks assign a vector of timestamps to each event.
[2 marks]

3.i. Which of the following are challenges NOT addressed by PBFT? Please select each option that applies.
a) Crash fault tolerance
b) Byzantine fault tolerance
c) Tolerance to network partitions
d) Eventual consistency among replicas
e) State machine replication
[1 mark]

3.j. Which of the following are true when comparing PBFT (Practical Byzantine Fault Tolerance) to Paxos and Raft? Please select each option that applies.
a) PBFT can tolerate a higher percentage of faulty replicas
b) PBFT requires fewer message exchanges to reach a consensus
c) PBFT can tolerate crash faults
d) PBFT achieves lower latency in reaching consensus
e) PBFT provides tolerance to faults that are harder to detect
[2 marks]

3.k. Which of the following statements about active and passive replication are correct? Please select each option that applies.
a) Active replication does not require a primary replica
b) Active replication is more suitable for a blockchain
c) Passive replication is more complex to implement than active replication
d) Passive replication involves replicas executing requests independently
e) Active replication is not tolerant to crash faults
[2 marks]

3.l. Which of the following best describes eventual consistency? Please select each option that applies.
a) All replicas are guaranteed to have the same state at all times
b) Consistency is achieved immediately after an update operation
c) Replicas may have temporarily divergent states but will eventually converge
d) Consistency is achieved through strict synchronisation of all replicas
e) Eventual consistency is not applicable in distributed systems
[1 mark]

3.m. Which of the following statements accurately describes the relationship between eventual consistency and conflict resolution in distributed systems? Please select each option that applies.
a) Eventual consistency guarantees conflict-free operation, eliminating the need for conflict-resolution mechanisms
b) Eventual consistency ensures that conflicts are immediately resolved to maintain consistency across replicas
c) Eventual consistency acknowledges the possibility of conflicts and provides mechanisms to detect and resolve them over time
d) Eventual consistency relies solely on strong consistency to prevent conflicts from occurring
e) Eventual consistency is incompatible with conflict resolution, as it prioritises availability over consistency
[1 mark]

3.n. Which of the following statements accurately describes the role of conflict-free replicated data types (CRDTs) in achieving eventual consistency and resolving conflicts in distributed systems? Please select each option that applies.
a) CRDTs guarantee immediate consistency across all replicas, eliminating the possibility of conflicts
b) CRDTs are only applicable in systems where strong consistency is prioritised over availability
c) CRDTs provide data structures and algorithms that ensure conflict resolution without requiring coordination among replicas
d) CRDTs rely on centralised authorities to resolve conflicts and maintain consistency
e) CRDTs are incompatible with eventual consistency, as they prioritise availability over consistency
[1 mark]

3.o. Which statement accurately reflects the trade-offs described by the CAP theorem? Please select each option that applies.
a) In distributed systems, it is always possible to achieve both high availability and strong consistency simultaneously
b) The CAP theorem states that a distributed system can achieve only two out of three properties: consistency, availability, and partition tolerance
c) Consistency refers to the ability of a system to remain operational and responsive despite network partitions
d) Achieving strong consistency in a distributed system requires sacrificing partition tolerance
e) CAP theorem prioritises partition tolerance over both consistency and availability
[2 marks]

[Total 25 marks]

Question 4

Answer the multiple-choice questions below. Note that points will be deducted for selecting incorrect choices, with the individual score for each question not falling below zero. Unless otherwise stated, each question can have multiple correct answers.

4.a. Which of the following statements about Java RMI (Remote Method Invocation) is correct? Please select each option that applies.
a) Objects sent over the network using RMI must implement a remote interface
b) Java RMI allows Java objects to invoke methods only on objects located on the same JVM
c) Java RMI is suitable for client-server and not for peer-to-peer systems
d) Java RMI is not suitable for invocation by a caller that resides on a different host than the remote object
e) Java RMI supports passing Java objects as arguments or return objects in remote method calls
[2 marks]

4.b. When a client invokes a method on a remote object located on a server in Java RMI, where does the execution of the method take place? Please select each option that applies.
a) The client first retrieves the object from the server and then executes the method locally
b) On the server where the remote object is located
c) Execution can happen on either the client side or the server side, depending on the implementation
d) Execution happens on both the client and server sides simultaneously
e) Execution is handled by the RMI registry
[2 marks]

4.c. What is the primary difference between horizontal and vertical scaling? Please select each option that applies.
a) Horizontal scaling increases the size of individual resources, while vertical scaling adds more resources of the same type
b) Horizontal scaling adds more resources of the same type, while vertical scaling increases the size of individual resources
c) Horizontal scaling involves distributing workload across multiple servers, while vertical scaling involves upgrading a single server
d) Horizontal scaling is typically used for databases, while vertical scaling is used for web servers
e) Horizontal scaling is more cost-effective than vertical scaling
[2 marks]

4.d. Which of the following statements about space and time uncoupling are NOT correct? Please select each option that applies.
a) Space and time uncoupling allows for messages to be produced and consumed asynchronously, decoupling the timing and location of message production and consumption b) Distributed shared memory is an indirect communication approach with high scalability c) Message queues are an example of space and time-coupled communication, where producers and consumers are closely synchronised in both space and time d) Space and time uncoupling enables better scalability and fault tolerance in distributed systems e) Space and time uncoupling eliminates the need for buffering messages in communication systems [2 marks] 4.e. Which ordering semantics guarantees that events are delivered in the order they were sent by the sender? Please select each option that applies. a) Global time ordering b) FIFO ordering c) Causal ordering d) Total ordering e) Synchronous ordering [2 marks] Figure 2 4.f. In Figure 2, two clients send messages to three replicas. At each replica, the incoming messages are delivered to the application in the order that they arrive. Indicate whether total ordering is achieved in Figure 2. Explain why or why not. [2 marks] 4.g. Suppose that the two clients in Figure 2 stamp each message they send with their current system time. Also assume that the two clients have perfectly synchronised system clocks. Each replica temporarily buffers an incoming message, orders the messages by their timestamp, and then delivers them to the application. Indicate whether this approach can guarantee total ordering on the Internet. Explain why or why not. [2 marks] 4.h. Which ordering semantics ensures that events are delivered in the order they occurred globally across the entire distributed system? Please select each option that applies. a) Global time ordering b) FIFO ordering c) Causal ordering d) Total ordering e) Synchronous ordering [2 marks] 4.i. 
In a distributed system with causal ordering, process A sends a message to processes B and C, and then C sends a message to processes A and B after receiving A’s message. Please select each option that applies. a) B must order A’s message before process C b) B must order C’s message before A’s message c) B will order A’s messages according to their order of arrival d) B can order A’s messages arbitrarily e) A’s message is related to C’s message according to happens-before relationship [2 marks] 4.j. Explain an approach that guarantees clients 1 and 2 in Figure 2 can individually achieve FIFO ordering. Describe what is required from both the client and the replicas. [2 marks] 4.k. Bob proposes the following group communication mechanism where one node among N nodes is designated as the leader. Each node sends its messages to the leader, which then broadcasts them to all the nodes using a First-In-First-Out (FIFO) broadcast. i. Is Bob’s broadcast mechanism sufficient to achieve a total ordering of messages across all the N nodes? [1 mark] ii. Explain the drawbacks of Bob’s broadcast mechanism, if any. (1-2 sentences) [2 marks] iii. Suggest one potential improvement or modification to Bob's broadcast mechanism to address its drawbacks. [2 marks] [Total 25 Marks]
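FIFO ordering, as asked about in 4.e and 4.j, is commonly obtained by having each sender number its messages and each receiver hold back anything that arrives early. The following is a minimal sketch under that assumption; the class `FifoReplica` and its method names are illustrative and do not come from the exam paper.

```python
from collections import defaultdict


class FifoReplica:
    """Delivers each sender's messages in the order that sender numbered them."""

    def __init__(self):
        self.next_seq = defaultdict(int)  # next expected sequence number per sender
        self.held = defaultdict(dict)     # early arrivals, keyed by sender then sequence number
        self.delivered = []               # messages handed to the application, in delivery order

    def receive(self, sender, seq, payload):
        """Buffer an arriving message, then deliver everything that is now in order."""
        self.held[sender][seq] = payload
        while self.next_seq[sender] in self.held[sender]:
            current = self.next_seq[sender]
            self.delivered.append((sender, self.held[sender].pop(current)))
            self.next_seq[sender] += 1


replica = FifoReplica()
# Client 1's second message overtakes its first on the network.
replica.receive("client1", 1, "b")
replica.receive("client2", 0, "x")
replica.receive("client1", 0, "a")
print(replica.delivered)
```

Note that this only orders messages per sender. Two replicas running this code may still interleave messages from different clients differently, so it does not by itself provide total ordering.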
ECON3106 Politics and Economics Exercises 1 1. Definition 1. A preference ranking $>$ over a set of alternatives $\mathcal{A}$ is transitive if, for any three alternatives $A, B, C \in \mathcal{A}$, $A > B$ and $B > C$ imply $A > C$. There is a society with 3 individuals: i, j, k (Irma, Jakie and Kelly). Their preferences are represented as: Irma: $A >_i B >_i C$ Jakie: $B >_j A >_j C$ Kelly: $C >_k B >_k A$ Irma proposes a system where each individual assigns 3 points to his or her favourite alternative, 2 to the second and 1 to the third. The sum of the individuals' points for each alternative will constitute the social ranking. 1.1 Show the social ranking resulting from Irma’s method. 1.2 Show that Irma’s method always gives a transitive social ranking (hint: notice that in the natural numbers, i.e. 1, 2, 3, ..., “greater than” is transitive). 2. There is a society with 4 individuals: i, j, k, l (Irma, Jakie, Kelly, Louise). Their preferences are represented as: Irma: $A >_i C >_i B$ Jakie: $B >_j A >_j C$ Kelly: $B >_k C >_k A$ Louise: $A >_l C >_l B$ 2.1 Find the set of Pareto efficient alternatives. The society needs to choose one of the three alternatives. As Louise and Kelly are the youngest ones, the society wishes to give their preferences extra consideration. Consider the following social choice method: Round 1 (select between B and C): each individual “votes” for the alternative she prefers the most between the two. The alternative with most votes is selected for Round 2. In case of a tie, Louise’s preferences will determine the selected alternative. Round 2 (choose between selected alternative and A): each individual “votes” for the alternative she prefers the most between the two. The alternative with most votes is chosen. In case of a tie, Kelly’s preferences will determine the selected alternative. 2.2 Which alternative would be chosen if the society was to use this method? 
Now consider the following social choice method: Round 1 (select 2 alternatives): each individual “votes” for the alternative she prefers the most between the three. If an alternative gets the most votes, then it is chosen. Otherwise, the two top alternatives are selected for Round 2. Round 2 (choose between the 2 selected alternatives): each individual “votes” for the alternative she prefers the most between the two. The alternative with most votes is chosen. In case of a tie, Louise’s preferences will determine the selected alternative. 2.3 Which alternative would be chosen if the society was to use this method? 3 Condorcet Method (Open Agenda) A society is composed of 3 individuals named 2, 6, and 10. There are three alternatives, whether to have one, three, or five parties. We label these three alternatives, respectively, 1, 3, and 5. For any individual i (where i is a name like 2, etc), her utility if alternative A is chosen is given by $u_i(A) = -(i - 2A)^2$. For example, if the alternative chosen is A = 5, individual i = 6 receives utility equal to $u_6(5) = -(6 - 2 \times 5)^2 = -(6 - 10)^2 = -16$. 3.1 What is the most favourite alternative for each individual? For the remainder of this exercise, assume voters vote sincerely. I also invite you to think about whether sincere voting would be a Nash equilibrium of the voting game. 3.2 Consider a majority vote between alternatives 1 and 3. Which alternative would win? 3.3 Consider a majority vote between alternatives 3 and 5. Which alternative would win? 3.4 What can we conclude about the alternative 3? 4 Arrow’s Impossibility Theorem A friend of yours proposes a system to choose between different alternatives and proves to you that this is not a dictatorship. Using Arrow’s impossibility theorem, what must you conclude? 5 Strategic Voting in Plurality Elections There is a plurality election with three candidates, {X, Y, Z}. You are a voter with preferences X > Y > Z. 
You read in an accurate poll that there are three types of voters: 1. circa 49% will surely vote for Z; 2. circa 48% will surely vote for Y; 3. circa 3% have the same preferences you have, but have not yet decided for whom to vote. 5.1 What is a plurality election? 5.2 If all voters like you (group 3) vote sincerely, which candidate would you expect to win? 5.3 If all voters like you (group 3) vote strategically, which candidate would you expect to win? 6 Strategic Voting and the Swing Voter’s Curse Assume that you are a member of a jury voting by simple majority rule between two alternatives: A or B. In case of a tie, the jury will toss a fair coin to choose between the alternatives. There are 99 other jurors. You have been told the following: if A is the correct alternative, then 50 of the other voters will vote for A and 49 will vote for B; if B is correct, then 50 of the remaining voters vote for B and 49 vote for A. That is, in each possible state, a majority of 50 vs 49 voters are guessing correctly. This means that you are the so-called swing voter and the result of the ballot depends on you. You think that A is the correct alternative with probability 80%. 6.1 If you vote for A and your vote is pivotal (i.e. decisive), which alternative must be the correct one? 6.2 Is voting A a good idea for you? 6.3 If you vote for B and your vote is pivotal (i.e. decisive), which alternative must be the correct one? 6.4 Is voting for B a good idea for you? 6.5 If you were given the alternative between voting A, B, or abstaining, what would you prefer to do? If you have answered correctly, then you have shown an example of a result known as the swing voter’s curse. Congratulations! 7 "Majority voting aggregates information dispersed among the voters." Comment in no more than two paragraphs. Make sure to refer clearly to major results in voting theory. 8. There is a society with 71 individuals. Each individual’s name is a (natural) number between 1 and 71. 
Call i the name (number) of each individual. Her preferences are given by the utility function $u_i = -|A - i|$, where A is an alternative and $|x|$ is the absolute value of x. The set of the alternatives is $\mathcal{A} \equiv \{1, 40, 81\}$. 8.1 For any individual $i \in \{1, \ldots, 71\}$, find her bliss point. (Hint: if you make a list of 71 bliss points, you are not being very efficient). 8.2 Are the preferences of these individuals single-peaked? You might notice that there is more than one median voter. 8.3 What is their bliss point? 8.4 Suppose that the bliss point of the median voter(s) is put to vote against another alternative of your choice. What would be the result of the vote? (How many votes for each alternative?) 8.5 According to the median voter theorem, what is the unique equilibrium outcome of an open agenda method in this society? 9. There is a society with three voters, I = {a, b, c}. Voters have preferences over three possible alternatives {0, 1, 3} as follows: $0 >_a 1 >_a 3$; $1 >_b 0 >_b 3$; $3 >_c 1 >_c 0$. 9.1 Do the voters exhibit single-peaked preferences? (Provide a justification for your answer) 9.2 Is there a Condorcet winner in this society? If so, which theorem guarantees its existence and why? 9.3 Which alternative is the Condorcet winner, and why? 10. In an election there are two candidates, L and R. Both candidates only care about winning the election, i.e. they are office-motivated. There is a continuum of voters of total mass 1. A generic voter has name i. A fraction $\gamma \in (1/2, 1)$ of the voters have income $y_i = y_l$. The remaining fraction $(1 - \gamma)$ of voters have income $y_i = y_h > y_l$. The set of possible alternatives is all the tax rates $\tau$ between 0 and 1. The tax is purely redistributive: if $\bar{y}$ is the mean income, consumption for voter i is equal to $c_i = y_i + \tau(\bar{y} - y_i)$. Each voter wants to maximize her own consumption. Before the election, candidates L and R choose platforms $\tau_L$ and $\tau_R$, respectively. That is, each chooses a tax rate. 
Each voter then observes the platforms and votes for the candidate whose platform she prefers. 10.1 Express the mean income as a function of the income of the two groups. 10.2 What is the median income? How does it compare with the mean income? 10.3 What is the tax rate most preferred by a voter with income $y_l$? 10.4 Use the theorems seen in class to predict the platforms of the two candidates and the policy that will be implemented by this society.
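The point-scoring rule in exercise 1 and the pairwise majority votes in exercise 3 are both mechanical enough to check in code. The sketch below is one possible way to do so; the function names `borda_scores`, `pairwise_winner` and `party_utility` are illustrative, and the preference lists are transcribed from the exercise text with the best alternative first.

```python
def borda_scores(rankings):
    """Sum positional points: the top choice gets len(ranking) points, the last gets 1."""
    scores = {}
    for ranking in rankings.values():
        for position, alternative in enumerate(ranking):
            scores[alternative] = scores.get(alternative, 0) + len(ranking) - position
    return scores


def pairwise_winner(voters, utility, a, b):
    """Return whichever of a and b gets more sincere votes, or None on a tie."""
    votes_a = sum(1 for v in voters if utility(v, a) > utility(v, b))
    votes_b = sum(1 for v in voters if utility(v, b) > utility(v, a))
    if votes_a == votes_b:
        return None
    return a if votes_a > votes_b else b


def party_utility(i, alternative):
    """Exercise 3 utility: u_i(A) = -(i - 2A)^2."""
    return -(i - 2 * alternative) ** 2


# Exercise 1 preferences, best alternative first.
rankings = {
    "Irma": ["A", "B", "C"],
    "Jakie": ["B", "A", "C"],
    "Kelly": ["C", "B", "A"],
}
scores = borda_scores(rankings)
social_ranking = sorted(scores, key=scores.get, reverse=True)

# Exercise 3 voters are named 2, 6 and 10; alternatives are 1, 3 and 5.
voters = [2, 6, 10]
winner_1_vs_3 = pairwise_winner(voters, party_utility, 1, 3)
winner_3_vs_5 = pairwise_winner(voters, party_utility, 3, 5)
print(scores, social_ranking, winner_1_vs_3, winner_3_vs_5)
```

Because the social ranking is obtained by sorting natural-number totals, transitivity follows directly from the transitivity of "greater than", which is the hint given in 1.2.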
SOSC1449 Understanding Our Economy Assignment 2 Due date: 6th of May 11:59 pm 1. (5 points) Derive the IS curve, where using the following five equations: 2. (5 points) Draw the IS curve on a diagram with the real interest rate on the y-axis and short-run output on the x-axis. Using the diagram, illustrate and explain the effect on the IS curve if the government increases the investment tax credit to promote investment. 3. (5 points) Draw the IS curve on a diagram with the real interest rate on the y-axis and short-run output on the x-axis. Using the diagram, illustrate and explain the effect on the IS curve if the central bank lowers the nominal interest rate. 4. (5 points) Suppose consumption and short-run output have a relationship where is between 0 and 1. Mathematically derive the new IS curve shown below based on this consumption function. [Hint: Substitute and then rearrange the equation so that is only on one side of the equation.] 5. (5 points) On a single diagram with the real interest rate on the y-axis and short-run output on the x-axis, draw and label the IS curves derived in Questions 1 and 4. Clearly distinguish between the two curves. 6. (5 points) Using the IS curve and the monetary policy rule provided in the lecture slides, mathematically derive the Aggregate Demand (AD) curve. Draw the AD curve on a diagram with the inflation rate on the y-axis and short-run output on the x-axis. 7. (5 points) Using the Phillips curve, where the expected inflation rate is mathematically derive the Aggregate Supply (AS) curve. Draw the AS curve on a diagram with the inflation rate on the y-axis and short-run output on the x-axis. 8. (25 points) Draw the AS and AD curves on a diagram with the inflation rate on the y-axis and short-run output on the x-axis. Use the diagram to analyze the relationship between inflation and output when the central bank lowers its inflation target, Follow each step below. 
Provide a discussion and illustrate all relevant shifts and outcomes. a. (5 points) Identify and explain which curve (AS or AD) shifts when the central bank lowers its inflation target. Use the diagram to support your explanation. b. (5 points) After the central bank lowers the inflation target at time 1, what is the level of inflation? Explain your answer, referencing the AS-AD framework. c. (5 points) At time 2, what is the equilibrium level of inflation? Using your answers from Questions a and b, provide the precise level and explain how it is determined. d. (5 points) At time 3, what is the equilibrium level of inflation? Using your answers from Questions a and b, provide the precise level and explain how it is determined. e. (5 points) As time progresses, to what level does the equilibrium inflation rate converge? Explain why, referencing the AS-AD framework. 9. (25 points) How would this analysis be different if the AS curve is based on expectation πt = a. (5 points) Draw the AS and AD curves on a diagram and show which curve shifts when the central bank lowers the inflation target, b. (5 points) After the central bank lowers the inflation target at time 1, what is the level of inflation? Explain your answer. c. (5 points) At time 2, what is the equilibrium level of inflation? Provide the precise level, referencing your answers from Question a and b. d. (5 points) As time progresses, to which level does the equilibrium inflation rate converge? Explain why. e. (5 points) Compare the analyses in Questions 8 and 9. Describe the main difference in the inflation and output dynamics due to the different expectations assumptions. 10. (5 points) If the monetary policy rule is where γ > 0 that the central bank cares not only about inflation, but also about the short-run deviation. Compared to which policy rule incentivizes the central bank to raise the interest rate more aggressively, and why? 11. 
(5 points) Using the new policy rule, mathematically derive the revised version of the AD curve as shown below [Hint: Substitute and then rearrange the equation so that is only one side of the equation.] 12. (5 points) On a single diagram with the inflation rate on the y-axis and short-run output on the x-axis, draw and label the AD curves derived in Questions 6 and 11. Clearly distinguish between the two curves.
Phase locking of optical sources and impact on system Example 1 - Phase difference between lasers Two free-running lasers, each with linewidth 250 kHz, are used as the signal laser and the LO in a coherent optical transmission system. What is the rms increase in the phase difference between the lasers over a time interval of 1 µs? Example 2 - OPLL with delay A prototype first-order OPLL is constructed from optical components with fibre pigtails. The total length of fibre in the loop is 2 m. What is the maximum loop bandwidth if the propagation delay through the fibre is the dominant delay?
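Both examples reduce to short calculations, sketched below under stated assumptions. For Example 1 the sketch assumes independent lasers with Lorentzian lineshapes, so the phase difference performs a random walk with variance 2π(Δν1 + Δν2)τ. For Example 2 it assumes an open-loop response K·exp(-sτ)/s and reports the gain at which the phase margin reaches zero, K = π/(2τ); a practical design would sit well below this, and the course may use a different criterion. The fibre group index of 1.468 is an assumed typical value that is not given in the question.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def rms_phase_drift(linewidth_1_hz, linewidth_2_hz, interval_s):
    """RMS growth of the phase difference (rad) for two independent Lorentzian lasers."""
    beat_linewidth_hz = linewidth_1_hz + linewidth_2_hz
    return math.sqrt(2 * math.pi * beat_linewidth_hz * interval_s)


def zero_margin_loop_gain(fibre_length_m, group_index=1.468):
    """Gain K (rad/s) at which K*exp(-s*tau)/s has zero phase margin.

    group_index is an assumed value for silica fibre, not taken from the question.
    """
    delay_s = fibre_length_m * group_index / C_VACUUM
    return math.pi / (2 * delay_s)


drift_rad = rms_phase_drift(250e3, 250e3, 1e-6)
gain_limit = zero_margin_loop_gain(2.0)
print(f"rms phase drift: {drift_rad:.3f} rad")
print(f"zero-margin loop gain: {gain_limit:.3e} rad/s ({gain_limit / (2 * math.pi) / 1e6:.1f} MHz)")
```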
COMP222 2025 Second CA Assignment Individual Coursework Implement a game using an existing game engine Assignment 2 (of 2) Weighting: 15% Deadline: 10am on Thursday, 8th May. Standard UoL late penalties apply Submission on Canvas: Submit 3 files. • A video – typically up to 5 minutes long which shows your game in action. The production values are not important, but it should include the aspects of your game that demonstrate meeting the marking criteria (below). You may annotate the video with voice or subtitles to highlight key areas, but this is not required. • A zip file that contains the project source for your game. This should contain the code that you developed for the game. • A 1-2 page pdf report – details below. There is no word limit for this, but it should be a short summary of how you made the individual aspects of the game. This may be longer if you have made extensive use of 3rd party assets. Learning outcomes assessed 2. An appreciation of the fundamental concepts associated with game development: game physics, game artificial intelligence, content generation; 3. The ability to implement a simple game using an existing game engine Implement a simple game, of your choice, using a game engine such as Unity, Unreal, Godot, JMonkeyEngine. The game does not have to be extremely complex, it is recognised that this assignment accounts for 15% of the module total. However, the game should contain certain aspects, which are detailed below. If you wish to submit a game that does not contain any one of these aspects, such as game physics, then you should request this before submission, by email, so that you can receive appropriate advice. A suggested game is an implementation of the sport of Curling, which will be used in the examples below. https://en.m.wikipedia.org/wiki/Curling (There is no need to keep strictly to the rules for any game). Other suitable examples are: Mini Golf – Take turns to hit a ball, avoiding obstacles, into a hole. 
Simple Car Race – Drive a car object around a course, or obstacles, to reach a goal. Hurdle Race – Run and jump over obstacles along a course. Ball Drop Puzzle – Drop a ball into a series of ramps and other objects to reach a target. Some of these games will have extensive tutorials available online. If you follow these, you must declare this as an asset used – and you should make substantial changes to make your own contribution clear. The marking criteria below will give concrete examples using Curling, but choosing Curling is not a requirement for this assignment. In each case, there is a minimal example of what would be acceptable for at least a 50% mark, and examples of other features that could be implemented to achieve higher marks. Marking Criteria The aspects required, and the marks awarded for each are as follows. Scene / environment 15% The game should take place in some World, that is visible to the player. This can be generated by code or by using the design tools within the game engine. In a Curling game, this will minimally consist of an Ice Sheet, 4 walls surrounding it, and a number of Curling ‘stones’. This could additionally include other aspects such as appropriate textures, an arena, visible players, crowd, lighting effects, electronic scoreboard, etc. Interaction 20% There must be some way for the player to control the game. In the simplest form, this could be to react in a pre-determined way when a key is pressed, but more advanced controls are preferable. For example, these could include using mouse movements to simulate the sliding of the Stone, the ‘curling’ of the stone to make the path bend, and the various moving and Sweeping actions used in curling. Physics 25% The game should behave in some ‘realistic’ way, including motion, friction, and collisions. For Curling, a minimal example would require the stone to move and slow down after release, and to bounce from the edges of the rink. 
More complex physics might include ‘curling’ the stone, sweeping to reduce friction, and complex multi-body collisions between the stones. The game of curling is largely 2-dimensional, so you may want to be creative to enable complex 3-dimensional movement; dropping the stones onto the ice would give a simple demonstration. Structure 20% The gameplay should follow some structure. The game will start, follow some path depending on the user input, and come to an end at some point. There may be a scoring mechanism, different levels, or different paths to take during the game. In Curling, players should take it in turns to each play a ‘stone’ until there are none left, with the correct scoring rules followed. A more complex structure could involve playing multiple ‘ends’, following rules for ties, and a tournament structure where multiple teams can take part. You can include splash screens or menu systems. Extra / Creativity 20% There are 2 options that you can choose from. You can either concentrate on good game design principles and creativity, or on advanced technical aspects. Your report should contain a section describing which of these you have chosen. Option 1: Game design Describe a number of design principles that you have followed – explain where the principle has come from (lecture notes, books, articles), and what you did in your game design to meet this principle. This will be hard to do well with a simple curling game, and other game types would offer more scope. Option 2: Technical proficiency This can take any form: advanced physics, game structure, AI, etc. Your report should highlight any advanced aspects that you have implemented. This could include using game assets, such as a Physics engine, but you should also develop a reasonable amount of your own code. Report Description Your report should briefly describe the aspects that you have implemented for each of the 5 Marking Criteria. This can be in the form
of bullet points and does not need extensive description. Instructions If necessary, give a paragraph explaining how to play your game and what controls would be used. This is not required if all instructions are described in the game video. Assets Your report should also list all Assets that you used in creating your game, with a note or URL to explain where they came from. Assets include any Textures, 3D models, behaviour scripts. Please state if you created them yourself, and whether any AI assistance was used (including advanced auto-complete features, such as GitHub Copilot). If you have followed any online tutorials then you MUST include the URL. Any Assets that are used but not declared may be classed as academic misconduct. Notes. 1. You do not need to use professional game assets as these do not form part of the marking criteria. 2. The quality of your code will be marked (e.g., class structure, clear functions, naming conventions, representation of game state), but the game you create is more important. 3. You must submit a viewable video of your game; if we cannot view the video on submission, due to insufficient permissions for example, then you may not get a mark higher than 50%. 4. Your game does not need to include any AI, as this has been assessed separately. Novel AI techniques may still be awarded marks under “Technical proficiency”. 5. Please include advance visible warnings if any content of your game could be shocking or offensive. 6. Do not leave it until the last minute to submit videos. Processing and upload can take some time, and anything uploaded after the deadline will automatically be considered as late, and penalties will be applied.
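The minimal Physics criterion (a stone that moves, slows down after release, and bounces from the edges of the rink) can be illustrated with an engine-agnostic sketch. In Unity, Unreal or Godot the built-in physics would normally do this work; the Python below only shows the underlying idea. The function name `step_stone`, the restitution factor and all numeric values are illustrative assumptions, not part of the assignment.

```python
import math


def step_stone(pos, vel, dt, friction_decel, rink_width, restitution=0.8):
    """Advance a sliding stone one time step with kinetic friction and side-wall bounces."""
    x, y = pos
    vx, vy = vel
    speed = math.hypot(vx, vy)
    if speed > 0.0:
        # Friction removes a fixed amount of speed per second and never reverses the motion.
        new_speed = max(0.0, speed - friction_decel * dt)
        vx, vy = vx * new_speed / speed, vy * new_speed / speed
    x, y = x + vx * dt, y + vy * dt
    # Reflect off the side walls at x = 0 and x = rink_width, losing some speed on impact.
    if x < 0.0:
        x, vx = -x, -vx * restitution
    elif x > rink_width:
        x, vx = 2 * rink_width - x, -vx * restitution
    return (x, y), (vx, vy)


# Release a stone diagonally and step the simulation until it comes to rest.
pos, vel = (2.0, 0.0), (0.5, 2.0)
steps = 0
while math.hypot(*vel) > 0.0 and steps < 100_000:
    pos, vel = step_stone(pos, vel, dt=0.01, friction_decel=0.1, rink_width=4.5)
    steps += 1
print(pos, steps)
```

Sweeping could be modelled in the same sketch by lowering `friction_decel` for the steps during which the player sweeps.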
Applications of photonic systems Microwave photonics link Let’s assume a fiber-optic link based on an externally modulated laser. For the purpose of this exercise we can further assume that both the transmitter and the receiver are impedance matched. Question 1: An external modulator has an optical loss of 5 dB, Vπ of 5 V and input resistance of 50 Ω. Find the ratio of the squared modulated optical power to the electrical input power for optical input power to the modulator of 10 mW. Question 2: A detector has responsivity of 0.9 A/W and is connected to a load resistance of 50 Ω. What is the ratio of electrical output power to squared modulated optical input power? Question 3: If the modulator in question 1 is connected to an optical link with 10 dB loss, followed by the detector in question 2, what will be the electrical gain of the link? Question 4: If the laser input power to the modulator is increased to 100 mW, how much will the electrical gain of the link increase by? When in trouble, refer to: 1 C. H. Cox, G. E. Betts and L. M. Johnson, "An analytic and experimental comparison of direct and external modulation in analog fiber-optic links," in IEEE Transactions on Microwave Theory and Techniques, vol. 38, no. 5, pp. 501-509, May 1990, doi: 10.1109/22.54917. 2 Charles H. Cox, III, “Analog Optical Links: Theory and Practice”, Cambridge University Press, 2004, https://doi.org/10.1017/CBO9780511536632 Question 5 - Optical heterodyning If two lasers (λ1 = 1550 nm, λ2 = 1549 nm) are combined at a photodiode, what frequency signal will be generated at the output? Question 6 - Optical ring resonators What would be the circumference of an optical ring resonator if its free spectral range (FSR) is 20 GHz? Consider a silicon nitride on silicon oxide platform with a group refractive index of 1.72 and a minimum bending radius of 80 µm. What is the propagation loss through ring resonators?
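Questions 5 and 6 rest on two standard relations: the heterodyne beat frequency is the difference of the two optical frequencies, |c/λ1 - c/λ2|, and the free spectral range of a ring is FSR = c/(n_g·L) for circumference L. A minimal sketch of both follows, with illustrative function names and treating the quoted wavelengths as vacuum wavelengths.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def beat_frequency_hz(wavelength_1_m, wavelength_2_m):
    """Difference between the optical frequencies of two lasers (vacuum wavelengths)."""
    return abs(C_VACUUM / wavelength_1_m - C_VACUUM / wavelength_2_m)


def ring_circumference_m(fsr_hz, group_index):
    """Circumference L that satisfies FSR = c / (n_g * L)."""
    return C_VACUUM / (group_index * fsr_hz)


beat_hz = beat_frequency_hz(1550e-9, 1549e-9)
circumference_m = ring_circumference_m(20e9, 1.72)
radius_m = circumference_m / (2 * math.pi)
print(f"beat frequency: {beat_hz / 1e9:.1f} GHz")
print(f"ring circumference: {circumference_m * 1e3:.3f} mm (circular radius {radius_m * 1e3:.3f} mm)")
```

Comparing the resulting radius with the 80 µm minimum bending radius is a quick check that a simple circular layout is feasible on the stated platform.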
ECON10071/20071 - 2023/24 Week 9 - Practice Exercises Hypothesis Testing and Confidence Intervals

1. The YouGov Voting Intention opinion poll (for end of March 2023) delivers the following table of results to the question “Which of the following do you think would make the best Prime Minister?”.

Obs Percentages
Prime Minister    Age: 18-24    25-49    50-64    65+
Rishi Sunak            18%      19%      27%      42%
Keir Starmer           28%      34%      30%      25%
Not Sure               41%      42%      41%      31%
Refused                13%      5%       2%       2%
Total                  100%     100%     100%     100%

Test whether the variables (preferred prime minister, P, and age, A) are independent.

2. In order to see whether their random sample has the same average age as the intended population (voting age population), YouGov wishes to calculate a confidence interval for the average age, µ, using a sample of n = 2003. Calculate a 99% confidence interval. The sample average age is 46.3 years. From excellent population data from the last census we know that the population variance in age is σ² = 90.

3. Using the same data, calculate a 95% confidence interval for the population percentage of Rishi Sunak and Keir Starmer supporters amongst the 18 to 24 year olds (separate confidence intervals for both party leaders).

Test whether the variables (preferred prime minister, P, and age, A) are independent. In the tables below, ____ marks a cell left blank in the original.

Sample sizes
Age      18-24    25-49    50-64    65+      Total
Total    210      827      495      471      2003

Obs number
Prime Minister (Pi)    Age (Aj): 18-24    25-49    50-64    65+      Total
Rishi Sunak                      38       157      ____     198      ____
Keir Starmer                     59       ____     149      118      ____
Not Sure                         86       347      203      146      782
Refused                          27       41       10       9        88
Total                            210      827      495      471      2003

Expected number
Prime Minister (Pi)    Age (Aj): 18-24      25-49       50-64       65+
Rishi Sunak                      55.1473    217.1752    ____        123.6875
Keir Starmer                     63.5347    ____        149.7604    142.4993
Not Sure                         81.9870    322.8727    193.2551    183.8852
Refused                          9.2262     36.3335     21.7474     20.6930

Joint probabilities
Prime Minister (Pi)    Age (Aj): 18-24     25-49     50-64     65+       Pr(Pi)
Rishi Sunak                      0.0275    0.1084    ____      0.0618    ____
Keir Starmer                     0.0317    ____      0.0748    0.0711    ____
Not Sure                         0.0409    0.1612    0.0965    0.0918    0.3904
Refused                          0.0046    0.0181    0.0109    0.0103    0.0439
Pr(Aj)                           0.1048    ____      ____      0.2351    1
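Under independence, each expected count is the row total times the column total divided by the sample size, and a confidence interval for a mean with known variance is the sample mean plus or minus z·σ/√n. The sketch below applies both, using only totals printed in the tables and the figures given in exercise 2. It reproduces the expected counts shown for the complete "Not Sure" row rather than filling in any blank cell. The function names are illustrative.

```python
from statistics import NormalDist

# Column totals and the "Not Sure" row total, copied from the observed-number table.
column_totals = {"18-24": 210, "25-49": 827, "50-64": 495, "65+": 471}
sample_size = sum(column_totals.values())
not_sure_total = 782


def expected_counts(row_total):
    """Expected cell counts under independence: row total * column total / n."""
    return {age: row_total * total / sample_size for age, total in column_totals.items()}


def known_variance_ci(mean, variance, size, level):
    """Two-sided confidence interval for a mean when the population variance is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * (variance / size) ** 0.5
    return mean - half_width, mean + half_width


not_sure_expected = expected_counts(not_sure_total)
ci_low, ci_high = known_variance_ci(mean=46.3, variance=90, size=2003, level=0.99)
print({age: round(value, 4) for age, value in not_sure_expected.items()})
print(f"99% CI for mean age: ({ci_low:.3f}, {ci_high:.3f})")
```

The chi-square statistic for the independence test is then the sum of (observed - expected)² / expected over all cells, compared against a chi-square distribution with (rows - 1)(columns - 1) degrees of freedom.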
SUMMATIVE ASSIGNMENT 2 – BUSI4AY15 Business Analytics Masters Programmes 2024/25 For this assignment, you will be provided a data set in Excel and an Excel answer sheet. At the bottom of this assignment, you will find a list of exercises to execute on the data set provided. You are to input your answers into your Excel answer sheet and submit it on Blackboard. SUBMISSION INSTRUCTIONS A penalty will be applied for work uploaded after 11:59am as detailed in the Late submission policy. You must leave sufficient time to fully complete the upload process before the deadline and check that you have received a receipt. At peak periods, it can take up to 30 minutes for a receipt to be generated. FORMAT You are to submit the Excel answer sheet with the file name adequately changed as instructed. You are not to alter the structure of the Excel answer sheet. The Excel file should be kept in the .xlsx format. MARKING GUIDELINES The number of marks carried by each question in the exercise is indicated clearly in this assignment. PLAGIARISM AND COLLUSION Note that your data set is unique to you, correspondingly, the answers that you will obtain will also be unique to you. Students suspected of plagiarism, either of published work or the work of other students, or of collusion will be dealt with according to School and University guidelines. SPECIFIC INSTRUCTIONS 1. You will be able to find your data set for this assignment in the “Data Sets” folder. All of the files in this folder are named “yourZnumber_Number1_Number2.xlsx”. You are to find the data set corresponding to your Z number. Download this data set. 2. You are strongly advised to save a copy of the data set in a safe location, in the unlikely but potential event that you accidentally overwrite the data during the process of your analysis. 3. Note that all of your colleagues have been provided different datasets, and consequently will arrive at different correct answers for the assignment. 
As such, please ensure that you use the data set that corresponds to your Z number and not the data sets of any of your colleagues. 4. In the assignment folder, you will also find an Excel file that is labelled “yourZnumber_SA2.xlsx”. This is your answer sheet, which you use to fill your responses in for each question and which you will submit on Blackboard. Change yourZnumber in the file name to your Z number right now. 5. As your assignment is machine-graded, any minute error in your file name will render it unreadable by the machine and will lead to a complete loss of marks, so please ensure that the file name is correct. 6. The Excel answer file contains two columns. The first column lists the question numbers to which an answer is expected in the Excel answer file. In the second column, you are to key in your answer to the corresponding question there. Do not alter the structure of the Excel file, namely, do not add new rows or columns or key in any values outside of the demarcated area. 7. You will be required to key different types of answers in a specific form. a. For numerical answers, please leave your answers with at least 3 significant figures (in other words, with at least 3 non-zero digits, e.g. you may reflect “12.345” as “12.3” and “0.01234” as “0.0123”). If your numerical answer is a whole number, leave them as such (e.g. you may leave “3” as “3” as opposed to “3.00” that is in 3 significant figures). b. If you are required to report probabilities, for example, p-values, if the numerical value is smaller than 0.0001 or 1e-4, please report the value as 0. If you are asked to report probabilities or proportion or percentages, please reflect the value in decimals (e.g. report 78% as 0.78). Never reflect your answer as a fraction (e.g. for 1/3, instead use 0.333 in 3 significant figures). c. For multiple choice questions in this assignment, feel free to leave your responses in any of large or small capitals. 8. 
Wrong answers have a potential to carry partial marks. 9. This assignment comprises a total of 43 questions, amounting to a total of 100 points. ASSIGNMENT QUESTIONS You will be able to download your data set from Blackboard. Your data set will contain a total of 600 data points. Please split the data set into a training and testing data set. The training data set will comprise the first 500 data points and the testing data set will comprise the last 100 data points. Unless explicitly stated that it is for testing, the training data set should always be used. You are strongly advised to save a copy of the data set in a safe location, in the unlikely but potential event that you accidentally overwrite the data during the process of your analysis. In this data frame, you will find 10 columns. The column ‘y’ is to be assumed as the outcome / dependent variable. The other columns labelled from ‘x1’ to ‘x9’ are to be assumed as the predictors / independent variables. To verify that you have downloaded the correct data set and split the training and testing data correctly, please calculate the mean and variance of y for both the training and testing data sets. In the file name of your data set, you see the following “yourZnumber_Number1_Number2_Number3_Number4.xlsx”. Numbers 1 and 2 would be the mean and variance for y in the training data set; and numbers 3 and 4 are the mean and variances for the testing data set. Do not proceed if these numbers are not correct! Verify with your tutor if they are incorrect. For this assignment, unless otherwise stated, we will use a significance level of 5%. Question 1 1 point Find the mean of ‘y’. Question 2 3 points Compute the correlations of all 9 predictors ‘x1’ to ‘x9’ against the outcome ‘y’. Which of these predictors has the strongest correlation with ‘y’? Your answer should be in the format: ‘x’, e.g. ‘x1’. Question 3 2 points What is the consequence of answering Question 2 in the context of building a linear regression model? 
Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A Predictors with a high correlation would indicate multi-collinearity and should not be included as predictors in the linear regression model. B Predictors with a high correlation would indicate multi-collinearity and should be included as predictors in the linear regression model. C Predictors with a low correlation would indicate a weak relationship with the outcome ‘y’ and should not be included as predictors in the linear regression model. D Predictors with a low correlation would indicate a weak linear relationship with the outcome ‘y’ and should still be included as predictors in the linear regression model, as we cannot rule out deeper non-linear relationships with the outcome. Question 4 3 points Build a linear regression model with ‘y’ as the outcome and all of the ‘x’ variables as the predictors. Let us call this Model A. Report the adjusted R-squared value. Question 5 1 point What percentage of the variance in ‘y’ is not explained by your current linear regression model in Question 4? Question 6 1 point What is the p-value for the variable ‘x4’ in your linear regression model? Question 7 2 points Depending on your answer for Question 6, what can you conclude about whether or not there is a relationship between the outcome variable ‘y’ and the feature ‘x4’? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A We do not have enough data or evidence to say that there is a relationship between outcome ‘y’ and the feature ‘x4’. B We do not have enough evidence to say that there is definitely no relationship between outcome ‘y’ and the feature ‘x4’, but we can at least say that we do not have enough data to conclude that this relationship is linear. C There is a linear relationship between outcome ‘y’ and feature ‘x4’. 
D There is a relationship between outcome ‘y’ and feature ‘x4’, but we do not yet have enough data to conclude if this relationship is linear. Question 8 3 points Build a linear regression model with ‘y’ as the outcome, but with only the top 5 ‘x’ variables with the highest magnitude of correlation with ‘y’. Let us call this Model B. Report the adjusted R-squared value. Question 9 2 points Run a test that verifies if the R-squared value for Model B is significantly different from the R-squared value for Model A. Report the associated p-value for this test. Question 10 2 points Depending on your answer for Question 9, what can you conclude about whether or not it is a good idea to omit the ‘x’ variables with small magnitude of correlation with the outcome ‘y’? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A The p-value is small, thus indicating that there is no significant difference between the models. As such, we are justified in omitting the ‘x’ variables. B The p-value is small, thus indicating that there is a significant difference between the models. As such, we are not justified in omitting the ‘x’ variables. C The p-value is large, thus indicating that there is no significant difference between the models. As such, we are justified in omitting the ‘x’ variables. D The p-value is large, thus indicating that there is no significant difference between the models. However, we should not rush to the conclusion that these ‘x’ variables play no part in the outcome ‘y’. Question 11 1 point You would notice that there is an ‘x’ variable that has a negative correlation with the outcome ‘y’ and with magnitude at least 0.2. Which variable is that? Your answer should be in the format: ‘x’, e.g. ‘x1’. Question 12 3 points Using your analysis from Model A, plot a scatter plot of the residuals against the variable you identified in Question 11. What can you conclude from the plot? 
Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A There are non-linearities in the model. B There is multi-collinearity in the model. C There is an interaction effect involving this variable in the model. D None of the above assumptions of linear regression has been violated. Question 13 3 points In your data set, create a new variable that is the exponential of the variable you identified in Question 11. Run a linear regression model with all the ‘x’ variables plus this variable. Let us call this Model C. Report the adjusted R-squared value. Question 14 1 point Report the p-value for this new variable in Model C. Question 15 2 points Run a test that verifies if the R-squared value for Model C is significantly different from the R-squared value for Model A. Report the associated p-value for this test. Question 16 1 point Report the p-value for the variable you identified in Question 11 in Model C. Question 17 1 point Based on your answers from Questions 14 to 16, what can you conclude? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A Adding the new variable led to a significantly better model. However, we should not keep the original variable from Question 11, but keep only the new variable in the regression model. B Adding the new variable led to a significantly better model. Moreover, both the original variable from Question 11 and the new variable should be retained in the regression model. C Adding the new variable did not lead to a significantly better model. However, we should keep the new variable in the regression model. D Adding the new variable did not lead to a significantly better model. Moreover, we should delete the variable in Question 11 and the new variable from the regression model. 
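Questions 9 and 15 compare nested linear models. The assignment does not name the test; one common choice, assumed here, is a partial F-test on the residual sums of squares (RSS). The sketch below computes only the F statistic and its degrees of freedom; the p-value then comes from an F distribution (for example Excel's F.DIST.RT). The RSS values and predictor counts in the usage lines are made up for illustration.

```python
def partial_f_statistic(rss_reduced, rss_full, n_obs, p_reduced, p_full):
    """F statistic for H0: the extra predictors in the full model have zero coefficients.

    rss_*: residual sums of squares of the two nested models.
    p_*:   number of predictors in each model, excluding the intercept.
    """
    if p_full <= p_reduced:
        raise ValueError("the full model must contain more predictors")
    q = p_full - p_reduced          # number of extra predictors being tested
    df_resid = n_obs - p_full - 1   # residual degrees of freedom of the full model
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
    return f_stat, q, df_resid


# Hypothetical numbers: 500 training rows, a 5-predictor model nested in a 9-predictor model.
f_stat, df1, df2 = partial_f_statistic(
    rss_reduced=1200.0, rss_full=1000.0, n_obs=500, p_reduced=5, p_full=9
)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```

A large F (small p-value) suggests the extra predictors explain variance beyond what the smaller model already captures.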
Question 18 3 points Plot two scatter plots as follows: (I) The outcome ‘y’ against the predictor ‘x9’ only for the data points where ‘x1’ is larger than or equal to 0 (II) The outcome ‘y’ against the predictor ‘x9’ only for the data points where ‘x1’ is less than 0. Compare the two scatter plots. What can you conclude from them? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A There are non-linearities in the model. B There is multi-collinearity in the model. C There is an interaction effect involving this variable in the model. D None of the above assumptions of linear regression has been violated.
Question 19 5 points Create a new variable that addresses the problem in Question 18. Now, modify Model C to include this new variable. Let us call this Model D. Report the adjusted R-squared value.
Question 20 2 points Run a test that verifies if the R-squared value for Model D is significantly different from the R-squared value for Model C. Report the associated p-value for this test.
Question 21 3 points Compute the correlations of all the other 8 predictors against the predictor ‘x8’. Which of these predictors has the strongest correlation with ‘x8’? Your answer should be in the format: ‘x’, e.g. ‘x1’.
Question 22 1 point What might Question 21 be indicative of? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A There are non-linearities in the model. B There is multi-collinearity in the model. C There is an interaction effect involving this variable in the model. D None of the above assumptions of linear regression has been violated.
Question 23 5 points Using a more precise test than correlations, re-perform the analysis in Question 21 using Model D. If the cut-off for the test is set at 5, how many variables should be omitted from the model?
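Question 23 does not name the "more precise test". A common diagnostic that is usually paired with a cut-off of 5 is the variance inflation factor (VIF), so the sketch below assumes that is what is meant. It uses the identity that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The data and the column names 'u', 'v', 'w' are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """columns maps a predictor name to its list of values; returns name -> VIF."""
    names = list(columns)
    corr = [[1.0 if a == b else pearson(columns[a], columns[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Made-up data: 'u' and 'v' are nearly collinear by construction.
data = {
    "u": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "v": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "w": [2.0, -1.0, 0.0, 1.0, -2.0, 0.5],
}
for name, vif in variance_inflation_factors(data).items():
    print(name, round(vif, 2), "above cut-off" if vif > 5 else "below cut-off")
```

With only two predictors the result reduces to 1 / (1 − r²), which is a quick sanity check on any VIF implementation.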
Question 24 4 points Using your analysis from Question 23, modify Model D to arrive at a new model, but keeping variable ‘x8’. Let us call this Model E. At this point, remove all non-significant variables from Model E. You should be left with 5 predictors. Report the adjusted R-squared value. Question 25 2 points Run a test to verify that Model E is indeed better than Model D. What can you conclude from your test? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A Removing the variables led to a significantly worse model, because the p-value of the test was significant. B Removing the variables led to a significantly worse model, because the p-value of the test was not significant. C Removing the variables did not lead to a significantly better model, because the p-value of the test was significant. D Removing the variables did not lead to a significantly better model, because the p-value of the test was not significant. Question 26 2 points On the training data set, calculate the root mean squared error for Model E. Question 27 2 points On the testing data set, calculate the root mean squared error for Model E. Checkpoint Before proceeding onto the next part of this exercise, verify that you are now indeed left with just 5 predictors in Model E. These might not be ‘x1’ to ‘x9’ and might be some transformation of them. If you are not left with 5 predictors, you might want to verify your working. Additionally, you might want to remove any non-significant variables that are still left in Model E. Copy the 5 predictors in Model E into a new sheet. To this new sheet, create a new variable that is 1 if ‘y’ is greater or equal to 0 and 0 if ‘y’ is less than 0. From now on, we will refer to this new binary variable as ‘z’, and it will be treated as the outcome variable. Question 28 3 points Build a logistic regression model with ‘z’ as the outcome variable and these 5 predictors. Call this Model F. 
Report the AUC. Question 29 2 points What is the significance of the value in Question 28? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A It represents the correlation between the outcome ‘z’ and 5 predictors, so the larger the value the better the model. B It should not be less than 0.5, otherwise the model is no better than predicting every point as yes or no. C It is an objective measure of the goodness of the model without having to resort to a testing data set; a high value will guarantee good performance even in a testing data set. D A high value shows that there is a high trade-off between sensitivity and specificity, indicating that the model is predicting poorly. Question 30 1 point Set the cut-off probability to 0.5. What is the in-sample accuracy of Model F? Question 31 1 point Set the cut-off probability to 0.5. What is the out-of-sample accuracy of Model F? Question 32 2 points Based on your answer to Questions 30 and 31, what can you conclude about Model F? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A Model F seems to generalize out-of-sample as its out-of-sample accuracy is comparable to its in-sample accuracy. B Model F does not generalize out-of-sample as its out-of-sample accuracy is not comparable to its in-sample accuracy. C Model F seems to generalize out-of-sample as its out-of-sample accuracy is higher than its in-sample accuracy. D Model F does not generalize out-of-sample as its out-of-sample accuracy is lower than its in-sample accuracy. Question 33 1 point Set the cut-off probability to 0.5. What is the in-sample recall of Model F? Question 34 1 point Set the cut-off probability to 0.5. What is the in-sample specificity of Model F? Question 35 2 points If the cut-off probability is changed to 0.4, how do we expect the answers to Questions 33 and 34 to change? 
Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A The recall will increase and the specificity will decrease. B The recall will increase and the specificity will increase. C The recall will decrease and the specificity will decrease. D The recall will decrease and the specificity will increase.
Question 36 5 points In this example, we verify what happens if we use Model E to solve the classification problem of whether or not outcome ‘y’ is greater or equal to 0. With Model E, form predictions according to your final model with the same 5 predictors as in Model F on the testing data set. If the outcome or prediction was greater or equal to 0, then we record that as a positive, and vice versa. Based on this, form a confusion matrix for Model E. Report the out-of-sample accuracy for Model E.
Question 37 3 points Compare your answers between Questions 31 and 36. What can you conclude? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A The accuracy for the linear regression model is higher because it forms continuous predictions, which enables the decision maker to better draw the line dividing the regions where outcome ‘y’ is larger than 0 or not. B The accuracy for the logistic regression model is higher because it solves the simpler problem of simply deciding if outcome ‘y’ is larger than 0 or not, rather than predicting the actual value of ‘y’. Thus, it can afford to be more precise. C There is numerically a difference, but this difference is not significant, as both models use the same underlying linear structure to form predictions and should therefore arrive at similar results. D There is no reasonable way to conclude on the differences between these two models; it depends on how the predictors relate to the outcome.
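Accuracy, recall and specificity (Questions 30 to 36) all come from the same confusion matrix, and the cut-off probability decides which predictions count as positive. The sketch below is a minimal illustration on made-up outcomes and probabilities; treating a probability exactly equal to the cut-off as a positive is an assumption, and the functions assume both classes are present.

```python
def confusion_counts(y_true, probs, cutoff):
    """Count (tp, fp, tn, fn), predicting positive when probability >= cutoff."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, probs):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarise(y_true, probs, cutoff):
    """Accuracy, recall (sensitivity) and specificity at a given cut-off."""
    tp, fp, tn, fn = confusion_counts(y_true, probs, cutoff)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),         # share of actual positives that are caught
        "specificity": tn / (tn + fp),    # share of actual negatives that are caught
    }


# Hypothetical outcomes 'z' and predicted probabilities.
z = [1, 1, 1, 1, 0, 0, 0, 0]
p = [0.9, 0.7, 0.45, 0.3, 0.6, 0.42, 0.2, 0.1]
for cutoff in (0.5, 0.4):
    print(cutoff, summarise(z, p, cutoff))
```

Changing the cut-off only moves points between the "predicted positive" and "predicted negative" columns, which is why recall and specificity cannot both improve from a cut-off change alone.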
Question 38 2 points Build a classification tree model with ‘z’ as the outcome variable and the same 5 predictors. Call this Model G. Use default parameters of depth = 4, cp = 0.01, minsplit = 20, minbucket = 5. Report the out-of-sample accuracy.
Question 39 3 points Compare your answers between Questions 31 and 38. What can you conclude? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A We can conclusively decide which model is better based on the accuracy, because this is an out-of-sample statistic. B Even if the classification tree model posted a higher accuracy, it is too hasty to conclude that it has a better performance than the logistic regression model as one could potentially shift the cut-off probabilities in the latter to arrive at a better model. C Even if the classification tree model posted a lower accuracy, it is too hasty to conclude that it has a poorer performance than the logistic regression model as one could still change the parameters of the tree. D It is unreasonable to compare the accuracies as they arise from different models.
Question 40 2 points Find the leaf that has a proportion of class ‘1’ that is closest to 50%. Report this proportion.
Question 41 3 points What is the significance of your answer to Question 40? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A This proportion forms our benchmark. Our out-of-sample accuracy should minimally beat this proportion. B If this proportion is very close to 50%, then we should reduce the depth of the tree so that the data points in this leaf can be diluted with other leaves. C If this proportion is very far away from 50%, then this might be a sign of overfitting in the tree. D If this proportion is very close to 50%, then it indicates the presence of leaves where the model is very much undecided on the correct outcome and might indicate poor performance.
Question 42 5 points Build a hierarchical clustering model with the 5 predictors in Models E, F and G using the training data set. Use the setting of 4 clusters. Using only training data points in the largest cluster, build a logistic regression model with the 5 predictors and outcome ‘z’. Call this Model H. Report the AUC. Question 43 3 points How would you expect your answer to Question 42 to differ from Question 28? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file. A The AUC in Model H is higher than the AUC in Model F, because it has fewer data points. B The AUC in Model H is higher than the AUC in Model F, because the data points used in Model H belong to the same cluster, thus exhibit a more consistent relationship. C The AUC in Model H is lower than the AUC in Model F, because it has fewer data points. D The AUC in Model H is lower than the AUC in Model F, because the way data points are selected into Model H is not based on the outcome variable ‘z’.
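The AUC asked for in Questions 28 and 42 can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition on made-up outcomes and scores; it is quadratic in the number of points, which is fine for a data set of this size.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0      # positive ranked above negative
            elif pos_score == neg_score:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes 'z' and predicted probabilities.
z = [1, 1, 1, 1, 0, 0, 0, 0]
p = [0.9, 0.7, 0.45, 0.3, 0.6, 0.42, 0.2, 0.1]
print(auc(z, p))
```

Because the AUC depends only on how the scores rank the cases, it does not change with the cut-off probability.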
INFO20003 Semester 1, 2025
Assignment 2: SQL
Due: 11:59pm Friday, 2 May, 2025
Weighting: 10% of your total assessment

EV-XYZ: Electric vehicle and charger database

Description
EV-XYZ is a platform you’re creating to help keep track of its electric vehicles, charging stations, and charging activities. An electric vehicle (EV) charging station provides charging facilities with different charging rates and costs to the electric vehicles. The charging stations can also be associated with other facilities like cafés and restaurants.

Charging station
For each charging station, the system records its details: the address of the charging station (as street address, suburb, state, postcode), and the establishment date. Each charging station is also associated with at least one ‘company’ that owns that charging station. A charging station can be jointly owned by multiple companies. Each charging station has at least one charging ‘outlet’ where electric vehicles can plug in for charging. An outlet of a charging station can be uniquely identified with the charging station’s ID and the outlet’s ID, as ‘charging station ID X, outlet ID Y’. Each outlet has a charging rate in kW (e.g. 120), and the charging cost per kWh is also recorded (in $/kWh, e.g. 0.25 $/kWh). Different outlets of the same charging station can have different charging costs. The system also stores information about ‘facilities’ (e.g., a café or restaurant), if they are associated with a charging station. A facility can provide discount coupons, which can be used for discounted rates of a ‘charging event’. For each coupon, the system stores the unique coupon ID and the discount value. A coupon can only be issued by one facility and used in at most one charging event.

Electric vehicle (EV) + People
Each electric vehicle is associated with a unique vehicle identification number (VIN), manufacturer company, model name, year, capacity of the battery (in kWh, e.g.
60 kWh). For each manufacturer company, the name of the company, a unique ABN number, and the current CEO’s name are stored. Sometimes an EV company is owned by a parent EV company, which the model also stores. Each electric vehicle is registered to one person. For each person, the system stores that person’s (unique) driving license number, and their name. One person can have multiple electric vehicles registered with them.

Charging event
The system maintains the information of all charging events, that is, which electric vehicle is charged at which outlet of a charging station. When a person wants to charge a car, they request to charge at a particular charging station. The person who charges the car may not necessarily be the car’s registered owner, so we record the license number of the person who is charging. Once an outlet is available, the system will assign an outlet to the person, and they may use it to start charging. The kWh a charge event consumed is also recorded after charging is completed. A charging event may or may not use a discount coupon, where the coupon can only be from one of the facilities. A discount coupon represents a ‘percentage discount’ (e.g. a value of 0.5 indicates a 50% discount).

Data Model
Figure 1: The physical ER model of EV-XYZ

Assignment 2 Setup
A dataset is provided which you can use when developing your solutions. To set up the dataset, download the file ev_2025.sql from the Assignment link on Canvas and run it in Workbench. This script creates the database tables and populates them with data. Note that this dataset is provided for you to experiment with, but it is not the same dataset as what your queries will be tested against (the schema will stay the same, but the data itself may be different). This means when designing your queries you must consider edge cases even if they are not represented in this particular data set, and should not hardcode information like IDs into your queries. The script is designed to run against your account on the Engineering IT server (info20003db.eng.unimelb.edu.au). If you want to install the schema on your own MySQL Server installation, uncomment the lines at the beginning of the script.

Do NOT disable only_full_group_by mode when completing this assignment. This mode is the default, and is turned on in all default installs of MySQL Workbench. You can check whether it is turned on by running the query SELECT @@sql_mode;. The command should return a string containing “ONLY_FULL_GROUP_BY” or “ANSI”. When testing, our test server WILL have this mode turned on, and if your query fails due to this, you will lose marks.

The SQL tasks
In this section are listed 10 questions for you to answer. Write one (single) SQL statement per question. Subqueries and nesting are allowed within a single SQL statement. In general, we care more about correctness than constructing the ‘most efficient’ query (computationally, or in terms of number of characters/lines). However, you may be penalized for writing overly complicated SQL statements (e.g. the query is 2-3x longer than required, using superfluous joins, etc.), using very poor formatting, using very poor alias naming, or other decisions that make it hard for us to read/understand what you’re trying to do when marking! DO NOT USE VIEWS (or ‘WITH’ statements/common table expressions) to answer questions.

1. Find the model name and model year of the vehicle with the highest battery capacity. If there are ties, return a row for each of those model name and year with equal highest capacity. Your query should return results of the form (model_name, model_year, battery_capacity). (1 mark)
2. Find all the charging stations with at least one outlet of 100 or higher charging rate. Do not repeat the same station multiple times in the result if it has multiple outlets which meet the criteria. Your query should return results of the form (station_id, state, postcode). (1 mark)
3. Find all the charging stations that do not have any facility associated with them. Your query should return results of the form (station_id). (1 mark)
4. Find all the people who have electric vehicles registered in their name, where that vehicle has no charging event in the database. Only include people with at least one car registered to them that meets this criterion. Your query should return (license_number, name, total_num_of_cars_with_no_charge_event_registered_to_person), ordered by name in increasing order. (2 marks)
5. Find all facilities that have ever issued a coupon, but had no coupons redeemed on “2025-01-01” (i.e., no charging event requested charging using that coupon on that day). Your query should return all such facilities in the form (facility_id). (2 marks)
6. Find all vehicle models and model years that, on average, charge more than 50kWh when they charge at outlets with a charging rate > 68 kW. If a charging event has NULL for kWh value, it should not be considered in the average. The average_kwh must be rounded to two decimal places (hint: use the `Round` function). Return results as (model_name, model_year, company_name, rounded_average_kwh). (2 marks)
7. Find the total number of vehicles manufactured by the company with an ABN of ‘1’, or any of that company’s child or grandchild companies. Your query should return a single value of the form (total_number_manufactured). (2 marks)
Further clarification for Q7: If a company X is owned by company Y, then X is the child company of Y. If company Y is owned by company Z, then X is the grandchild company of Z. You may assume there are no ‘great-grandchild’ companies (see example below). You may also assume that there are no circular relationships, e.g., if X is a child or grandchild of Y, then Y cannot be a child or grandchild of X.
For example, suppose that the `Company` table looked like the following (only the abn and parent_abn columns are shown; other columns, such as the company name, are omitted):

abn    parent_abn
“1”    (none)
“2”    “1”
“3”    “2”

Since the company with abn “2” is a child of (owned by) company “1”, and company “3” is a child of company “2”, answering this question would involve finding the total number of cars manufactured by companies “1”, “2” and “3”. There will never be a company which has a parent_abn of “3”, since that would then be a “great-grandchild company”.
8. Find all vehicles that have only ever been charged by people who are NOT the registered owner of the vehicle. Only include vehicles in the result that have been in at least one charging event. Return results as (VIN). Charging events with NULL kWh should still be considered. (3 marks)
9. Find all (person, car) pairings where the person has charged that car at every outlet of every station that is both located in a postcode between 3000 and 4000 (including 3000 but not 4000) and owned by the manufacturer of the car. Return results as (license_number, VIN). Only consider stations owned by the company directly, not by child companies. Charging events with NULL kWh should still be considered. (3 marks)
Further clarification for Q9:
- If a carY has been charged at all outlets matching the criteria by personX, and additionally has been charged at all outlets matching the criteria by personW, the results would include rows (license_number_personX, vin_carY) and (license_number_personW, vin_carY).
- A row in the output of the query indicates that the same person charged the same car at all outlets that match the criteria for that car. Say there exists a carY, and station1 and station2 are the only two stations that fulfil the criteria for carY (have a postcode of 3xxx, and are owned by the manufacturer of carY). Say there exists a personA who has charged carY at every outlet of station1 but never charged at any outlet of station2.
A different personB also exists, who has charged the same carY at every outlet of station2 but never at any outlet of station1. In this instance, no rows should be returned as result, because no single person charged carY at every outlet matching the given criteria (even though the car was charged at every outlet by somebody). 10. What was the total income of outlet `2` of the charging station located at street address `125 Collins Street` in postcode `3000` in January 2025? Use the `requested_at` date to determine whether a charging event was on that date. Your query should return a single value of the form (total_income), rounded to two decimal places (hint: use the `Round` function, and round after performing any aggregations). Note that you should consider the income after applying any discounts (see hint below). (3 marks) Hint: The income generated from a single charging event E at an outlet O which used coupon C for a discount can be calculated as: E.kwh x O.price_kwh x C.discount
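The "every outlet" condition in Question 9 is a relational-division check: a (person, car) pair qualifies only if the set of outlets that pair has used covers the whole set of outlets required for that car. The sketch below illustrates the logic in Python with sets rather than SQL, using the personA/personB scenario from the clarification. All data and outlet numbers are made up, and excluding cars with no matching outlets is an assumption, since the question does not say how to treat them.

```python
from collections import defaultdict

# Outlets matching the criteria for each car, as (station, outlet) pairs (hypothetical).
required_outlets = {"carY": {("station1", 1), ("station1", 2), ("station2", 1)}}

# Charging events as (person, car, station, outlet) (hypothetical).
events = [
    ("personA", "carY", "station1", 1),
    ("personA", "carY", "station1", 2),
    ("personB", "carY", "station2", 1),
]


def full_coverage_pairs(events, required_outlets):
    """Return sorted (person, car) pairs whose used outlets cover every required outlet."""
    used = defaultdict(set)
    for person, car, station, outlet in events:
        used[(person, car)].add((station, outlet))
    return sorted(
        (person, car)
        for (person, car), outlets in used.items()
        # Assumption: a car with no matching outlets yields no rows.
        if required_outlets.get(car) and required_outlets[car] <= outlets
    )


# Neither person alone covers all three outlets, so nothing is returned.
print(full_coverage_pairs(events, required_outlets))

# Once personB also charges at both station1 outlets, the pair qualifies.
more_events = events + [
    ("personB", "carY", "station1", 1),
    ("personB", "carY", "station1", 2),
]
print(full_coverage_pairs(more_events, required_outlets))
```

In SQL the same subset test is usually written either as a double NOT EXISTS or by comparing counts of distinct matching outlets.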
ECON3106 Politics and Economics
Exercises 2

1 Audits in Brazil
All the questions in this section refer to the paper: "Audit risk and rent extraction: Evidence from a randomized evaluation in Brazil" (Zamboni and Litschig, 2018).
1.1 Describe ONE situation under which, even if the audit risk of a municipality increases, we could expect no change in rent extraction.
1.2 Describe the policy change in May 2009 that the authors use to identify the effects of auditing.
1.3 Do you think the following statement is true or false? "The authors find that the reduction in rent extraction had negative effects on the welfare of the citizens of the affected municipality". Justify your answer with ONE piece of evidence.

2 Corruption
A small town has 60 voters and 2 types of jobs. 20 voters work as farmers (f) and 40 voters work in the mine (m). The small town must decide whether to build a water dam (d = 1) or not (d = 0). If the water dam is not built, everybody earns 8. If the water dam is built, the earnings of the farmers increase by 10. Miners do not benefit from the construction of the water dam. The cost of the water dam is 180 and, if it is built, it must be financed equally by all 60 voters through a head tax. All voters care about the difference between their earnings and the taxes they have to pay.
2.1 Write down the utility function of a farmer and a miner in the case that the water dam is not built. Do the same in the case the water dam is built.
2.2 Is building the water dam efficient according to the social welfare criterion?
Now imagine that the town's mayor works in the mine and is the only one with the power to decide whether the water dam should be built. A group of 5 farmers decides to bribe the mayor by paying 2 each, only if the water dam is built.
2.3 Will the mayor accept the bribe?
2.4 Is bribing efficient according to the social welfare criterion?

3 An election
A society of 99 voters must choose a policy p in the space (0, 1).
Voter i's utility is given by [equation missing from the source]. Two candidates, A and B, propose platforms .30 and .505, respectively. Candidates compete in a plurality election and each voter is expected to vote for the candidate whose platform he likes the most. Each candidate only wants to win the election.
3.1 What is the bliss point of voter i?
3.2 Voters exhibit single-peaked preferences. Show that this is true for i = 40.
3.3 How many votes will each candidate receive?
3.4 According to the Median Voter Theorem, what should we expect the two candidates to propose?
3.5 Suppose now that there are 3 candidates: A, B, and C. The candidates' platforms are respectively .1, .5, and .9. You are told that voters are strategic and all voters i ≤ 50 will vote for A and all voters i > 50 will vote for C (i.e., nobody is voting for B). Is this a Nash Equilibrium? (Briefly explain your logic.)

4 Lobbying
In order to bring fast internet connection to the rural areas, a country has to decide whether to build new antennas outside the cities. Given that the same number of people live in the cities and in the rural areas, if the antennas are built the total construction cost of 10 million AUD will be split equally between the rural towns and the cities. Rural towns are lobbying for the construction of these antennas, spending a total of 1 million AUD on their lobbying activities. The cities instead are spending 3 million AUD against the construction of the antennas. The probability that the antennas are built is described by the following contest success function: [equation missing from the source]
4.1 Given the lobbying efforts, what is the probability that the bill is introduced?
The social welfare of the rural towns before taking into account any lobbying activity and taxes is 10 million AUD if the antennas are not built, and it increases by 15 million AUD if the antennas are built.
The social welfare of the cities before taking into account any lobbying activity and taxes is instead always 80 million AUD, regardless of whether the antennas are built or not.
4.2 Calculate the total social welfare of this country, taking into account taxes and lobbying efforts.
4.3 Compare the previous situation to one where there is no lobbying and the antennas are not built. Which one is the preferred situation according to the social welfare criterion?
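For the dam example in Section 2 (Corruption), the payoffs follow directly from the numbers given: each voter's utility is earnings minus the head tax. The sketch below encodes that arithmetic. Taking the social welfare criterion to be the unweighted sum of utilities is an assumption, as is treating the bribe as a pure transfer added to the mayor's payoff; the variable names are illustrative.

```python
N_FARMERS, N_MINERS = 20, 40
BASE_EARNING = 8      # earnings of every voter without the dam
FARMER_GAIN = 10      # extra earnings of a farmer if the dam is built
DAM_COST = 180        # financed by an equal head tax on all voters


def payoffs(dam_built):
    """Return (farmer_utility, miner_utility), i.e. earnings minus head tax."""
    tax = DAM_COST / (N_FARMERS + N_MINERS) if dam_built else 0.0
    farmer = BASE_EARNING + (FARMER_GAIN if dam_built else 0) - tax
    miner = BASE_EARNING - tax
    return farmer, miner


def total_welfare(dam_built):
    """Sum of utilities over all voters (assumed welfare criterion)."""
    farmer, miner = payoffs(dam_built)
    return N_FARMERS * farmer + N_MINERS * miner


# The mayor is a miner; 5 farmers pay 2 each, only if the dam is built.
bribe = 5 * 2
mayor_without_dam = payoffs(False)[1]
mayor_with_dam = payoffs(True)[1] + bribe

print("utilities without dam:", payoffs(False))
print("utilities with dam:   ", payoffs(True))
print("welfare without / with dam:", total_welfare(False), total_welfare(True))
print("mayor without / with dam and bribe:", mayor_without_dam, mayor_with_dam)
```

Under the transfer assumption the bribe moves money between voters without changing the welfare total, so any welfare effect comes only from whether the dam is built.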
Integrated photonic sub-systems

Example 1 – Microwave photonics link
An analogue optical link based on single-mode fibre is operated at a wavelength of 1550 nm with a fibre loss of 0.2 dB/km. If the coupling losses from the transmitter (modulator) to the fibre and from the fibre to the receiver (photodetector) are 3 dB each, what is the optical path loss for a link of length 20 km? What is the equivalent electrical power loss for this optical transmission path? What is the equivalent electrical power gain for this system if the laser is integrated with a modulator?

Example 2 – Optical frequency comb (OFC)
Let’s assume an OFC which is based on an amplified circulating loop. What is the frequency of the supermode (resonance frequency) of the optical comb, if its fibre loop length is 20 m? What is the relation between the frequency fRF and the supermode frequency?
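The first two parts of Example 1 and the loop resonance in Example 2 are short calculations, sketched below. Two assumptions are made: the detector is a square-law photodetector, so that a loss of x dB in optical power appears as 2x dB in electrical power (photocurrent is proportional to optical power, and electrical power to current squared), and the fibre has a group index of about 1.468, a typical value for silica fibre at 1550 nm that is not stated in the examples. The supermode frequency is taken to be the loop's mode spacing c / (nL).

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def optical_path_loss_db(length_km, fibre_loss_db_per_km, coupling_losses_db):
    """Total optical loss: fibre attenuation plus all coupling losses, in dB."""
    return length_km * fibre_loss_db_per_km + sum(coupling_losses_db)


def electrical_loss_db(optical_loss_db):
    """Equivalent electrical power loss for a square-law detector (assumption)."""
    return 2.0 * optical_loss_db


def loop_mode_spacing_hz(loop_length_m, group_index=1.468):
    """Mode spacing of a fibre loop, c / (n * L); the group index is an assumed value."""
    return C / (group_index * loop_length_m)


optical = optical_path_loss_db(20.0, 0.2, [3.0, 3.0])
print(f"optical path loss:     {optical:.1f} dB")
print(f"electrical power loss: {electrical_loss_db(optical):.1f} dB")
print(f"loop mode spacing:     {loop_mode_spacing_hz(20.0) / 1e6:.2f} MHz")
```

The third part of Example 1 (link gain with an integrated laser and modulator) needs device parameters that the example does not give, so it is left out of the sketch.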
ECON2060 Research Proposal Instructions Summary of task The goal of this assessment is to write an original and interesting proposal for a research project related to behavioural economics. You will come up with your own research question and describe a practical and feasible method for how you would go about answering it. The word limit is 1,500 words, which excludes the title page, abstract, tables and figures, references, and any appendixes. This is a hard word limit, and you will be penalised for exceeding it. Submission will be through Turnitin. In addition to the detailed instructions below, please note two mandatory components that must be followed by all students: 1. In the Related Literature section, students are required to reference, in a substantial way, at least one academic journal article that has been published since 2024. 2. In the Method section, students must include at least one table or figure. This could be a flow chart of the proposed experimental design or data collection process, or a table of summary statistics on preliminary data. The table or figure must be labelled and contain a full caption that provides a self-contained explanation of the table or figure. (“Self-contained” means that a new reader could jump straight to the table or figure and its caption, and still understand its meaning.) We have uploaded some examples of high-scoring research proposals from past students. Please note that some of these assignments were marked using the legacy marking criteria, and as such did not include the above additional components. You are allowed to use AI to assist you in this assessment task. In fact, it is encouraged, especially for the following tasks: 1. As an initial ‘soundboard’ to get feedback on your research topic ideas 2. As a final ‘proofreader’ for writing and grammatical errors Getting started Your first step is to decide your topic, and specifically, your research question. 
For example, it could be a test of a mainstream economics concept, an extension of a behavioural economics concept, or the application of a behavioural economics theory to a novel population or setting (such as a behavioural ‘nudge’). Typically (but not always), a good question can be rewritten in the form “Does X cause Y?”, where X and Y are some characteristics or outcomes of interest. Table 1 gives a summary of such rephrasing of the questions in some of the papers discussed in this course.

There is quite some flexibility with the format of your proposal, and you should not feel constrained by the guidelines below if you think you can do it better with your particular topic. These guidelines are there to help improve the quality of your proposal, but if you can write a high-quality paper in another way, you will not be penalised for unorthodoxy. Likewise, feel free to include anything in an appendix that is not an appropriate fit in the main document but that you do refer to in the text, such as specific experiment instructions, a variable dictionary for an existing dataset, or other background information or material. Most proposals will not need an appendix, but, seeing as it does not count towards your word limit, you may wish to exploit it for material that is relevant but not critical to your research proposal.

Regardless of your preferred format, however, your structure must include the following sections:
1. Title page
2. Abstract
3. Introduction
4. Related literature (with mandatory recent journal article reference)
5. Method (with mandatory inclusion of a table or figure)
6. Conclusion
7. References
You may add other sections or subheadings for readability if you wish, but be careful of your word limit.

Details of each section

1.
Title page Your title page should contain your proposal’s Title, your name and student number, date of submission, your abstract, and an accurate word count (which excludes the title page, abstract, tables and figures, references, and any appendixes). 2. Abstract The abstract (placed on your title page) should be a four-sentence summary of your entire research proposal, so make sure it includes what you believe are the absolute key ingredients of your paper. The structure and content are flexible, but a typical format would be: • First sentence: jump right into the topic or research question. (In some cases, you might prefer a first sentence of motivation instead.) It would typically start with something like “This research proposal…” followed by what it is you are proposing to investigate/answer. • Second sentence: State what the contribution is of your research proposal. What has the previous literature proposed or found, and what will your proposal contribute to fill the gaps in our knowledge? E.g. “While previous literature has found that/assumes that/predicts that […], this proposal contributes to our knowledge of this issue by testing whether […]” • Third sentence: This should be about how you plan to answer your question. State your method (or ‘empirical’ approach). E.g.: “I propose…” followed by the method, e.g. “I propose to run a lab experiment in which I will test how changing X affects people’s behaviour towards Y.” If there is a treatment/explanatory variable and an outcome variable (and there should be!), or details of a specific context or sample that is being tested, they should be clearly specified. • Fourth sentence: Add any remaining critical details of the method, and/or include the bare minimum of your analysis plan: what do you plan to test, how do you plan to test it (e.g. a t-test between control and treatment groups, or a linear regression on your data), and what conclusions will you draw depending on these results? E.g. 
“I will run a t-test to determine whether there are differences in [Y] between the treatment and control groups, and if the treatment group’s [Y] is significantly larger, this would support the hypothesis that […]”.

3. Introduction
This section should start on a new page after your title page/abstract, and should be short, typically 1-3 paragraphs. Provide a motivation for the broad topic and why you think the reader should be interested in it, and introduce any relevant economic theories (mainstream or behavioural). You may wish to highlight where there is currently a gap in our knowledge (which your research proposal will aim to fill). But don’t dwell for too long on setting up the context; make sure your research question clearly appears by the end of the first paragraph. If different schools of academic thought predict different answers to your research question, you may then want to spend 1-2 sentences outlining these predictions, and/or to briefly introduce which method you are proposing to use to test these predictions.

In terms of your writing, try to avoid flowery language, embellishments, or ambiguities. Write clearly and matter-of-factly. Academic papers are often considered ‘dry’ in style, which can be true; the purpose of an academic paper is not to entertain with the writing, but to convey the material as clearly as possible; how ‘interesting’ an academic paper is will typically be judged on the worthiness of the topic and the quality of the research.

Here are some examples of introductions of behavioural economics papers that broadly follow the structure that is expected of you.

Most children think of their potential future occupations in terms of what they will be (firemen, doctors, etc.), not merely what they will do for a living. Many adults also think of their job as an integral part of their identity.
At least in the United States, “What do you do?” has become as common a component of an introduction as the anachronistic “How do you do?” once was, yet identity, pride, and meaning are all left out from standard models of labour supply. This omission is understandable: identity, pride, and meaning are difficult to quantify and are thus hard to incorporate into the empirically driven field of labour economics. In this article, we focus on minimal perceived meaning by the labour producing force and investigate how it influences labour supply in controlled laboratory experiments. Our intention is to compare situations with no meaning (or as low a level of meaning as we can create) with situations having some small additional meaning. Thus, our investigation will focus not on occupations highly endowed with meaning, like medicine or teaching, but on the least-common denominator of meaningfulness that is shared by virtually all compensated activities.
– Ariely, Kamenica and Prelec (2008)

Neoclassical models include several fundamental assumptions. While most of the main tenets appear to be reasonably met, the basic independence assumption, which is used in most theoretical and applied economic models to assess the operation of markets, has been directly refuted in several experimental settings (Knetsch 1989; Kahneman, Knetsch, and Thaler 1990; Bateman et al. 1997). These experimental findings have been robust across unfamiliar goods, such as irradiated sandwiches, and common goods, such as chocolate bars, with most authors noting behaviour consistent with an endowment effect. Such findings have induced even the most ardent supporters of neoclassical theory to doubt the validity of certain neoclassical modelling assumptions. Given the notable significance of the anomaly, it is important to understand whether the value disparity represents a stable preference structure or if consumers’ behaviour approaches neoclassical predictions as market experience intensifies.
In this study, I gather primary field data from two distinct markets to test whether individual behaviour converges to the neoclassical prediction as market experience intensifies.
– List (2003)

Charitable contributions in the United States were estimated to exceed $300 billion annually in 2007, 2008, and 2009. This is roughly $1000 for each person in the US, a not insignificant amount. Given the reliance of charitable organizations on these contributions, it is quite important to try to identify and implement effective methods for enhancing the revenue received. There has been some recent work on suggested donations to public radio, and some study of the notion of paying-what-you-want as a pricing device. We extend both of these notions to fund-raising in a restaurant venue, exploring whether the suggested amount (if any) mattered with respect to the contributions raised. Businesses like grocery stores and restaurants often ask customers (typically through having a donation jar at the check-out register) to donate money to a certain charity organization. One often sees a suggested certain donation level. But there has been little by way of systematic and controlled study regarding how the suggested donation level affects behaviour in this environment. Our research question is to attempt to determine the optimal amount to suggest, or whether it is better to make no suggestion.
– Charness and Cheung (2013)

Improving energy efficiency reduces costs for firms and mitigates CO2 emissions. This is particularly important in the transportation sector, which is responsible for approximately 25%–28% of greenhouse gas emissions in Western industrialized countries (cf. EEA, 2018, EPA, 2018). Fuel accounts for around 40% of variable costs for transportation companies. We conducted an analysis to determine if loss aversion helps motivate drivers to drive in a fuel-efficient manner. If successful, this could reduce fuel consumption by about 22%.
– Hoffman and Thommes (2020) Nudging has been found to affect human behavior across a wide range of domains. In particular, it has been used to improve the payment morale of citizens when they owe money to public institutions. While the traditional view (Allingham and Sandmo, 1972) considered citizens’ tax compliance as a matter of audits and harsh fines, it is by now well understood that tax morale is also a very important factor for compliance (Kirchler, 2007). In fact, nudging has been frequently applied to improve tax morale, even though with mixed results. In the realm of taxation, taxpayers are very likely to anticipate, however, that the government will ultimately enforce correct tax payments, which is why nudges might have a good chance to work. In other situations, however, public institutions may not want to enforce the collection of citizens’ payments for social or ethical reasons. Whether or not nudging also works in such a setting and whether it can have persistent effects even after abolishing the nudge again are the key questions of this paper. – Sutter, Rosenberger and Sutter (2020) 4. Related literature Please make sure that you clearly relate your research question to the existing academic literature. What are the possible answers to your research question that have been discussed in past studies? What papers answer a similar question to yours, and what do they find? (For example, if you are researching “Do tennis players exhibit loss aversion?”, you would want to cite studies that investigate whether other sporting players exhibit loss aversion). It is possible that closely related studies come from fields other than economics, such as psychology or even more specialised fields (for example, in the previous example, papers from sports journals might be relevant). But when in doubt, prioritise economics papers. What you must definitely avoid is proposing a study that has already been carried out. 
So, make sure that your literature search is thorough. Google Scholar is the best place to start, and once you find a close paper, use the “Cited by” feature to filter by recent, related papers.

A common question is “How many papers should I cite?” This is hard to answer other than with the general comment “The most important ones, but no more”. While it is important not to omit any critical paper, it is equally important not to spread yourself too thin, such that you cite many papers but with insufficient detail for the relevance to be clear to the reader. Here are some types of examples.
• If your proposal is an extension of one specific paper, then you may justify citing only this paper, so that you can go into deep detail about this paper and what your extension contributes to it. For example, you may be adding an original extension to the design of Niederle and Vesterlund’s (2007) competitiveness experiment.
• If your extension has a very similar design to one study but applies it in a different domain – for example, you apply the Apesteguia and Palacios-Huerta (2010) paper about soccer penalty kicks to rugby union – you would want to (at least) cite both this paper and the most relevant paper about psychological pressure in your new domain (rugby union).
• If your research proposal tries to reconcile two or more papers that reached contradictory conclusions, then you would want to describe these papers in detail (and you may not need to cite more). For example, Albrecht and Smerdon’s (2022) design references three contradictory theories in its review, and cites the main papers for each theory.
• If your research proposal covers several topics – for example, you are comparing whether confirmation bias or the sunk cost fallacy can best explain why people don’t sell their crypto investments – you may need to cite more papers (in this case, ones on both biases in general, and also on broad psychological biases in the crypto market).
Many research topics fall into this category.

At the end of this section, you should state in one sentence what the specific contribution of your research would be to this literature. What is the gap that you are filling in the academic landscape?

5. Method
This should be the longest section of your proposal, roughly half of your allocated word count. Typically, the method for your research proposal will be either an experiment (lab or field) or an empirical study of existing data. Your proposed method must be: (a) able to answer your research question, (b) practical, and (c) ethical. Table 1 gives examples of the methods used in some of the papers discussed in this course.

If you choose an experiment, your proposal must include the following details:
• The experimental design, including the type and number of subjects, the groups, and how you will administer the treatment(s)
• The experimental procedure. This can be in broad terms, but all critical information must be included such that another researcher who reads your proposal would be able to implement the experiment you describe
  o You may wish to check the experimental papers assigned as readings in this course for examples of how to describe the experimental design and procedures.

If you choose an empirical study of existing data, your proposal must include the following details:
• The source of the data (e.g. “OECD PISA data wave 2015”), or how you would plan to collect it (e.g. “Scrape all Champions League football games from 2021-22 from the UEFA website”)
• The key explanatory variable(s) (this would be the ‘X’ variable) and outcome variable(s) (the ‘Y’ variable) from the data
• An explanation of how you plan to address any potential statistical biases such as selection bias in your analysis. This may involve describing additional control variables that you propose to include in your analysis of the data.
If your design will make use of a natural experiment, clearly detail the source of the randomisation and why it means that your proposed method will accurately answer your research question.
  o For example, in Apesteguia and Palacios-Huerta (2010), the method of choosing which football side gets the first penalty kick is random, which prevents selection bias. In Gong (2015), the author made use of an existing program (the VCT) that randomly assigned HIV testing.

No matter which method you use, you should clearly describe your treatment variable (or variables; the ‘X’) and outcome variable (or variables; the ‘Y’). Next, you should state how you plan to analyse the (experimental or natural) data, including any statistical tests that you propose to use (such as a t-test). If there are other variables that are important in your dataset (either an existing data set or one you will collect from your experiment), describe them and how you will use them. Finally, you should clearly state your hypotheses as they relate to the variables and tests. This will include any sub-sample effects (also known as “heterogeneous effects”), e.g., does your effect differ for males and females (and how would you test this)?

A reminder that this section also contains a mandatory component: You must include at least one table or figure.

6. Conclusion
Your conclusion should be short (1 paragraph). You may wish to describe what you will conclude about your hypotheses or the motivating theories depending on which way your results turn out, as well as any limitations of your research or risks for its implementation (and how these might be mitigated). You should also describe the implications that you think your results might have, for either existing economic theories or for policy-makers / industry / other relevant groups.

7. References
Your research proposal should be fully (and correctly) referenced, both within the text and by including a full bibliography.
You are free to use any of the standard referencing styles so long as you are consistent (see UQ’s reference guide). To save time and guarantee accuracy, especially if you use a lot of references, you may wish to use a referencing software like Zotero or Endnote. For instance, Zotero (free!) can be installed as a web browser extension, which is very handy because once you find a paper online, you can import it into your Zotero library with one click. It also has a Word extension, meaning that you can import your library references into your Word document and also add an automated bibliography of references that updates by itself. (If you don’t use many references, it’s just as easy or easier to do things manually.)
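To make the analysis step described in the Method section more concrete, the following is a minimal sketch of a two-sample t-test between a treatment and a control group, written with the Python standard library only. It uses Welch's unequal-variance version, which is one common choice and not a requirement of this assessment. The function name and the outcome values are hypothetical placeholders, not real data.

```python
import math
import statistics


def welch_t(treatment, control):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = statistics.mean(treatment), statistics.mean(control)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    v1, v2 = statistics.variance(treatment), statistics.variance(control)

    se_squared = v1 / n1 + v2 / n2  # squared standard error of the difference
    t_stat = (m1 - m2) / math.sqrt(se_squared)

    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = se_squared ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t_stat, dof


# Hypothetical outcome variable Y for each group (placeholder values only).
treatment_y = [12.0, 15.0, 14.0, 16.0, 13.0]
control_y = [10.0, 11.0, 9.0, 12.0, 8.0]

t_stat, dof = welch_t(treatment_y, control_y)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

The p-value then comes from the t distribution with the computed degrees of freedom. In practice a statistics package would do this in one call, for example `scipy.stats.ttest_ind(treatment_y, control_y, equal_var=False)`.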
Computer Science and Engineering
CS6083, Spring 2025
Project #1 (due April 27)
April 14, 2025

You are hired by a startup company to help build the database backend for a new web-based service, similar to Pinterest, that allows people to maintain online “pinboards” with pictures that they find and like and want to share with others. Users can sign up for the service, and can then create one or more pinboards. Later, users can “pin” pictures that they find on the web or upload themselves, and these pictures then become visible on one of their pinboards. Users can also “repin” pictures that they find on other users’ pinboards, which adds them to their own boards. Users can also follow other people’s pinboards, and can invite users to be their friends. Finally, users can “like” pictures they find on other boards, and can add short comments to others’ pictures.

As an example, consider two users, Erica and Timmy. Erica likes to travel, and also loves antique furniture. She signs up and creates two pinboards, “Furniture” and “Dream Vacations”. Whenever she sees a picture on the web that she likes and wants to show to her friends, say a picture of a nice sofa on a website, or a picture of a beautiful beach, she pins it to one of her boards. Erica also has friends who often look at her images and sometimes like the pictures or leave comments such as “Cute” or “love it!”. Timmy is seven years old, likes dinosaurs and monsters, and when he grows up he wants to become a pirate. He creates boards named “Super Dinosaurs” and “Pirates” and whenever he sees a picture of dinosaurs or pirates (or even better, dinosaurs and pirates) he pins it to his boards. He also follows several pinboards by others that have a lot of pictures of monsters and dinosaurs – to do so he defines a “follow stream” called “Monsters and Dinosaurs” containing pictures from four other boards that he follows. He also sometimes repins some of these pictures so they appear on his own board.
For simplicity, we assume that all boards, pictures, pins, and likes are visible to everyone, and that all pictures could be repinned and liked by any other user. However, a user’s follow stream is private. So this describes the basic idea behind the system. (You may also explore services such as Pinterest to get the idea.) In this first part of the course project, you will have to design the relational database schema that stores all the information about users, boards, pictures, friendships, follow streams, repins, likes, and comments. In the second part of the project, you have to design a web-accessible interface that makes this system usable for real users. You should use your own database system on your laptop or an internet-accessible server. Use a system that supports text operators such as like and contains. Both parts of the project may be done individually or in teams of two students. However, you have to decide on a partner and email the TAs with your names by Monday, April 21. The second part of the project will be due a few days before the final exam. Note that the second project builds on top of this one, so you cannot skip this project. Before starting your work, you should think about what kind of operations need to be performed, and what kind of data needs to be stored. For example, there should be a login page, a page where a user can sign up for the first time (by supplying an email address and choosing a user name), and a page where users can create or update their profiles. Users should be able to create pinboards, ask other users to become friends, and should be able to answer friend requests. They should be able to pin pictures they find on the web, or which they upload themselves. They should be able to repin and like pictures. For simplicity, we assume that all boards, pictures, pins, and likes are visible to everyone, and that all pictures could be repinned and liked by any other user. 
However, people may decide to only allow their friends to comment on pictures on their board (this is a setting they can choose for each board). When a user likes a picture, this is counted as a like of the original pin of the picture, not of a particular repinning of the picture. However, comments about a repinned picture are only associated with the repinned picture and not the original picture. Users may also use keyword queries to search for pictures, by matching against their tags, and the system would then return pictures matching the keywords sorted by either time, relevance, or number of likes.

Some words about pinning and repinning, which will mainly be important for the second part of the project. When a user pins a picture on the web, she should supply the URL of the image, the URL of the page in which the image was found, and a few tags (e.g., “couch, brown, modern, ikea”). The system should also download and store the picture itself in the database as a blob (in case the image changes later or is removed from the original site). If the user uploads an image, the system would store the image on its site, assign a URL to the image, and then pin that URL. When a user repins (or re-repins etc.) an image, this does not result in a copy of the picture, but is just a pointer to the picture as it was first pinned, with the same URL and tags, and if the first pinner removes it, it should become inaccessible everywhere it was repinned. Of course, the same picture might be originally pinned by several users, possibly under different URLs, and you do not have to remove such duplicate pictures. Also, ideally images would be pinned using a button on your browser that is provided as a browser plugin, but you do not have to do this as part of this project, so your system will probably require users to paste URLs into a dialog box.
Two more remarks: First, it is recommended to always store time stamps for any action such as pinning, liking, commenting, as real services use such log information for later data mining. Second, you should of course not use database permissions or views to implement user identification. There will not be a separate DBMS account for each user, but the web interface and application itself will log into the database. So, the system you implement can see all the content, but has to make sure at the application level that each logged-in user is identified through the use of cookies in the second part of the project. Project Steps and Deliverables: In the following, we describe the suggested steps you should take for this project, and the associated deliverables. You should approach this project like one of the design problems in the homeworks, except that the schema may end up being a bit more complicated. You should spend some time carefully designing sample data that allows you to test some of the functionality. Note again that in this first problem, you will only deal with the database side of this project - a suitable web interface will be designed in the second part. However, you should already envision, plan, and maybe describe the interface that you plan to implement. The suggested steps are as follows: (a) Design, justify, and create an appropriate relational database schema for the above scenario. Make sure your schema is space efficient, and suitably normalized. Show an ER diagram of your design, and a translation into relational format. Identify keys and foreign key constraints. Provide a short discussion of any assumptions that you made in your design, and how they impact the model. Note that you may have to revisit your design if it turns out later that the design is not suitable. (b) Use a database system to create the database schema, together with key, foreign key, and other constraints. 
(c) Write SQL queries (or sequences of SQL queries) for the following tasks.
(1) Signing Up, Creating Boards, and Pinning: Write the queries needed for users to sign up, to log in, to create or edit their profile, to create pinboards, to pin a picture, and to delete a pinned picture.
(2) Friends: Write queries for asking another user to be friends, and for answering a friend request.
(3) Repinning and Following: Write queries for repinning a picture and for creating a follow stream. Also, write a query that, given a follow stream, displays all pictures belonging to that follow stream in reverse chronological order.
(4) Liking and Commenting: Write queries to like a picture, and to add a comment to a picture (while making sure the user is allowed to comment on this picture).
(5) Keyword Search: Write a query to perform a keyword search for pictures whose tags match the keywords. Use the contains operator to do so.

(d) Populate your database with some sample data, and test the queries you have written in part (c). Make sure to input interesting and meaningful data and to test a number of cases. Limit yourself to a few users and a few messages and threads each, but make sure there is enough data to generate interesting test cases. It is suggested that you design your test data very carefully. Draw and submit a little chart of your tables that fits on one or two pages and that illustrates your test data! Print out and submit your testing.

(e) Document and log your design and testing appropriately. Submit a well-written description and justification of your entire design, including ER diagrams, tables, constraints, queries, procedures (if any), and tests on sample data. Your documentation should be a comprehensive paper, including introduction, explanations, ER and other diagrams, and more (typically about 8-12 pages). This paper will be expanded in the second part of the project.
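As a rough illustration of how steps (b) to (d) fit together, here is a small sketch that creates a toy schema, loads a few rows, and runs a follow-stream query and a keyword search. It uses Python's built-in sqlite3 module purely for convenience. All table names, column names and sample rows are hypothetical, and the schema is deliberately incomplete (no friendships, likes, comments or repins), so it should not be read as the expected design. SQLite has no contains operator, so the sketch falls back on LIKE; a system with full-text support would use its own operator instead.

```python
import sqlite3

# Hypothetical, heavily simplified schema; names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users   (uid INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE boards  (bid INTEGER PRIMARY KEY,
                      owner INTEGER NOT NULL REFERENCES users(uid),
                      title TEXT NOT NULL);
CREATE TABLE pins    (pid INTEGER PRIMARY KEY,
                      bid INTEGER NOT NULL REFERENCES boards(bid),
                      url TEXT NOT NULL,
                      tags TEXT NOT NULL,
                      pinned_at TEXT NOT NULL);
CREATE TABLE streams (sid INTEGER PRIMARY KEY,
                      owner INTEGER NOT NULL REFERENCES users(uid),
                      name TEXT NOT NULL);
CREATE TABLE stream_boards (sid INTEGER NOT NULL REFERENCES streams(sid),
                            bid INTEGER NOT NULL REFERENCES boards(bid),
                            PRIMARY KEY (sid, bid));
"""


def build_demo_db():
    """Create an in-memory database with a handful of made-up test rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "erica"), (2, "timmy")])
    conn.executemany("INSERT INTO boards VALUES (?, ?, ?)",
                     [(10, 1, "Furniture"), (11, 1, "Dream Vacations")])
    conn.executemany("INSERT INTO pins VALUES (?, ?, ?, ?, ?)", [
        (100, 10, "http://example.com/sofa.jpg", "couch brown modern",
         "2025-04-01 09:00:00"),
        (101, 11, "http://example.com/beach.jpg", "beach ocean",
         "2025-04-02 10:30:00"),
        (102, 10, "http://example.com/chair.jpg", "chair modern",
         "2025-04-03 08:15:00"),
    ])
    conn.execute("INSERT INTO streams VALUES (?, ?, ?)",
                 (1000, 2, "Furniture and Travel"))
    conn.executemany("INSERT INTO stream_boards VALUES (?, ?)",
                     [(1000, 10), (1000, 11)])
    return conn


def stream_pictures(conn, sid):
    """Pin ids in a follow stream, newest first."""
    rows = conn.execute(
        """SELECT p.pid
           FROM stream_boards sb JOIN pins p ON p.bid = sb.bid
           WHERE sb.sid = ?
           ORDER BY p.pinned_at DESC""", (sid,)).fetchall()
    return [row[0] for row in rows]


def keyword_search(conn, keyword):
    """Pin ids whose tag string contains the keyword (LIKE fallback)."""
    rows = conn.execute(
        "SELECT pid FROM pins WHERE tags LIKE ? ORDER BY pinned_at DESC",
        (f"%{keyword}%",)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(stream_pictures(conn, 1000))
print(keyword_search(conn, "modern"))
```

Storing timestamps as ISO-formatted text keeps the reverse chronological sort a plain string comparison in SQLite. Passing values as query parameters rather than building SQL strings is also the habit to carry into the web interface in the second part.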
SUMMATIVE ASSIGNMENT 3 – BUSI4AY15 Business Analytics
Masters Programmes 2024/25

For this assignment, you will be provided a data set in Excel and an Excel answer sheet. At the bottom of this assignment, you will find a list of exercises to execute on the data set provided. You are to input your answers into your Excel answer sheet and submit it on Learn Ultra.

SUBMISSION INSTRUCTIONS

FORMAT
You are to submit the Excel answer sheet with the file name changed as instructed. You are not to alter the structure of the Excel answer sheet. The Excel file should be kept in the .xlsx format.

MARKING GUIDELINES
The number of marks carried by each question in the exercise is indicated clearly in this assignment.

PLAGIARISM AND COLLUSION
Note that your data set is unique to you; correspondingly, the answers that you will obtain will also be unique to you. Students suspected of plagiarism, either of published work or the work of other students, or of collusion will be dealt with according to School and University guidelines.

SPECIFIC INSTRUCTIONS
1. You will be able to find your data set for this assignment in the “Data Sets” folder. All of the files in this folder are named “yourZnumber_Number1_Number2.xlsx”. You are to find the data set corresponding to your Z number. Download this data set.
2. You are strongly advised to save a copy of the data set in a safe location, in the unlikely but potential event that you accidentally overwrite the data during the process of your analysis.
3. Note that all of your colleagues have been provided different data sets, and consequently will arrive at different correct answers for the assignment. As such, please ensure that you use the data set that corresponds to your Z number and not the data sets of any of your colleagues.
4. In the assignment folder, you will also find an Excel file that is labelled “yourZnumber_SA3.xlsx”.
This is your answer sheet, which you use to fill your responses in for each question and which you will submit on Learn Ultra. Change yourZnumber in the file name to your Z number right now. 5. As your assignment is machine-graded, any minute error in your file name will render it unreadable by the machine and will lead to a complete loss of marks, so please ensure that the file name is correct. 6. The Excel answer file contains two columns. The first column lists the question numbers to which an answer is expected in the Excel answer file. In the second column, you are to key in your answer to the corresponding question there. Do not alter the structure of the Excel file, namely, do not add new rows or columns or key in any values outside of the demarcated area. 7. You will be required to key different types of answers in a specific form. a. For numerical answers, please leave your answers with at least 3 significant figures (in other words, with at least 3 non-zero digits, e.g. you may reflect “12.345” as “12.3” and “0.01234” as “0.0123”). If your numerical answer is a whole number, leave them as such (e.g. you may leave “3” as “3” as opposed to “3.00” that is in 3 significant figures). b. If you are required to report probabilities, for example, p-values, if the numerical value is smaller than 0.0001 or 1e-4, please report the value as 0. If you are asked to report probabilities or proportion or percentages, please reflect the value in decimals (e.g. report 78% as 0.78). Never reflect your answer as a fraction (e.g. for 1/3, instead use 0.333 in 3 significant figures). c. For multiple choice questions in this assignment, feel free to leave your responses in any of large or small capitals. 8. Wrong answers have a potential to carry partial marks. 9. This assignment comprises a total of 18 questions, amounting to a total of 100 points. 
ASSIGNMENT QUESTIONS

In this assignment, we are going to consider a company that produces six types of products (Products A, B, C, D, E and F) and serves five different markets (Markets V, W, X, Y and Z). The operating circumstances of the company are described as follows:
· Each product has a cost to produce and will fetch a different given revenue when sold in each specific market.
· The company incurs per unit logistical costs for transporting a type of product to each market.
· There is a total production capacity per month at the factory that is measured in units of effort, with each product costing a particular amount of effort to produce.
· Each market demands a certain number of each of the products every month. It is assumed that demand does not need to be met, but cannot be exceeded, i.e., the demand reflects the maximum number of each product that can be sold in that market.

Every month, the company is required to decide on the production level of each product and on the logistical plan, that is, how many of each product to transport to each market to be sold.

Parameters

The following tables show the cost of producing each product and the revenue it will generate at each market.

          A     B     C     D     E     F
Cost      1.1   1.5   1.3   1.2   1.0   1.4

          Market
Product   V     W     X     Y     Z
A         3.2   2.6   2.7   3.0   3.4
B         2.7   2.9   3.4   3.2   2.6
C         2.9   3.3   2.7   3.0   3.1
D         3.1   3.4   3.1   2.8   2.8
E         3.1   3.4   3.2   3.4   2.6
F         3.3   2.5   3.0   3.5   3.4

The following table shows the per unit cost to transport a product to each market.

          Market
Product   V     W     X     Y     Z
A         1.0   1.4   0.7   0.6   1.1
B         1.0   1.5   0.9   0.8   0.6
C         0.8   0.9   1.5   0.8   0.7
D         0.9   0.5   1.2   0.7   0.6
E         1.0   0.6   1.4   1.2   0.7
F         0.8   1.4   0.6   1.3   1.3

The following table shows the effort required to produce each product. The total available effort per month at the factory is 30 units of effort.

          A        B        C        D        E        F
Effort    0.0038   0.0031   0.0036   0.0032   0.0033   0.0030

The historical demand observed for each product in each market is captured in the data set that is provided to you.
In your data set, each row represents the demand observed in a month in a particular market. As such, the data set has three columns: ‘product’ indicating the product, ‘market’ indicating the market, and ‘demand’ indicating the observed demand for that product in that market in that month. For every pair of product and market, there are 50 records of the demand over 50 months, hence amounting to a total of 1,500 data points.

To verify that you have downloaded the correct data set, please calculate the mean and variance of the demand across the 50 months for Product A and Market V. The file name of your data set has the form “yourZnumber_Number1_Number2.xlsx”. Number1 and Number2 should match the mean and variance, respectively, that you calculated. Do not proceed if these numbers are not correct! Verify with your tutor if they are incorrect.

In this assignment, we shall assume that the company is interested in maximizing its profit. This is understood as the revenue minus the costs. The only source of revenue is the sale of products at each of the markets. The costs comprise both production costs and transportation costs. The company is constrained by the total production capacity of the factory, in units of effort, and the demand for the products at each of the markets cannot be exceeded.

Use $x_P^M$ to denote the decision variable of how many of Product P (P = A, B, C, D, E or F) is to be transported to be sold at Market M (M = V, W, X, Y or Z). (The letter at the top will always represent the market and the letter at the bottom will always represent the product, e.g. $x_A^V$, $x_B^W$, …)

Question 1 8 points
How many decision variables are there in this optimization problem? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file.
A There are 36 decision variables, where 30 (= 6 products × 5 markets) arise from the decisions of how many of each product to transport to each market; and 6 arise from how many of each product to produce.
B There are 35 decision variables, where 30 (= 6 products × 5 markets) arise from the decisions of how many of each product to transport to each market; and 5 arise from how many products are sold at each market.
C There are 41 decision variables, where 30 (= 6 products × 5 markets) arise from the decisions of how many of each product to transport to each market; 6 arise from how many of each product to produce; and 5 arise from how many products are sold at each market.
D There are 30 decision variables, where 30 (= 6 products × 5 markets) arise from the decisions of how much of each product to transport to each market. How many of each product to produce can be inferred from these decisions.

Question 2 3 points
The total number of Product A that is produced can be expressed in the following linear manner in terms of the decision variables, where the ‘?’ (question marks) and [Q2] are some constants (known numbers), that are not necessarily the same. What number should go into [Q2]?

Question 3 3 points
Building on Question 2, the total cost of production of Product B can be expressed in the following linear manner in terms of the decision variables, where the ‘?’ (question marks) and [Q3] are some constants (known numbers), that are not necessarily the same. What number should go into [Q3]?

Question 4 3 points
Building on Question 3, the total cost of production across all products can be expressed in the following linear manner in terms of the decision variables, where the ‘?’ (question marks) and [Q4] are some constants (known numbers), that are not necessarily the same. What number should go into [Q4]?

Question 5 3 points
The total transportation cost of all products to all of the markets can be expressed in the following linear manner in terms of the decision variables, where the ‘?’ (question marks) and [Q5] are some constants (known numbers), that are not necessarily the same. What number should go into [Q5]?

Question 6 3 points
The total revenue generated by all products in all of the markets can be expressed in the following linear manner in terms of the decision variables, where the ‘?’ (question marks) and [Q6] are some constants (known numbers), that are not necessarily the same. What number should go into [Q6]?

Checkpoint 1: Objective function
Based on your responses to Questions 4, 5, and 6, you should be able to write out your objective function, which is the revenue minus the total cost (comprising production cost and transportation cost).

Question 7 3 points
The constraint on the total effort required to manufacture all of the products at the factory can be expressed in the following linear manner in terms of the decision variables, where the ‘?’ (question marks), [Q7] and [Q8] are some constants (known numbers), that are not necessarily the same. What number should go into [Q7]?

Question 8 3 points
Looking at the same equation in Question 7, what number should go into [Q8]?

Question 9 8 points
For the constraint limiting the number of each product sold to at most the demand, for now, we will use the average demand that you can calculate from your data set. What is the mean demand for Product D in Market X?

Question 10 3 points
The constraint that the quantity of Product A that is transported to Market V should not exceed the demand for Product A at Market V can be expressed in the following linear manner in terms of the decision variables, where the ‘?’ (question marks), [Q10] and [Q11] are some constants (known numbers), that are not necessarily the same. What number should go into [Q10]?

Question 11 8 points
Looking at the same equation in Question 10, what number should go into [Q11]?

Question 12 8 points
Apart from the non-negativity constraints (which constrain all of the x-variables to be at least zero), how many constraints would your optimization model have? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file.
A We would have 2 constraints, one constraining the total quantity of products by the effort and one constraining the total quantity of products by the demand.
B We would have 6 constraints, one constraining the total quantity of products by the effort and 5 constraining the total quantity of products by the demand at each of the markets.
C We would have 11 constraints, 6 constraining the total quantity of products by the effort for each of the products and 5 constraining the total quantity of products by the demand at each of the markets.
D We would have 31 constraints, one constraining the total quantity of products by the effort and 30 constraining the total quantity of products by the demand at each of the markets for each of the products.

Checkpoint 2: Constraints and optimization model
Based on your responses to Questions 7 to 12, you should be able to write out the full optimization model.

Question 13 21 points
Solve for the optimal decisions x for the company. What is the optimal profit that the company can earn?

Question 14 3 points
What is the optimal number of Product D to transport to Market Z?

Question 15 3 points
What is the optimal total quantity of Product E to produce at the factory?

Question 16 6 points
Generate a sensitivity report for your optimization model. It turns out that the total effort required for production in the factory is measured in FTE (full time equivalents). In other words, the total effort of 30 actually reflects 30 workers.
Management is currently considering the possibility of hiring an additional part-time worker at 0.5 FTE to bring the total number of workers to 30.5. Without solving a new model, what is the highest monthly salary that should be offered to this worker so that hiring the worker would be cost effective? If it is not possible to infer this value from the sensitivity report, key in 999 into your answer sheet.

Question 17 3 points
In building our optimization model, we have assumed that the demand is the average demand. However, in the data set, we actually observe a range of values of the demand. It might be necessary to see what impact using the average demand has on the optimal decision. What is the fifth smallest demand of Product C in Market Y observed in the data set?

Question 18 8 points
Based on the sensitivity report and without solving a new model, what is the consequence for the optimal profit that the company earns should the demand for Product C in Market Y not be the average demand, but rather only the 10th percentile amongst all historic demand observed? Choose the most appropriate answer from the following responses and key in one of A, B, C or D into your Excel file.
A There would be no impact on the optimal profits because at present, the demand constraint for Product C in Market Y is not tight and the change in the demand is within the allowable decrease.
B Optimal profits will fall, however, it is unclear by how much, because the demand constraint for Product C in Market Y is at present not tight, but the change in the demand lies outside the allowable decrease.
C Optimal profits will fall and the fall is given per unit by the shadow price, because the demand constraint for Product C in Market Y is at present tight, and the change in demand is within the allowable decrease.
D Optimal profits will fall, however, it is unclear by how much, because the demand constraint for Product C in Market Y is at present tight, but the change in the demand lies outside the allowable decrease.
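
The model described above has a single capacity constraint (effort) plus an upper bound (demand) on each decision variable, so, assuming fractional quantities are allowed, it can be solved by filling product-market pairs in decreasing order of profit per unit of effort. The sketch below is an illustration under stated assumptions, not the method the assignment prescribes (the assignment works in Excel): the parameter tables are copied from the text, with the last cost column read as Product F; the demand rows are hypothetical and not taken from any student's data set; and the names `mean_demand`, `unit_margin` and `solve_greedy` are invented for this example.

```python
from collections import defaultdict

PRODUCTS = "ABCDEF"
MARKETS = "VWXYZ"
CAPACITY = 30.0  # total effort available per month

# Parameters copied from the tables in the assignment text.
COST = dict(zip(PRODUCTS, [1.1, 1.5, 1.3, 1.2, 1.0, 1.4]))
EFFORT = dict(zip(PRODUCTS, [0.0038, 0.0031, 0.0036, 0.0032, 0.0033, 0.0030]))
REVENUE = {
    "A": [3.2, 2.6, 2.7, 3.0, 3.4],
    "B": [2.7, 2.9, 3.4, 3.2, 2.6],
    "C": [2.9, 3.3, 2.7, 3.0, 3.1],
    "D": [3.1, 3.4, 3.1, 2.8, 2.8],
    "E": [3.1, 3.4, 3.2, 3.4, 2.6],
    "F": [3.3, 2.5, 3.0, 3.5, 3.4],
}
TRANSPORT = {
    "A": [1.0, 1.4, 0.7, 0.6, 1.1],
    "B": [1.0, 1.5, 0.9, 0.8, 0.6],
    "C": [0.8, 0.9, 1.5, 0.8, 0.7],
    "D": [0.9, 0.5, 1.2, 0.7, 0.6],
    "E": [1.0, 0.6, 1.4, 1.2, 0.7],
    "F": [0.8, 1.4, 0.6, 1.3, 1.3],
}


def mean_demand(rows):
    """Average demand per (product, market) from (product, market, demand) rows."""
    totals, counts = defaultdict(float), defaultdict(int)
    for product, market, demand in rows:
        totals[(product, market)] += demand
        counts[(product, market)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def unit_margin(product, market):
    """Profit per unit: revenue minus production cost minus transport cost."""
    j = MARKETS.index(market)
    return REVENUE[product][j] - COST[product] - TRANSPORT[product][j]


def solve_greedy(demand, capacity=CAPACITY):
    """Fill profitable pairs in decreasing order of margin per unit of effort."""
    pairs = [(p, m) for p in PRODUCTS for m in MARKETS if unit_margin(p, m) > 0]
    pairs.sort(key=lambda pm: unit_margin(*pm) / EFFORT[pm[0]], reverse=True)
    plan, profit, remaining, shadow_price = {}, 0.0, capacity, 0.0
    for p, m in pairs:
        qty = min(demand[(p, m)], remaining / EFFORT[p])
        plan[(p, m)] = qty
        profit += qty * unit_margin(p, m)
        remaining -= qty * EFFORT[p]
        if qty < demand[(p, m)]:
            # Capacity ran out on this pair: its ratio prices one more unit of effort.
            shadow_price = unit_margin(p, m) / EFFORT[p]
            break
    return plan, profit, shadow_price


# Hypothetical demand rows: two months per pair, NOT real assignment data.
rows = [(p, m, d) for p in PRODUCTS for m in MARKETS for d in (900.0, 1100.0)]
plan, profit, shadow_price = solve_greedy(mean_demand(rows))
print(round(profit, 2), round(shadow_price, 2))
```

When the effort constraint is binding, the returned `shadow_price` plays the role of the shadow price of effort in a sensitivity report: a small change in capacity changes profit by roughly that amount per unit of effort, as long as the change stays within the allowable range. Pairs with a negative margin are never shipped, and a zero `shadow_price` signals that capacity is not the limiting factor.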
ECON3106 Politics and Economics
Exercises
August 8, 2019

1 Bayes' Politics

An incumbent politician is of type g with probability π. Voters want to re-elect her only if she is g but cannot observe it directly. Yet, they know that a g politician would choose alternative A with probability p. Any other type of politician chooses A with probability q.

1.1 What is the probability that an incumbent who has chosen A is of type g?

1.2 In which case should the voters re-elect the incumbent if they observe A?

Now, assume that there are two possible states: 0 and 1. State 0 is exactly as above. State 1 is different, because in state 1 a g politician never chooses A (other politicians still choose A with probability q). The probability of state 0 is r.

1.3 How would your two previous answers change?

2 Bayes' Politics II

An incumbent chooses between two alternatives, A and B. There are two possible states: θA and θB. The probability of state θA is π = Pr(θA). The incumbent knows which state is true, but the voters cannot observe the state. The incumbent can be of two types: a and b. A b incumbent always chooses B.

2.1 Assume that an a incumbent chooses A when the state is θA and B when the state is θB. If you observe A being chosen, what is the probability that the incumbent is of type a?

2.2 With the same assumption, if you observe B being chosen, what is the probability that the incumbent is of type a?

Now, assume that the voters always reelect the incumbent if they observe A. Also, the incumbent of type a cares only about being reelected: he gets a payoff of R > 0 if reelected and 0 otherwise.

2.3 Under this assumption, which alternative would a type a incumbent choose?

3 In democracy, politicians are accountable to voters, who can choose to replace them when an election comes. Some judges and central bankers are instead appointed for a fixed term and are not accountable to voters. Explain very briefly (a few lines) pros and cons of political accountability.
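
The Bayes' rule exercises all follow the same pattern: weight each type's prior by the probability that this type produces the observed choice, then normalize. As a minimal sketch for the setting of Exercise 1 (the function name `posterior_good` is hypothetical, and the numerical values are illustrative rather than given in the exercise):

```python
def posterior_good(pi, p, q):
    """P(type g | A observed), with prior pi, P(A | g) = p and P(A | not g) = q."""
    evidence = pi * p + (1 - pi) * q  # total probability of observing A
    if evidence == 0:
        raise ValueError("A is never chosen under these parameters")
    return pi * p / evidence


# Illustrative values only: prior 0.5, g chooses A twice as often as other types.
print(posterior_good(0.5, 0.8, 0.4))
# When p == q the observation is uninformative and the posterior equals the prior.
print(posterior_good(0.3, 0.5, 0.5))
```

Observing A raises the voters' belief above the prior exactly when p > q, which is the comparison that matters for the re-election decision.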

4 Bayes' Rule and Pandering

There are two states of the world, θ ∈ {θA, θB}. Although voters cannot observe the state, they know that P(θ = θA) = 0.8. The incumbent knows the state and must decide between policies A and B. Politicians can be of type b or type g with equal probability. Type b politicians always choose i to mismatch with θi whereas type g always chooses policy A.

4.1 What do voters believe about the incumbent's type if they observe policy choice A? Use Bayes' Rule.

4.2 If voters only care about selecting type g politicians, how should they vote to maximise their chances of a type g politician? Note that a comprehensive strategy for voting should explain what to do depending on what the incumbent has chosen.

4.3 What do the voters believe about the incumbent's type if they observe policy choice A? Use Bayes' Rule.

4.4 If voters only care about selecting type g politicians, how should they vote to maximise their chances of a type g politician?

4.5 If type g politicians are purely office-motivated, which of these two strategies should they use?

5 How can campaign advertising affect what voters believe about a candidate?

6 Why should better pay affect the selection of politicians and the behavior of elected politicians once in office? You can base your answer on the results found in the following paper: Motivating Politicians: The Impacts of Monetary Incentives on Quality and Performance - Ferraz and Finan (2009) (100 words)

7 NGOs that provide food and other supplies to countries in need often worry that they are not helping but hurting the recipients of help. How can aid of this type be counterproductive? Use the results found in US Food Aid and Civil Conflict - Qian, Nunn (2014) in order to answer this question (100 words)