Faculty of Computing, Engineering & Media (CEM) Coursework Brief 2024/25
Module name: Mobile Communication 1
Module code: ENGD3105
Title of the Assessment: Design and Analysis of Image Communication System (Maximum Marks: 50)
Issue date: 16/10/2024
Submission deadline: 13/12/2024, noon
Feedback date: 08/01/2025

Aim: In this assignment, you will design, implement, and test an image communication system consisting of a channel coder, modulator, demodulator, and channel decoder. You will use MATLAB for your implementation.

What to submit: Your coursework must be submitted as a report in Word format. Your MATLAB code must be provided as text (not pictures) and included as appendices at the end of the report. All code lines must be commented to explain them. Your answers must be supported by references as appropriate. All references must be in IEEE style. Please note that 5 marks will be allocated to the presentation and organisation of your report. Your report, excluding the cover page and MATLAB code, must not exceed eight pages (minimum font size 11 pt).

1) Design and implement in MATLAB a communication system that uses channel coding with an LDPC code and digital modulation with QPSK to transmit digital images over an additive white Gaussian noise (AWGN) channel. Your system must be implemented as a function. You must clearly describe all the components of your system and motivate your design. [20 marks: Design: 10 marks, quality of the implementation: 10 marks]

2) Use your system to simulate the transmission of the image Fruits.jpg. Discuss the results. [10 marks: Design of the simulation: 2 marks, discussion of the results: 8 marks]

3) Replace the LDPC code with a Turbo code and compare the results to those in 2).
[15 marks: Design: 3 marks, quality of the implementation: 4 marks, discussion of the results: 8 marks]

The following MATLAB documentation can help you with your coursework:
· Reading images: https://uk.mathworks.com/help/matlab/ref/imread.html
· LDPC encoder: https://uk.mathworks.com/help/comm/ref/ldpcencode.html
· LDPC decoder: https://uk.mathworks.com/help/comm/ref/ldpcdecode.html
· Turbo encoder: https://uk.mathworks.com/help/comm/ref/comm.turboencoder-system-object.html
· Turbo decoder: https://uk.mathworks.com/help/comm/ref/comm.turbodecoder-system-object.html
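As a structural illustration of task 1, the sketch below builds only the modulation and channel stages of such a link in Python/NumPy. The coursework itself must be written in MATLAB, the LDPC encode/decode stages are omitted here (so this is an uncoded link), and the Eb/N0 value and bit count are arbitrary choices, not requirements from the brief.

```python
import numpy as np

def qpsk_awgn_demo(num_bits=10_000, ebn0_db=6.0, seed=0):
    """Toy end-to-end link: random bits -> Gray-mapped QPSK -> AWGN -> hard demod.
    Channel coding is omitted; in the coursework an LDPC (or Turbo) encoder
    would sit between the bit source and the modulator."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, num_bits)

    # Gray-mapped QPSK: pairs of bits -> one unit-energy complex symbol.
    b = bits.reshape(-1, 2)
    symbols = ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

    # AWGN with noise variance set from Eb/N0 (2 bits/symbol, Es = 1,
    # so Es/N0 = 2*Eb/N0 and N0 = 1/(2*Eb/N0)).
    ebn0 = 10 ** (ebn0_db / 10)
    n0 = 1 / (2 * ebn0)
    noise = np.sqrt(n0 / 2) * (rng.standard_normal(symbols.shape)
                               + 1j * rng.standard_normal(symbols.shape))
    rx = symbols + noise

    # Hard-decision demodulation: sign of the I and Q components.
    rx_bits = np.empty_like(b)
    rx_bits[:, 0] = (rx.real < 0).astype(int)
    rx_bits[:, 1] = (rx.imag < 0).astype(int)
    return np.mean(rx_bits.ravel() != bits)   # bit error rate

print(qpsk_awgn_demo())  # BER near the theoretical Q(sqrt(2*Eb/N0)) for uncoded QPSK
```

In the MATLAB version, the image would be read with imread, flattened to a bit stream, passed through ldpcEncode before the mapper and ldpcDecode after the demapper, then reassembled into an image.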
Fundamentals of Photonics Fall 2024 Project Description
Due: December 4th, 4 pm

Design a fiber-waveguides-fiber transmission system based on silicon (n = 3.5) and SiO2 (n = 1.5), with ultra-large bandwidth, compact (total area of less than 1 mm2) and robust to +5 nm variations.

The system is composed of:
· a fiber-waveguide coupler
· 1x2 splitters
· 1 mm long waveguides

Your goal is to ensure that:
A. The time delay for all channels is uniform, for all wavelengths. This is determined by the dispersion of the 1 mm waveguide.
B. The transmission for all channels is uniform, and maximum. This is determined mainly by the splitting ratio and by the fiber-waveguide coupling, respectively.

Your input laser emits 10 mW of light from 1300 nm to 1700 nm. At the output of each channel there should be at least 1 mW (due to the responsivity of the fiber-coupled detectors). Assume that the output waveguides and the input laser are coupled to fibers with a mode profile that extends in one dimension (x) as E(x) = exp(-x^2/x0^2), where x0 = 5 microns. In the other direction the mode extends infinitely (i.e. it is a slab-like waveguide). For all waveguides the bottom cladding is 3 microns thick.

Instructions:
· When using bends, the minimum bending radius is 5 microns for the maximally confined single-mode waveguide and 20 microns for a 10 times thinner waveguide. To find the radius for waveguides with in-between widths, linearly interpolate between these two extremes.
· Start by designing each of the components for 1500 nm and then optimize them for wide bandwidth. Finally, check (and optimize further if needed) robustness to variations.
· When considering variations of +5 nm, assume that all dimensions in all waveguides vary by 5 nm.
· Use normalized parameters for all your calculations.

For the fiber-waveguide coupler:
· Neglect reflections when calculating the transmission (i.e. include only the overlap). Also assume that both waveguide and fiber are slab waveguides, i.e. infinite in one dimension.
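The "overlap only" prescription for the coupler can be evaluated numerically. The Python sketch below computes the power coupling between two slab modes from the overlap integral; as a placeholder assumption, the waveguide mode is modelled as a second Gaussian of adjustable width rather than the true slab eigenmode you would obtain from the dispersion relation.

```python
import numpy as np

def overlap_transmission(a_um, b_um):
    """Power coupling between two 1-D (slab) modes via the overlap integral,
    T = |integral(E1*E2 dx)|^2 / (integral(|E1|^2 dx) * integral(|E2|^2 dx)).
    Both modes are approximated as Gaussians exp(-x^2/w^2); the real waveguide
    mode would come from the slab-waveguide dispersion relation."""
    x, dx = np.linspace(-50.0, 50.0, 20001, retstep=True)  # microns
    e1 = np.exp(-(x / a_um) ** 2)
    e2 = np.exp(-(x / b_um) ** 2)
    num = (np.sum(e1 * e2) * dx) ** 2
    den = (np.sum(e1 ** 2) * dx) * (np.sum(e2 ** 2) * dx)
    return num / den

# Perfect mode match gives T close to 1; mismatch lowers the coupling.
print(overlap_transmission(5.0, 5.0))
print(overlap_transmission(5.0, 1.0))   # analytic value is 2ab/(a^2 + b^2)
```

For Gaussian modes the integral has the closed form T = 2ab/(a^2 + b^2), which is a convenient check on the numerics.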
· For the 1x2 splitter, show (using plots) which dimension is most sensitive to variations, and then vary only that dimension. Calculate the change in amplitude at a given propagation length and then integrate the function numerically.
· For the 1 mm waveguide with minimal dispersion, in order to find the optimal change of width as a function of propagation length:
o I. Choose a profile of the waveguide, discretize the waveguide along the propagation length every wavelength (this ensures that the discretization is fine enough while not requiring you to check the optimal discretization steps) and average the waveguide width over each discretized section.
o II. Calculate the amplitude in a higher-order mode B exiting the discretized section, considering that the input into the section is a fundamental mode A. Note that when calculating the amplitude in the higher-order mode, the propagation length of a discretized section is much smaller than the coupling rate; therefore, the amplitude in the higher-order mode should be independent of the discretization.
o III. If B/A is higher than 1%, change the profile to decrease this ratio and repeat the process from II.
· Decide what wavelength range your laser will operate over (within the larger range of 1300-1700 nm). Then decide how many channels your system will have. Build a 1xn (n > 2) splitter by cascading 1x2 splitters.
· State the total bandwidth supported by your system (in THz). Note that 1 nm of bandwidth is equal to 133 GHz of bandwidth.
· Plot a schematic of the whole system (include all relevant dimensions in the schematic) and estimate the total area of the system.
· Plot the transmission out of each channel as a function of wavelength for both the optimized and non-optimized geometry.
· Plot the time delay of each channel (assume that the delay is only induced by the 1 mm long waveguide) as a function of wavelength for both the optimized and non-optimized geometry.
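Steps I-III for the 1 mm waveguide can be organized as the numeric skeleton below (Python, for illustration). The linear width profile and the coupling constant `kappa` are made-up placeholders; in the actual project the higher-order-mode amplitude B must come from your own mode-coupling calculation, not from this proxy.

```python
import numpy as np

def discretize_taper(width_profile, length_um, wavelength_um):
    """Step I: sample a width profile w(z) every wavelength along the
    waveguide and average the width over each discretized section.
    `width_profile` is any callable w(z) in microns."""
    n_sections = int(length_um / wavelength_um)
    edges = np.linspace(0.0, length_um, n_sections + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Average w over each section by sampling finely inside it.
    avg_widths = np.array([
        np.mean(width_profile(np.linspace(z0, z1, 11)))
        for z0, z1 in zip(edges[:-1], edges[1:])
    ])
    return centers, avg_widths

def max_mode_conversion(avg_widths, kappa=0.05):
    """Steps II-III: estimate the worst-case higher-order-mode amplitude ratio
    B/A across sections. Here B/A is taken proportional to the width change
    between sections; `kappa` is a placeholder, not a derived coupling rate."""
    ratios = kappa * np.abs(np.diff(avg_widths))
    return ratios.max() if len(ratios) else 0.0

# Demo: a hypothetical linear taper from 0.5 um to 0.4 um over 1 mm at 1.5 um.
taper = lambda z: 0.5 - 0.1 * z / 1000.0
centers, widths = discretize_taper(taper, 1000.0, 1.5)
print(len(widths), max_mode_conversion(widths) < 0.01)
```

If `max_mode_conversion` exceeds the 1% target, step III says to smooth the profile (reduce the per-section width change) and repeat from step II.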
The time delay is given by the group delay of the 1 mm waveguide, tau(lambda) = L*ng(lambda)/c, where ng = neff - lambda*(d neff / d lambda) is the group index.

The project should be done either individually or in teams (preferred) of up to 3 members. Please include a section that clearly explains the role of each group member. The project should be done using your favorite code-writing software (MATLAB, Mathematica, etc.) for your calculations and for plotting your data. All assumptions and choices you made (dimensions, number of waveguides, etc.) should be discussed. Neglect reflections.
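A minimal numeric recipe for the channel time delay, assuming an effective-index curve neff(lambda) obtained from your slab-waveguide solution; the quadratic model used in the demo is only an illustrative stand-in, not a solved dispersion relation.

```python
C_UM_PER_PS = 299.792458  # speed of light in microns per picosecond

def group_delay_ps(neff, wavelength_um, length_um=1000.0, dl=1e-4):
    """Group delay tau = (L/c) * ng, with the group index
    ng = neff - lambda * d(neff)/d(lambda) evaluated by a central finite
    difference. `neff` is any callable neff(lambda_um)."""
    dneff = (neff(wavelength_um + dl) - neff(wavelength_um - dl)) / (2 * dl)
    ng = neff(wavelength_um) - wavelength_um * dneff
    return length_um * ng / C_UM_PER_PS

# Demo with an illustrative (made-up) dispersive effective index.
neff_model = lambda lam: 2.8 - 0.4 * (lam - 1.5) + 0.1 * (lam - 1.5) ** 2
for lam in (1.3, 1.5, 1.7):
    print(f"{lam} um: {group_delay_ps(neff_model, lam):.3f} ps")
```

Plotting tau(lambda) for each channel over 1300-1700 nm, for both the optimized and non-optimized geometry, gives the deliverable requested above; a flat curve means uniform delay across wavelength.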
Real Estate Valuation and Appraisal (URBAN5040) Project Brief 2024-25

Project Aims
The aim of the project is to introduce you to the complex operation of the investment, use and development markets by applying the techniques and methods associated with valuations and appraisals to solve real-life problems. It also gives you the opportunity to analyse and reflect on the various challenges inherent in completing contemporary valuations and appraisals. There will be two elements to the submission:
1) Calculations - Applying valuation and appraisal techniques, through data collection, market analysis and assumptions, to complete the valuation and appraisal of an office building, industrial unit, residential property and hotel, submitted as one complete Excel spreadsheet (50%).
2) Critical reflections - Preparing responses to two statements in relation to valuation and appraisal in practice. These will be short academic essays with clearly defined and well-informed positions, a maximum of 750 words each, excluding references (50%).
Any additional information that informs parts one and two can be included in an appendix, as and when relevant and appropriate. More detail on the breakdown of these elements can be found below in terms of marking guidance and criteria.

Output Objectives
When you successfully complete the project, you should be able to:
• Systematically examine and compare real estate market conditions in Glasgow and the Central Belt, across varied sub-sectors.
• Understand how the performance of real estate markets is connected to broader economic trends at the urban, regional, and national levels, and the challenges in analysing data and trends.
• Be proficient in the collection, interpretation, synthesis and preparation of material from a variety of diverse sources.
• Undertake the analysis and critical evaluation of market data.
• Put into practice the income, direct comparison and profits methods of valuation, and
• Critically appraise investments using a DCF model.
• Evaluate a range of academic sources and establish distinct critical positions and arguments in relation to two statements regarding valuation processes in the real estate market today.

Project Details - Part One - Valuations and Appraisal
You have been appointed by NAPT Property Investment Ltd. This is a Glasgow-based private property investment company looking for advice and guidance on the management of their portfolio. They have asked you to undertake a DCF appraisal of the investment value and a traditional market valuation of the following investment opportunity.

G1 Building, 5 George Square, Glasgow, G2 1DY
This 5-star (CoStar rating) award-winning building is arranged over eleven floors, offering office accommodation over the first to eighth floors and a bar covering the ground floor and basement. The building has a NIA of 131,448 ft2 and contains 10,660 m2 (113,663 ft2) of high-quality office space and 1,713 m2 (18,435 ft2) of space currently let as a bar/restaurant, with a total of 61 covered car parking spaces. The particulars of the building can be found on Moodle, and there is also more information available on CoStar. For the purpose of this valuation, the current tenancy schedule is as follows:

B & G: 1,713 sq m (18,435 sq ft), currently let to Browns Bar & Brasserie. The 15-year lease was agreed on Monday 12th November 2018 at £515,000 and is quoted on a net basis. The annual rent is paid in arrears and index-linked to the Consumer Price Index (CPI).

Levels 1 & 2: 2,852 sq m (30,702 sq ft), currently let and occupied by The Scottish Herald newspaper. These floors were let on an IRI basis on a lease signed at the end of August 2012 for a period of 20 years with rent reviews every five years. The current passing rent is £550,000 per annum, paid in advance. Seventeen parking spaces are attached to this lease.
Levels 3, 4 & 5: These three floors (4,744 sq m / 51,074 sq ft) are currently let and occupied by Gardner Investment Management. These floors were let on an FRI basis on a lease signed in early March 2014 for a period of 14 years with rent reviews every seven years. The current passing rent is £1,639,000 per annum, paid in arrears. Twenty-one parking spaces are attached to this lease.

Levels 6, 7 & 8: 3,723 sq m (40,074 sq ft) across these three floors was let on 18th March 2019 to Amazon. The 15-year lease is currently set at £1,030,000 (IRO) per annum in arrears and has five-yearly rent reviews. This space is let with 15 car parking spaces.

You estimate a current gross initial yield of 5.5%, an implied growth rate of 2% per annum on non-index-linked rents, a long-run CPI of 2.25% and a suitable target rate of return of 9%. You also estimate it will take 12 months after an existing lease ends to find a new good-quality tenant under current Glasgow office market conditions, and you are required to explicitly allow for rental income voids. All leases across levels 1-8 are quoted in current terms. The seller is asking for offers over £65,000,000.

Your client has asked you to critically appraise the market rent for the office space in the Glasgow Core submarket. In addition, you are required to recommend how much your client should offer to purchase this asset, and to evaluate and compare the rents achievable, market conditions and rental growth prospects for prime offices in Glasgow. This recommendation of the offer should come once you have completed both of the valuations for George Square, and be clearly included in your spreadsheet. Your Excel spreadsheet should include (and clearly link in through formulae) all evidence and figures used to inform the assumptions made in relation to the valuation inputs.
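To make the DCF mechanics concrete, here is a deliberately simplified single-lease cash flow in Python. It ignores voids, purchase and disposal costs, indexation and parking, all of which the brief requires you to add in the spreadsheet, and every figure passed in the demo call is illustrative only, not a recommended input.

```python
def dcf_value(rent, growth, target_rate, exit_yield, hold_years, review_every=5):
    """Toy DCF for a single lease: rent is fixed between reviews, uplifted at
    each review by accumulated growth, and the asset is sold at the exit yield
    after the holding period. Rent is assumed paid annually in arrears."""
    pv = 0.0
    current_rent = rent
    for year in range(1, hold_years + 1):
        if year > 1 and (year - 1) % review_every == 0:
            current_rent = rent * (1 + growth) ** (year - 1)  # uplift at review
        pv += current_rent / (1 + target_rate) ** year        # discount each year
    exit_rent = rent * (1 + growth) ** hold_years
    pv += (exit_rent / exit_yield) / (1 + target_rate) ** hold_years  # sale proceeds
    return pv

# Illustrative numbers only (rent, 2% growth, 9% target, 4.5% exit, 15 years).
print(round(dcf_value(550_000, 0.02, 0.09, 0.045, 15)))
```

A useful sanity check: with zero growth and an exit yield equal to the target rate, the DCF collapses to the simple perpetuity value rent/rate, whatever the holding period.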
In addition to the investment appraisal and valuation of this prime office, your client has asked you to undertake the valuation of a series of investments in Glasgow and Edinburgh. They would like to know the market value of the following interests:

97/7 Montgomery Street, Hillside, Edinburgh, EH7
This third-floor flat forms part of a traditional tenement in Edinburgh, within walking distance of Edinburgh city centre and many local amenities. It is bright, contemporary and well proportioned. Hillside is a central Edinburgh address, close to Leith and Calton Hill, and is popular with young professionals looking to take advantage of city life. There is a selection of bars, restaurants, coffee shops and shops on the property's doorstep, with very good transport connections (tram and bus), and Edinburgh Waverley train station only a short walk away. The spacious property consists of 63.92 sq m (688 sq ft) gross internal floor area, which contains two double bedrooms (3.38m x 2.64m / 2.95m x 1.93m), a separate WC and bathroom. The property boasts an open-plan kitchen/dining room (5.13m x 3.20m) and a large living room (4.52m x 3.45m). The property is finished to a very high standard and has many period features. There is a security entry system leading to a secure residents' communal staircase. The flat is in excellent condition, currently vacant and available for sale, has a C Council Tax Band and has a communal garden and on-street parking.

70 Cambuslang Road, Glasgow, G32 8NB
The subject property, built in 2006, comprises a single warehouse unit within an established industrial location, incorporating accommodation with a secure yard and car parking. The warehouse area is currently leased to DHL, who occupy 5,109 sq m / 55,000 sq ft (GIA). The site was let on 3rd March 2015 on an FRI basis on a 15-year lease with five-yearly rent reviews (last review 2020). The current rent was set at £6 per sq ft, equivalent to £330,000 per annum.
Your client is interested in buying both the heritable and leasehold interests.

Radisson Blu Hotel, 275-309 Argyle Street, Glasgow, G2 8DP
This property, fronting on to St Enoch Square and within walking distance of Glasgow Central Station, comprises a new hotel accommodating 249 bedrooms, a ground floor reception, restaurant and bar. The property is owned by your client but rented to Whitbread (who own the Premier Inn chain) until March 2040. After allowing for staff and other running costs, you estimate that annual earnings before interest, taxation, depreciation and amortisation (EBITDA) of £3,150,000 represent a fair and maintainable profit for a reasonably competent operator in this property. You also estimate that 5.5% is a suitable capitalisation rate for the owner's interest, whereas 9% is more suitable for the leasehold interest. You are required to value both the heritable and leasehold interests.

You are not required to produce a full professional valuation report. For part one, you are required to present annotated valuations that explain your assumptions, contain a critical analysis of comparables and provide clear sources for the inputs being used. How you present the inputs for each of the valuations is entirely up to you, although it is best practice to have each valuation on a separate sheet in your Excel file. All the assumptions and market evidence could be housed on one page and then connected to the worksheets, or you might want to have the related information on the same sheet as your calculations. When determining your rents, yields and other inputs, you need to clearly state how you have arrived at these from market evidence, and note any other assumptions being used, such as those listed below.

Additional information:
a.
A traditional valuation (10%) and DCF investment appraisal (15%) for the George Square property, which assumes that the investor plans to buy the property and sell after a holding period of 15 years. Your appraisal must also account for 5.8% purchase costs and a further 2.5% for disposal costs at the end of the holding period. Forecasts predict that suitable exit yields, post refurbishment, will be around 4.5% for the George Square office. You should also allow 5% of the market rent for rent review costs at every rent review (over and above annual management costs). Also, do not forget to include the car parking in your valuations and appraisals. These spaces have value too. You estimate each space has a market price of £18,000 at the valuation/appraisal date.
b. Valuations of the market values, as of 4th December 2024 (submission date), for the property interests your client has asked you to value. These should be accompanied by a critical discussion of your comparable evidence, assumptions and methods employed.
c. When valuing the leasehold interest at 70 Cambuslang Road, your client has asked you to use a suitable Year's Purchase formula that makes adjustment for a 2.5% sinking fund and the 21% rate of corporation tax for which the current occupier is liable. Remember, current practice tends to use a single-rate Year's Purchase formula to value leaseholds, but your client has expressly requested a dual-rate Year's Purchase with an allowance for taxation.
d. When valuing the landlord's and tenant's interests in the Radisson Blu Hotel, Glasgow, assume that the rent represents 65% of the Fair Maintainable Operating Profit (FMOP).
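The dual-rate Year's Purchase adjusted for tax requested for the Cambuslang Road leasehold follows a standard textbook formula, sketched below in Python. The profit rent and unexpired term in the demo are hypothetical numbers for illustration, not inputs from the brief.

```python
def yp_dual_rate_taxed(i, s, t, n):
    """Years' Purchase, dual rate adjusted for tax:
        YP = 1 / (i + ASF / (1 - t)),
    where ASF = s / ((1 + s)^n - 1) is the annual sinking fund that replaces
    1 unit of capital over n years at the accumulative rate s, grossed up
    for tax at rate t; i is the remunerative (leasehold) rate."""
    asf = s / ((1 + s) ** n - 1)
    return 1 / (i + asf / (1 - t))

# Illustrative use only: a hypothetical profit rent capitalised over a
# hypothetical 6 years unexpired, at the rates given in the brief
# (9% leasehold rate, 2.5% sinking fund, 21% corporation tax).
profit_rent = 50_000  # assumed figure, not from the brief
yp = yp_dual_rate_taxed(i=0.09, s=0.025, t=0.21, n=6)
print(round(profit_rent * yp))
```

Note that as the unexpired term grows, the sinking fund element vanishes and the YP tends to the single-rate value 1/i, which is a quick sanity check on the spreadsheet version.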
For part one of the coursework the grading will be broken down as follows:
- Traditional office valuation for George Square (10%)
- Discounted cash flow for George Square (15%)
- Residential valuation for Montgomery Street (7.5%)
- Industrial valuation - leasehold (5%) and freehold (5%)
- Hotel valuation (7.5%)
- Total for calculations - 50% of course grade.

Project Details - Part Two - Critical Reflections
This part of the submission requires you to prepare two short critical reflections on the statements below. These should be considered independently of each other, as separate responses, rather than merging any details together. Please prepare responses of 750 words maximum for each. The word count does not include any references or visuals/figures you may choose to include to enhance your work or demonstrate key points. However, if you include text within tables, for example, this will contribute to your word count. Harvard referencing should be used throughout, and a full reference list provided. There is no hard and fast rule on the number of references - you need to find the balance between content and references which works best for you. However, it will not work in your favour if you fail to include any, or only very limited, references. Evidence of additional reading around the topic areas is key to communicating your perspectives in an informed way, and references help to support your points. These reflections should clearly state a variety of perspectives and bring together the challenges, barriers, positives and negatives which reflect the complexity of the real estate market and the practicalities of valuation and appraisal processes. Part of the challenge is successfully doing this within a limited number of words - you need to be considered and thoughtful in what you choose to include. Clarity is key. Provide responses to each of the following:
1.
Critically evaluate the strengths and weaknesses of both contemporary and traditional approaches to valuation, considering how effectively they capture the 'value' of real estate across different asset classes.
2. Reflect on the various challenges and barriers to collecting and collating appropriate real estate data for valuations and appraisals - how can we understand what data is appropriate in global markets?
CS-350 - Fundamentals of Computing Systems
Homework Assignment #8 - BUILD
Due on November 14, 2024 - Late deadline: November 16, 2024 EoD at 11:59 pm

BUILD Problem 1
We now have a server capable of performing non-trivial operations on images! All we have to do now is to make the server multi-threaded. And yes, we already have the infrastructure to spawn multiple threads. So what's missing exactly?

Output File: server_mimg.c

Overview. As mentioned above, the main idea is to allow multiple workers to perform operations on images in parallel. Everything you have developed in BUILD 6 will be reused, but we must make sure that when multiple worker threads are spawned, the correctness of the operations on the images is still guaranteed. But what could be jeopardizing the correctness of these operations? Let us consider a concrete case. Imagine the following sequence of requests queued at the server: (1) retrieve image A, (2) blur image A, (3) detect the vertical edges of image A, and (4) send back the result of the operations performed on image A. With only one worker, the operations are carried out in this sequence and the result sent back to the client is an image with the cumulative operations (2) and (3) correctly applied to the source image. With 2 workers (unless we fix our implementation) we could have workers #1 and #2 working in parallel on operations (1) and (2), with the result of the operations being some weird mix of the two. In this assignment, the goal is to allow safe multi-threading where the semantics of sequential operations on the images are preserved even if multiple threads are spawned and operate in parallel. For this task, we will use semaphores to perform inter-thread synchronization.

Design. One of the main problems that we have to solve is un-arbitrated access to shared data structures.
To verify that there is a problem unless we insert synchronization primitives accordingly, start with your (or my) solution for HW6, rename it appropriately, and enable spawning more than 1 worker thread. Then, run the following simple experiment. First, run the client to generate the sequence of operations listed above with 1 worker thread and look carefully at the output report generated by the client:

./server_mimg -q 100 -w 1 2222 &
./client -I images/ -L 1:R:1:0,0:b:1:0,0:v:1:0,0:T:1:0 2222

You will notice that the first hash reported by the client (9f3363f0249c15163d52e60fd9544c31) is simply the hash of the original test1.bmp image. The second (and last) hash reported by the client is the hash (00e4fc4b9c7c71ee2ca3946053f78793) of the blur + vertical edge detection operations applied in sequence to the image. However, if we increase the number of workers to 2, the final hash will be different! For instance, when running:

./server_mimg -q 100 -w 2 2222 &
./client -I images/ -L 1:R:1:0,0:b:1:0,0:v:1:0,0:T:1:0 2222

the last hash obtained on the reference machine changes to b5932c2bcb0a64121def911286c706e2, but might be something else entirely on a different machine. In some cases, the server even crashes entirely. To solve the problem, the cleanest way is to introduce semaphore-based synchronization between threads. In order to define a semaphore, you should use the type sem_t defined in semaphore.h. Before a semaphore can be used, it must be initialized. This can be done with the sem_init(sem_t * semaphore, int pshared, unsigned int init_value) call. Here, semaphore is a pointer to the semaphore to be initialized, pshared can be set to 0, and init_value is a non-negative initialization value of the semaphore, following the semantics we have covered in class. Once your semaphore has been correctly initialized (make sure to check for the error value of the sem_init(...)
call!), the wait and signal operations can be performed over it, following the semantics we have discussed in class. To wait on a semaphore, you must use the sem_wait(sem_t * semaphore) call; to signal on a semaphore, you must use the sem_post(sem_t * semaphore) call.

Shared Data Structures. Of course, the main question is: what data structures must be protected? Here is a list of things that can be problematic, but your own implementation could be different, so try to map the statements below to your own code.
(1) Image Objects: One obvious place to start is to protect the image objects that are registered with the server and upon which operations are requested by the client. We want to prevent different workers from simultaneously changing the content of an image, so a good idea is to introduce one semaphore per registered image! These must be created and/or initialized dynamically at image registration time.
(2) Image Registration Array: Another shared data structure is the global array of registered images. Concurrent operations over that array are not a good idea, so all the threads will need to synchronize when trying to access that shared data structure.
(3) Connection Socket: What? The connection socket has always been shared, so why is that a problem now? The truth is that it has always been a problem, but we did not care because the responses from the workers to the client were always a one-shot send(...) operation. But now, there are cases where the server follows a two-step approach in the protocol it follows with the client. For instance, when handling an IMG_RETRIEVE operation, a worker first provides a positive acknowledgment of the completed request and then the payload of the image being retrieved. What if another worker starts sending other data while a retrieve operation is in progress? Careful: the same goes for the parent when handling IMG_REGISTER operations.
(4) Request Queue and STDOUT Console: We already know that the shared request queue and the shared STDOUT console require the use of semaphores to ensure correctness. Perhaps take inspiration from the use of semaphores in those cases to handle the other shared data structures listed above.

Desired Output. The expected server output is pretty much what you already constructed in HW6. Here it is summarized again for reference. You should print queue status dumps, rejection and completion notices. Queue status dumps and rejection notices are identical in format to HW5 and HW6. Once again, the queue status dump is printed when any of the worker threads completes processing of any of the requests. Just like HW6, when a request successfully completes service, the thread ID of the worker thread that has completed the request will need to be added at the beginning of the line, following the format below. You can assign thread ID = (number of workers + 1) to the parent thread. If multiple worker threads are available to process a pending request, any one of them (but only at most one!) can begin processing the next request.

T<thread ID> R<req. ID>:<sent timestamp>,<opcode>,<overwrite>,<client img_id>,<receipt timestamp>,<start timestamp>,<completion timestamp>,<server img_id>

Here, <opcode> is a string representing the requested operation over an image. For instance, if the operation was IMG_REGISTER, then the server should output the string "IMG REGISTER" (no quotes) for this field. <overwrite> should just be 0 or 1, depending on what the client requested. <client img_id> should be the image ID for which the client has requested an operation. If the server is ignoring any of these values in the response, set these fields to 0. Finally, <server img_id> should report the image ID on which the server has performed the operation requested by the client. Recall that this might be different from what was sent by the client if overwrite = 0 in the client's request, but it must be the same if overwrite = 1.

Additional Help. You might have noticed, from the commands recommended above, that the client (v4.2) now allows you to define a script of image operation requests.
This is useful to test the correctness of your server under a controlled workload. To use this feature, you should still provide the path to the folder containing the test images using the -I parameter. Next, you should also provide the -L <op list> parameter, where <op list> is a comma-separated list of image operations with the following format: <delay>:<opcode>:<overwrite>:<img_id>. Here, <delay> is the number of seconds that will elapse between this and the next operation in the script. Next, <opcode> is a single case-sensitive (!!) letter that identifies which operation is to be performed (see the list below).
• R: IMG_REGISTER
• r: IMG_ROT90CLKW
• b: IMG_BLUR
• s: IMG_SHARPEN
• v: IMG_VERTEDGES
• h: IMG_HORIZEDGES
• T: IMG_RETRIEVE
The <overwrite> field should always be set to 1 (for simplicity we do not handle cases with overwrite = 0). Finally, <img_id> should be the ID on which the operation should be performed. This field has a special meaning in the case of IMG_REGISTER operations. Only in this case, it tells the client which one of the files scanned in the images folder should be registered with the server. In all the other cases, an ID = n tells the client to request the operation on the nth image that it has registered with the server. When a script is requested at the client, the client will conveniently report how it has understood the script. For instance, when using the script 1:R:1:2,2:b:1:0,0:T:1:0 the client will report:

[#CLIENT#] INFO: Reading BMP 0: test1.bmp | HASH = 9f3363f0249c15163d52e60fd9544c31
[#CLIENT#] INFO: Reading BMP 1: test2.bmp | HASH = b6770726558da9722136ce84f12bfac8
[#CLIENT#] INFO: Reading BMP 2: test3.bmp | HASH = f2ac174476fb2be614e8ab1ae10e82f0
[#CLIENT#] INFO: Reading BMP 3: test4.bmp | HASH = 0caaef67aee1775ffca8eda02bd85f25
[#CLIENT#] INFO: Reading BMP 4: test5.bmp | HASH = 5597b44eaee51bd81292d711c86a3380
[#CLIENT#] INFO: Reading BMP 5: test6.bmp | HASH = 11552ac97535bd4433891b63ed1dd45d
[#CLIENT#] Next Req.: +1.000000000 - OP = IMG_REGISTER, OW = 1, ID = 0
[#CLIENT#] Next Req.
: +2.000000000 - OP = IMG_BLUR, OW = 1, ID = 0
[#CLIENT#] Next Req.: +0.000000000 - OP = IMG_RETRIEVE, OW = 1, ID = 0

Submission Instructions: in order to submit the code produced as part of the solution for this homework assignment, please follow the instructions below. You should submit your solution in the form of C source code. To submit your code, place all the .c and .h files inside a compressed folder named hw8.zip. Make sure they compile and run correctly according to the provided instructions. The first round of grading will be done by running your code. Use CodeBuddy to submit the entire hw8.zip archive at https://cs-people.bu.edu/rmancuso/courses/cs350-fa24/codebuddy.php?hw=hw8. You can submit your homework multiple times until the deadline. Only your most recently updated version will be graded. You will be given instructions on Piazza on how to interpret the feedback on the correctness of your code before the deadline.
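The per-image locking scheme described under Shared Data Structures can be sketched as follows. The homework itself must be written in C with POSIX sem_t, sem_wait and sem_post; this Python transcription (with hypothetical class and method names) only illustrates the two-level pattern of one semaphore for the registration array plus one semaphore per registered image.

```python
import threading

class ImageRegistry:
    """Illustration of the assignment's locking scheme: a registry-wide
    semaphore protects the array of images, and each image carries its own
    semaphore (created at registration time) so that no two workers mutate
    the same image concurrently."""
    def __init__(self):
        self._registry_sem = threading.Semaphore(1)  # guards the image array
        self._images = []                            # entries: [pixels, per-image sem]

    def register(self, pixels):
        with self._registry_sem:                     # sem_wait / sem_post pair
            self._images.append([pixels, threading.Semaphore(1)])
            return len(self._images) - 1             # new image ID

    def apply(self, img_id, op):
        with self._registry_sem:                     # hold briefly: just the lookup
            entry = self._images[img_id]
        with entry[1]:                               # exclusive access to this image
            entry[0] = op(entry[0])
            return entry[0]

reg = ImageRegistry()
img = reg.register([1, 2, 3])
print(reg.apply(img, lambda px: [p * 2 for p in px]))  # -> [2, 4, 6]
```

Note the design choice: the registry semaphore is held only for the lookup, not for the whole operation, so two workers can process different images in parallel while operations on the same image remain serialized.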
CIS 5450 Homework 1: Data Wrangling and Cleaning (Fall 2024)

Hello future data scientists and welcome to CIS 5450! In this homework, you will familiarize yourself with Pandas and Polars! Both are cute animals and essential libraries for Data Science. This homework is focused on one of the most important tasks in Data Science: preparing datasets so that they can be analyzed, plotted, used for machine learning models, etc. This homework will be broken into analyzing several datasets across four sections!
1. Working with Amazon Prime Video data to understand the details behind its movies.
2. Working on merged/joined versions of the datasets (more on this later though).
3. Regex.
4. Working with the Used Cars dataset and Polars to see the performance differences between Pandas, eager execution in Polars, and lazy execution in Polars.

IMPORTANT NOTE: Before starting, you must click on the "Copy To Drive" option in the top bar. This is the master notebook, so you will not be able to save your changes without copying it! Once you click on that, make sure you are working on that version of the notebook so that your work is saved.

Run the following 4 cells to set up the notebook:

%set_env HW_ID=cis5450_fall24_HW1

%%capture
!pip install penngrader-client
from penngrader.grader import *

import pandas as pd
import numpy as np
import seaborn as sns
from string import ascii_letters
import matplotlib.pyplot as plt
import datetime as dt
import requests
from lxml import html
import math
import re
import json
import os

!wget -nc https://storage.googleapis.com/penn-cis5450/credits.csv
!wget -nc https://storage.googleapis.com/penn-cis5450/titles.csv

What is Pandas? Apart from animals, Pandas is a Python library to aid with data manipulation/analysis. It is built with support from Numpy, another Python package/library that provides efficient calculations for matrices and other math problems. Let's also get familiarized with the PennGrader.
It was developed specifically for 545 by a previous TA, Leonardo Murri. PennGrader was developed to provide students with instant feedback on their answers. You can submit your answer and know whether it's right or wrong instantly. We then record your most recent answer in our backend database. Let's try it out! Fill in the cell below with your 8-digit Penn ID and then run the following cell to initialize the grader. # PLEASE ENSURE YOUR PENN-ID IS ENTERED CORRECTLY. # IF NOT, THE AUTOGRADER WON'T KNOW WHO TO ASSIGN POINTS TO YOU IN OUR BACKEND # YOUR PENN-ID GOES HERE AS AN INTEGER STUDENT_ID = 99999998 # You should also update this to a unique "secret" just for this homework, to # authenticate this is YOUR submission SECRET = STUDENT_ID Leave this cell as-is... %%writefile notebook-config.yaml grader_api_url: 'https://23whrwph9h.execute-api.us-east-1.amazonaws.com/default/Grader23' grader_api_key: 'flfkE736fA6Z8GxMDJe2q8Kfk8UDqjsG3GVqOFOa' grader = PennGrader('notebook-config.yaml', "cis5450_fall24_HW1", STUDENT_ID, STUDENT_ID) We will use scores from Penn Grader to determine your grade. You will still need to submit your notebook so we can check for cheating and plagiarism. Do not cheat. Note: If you run Penn Grader after the due date for any question, your assignment will be marked late, even if you already had full points for the question before the deadline. To remedy this, if you're going to run your notebook after the deadline, either do not run the grading cells, or reinitialize the grader with an empty or clearly fake ID such as 999999999999 (please use 10+ digits to be clearly a fake STUDENT_ID). Adding our data so that our code can find it We can't be data scientists without data! We provided code for you to download the data (the "wget" cell from earlier).
If you go to the view on the left and click files, you should see something similar to this image. Part 1: Working with Amazon Prime Video Data [38 points] In this part of the homework we will be working with a dataset focused on Amazon Prime Video Movie Data! 1.0 Loading in Titles data (2 points) Let's first load our dataset into a Pandas Dataframe. Use Pandas's read_csv functionality, which you can find documentation for here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html While reading documentation is hard at first, we strongly encourage you to get into the habit of doing this, since many times your questions will be answered directly by the documentation (ex: "why isn't my dataframe dropping duplicates" or "why didn't this dataframe update"). TODO Save the Credits dataframe to a variable named: credits_df Save the Titles dataframe to a variable named: titles_df #TODO: Import your two files to pandas dataframes -- make sure the dataframes are named correctly! Let's focus on the titles_df for now and see what the dataframe looks like. Display the first 10 rows of the dataframe in the cell below (take a look at the documentation to find how to do this!) #TODO: Display the first 10 rows of `titles_df` Another thing that is oftentimes helpful to do is inspect the types of each column in a dataframe. Output the types of titles_df in this cell below. # TODO: Display the datatypes in `titles_df` Save the types of the type , release_year , runtime , seasons , imdb_id , and tmdb_score columns to a series called titles_df_types (retaining the index names) and pass them into the autograder cell below. # View the output here! titles_df_types = # TEST CASE: titles_df_types (2pt) # [CIS 545 PennGrader Cell] - 2 points grader.grade(test_case_id = 'titles_df_types', answer = titles_df_types) 1.1 Cleaning up Titles data (4 points) When you work with data, you'll have NaNs, duplicates or columns that don't give much insight into the data.
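The loading and inspection steps from section 1.0 can be sketched as follows. This is a minimal, hypothetical sketch: the two-row CSV below stands in for the real titles.csv downloaded by the wget cell, and is fed through io.StringIO since read_csv accepts a path or any file-like object.

```python
import io
import pandas as pd

# Hypothetical stand-in for titles.csv (the real file comes from the wget cell)
csv_text = """id,title,type,release_year,runtime,seasons,imdb_id,tmdb_score
tm1,Movie A,MOVIE,2001,100,,tt001,7.1
ts1,Show B,SHOW,2010,45,3,tt002,6.4
"""
titles_df = pd.read_csv(io.StringIO(csv_text))

print(titles_df.head(10))   # first rows of the dataframe
print(titles_df.dtypes)     # dtype of every column

# dtypes of the requested columns, as a Series keyed by column name
cols = ["type", "release_year", "runtime", "seasons", "imdb_id", "tmdb_score"]
titles_df_types = titles_df.dtypes[cols]
```

Note how seasons comes out as float64: the empty cell becomes NaN, and integer columns containing NaN are promoted to float by default.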
There are different ways to deal with missing values (i.e. imputation, which you can read into on your own), but for now, let's drop some of these rows in titles_df to clean up our data. Note that there might be multiple ways to do each step. Also note that a lot of the columns in titles_df have all nulls. Thus, ensure you drop the unnecessary columns before filtering out rows with nulls. Refer to the documentation if you get stuck -- it's your best friend! TODO: 1.1 Make a new dataframe titles_cleaned_df Keep only the following columns: id , title , type , release_year , runtime , genres , production_countries , imdb_score , imdb_votes , tmdb_popularity , tmdb_score . Drop rows that have NaNs in them. Use the info function to see the number of null rows in this DataFrame before this, and afterward, to sense-check that your operation is correct Reset the index and drop the index column which stores the original index prior to resetting the index. We recommend you print out the intermediate dataframe prior to this to see that the indices are not consecutive! Cast title , type to type string , and imdb_votes to type int . Save the result to titles_cleaned_df . Note: The affected string columns should appear as string datatype and not object (you can check using df.dtypes). If they are not, we recommend checking up on this documentation to see how to successfully convert object into strings (Hint: cast as 'string' and not str). #TODO: Keep only the necessary columns #TODO: Drop nulls #TODO: Reset and drop the index #TODO: Cast type # TEST CASE: titles_cleaned_df (4pt) # [CIS 545 PennGrader Cell] - 4 points grader.grade(test_case_id = 'titles_cleaned_df', answer = titles_cleaned_df) 1.2 Data Wrangling with Titles Data (8 points) Now, let's process the data in an appropriate format so that we can answer some queries more easily. Make sure to use titles_cleaned_df for this part.
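The select-columns / drop-nulls / reset-index / cast pipeline described in 1.1 can be sketched like this. The tiny dataframe is a hypothetical stand-in for titles_df (the real one is loaded in section 1.0); the column names come from the assignment text.

```python
import pandas as pd

# Hypothetical stand-in for titles_df
titles_df = pd.DataFrame({
    "id": ["tm1", "tm2", "tm3"],
    "title": ["A", "B", None],           # one row with a NaN, to be dropped
    "type": ["MOVIE", "SHOW", "MOVIE"],
    "release_year": [2001, 2010, 1999],
    "runtime": [100, 45, 90],
    "genres": ['["drama"]', '["comedy"]', '["war"]'],
    "production_countries": ['["US"]', '["GB"]', '["US"]'],
    "imdb_score": [7.1, 6.4, 5.0],
    "imdb_votes": [1000.0, 500.0, 250.0],
    "tmdb_popularity": [3.2, 1.1, 9.9],
    "tmdb_score": [7.0, 6.0, 5.5],
    "all_null_col": [None, None, None],  # columns like this get dropped first
})

keep = ["id", "title", "type", "release_year", "runtime", "genres",
        "production_countries", "imdb_score", "imdb_votes",
        "tmdb_popularity", "tmdb_score"]

titles_cleaned_df = (
    titles_df[keep]             # keep only the listed columns
    .dropna()                   # then drop rows that still contain NaNs
    .reset_index(drop=True)     # reset the index, discarding the old one
    .astype({"title": "string", "type": "string", "imdb_votes": int})
)
```

Dropping the all-null columns before `dropna()` matters: doing it in the other order would delete every row.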
TODO: 1.2 Create a column called is_movie that contains a value of 1 if the type of content is MOVIE and a value of 0 if not. Create the genres_expanded column in titles_cleaned_df to create individual rows for each genre of each movie. Hint: Make sure it is the correct type before doing this! Similar to before, create a production_countries_expanded column to create individual rows for each country where the movie was produced. Drop the redundant columns type , genres , and production_countries , as well as all null values, saving the result as titles_final_df . Make sure to reset and drop the index as well! (8 points) Hint: See apply, explode, json.loads, and lambda in the Python documentation. Note: Feel free to reference this Geeks4Geeks link to better understand how to use the json.loads() function. You may not import ast or eval to do this. Note: We recommend printing out the intermediate steps and testing your logic on singular values and getting the correct answer, before applying it to the entire dataframe! Note: Include rows with the type SHOW, too. # TODO # TEST CASE: titles_final_df (8pt) # [CIS 545 PennGrader Cell] - 8 points grader.grade(test_case_id = 'titles_final_df', answer = titles_final_df) 1.3 Compute the Top Performing Genres 1.3.1 Compute the Best Genres By IMDb and TMDb Score (6 points) In this section we will compute the top performing genres, and will use both data from the Internet Movie Database (IMDb) and The Movie Database (TMDb) to do so. We will use titles_final_df in this section. TODO: 1.3.1 Use the groupby() function Create a dataframe genres from titles_final_df with only the columns genres_expanded , tmdb_popularity , imdb_score and tmdb_score . Filter genres to only keep those movies with tmdb_popularity greater than 2.0. Create a dataframe genres_imdb_df that contains the average imdb_score for each genres_expanded .
Make sure to keep the resultant genres_expanded and imdb_score columns Sort this in descending order, keeping only the top 10 values Create a column called score that is the average score rounded to two decimal places Reset the index and drop the index column Have only score and genres_expanded as part of genres_imdb_df. Do the same steps for creating genres_imdb_df to create genres_tmdb_df with tmdb_score instead! #TODO: Create genres #TODO: Create genres_imdb_df #TODO: Create genres_tmdb_df # TEST CASE: genres_df (6pt) # [CIS 545 PennGrader Cell] - 6 points grader.grade(test_case_id = 'genres_df', answer = (genres_imdb_df, genres_tmdb_df)) 1.3.2 Compute the Percentage Difference Between Genres (6 points) In this section we will compute the differences in results between genres_imdb_df and genres_tmdb_df . TODO: 1.3.2 Merge genres_imdb_df and genres_tmdb_df on genres_expanded to create merged_df . Rename the score columns to score_imdb and score_tmdb , respectively Create a column difference in merged_df that is defined as the absolute value of the percentage difference between score_imdb and score_tmdb . Hint: Check out the abs function for help with this! Use the following formula for this: Make sure you do not use Python iteration (e.g., for loop, while loop). Sort merged_df in descending order by difference Reset the index and drop the index column merged_df should have score_imdb , genres_expanded , score_tmdb , and difference # TODO # TEST CASE: merged_df (6pt) # [CIS 545 PennGrader Cell] - 6 points grader.grade(test_case_id = 'merged_df', answer = merged_df)
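The merge-and-compare step of 1.3.2 can be sketched as below. The two input dataframes are hypothetical stand-ins in the shape the section describes, and since the percentage-difference formula itself is not reproduced in this text, the |imdb - tmdb| / imdb * 100 form used here is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical per-genre score tables in the shape 1.3.1 produces
genres_imdb_df = pd.DataFrame({"genres_expanded": ["drama", "comedy"], "score": [8.0, 6.0]})
genres_tmdb_df = pd.DataFrame({"genres_expanded": ["drama", "comedy"], "score": [7.0, 6.6]})

# Merging on the key; suffixes rename the overlapping 'score' columns
merged_df = genres_imdb_df.merge(genres_tmdb_df, on="genres_expanded",
                                 suffixes=("_imdb", "_tmdb"))

# Vectorised percentage difference: no Python loops, just column arithmetic.
# NOTE: the denominator here is an assumption, not the notebook's formula.
merged_df["difference"] = ((merged_df["score_imdb"] - merged_df["score_tmdb"]).abs()
                           / merged_df["score_imdb"] * 100)

merged_df = merged_df.sort_values("difference", ascending=False).reset_index(drop=True)
```

The `suffixes` argument is often tidier than renaming both columns by hand before the merge.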
ECON 4F03 Fall 2024 Guidelines for Report 3 - Final Paper Submission instructions: An electronic copy must be submitted to the Avenue dropbox. If you submit your work late (without submitting an MSAF) you will be penalized. Each day is a 10% penalty. If you submit an MSAF, you still must submit the Final Paper. Use of Generative AI Students are not permitted to use generative AI in this course. In alignment with McMaster academic integrity policy, it “shall be an offence knowingly to … submit academic work for assessment that was purchased or acquired from another source”. This includes work created by generative AI tools. Also stated in the policy is the following: “Contract Cheating is the act of “outsourcing of student work to third parties” (Lancaster & Clarke, 2016, p. 639) with or without payment.” Using Generative AI tools is a form of contract cheating. Charges of academic dishonesty will be brought forward to the Office of Academic Integrity. GUIDELINES In this report, you describe your overall question and policy relevance, and describe your 3 key papers/articles from your summary. You then analyze some of their strengths and weaknesses. Your main job is to demonstrate your understanding of what constitutes strong and weak economic research, that is, to evaluate the quality of the methods rather than your personal agreement (or not) with any given policy position. I want your paper to be a balanced review of evidence. Make a title page with your name and your economic question. 2 Recommended Structure of the 4F03 Final Paper (Length of paper: 10 pages) Section 1. Introduce the context and state your economic question and the policy relevance of the question (0.5-1 page). 8 Section 2. Paper 1 (approximately 2-2.5 pages) a. Complete reference for the article and a web link. 2 b. Question addressed or hypothesis tested in the article and how it relates to your question in the introduction 2 c. Details about the data 3 i.
Was it observational data or experimental data? If observational, could it be regarded as using a natural experiment? If experimental, was it a lab, social or field experiment? ii. Type of data: panel/time series/cross-sectional iii. Unit of observation: does a data point correspond to a household, a firm, an industry, a province/state, a country, etc.? iv. Data source(s), e.g. Statistics Canada - Survey of Labour and Income Dynamics. v. Years and geographic area vi. Number of observations d. Empirical model and estimation methods (e.g. linear model with ordinary least squares, instrumental variables, difference-in-differences, etc.). 3 e. The key results in relation to part b. Note that you must tie the key results to the research question you listed in part b. 3 f. Explain how the results help to answer your research question (as stated in your introduction). Sometimes the question answered by the article will be exactly the same as the one you are answering - sometimes it will be related but not identical. 2 g. Potential policy implications of these results 2 h. A few (2 or 3 total) internal strengths and/or weaknesses (include your reasoning!) 4 i. A few (2 or 3 total) external strengths and/or weaknesses (include your reasoning!) 4 25 marks per article = 75 marks total for article reviews Recall that internal strengths and weaknesses refer to whether (or not) the authors' conclusions are valid for the population from which the sample was drawn to estimate the model. Typical weaknesses include selected samples, small samples, and inappropriate models or methods for the data available (e.g. OLS doesn't handle omitted variables, simultaneity, or measurement error in the x variables). Recall that external strengths and weaknesses refer to whether (or not) the authors' conclusions are valid for a different population than that from which the sample used to estimate the model was drawn. How old are the data? Would one get the same estimates under current conditions?
Would one get the same estimates if the sample came from a different country, province, age group, ethnic group, etc.? Section 3. Paper 2 (same structure as section 2) Section 4. Paper 3 (same structure as section 2) Final Section. (1-2 pages). This should include: Overall conclusions from the 3 papers regarding your economic question and the policy relevance of the results 10 marks Unanswered questions/future research 5 marks Clarification of terminology: POLICY RELEVANCE. Refers to the policies to which a paper potentially applies. For example, an article on the relationship between mortality rates and healthcare expenditures is of potential relevance for healthcare policy. There is no reference to the actual conclusions of the paper. POLICY IMPLICATIONS. Refers to the actual conclusions of the paper and what those conclusions imply for policymakers. For example, a finding that healthcare expenditures lead to lower mortality rates implies that there is a clear benefit from such spending (although it does not tell us whether such spending is cost-effective). Writing and Formatting: The text of your final paper (not counting the title page or any reference pages or tables/figures) is to be at most 10 printed pages in length. There is only one exception to this page limit: if a paper included in this report was not included in report 1 (Proposal) or report 2 (Summary) then you must justify how it meets the criteria described in sections II.2.1, II.2.2 and II.2.3 of the guidelines for the Proposal. This justification (and nothing else) is to be placed on page 11, following the main text. All margins set to 2.5cm, use 12pt fonts and line spacing = 1.5. Points will be deducted for poor spelling and grammar. Use a spell check and grammar check. Use in-text citations, e.g. Cuff (1998) or Smith, Jones and Barry (2009), when using direct quotes (which must be placed in quotation marks) or paraphrasing. It is better to AVOID quotes and to use your own words / paraphrase.
Follow the APA citation style as explained in the video found at: APA 7th in Minutes: In-Text Citations (youtube.com) A partial list of practices to avoid Long paragraphs. (Paragraphs are used to indicate a change in topic.) Contractions (won’t, can’t, etc.) Frequent use of the first person (I will discuss, I will show, etc.) Informal or “chatty” style of writing. Acknowledging Sources. Please be VERY careful to acknowledge all sources which you consulted during the preparation of your paper. You should reference not just published work, but also unpublished papers, including those of other students, and previous papers of your own. YOUR WRITTEN SUBMISSIONS WILL BE CHECKED BY TURNITIN.COM. Turnitin provides a very effective report on the extent to which your text matches the text in journal articles, working papers and other student papers. Please see the Course Outline for further details on Academic Ethics. Do not worry if it substantially matches your summary - I expect it to do so, as the final paper fleshes out your summary! Also do not worry if it matches quotes from the article - that is also to be expected! Writing Help McMaster students have free access to an online program to help with writing. The user copies and pastes text into the program and receives notification of errors and suggestions for corrections. This program is called Grammarly. To register as a user, go to http://www.grammarly.com/edu/ and click on “Sign Up” at the top of the page. Note that your final paper will be graded on both economic content and writing style/errors.
CIS 5450 Homework 2: SQL and DuckDB Due: Friday, October 11 2024, 10:00pm EST Worth 95 points in total (25 manually graded) Welcome to Homework 2! By now, you should be familiar with the world of data science and the Pandas library. This assignment focuses on helping you get to grips with two new tools: SQL and DuckDB. Through this homework, we will be working with SQL by exploring an Indego dataset containing bike rides, stations and weather data. We will then expand our exploration with DuckDB and finish by comparing the two with Pandas. We are introducing a lot of new things in this homework, and this is often where students start to get lost. Thus, we strongly encourage you to review the slides/material as you work through this assignment. Before you begin: · Be sure to click "Copy to Drive" to make sure you're working on your own personal version of the homework · Check the pinned FAQ post on Ed for updates! If you have been stuck, chances are other students have also faced similar problems. Part 0: Libraries and Set Up import pandas as pd !pip3 install penngrader-client !pip install sqlalchemy==1.4.46 !pip install pandasql !pip install geopy !pip install -U kaleido from penngrader.grader import * import pandas as pd import datetime as dt import geopy.distance as gp import matplotlib.image as mpimg import plotly.express as px # import re import pandasql as ps #SQL on Pandas Dataframe import nltk nltk.download('punkt') import duckdb from wordcloud import WordCloud from matplotlib.dates import date2num import matplotlib.pyplot as plt from PIL import Image # from collections import Counter # import random # Three datasets we're using ! wget -nc https://storage.googleapis.com/penn-cis5450/indego_trips.csv ! wget -nc https://storage.googleapis.com/penn-cis5450/indego_stations.csv ! wget -nc https://storage.googleapis.com/penn-cis5450/weather_2022_PHL.csv PennGrader Setup # PLEASE ENSURE YOUR PENN-ID IS ENTERED CORRECTLY.
# IF NOT, THE AUTOGRADER WON'T KNOW WHO TO ASSIGN POINTS TO YOU IN OUR BACKEND STUDENT_ID = # YOUR PENN-ID GOES HERE AS AN INTEGER SECRET = STUDENT_ID %%writefile config.yaml grader_api_url: 'https://23whrwph9h.execute-api.us-east-1.amazonaws.com/default/Grader23' grader_api_key: 'flfkE736fA6Z8GxMDJe2q8Kfk8UDqjsG3GVqOFOa' grader = PennGrader('config.yaml', 'cis5450_fall24_HW2', STUDENT_ID, SECRET) Biking in Philadelphia I'm sure in your time in Philadelphia so far you've come across these blue bikes and stations. Indego is the company responsible for this bike sharing system, and they make data on bike trips available to the public. This data can not only be useful for understanding how people in Philly use bikes, but it can also give information on the most visited places in the city, which can be useful for city planners and business owners. In this homework, we'll be exploring some data about bikes including: · Trips: data about bike trips during the first week of October 2022. · Stations: data about bike stations, their ID and Name. · Weather: data about the weather in Philadelphia during 2022. We'll be parsing this data into dataframes and relations, and then exploring how to query and assemble the tables into results. We will primarily be using DuckDB, but for some of the initial questions, we will ask you to perform the same operations in Pandas as well, so as to familiarize you with the differences and similarities of the two. Part 1: Load & Process our Datasets [15 points total] Before we get into the data, we first need to load and clean our datasets. Metadata You'll be working with three CSV files: indego_trips.csv indego_stations.csv weather_2022_PHL.csv The file indego_trips.csv contains data about each trip, like the origin station, destination station and duration. The file indego_stations.csv includes information about stations and their status in January 2023. The file weather_2022_PHL.csv has one row per day during 2022 and shows weather information.
TODO: · Load indego_trips.csv and save the data to a dataframe called trips_df . · Load indego_stations.csv and save the data to a dataframe called stations_df . · Load weather_2022_PHL.csv and save the data to a dataframe called weather_df . # TODO: Import the datasets to pandas dataframes -- make sure the dataframes are named correctly! # view trips_df using .head() to make sure the import was successful # view stations_df using .head() to make sure the import was successful # view weather_df using .head() to make sure the import was successful 1.1 Data Preprocessing Next, we are going to want to clean up our dataframes, namely trips_df and stations_df , by 1) fixing columns, 2) changing datatypes, 3) handling nulls. First, let us view the first few rows of trips_df . You may also call .info() and additionally check the cardinality of each column to view the specifics of the dataframe. This is a good first step to take for Exploratory Data Analysis (EDA). 1.1.1 Cleaning trips_df [8 points] .info() gives us meaningful information regarding columns, their types, and the amount of nulls, based on which we can now clean our dataframe. Perform these steps and save the results in a new dataframe trips_cleaned_df TODO: · Drop the column plan_duration . We already have that information in the column passholder_type , which is more understandable. · Drop the rows where end_station is 3000. This is a virtual station used for maintenance, and doesn't represent a real trip. · Drop all rows with null values. · Cast the columns: o start_time , end_time , trip_route_category , passholder_type , bike_type as string. (Cast to 'string' and not 'str') o bike_id as int.
· Sort results by trip_id ascending · Reset and drop the index and save results as trips_cleaned_df After performing these steps, trips_cleaned_df should have the following schema: Final Schema: trip_id duration start_time end_time start_station start_lat start_lon end_station end_lat end_lon bike_id trip_route_category passholder_type bike_type #view info of trips_df # TODO: drop plan_duration # TODO: drop rows with irrelevant end_station # TODO: drop rows with null values # TODO: cast the types of the columns # TODO: sort the results # TODO: drop the index and save the results # 4 points grader.grade(test_case_id = 'test_cleaning_trips', answer = trips_cleaned_df) Now we are going to clean up the start_time and end_time columns so that they are easier to use. We will be using Regex in this section to separate out the date and the time from the entries. TODO: · Fill in the Regex patterns to retrieve first the date, and then the time, found in each entry. · Extract the relevant parts of each column and populate new columns called the following: date_start , time_start , date_end , time_end .
Note that the datetime type does contain both date and time, but we want to exercise your Regex capabilities :) · Cast the new date columns as datetime64[ns], using pd.to_datetime() and the format '%m/%d/%Y' · Remove columns start_time , end_time #view trips_cleaned_df using .head() # TODO: cast the types of start_time and end_time to 'string' # TODO: fill in the pattern to retrieve the date of each entry # HINT: think about the unique syntax of the date section, the different variations it can appear in date_pattern = # TODO: fill in the pattern to retrieve the time of each entry # HINT: think about the unique syntax of the time section, the different variations it can appear in time_pattern = # TODO: populate columns date_start, time_start, date_end, time_end # TODO: cast the date columns to datetime64[ns] # TODO: drop the start_time and end_time # 4 points grader.grade(test_case_id = 'test_regex_trips', answer = trips_cleaned_df) 1.1.2 Processing Stations [3 Points] stations_df contains information on Indego stations across the city. We will clean this df by removing Inactive stations and stations created after October 2022. Perform these steps and assign the cleaned dataframe to stations_cleaned_df . TODO: · Drop the stations that have an Inactive status. · Cast column day_of_go_live_date as datetime64[ns]. · Drop the stations that were created after 10/7/2022 since this is the last date of rides we are analyzing. · Drop the columns day_of_go_live_date and status · Create a new column called is_west_philly that is True if zone is 2 or 3 and False otherwise. · Save the resulting dataframe as stations_cleaned_df , and sort it by station_id ascending After performing these steps, stations_cleaned_df should have the following schema: Final Schema: station_id station_name zone is_west_philly #view info of stations_df # TODO: Drop the stations that have an Inactive status. # TODO: Cast column day_of_go_live_date as datetime64[ns].
# TODO: Drop the stations that were created after 10/7/2022. # TODO: Drop day_of_go_live_date and status columns # TODO: Create a new column called is_west_philly that is True if zone is 2 or 3 and False otherwise. # TODO: Sort by station_id ascending # TODO: Reset and drop the index, and save the resulting dataframe as stations_cleaned_df # 3 points grader.grade(test_case_id = 'test_cleaning_stations', answer = stations_cleaned_df) 1.1.3 Cleaning the weather [4 Points] Then, let's clean weather_df and make it usable. We are going to make two different datasets, one for the actual data, and another for the record-holding data. TODO: · Create actual_weather_cleaned_df and only keep the following 5 columns: o date , actual_mean_temp , actual_min_temp , actual_max_temp , actual_precipitation · Create record_weather_cleaned_df and only keep the following 4 columns: o date , record_min_temp , record_max_temp , record_precipitation Then for both datasets: · Convert column date into type datetime64[ns] . · Keep only the rows from 9/1/2022 to 10/31/2022, inclusive. · Sort by column date descending. · Reset and drop the index. After performing these steps, actual_weather_cleaned_df should have the following schema: Final Schema: date actual_mean_temp actual_min_temp actual_max_temp actual_precipitation ...
and record_weather_cleaned_df should have the following schema: Final Schema: date record_min_temp record_max_temp record_precipitation #view info of weather_df # TODO: create actual_weather_cleaned_df # TODO: create record_weather_cleaned_df # TODO: for both datasets, convert column 'date' into type datetime64[ns] # TODO: for both datasets, keep only the rows from 9/1/2022 to 10/31/2022, inclusive # TODO: for both datasets, sort by column 'date' descending # TODO: for both datasets, reset and drop the index # 4 points grader.grade(test_case_id = 'test_cleaning_weather', answer = [actual_weather_cleaned_df, record_weather_cleaned_df]) Part 2: DuckDB [55 points total] IMPORTANT: Pay VERY CLOSE attention to this style guide! The typical flow to use duckdb is as follows: 1. Write a SQL query in the form of a string o String Syntax: use triple quotes (""" """) to write multi-line strings o Aliases are your friend: if there are very long table names or you find yourself needing to declare the source (common during join tasks), it's almost always optimal to alias your tables with short INTUITIVE alias names o New Clauses, New Line: each of the main SQL clauses ( SELECT , FROM , WHERE , etc.) should begin on a new line o Use Indentation: if there are many components for a single clause, separate them out with new indented lines. Example below: """ SELECT ltn.some_id, SUM(stn.some_value) AS total FROM long_table_name AS ltn INNER JOIN short_table_name AS stn ON ltn.common_key = stn.common_key INNER JOIN med_table_name AS mtn ON ltn.other_key = mtn.other_key WHERE ltn.col1 > value AND stn.col2
ENG2077 Engineering Skills 2 – CAD: Mountain Board project Module leaders: Bruce Gregg, Jacob Young & Ewan Bremner Brief: Using provided components, the concept images in this brief (Appendix 1 and 2), and your own creative design, create a “digital twin” model of a Mountainboard in Fusion. Then, generate technical drawings to assist with the manufacturing and assembly of your Mountainboard in compliance with BS8888 standards. Fusion: If you do not already have an Autodesk account, create an account via the US version of their website https://www.autodesk.com/. Do not use the UK website. Click “Sign in” and choose “Create Account” on the next page. Sign up using your university email address, then click the following link to access the software: https://www.autodesk.com/education/edu-software You may be prompted to prove your eligibility for free software. Simply fill in the form and provide the requested information and you’ll be granted access. For technical support: eng[email protected] If you already have an Autodesk account from your previous year, you may be asked to renew your educational licence. Check your student email for a warning about this. Follow the instructions provided and select the 1-year student licence. For full Fusion installation guidance and first-time log-in, please refer to the “Fusion Installation Guidance” on Moodle. Fusion uses a cloud-based file system, which means you can log in and access your files from any machine with Fusion installed. There are computer labs throughout the University and Library with Fusion installed for your benefit. Method: This is a self-led, online video course where you are expected to schedule your own learning to meet the summative assignment deadline. Complete the video training course in the Mountainboard Project section of the ENG2077 Moodle.
Following the recommended schedule specified in the introductory lecture slides will result in you completing the assignment early, with time to review and refine your work, and finishing well clear of the exam period. Using the techniques you have learned, along with the downloadable parts and the components list guidance, create a 3D model of the assembly shown in Appendix 1 and 2. Generate a part drawing displaying key dimensions and details of your deck or base plate, along with general assembly drawings, and appropriate exploded view drawings containing a Bill of Materials. Submission requirements: - You must submit an “.f3d” Fusion file containing your 3D model. - You must submit a “.doc” Word file containing a Fusion shareable link. - You must submit “.pdf” technical drawings including a General Assembly drawing, an appropriate exploded view with BOM drawings, and a part drawing of your deck or base plate component. - To be submitted via Moodle. Submission deadlines: - A final “summative” submission at the end of Semester 1, which is graded. The deadline date is specified in the ENG2077 Mountainboard section’s submission portal, and in the project’s lecture slides. Training materials: All training materials are hosted on the ENG2077 Moodle portal. (Note: Some buttons or instructions may be inaccurate due to rapid version changes in the software. If anything is unclear, please contact us at the email address below.) The practice tutorial is a five-part video series that covers everything you need to guide you towards building your own mountainboard. There is a University of Glasgow branded technical drawing template inside the file called “Mountainboard Project Downloads.zip”, found in the section titled “Mountainboard Project – Download Materials”. This is also where you can access the pre-provided Mountainboard components. Download the files and extract the contents. The video course will give you instructions on how to use these files.
If you intend to use your own personal device to complete this task, please install and/or test the software before watching the videos. Alternatively, the University provides computer labs with Fusion pre-installed; simply log in with your Autodesk credentials. Do not put yourself in a position where you assume your personal device will work with Fusion, only to discover a technical problem. Personal device failure is not considered “good cause” for failing to complete your work. Test your personal devices or do not use them.

Learning outcomes:
- Develop a model through the CAD-generation portion of a design process
- Create part components using CAE software
- Create sub-assemblies and assemblies using CAE software
- Understand the concept of a “top-down” modelling approach
- Create technical drawings using CAE software; understand their purpose, how they function, and how to create them effectively

How this is graded: This is a pass/fail module, graded out of 100; the pass/fail threshold is 50. The submitted model must be assembled as per the concept images in Appendix 1 and 2, with mostly accurate joints and motions. You must produce a deck component of your own design and creativity, while still being realistic for manufacture, assembly, and riding. We will check the design timelines to ensure your deck is self-produced. There must be a minimum of three distinct technical drawings submitted, in accordance with the submission requirements. The marking focus will be split across both your inputs (3D model) and outputs (technical drawings). What we are looking for is for you to start producing CAD models that represent a digital twin of the physical product, including all necessary off-the-shelf nuts, bolts, bearings, etc., and produce accurate joint motions. Can you create a 3D CAD model to a required specification, and can you communicate it effectively via technical drawings, in compliance with British standards?
CIS 5450 Homework 3: Hypothesis Testing and Machine Learning
Due Date: October 28th at 10:00 PM EST
101 points total (= 85 autograded + 16 manually graded).

Welcome to CIS 5450 Homework 3! In this homework you will gain some familiarity with machine learning models for supervised learning. Over the next few days you will strengthen your understanding of hypothesis testing via simulation and of ML concepts, using baseball, insurance, and diabetes datasets. Some housekeeping below!

Before you begin:
· Be sure to click "Copy to Drive" to make sure you're working on your own personal version of the homework.
· Check the pinned FAQ post on Ed for updates! If you have been stuck, chances are other students have also faced similar problems.

Note: We will be manually checking your implementations and code for certain problems. If you incorrectly implemented a procedure using Scikit-learn (e.g. creating predictions on the training dataset, incorrectly processing training data prior to running certain machine learning models, hardcoding values, etc.), we will be enforcing a penalty system up to the maximum value of points allocated to the problem (e.g. if your problem is worth 4 points, the maximum number of points that can be deducted is 4 points).
· Note: If your plot is not run or not present when we open your notebook, we will deduct the entire manually graded point value of the plot (e.g. if your plot is worth 4 points, we will deduct 4 points).
· Note: If your .py file is hidden because it's too large, that's ok! We only care about your .ipynb file.

Part 0. Import and Setup
Import necessary libraries (do not import anything else!)
%%capture
!pip3 install penngrader-client

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
import random
import math
from xgboost import XGBClassifier
from penngrader.grader import *

!apt install zstd
!wget -nc -O diabetes_prediction_dataset.csv.zst https://www.dropbox.com/scl/fi/p8qpv4eja0xp3
!unzstd -f diabetes_prediction_dataset.csv.zst
!wget -nc -O games.csv.zst https://www.dropbox.com/scl/fi/43au9nv0bty84pqg6aw64/games.csv.zst
!unzstd -f games.csv.zst
!wget -nc -O medical_cost.csv.zst https://www.dropbox.com/scl/fi/8nz07htxxi07xilddsulx/medica
!unzstd -f medical_cost.csv.zst

PennGrader Setup

# PLEASE ENSURE YOUR PENN-ID IS ENTERED CORRECTLY. IF NOT, THE AUTOGRADER WON'T KNOW
# TO ASSIGN POINTS TO YOU IN OUR BACKEND
STUDENT_ID = # YOUR PENN-ID GOES HERE AS AN INTEGER
SECRET = STUDENT_ID

%%writefile config.yaml
grader_api_url: 'https://23whrwph9h.execute-api.us-east-1.amazonaws.com/default/Grader23

%set_env HW_ID=cis5450_fall24_HW3
grader = PennGrader('config.yaml', 'cis5450_fall24_HW3', STUDENT_ID, SECRET)

Part 1: Hypothesis Testing via Simulation [17 Points Total]

1.1: Estimating Pi through Simulation [4 points]

Consider a circle with radius 1/2 inside of a unit square: We could compute the area of the circle with a well-known formula and the value of π, but we can also compute both the area of the circle, and the mysterious value of π, via simulation!
If we randomly sample a point inside the unit square, the probability that the point falls within the circle is equal to the area of the circle divided by the area of the square. Thus, if we sample a total of Pt points and Pc of them are in the circle, we can write the area of the circle Ac as:

Ac ≈ (Pc / Pt) · As = Pc / Pt,   since the area of the unit square is As = 1.

A circle of radius 1/2 has area Ac = π(1/2)² = π/4, so solving for π gives:

π = 4 · Ac ≈ 4 · Pc / Pt.

Below is some Python code that simulates picking a random point in the square, testing if that point is inside the circle, and keeping track of Pc and Pt. Run this code to ensure it works, and see how long it takes. The simulation should sample 10 million points.

%%time
def pt_in_circle(x, y):
    # distance from the centre of the square (0.5, 0.5) must be within the radius 1/2
    return math.sqrt((x - 0.5)**2 + (y - 0.5)**2) <= 0.5
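A complete, runnable version of the simulation might look like the following sketch. The centre (0.5, 0.5), the fixed seed, and the function names are illustrative assumptions, not the homework's prescribed interface; scale p_t up to 10 million for the actual exercise.

```python
import math
import random

def pt_in_circle(x, y):
    # Is (x, y) inside the circle of radius 0.5 centred at (0.5, 0.5)?
    return math.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2) <= 0.5

def estimate_pi(p_t, seed=0):
    # Sample p_t points uniformly in the unit square, count the hits p_c,
    # and return the estimate pi ≈ 4 * p_c / p_t.
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_c = sum(pt_in_circle(rng.random(), rng.random()) for _ in range(p_t))
    return 4 * p_c / p_t

print(estimate_pi(100_000))
```

With 100,000 points the estimate typically lands within a few hundredths of π; the error shrinks roughly as 1/√Pt, which is why the exercise asks for 10 million points.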
STA304H5F: Surveys, Sampling and Observational Data — Technical Report Instructions

The typical flow of a technical article is: abstract, introduction, methodology, analysis, discussion/results, conclusion/limitations, appendix. It does NOT need to follow this exact format (some journal articles have slightly different formats). Depending on how much you want to elaborate, you can split certain sections into two parts (e.g. Analysis could become Quantitative Analysis and Qualitative Analysis). You have free rein to decide your outline insofar as it makes logical sense. Journal articles are often written for an audience that knows statistics well; that is, you will not need to explain terms such as p-values or null & alternative hypotheses. It is the responsibility of the reader to understand the statistical analysis method. However, it would be nice to have some plain-language conclusions at the end for readability. We do not expect students to write or perform a literature review, ethical statements, conflicts of interest, or funding statements. (However, it would be nice to include a literature review if you have the time.) We expect students to use either LaTeX, R Markdown, or Quarto for the report.

WARNING: DO NOT SAY "AFFECT" OR OTHER WORDS THAT IMPLY CAUSALITY (SUCH AS: influence, impact, contribute…). This is not a randomized controlled trial! You should instead look for words such as association, relationship, connection.

1. Abstract
In prior versions of this course, we did not request an abstract. However, abstracts are necessary for any academic article. Your abstract should be a quick summary of your paper and is shown before the introduction. Abstracts are used by other researchers to tell whether the paper is worth reading. It should address the following:
• A brief introduction to the topic.
• Aim of the paper.
• A brief statement regarding the data collection methodology.
• A summary of key findings.
• A brief overview of the implications of the results, or what needs to be improved.

Here’s an example of an abstract (colour-coded to match the above description): In a statistics program, most courses emphasize statistical theory over practical applications, often resulting in a focus on examinations rather than assignments. However, many statistics programs include a course on survey and sampling design, where students can be assessed through projects. Designing these projects requires more resources than tests, including increased grading workload and time spent resolving group conflicts. This study examines whether students prefer projects over traditional examinations and identifies the benefits they perceive from project-based learning. In a third-year statistics survey & sampling course, we asked students about their perceptions of project-based learning through Google Forms after completion of the final project. The results show that 81% of students prefer projects and found them useful for developing skill sets necessary for the workforce. The main reasons for disliking projects were group conflicts and unclear instructions. In response, we plan to provide more resources to support student success and to find ways to mitigate potential group disputes in the future.

2. Introduction
Within the introduction, the following questions should be answered:
• What is your study about and why should we care?
• What background information is necessary to let the reader understand?
• What are your research questions?
• What are your hypotheses?
• What is the brief outline of the rest of your paper?

Below, a student wrote two different drafts of an introduction regarding a study including cannabis usage:

Draft 1: Those opposed to the decriminalization of cannabis will often cite that it is destroying the youth and their cognitive ability to function in society. Cannabis is a popular drug that people occasionally will smoke for leisure activity.
There are studies done that show that prolonged cannabis consumption is not good for brain development, and perhaps this could influence students to care less about school. We aim to analyze the following research questions:
• (RQ1) What are the impacts of cannabis usage on lecture attendance?
 o Null hypothesis: cannabis usage has no association with lecture attendance.
 o Alternative hypothesis: cannabis usage is correlated with a lower attendance rate.
• (RQ2) What are the impacts of cannabis usage on students’ grades?
 o Null hypothesis: cannabis usage has no association with students’ grades.
 o Alternative hypothesis: cannabis usage is correlated with lower grades.
• (RQ3) Do students perceive cannabis to be a positive asset in their life?
 o Null hypothesis: cannabis usage is not associated with being an advantage to one’s life.
 o Alternative hypothesis: cannabis usage is linked to improving one’s lifestyle.

There are some issues with this introduction:
1. Fairly bland, boring, and sometimes awkward.
2. It doesn’t transition well; i.e., the introduction of the research questions is quite abrupt.
3. What is their exact population? Where could they be gathering this information?
4. What is the outline for the rest of the paper?

Draft 2: This study examines the impact of cannabis consumption on university students' lecture attendance. Cannabis, a psychoactive substance frequently used by young adults for recreational purposes, has garnered increasing attention due to its potential implications for cognitive and behavioral outcomes, particularly among students. Extensive research has suggested that prolonged and frequent cannabis consumption may have adverse effects on brain development and cognitive function (Iversen, 2003). In light of this, we formulate the hypothesis that individuals who engage in regular, weekly cannabis use are more prone to reduced lecture attendance.
This research endeavors to investigate the relationship between cannabis consumption patterns and student engagement with academic activities, shedding light on an area of growing concern in contemporary education. The structure of the paper is as follows: Section 2 outlines our data collection methods. Section 3 presents our quantitative analysis. In Section 4, we explore feedback from students and the common themes that arise. Section 5 discusses the qualitative results and addresses our research question. Section 6 covers the limitations of our study, and Section 7 concludes our analysis.

Again, there are some issues with this introduction:
1. It is not concise (mostly the second paragraph), and honestly boring to read. It sounds like there is an abundance of “fluff” over-compensating for a lack of substance. (What is “contemporary education”?)
2. The research questions and hypotheses are not clear. (They also don’t mention enough.)
3. Again, we are unsure of the exact population and where they could possibly be gathering this information.

We can combine drafts 1 and 2, and integrate the missing pieces, to create a superior introduction. (Note: longer doesn’t always mean better; it’s just that the previous two drafts had missing information.)

Final Draft: Cannabis faces significant stigma from older generations due to its illegalization and negative stereotypes, such as the belief that it increases laziness. Moreover, extensive research suggests that prolonged and frequent cannabis consumption may have adverse effects on brain development and cognitive function (Iversen, 2003). While some individuals may use cannabis for recreational purposes, it can also provide relief from medical conditions that induce high levels of pain. In this study, we analyze whether the stigma against cannabis is well-deserved. Specifically, we examine the impact of cannabis consumption on university students' lecture attendance and grades at a research-intensive North American university.
In October 2024, we deployed a survey via email to collect data on students’ cannabis usage, academic performance, demographic factors, and attitudes towards cannabis. The survey included both users and non-users of cannabis for comparison. Additionally, we carefully differentiated between prescribed and recreational cannabis use. We aim to study the following research questions:
• (RQ1) What are the impacts of cannabis usage on lecture attendance?
 o Null hypothesis: cannabis usage has no association with lecture attendance.
 o Alternative hypothesis: cannabis usage is correlated with a lower attendance rate.
• (RQ2) What are the impacts of cannabis usage on students’ grades?
 o Null hypothesis: cannabis usage has no association with students’ grades.
 o Alternative hypothesis: cannabis usage is correlated with lower grades.
• (RQ3) Do students perceive cannabis to be a positive asset in their life?
 o Null hypothesis: cannabis usage is not associated with being an advantage to one’s life.
 o Alternative hypothesis: cannabis usage is linked to improving one’s lifestyle.

The structure of the paper is as follows: Section 2 outlines our data collection methods. Section 3 presents our quantitative analysis. In Section 4, we explore feedback from students and the common themes that arise. Section 5 discusses the qualitative results and addresses our research question. Section 6 covers the limitations of our study, and Section 7 concludes our analysis.

3. Methodology
This is where you outline your data collection methodology in detail.
● Where and when was the data collected? (Piazza, lectures, tutorials, online databases…)
● What sampling method did you use? (SRS, stratified, etc.…)
 ○ How did you ensure randomness? (randomly sampling from an R program, systematically deploying surveys in lecture based on seating arrangements…)
● What were your strata, if any?
● What was your sample size?
● What is a general summary of the questions you asked? (What are your variables?)
Below we present an example. Between May 2023 and August 2023, a survey meant to understand students’ academic performance and recreational drug usage was deployed within an introductory statistics course at a research-intensive North American university. We utilized simple random sampling, using a Python random generator to sample 50 students from an email list of all students taking the introductory statistics course (N = 242). Out of 50 students, 19 did not respond, thus we were left with only n = 31 responses. The survey consisted of 10 short-answer questions, asking for their average cannabis usage per week, lecture attendance, and miscellaneous demographic factors (gender, program of study, age).

4. Analysis
WARNING: In a journal article, code is not provided except in an appendix. Do not put R code here. Things that are included in this section:
● Show relevant graphs & tables.
● Show computations for your sample size.
● Necessary assumptions for tests are shown before utilizing them. (If the assumptions are not satisfied, you should not be using that test!)
 ○ For example, the assumptions for the two-sample test for the mean are the following:
  ■ The samples are independent from each other and are obtained randomly.
  ■ The samples are normally distributed.
  ■ The variances for the two independent groups are equal.
● Show the outputs of the computations and statistical tests; i.e., p-values and test statistics are provided here.
 ○ Remark: the tests for significance should be done at the 0.05 level.
● Some statistical information that may be useful to reference (depending on the context): standard deviation, confidence interval, median, IQR.

If you used a questionnaire, use N = 200 for computing the sample size. n, unfortunately, is already known, so you will be reverse-calculating as if you had wanted a certain bound of error that produces your sample size. You do NOT need to show EVERYTHING you computed; only the computations that you will DISCUSS in the report.
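The "reverse calculation" described above can be scripted. The sketch below uses the standard finite-population sample-size formula for a proportion, n = Npq / ((N − 1)B²/4 + pq), with the conservative choice p = q = 0.5; the function name and defaults are illustrative, so check the formula against your course notes before relying on it.

```python
import math

def sample_size_for_proportion(N, B, p=0.5):
    # Finite-population sample size for estimating a proportion p
    # within a bound of error B: n = N*p*q / ((N - 1) * B^2/4 + p*q).
    q = 1 - p
    n = N * p * q / ((N - 1) * B ** 2 / 4 + p * q)
    return math.ceil(n)  # always round a sample size up

# With a known n, try different B values until the formula reproduces it;
# e.g. N = 300 with a bound of 0.13 yields n = 50.
print(sample_size_for_proportion(300, 0.13))  # → 50
```

For the questionnaire case, set N = 200 and solve for the bound B that gives your already-known n.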
Normally, a lot of the side things you compute at first will have no statistically significant results. (E.g., gender may have no connection with the amount of cannabis a person smokes.) Part of this project is having the ability to decipher what is important to report. This is a snippet of what to include in the analysis section:

For our study, our population size is N = 300 and we plan to collect data using simple random sampling. To determine a sample size to collect, we’ll go with the calculation focusing on the mean population parameter. It is assumed that there is an equal proportion amongst those who enjoy projects in courses versus those who enjoy examinations (p = q = 0.5). Hence, given a bound of error of B = 0.13, the sample size calculation ends up being:

n = Npq / ((N − 1)(B²/4) + pq) = 300(0.25) / (299(0.13²/4) + 0.25) ≈ 49.6, which rounds up to 50.

Hence, we sampled 50 students for our analysis. We found that 40% of our participants identified as male (n = 20), and the rest were female (n = 30). We also had 64% domestic students (n = 32) and 36% international students (n = 18). Surprisingly, only 40% said they preferred projects over examinations (n = 20).

To answer RQ1, we need to calculate the one-sample proportion test. All of our assumptions are satisfied, as we have a random sample, a binomial distribution (two binary outcomes/choices), and we have at least 5 from each outcome (np = 20, n(1 − p) = 30). We obtain a chi-square test statistic of 2, df = 1, and p = 0.1573.

RQ2 tries to determine whether certain demographic factors are associated with whether someone prefers assignments. Again, we need to check if the assumptions for the two-sample proportion test are satisfied. We have the same binomial distribution, and the samples are still independent from each other. Below is a table that summarizes the last assumptions. All assumptions were satisfied, so we included a brief result of the statistical test.
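The RQ1 numbers quoted above (statistic 2, df = 1, p = 0.1573) can be reproduced in a few lines. This sketch computes the Pearson chi-square statistic by hand and uses the fact that, for df = 1, the chi-square survival function reduces to erfc(√(x/2)); the function name is an illustrative choice, not a course-mandated one.

```python
import math

def one_sample_prop_test(successes, n, p0=0.5):
    # Pearson chi-square test of H0: p = p0 (df = 1).
    observed = [successes, n - successes]
    expected = [n * p0, n * (1 - p0)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1, P(X^2 >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = one_sample_prop_test(20, 50)  # 20 of 50 preferred projects
print(stat, round(p, 4))                # → 2.0 0.1573
```

Since p > 0.05, we fail to reject the null hypothesis, matching the snippet's conclusion.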
We also analyzed the Likert scale data, which had options ranging from “strongly disagree”, “disagree”, “slightly disagree”, “neutral”, “slightly agree”, “agree”, to “strongly agree”. We denoted “strongly disagree” as 1 and “strongly agree” as 7. To see whether students perceived their skills to be enhanced through project courses in general, we conducted the one-sample t-test for the mean, where option 4 (neutral) denotes no change. The assumptions for the one-sample test involve the data being randomly selected, independent from each other, and normally distributed. The first two conditions have been addressed, and the last condition can be satisfied by the central limit theorem, since our 50 samples exceed the usual threshold of 30. Below is a summary of the Likert scale questions, as well as the one-sample test for the mean:

Remark 1: Some interesting data points to include would be the confidence interval, which I omitted due to laziness. I also did not include “advanced tests”.

Remark 2: It was unrelated to any of the research questions, but if you peek at the attached code file, I tested whether there were any differences amongst the perceptions of the Likert scale answers between male & female, and international & domestic, students. There were no differences, so I did not include them. If you have time for more results, and they do end up being significant, it would be interesting to include them.

5. Discussion/Results
In this section you should write the interpretation of your results, hopefully with plain-language conclusions. You may also use this section to elaborate on facts regarding a table or a graph in detail, such as pointing out abnormalities or highlighting the stark differences between two groups in your data. A discussion section is typically longer the more interesting the results, but it can be made relatively short if no results were found.

6. Limitations
Briefly mention limitations and what should be done in the future.
Consider what you would’ve changed in your survey if you could go back and redo the study. Common limitations in studies include an inadequate sample size, failing to reject the null hypothesis, missing confounding variables, a biased sample and/or survey, and poor survey questions. A common issue in prior years is that students mentioned limitations that were more of a critique of the method as opposed to their own study. They probably stole these ideas from ChatGPT. As a warning, the following are not sufficient limitations (and I will address why):
• Lack of causal connections. STA304’s official name is “Surveys, Sampling and Observational Data”. Observational data, by nature, will never lead to causal results. If you want to make a causal connection, you’re in the wrong course.
• Self-reported data. You want to stalk and hack into a database that reveals information about your fellow classmates just for your project!? The course name also includes “surveys”! Surveys will always be tied to self-reported data.
• Temporal factors. Maybe this would cause an effect… but realistically, are you going to spend a year collecting survey data? I doubt most people’s opinions drastically change unless your project is entirely season-dependent…

There are more nonsensical answers that generative AI tools will spout out. I am begging you to take a second to think about your limitations rather than consulting AI. (In fact, I may have literally given the answers in the first paragraph of this section.)

Here’s an example of an adequate limitations section: In our study we tried to see which demographic factors (gender, ethnicity, domestic or international status, and year of study) are linked with an overreliance on AI tools, using the chi-square test for independence or the non-parametric Fisher’s exact test (depending on whether the assumptions of the chi-square test were satisfied). Unfortunately, in all cases we failed to reject the null hypothesis.
Hence, there are no ties between various demographic factors and whether students tend to use AI tools. In fact, we found that 90% of students admitted to using generative AI tools; as a result, we are less likely to see any relationships, since a high majority already use them. In the future, we will incorporate open-ended questions asking students to elaborate on their preferences.

7. Conclusion
In this section you should have the following:
• Summarize your findings and explicitly answer your research questions.
 o No new information should be provided; everything mentioned here should have been mentioned previously.
• Talk in more detail about what researchers should do in the future.

Naturally, people may wonder about the difference between an abstract and the conclusion:
• The abstract is supposed to be a concise summary of the entire paper. It goes through the phases of an introduction, methodology, analysis, and results.
• The conclusion also summarizes the paper but emphasizes the latter half (results and limitations). It does not include background information. It will also re-address limitations and go into depth about future directions.
1. Preliminary Information
Unlike previous projects, the scope of this project is the same for both ECE 463 and ECE 563 students. This is because the OOO pipeline must be modeled in its entirety to measure the cycles needed to execute a trace. Note that the project focuses on modeling data dependencies (only through registers), pipeline stages, and structural hazards (Issue Queue and Reorder Buffer). Therefore, we assume perfect branch prediction and perfect caches, and ignore memory dependencies: you will NOT integrate a BTB, conditional branch predictor, instruction cache/TLB, data cache/TLB, or Load Queue/Store Queue.

You must implement your project using the C, C++, or Java languages, for two reasons. First, these languages are preferred for computer architecture performance modeling. Second, our Gradescope autograder only supports compilation of these languages.

1.4. Responsibility for self-grading your project via Gradescope
You will submit, validate, and SELF-GRADE your project via Gradescope; the TAs will only manually grade the report. While you are developing your simulator, you are required to frequently check via Gradescope that your code compiles, runs, and gives the expected outputs with respect to your current progress. This is necessary to resolve, in a timely fashion (i.e., well before the deadline), both porting issues caused by different compiler versions in your programming environment and the Gradescope backend, and non-compliance issues (e.g., how you specify the simulator’s command-line arguments, how you format the simulator’s outputs, etc.).

In this project, you will construct a simulator for an out-of-order superscalar processor that fetches and issues N instructions per cycle.
Only the dynamic scheduling mechanism will be modeled in detail; i.e., perfect caches and perfect branch prediction are assumed. The simulator reads a trace file in which each line has the following format:

<PC> <operation type> <dest reg #> <src1 reg #> <src2 reg #>

where <PC> is the instruction’s program counter (in hex), <operation type> is 0, 1, or 2, and a register field of –1 means that register is not used. For example:

ab120024 0 1 2 3
ab120028 1 4 1 3
ab12002c 2 -1 4 7

means:

“operation type 0” R1, R2, R3
“operation type 1” R4, R1, R3
“operation type 2” –, R4, R7 // no destination register!

Traces are posted on the Moodle website. The simulator executable built by your Makefile must be named “sim” (the Makefile is discussed in Section 6). Your simulator must accept command-line arguments as follows:

sim <ROB_SIZE> <IQ_SIZE> <WIDTH> <tracefile>

The parameters <ROB_SIZE>, <IQ_SIZE>, and <WIDTH> are explained in Section 5. <tracefile> is the filename of the input trace.

The simulator first outputs the timing information for each dynamic instruction in program order (i.e., in the same order that instructions appear in the trace), followed by final outputs (simulator command, processor configuration, and simulation results). See Section 6 regarding the formatting of these outputs and validating your simulator. The per-instruction timing information is output in the following format:

<seq_no> fu{<op_type>} src{<src1>,<src2>} dst{<dst>} FE{<begin-cycle>,<duration>} DE{…} RN{…} RR{…} DI{…} IS{…} EX{…} WB{…} RT{…}

<seq_no> is the line number in the trace (i.e., the dynamic instruction count), starting at 0. Substitute 0, 1, or 2 for <op_type>. <src1>, <src2>, and <dst> are register numbers (including –1 if that is the case). For each of the pipeline stages, indicate the first cycle that the instruction was in that pipeline stage, followed by the number of cycles the instruction was in that pipeline stage. Here is an example instruction from one of the validation runs.
5 fu{2} src{15,-1} dst{16} FE{5,1} DE{6,1} RN{7,1} RR{8,1} DI{9,1} IS{10,3} EX{13,5} WB{18,1} RT{19,1}

Notice that the begin-cycle of a given pipeline stage equals the begin-cycle of the immediately preceding pipeline stage plus the number of cycles spent in the immediately preceding pipeline stage. For example, the instruction’s first cycle in the EX stage is cycle 13, which is the first cycle in IS (10) plus the number of cycles spent in IS (3). After completion of the run, the simulator outputs the final results: the simulator command, the processor configuration, and the simulation results.

Figure 1. Overview of microarchitecture to be modeled, including the terminology and parameters used throughout this specification.

Parameters:
Function units: There are WIDTH universal pipelined function units (FUs). Each FU can execute any type of instruction (hence the term “universal”). The operation type of an instruction indicates its execution latency: Type 0 has a latency of 1 cycle, Type 1 has a latency of 2 cycles, and Type 2 has a latency of 5 cycles. Each FU is fully pipelined; therefore, a new instruction can begin execution on an FU every cycle.

Pipeline registers: The pipeline stages shown in Figure 1 are separated by pipeline registers. In general, this spec names a pipeline register based on the stage that it feeds into. For example, the pipeline register between Fetch and Decode is called DE because it feeds into Decode. A “bundle” is the set of instructions in a pipeline register. For example, if DE is not empty, it contains a “decode bundle”. Table 1 lists the names of the pipeline registers used in this spec, along with a description of each pipeline register and its size (max # instructions).

Table 1. Names, descriptions, and sizes of all of the pipeline registers.

About register values: For the purpose of determining the number of cycles it takes for the microarchitecture to run a program, the simulator does not need to use and produce actual register values.
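The begin-cycle invariant above can be checked mechanically: each stage's begin cycle is the running sum of the preceding stages' durations. A small sketch (in Python for brevity only; the project itself must be written in C, C++, or Java, and the function name is an illustrative placeholder):

```python
def stage_begin_cycles(fetch_cycle, durations):
    # durations: cycles spent in FE, DE, RN, RR, DI, IS, EX, WB, RT, in order.
    # Each stage begins the cycle after the previous stage's residency ends.
    begins = []
    cycle = fetch_cycle
    for d in durations:
        begins.append(cycle)
        cycle += d
    return begins

# The example instruction above: fetched at cycle 5, durations 1,1,1,1,1,3,5,1,1
print(stage_begin_cycles(5, [1, 1, 1, 1, 1, 3, 5, 1, 1]))
# → [5, 6, 7, 8, 9, 10, 13, 18, 19]
```

The printed begin cycles match FE{5} DE{6} RN{7} RR{8} DI{9} IS{10} EX{13} WB{18} RT{19} in the validation-run line, which makes this a handy sanity check when debugging your output format.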
Because the simulator only determines cycle counts, the initial Architectural Register File (ARF) values are not provided and the instruction opcodes are omitted from the trace. All that the simulator needs in order to determine the number of cycles is the microarchitecture configuration, the execution latencies of instructions (operation type), and the register specifiers of instructions (true, anti-, and output dependencies).

This section provides a guide to implementing your simulator. Call each pipeline stage in reverse order in your main simulator loop. The comments below indicate the tasks to be performed:

// To issue an instruction:
// 1) Remove the instruction from the IQ.
// 2) Add the instruction to the execute_list. Set a timer for the
//    instruction in the execute_list that will allow you to model its
//    execution latency.

// If there are enough free entries to accept the entire rename bundle,
// then process (see below) the rename bundle and advance it from RN to RR.
//
// Apply your learning from the class lectures/notes on the steps for
// renaming: (1) allocate an entry in the ROB for the instruction,
// (2) rename its source registers, and (3) rename its destination
// register (if it has one). Note that the rename bundle must be renamed
// in program order (fortunately the instructions in the rename bundle
// are in program order).

} while (Advance_Cycle());

// Advance_Cycle performs several functions. First, it advances the simulator
// cycle. Second, when it becomes known that the pipeline is empty AND the
// trace is depleted, the function returns “false” to terminate the loop.

Sample simulation outputs are provided on the Moodle site. These are called “validation runs”. Refer to the validation runs to see how to format the outputs of your simulator. You must submit, validate, and self-grade[2] your project using Gradescope.
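To make the shape of the reverse-order main loop described above concrete, here is a minimal skeleton (written in Python purely for compactness; your submission must be in C, C++, or Java, and every name here is an illustrative placeholder, not a required interface). The stage bodies are stubs; a real simulator would move bundles between the pipeline registers of Table 1 and track IQ/ROB occupancy.

```python
class Simulator:
    WIDTH = 2  # illustrative superscalar width

    def __init__(self, trace_lines):
        self.trace = list(trace_lines)  # remaining (unfetched) trace lines
        self.cycle = 0
        self.in_flight = 0              # instructions currently in the pipeline

    # Stage stubs, called in reverse pipeline order each cycle.
    def retire(self): pass      # RT: retire completed instructions from the ROB head
    def writeback(self): pass   # WB: mark finished instructions ready to retire
    def execute(self): pass     # EX: tick latency timers in the execute_list
    def issue(self): pass       # IS: issue up to WIDTH ready instructions from the IQ
    def dispatch(self): pass    # DI: move the dispatch bundle into the IQ if it fits
    def reg_read(self): pass    # RR: advance the register-read bundle to DI
    def rename(self): pass      # RN: allocate ROB entries, rename sources/destination
    def decode(self): pass      # DE: advance the decode bundle to RN

    def fetch(self):
        # FE: fetch up to WIDTH instructions per cycle from the trace.
        for _ in range(self.WIDTH):
            if self.trace:
                self.trace.pop(0)  # a real simulator would place it in DE

    def advance_cycle(self):
        self.cycle += 1
        # Keep looping until the trace is depleted AND the pipeline is empty.
        return bool(self.trace) or self.in_flight > 0

    def run(self):
        while True:
            self.retire(); self.writeback(); self.execute(); self.issue()
            self.dispatch(); self.reg_read(); self.rename(); self.decode()
            self.fetch()
            if not self.advance_cycle():
                break
        return self.cycle
```

With the stubs as given, `Simulator(range(5)).run()` simply counts the ceil(5 / WIDTH) = 3 fetch cycles; the stubs are where the actual dependency and structural-hazard bookkeeping goes. Calling the stages in reverse order ensures that downstream stages free up their pipeline registers before upstream stages try to advance into them within the same cycle.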
Here is how Gradescope (1) receives your project (zip file), (2) compiles your simulator (Makefile), and (3) runs and checks your simulator (arguments, print-to-console requirement, and "diff -iw"). Because outputs are checked with "diff -iw", differences in case and whitespace are ignored, so it is fine if your output and the validation runs have different whitespace. Note, however, that extra or missing blank lines are NOT ok: "diff -iw" does not ignore extra or missing blank lines.

See the required report template in Moodle for the grading breakdown, experiments, and report contents. Use the report template as the basis for the report that you submit (insert graphs, fill in answers to questions, etc.).

Various deductions (out of 100 points): -1 point for each day (24-hour period) late, according to the Gradescope timestamp. The late penalty is pro-rated on an hourly basis: -1/24 point for each hour late. We will use the "ceiling" function of the lateness time to get to the next higher hour, e.g., ceiling(10 min. late) = 1 hour late, ceiling(1 hr, 10 min. late) = 2 hours late, and so forth. For this third and final project, Gradescope will accept late submissions no more than one week after the deadline. The goal of this policy is to allow adequate time for the TAs to grade reports and assess partial credit for simulator development effort (for simulators that don't match any validation runs), before final grades are due for the semester. See Section 1.1 for penalties and sanctions for academic integrity violations.

It is good practice to frequently make backups of all your project files, including source code, your report, etc. You can back up files to another hard drive (your NFS B: drive in your NCSU account, home PC, laptop ... keep consistent copies in multiple places) or removable media (flash drive, etc.).

Correctness of your simulator is of paramount importance. That said, making your simulator efficient is also important because you will be running many experiments: many superscalar processor configurations and multiple traces.
Therefore, you will benefit from implementing a simulator that is reasonably fast. One simple thing you can do to make your simulator run faster is to compile it with a high optimization level. The example Makefile posted on the Moodle site includes the -O3 optimization flag. Note that, when you are debugging your simulator in a debugger (such as gdb), it is recommended that you compile without -O3 and with -g. Optimization includes register allocation, and register-allocated variables are often not displayed properly in debuggers, which is why you want to disable optimization when using a debugger. The -g flag tells the compiler to include symbols (variable names, etc.) in the compiled binary. The debugger needs this information to recognize variable names, function names, line numbers in the source code, etc. When you are done debugging, recompile with -O3 and without -g to get the most efficient simulator again. As mentioned in Section 6, another reason for being wary of excessive run times is Gradescope's autograder timeout.

I have written a tool that allows you to display instruction schedules identical to the ones drawn in class. You may use this tool as an optional visualization aid.
o Download the scope tool from the Moodle website.
o Run your simulator and redirect its output to some filename.
o Go to the Moodle website to view an example.

[1] The ISA is MIPS-like: 32 integer registers, 32 floating-point registers, the HI and LO registers (for results of integer multiplication/divide), and the FCC register (floating-point condition code register).
[2] The mystery runs component of your grade will not be published until we release it. The report will be manually graded by the TAs.
You are responsible for the appropriate level of detail. A step-by-step solution (scanned from paper, a Word-document drawing, or any tool of that kind) is required. NO DIRECT answers. 55, 50, 10, 40, 80, 90, 60, 100, 70, 80, 20, 50, 22 =================================================================
Advanced Excel Module 2 Lecture
Module 2 – Lookup Functions and Data Table Management

Outline
Part 1 Lookup Functions
- VLOOKUP
  • Exact lookup
  • Approximate lookup
- HLOOKUP
- XLOOKUP
Part 2 Sorting and Filtering Using Tables
- Sort by one or more columns
- Filter by color
- Filter with conditions
- Text Filter
- Format table as Table and rename

Part 1 Lookup functions
File: Part 1 - Basic lookup functions.xlsx

VLOOKUP is a powerful Excel function that allows you to search for a value in the first column of a table and return a value in the same row from a specified column. VLOOKUP searches for lookup_value in the first column of table_array; once a match is found (exact or approximate, as controlled by range_lookup), it returns the cell content from the column given by col_index_num.

VLOOKUP(lookup_value, table_array, col_index_num, range_lookup)
Lookup_value: the value to be found.
Table_array: the table of data from which values are retrieved; table_array can be a reference to a range or to a table name.
Col_index_num: the column number in table_array from which the matching value should be returned, e.g. the first column of values in the table is column 1.
Range_lookup: a logical value: TRUE or 1 to find the closest match in the first column of the table (sorted in ascending order), or FALSE or 0 to find an exact match.

Worksheet: Vlookup (Exact match)
Task 1: Retrieve Name and Price information corresponding to the ID using VLOOKUP
o C17: =VLOOKUP(C15,A3:C12,2,FALSE)
o C18: =VLOOKUP(C15,A3:C12,3,0)

Worksheet: Vlookup (Approximate match)
Task 2: Find the Discount rate (D4:D13) for each price from the discount table (H3:I8)
o D4: =VLOOKUP(C4,$H$4:$I$8,2,TRUE)
Note:
• There is no exact match of the price in the discount rate table. Example: for Price = 675.18, Excel will go through the discount table and try to find the interval that contains this value.
First, it finds the step value (600) that is lower than the Price, while the next step value (900) is higher than the Price, then returns the corresponding discount rate for this interval, that is 10%.
• If the price is -$100, the output will be #N/A since Excel cannot find a step value less than -$100.
• Remember to lock the table_array with $ signs, or use a named table, Table_Discount, for H5:I8.
- Method 1: Format an Excel Table
  Select any cell within the table, or the range of cells you want to format as a table.
  On the Home tab, click Format as Table.
  Keep one cell active on the table.
  Table tab > change the name in the Table Name field.
- Method 2: Define name
  Select the cell or range of cells that you want to name.
  On the Formulas tab, click Define Name.
  Type the name in the Name field.

Task 3: Retrieve the Grade (D17:D27) for each mark from the grade table (H17:I27)
o Name the grade table H17:I27 as TableGradeScale
o D17: =VLOOKUP(C17,TableGradeScale,2,TRUE)
Note: When implementing an approximate match, you must define your search table_array in ascending order! If the grade scale table is in descending order, the lookup fails, because Excel is unable to search for approximate values in any order other than ascending.

Now the table is horizontal.
We can use HLOOKUP.
HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup])

Task 4: Retrieve Name and Price information given the ID using HLOOKUP
o C7: =HLOOKUP(C6,$B$2:$K$4,2,FALSE)
o C8: =HLOOKUP(C6,$B$2:$K$4,3,FALSE)

Worksheet: Xlookup
XLOOKUP searches for the lookup_value in the range of the lookup_array; once a match is found, it returns the cell content from the corresponding position of the return_array. When the lookup_value is not found, the if_not_found value is returned. The search is performed with a specific match_mode and a specific search_mode.

XLOOKUP(lookup_value, lookup_array, return_array, if_not_found, match_mode, search_mode)
Lookup_value: the value to search for.
Lookup_array: the array or range to search.
Return_array: the array or range to return from.
If_not_found: returned if no match is found.
Match_mode: specifies how to match lookup_value against the values in lookup_array (0: exact match, -1: exact match or next smaller, 1: exact match or next larger, 2: wildcard character match).
Search_mode: specifies the search mode to use (1: search first to last, -1: search last to first).

Benefits of using XLOOKUP: it supports many different match modes; it can search both horizontal and vertical data; it can perform
a reverse search; it can return entire rows and columns of data instead of a single value; and it can include the "if not found" argument.

Task 5: Retrieve Name and Country information given the ID using XLOOKUP
o Method 1: using ranges
B3: =XLOOKUP(A3,C6:C15,A6:B15,"ID not found",0,1)
o Method 2: using a table
Assuming that the cell range A5:D15 has been formatted as a table with the name Table_Employee:
B3: =XLOOKUP(A3,Table_Employee[Emp ID],Table_Employee[[Employee Name]:[Country]],"ID not found",0,1)

Part 2 Sorting, Filtering, Using Tables
File: Part 2 - Sorting, filtering, tables.xlsx

Task 1: Sort by CustomerID in descending order
o Method 1:
• Select the entire table (Ctrl+A) > Data tab > in Sort & Filter, click on Sort.
• Select CustomerID, what you want to sort on, and the order.
Note: Excel will automatically identify whether there is a header row (☑ My data has headers).
o Method 2:
• Format range A1:H202 as a table:
• Select one cell in the table.
• On the Home tab, click the Format as Table button; filter buttons automatically appear in the header row, so you can sort each column in the order you want.
• Click the filter button of the column CustomerID.
• Then select Sort Largest to Smallest.

Task 2: Sort the table on two columns (levels): LastName ascending, FirstName descending
Note: you must use the Sort dialog box (see Task 1, Method 1), because if you use the filter button twice, Excel retains only the last sort you did.

Task 3: Sort or Filter by color
o Click on the 'Filter' button of the column CustomerID.
o Select Sort by Color or Filter by Color.
Note: The row numbers are no longer consecutive. The funnel icon on the filter button shows that filtering has been applied to that column.
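The interval search that VLOOKUP performs with range_lookup = TRUE can be modelled in a few lines of Python for illustration. Only the 600 -> 10% row and the #N/A behaviour come from the worksheet example; the other discount rows are invented here, and the function name is our own, not an Excel API.

```python
# Illustrative model of VLOOKUP's approximate match (range_lookup = TRUE).
# The table must be sorted ascending on its first column, exactly as the
# lecture notes require.
from bisect import bisect_right

def vlookup_approx(value, table):
    """table: list of (step, result) rows sorted ascending by step.
    Returns the result for the largest step <= value, or '#N/A'."""
    steps = [row[0] for row in table]
    i = bisect_right(steps, value)   # number of steps <= value
    if i == 0:
        return "#N/A"                # value below the smallest step
    return table[i - 1][1]

# Discount table: only the 600 -> 0.10 row is from the worksheet;
# the other rows are made up for this example.
discount = [(0, 0.00), (300, 0.05), (600, 0.10), (900, 0.15), (1200, 0.20)]
```

Here vlookup_approx(675.18, discount) returns 0.10, matching the worksheet's explanation, and vlookup_approx(-100, discount) returns "#N/A" because no step is less than or equal to -100.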
Task 4: Clear Filter
o Method 1:
• Click the funnel icon of the column for which you want to clear the filter.
• Select Clear Filter.
This method clears only the filter on that column.
o Method 2:
• Data tab > in Sort & Filter, click on Clear.
This method clears the filters on all columns.
Important: it is possible to clear a filter, BUT it is not possible to clear a sort.

Task 5: Number Filter with CustomerID greater than 100
o Click on the 'Filter' button of the column CustomerID.
o Select Number Filters and Greater Than ...
o Then type 100.

Task 6: Text Filter all customers living in New York
o Click on the 'Filter' button of the column City.
o Type New York in the search box.

Task 7: Text Filter all customers whose LastName ends with 'on'
o Click on the 'Filter' button of the column LastName.
o Select Text Filters and Ends With ...
o Type on.

Task 8: Text Filter all customers whose Address contains 'road'
o Click on the 'Filter' button of the column Address.
o Type road in the search box.

Task 9: Text Filter all customers with 'a' as the second letter of LastName
o Click on the 'Filter' button of the column LastName.
o Select Text Filters and Begins With ...
o Type ?a.
Note: here, ? is what we call a wildcard; it represents any single character.

Task 10: Format the 'Customers' table as a Table
If it's not already done:
o Select one cell in the table. Warning: DO NOT select the whole worksheet; the table range is A1:I202.
o Select the table region > Home ribbon > Styles > Format as Table.
Now if you click on any cell, a new ribbon 'Table Design' will appear.
Try the following operations:
• Rename the table as 'Table_Customer'. You'll find the Table Name field on the left side of the Table Design tab.
• Add a new column 'FullName' at the end of the table. Excel automatically resizes the table.
• In the 'FullName' column, create a formula to concatenate the FirstName and LastName columns, adding a space as separator. Note: when you click into the cell, the Excel table automatically recognizes column names; you should get the following formula: =[@FirstName]&" "&[@LastName]
• Add a total row to the bottom of the table to count the number of customers. Tip: check the Total Row option in the Table Design tab, and you automatically get the count at the bottom of the last column. Note: you can change the calculation using the drop-down list at the right side of the cell.

Worksheets 'Products', 'Orders' and 'Orders products list'
Task 9: Change the Table design for the worksheets 'Products', 'Orders' and 'Orders products list', and name them 'TableProducts', 'TableOrders' and 'TableOrderDetails' respectively.

Worksheet 'Orders products list'
Task 10: Add a new column 'Price' after the last column (note that the table is automatically resized) and calculate 'Price' = 'Price per Unit' * 'Quantity'. Note that the formula uses the table column names, [@[Price per Unit]]*[@Quantity], and not the cell references (C2*D2) when you click into the cells. You may also notice that the formula is automatically copied to the bottom of the table.

Task 11: Add a new column 'Final Price' after the last column and calculate it as 'Price' - 'Price' * 'Discount'.

Task 12: In the 'Table Design' ribbon, remove duplicates with the same ProductName.

Task 13: In the 'Table Design' ribbon, insert a Slicer on ProductName and select all 'Chocolate' products.
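For illustration, the text-filter conditions of Tasks 7-9 in Part 2 can be mimicked with Python's fnmatch module, whose ? and * wildcards behave like Excel's; the function names here are our own, not Excel features.

```python
# Illustrative Python equivalents of the Excel text filters (Tasks 7-9).
from fnmatch import fnmatch

def ends_with_on(last_name):     # Task 7: LastName ends with 'on'
    return fnmatch(last_name.lower(), "*on")

def contains_road(address):      # Task 8: Address contains 'road'
    return fnmatch(address.lower(), "*road*")

def second_letter_a(last_name):  # Task 9: 'a' as second letter ("Begins with ?a")
    return fnmatch(last_name.lower(), "?a*")
```

As in Excel, the ? wildcard stands for exactly one character and * for any run of characters, so "?a*" accepts any last name whose second letter is 'a'.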
CS 300: Programming II – Fall 2024 Due: 10:00 PM CT on MON 12/02 P09 Leaderboard Overview STUDENTS BE AWARE: WE ARE ENABLING PvP IN CS 300 Okay no we’re not. But if we were, this is how we’d keep track of who’s winning. Hierarchical data structures allow us to efficiently maintain a sorted data structure, so our first attempt at this will be using a Binary Search Tree to maintain a leaderboard of the players of a game. Grading Rubric 5 points Pre-assignment Quiz: accessible through Canvas until 11:59PM on 11/24. +5% Bonus Points: students whose final submission to Gradescope is before 5:00 PM Central Time on WED 11/27 and who pass ALL immediate tests will receive an additional 2.5 points toward this assignment, up to a maximum total of 50 points. 12 points Immediate Automated Tests: accessible by submission to Gradescope. You will receive feedback from these tests before the submission deadline and may make changes to your code in order to pass these tests. Passing all immediate automated tests does not guarantee full credit for the assignment. 20 points Additional Automated Tests: these will also run on submission to Gradescope, but you will not receive feedback from these tests until after the submission deadline. 13 points Manual Grading Feedback: TAs or graders will manually review your code, focusing on algorithms, use of programming constructs, and style/readability. 50 points MAXIMUM TOTAL SCORE Learning Objectives After completing this assignment, you should be able to: ● Describe the structure and functionality of a Binary Search Tree. ● Implement a Binary Search Tree with the relevant recursive algorithms for adding, removing, and traversing nodes. ● Demonstrate the utility of the Iterator, Iterable, and Comparable interfaces. Additional Assignment Requirements and Notes Keep in mind: ● Pair programming is NOT ALLOWED for this assignment. You must complete and submit P09 individually. 
● The ONLY external libraries you may use in your program are: java.util.Iterator, java.util.NoSuchElementException
● Use of any other packages (outside of java.lang) is NOT permitted.
● You are allowed to define any local variables you may need to implement the methods in this specification (inside methods). You are NOT allowed to define any additional instance or static variables or constants beyond those specified in the write-up.
● You are allowed to define additional private helper methods.
● Only Game and LeaderboardTester may contain a main method.
● All classes and methods must have their own Javadoc-style header comments in accordance with the CS 300 Course Style Guide.
● Any source code provided in this specification may be included verbatim in your program without attribution.
● All other sources must be cited explicitly in your program comments, in accordance with the Appropriate Academic Conduct guidelines.
● Any use of ChatGPT or other large language models must be cited AND your submission MUST include screenshots of your interactions with the tool clearly showing all prompts and responses in full. Failure to cite or include your logs is considered academic misconduct and will be handled accordingly.
● Run your program locally before you submit to Gradescope. If it doesn't work on your computer, it will not work on Gradescope.

Need More Help? Check out the resources available to CS 300 students here: https://canvas.wisc.edu/courses/427315/pages/resources

CS 300 Assignment Requirements
You are responsible for following the requirements listed on both of these pages on all CS 300 assignments, whether you've read them recently or not. Take a moment to review them if it's been a while:
● Appropriate Academic Conduct, which addresses such questions as:
○ How much can you talk to your classmates?
○ How much can you look up on the internet?
○ How do I cite my sources?
○ and more!
● Course Style
Guide, which addresses such questions as:
○ What should my source code look like?
○ How much should I comment?
○ and more!

Getting Started
1. Create a new project in Eclipse, called something like P09 Leaderboard.
a. Ensure this project uses Java 17. Select "JavaSE-17" under "Use an execution environment JRE" in the New Java Project dialog box.
b. Do not create a project-specific package; use the default package.
2. Download the two (2) PROVIDED Java source files from the assignment page on Canvas. You will not modify these files at all:
a. BSTNode.java
b. Game.java
3. Download the three (3) INCOMPLETE Java source files from the assignment page. You must complete these files:
a. Player.java – implements the Comparable interface
b. Leaderboard.java – implements the Iterable interface
c. LeaderboardTester.java
4. Create one (1) new Java source file within that project's src folder:
a. LeaderboardIterator.java – implements the Iterator interface

Implementation Requirements Overview
In this project you will implement an application that maintains a leaderboard of the players of a PvP (player-vs-player) game, ordered by their scores in that game. Players may challenge other players in the game, which may result in changes to the leaderboard. We are NOT providing additional documentation beyond this write-up and the comments in the provided code.

Provided Classes
The following classes are provided in their entirety. You do not need to implement anything in them.
● BSTNode – a generic Binary Search Tree node.
● Game – a class which maintains a given game's Leaderboard across PvP challenges.

Classes You Will Implement
You will completely or partially implement the following classes:
● Player – represents a single player of the game, including their name and numeric score. This class is nearly complete, but you must make these objects Comparable to other Players and complete the compareTo() method.
● Leaderboard – a binary search tree consisting of BSTNode<Player>.
We have provided the public interface to this data structure, but the real work is done in the protected recursive helper methods, which you must implement.
● LeaderboardIterator – Leaderboard's toString() method relies on an enhanced for loop, which in turn requires an iterator. This iterator must begin at the Player with the smallest score in the leaderboard and iterate through the entirety of the leaderboard in increasing order.
● LeaderboardTester – a tester class for your BST implementation.

Organization of the Leaderboard
The core data structure in this project is the Leaderboard, which is a Binary Search Tree of Player objects. The BST orders the Players by their score (and breaks ties using the Players' names), so the minimum value in the BST is the Player with the smallest score and the maximum value is the Player with the largest score. We are providing you with a generic BSTNode class. Your Leaderboard must be built out of BSTNode<Player> (note Player, not T), and you will implement the relevant algorithms in the Leaderboard class. You will also need to implement Comparable for Player, which provides the comparison method that you will use in Leaderboard.

Implementation Details and Suggestions
Begin by adding the required interfaces to your classes:
● Player must be Comparable (to what type?)
● Leaderboard must be Iterable (over what type?)
● LeaderboardIterator must be an Iterator (over what type?)

1. Implement compareTo() and Tests
Once Player is Comparable, you will need to complete the required compareTo() method according to the description in the comments. At this time you should also implement the first three tester methods in LeaderboardTester:
● testCompareToDiffScore
● testCompareToSameScoreDiffName
● testCompareToEqual
The comments in the file are intended to provide some direction, but you are welcome to add additional tests as you see fit.

2. Implement Leaderboard and Tests
We recommend implementing the methods in the order below.
Testing and debugging the latter methods will be significantly easier if you know the structure of your BST is correct, and implementing the earlier methods will help you get the structure correct. After implementing each of these methods, implement the relevant tester methods in LeaderboardTester, and test your implementation thoroughly! If you wait until the very end to test, you will have lots of weird and confusing bugs. See the next subsection for implementation/testing hints.
1. getMinScoreHelper and getMaxScoreHelper
2. countHelper
3. lookupHelper and the following test methods:
a. testLookupRoot
b. testLookupLeft
c. testLookupRight
d. testLookupNotPresent
4. addPlayerHelper and the following test methods:
a. testAddPlayerEmpty
b. testAddPlayer
c. testAddPlayerDuplicate
5. nextHelper and the following test methods:
a. testGetNextAfterRoot
b. testGetNextAfterLeftSubtree
c. testGetNextAfterRightSubtree
6. removeHelper and the remaining test methods

Note that you may NOT add ANY loops (of any kind) in the Leaderboard class. You MUST implement the above methods recursively. The loop present in the toString() method is the only permitted loop.

2.1 Some Hints
● Draw LOTS of pictures!
● Write tests as you go along. If you save testing for last you will be sad.
● You MAY use the addPlayer() method to construct a tree for other testers, but for the lookup tests in particular (since you won't have completed that method yet) you will probably want to check out the provided getRoot() method in Leaderboard. You can add ONE player to the tree (we've provided the code for this), and then set up the rest of the tree by using the BSTNode methods setLeft() and setRight().
○ Note that doing this will NOT affect the size field, so if you need to know how many nodes are present, you'll need to use count() instead.
● For most of the Leaderboard tests, it is easiest if you use nearly identical Players that only differ by one aspect (e.g.
all players are named "A" but have different scores, or all players have the default 1500 score but different names). This makes keeping track of the correct ordering easier.
● Remember: in every node, the left and right subtrees must themselves be valid binary search trees. That is, if you were to just yank out any left or right node anywhere in the tree (with its descendants), it could stand alone as a valid binary search tree.
● If you are writing really long methods, you are probably overthinking it. My longest method by far is removePlayerHelper, clocking in at about 40 lines with comments and whitespace, and it's about twice as long as anything else in the class.

3. Implement the Iterator
The LeaderboardIterator implements the Iterator interface and iterates through all values in the tree in increasing order. This means that a full run of for (Player p : leaderboard) System.out.println(p); should begin with leaderboard.getMinScore() and end with leaderboard.getMaxScore(), and include leaderboard.size() different lines. YOU MAY CHOOSE the implementation details of this class (constructor, data fields, etc.) as long as it conforms to the following requirements:
1. Implements the Iterator interface
2. Constructed and initialized by Leaderboard's iterator() method
3. First call to next() returns leaderboard.getMinScore()
4. Can call next() without an exception exactly leaderboard.size() times
5. When hasNext() returns false, calling next() causes a NoSuchElementException
Your Leaderboard class must also support use of an enhanced for loop as shown above.

4. [OPTIONAL] Run the Game
Now that your code is completed, you should be able to successfully run the Game.java code supported by your Leaderboard. Check out the sample output on the assignment page if you want to verify that everything is working as expected!

Assignment Submission
Hooray, you've finished this CS 300 programming assignment!
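The iterator contract in step 3 above boils down to an in-order traversal of the BST. The assignment itself must be written in Java, but the idea can be sketched in Python; representing each player as a (score, name) tuple mirrors compareTo()'s score-then-name ordering. All names in this sketch are illustrative assumptions, not the assignment's API.

```python
# Illustrative sketch of in-order BST traversal: visiting left subtree,
# node, then right subtree yields the keys in increasing order, which is
# exactly the sequence LeaderboardIterator must produce.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Recursive BST insert; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def in_order(root):
    """Generator playing the role of the iterator's next() sequence."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)
```

Inserting (1500, "B"), (1200, "C"), (1500, "A"), (1700, "D") and traversing yields the players sorted by score and then by name, beginning at the minimum and ending at the maximum, just as the five requirements above demand.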
Once you're satisfied with your work, both in terms of adherence to this specification and the academic conduct and style guide requirements, make a final submission of your source code to Gradescope. For full credit, please submit the following files (source code, not .class files):
● Player.java
● Leaderboard.java
● LeaderboardIterator.java
● LeaderboardTester.java
Additionally, if you used generative AI at any point during your development, you must include screenshots showing your FULL interaction with the tool(s). Your score for this assignment will be based on the submission marked "active" prior to the deadline. You may select which submission to mark active at any time, but by default this will be your most recent submission.
Control of an Electric Drive in Simulink

Introduction
Simulink is the dynamic simulation environment of Matlab, in which complex physical systems can be modelled through differential equations and their behaviour can be analysed. An electric drive can also be modelled in Simulink through the equations governing its operation. However, as an electric drive consists of different subsystems, such as an electric motor, a power electronic converter, and a mechanical load, each of these subsystems can be modelled separately before combining them into a single model that emulates the behaviour of a complete electric drive.

Fig. 1 shows the basic structure of an electric drive. The type of the motor determines the configuration of the power converter, the number of sensors, and the control algorithm. For example, if the motor is a dc machine, then the power converter would be a half-bridge (two-quadrant drive) or a full H-bridge (four-quadrant drive), and there will be one current sensor and one dc-link voltage sensor. The position/speed of the rotor is acquired through a shaft-mounted position sensor.

Fig. 1 A typical electric drive

Using the blocks and tools offered by Simulink, the physical behaviour of the blocks shown in Fig. 1 can be emulated. The scheme of Fig. 1 in terms of Simulink blocks is shown in Fig. 2. The highlighted areas represent the different subsystems of Fig. 1. The area labelled 'Display' shows a scope on which different quantities can be plotted as a function of time to visualize the time evolution of different variables. The subsystems are briefly described below.

Fig. 2 Simulink block diagram of a dc motor drive

The motor
In Fig. 2, a dc motor is shown as the actuator, but it can also be any other electrical machine, such as a three-phase permanent magnet synchronous motor. The details of the 'Motor' subsystem are shown in Fig. 3. As observed, they are the electrical and mechanical state equations of a separately excited, constant-flux dc motor.
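For reference, a standard form of these state equations is sketched below; the symbols (armature resistance R_a, inductance L_a, back-EMF constant k_e, torque constant k_t, inertia J, friction coefficient B, load torque T_L) are the conventional ones and may be named differently in the Simulink parameter dialog box.

```latex
\begin{aligned}
v_a &= R_a\, i_a + L_a \frac{di_a}{dt} + k_e\, \omega_m
  && \text{(electrical: armature loop)} \\
J \frac{d\omega_m}{dt} &= k_t\, i_a - B\, \omega_m - T_L
  && \text{(mechanical: shaft dynamics)} \\
\frac{d\theta_m}{dt} &= \omega_m
  && \text{(rotor position)}
\end{aligned}
```

With constant excitation flux, the electromagnetic torque is simply k_t i_a, which is why the armature current and the rotor position (or speed) are natural outputs of the motor block.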
The applied armature voltage is the electrical actuation signal, and the torque produced by the machine acts as the mechanical actuation signal. The load torque is shown as a separate input, which can be a constant, a step function, or any other load torque profile depending on the application being analysed. The outputs of the dc motor block are the armature current and the rotor mechanical position. The user can choose to have the mechanical speed as another output. The parameters of the dc motor can be set/changed by double clicking on the block and inputting the new values in the dialog box. Fig. 4 shows the dialog box for the dc motor parameters. Since all the parameters shown in Fig. 4 are in their standard SI units, the inputs (voltage and load torque) and the outputs (current and angle) of the motor block should also be interpreted in their standard SI units.

Fig. 3 Simulink block implementation of the state equations of a constant flux dc motor
Fig. 4 Parameter dialog box for a constant flux dc motor

The power converter
For a dc motor drive, the power electronic converter can consist of a half-bridge or a full H-bridge, depending on whether the motor is required to rotate in one direction only (half-bridge) or in both directions (full bridge). To preserve the generality of the implemented drive system, a full H-bridge is simulated to give maximum flexibility to the user. The power converter block also includes a pulse width modulation (PWM) scheme that converts the duty cycles for the two legs of the H-bridge (da and db) into pulses of varying widths. The dc-link voltage is defined as a constant input decided by the user. The modulator block's parameter dialog box is shown in Fig. 5, which requires the user to input the switching frequency in Hz. The details of the modulator block are shown in Fig. 6.

Fig. 5 Parameter dialog box for the modulator
Fig.
6 H-bridge modulation scheme

Sensors and ADCs
In electric drives, voltage, current and position sensors are used to measure the dc-link voltage, the load currents and the shaft position, respectively. Since these quantities are in the analog domain while the control, in modern electric drives, is in the digital domain, an analog-to-digital conversion is necessary. Analog-to-digital converters (ADCs) perform this conversion and provide the controller with measurements at a fixed sampling frequency (decided by the drive designer). The sensors measuring the voltage and current also introduce noise on the measurements, which is normally a zero-mean, constant-variance white noise. In addition to the white noise on the analog signal, the quantization effect of the ADCs further degrades the measurement in the digital domain. All these effects are simulated inside the 'Sensing subsystem' of Fig. 2, as detailed in Fig. 7.

For the shaft position measurement, incremental or absolute position sensors are normally used in electric drives. The resolution of the position signal available to the controller depends on the number of pulses per revolution of the incremental encoder or the bit resolution of the absolute encoder. The fixed resolution of the position sensors introduces a quantization noise on the position signal. This quantization noise is emulated in the simulation for an incremental encoder.

Fig. 7 Sensor subsystem structure

It can be noticed from Fig. 7 that there is only one input current, ia, but two other currents, ib and ic, are included to allow the user to simulate a three-phase system. For a three-phase machine, the currents ib and ic must also be added as inputs to the block rather than left as constants as shown in Fig. 7. Fig. 8 shows the parameter dialog box for the sensing subsystem. The range of the current and voltage measurement must be set such that this range is not exceeded at any time.
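The sensing chain just described (white measurement noise, a fixed measurement range, n-bit ADC quantization, and encoder position quantization) can be sketched in Python for illustration; the function names, the 12-bit default and the x1 encoder counting are our own assumptions, not taken from the Simulink model.

```python
# Illustrative model of the sensing subsystem: noise + range clipping +
# uniform n-bit quantization, plus encoder position quantization.
import math
import random

def adc(sample, full_scale, n_bits=12, noise_std=0.0, rng=random):
    """Quantize a bipolar analog sample in [-full_scale, +full_scale]
    to an n-bit output code, optionally adding zero-mean white noise."""
    noisy = sample + rng.gauss(0.0, noise_std) if noise_std > 0 else sample
    # Clip to the measurement range (the range must not be exceeded).
    noisy = max(-full_scale, min(full_scale, noisy))
    levels = 2 ** n_bits
    lsb = 2 * full_scale / levels
    code = int((noisy + full_scale) / lsb)
    return min(code, levels - 1)       # full scale maps to the top code

def encoder_angle(theta, ppr):
    """Quantize a shaft angle (rad) to the step size of an incremental
    encoder with ppr pulses per revolution (x1 counting assumed)."""
    step = 2 * math.pi / ppr
    return step * int(theta / step)
```

With a 12-bit ADC over a bipolar full scale of 10 units, one code step corresponds to 20/4096, i.e. roughly 0.005 of the measured unit; raising n_bits or ppr shrinks the quantization noise accordingly.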
The resolution of the ADCs is usually 12-bit in commercial electric drives but can also be 14- to 16-bit in the case of high-end drives. The pulses-per-revolution (ppr) value for incremental encoders starts from as low as 12 ppr for very low-cost encoders and can be in excess of 10,000 ppr for devices used in precision applications.

Fig. 8 Sensor subsystem parameters

Control algorithm
The control algorithm for an electric drive is normally executed on a digital signal processor (DSP) at a fixed control execution frequency, usually the switching frequency of the power converter. The control routines are normally written in a high-level language such as C. The block labelled 'Control' in Fig. 2 emulates the behaviour of a DSP that samples the input data at a fixed frequency and outputs the duty cycles for the power converter after one execution cycle. The details of the block are shown in Fig. 9. This block consists of a Matlab s-function. S-functions (system-functions) provide a powerful mechanism for extending the capabilities of the Simulink environment. An S-function is a computer language description of a Simulink block written in MATLAB, C, C++, or Fortran. The block labelled 'simple_control' is like any other Simulink block, but its behaviour can be fully controlled by the user by modifying the program that describes it.

Fig. 9 Details of the block labelled 'Control' in Fig. 2

In electric drives, the control algorithm is executed on a DSP that can be programmed in C, BASIC and assembly languages, with C being the most commonly used language. The s-function feature of Simulink is therefore used to program the functionality of the block 'simple_control' in C. The program describing an s-function block must follow a certain structure and must contain some pre-defined functions and definitions.
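The fixed-rate execution model described above, where the controller runs once every Ts seconds while the plant is simulated with a much smaller step, can be sketched in Python for illustration (the real implementation is a C s-function; all names and numbers here are assumptions):

```python
# Illustrative fixed-step simulation: the plant is integrated every dt,
# while the controller (the role played by 'simple_control') executes
# only once per Ts, holding its duty-cycle output in between.

def run(n_steps, dt, Ts, controller, plant_step, y0=0.0):
    steps_per_exec = max(1, round(Ts / dt))  # controller period in plant steps
    y, duty, executions = y0, 0.0, 0
    for k in range(n_steps):
        if k % steps_per_exec == 0:          # sample inputs, compute outputs
            duty = controller(y)
            executions += 1
        y = plant_step(y, duty, dt)          # plant advances every dt
    return y, executions
```

Simulating 1 s of a trivial integrator plant with dt = 1 ms and Ts = 100 ms steps the plant 1000 times but runs the controller only 10 times, mirroring how the s-function executes once per sample period while Simulink integrates the continuous blocks.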
To program the S-function block properly, it is recommended to start from an example such as ‘sfuntmpl_doc.c’ or ‘sfuntmpl_basic.c’, available from MATLAB, and modify it according to the requirements of the application. The available templates are for a level-2 S-function. The number of inputs, outputs and parameters of the S-function block is defined inside the C program and must match the inputs and outputs in Simulink. The parameters passed by Simulink to the S-function are listed in the dialog box of the S-function, as shown in Fig. 10. In Fig. 10, the only parameter that Simulink passes to the S-function is Ts, the sampling time. Inside the C program describing the S-function, this parameter Ts is used to define the execution sample time of the S-function block, i.e. the block is executed every Ts seconds. Since the execution time of the S-function must match the switching period of the power converter and the sampling frequency of the current, voltage and position measurements, the parameter Ts is defined as a global constant for the simulation. To change this parameter, go to File->Model Properties->Model Properties, click on the Callbacks tab and then click InitFcn.

Fig. 10 Parameter dialog box for the S-function shown in Fig. 9

Some screenshots from the C code for the S-function ‘simple_control’ are shown below with a brief explanation of the functions, variables and parameters.

S_FUNCTION_NAME: this constant defines the name of the S-function and must correspond to the name of the file (without the .c extension), which is also used as the S-function name in the block (see Fig. 10). The header files, such as aux_funcs.h and Constants.h, are user-defined .h files that contain definitions of functions and constants used in the code. These two header files are included as an example; others can be defined and included as necessary.

Fig. 11 Code lines defining the type of the S-function, inputs, outputs and parameters

U(element): this function macro gets a pointer to the vector of inputs from Simulink and allows the inputs to be copied into local variables.

NUM_INPUTS, NUM_OUTPUTS, NUM_PARAMS: these must correspond to the inputs, outputs and parameters of the S-function block in Simulink. If these constants do not match the S-function block's configuration, MATLAB will generate an error and will not compile the code for execution.

The parameters passed by Simulink to the function are accessed by index, starting from 0. For example, the first dialog parameter is read as (mxGetPr(ssGetSFcnParam(S,0))[0]) and the second dialog parameter as (mxGetPr(ssGetSFcnParam(S,1))[0]); if a single vector parameter is passed instead, its second element would be (mxGetPr(ssGetSFcnParam(S,0))[1]).

Global variables should be defined outside of any function so that they are accessible to all functions. In Fig. 11, TS, TS_INV and thm_prev are global variables. Variables that must hold their values between executions, such as thm_prev, can be declared as global, since local variables do not persist between calls.

In Fig. 12, the sizes of the inputs, outputs, sample times and other arrays are defined. It is important to set the number of sample times to 1 through the call ssSetNumSampleTimes(S, 1); as the S-function is intended to be a single-rate block in this electric-drive application. The sample time of the S-function is then set by calling ssSetSampleTime(S, 0, Ts); as shown in Fig. 13. The figure also shows the initialization conditions that the user can set, for example assigning initial values to the global variables.

Fig. 12 Definition of the S-function code array sizes

Fig. 13 S-function sample time and initialization conditions

The outputs of the model are calculated in the function mdlOutputs, shown in Fig. 14. First, the inputs from the Simulink environment are read into local variables and arrays.
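The read-compute-write pattern of mdlOutputs can be mimicked in plain C outside Simulink. In the sketch below, the arrays u[] and y[] stand in for the Simulink input vector (accessed through the U(element) macro) and the output port; the computation itself, an open-loop duty cycle for a bipolar-switched H-bridge, is purely illustrative and not taken from the provided project.

```c
/* Standalone imitation of the mdlOutputs read-compute-write pattern.
 * u[] stands in for the Simulink input vector and y[] for the output
 * port; the duty-cycle computation is an illustrative assumption. */
static void fake_mdl_outputs(const double *u, double *y)
{
    double v_demand = u[0];   /* read inputs into local variables */
    double v_dc     = u[1];

    /* perform the control calculations on the local variables:
     * with bipolar switching, average voltage = (2d - 1) * v_dc,
     * so d = 0.5 gives zero average output */
    double d = 0.5 + 0.5 * v_demand / v_dc;
    if (d > 1.0) d = 1.0;     /* duty cycle is bounded to [0, 1] */
    if (d < 0.0) d = 0.0;

    y[0] = d;                 /* write the result to the output port */
}
```

In the real S-function this function body would sit inside mdlOutputs, with the inputs supplied by Simulink every Ts seconds.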
The calculations necessary for the control of the electric drive are then performed on these local variables before the outputs are passed back to Simulink. This is the function where almost all of the code related to the drive's control should reside.

Fig. 14 Some code lines of the mdlOutputs function

To complete the process of building a Simulink block from a C program, the code must be compiled into a MATLAB executable file. The command used for this is mex (which stands for MATLAB executable). This command must be called in the MATLAB command window, ensuring that the folder containing the code files is selected as the ‘current folder’ in MATLAB (see Fig. 15). All the .c and .h files that contain functions used inside the main file ‘simple_control.c’ should be in the current folder, and all .c source files must be passed as input arguments to the mex command as shown in Fig. 15, where aux_funcs.c is a second .c source file that must be compiled along with simple_control.c. Every time anything in the code changes (e.g. a parameter in a .h file, or a line added or deleted in any .c file of the project), the mex command must be run again before the behaviour of the S-function block will change. This process is similar to compiling and building the project files for a DSP.

Fig. 15 Instructions for compiling the C code into a mex file

A C compiler is needed to compile the code into a mex file. Several compilers supported by MathWorks are available; any of these can be used. Once successfully mexed, the current folder will contain a mex file named after the S-function, e.g. simple_control.mexw64. This file is accessed by Simulink during simulation as the contents of the S-function block shown in Fig. 9.

The task

Your task is to understand the model and the basic project you are provided. Simulate it under different conditions to enhance your understanding and to become familiar with the model and the C code.
Then, starting from the basic project described above, develop the model and C code for the following objectives:
1) Armature current control of the dc motor
2) Speed control of the dc motor
3) Application of different load torque profiles to test your speed control
4) Position control of the dc motor (optional)
5) A model to simulate a three-phase PMSM drive (advanced)
6) Vector control of a three-phase PMSM (advanced)
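As a starting point for objective 1, a discrete PI current controller could be implemented inside mdlOutputs along the following lines. This is a minimal sketch under assumed names, gains and limits; the actual controller structure, tuning and saturation levels are design choices left to you.

```c
/* Hypothetical discrete PI controller for armature current control.
 * The struct fields play the role of the global variables that hold
 * state between S-function executions; all names and numbers are
 * illustrative, not from the provided project. */
typedef struct {
    double kp, ki;    /* proportional and integral gains */
    double ts;        /* sample time, equal to the S-function Ts */
    double integ;     /* integrator state (persists between calls) */
    double v_max;     /* armature voltage demand limit */
} pi_ctrl;

static double pi_step(pi_ctrl *c, double i_ref, double i_meas)
{
    double err = i_ref - i_meas;
    double u = c->kp * err + c->integ;           /* PI control law */
    if (u > c->v_max) {
        u = c->v_max;                            /* saturate, and do not  */
    } else if (u < -c->v_max) {                  /* integrate while       */
        u = -c->v_max;                           /* saturated (a simple   */
    } else {                                     /* anti-windup scheme)   */
        c->integ += c->ki * c->ts * err;
    }
    return u;                                    /* voltage demand -> PWM */
}
```

For objective 2 the same structure can be cascaded: an outer speed PI loop produces the current reference i_ref for this inner loop.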
Module Title: Business Statistics and Data-Driven Decision Making
Assignment Mode: Individual Assignment
Word Count: 300 words (+/- 10%), excluding references and appendix
Citation Format: At least three citations in APA format
Marks: 100 marks
Due Date: Friday of Week 4, 11.59pm

Assignment Brief

In this assignment, you are tasked with identifying a product of personal interest (e.g. smartphone, running shoe, etc.), hereafter referred to as Product XYZ. As a member of the marketing team for a company selling Product XYZ, your goal is to recommend a suitable launch price for a newly developed model of Product XYZ.

Conduct a market survey to gather price data for Product XYZ online. You may also collect additional relevant variables that could aid your analysis (e.g. features, brand reputation, etc.). Gather at least 10 records and organize your findings in the following format:

No. | Brand | Model / Description | Price (S$) | Source / URL
1 | … | … | … | …
2 | … | … | … | …
… | … | … | … | …
10 | … | … | … | …

Note: You may add more columns if you choose to collect additional variables.

Complete a management report of approximately 300 words. Your report should be professionally formatted in .pdf format and must include the following sections:
• Introduction:
o Provide an overview of the research, clearly stating the objective and significance of the study.
o Describe the data source and method used for data collection. Include the collected data in the appendix.
• Results:
o Present the data collected using relevant charts and descriptive statistics.
o Interpret these charts and statistics to highlight key insights.
o Include detailed calculations of statistics to demonstrate your understanding.
o Recommend a launch price for your company's model of Product XYZ.
• Reflection:
o Reflect on what you have learned about statistical methods and data analysis through this exercise.
o Provide suggestions for improvement.
• References: Cite all external sources used in your research, if any, following APA guidelines.
Note that you are not required to include references for the sources of price data collected during your market survey.
• Appendix: Include a table showing the data you collected during your market survey.

Rubric

Component | Maximum Marks
Introduction | 20
Results | 30
Reflection | 30
Quality of Work | 20
CS101A: Guidelines for Research Paper, Fall 2024

Objective

This research paper is an opportunity to demonstrate your understanding of issues and theories in critical Canadian Communication Studies. It is also an opportunity to demonstrate and practise scholarly research, critical thinking and good writing. Your paper will present an identifiable argument, a clear thesis and scholarly research.

Deadline

By 12:30pm on Thursday, November 28, 2024 to the MyLS dropbox (Research Paper). See the syllabus for the late policy.

Evaluation (20% of final grade)

Evaluation will be based on evidence that you have used 10 scholarly sources to support and interpret your thesis. Use sources from your annotated bibliography. You may include any number of additional popular sources (e.g., government documents, news items, film, web material) in addition to your 10 scholarly sources. The sources in brackets above are popular, not scholarly, sources.

Format
• Margins: 2.5cm (one inch)
• Length: 6-8 pages (not including title page or bibliography), double-spaced text
• Font: 12-point, Times New Roman

Choose a topic (one only!)

1. Convergence and concentration are major issues for citizens and governments alike, particularly in democratic societies such as Canada. With reference to a specific example, argue for or against ownership restrictions in Canadian media. Your chosen example may focus on a particular sector, such as newspapers or radio broadcasting, or on a specific case study. Consider questions such as these: Why should we care who controls the media? Are larger Canadian media companies better positioned to compete internationally than smaller ones? What are the potential problems of corporate bias and influence when a few big companies control much of the media in a democratic society such as Canada?

2. Canadian Content Requirements (CanCon) are contentious for Canadian governance, media, society and audiences.
Argue for or against CanCon with reference to a recent controversy involving Netflix or other streaming services available via YouTube, Amazon, Google, Apple, Disney, etc. Consider questions such as these: Are CanCon requirements still relevant in 2024? Are they enforceable or important in 2024? Do CanCon requirements protect Canadian cultural identity, or do they entrench our marginality in this contemporary era of globalization and streaming services?

3. Copyright is of concern to content producers and media users because digital content is easily downloaded and shared across multiple media platforms. Argue for or against a particular copyright term (e.g., 0 years, 14 years, 20 years, 50 years, etc.). Consider questions such as these: Whose interests are best served by the copyright term you have chosen? Would access by media users be limited or expanded under your copyright term? Is Creative Commons a good copyright alternative?

4. Canadian media companies in the digital age face significant challenges from media platforms such as Google and Facebook that allow users to share their content for free. Consider this issue in relation to recent attempts to support Canadian media, such as the Online News Act (Bill C-18). Introduce your reader to the issue in the context of the history of Canadian media policy. Consider its impact on Canadian culture, democracy and freedom of expression.

5. Activism is an integral part of societal change. By advocating for change in different capacities, Canadians have been able to change the trajectory of history. Recent examples of activism in the Canadian media include the Black Lives Matter movement, Idle No More, and efforts to unionize Amazon and Starbucks, among others. Consider different forms of activism through different forms of media in Canada. How has activism in Canada been shaped by media, and vice versa?

Tips
• Start thinking about, and making notes for, your paper sooner rather than later.
• Contact the Writing Centre for guidance and feedback: www.wlu.ca/writing
• Do not use long quotes from any one source. This can lead to plagiarism.
• Do not use quotes as filler. It will be obvious to the reader.
• Give your paper a unique title that summarizes your topic or thesis.
• Run a grammar and spell check before submitting your paper.
• Print your paper and read it aloud. It will be easier to catch errors and gaps.