For this part of the assignment, you will examine the workings of the Harris corner detector. Implement the Harris corner detector as described in class (Lecture 5, Slide 48, and Tutorials 2 & 3), mainly using NumPy, going through each of the described steps:
1. Compute the image derivatives (optionally, blur first).
2. Compute the products of the derivatives (Ix², Iy², Ix·Iy).
3. Apply Gaussian filtering to the output of Step 2.
4. Compute the cornerness function response R = Det(H) − k · Trace(H)², where k = 0.05.
5. Perform non-maximum suppression.
Note 1: You may use OpenCV functions for Gaussian filtering and for computing the image derivatives.
Note 2: This question is worth 20 points:
• 10 points for correct analysis and displaying desired outputs.
• 10 points for correct, complete, and clean code.
Note 3: You can access the images from ‘data/Q1’.
Apply the algorithm to three different images:
1. Checkerboard (left part of Figure 1): Change the value of the threshold to attain detected corners that are similar to those in the right part of Figure 1. Observe and report the effect of changing the threshold values.
2. Building image (Figure 2): Explore different thresholds and report your observations.
Figure 1: Checkerboard input image and the expected Harris corner detector output (red dots represent the detected corners). (source)
Figure 2: Federal Building in Port Huron, MI, US. (source)
Note 1: This question is worth 40 points:
• 4 points for answering section 2.1 correctly.
• 6 points for correct analysis and displaying desired outputs for section 2.2.
• 10 points for correct analysis and displaying desired outputs for section 2.3.
• 10 points for correct analysis and displaying desired outputs for section 2.4.
• 10 points for correct, complete, and clean code.
Note 2: For this question, you should use the pair of images that you captured in Assignment 1. In this question, we are going to call them image_1 and image_2.
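The five Harris steps can be sketched in pure NumPy. This is a minimal sketch, not the required implementation: the 3×3 box smoothing below stands in for the Gaussian filtering of Step 3 (use cv2.GaussianBlur in your submission), and the window size and threshold ratio are assumptions to be tuned per image.

```python
import numpy as np

def harris_corners(gray, k=0.05, thresh_ratio=0.01):
    g = gray.astype(np.float64)
    # Step 1: image derivatives (central differences; cv2.Sobel also works)
    Ix = np.zeros_like(g); Iy = np.zeros_like(g)
    Ix[:, 1:-1] = (g[:, 2:] - g[:, :-2]) / 2.0
    Iy[1:-1, :] = (g[2:, :] - g[:-2, :]) / 2.0
    # Step 2: products of the derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # Step 3: smoothing (3x3 box stand-in for the Gaussian filter)
    def smooth(a):
        p = np.pad(a, 1, mode='edge')
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    Sxx, Syy, Sxy = smooth(Ixx), smooth(Iyy), smooth(Ixy)
    # Step 4: cornerness R = Det(H) - k * Trace(H)^2
    R = Sxx * Syy - Sxy * Sxy - k * (Sxx + Syy) ** 2
    # Step 5: non-maximum suppression in a 3x3 neighbourhood + threshold
    Rp = np.pad(R, 1, mode='constant', constant_values=-np.inf)
    Rmax = np.max(np.stack([Rp[i:i + R.shape[0], j:j + R.shape[1]]
                            for i in range(3) for j in range(3)]), axis=0)
    corners = (R >= Rmax) & (R > thresh_ratio * R.max())
    return np.argwhere(corners)   # (row, col) of detected corners
```

Raising thresh_ratio keeps only the strongest responses (fewer, cleaner corners); lowering it admits weaker responses and noise, which is exactly the effect you are asked to report.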
Verify the invariance of SIFT features under changes in image scale and rotation. You are allowed to use built-in OpenCV/Scikit-Image functions for this question.
Note 3: An instance of a brute-force method here could be computing Euclidean distances between all possible pairs of SIFT features in the two images.
2.1 SIFT in a nutshell
Briefly describe the 4 main actions/steps of the SIFT method.
2.2 SIFT between two different pictures
1. Compute SIFT keypoints for image_1 and image_2.
2. Match all keypoints between the two images using a brute-force method.
3. Sort the matching keypoints according to the matching distance.
4. Display the top ten matched keypoints.
5. Plot the matching distance for the top 100 matched keypoints. Plot the indices of keypoints on the x-axis and the corresponding matching distance on the y-axis.
2.3 Invariance Under Scale
1. Compute SIFT keypoints for image_1.
2. Scale image_1 using scaling factors of (0.25, 0.6, 3, 5). This should result in a total of 4 different transformed images. Display the scaled images.
3. Compute SIFT keypoints for all 4 scaled images.
4. Match all keypoints of image_1 to the transformed images from the previous part using a brute-force method.
5. Sort the matching keypoints according to the matching distance.
6. Display the top ten matched keypoints for each pair of image_1 and a transformed image.
7. Plot the matching distance for the top 100 matched keypoints. Plot the indices of keypoints on the x-axis and the corresponding matching distance on the y-axis.
8. Discuss the trend in the plotted results. What is the effect of increasing the scale on the matching distance? Explain.
2.4 Invariance Under Rotation
1. Rotate image_1 at the angles of (30, 75, 90, 180) degrees. Display the rotated images.
2. Compute SIFT keypoints for all (total 4) transformed images.
3. Match all keypoints of image_1 to the transformed images using a brute-force method.
4.
Sort the matching keypoints according to the matching distance.
5. Display the top ten matched keypoints for each pair of image_1 and a transformed image.
6. Plot the matching distance for the top 100 matched keypoints. Plot the indices of the keypoints on the x-axis and the corresponding matching distance on the y-axis.
7. Discuss the trend in the plotted results. What is the effect of increasing the angle of rotation on the matching distance? Explain.
You are given three different views of the same scene in the folder ‘data/Q3’ (Figure 3). Follow these steps in order to stitch the given images:
(a) Compute the SIFT keypoints and corresponding descriptors for images 1 and 2.
(b) Find matching keypoints in images 1 and 2 and display the 20 best pairs.
(c) Find the homography that best matches the keypoints from images 1 and 2 using the RANSAC method, and apply the resulting transformation to image 1. Image 2 should not be transformed.
(d) Stitch the transformed image 1 and the original image 2 together using linear image blending. Let us call this image 12. Display this image.
(e) Compute the SIFT keypoints and corresponding descriptors for images 12 and 3.
(f) Find the matching keypoints in images 12 and 3 and display the 20 best pairs.
(g) Compute the homography using the RANSAC method. Apply the transformation to image 3. Image 12 should not be transformed.
(h) Stitch the transformed image 3 and image 12 together using linear image blending. Display the resulting image.
(i) Discuss: Note that we could also use multi-band blending in step (h). When should one prefer pyramid blending over linear blending? Hint: Here is a short introduction to image blending.
Figure 3: Scenery to Stitch.
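The brute-force matching described in Note 3 (Euclidean distances between all descriptor pairs, then sorting by distance) can be sketched without OpenCV; in the notebook you would typically use cv2.BFMatcher instead. The descriptor arrays here are assumed to be the usual N×128 SIFT outputs.

```python
import numpy as np

def brute_force_match(desc1, desc2):
    """For each descriptor in desc1, find the nearest descriptor in desc2
    by Euclidean distance; return (i, j, distance) triples sorted by distance."""
    # All pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (np.sum(desc1 ** 2, axis=1)[:, None]
          + np.sum(desc2 ** 2, axis=1)[None, :]
          - 2.0 * desc1 @ desc2.T)
    d2 = np.maximum(d2, 0.0)            # guard against tiny negative round-off
    nearest = np.argmin(d2, axis=1)     # best match in image 2 for each keypoint
    dist = np.sqrt(d2[np.arange(len(desc1)), nearest])
    order = np.argsort(dist)            # sort by matching distance (step 3)
    return [(int(i), int(nearest[i]), float(dist[i])) for i in order]
```

The first ten entries of the returned list are the "top ten matched keypoints" to display, and the first hundred give the distances to plot.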
Using a cellphone camera or a standalone digital camera, capture two images of a household object, where each image of the object is taken from a different viewpoint. These images can be of any size. An example of such an image pair is shown below.
Please note that all assignments are to be done individually (not in groups), and so you must acquire your own images. Do not share images or copy someone else’s images! You will be using these images, as well as the processed images, in some future assignments, so it is important to do all the steps correctly. Display the original images in the assignment’s Jupyter notebook.
If your images are color (RGB), convert them to grayscale by averaging each pixel’s R, G, and B values. Display the grayscale images in the assignment’s Jupyter notebook.
Smooth the pair of grayscale images using a 5×5 pixel Gaussian kernel. Then repeat the smoothing on the original grayscale images, this time using an 11×11 Gaussian kernel. Display the smoothed images in the notebook.
Compute the x and y derivative images of the smoothed images using the horizontal and vertical Sobel filters. Display the derivative images in the notebook.
Compute the edge gradient magnitude and orientation of the smoothed images using the Sobel filter values. Display the magnitude and orientation images in the notebook. For the orientation image, display the angle value using an RGB colormap, such as ‘jet’, in the imshow() function (e.g., something like ax.imshow(data, cmap='jet')). Details about matplotlib colormaps, including other colormaps you can try, can be found at: https://matplotlib.org/stable/tutorials/colors/colormaps.html
Set up OpenCV in your Colab or home computer environment (go to the Tutorial to learn how to do this). Use the Canny edge detector implementation in OpenCV to compute the Canny edges of your smoothed images (see https://docs.opencv.org/3.4/da/d22/tutorial_py_canny.html for details).
Plot the Canny detector output images in the notebook.
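A minimal NumPy sketch of the grayscale averaging and the Sobel magnitude/orientation steps. The notebook versions may of course use cv2.Sobel and cv2.GaussianBlur; the explicit 3×3 correlation loop here just makes the arithmetic visible.

```python
import numpy as np

def to_gray(rgb):
    """Grayscale by averaging each pixel's R, G, and B values."""
    return rgb.mean(axis=2)

def sobel_mag_orientation(gray):
    """Gradient magnitude and orientation from horizontal/vertical Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    def correlate3(img, ker):
        p = np.pad(img, 1, mode='edge')
        out = np.zeros(img.shape, dtype=np.float64)
        for i in range(3):
            for j in range(3):
                out += ker[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
        return out
    gx, gy = correlate3(gray, kx), correlate3(gray, ky)
    mag = np.hypot(gx, gy)        # edge gradient magnitude
    ori = np.arctan2(gy, gx)      # orientation in [-pi, pi]; display with cmap='jet'
    return mag, ori
```

Because the orientation wraps from +π to −π, a cyclic colormap can be easier to read than ‘jet’, though either satisfies the assignment.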
This exercise will extend the use of GPIOs and will employ the DAC and the buzzer (speaker) to generate observable audio outputs. You will also practice the use of a timer (TIM), interrupts, and finally the DMA. The lab will be done in two parts, with the second part building on the success of the first.
This exercise relies on the previous laboratory exercises, the classes and tutorials, and it will focus on the use of basic hardware blocks within the processor. In addition to consulting the class notes, you should consult the processor documentation to complete this exercise. Some specific hints will be given in the tutorial and the lectures leading up to this lab exercise.
STM32L4+ processors have multiple timers, as described in detail in Sec. 38 of the STM32L4+ Reference Manual, available on MyCourses. These timers can generate a variety of signals and interrupts, and they are able to start DMA transfers.
You will need to use Cube MX to configure a few GPIO pins for analog output. Since the DAC converts digital register values (i.e., integers) into analog values (i.e., voltages), we will use that signal to drive a speaker with an oscillating signal. You will drive two different signals on two different DAC output channels: a saw wave and a triangle wave, with as similar a frequency as possible.
There are two basic paths to discovering which pins must be configured for the on-board DAC. The first, cumbersome way is through the manuals, so we will ignore it here. The better path is to use MX. In the Pinout & Configuration tab, MX summarizes many of the features of the chip on the left-hand side, under categories such as System Core, Analog, Timers, etc. Under Analog, choose DAC1. Enabling OUT1 and OUT2 will automatically enable the correct pins in the appropriate mode.
To configure the DAC, we need to find DAC1 under Analog in the list of features under Pinout & Configuration.
If you haven’t already, in DAC1 Mode and Configuration, enable OUT1 and OUT2 in “Connected to external pin only” mode. Then, verify the DAC Out1 Settings and DAC Out2 Settings:
• Output Buffer (Enable)
• Trigger (None)
• User Trimming (Factory trimming)
• Sample And Hold (Sample and hold Disable)
Step 1: Making Signals
In Lab 2, you read the state of a button and wrote to an LED (besides using the ADC). In this lab, you will initialize and write to the DAC to generate signals in an audible frequency range, such that we can observe the system operation with a small speaker. The button and an LED will be used a bit differently.
First, implement code that manually generates two signals: a triangle wave and a saw(tooth) wave. Without the use of interrupts, it is difficult to precisely time these signals. However, do your best to generate oscillating signals with a period of ~15 ms (corresponding to ~65 Hz, or note C2). You will be shown below how to observe the generated signals with the debugger before sending them to the DAC.
Next, assign each signal to a different DAC output channel. To initialize the DAC and write data to it, you’ll need more HAL functions. Sections 16.2.3 and 16.2.4 of the HAL Driver User Manual list the functions you will need; they are detailed in Section 16.2.7.
Note that the DAC can operate with either 8-bit (0 to 255) or 12-bit (0 to 4095) precision. You make this choice with parameters passed to the HAL driver. Recall that 8- and 16-bit integer data types are available (uint8_t and uint16_t) and using them may simplify your implementation. Further, note that HAL_Delay(…) can be used to insert a delay between operations in your code. As a reminder, the details of its usage can be found in the HAL Driver User Manual.
If you generate your code and return to the IDE, DAC1 should be configured.
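One way to sanity-check the waveform arithmetic before writing the C loop is to compute the tables offline. This Python sketch assumes 8-bit DAC codes and roughly 15-16 samples per period; the sample counts are an assumption matching a ~1 ms delay between writes and the ~15 ms target period. The C code would fill uint8_t arrays with the same values.

```python
import numpy as np

def saw_table(n, peak=255):
    """One period of a sawtooth: a rising ramp of n 8-bit DAC codes."""
    return np.round(np.linspace(0, peak, n, endpoint=False)).astype(np.uint8)

def triangle_table(n, peak=255):
    """One period of a triangle: ramp up for n//2 samples, then back down."""
    up = np.linspace(0, peak, n // 2, endpoint=False)
    down = np.linspace(peak, 0, n - n // 2, endpoint=False)
    return np.round(np.concatenate([up, down])).astype(np.uint8)

saw = saw_table(15)        # ~15 samples -> ~15 ms period at one write per ms
tri = triangle_table(16)
```

In the firmware, stepping through such a table once per HAL_Delay(1) tick approximates the target period, which is exactly why the timing is imprecise until timers and interrupts arrive in Part 2.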
(Hopefully you will still remember to write your own code within the USER CODE BEGIN and USER CODE END markers, and it’s all still there!)
Please take notice of whether any light on the board (LED1?) blinks when this part of your project is running. Can you explain why?
Step 2: Making Sounds
When the waveforms look right, wire up a speaker using the components available to you. Note that: (1) Your board has the same external interface as an Arduino (A0-A5 and D0-D15, plus others). (2) Your speaker is different, but it fits into the breadboard with the indicated spacing. (3) The resistor is placed in series with the speaker to limit the current and protect both devices.
Step 3: Making Better Sounds
How do the triangle and saw waves sound? Not great. Do they have the desired frequency? Not really, though we can’t really fix this without using timers and interrupts (later!). Next, generate a signal with approximately the same period as above, but using the arm_sin_f32() function in the DSP library (similar to Lab 1). As before, trace the values before driving the speaker.
Useful Notes
The Debugger Use in Step 1 Above
While developing your code, you will spend substantial time using the debugger. Before you test your code with a speaker, use the ITM interface to verify that it is working as intended. Ensure that the Serial Wire Viewer (SWV) is enabled and configured appropriately in the debugger configuration. Since we’ll use the ITM’s data trace functionality this time, no code modifications are required (e.g., to timestamp events). Start the debugger. Once it pauses execution at the first line of main, ensure that the SWV Data Trace Timeline Graph is visible; find it under the Window > Show View > SWV pull-down menu.
Before resuming execution, you need to configure (wrench icon) and then start recording (red button). Configure the Serial Wire Viewer to enable Comparator 0 and Comparator 1, and write the names of the variables you wish to monitor in Var/Addr.
In my case, the variables that hold the current signal values are triangle and saw.
If you try to specify variables for tracing when they are out of scope (e.g., you pause and the code stops inside a library), you may get a warning indicating “Variable not found!” Tracing will not work properly unless you configure the comparators while the variables are in scope.
When you resume execution (don’t forget to record), if everything is working properly, the data trace will rapidly fill with oscillating signals. Note that at our target frequency you may have to zoom in a bit in order to distinguish your triangle and saw waves. You will likely observe that the periods of the two signals are not exactly the same. We’ll achieve more precise timing in later labs when we use interrupts.
In this part, you will improve the quality of the output by using the timer to control the rate of writing to the DAC. The timer can generate an interrupt to execute a special function, an interrupt handler. Interrupts tend to be more efficient than polling (which is how we’ve interacted with the button) or using HAL_Delay(…) (which is how we’ve interacted with the DAC), and give us greater control over timing, which is essential for a wide variety of applications. We’ll first use an interrupt to detect when the button has been pressed. We’ll then use a timer, and its periodic interrupt, to determine when to write new data to the DAC. Then we’ll use the timer and direct memory access (DMA) to write to the DAC; in this last case, sending values to the DAC will be handled almost entirely by hardware, leaving the processor free for other tasks.
Push button
Configure the push button to enable external interrupts. In the Pinout & Configuration tab, on the left, select GPIO. Under Configuration, select NVIC, and enable EXTI line[15:10] interrupts.
This means that an interrupt will be generated whenever there is a signal change on the external interrupt lines; your button should be wired to one interrupt line, and the corresponding code action will be written for that interrupt to affect the LED, as explained below.
DAC
For Part 2, modify the DAC to use only one channel. (Remember to check the schematic; you’ll be using your speaker again, and it will not work when wired to the wrong output.)
Timer
The timer will help to update the DAC output at regular intervals. What’s an appropriate interval? CD-quality audio is sampled (and reproduced) at 44.1 kHz. Voice-call audio can be sampled at lower rates.
Choose a sampling rate (e.g., 44.1 kHz). Given your system clock frequency (e.g., 80 MHz), calculate the counter period (the maximum value of the counter) needed to achieve this sampling rate. In this lab, the timer pre-scaler is not necessary. Finally, under Parameter Settings, set the Trigger Event Selection TRGO to Update Event, and under NVIC Settings, enable the TIM2 global interrupt. Together, these settings ensure that (a) when the timer elapses, execution in main() is interrupted; and (b) the callback function (defined below) is executed.
Step 1: Implementing Push-button Interrupts
An interrupt is a signal (internal or external) that prompts the processor to stop normal execution (e.g., in main()) and begin executing an interrupt service routine (ISR), a function responding to the interrupt event. In Lab 2, the code polled (checked over and over again) for changes in the push button signal; in this lab, you will write a function that is executed whenever the push button interrupt occurs, such that the LED shows the value of the button (1/0).
Section 31.2 of the HAL Driver User Manual details the functions used to interact with GPIO. What we’re interested in, in particular, is HAL_GPIO_EXTI_Callback(…).
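The counter-period arithmetic described under Timer above can be checked in a couple of lines (80 MHz and 44.1 kHz are the example values from the text; the counter counts from 0 up to the period register, hence the −1):

```python
f_clk = 80_000_000          # example system clock feeding the timer
f_s = 44_100                # chosen DAC update (sampling) rate
prescaler = 0               # the text notes no pre-scaler is needed here
counter_period = round(f_clk / ((prescaler + 1) * f_s)) - 1
f_actual = f_clk / ((prescaler + 1) * (counter_period + 1))
print(counter_period, f_actual)   # 1813, and an actual rate of ~44.1 kHz
```

Because the division does not come out to a whole number, the achieved rate differs very slightly from 44.1 kHz; this tiny error is inaudible for our purposes.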
This function is called by the GPIO external interrupt handler, and we can control what it does in main.c by simply writing a new definition; our new function is automatically used instead of the weakly defined original. Write this function in main.c (be sure to respect the function prototype defined in the HAL manual) so that it toggles the LED. This function takes as an argument the pin that caused the interrupt; it’s good programming practice to verify that the interrupt was caused by the pin we think it was. This isn’t essential for our lab, since there are no other external interrupts, but is necessary when a single callback function may need to handle various interrupt sources.Note: remember again to put your code in a USER CODE region so that it doesn’t disappear when we go back to MX to modify our configuration. We’ll be using the push button again later.Step 2: Implementing Timer-driven DAC Output Now, write a callback function for the timer. Section 72.2 of the HAL Driver User Manual details the functions used to interact with timers. You are particularly interested in two sets of functions: the TIM Base functions, and TIM Callback functions. You want to start the timer WITH global interrupts enabled, i.e., in interrupt mode. Read the function definitions carefully, so that you start your timer in the correct mode. (Yes, you need to call a function to start the timer; don’t forget to do so, as this is an otherwise very frustrating problem to debug.)Just like for the button, HAL_TIM_PeriodElapsedCallback(…) is called by the TIM interrupt handler. Write a new definition for it in main.c. Again, it’s good programming practice to verify that the timer causing the interrupt (an argument passed to your function) is actually the one you want to respond to.In this function, you’ll send a new value to the DAC (see Part 1 and Section 16.2 of the HAL Driver User Manual). 
You cannot pass it as an argument, because you don’t call this function; it is called asynchronously in an entirely hardware-controlled process, and the only argument is the timer that caused the interrupt.
What you can do, however, is put the DAC values in a global variable (defined outside of any function, like other variables in main.c). You don’t have control over when the timer will elapse and the callback is called; you need to prepare all the DAC values in advance and save them in global variables that can be accessed by the callback function.
In main(…), write code to populate an array with a sine wave (use the ARM math library). You can “play” this wave on the speaker (using the same circuit as in Lab 2). To get the best possible results:
• Pick a wave frequency in the 1-2 kHz range (~C6-C7 in music parlance). Lower frequencies are harder to hear; higher frequencies too, depending on your hearing abilities.
• Note that the timer frequency is (and must be) higher than that of the signal you want to drive; how do you ensure that your desired frequency is realized? (Nyquist to the rescue.)
• Note that the number of saved samples matters; if you save samples for anything other than 2nπ radians (a whole number of periods), you will have a discontinuity from the end to the beginning of the array, causing distortion.
• Scale your DAC values so they vary over about 2/3 of the possible dynamic range. The chip will dynamically clamp GPIO outputs to prevent damage, limiting their current to 20 mA. If you attempt to use the full range of DAC output, the signal will look fine under high impedance (e.g., with a voltmeter or pocket oscilloscope), but will clip when connected to the speaker, causing distortion.
Using a global array defining the values to be sent to the DAC, write your implementation of the timer callback so that it sends a new value from this array to the DAC each time it is called.
Next, change the code to use direct memory access (DMA).
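The sine-table constraints above (a whole number of periods, ~2/3 of the dynamic range) can be prototyped offline in Python before porting to C, where arm_sin_f32() replaces np.sin. The 1470 Hz tone is an assumption, chosen so that f_s / f_tone divides evenly.

```python
import numpy as np

f_s = 44_100                 # timer-driven DAC update rate
f_tone = 1_470               # in the 1-2 kHz range; 44100/1470 = 30 exactly,
                             # so the table holds exactly one period (2*pi)
N = f_s // f_tone            # 30 samples: no discontinuity when DMA wraps

amp = (2 / 3) * 127          # ~2/3 of the 8-bit dynamic range
t = np.arange(N) / f_s
table = np.round(127 + amp * np.sin(2 * np.pi * f_tone * t)).astype(np.uint8)
```

If f_s / f_tone were not an integer, the last sample would not meet the first one smoothly, producing the end-to-beginning discontinuity (and audible buzz) the bullet above warns about.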
DMA uses an on-chip peripheral that can be programmed to perform memory accesses. In this case, DMA will read our array of sine values and write it to the DAC for us. This means that we no longer need to execute code in the timer interrupt callback, saving CPU cycles for other tasks (if we had any) or reducing power.
To use DMA, you need to reconfigure the DAC. Instead of using our timer to trigger a callback that sets the DAC value, we’ll use our timer to trigger the DAC itself. Return to MX. The first thing we need to change, then, is to select the appropriate trigger in Parameter Settings. Under Trigger, choose the trigger-out event corresponding to your timer.
Next, we need to set up DMA. Go to DMA Settings, and add a DMA request. Choose Circular mode; this means the DAC will repeatedly read from the array, starting over from the beginning when the end is reached. (Normal mode implies that the array would be read and transferred only once.) Choose the appropriate data width for your software; e.g., I’ve used 8-bit resolution for my DAC, and a uint8_t array for my sine waves, and therefore want DMA to transfer bytes.
Now regenerate your code. Comment out or otherwise disable your timer callback; it is no longer needed. In fact, the global interrupt for your timer isn’t necessary at all, and can be disabled (though it won’t hurt anything). The last thing to do is change how you start the DAC, so that it starts in DMA mode (Section 16.2 of the HAL Driver User Manual).
Step 4: Putting It All Together and Multiple Tone Generation
Finally, combine the functionality into something more sophisticated. Expand your code so that when the button is pushed, the tone played on the speaker changes. Select at least three different tones; an arpeggio (e.g., C6, E6, G6) would suit the purpose well, but anything else is fine, too. Use interrupts and DMA.
Experimental Results to Demo
You are asked to reach the following milestones.
Grading
• C implementation of signals (triangle, saw, sine) in Part 1: 10%
• Visualization of signals (triangle, saw, sine) in Part 1: 10%
• Audible confirmation of signals (triangle, saw, sine) in Part 1: 10%
• Pushbutton interrupt: 10%
• Timer interrupt for driving the DAC: 10%
• DMA driving the data: 20%
• Multiple-tone audio generation: 10%
• Working demo organization and success: 20%
Final Report
Once you have all the parts working, include all the relevant data in your report. The report should concisely explain your solution to the problem given, including the final code. You should use the established 2-column IEEE format. Please capture screenshots and relevant code snippets, and include them in the Appendix. All code should be well documented. Any performance evaluation and correctness validation should be apparent from your written report.
Due Dates
The first two labs will be completed in several phases, over three weeks. First, you should take time to understand the lab and ask any questions in the regular lab sessions or through the discussion groups. The first lab demonstration will be on Mar. 8-10th, by which time you should have solved Part 1 and be able to demo and explain how you approached the exercise. Please note that you will be asked to cycle through 3 different signal shapes. The final demonstration, in which all the parts (interrupts, timer, DMA) are put together, will be on Mar. 15th and 17th and will include showing your source code and demonstrating a working program for all test cases. The final report will be due on Friday, Mar. 18th.
In the previous laboratory exercise, you developed complete C/assembly code for the STM32L4+ processor. This lab exercise will teach you to use the General-Purpose Input/Output (GPIO) pins of the processor, as well as some of the built-in blocks that track the physical status of the processor. There are two readily available physical parameters that the STM32L4+ processor can read: its core temperature and its voltage.
The program that you are asked to build will allow you to switch between updating the readout of the temperature and the voltage each time the blue button is pressed on your development kit. The two variables expressing these two physical values will be observable by means of “watched variables”, which the processor and the IDE allow you to observe during ongoing processor operation.
This exercise relies on the previous laboratory exercise, all the classes and tutorials, and it will focus on the use of basic hardware blocks within the processor. In addition to consulting the class notes, you should consult the processor documentation to complete this exercise. Some specific hints will be given in Tutorial 2 and the lectures leading up to this lab exercise.
You saw in class that GPIO pins can be programmed to perform digital bit input reading, digital pin output, various special functions, as well as analog quantity reading/writing. You will exercise the first two capabilities in this lab.
The simplest GPIO operation is that of outputting a zero or a one, and you can test this operation by driving the green LED light on your board. From section 6.12 of the board user manual, available on MyCourses, you will see that pin PB14 drives LED2. To drive that pin, you have to enable it in Cube MX (started by double-clicking the .ioc file of your project) and declare it as an output (by selecting System Core->GPIO, and then the pin PB14).
The tool will allow you to label this pin with a useful mnemonic, as well as set some other parameters (e.g., pull-up or -down). Consider carefully the suitable values for those by consulting the board schematic and basic electric circuit knowledge.
Similarly, you can locate the blue button pin and configure it as an input. Again, consult the circuit schematic to deduce the default state (i.e., when the button is not pressed) and the potential need for pull resistors to be included in the pin configuration.
STM32L4+ processors are equipped with flexible high-performance ADC converters, as described in detail in Sec. 21.2 of the STM32L4+ Reference Manual, available on MyCourses. Commonly, an ADC would read analog quantities via external pins, but it is simpler to first focus on reading out a) the internal reference voltage and b) the internal temperature sensor. The simplest way to obtain a correct value from an ADC is to start a conversion and then read out the value after the conversion is completed, and we will use this method.
You will perform this exercise in several stages and record the results obtained at each step in your lab report.
Step 1
First, you will write a main C program that detects the pressing of the button and tests the LED2 display by toggling its value every time the button is pressed. It is perfectly acceptable to have this program run in the infinite loop generated by Cube MX after you have configured the two GPIO pins. You should take note of the commands added for initializing the GPIOs, as this illustrates the way the Cube MX GUI generates correct code for your program.
Step 2
You will configure ADC1 to read out the reference voltage and augment your code to convert the value read into a correct voltage value. After generating the code from Cube MX, please take notice of the control structures instantiated for ADC1 and the values generated (especially the assignment of the reference voltage as the source for the ADC conversion).
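For reference, the conversion arithmetic for Steps 2 and 3 typically uses the factory calibration words documented in the reference manual. The numeric values below are placeholders for illustration only, not real calibration data; on the board, the calibration words are read from the system-memory addresses the manual lists, and the raw readings come from the ADC data register.

```python
# Placeholder raw readings and calibration words (illustrative assumptions).
VREFINT_CAL = 1655      # VREFINT conversion calibrated at VDDA = 3.0 V (placeholder)
raw_vref = 1520         # current VREFINT conversion (placeholder)
vdda = 3.0 * VREFINT_CAL / raw_vref           # actual analog supply voltage, in volts

TS_CAL1, TS_CAL2 = 1034, 1381   # temperature-sensor readings at the two
                                # calibration temperatures, 30 and 130 C (placeholders)
raw_ts = 1100                   # current temperature-sensor conversion (placeholder)
# Linear interpolation between the two calibration points:
temp_c = 30.0 + (130.0 - 30.0) * (raw_ts - TS_CAL1) / (TS_CAL2 - TS_CAL1)
```

The same two-point linear interpolation, written in C with the real calibration addresses, is what your program variable for Step 3 should compute.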
You can test the obtained values by observing that variable in your code, with no need to interrupt the execution.
Step 3
Similar to Step 2, you should configure ADC1 to read out the temperature, and then add a correct conversion into your program variable, such that the value of that variable is the temperature in degrees Celsius.
Step 4
This is the “detective” part of your exercise, where you will compare the code obtained in Steps 2 and 3, such that your program control can switch the readouts between the voltage and the temperature. It helps to consult the reference manual to find out how to correctly initialize the readings “on the fly” to avoid errors in the procedure.
Step 5
Produce the final program that alternates between the ADC readings and prepare it for the demonstration to a TA. The LED light should help in understanding what the processor is doing.
Step 6 – optional
Activate interrupt generation by the pressing of the blue button and adjust the code to perform the action associated with the button via interrupts.
Useful Notes
In realizing your code, you are free to use good C coding practices, such as the use of conditional compilation for retargeting to the different execution cases given above. If you start your project by using the board support package (BSP) for your board, the clock and pins will be properly selected for you. It is good practice to check the clock and pin settings.
The Debugger
While developing your code, you will spend substantial time using the debugger. Please follow the instructions from your tutorials to ensure proper development and debug practices.
Experimental Results to Report
You are asked to reach the following milestones and include the results in your report.
1. Describe the configuration of pins such that Step 1 is correctly run on the processor.
2. Describe the ADC1 configuration for Step 2.
3. Describe the ADC1 configuration for Step 3.
4. Document the code for programmable reconfiguration of ADC1.
5.
Readable and concise code/pseudo-code for your final program execution.
6. Test the temperature sensor readings by (non-destructive!) heating and cooling of the processor. Fingers pressed on the processor can raise its temperature a bit, and natural convection will reduce it. You can also consider the use of a hair dryer.
Final Report
Once you have all the parts working, include all the relevant data in your report, which will be due on the first day of the week following the Lab 2 demonstration, but no later than 4 days after the demo.
Demonstration
The demonstration includes showing your source code and demonstrating a working program. Your program should be capable of handling a variety of test cases and should flag errors appropriately.
Report
The report should concisely explain your solution to the problem given, including the final code. All code should be well documented. Your report should contain a performance evaluation and correctness validation. More detail on the report will be given out in class.
Due Dates
The first two labs will be completed in several phases, over three weeks. First, you should take time to understand the lab and ask any questions in the regular lab sessions or through the discussion groups. The first lab demonstration will be on Feb. 15-17th, by which time you should be able to explain to the TAs how you approach the exercise. The final demonstration, in which the parts are put together and run on test cases, will be on Feb. 23rd and 24th and will include showing your source code and demonstrating a working program for all test cases that we will post. The final report will be due on Friday, Feb. 25th.
This exercise introduces the ARM Cortex and FP coprocessor assembly languages, instruction sets, and their addressing modes. The ARM calling convention will need to be respected, such that the assembly code can be used with the C programming language. The lab and a prior tutorial will introduce you to the STM32CubeIDE, including the compiler and associated tools.
In the second part of the exercise, the code developed here will be used in a larger program written in C and the Cortex Microcontroller Software Interface Standard (CMSIS-DSP) application programming interface (API), which incorporates a large set of routines optimized for different ARM Cortex processors.
Hence, this lab consists of two components, each requiring a week to complete:
Part 1: Assembly language exercise – Kalman filter in one dimension
Part 2: Combining assembly/embedded C and optimizing performance; CMSIS-DSP
In assembly and C, parameters for a subroutine are passed via the stack or internal registers. In ARM processors, the registers R0-R3 are used for passing integer or pointer variables. Up to four parameters are placed in these registers, and the result is placed in R0 and R1. If any parameter requires more than 32 bits, then multiple registers are used. If there are no free scratch registers, or the parameter requires more registers than remain, then the parameter is pushed onto the stack.
Since we will also be dealing with floating-point parameters on hardware that performs floating-point arithmetic, be aware that you have the option of using either software or hardware floating-point linkage, depending on whether the parameters are passed via general-purpose or floating-point registers. The objective here is to use the hardware linkage; hence, the floating-point registers will be used for parameter and result passing.
In addition to the class notes, please refer to the document “Procedure Call Standard for the ARM Architecture”, especially its sections describing the Base Procedure Call Standard.
Other documents that will be of importance include the Cortex M4 programming manual and the quick reference cards for the ARM ISA and the (vector) floating-point instructions, all available within the course online documentation. This particular order of passing parameters is applied by major compilers.

Using the STM32CubeIDE Integrated Development Environment Tool
To prepare for Lab 1, you will need to go through Tutorial 1, where you will learn how to create and define projects, including assembly code projects. The tutorial shows you how to let the tool insert the proper startup code for the given processor, write and compile the code, as well as provide the basics of program debugging.

You will develop working assembly language code for a single-variable Kalman filter that can be used in later exercises. The single-variable version avoids the matrix operations required for larger Kalman filters, which makes it amenable to an assembly code implementation while still allowing you to experiment with and appreciate the features of this filter.

The Kalman filter is a state-based adaptive estimator of a physical process. Its estimation error is provably minimal for linear systems with Gaussian noise. It is a type of adaptive filter, which is generally preferred to fixed linear filters. The state space adaptation is performed by a sequence of discrete steps, during which the parameters of the filter change depending on the observed physical value, as well as the current state. The Kalman filter performs the adaptation by maintaining internal state, consisting of the estimated value x, the adaptive tuning factor k and the estimation error, represented by its covariance p. To obtain these values, it requires knowledge of the noise parameters of the input measurements and the state estimation, represented by their respective covariances q and r. The high-level description of the Kalman filter code is given in the working Python program in Figure 1.
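Since only the update function is needed in the assembly part, a minimal Python sketch of the scalar recurrence is given below. This is an assumed form of the Figure 1 update (predict covariance, compute gain, correct estimate), not Figure 1 verbatim; it can be checked against the state values shown in Table 1.

```python
class Kalman:
    """Scalar Kalman filter; state is (q, r, x, p, k) as in the lab text.
    Sketch of the assumed recurrence, verifiable against Table 1."""

    def __init__(self, q, r, x, p, k=0.0):
        self.q, self.r, self.x, self.p, self.k = q, r, x, p, k

    def update(self, measurement):
        self.p = self.p + self.q                            # predict: grow error covariance
        self.k = self.p / (self.p + self.r)                 # adaptive gain
        self.x = self.x + self.k * (measurement - self.x)   # correct the estimate
        self.p = (1.0 - self.k) * self.p                    # shrink covariance after correction
        return self.x

kf = Kalman(q=0.1, r=0.1, x=5.0, p=0.1)
first = kf.update(0.0)   # with x=5 and measurement 0, x moves to about 1.67
```

With the initial state used in Table 1 (x=5, measurement(i)=i), successive updates give x ≈ 1.67, 1.25, 1.71, 2.51, 3.43, matching the table within rounding.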
While the code is fully functional and can be directly run within a larger (Python) program, it is used here as a compact high-level specification. Please note that only the update function is required in the assembly part of Lab 1. In the second part, when you include your assembly code with the C code, the initialization function will be needed. That part will be written in C. Note also that there will be differences in the code caused by the different syntax and semantics of C compared to Python. For instance, you will need to carefully specify the data types and include the function prototypes in the code to be able to correctly link the assembly and C code.

Figure 1: Python Code Definition of the Kalman Filter Class used in Lab 1

One interpretation of the Kalman filter is that of a tracking device (one estimating the trajectory) for the input signal stream. At each time instance i, it generates the tracking value x[i], which aims to reconstruct the original signal measurement[i]. An important feature of the Kalman filter is that the estimated value differs from the input by a value that has the statistical properties of white noise. You will be asked later to inspect those properties using your additional statistical processing.

Write a subroutine kalman in ARM Cortex M4 assembly language that processes one measurement input to update the local state required for the process estimation. You should naturally use the built-in floating-point unit via the existing floating-point assembler instructions. Your subroutine should follow the ARM compiler parameter passing convention. Recall that up to four integer and four floating-point parameters can be passed by integer and floating-point registers, respectively. For instance, R0 and R1 would contain the values of the first two integers, and S0 would contain the value of the floating-point parameter to be added.
If the datatype is more complex (e.g., a struct or a matrix), then a pointer to it is passed instead. For the function return value, the registers R0 and S0 are used for integer and floating-point results of the subroutine, respectively. In your case, there is a need for one integer input parameter (the address of the struct detailed below) and one floating-point input parameter (the value of the current measurement). The state of the Kalman filter is the result of your subroutine, but keep in mind that this can be achieved in a way that no output value is produced by the subroutine. This is, effectively, the “call by reference” that you should be familiar with from C programming.

The filter will hold its state as a quintuple (q, r, x, p, k) of five single-precision floating-point numbers. In the next lab, it will be convenient to keep these state variables in a single C language struct, holding the Kalman filter state in each time step, consisting of:
float q; // process noise covariance
float r; // measurement noise covariance
float x; // estimated value
float p; // estimation error covariance
float k; // adaptive Kalman filter gain
The operation of the filter should be correct for all combinations of inputs and state variables, including arithmetic conditions such as overflow.

Function Requirements
1. All registers specified by the ARM calling convention are to be preserved, and the stack position should be unaffected by the subroutine call upon returning.
2. The calling convention will obey that of the compiler. Two registers, R0 and S0, contain the arguments, which are respectively the pointer to the state variables structure and the current measurement value. While there is no required result value, please note that the state variables pointed to by the pointer will be modified.
3. No memory location should be used to store any intermediate data. Not only should it be faster, but no memory allocation is needed that way.
4.
The subroutine should be location independent. It should be able to run properly when it is placed in different memory locations.
5. The subroutine should not use any global variables.
6. The subroutine should be as fast as possible, but robust and correct for all cases of positive and negative numbers, as well as overflows. Grading of the experiment will depend on how efficiently you solve the problem, with all the corner cases being correct.

Hints: ARM Assembly Code
1. It helps to solve the problem conceptually at first, taking into account all corner cases, including the arithmetic overflows. Move to assembly code when the algorithm is well defined. (If you wish to implement the code in C first, it can only help you here and also for Part 2, where you will be asked to produce several variations of the basic solution.)
2. Follow the examples of code and instructions elaborated in classes, tutorials and material posted online to get quickly to working code. When optimizing for speed, some ARM Cortex instructions, such as those directly transferring bit fields between two words, could help.
3. Document assembly code thoroughly and thoughtfully. It takes only a few hours to forget what each register holds and what implicit assumptions were used. It is useful to consider the assembler features that improve coding style, such as the declaration of register variables, symbolic constants and similar.
4. Please keep in mind that the processor does not activate the floating-point unit on powerup. Following the example given in the tutorial, ensure that the floating-point unit is enabled and not bypassed.

1. When creating a new project, it is simplest to follow the created template. You will notice that your IDE creates main.c and also the startup code in assembly. You can either
a. modify the startup code to branch to kalman rather than __main for testing the assembly code alone, prior to embedding into the C main program, or
b.
you can call your assembly code from the main (all calling conventions need to be observed). Please note that you will need to declare as global (exported) the subroutine name in your assembly code, as well as follow the syntax rules for GCC assembly language.
2. For linking with C code that has main.c, no special measures should be needed, so option b) above is better and is readily expandable to the Part 2 extension.
3. If the linker complains about some other missing variable to be imported in the startup code, you can either declare it as a “dummy” in your assembly code, or comment out its mention in the startup code.

Test Samples
Be prepared to use your own and TA-provided test samples exercising all the relevant cases.

Part 1: Demonstration and Documentation
The demonstration involves showing your source code and demonstrating a working program. Your program will be placed at an arbitrary memory location. You should be able to call your subroutine several times consecutively. You should be able to explain what every line in your code does – there will be questions in the demo dealing with all the aspects of your code and its performance. The questions might concern the skeleton code to initiate and start the assembly code that we gave you in Tutorial 1; ask if you do not know!

While the full report will be needed for Part 2, which will include the code developed here, it is by far most efficient to have the full documentation of your assembly code subroutine completed upon demonstrating Part 1, especially since the second part will require a lot of documentation and additional code development on its own. You will perform this exercise in several stages and record the results obtained at each step in your lab report. First, you will augment the main C program to use the subroutine kalman repeatedly for an array of inputs. Your subroutine call will appear in the C source code exactly as if a C subroutine were used.
In doing this, you should follow all the usual rules of writing C programs, keeping in mind that the executable program consists of multiple independently compiled modules. The filter state is a quintuple (q, r, x, p, k) kept in a single C language struct kalman_state, consisting of:
float q; // process noise covariance
float r; // measurement noise covariance
float x; // estimated value
float p; // estimation error covariance
float k; // adaptive Kalman filter gain
The operation of the program should be correct and as efficient as possible. To illustrate the operation of the Kalman filter within your main program, Table 1 lists the state vector entries before and after the execution of the first five iterations for the case where the initial value of x=5 and measurement(i)=i.

Table 1: Evolving Kalman Filter State Space
                 Before   i=0     i=1     i=2     i=3     i=4
  measurement(i)   -      0       1       2       3       4
  q              0.1      0.1     0.1     0.1     0.1     0.1
  r              0.1      0.1     0.1     0.1     0.1     0.1
  p              0.1      0.067   0.0625  0.06    0.06    0.06
  x              5        1.67    1.25    1.71    2.5     3.43
  k              0        0.67    0.625   0.62    0.62    0.62

Processing filtered data
In the second part of the lab, you will implement functions in embedded C to track the input variable, following the Kalman filter tracking algorithm. The code should execute on a fixed-length input vector, consisting, say, of 100 values that will be provided by the TAs.
1. To assess the properties of the Kalman filter tracking of the input stream, you will add four additional operations at the end:
a. Subtraction of the original data and the data obtained by Kalman filter tracking.
b. Calculation of the standard deviation and the average of the difference obtained in a).
c. Calculation of the correlation between the original and tracked vectors.
d. Calculation of the convolution between the two vectors.
You will then profile the code and compare the speed of your assembly code implementation of the Kalman filter with the equivalent C code subroutine. You will also re-implement the C and assembly code for kalman using the CMSIS-DSP library with the aims of:
1.
Cross-validating the C code against the CMSIS-DSP implementation – one lab group member can start preparing the CMSIS-DSP implementation from the beginning, so that your assumptions about the code get validated as well.
2. Comparing the performance and appreciating the utility of CMSIS-DSP.
3. Becoming proficient in CMSIS-DSP and capable of using any part of the CMSIS in future labs.

Some observations on these calculations are in order:
• Kalman filter convergence depends on the initial values. The closer x is to the measurement, the smaller the need for correction.
• By deviating from C programming practices, you risk passing the wrong parameters, so please be aware of that.

Function Requirements
Design a C function:
int Kalmanfilter(float* InputArray, float* OutputArray, kalman_state* kstate, int Length)
Here:
- InputArray is the array of measurements
- OutputArray is the array of values x obtained by Kalman filter calculations over the input array (cf. Table 1)
- The kstate struct contains the Kalman filter state that is readily available for your debugging
- The integer Length specifies the length of the data array to process
The output encodes whether the function executed properly (by returning 0) or the result is not a correct arithmetic value (e.g., it has run into numerical conditions leading to NaN).
1. The function Kalmanfilter should call your ‘kalman’ subroutine wherever appropriate. Unnecessary operations should be avoided.
2. Use the ARM calling conventions for later inclusion into a C program.
3. Upon returning from subroutines, the registers and stack should be untouched.
4. The subroutine should be location independent.
5.
The subroutine should be robust and correct first, and then fast. The idea is that the C function will be the interface to the test harness provided by the TAs, and with the same interface (often referred to as a wrapper) you can call either the assembly, plain C or the CMSIS-DSP code.

Useful Notes
In realizing your code, you are encouraged to use good C coding practices, such as the use of conditional compilation for retargeting to the different execution cases given above. Feel free to consult our online tutorial on embedded C, found in the Documentation part of the Content.

The Debugger and Profiler
While developing your code, you will spend substantial time using the debugger. Please follow the instructions from Tutorial 2 to ensure that the compilation for the simulator (and the debugger running on simulated code) works properly. The profiler will be a useful tool once the code is bug free and you want to assess and improve the performance of your code.

Experimental Results to Report
You are asked to perform the following experiments and include the results in your report.
1. Run the program and record the execution for a trace of your choice that nicely illustrates the properties of the Kalman filter, such as the convergence towards the input stream and the statistical properties of the deviation from the input.
2. Explicitly calculate in your program the difference from the input stream, the standard deviation and average of that difference, as well as the correlation and convolution between the two streams.
3. Using CMSIS-DSP, perform the same functions as in 2.
4. Using code profiling, report the running times for all the procedures with your code and CMSIS-DSP. You should get the results as fast as possible.
5. Rewrite your core Kalman filter code in plain C as well as in C using the CMSIS-DSP. Keep in mind that it is possible to obtain the “scalar” version of CMSIS-DSP when needed.
6.
Report the profiling data that compares the three versions of the Kalman filter core (assembly, plain C and CMSIS-DSP).
7. Summarize the lessons learned on the use of assembler, C and CMSIS-DSP by selecting the suitable profiling data.
8. Finally, run the code on the actual processor and use the debugger to inspect and modify the program variables while the code is running. Summarize in the report the rules: which variables can be watched or modified, and whether and how that can be done without stopping the processor.

Final Report
Once you have all the parts working, include all the relevant data in your report, which will be due the first day of the week following the Lab 2 demonstration, but not later than 4 days after the demo. Please follow the report guidelines included in the lecture slides.
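Before porting to C and CMSIS-DSP, the four post-processing operations from the experiment list (difference, its average and standard deviation, correlation, convolution) can be prototyped for cross-validation. A pure-Python sketch, assuming equal-length input and tracked streams (the function name and layout are illustrative only):

```python
def analyze_tracking(original, tracked):
    """Compare the input stream with its Kalman-filtered track (reference sketch)."""
    n = len(original)
    diff = [o - t for o, t in zip(original, tracked)]           # a) difference
    avg = sum(diff) / n                                         # b) average ...
    std = (sum((d - avg) ** 2 for d in diff) / n) ** 0.5        #    ... and std deviation
    # c) full cross-correlation over all lags -(n-1) .. (n-1)
    corr = [sum(original[i] * tracked[i - lag]
                for i in range(max(lag, 0), min(n, n + lag)))
            for lag in range(-(n - 1), n)]
    # d) full linear convolution of the two streams
    conv = [sum(original[i] * tracked[k - i]
                for i in range(max(0, k - n + 1), min(k + 1, n)))
            for k in range(2 * n - 1)]
    return diff, avg, std, corr, conv
```

Such a reference computation is one way to cross-check the CMSIS-DSP results (e.g., against its mean, standard deviation, correlation and convolution routines) on the same test vector.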
In this assignment, you will explore the Spark GraphFrames library as well as implement your own Girvan-Newman algorithm using the Spark Framework to detect communities in graphs. You will use the ub_sample_data.csv dataset to find users who have a similar business taste. The goal of this assignment is to help you understand how to use the Girvan-Newman algorithm to detect communities in an efficient way within a distributed environment.

2.1 Programming Requirements
a. You must use Python and Spark to implement all tasks. There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You can use the Spark DataFrame and GraphFrames library for task1, but for task2 you can ONLY use Spark RDD and standard Python or Scala libraries. (ps. For Scala, you can try GraphX, but for the assignment, you need to use GraphFrames.)

2.2 Programming Environment
Python 3.6, Scala 2.11 and Spark 2.3.2
We will use Vocareum to automatically run and grade your submission. You must test your scripts on the local machine and the Vocareum terminal before submission.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! TAs will combine all the code we can find from the web (e.g., Github) as well as other students’ code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism.

2.4 What you need to turn in
You need to submit the following files on Vocareum: (all lowercase)
a. [REQUIRED] two Python scripts, named: task1.py, task2.py
b1. [REQUIRED FOR SCALA] two Scala scripts, named: task1.scala, task2.scala
b2. [REQUIRED FOR SCALA] one jar package, named: hw4.jar
c.
[OPTIONAL] You can include other scripts called by your main program
d. You don’t need to include your results. We will grade your code with our testing data (data will be in the same format).

You will continue to use the Yelp dataset. We have generated a sub-dataset, ub_sample_data.csv, from the Yelp review dataset containing user_id and business_id. You can download it from Vocareum.

4.1 Graph Construction
To construct the social network graph, each node represents a user and there will be an edge between two nodes if the number of times that two users review the same business is greater than or equal to the filter threshold. For example, suppose user1 reviewed [business1, business2, business3] and user2 reviewed [business2, business3, business4, business5]. If the threshold is 2, there will be an edge between user1 and user2. If a user node has no edge, we will not include that node in the graph. In this assignment, we use filter threshold 7.

4.2 Task1: Community Detection Based on GraphFrames (2 pts)
In task1, you will explore the Spark GraphFrames library to detect communities in the network graph you constructed in 4.1. The library provides an implementation of the Label Propagation Algorithm (LPA), which was proposed by Raghavan, Albert, and Kumara in 2007. It is an iterative community detection solution whereby information “flows” through the graph based on the underlying edge structure. For the details of the algorithm, you can refer to the paper posted on Piazza. In this task, you do not need to implement the algorithm from scratch; you can call the method provided by the library. The following websites may help you get started with Spark GraphFrames:
https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-python.html
https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-scala.html

4.2.1 Execution Detail
The version of GraphFrames should be 0.6.0.
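The edge rule from section 4.1 is easy to prototype without Spark before wiring it into GraphFrames. A plain-Python sketch (the function name and data layout are illustrative, not prescribed by the assignment):

```python
from itertools import combinations

def build_edges(user_businesses, threshold):
    """user_businesses: dict mapping user_id -> set of reviewed business_ids.
    Returns undirected edges (u1, u2) where the two users share at least
    `threshold` common businesses; isolated users simply get no edges."""
    edges = []
    for u1, u2 in combinations(sorted(user_businesses), 2):
        shared = user_businesses[u1] & user_businesses[u2]
        if len(shared) >= threshold:
            edges.append((u1, u2))
    return edges

# The example from 4.1: two shared businesses, threshold 2 -> one edge
baskets = {"user1": {"b1", "b2", "b3"}, "user2": {"b2", "b3", "b4", "b5"}}
edges = build_edges(baskets, 2)   # -> [("user1", "user2")]
```

In the actual task, the resulting node and edge lists would be loaded into GraphFrames (or an RDD for task2) rather than plain Python structures.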
For Python:
• In PyCharm, you need to pip install graphframes and add the line below into your code:
os.environ[“PYSPARK_SUBMIT_ARGS”] = (“--packages graphframes:graphframes:0.6.0-spark2.3-s_2.11”)
• In the terminal, you need to assign the parameter “packages” of spark-submit:
--packages graphframes:graphframes:0.6.0-spark2.3-s_2.11
For Scala:
• In IntelliJ IDEA, you need to add library dependencies to your project:
“graphframes” % “graphframes” % “0.6.0-spark2.3-s_2.11”
“org.apache.spark” %% “spark-graphx” % sparkVersion
• In the terminal, you need to assign the parameter “packages” of spark-submit:
--packages graphframes:graphframes:0.6.0-spark2.3-s_2.11
For the parameter “maxIter” of the LPA method, you should set it to 5.

4.2.2 Output Result
In this task, you need to save your result of communities in a txt file. Each line represents one community and the format is: ‘user_id1’, ‘user_id2’, ‘user_id3’, ‘user_id4’, … Your result should be firstly sorted by the size of communities in ascending order and then by the first user_id in the community in lexicographical order (the user_id is of type string). The user_ids in each community should also be in lexicographical order. If there is only one node in the community, we still regard it as a valid community.

Figure 1: community output file format

4.3 Task2: Community Detection Based on Girvan-Newman algorithm (6 pts)
In task2, you will implement your own Girvan-Newman algorithm to detect the communities in the network graph. Because your task1 and task2 code will be executed separately, you need to construct the graph again in this task following the rules in section 4.1. You can refer to Chapter 10 from the Mining of Massive Datasets book for the algorithm details. For task2, you can ONLY use Spark RDD and standard Python or Scala libraries. Remember to delete your code that imports graphframes.

4.3.1 Betweenness Calculation (3 pts)
In this part, you will calculate the betweenness of each edge in the original graph you constructed in 4.1. Then you need to save your result in a txt file. The format of each line is:
(‘user_id1’, ‘user_id2’), betweenness value
Your result should be firstly sorted by the betweenness values in descending order and then by the first user_id in the tuple in lexicographical order (the user_id is of type string). The two user_ids in each tuple should also be in lexicographical order. You do not need to round your result.

Figure 2: betweenness output file format

4.3.2 Community Detection (3 pts)
You are required to divide the graph into suitable communities, which reaches the global highest modularity. The formula of modularity is shown below:
Q = (1/(2m)) * Σ_s Σ_{i∈s, j∈s} (A_ij − k_i·k_j / (2m))
According to the Girvan-Newman algorithm, after removing one edge, you should re-compute the betweenness. The “m” in the formula represents the edge number of the original graph. The “A” in the formula is the adjacency matrix of the original graph. (Hint: in each removal step, “m” and “A” should not be changed.) If the community only has one user node, we still regard it as a valid community. You need to save your result in a txt file. The format is the same as the community output file from task1.

4.4 Execution Format
Execution example:
Python:
spark-submit --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11 task1.py
spark-submit task2.py
Scala:
spark-submit --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11 --class task1 hw4.jar
spark-submit --class task2 hw4.jar
Input parameters: 1.
<filter threshold>: the filter threshold to generate edges between user nodes.
2. <input file path>: the path to the input file including path, file name and extension.
3. <betweenness output file path>: the path to the betweenness output file including path, file name and extension.
4. <community output file path>: the path to the community output file including path, file name and extension.

Execution time: The overall runtime limit of your task1 (from reading the input file to finishing writing the community output file) is 200 seconds. The overall runtime limit of your task2 (from reading the input file to finishing writing the community output file) is 250 seconds. If your runtime exceeds the above limits, there will be no points for this task.

5. About Vocareum
a. You can use the provided datasets under the directory resource: /asnlib/publicdata/
b. You should upload the required files under your workspace: work/
c. You must test your scripts on both the local machine and the Vocareum terminal before submission.
d. During the submission period, Vocareum will automatically test task1 and task2.
e. During the grading period, Vocareum will use another dataset that has the same format for testing.
f. We do not test the Scala implementation during the submission period.
g. Vocareum will automatically run both Python and Scala implementations during the grading period.
h. Please start your assignment early! You can resubmit any script on Vocareum. We will only grade your last submission.

6. Grading Criteria (% penalty = % penalty of possible points you get)
a. You can use your free 8-day extension separately or together. You must submit a late-day request via https://forms.gle/worKTbCRBWKQ6jSu6. This form records the number of late days you use for each assignment. By default, we will not count the late days if no request is submitted.
b. There will be a 10% bonus for each task if your Scala implementations are correct. Only when your Python results are correct will the bonus for Scala be calculated. There is no partial point for Scala.
c.
There will be no points if your submission cannot be executed on Vocareum.
d. There is no regrading. Once the grade is posted on the Blackboard, we will only regrade your assignment if there is a grading error. No exceptions.
e. There will be a 20% penalty for late submission within one week, and no points after that.
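For section 4.3.2, the modularity of a candidate partition can be checked directly against the formula. A sketch assuming an unweighted graph stored as adjacency sets, with m and the adjacency fixed to the original graph as the hint requires (the function name and data layout are illustrative only):

```python
def modularity(communities, adj, m):
    """Q = (1/2m) * sum over communities s, and i, j in s, of
    (A_ij - k_i * k_j / (2m)), with A and m taken from the ORIGINAL graph.
    adj: dict node -> set of neighbours; m: edge count of the original graph."""
    degree = {node: len(neigh) for node, neigh in adj.items()}
    q = 0.0
    for community in communities:
        for i in community:
            for j in community:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - degree[i] * degree[j] / (2.0 * m)
    return q / (2.0 * m)

# Two disconnected triangles split into their natural communities give Q ~ 0.5
triangle = lambda a, b, c: {a: {b, c}, b: {a, c}, c: {a, b}}
adj = {**triangle("a", "b", "c"), **triangle("d", "e", "f")}
q = modularity([{"a", "b", "c"}, {"d", "e", "f"}], adj, 6)
```

In the Girvan-Newman loop, one would remove the highest-betweenness edge, find the connected components of the remaining graph, and evaluate this Q on them, keeping the partition with the highest value.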
In this assignment, you will implement the SON algorithm using the Apache Spark Framework. You will develop a program to find frequent itemsets in two datasets: one simulated dataset and one real-world dataset generated from the Yelp dataset. The goal of this assignment is to apply the algorithms you have learned in class on large datasets more efficiently in a distributed environment.

2.1 Programming Requirements
a. You must use Python to implement all tasks. There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You are required to only use Spark RDD in order to understand Spark operations more deeply. You will not get any points if you use Spark DataFrame or DataSet.

2.2 Programming Environment
Python 3.6, Scala 2.11 and Spark 2.3.2
We will use Vocareum to automatically run and grade your submission. We highly recommend that you first test your script on your local machine and then submit to Vocareum.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! TAs will combine all the code we can find from the web (e.g., Github) as well as other students’ code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism.

2.4 What you need to turn in
a. Three Python scripts, named (all lowercase): task1.py, task2.py, preprocess.py
b1. [OPTIONAL] two Scala scripts, named (all lowercase): task1.scala, task2.scala (No need to write preprocessing code in Scala)
b2. [OPTIONAL] one jar package, named: hw2.jar (all lowercase)
Note: You don’t need to include your output files.
We will grade your code with our testing data (data will be in the same format). In this assignment, you will use one simulated dataset and one real-world dataset. In task 1, you will build and test your program with a small simulated CSV file that has been provided to you. For task 2, you need to generate a subset using business.json and review.json from the Yelp dataset (https://drive.google.com/drive/folders/1-Y4H0vw2rRIjByDdGcsEuor9VagDyzin?usp=sharing) with the same structure as the simulated data. Figure 1 shows the file structure: the first column is user_id and the second column is business_id. In task 2, you will test your code with this real-world data.

Figure 1: Input Data Format

We will only provide a submission report for small1.csv on Vocareum for task 1. No submission report will be provided for task 2. You are encouraged to use the command line to run the code for small2.csv as well as for task 2 to get a sense of the running time.

In this assignment, you will implement the SON algorithm to solve all tasks (Tasks 1 and 2) on top of the Apache Spark Framework. You need to find all the possible combinations of the frequent itemsets in any given input file within the required time. You can refer to Chapter 6 from the Mining of Massive Datasets book and concentrate on section 6.4 – Limited-Pass Algorithms. (Hint: you can choose either the A-Priori, MultiHash, or PCY algorithm to process each chunk of the data.)

4.1 Task 1: Simulated data (4 pts)
There are two CSV files (small1.csv and small2.csv) provided on Vocareum in your workspace. The small1.csv is just a sample file that you can use to debug your code. For task1, we will test your code on small2.csv for grading. In this task, you need to build two kinds of market-basket model.

Case 1 (2 pts): You will calculate the combinations of frequent businesses (as singletons, pairs, triples, etc.) that are qualified as frequent given a support threshold.
You need to create a basket for each user containing the business ids reviewed by this user. If a business was reviewed more than once by a reviewer, we consider this product was rated only once. More specifically, the business ids within each basket are unique. The generated baskets are similar to:
user1: [business11, business12, business13, …]
user2: [business21, business22, business23, …]
user3: [business31, business32, business33, …]

Case 2 (2 pts): You will calculate the combinations of frequent users (as singletons, pairs, triples, etc.) that are qualified as frequent given a support threshold. You need to create a basket for each business containing the user ids that commented on this business. Similar to case 1, the user ids within each basket are unique. The generated baskets are similar to:
business1: [user11, user12, user13, …]
business2: [user21, user22, user23, …]
business3: [user31, user32, user33, …]

Input format:
1. Case number: Integer that specifies the case. 1 for Case 1 and 2 for Case 2.
2. Support: Integer that defines the minimum count to qualify as a frequent itemset.
3. Input file path: This is the path to the input file including path, file name and extension.
4. Output file path: This is the path to the output file including path, file name and extension.

Output format:
1. Runtime: the total execution time from loading the file till finishing writing the output file. You need to print the runtime in the console with the “Duration” tag, e.g., “Duration: 100”.
2. Output file:
(1) Output-1: You should use “Candidates:” as the tag. For each line you should output the candidates of frequent itemsets you found after the first pass of the SON algorithm, followed by an empty line after each frequent-X itemset combination list. The printed itemsets must be sorted in lexicographical order. (Both user_id and business_id have the data type “string”.)
(2) Output-2: You should use “Frequent Itemsets:” as the tag.
For each line you should output the final frequent itemsets you found after finishing the SON algorithm. The format is the same as Output-1. The printed itemsets must be sorted in lexicographical order. Here is an example of the output file. Both the Output-1 result and the Output-2 result should be saved in ONE output result file “firstname_lastname_task1.txt”.

Execution example:
Python: spark-submit firstname_lastname_task1.py
Scala: spark-submit --class firstname_lastname_task1 firstname_lastname_hw2.jar

4.2 Task 2: Yelp data (4 pts)
In task2, you will explore the Yelp dataset to find the frequent business sets (only case 1). You will jointly use business.json and review.json to generate the input user-business CSV file yourselves.

(1) Data preprocessing
You need to generate a sample dataset from business.json and review.json (https://drive.google.com/drive/folders/1-Y4H0vw2rRIjByDdGcsEuor9VagDyzin?usp=sharing) with the following steps:
1. The state of the business you need is Nevada, i.e., filtering ‘state’ == ‘NV’.
2. Select “user_id” and “business_id” from review.json whose “business_id” is from Nevada. Each line in the CSV file would be “user_id1, business_id1”.
3. The header of the CSV file should be “user_id,business_id”.
You need to save the dataset in CSV format. Figure 2 shows an example of the output file.

Figure 2: user_business file

You need to submit the code and the output file of this data preprocessing step. The preprocessing code will NOT be graded. We will use different filters to generate another dataset for grading.

(2) Apply SON algorithm
The requirements for task 2 are similar to task 1. However, you will test your implementation with the large dataset you just generated. For this purpose, you need to report the total execution time. For this execution time, we also take into account the time from reading the file till writing the results to the output file.
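As a reference for the in-chunk phase of SON (the hint allows A-Priori), here is a small pure-Python sketch that finds all frequent itemsets of a list of baskets; names and structure are illustrative only, not the required Spark implementation:

```python
def apriori(baskets, support):
    """Return all frequent itemsets (as sorted tuples) with count >= support."""
    baskets = [set(b) for b in baskets]
    # count and keep the frequent singletons
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= support}
    frequent, size = [], 1
    while current:
        frequent.extend(sorted(tuple(sorted(s)) for s in current))
        size += 1
        # candidate k-itemsets: unions of frequent (k-1)-itemsets (monotonicity)
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        current = {c for c in candidates
                   if sum(1 for b in baskets if c <= b) >= support}
    return frequent
```

In SON, this routine (or PCY/MultiHash) would run independently on each Spark partition with a scaled-down support to produce candidates (first pass), and the union of all candidates would then be counted over the full dataset (second pass).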
You are asked to find the frequent business sets (only case 1) from the file you just generated. The following are the steps you need to do:
1. Read the user_business CSV file into an RDD and then build the case 1 market-basket model;
2. Find out qualified users who reviewed more than k businesses (k is the filter threshold);
3. Apply the SON algorithm code to the filtered market-basket model.

Input format:
1. Filter threshold: an integer that is used to filter out qualified users.
2. Support: an integer that defines the minimum count to qualify as a frequent itemset.
3. Input file path: the path to the input file, including path, file name, and extension.
4. Output file path: the path to the output file, including path, file name, and extension.

Output format:
1. Runtime: the total execution time from loading the file till finishing writing the output file. You need to print the runtime in the console with the "Duration" tag, e.g., "Duration: 100".
2. Output file: the output file format is the same as task 1. Both the intermediate results and final results should be saved in ONE output result file, "firstname_lastname_task2.txt".

Execution example:
Python: spark-submit firstname_lastname_task2.py
Scala: spark-submit --class firstname_lastname_task2 firstname_lastname_hw2.jar

5. Evaluation Metric
Task 1:
Task 2:

6. Grading Criteria (% penalty = % penalty of possible points you get)
1. You can use your free 5-day extension separately or together, and you need to email your TA to indicate that you are using the free days within 24 hours of your submission.
2. There will be a 10% bonus for each task (i.e., 0.3 pts, 0.2 pts, 0.3 pts) if your Scala implementations are correct. The Scala bonus will be calculated only when your Python results are correct. There is no partial credit for Scala.
3. There will be no point if your programs cannot be executed on Vocareum. Please start your assignment early! You can resubmit on Vocareum.
We will grade your last submission.
4. There is no regrading. Once the grade is posted on Blackboard, we will only regrade your assignment if there is a grading error. No exceptions.
5. There will be a 20% penalty for late submission within a week, and no point after a week.
6. There will be no point if the total execution time exceeds the evaluation metric in Section 5.
7. If the outputs of your program are unsorted or partially sorted, there will be a 50% penalty.

Input File    Case    Support    Runtime (sec)
small2.csv    1       4
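The first pass of the SON algorithm runs a frequent-itemset search on each data chunk. As a hedged illustration of that local search, here is an Apriori-style sketch in plain Python (the assignment itself requires Spark RDDs, and the basket contents below are made up):

```python
# Sketch: Apriori-style level-wise search over baskets. Inside SON, a pass
# like this would run on each chunk with a proportionally scaled-down support.
# Plain Python for illustration only; the graded code must use Spark RDDs.

def frequent_itemsets(baskets, support):
    """Return frequent itemsets (as sorted tuples) of every size, level by level."""
    result = []
    # Level 1 candidates: every distinct item as a singleton tuple.
    candidates = {(item,) for basket in baskets for item in basket}
    while candidates:
        counts = {}
        for basket in baskets:
            items = set(basket)
            for cand in candidates:
                if items.issuperset(cand):
                    counts[cand] = counts.get(cand, 0) + 1
        frequent = sorted(c for c, n in counts.items() if n >= support)
        result.extend(frequent)
        # Size-(k+1) candidates: frequent size-k sets grown by one frequent item.
        singles = {item for c in frequent for item in c}
        candidates = {tuple(sorted(set(a) | {b}))
                      for a in frequent for b in singles if b not in a}
    return result

baskets = [["b1", "b2"], ["b1", "b2", "b3"], ["b1", "b3"], ["b2", "b3"]]
print(frequent_itemsets(baskets, support=2))
# [('b1',), ('b2',), ('b3',), ('b1', 'b2'), ('b1', 'b3'), ('b2', 'b3')]
```

Note that the sorted tuples match the required lexicographical ordering of the printed itemsets; the second SON pass would then recount these candidates over the full dataset.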
In assignment 1, you will complete three tasks. The goal of these tasks is to help you get familiar with Spark operations (e.g., transformations and actions) and MapReduce.

2.1 Programming Requirements
a. You must use Python to implement all tasks. You can only use standard Python libraries (i.e., external libraries like numpy or pandas are not allowed). There will be a 10% bonus for a Scala implementation. You can get the bonus only when both the Python and Scala implementations are correct.
b. You are required to only use Spark RDD, i.e., no point if using Spark DataFrame or DataSet.

2.2 Programming Environment
Python 3.6, Scala 2.11, and Spark 2.3.0. We will use Vocareum to automatically run and grade your submission. We highly recommend that you first test your scripts on your machine and then submit to Vocareum if the testing is successful. You should have received an email from Vocareum to access assignment 1. The email should contain some text like: "Welcome to Vocareum's Coding Lab! You have been enrolled in the following course: Course: INF 553 – Spring 2020". If you do not receive such an email, please contact your TA.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! TAs will combine all the code we can find from the web (e.g., GitHub) as well as other students' code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism to the university.

In this assignment, you are provided with two datasets (i.e., reviews and businesses) extracted from the Yelp dataset for developing your assignment.
You can access and download the datasets either under the directory on Vocareum (resource/asnlib/publicdata/) or in the Google Drive: https://drive.google.com/drive/folders/1dnVCZazaR84UkvhHAwoHyFqxmFF01IHa?usp=sharing. We generated these datasets by random sampling. The given datasets are only for your testing; during the grading period, we will use different sampled subsets of the Yelp datasets.

You need to submit the following files on Vocareum (all lowercase):
A. Python scripts: task1.py, task2.py, task3.py
B. [OPTIONAL] Scala scripts: task1.scala, task2.scala, task3.scala; Jar package: hw1.jar

4.1 Task 1: Data Exploration (3 pts)
4.1.1 Task description
You will explore the review dataset and write a program to answer the following questions:
A. The total number of reviews (0.5 pts)
B. The number of reviews in a given year, y (0.5 pts)
C. The number of distinct users who have written the reviews (0.5 pts)
D. The top m users who have the largest number of reviews, and their counts (0.5 pts)
E. The top n frequent words in the review text. The words should be in lower case. The following punctuation, i.e., "(", "[", ",", ".", "!", "?", ":", ";", "]", ")", and the given stopwords are excluded (1 pt)

4.1.2 Execution commands
Python: $ spark-submit task1.py
Scala: $ spark-submit --class task1 hw1.jar
Params:
input_file – the input file (the review dataset)
output_file – the output file that contains your answers
stopwords – the file that contains the stopwords that should be removed for Question E
y/m/n – see 4.1.1

4.1.3 Output format
You must write the results in JSON format using exactly the same tags for each question (see an example in Figure 2). The answer for A/B/C is a number. The answer for D is a list of pairs [user, count]. The answer for E is a list of frequent words. All answers should be sorted by count in descending order. If two users/words have the same count, please sort them in alphabetical order.
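The counting logic for Question E can be sketched in plain Python as follows. This is a hedged illustration only: the graded version must use Spark RDD transformations, and the sample reviews, stopword set, and n below are made up.

```python
# Sketch: top-n frequent words with the listed punctuation and the given
# stopwords removed, sorted by count descending and alphabetically on ties,
# as required by 4.1.3. Plain Python for illustration only.
from collections import Counter

PUNCTUATION = set('([,.!?:;])')   # the ten punctuation marks listed in 4.1.1 E

def top_n_words(texts, stopwords, n):
    counts = Counter()
    for text in texts:
        # Lowercase, strip the listed punctuation, then split on whitespace.
        cleaned = ''.join(ch for ch in text.lower() if ch not in PUNCTUATION)
        counts.update(w for w in cleaned.split() if w not in stopwords)
    # Sort by count descending, then alphabetically for equal counts.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]

reviews = ["Great food, great service!", "The food was okay."]
print(top_n_words(reviews, stopwords={"the", "was"}, n=3))  # ['food', 'great', 'okay']
```

In the Spark version, the same pipeline would be expressed as flatMap over review texts, a reduceByKey count, and the same composite sort key.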
1 https://www.yelp.com/dataset

Figure 2: An example output for task1 in JSON format

4.2 Task 2: Exploration on Multiple Datasets (2 pts)
4.2.1 Task description
In task 2, you will explore the two datasets together (i.e., review and business) and write a program to compute the average stars for each business category and output the top n categories with the highest average stars (1 pt). The business categories should be extracted from the "categories" tag in the business file and split by comma (you also need to remove leading and trailing spaces from the extracted categories). You need to implement a version without Spark and compare it to a version with Spark for reducing the execution time (1 pt).

4.2.2 Execution commands
Python: $ spark-submit task2.py
Scala: $ spark-submit --class task2 hw1.jar
Params:
review_file – the input file (the review dataset)
business_file – the input file (the business dataset)
output_file – the output file that contains your answers
if_spark – either "spark" or "no_spark"
n – top n categories with highest average stars (see 4.2.1)

4.2.3 Output format
You must write the results in JSON format using exactly the same tags (see an example in Figure 3). The answer is a list of pairs [category, stars], sorted by stars in descending order. If two categories have the same value, please sort the categories in alphabetical order.

Figure 3: An example output for task2 in JSON format

4.3 Task 3: Partition (3 pts)
4.3.1 Task description
In this task, you will learn how partitions work in the RDD. You need to compute the businesses that have more than n reviews in the review file (1 pt). At the same time, you need to show the number of partitions for the RDD and the number of items per partition with either the default or a customized partition function (1 pt). You should design a customized partition function to improve the computational efficiency, i.e., reduce the execution time (1 pt).
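A customized partition function for Task 3 typically hashes the business id, so that all records for one business land in the same partition and counting needs no extra shuffle. A hedged plain-Python sketch of the idea (the real version would be passed to partitionBy; the key names are made up):

```python
# Sketch: a customized, deterministic partition function for business_id keys.
# In Spark this would be passed to rdd.partitionBy(n_partitions, business_partitioner);
# here we only simulate how keys spread across partitions. Ids are made up.

def business_partitioner(business_id, n_partitions):
    # Summing character codes gives a stable hash (unlike Python's salted
    # built-in hash for str), so every run maps a key to the same partition.
    return sum(ord(c) for c in business_id) % n_partitions

def items_per_partition(keys, n_partitions):
    """Count how many records fall into each partition (the task's per-partition output)."""
    sizes = [0] * n_partitions
    for key in keys:
        sizes[business_partitioner(key, n_partitions)] += 1
    return sizes

keys = ["b%d" % i for i in range(10)]
print(items_per_partition(keys, n_partitions=4))  # [2, 2, 3, 3]
```

The design point is balance: a partitioner that spreads keys evenly keeps every partition roughly the same size, which is what reduces the execution time relative to a skewed default partitioning.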
4.3.2 Execution commands
Python: $ spark-submit task3.py
Scala: $ spark-submit --class task3 hw1.jar
Params:
input_file – the input file (the review dataset)
output_file – the output file that contains your answers
partition_type – the partition function, either "default" or "customized"
n_partitions – the number of partitions (only effective for the customized partition function)
n – the threshold of the number of reviews (see 4.3.1)

4.3.3 Output format
You must write the results in JSON format using exactly the same tags (see an example in Figure 4). The answer for the number of partitions is a number. The answer for the number of items per partition is a list of numbers. The answer for the result is a list of pairs [business, count] (no need to sort).

Figure 4: An example output for task3 in JSON format

5. About Vocareum
1. You can use the provided datasets under the directory resource: /asnlib/publicdata/
2. You should upload the scripts under your workspace: work/
3. Once you click on "Submit", all the scripts are submitted and automatically run on Vocareum.
4. We highly recommend that you first test your scripts on your machine and then submit to Vocareum if the testing is successful.
5. You can also test your Python/Scala implementations using the required execution commands in the Vocareum terminal before submitting.
6. Here are the testing commands for your reference:
7. You will receive a submission report after Vocareum finishes executing your scripts. The submission report should include the running time and score of each task for the Python implementation. We do not test the Scala implementation during the submission period.
8. Vocareum will automatically run both Python and Scala implementations during the grading period.
9. Please start your assignment early! You can resubmit any script on Vocareum. We will grade your last submission.

6. Grading Criteria (% penalty = % penalty of possible points you get)
1.
You can use your free 5-day extension separately or together. You must submit a late-day request via https://forms.gle/worKTbCRBWKQ6jqu6. This form records the number of late days you use for each assignment. By default, we will not count the late days if no request is submitted.
2. There will be a 10% bonus for each task (i.e., 0.3 pts, 0.2 pts, and 0.3 pts) if your Scala implementations are correct. The Scala bonus will be calculated only when your Python results are correct. There is no partial credit for Scala.
3. There will be no point if your submission cannot be executed on Vocareum.
4. There is no regrading. Once the grade is posted on Blackboard, we will only regrade your assignment if there is a grading error. No exceptions.
5. There will be a 20% penalty for late submission within one week, and no point after that.
For your convenience, a file of 200+ random words and names (Words200D16) has been provided on Blackboard. Each line of text in the file is exactly 16 characters in length, with the word left justified. You may utilize any code developed in class directly in the lab. While not initially apparent, this lab will be more of a thought problem than a coding problem. A careful examination of the sample code utilized in class will reveal that a great deal of the code required for the lab has already been developed in class. Indeed, with so many examples, I assume you should be able to complete the "A" option code with little trouble. Hence grading will emphasize evaluation and presentation of results for correctly coded programs. Be professional grade!

"C" Option: Assume a main memory hash table of size 128. For convenience, assume the characters in the key are numbered 1 through 16. "My" hash address must be calculated as follows for "STR" using Wallgol:

HA = ( STR(1) + STR(3) ) / 256 + STR(6) + 30

For STR = "ABCDEFGHIJKLMNOP" this would be ( 'A' + 'C' ) / 256 + 'F' + 30. All alphabetic/numeric 8-bit ASCII characters in a key are left justified. Keys containing fewer than 16 alphabetic/numeric characters are padded with spaces on the right to a total of 16 characters. I have never used this function, but it should be substantially less than optimal, generating a plethora of collisions. All characters must be treated as integers in all steps of the calculation! You may use the slice operator or simple subscripting depending on how you define the keys. I recommend using a long_integer to hold the intermediate results of the hash calculation. Remember, a "slice" in Ada allows the user to treat a substring as a single unit. If "str" has the value "ABCD" then str(2..3) refers to the characters in positions 2 through 3 as a group from the string str, i.e., "BC". str(2..2) would be the second character in the string, i.e., "B".
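As a hedged illustration, "my" hash function can be expressed in a few lines. Python is used here purely for exposition (the lab itself uses Ada), and the final reduction modulo the table size of 128 is my assumption, since the stated formula alone can produce addresses above 127:

```python
# Sketch of the given hash function HA = (STR(1) + STR(3)) / 256 + STR(6) + 30,
# treating every character as its 8-bit ASCII integer value. Python for
# illustration only (the lab uses Ada); the trailing "% 128" is an assumption
# to keep addresses inside the 128-slot table, not part of the stated formula.
TABLE_SIZE = 128

def bad_hash(key):
    key = key.ljust(16)    # pad short keys with spaces on the right, per the spec
    ha = (ord(key[0]) + ord(key[2])) // 256 + ord(key[5]) + 30
    return ha % TABLE_SIZE

# Worked example from the handout: ( 'A' + 'C' ) / 256 + 'F' + 30
print(bad_hash("ABCDEFGHIJKLMNOP"))  # (65 + 67) // 256 + 70 + 30 = 100
```

Note that only three of the sixteen characters influence the address, and the (STR(1)+STR(3))/256 term contributes at most 1, which already suggests why this function should generate so many collisions.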
I am assuming you have instantiated appropriate functions (coercion) which convert characters or strings (8 bits per ASCII character) to long integers (32 bits) so overflow will not occur in the sum. Each entry placed in the hash table must be a record with 3 fields: the first field is the key, the second the initial hash address, and the third the number of probes to insert the key in the table, counting the initial hash address as the first probe.

Based on your criticism, write a better hash function. You must explain explicitly why your hash function should be better from both a theoretical and empirical standpoint, based on your previous explanation of why my function is bad. You will not receive credit for simply writing a hash function that performs better. Implement your hash function and generate the same results as required for parts A thru E, presented in tabular format for comparison. Formally evaluate the results as part of your lab (typed evaluation). The results for all parts of the lab should appear in a single table. Shame on you if your hash function's performance does not exceed the theoretical values for both the linear and random probe, or at the very least my hash function.

"B" Option: You need not complete the "C" option using a main memory hash table. Rather, complete the "C" option using a random access (relative) file for the hash table. Main memory may not be used to store any portion of the hash table! Generate the same results as required for parts A thru E, presented in tabular format for comparison. Script Kiddies may have trouble evaluating the failures of my hash function and creating a better function.

"A" Option: Do both the "C" and "B" options. Professionally compare and explain the results. The results for all parts of the lab should appear in a single table. Shame on you if your hash function's performance does not exceed the theoretical values for both the linear and random probe collision handling techniques.
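The justification for a "better" function is yours to write, but as one hedged example of the classic approach, a polynomial hash over all sixteen characters spreads keys far more evenly, because every character position influences the address (again sketched in Python rather than Ada, and not a prescribed answer):

```python
# Sketch: one possible improved hash, a polynomial (multiplicative) hash over
# all 16 characters, reduced modulo the table size of 128. Illustrative only;
# your lab write-up must still argue why your own function is better.
TABLE_SIZE = 128

def better_hash(key):
    key = key.ljust(16)
    ha = 0
    for ch in key:
        # 31 is a common odd multiplier; each character shifts the whole result,
        # so no position is ignored the way STR(2), STR(4), etc. were before.
        ha = (ha * 31 + ord(ch)) % (2 ** 31)   # stay within a 32-bit long_integer
    return ha % TABLE_SIZE

# Unlike the original function, changing position 3 now changes the address.
print(better_hash("ABCDEFGHIJKLMNOP") != better_hash("ABQDEFGHIJKLMNOP"))  # True
```

Empirically, you would still verify this against the Words200D16 data by comparing average probe counts for linear and random probing, as the lab requires.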
REPORT ALL GRADING OPTIONS
The exact format of the hash report is left to you. A professional report must at a minimum contain a single table with all results conveniently displayed for the reader. It should contain the results for my hash function for the first and last 30 keys inserted using the linear probe, followed by the results using the random probe. Next, exhibit the same material using your hash function. If you do the "A" option, you must include both the "C" and "B" option results in the table. Your evaluation of results should appear next. Do not forget to include the required memory contents, including the location, key, and number of probes to store/locate the key, for both the linear and random probes. YOU ARE THE EXPERT – I AM THE CLIENT. Convince me you have done a professional job!

Partial Table
A professional verifies their hash function to be sure it is calculating hash addresses correctly. Hint: you do not write code to be sure the correct answers are being produced. Use your knowledge of the character set to do the calculations by hand. If you are not verifying results, you are not a professional.
So far, you have built a shell environment and a multi-thread scheduler with process synchronization. Excellent job! What's still missing for a "real" operating system? A file system! In this assignment, you will implement utilities that perform operations on a file system similar to Microsoft's FAT file system with some improvement.

1.1 Sample File Systems
You will be given two file system images, disk1.img and disk2.img, for self-testing, but your submission may be tested against other disk images following the same specification. You should get comfortable examining the raw, binary data in the file system images using the program xxd.

IMPORTANT: since you are dealing with binary data, functions intended for string manipulation such as strcpy() do NOT work (since binary data may contain a binary '0' anywhere), and you should use functions intended for binary data such as memcpy().

2 Requirements
2.1 Part I (5 marks)
In part I, you will write a program that displays information about the file system. In order to complete part I, you will need to read the file system super block and use the information in the super block to read the FAT. Your program for part I will be invoked as follows: ./diskinfo disk1.img

Sample output:
Super block information:
Block size: 512
Block count: 5120
FAT starts: 1
FAT blocks: 40
Root directory start: 41
Root directory blocks: 8
FAT information:
Free Blocks: 5071
Reserved Blocks: 41
Allocated Blocks: 8

Please be sure to use the exact same output format as shown above.

2.2 Part II (5 marks)
In part II, you will write a program, with the routines already implemented for part I, that displays the contents of the root directory in the file system. Your program for part II will be invoked as follows: ./disklist disk1.img

The directory listing should be formatted as follows:
1. The first column will contain:
(a) F for regular files, or
(b) D for directories;
followed by a single space
2. then 10 characters to show the file size, followed by a single space
3. then 30 characters for the file name, followed by a single space
4. then the file modification date (we won't display the file creation date).

For example:
F 2560 foo.txt 2005/11/15 12:00:00
F 5120 foo2.txt 2005/11/15 12:00:00
F 48127 makefs 2005/11/15 12:00:00
F 8 foo3.txt 2005/11/15 12:00:00

2.3 Part III (5 marks)
In part III, you will write a program that copies a file from the file system to the current directory in Linux. If the specified file is not found in the root directory of the file system, you should output the message File not found. and exit. Your program for part III will be invoked as follows: ./diskget disk1.img foo.txt

2.4 Part IV (5 marks)
In part IV, you will write a program that copies a file from the current Linux directory into the file system. If the specified file is not found, you should output the message File not found. on a single line and exit. Your program for part IV will be invoked as follows: ./diskput disk1.img foo.txt

3 File System Specification
The FAT file system has three major components:
1. the super block,
2. the File Allocation Table (informally referred to as the FAT),
3. the directory structure.
Each of these three components is described in the subsections below.

Description                        Size     Default Value
File system identifier             8 bytes  CSC360FS
Block Size                         2 bytes  0x200
File system size (in blocks)       4 bytes  0x00001400
Block where FAT starts             4 bytes  0x00000001
Number of blocks in FAT            4 bytes  0x00000028
Block where root directory starts  4 bytes  0x00000029
Number of blocks in root dir       4 bytes  0x00000008
Figure 1: Superblock Fields

3.1 File System Superblock
The first block (512 bytes) is reserved to contain information about the file system.
The layout of the superblock is as shown in Figure 1.

Note: block numbers start from 0 in the file system.

IMPORTANT: the superblock only specifies the starting block of the FAT and root directory and how many blocks they have; this does NOT mean that these blocks are consecutive in the disk image, i.e., you need to use the FAT to locate the complete FAT and root directory blocks as well.

3.2 Directory Entries
Each directory entry takes up 64 bytes, which implies there are 8 directory entries per 512-byte block. Each directory entry has the following structure:

Description           Size
Status                1 byte
Starting Block        4 bytes
Number of Blocks      4 bytes
File Size (in bytes)  4 bytes
Create Time           7 bytes
Modify Time           7 bytes
File Name             31 bytes
unused (set to 0xFF)  6 bytes
Figure 2: Directory Entry

The description of each field follows:

Status: a bit mask that is used to describe the status of the file. Currently only 3 of the bits are used:
Bit 0  set to 0 if this directory entry is available, set to 1 if it is in use
Bit 1  set to 1 if this entry is a normal file
Bit 2  set to 1 if this entry is a directory
Figure 3: Format of Status Field
It is implied that only one of bit 1 or bit 2 can be set to 1. That is, an entry is either a normal file or a directory, not both.

Starting Block: the location on disk of the first block in the file.

Number of Blocks: the total number of blocks in this file.

File Size: the size of the file, in bytes. The size of this field implies that the largest file we can support is 2^32 bytes long.

Create Time: the date and time when this file was created. The file system stores the system times as integer values in the format YYYYMMDDHHMMSS:
Field  Size
YYYY   2 bytes
MM     1 byte
DD     1 byte
HH     1 byte
MM     1 byte
SS     1 byte
Figure 4: Format of Date-Time Field

Modify Time: the last time this file was modified. Stored in the same format as the Create Time shown above.

File Name: the file name, null terminated.
Because of the null terminator, the maximum length of any filename is 30 bytes. Valid characters are upper and lower case letters (a-z, A-Z), digits (0-9), and the underscore character (_).

3.3 File Allocation Table (FAT)
Each directory entry contains the starting block number for a file; let's say it is block number X. To find the next block in the file, you should look at entry X in the FAT. If the value you find there does not indicate End-of-File (see below), then that value, call it Y, is the next block number in the file. That is, the first block is at block number X; you look in the FAT table at entry X and find the value Y. The second data block is at block number Y. Then you look in the FAT at entry Y to find the next data block number… continue this until you find the special value in the FAT entry indicating that you are at the last FAT entry of the file.

The FAT is really just a linked list, with the head of the list being stored in the "Starting Block" field in the directory entry, and the 'next pointers' being stored in the FAT entries. FAT entries are 4 bytes long (32 bits), which implies there are 128 FAT entries per block. Special values for FAT entries are described in Figure 5.

Value                  Meaning
0x00000000             This block is available
0x00000001             This block is reserved
0x00000002–0xFFFFFF00  Allocated blocks as part of files
0xFFFFFFFF             This is the last block in a file
Figure 5: Value of FAT entry

4 Byte Ordering
Different hardware architectures store multi-byte data (like integers) in different orders. Consider the large integer 0xDEADBEEF. On the Intel architecture (Little Endian), it would be stored in memory as: EF BE AD DE. On the PowerPC (Big Endian), it would be stored in memory as: DE AD BE EF. Our file system will use Big Endian for storage. This will make debugging the file system by examining the raw data much easier.
This will mean that you have to convert all your integer values to Big Endian before writing them to disk. There are utility functions in netinet/in.h that do exactly that. (When sending data over the network, it is expected that the data is in Big Endian format.) See the functions htons, htonl, ntohs and ntohl. A side effect of using these functions is that your code will work on multiple platforms. (On machines that natively store integers in Big Endian format, like the Mac (not the Intel-based ones), the above functions don't actually do anything, but you should still use them!)

5 Submission Requirements
What to hand in: you need to hand in a .tar.gz file containing all your source code and a Makefile that produces the executables for parts 1–4. Please include a readme.txt file that explains any bonus activities that you completed. The file is submitted through the connex.csc.uvic.ca site.

6 Possible Bonuses
There is lots of room for bonus marks in this assignment. As before, you must get permission from your instructor before you begin the bonus activities, at least one week before the due date. Some things you may consider:
1. implementing directories other than the root directory
2. implementing fast searching for directories and/or free space
3. implementing the file system as a device driver for Linux
4. writing a shell to interact with the file system
If you have any other ideas, please email your instructor.

A An Exercise
Q1 Consider the superblock shown below:
0000000: 4353 4333 3630 4653 0200 0000 1400 0000  CSC360FS........
0000010: 0001 0000 0028 0000 0029 0000 0008 0000  .....(...)......
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
(a) What block does the FAT start on? How many blocks are used for the FAT?
(b) What block does the root directory start on? How many blocks are used for the root directory?
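To make the big-endian field layout concrete, here is a hedged sketch of decoding a superblock with Python's struct module (the assignment itself is written in C with ntohs/ntohl; the sample bytes below are built from the Figure 1 default values, not read from a real image):

```python
# Sketch: decoding the 30 meaningful superblock bytes with a big-endian
# struct format. '>' = big-endian, '8s' = 8-byte identifier, 'H' = the
# 2-byte block size, '5I' = the five remaining 4-byte fields.
# Python for illustration; the assignment is in C.
import struct

SUPERBLOCK_FMT = ">8sH5I"          # 8 + 2 + 5*4 = 30 bytes

# A superblock packed from the Figure 1 default values.
raw = struct.pack(SUPERBLOCK_FMT, b"CSC360FS",
                  0x200, 0x1400, 0x1, 0x28, 0x29, 0x8)

ident, block_size, fs_size, fat_start, fat_blocks, root_start, root_blocks = \
    struct.unpack(SUPERBLOCK_FMT, raw)

print(block_size, fs_size, fat_start, fat_blocks, root_start, root_blocks)
# 512 5120 1 40 41 8   (matching the diskinfo sample output)
```

Reading the same fields in C means reading the first 30 bytes of block 0 and converting each multi-byte field with ntohs (2 bytes) or ntohl (4 bytes).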
Q2 Consider the following block from the root directory:
0005200: 0300 0000 3100 0000 0500 000a 0007 d50b  ....1...........
0005210: 0f0c 0000 07d5 0b0f 0c00 0066 6f6f 2e74  ...........foo.t
0005220: 7874 0000 0000 0000 0000 0000 0000 0000  xt..............
0005230: 0000 0000 0000 0000 0000 00ff ffff ffff  ................
0005240: 0300 0000 3600 0000 0a00 0014 0007 d50b  ....6...........
0005250: 0f0c 0000 07d5 0b0f 0c00 0066 6f6f 322e  ...........foo2.
0005260: 7478 7400 0000 0000 0000 0000 0000 0000  txt.............
0005270: 0000 0000 0000 0000 0000 00ff ffff ffff  ................
0005280: 0300 0000 4000 0000 5e00 00bb ff07 d50b  ....@...^.......
0005290: 0f0c 0000 07d5 0b0f 0c00 006d 616b 6566  ...........makef
00052a0: 7300 0000 0000 0000 0000 0000 0000 0000  s...............
00052b0: 0000 0000 0000 0000 0000 00ff ffff ffff  ................
00052c0: 0300 0000 9e00 0000 0100 0000 0807 d50b  ................
00052d0: 0f0c 0000 07d5 0b0f 0c00 0066 6f6f 332e  ...........foo3.
00052e0: 7478 7400 0000 0000 0000 0000 0000 0000  txt.............
00052f0: 0000 0000 0000 0000 0000 00ff ffff ffff  ................
(a) How many files are allocated in this directory? What are their names?
(b) How many blocks does the file makefs occupy on the disk?

Q3 Given the root directory information from the previous question and the FAT table shown below:
0000200: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000210: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000220: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000230: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000240: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000250: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000260: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000270: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000280: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000290: 0000 0001 0000 0001 0000 0001 0000 0001  ................
00002a0: 0000 0001 0000 002a 0000 002b 0000 002c …….*…+…, 00002b0: 0000 002d 0000 002e 0000 002f 0000 0030 …-……./…0 00002c0: ffff ffff 0000 0032 0000 0033 0000 0034 …….2…3…4 00002d0: 0000 0035 ffff ffff 0000 0037 0000 0038 …5…….7…8 00002e0: 0000 0039 0000 003a 0000 003b 0000 003c …9…:…;……?…. (a) What blocks does the file foo.txt occupy on the disk? (b) What blocks does the file foo2.txt occupy on the disk? 8
So far, you have built a shell environment and a multi-threaded scheduler with process synchronization. Excellent job! What's still missing for a "real" operating system? A file system! In this assignment, you will implement utilities that perform operations on a file system similar to Microsoft's FAT file system, with some improvements.

1.1 Sample File Systems

You will be given two file system images, disk1.img and disk2.img, for self-testing, but your submission may be tested against other disk images following the same specification. You should get comfortable examining the raw, binary data in the file system images using the program xxd.

IMPORTANT: since you are dealing with binary data, functions intended for string manipulation such as strcpy() do NOT work (binary data may contain a binary 0 anywhere), and you should use functions intended for binary data such as memcpy().

2 Requirements

2.1 Part I (5 marks)

In Part I, you will write a program that displays information about the file system. In order to complete Part I, you will need to read the file system super block and use the information in the super block to read the FAT.

Your program for Part I will be invoked as follows: ./diskinfo disk1.img

Sample output:

Super block information:
Block size: 512
Block count: 5120
FAT starts: 1
FAT blocks: 40
Root directory start: 41
Root directory blocks: 8

FAT information:
Free Blocks: 5071
Reserved Blocks: 41
Allocated Blocks: 8

Please be sure to use the exact same output format as shown above.

2.2 Part II (5 marks)

In Part II, you will write a program, using the routines already implemented for Part I, that displays the contents of the root directory in the file system.

Your program for Part II will be invoked as follows: ./disklist disk1.img

The directory listing should be formatted as follows:
1. The first column will contain:
   (a) F for regular files, or
   (b) D for directories;
   followed by a single space
2. then 10 characters to show the file size, followed by a single space
3. then 30 characters for the file name, followed by a single space
4. then the file modification date (we won't display the file creation date).

For example:

F 2560 foo.txt 2005/11/15 12:00:00
F 5120 foo2.txt 2005/11/15 12:00:00
F 48127 makefs 2005/11/15 12:00:00
F 8 foo3.txt 2005/11/15 12:00:00

2.3 Part III (5 marks)

In Part III, you will write a program that copies a file from the file system to the current directory in Linux. If the specified file is not found in the root directory of the file system, you should output the message File not found. and exit.

Your program for Part III will be invoked as follows: ./diskget disk1.img foo.txt

2.4 Part IV (5 marks)

In Part IV, you will write a program that copies a file from the current Linux directory into the file system. If the specified file is not found, you should output the message File not found. on a single line and exit.

Your program for Part IV will be invoked as follows: ./diskput disk1.img foo.txt

3 File System Specification

The FAT file system has three major components:
1. the super block,
2. the File Allocation Table (informally referred to as the FAT),
3. the directory structure.
Each of these three components is described in the subsections below.

Description                        Size     Default Value
File system identifier             8 bytes  CSC360FS
Block Size                         2 bytes  0x200
File system size (in blocks)       4 bytes  0x00001400
Block where FAT starts             4 bytes  0x00000001
Number of blocks in FAT            4 bytes  0x00000028
Block where root directory starts  4 bytes  0x00000029
Number of blocks in root dir       4 bytes  0x00000008

Figure 1: Superblock Fields

3.1 File System Superblock

The first block (512 bytes) is reserved to contain information about the file system.
The layout of the superblock is shown in Figure 1.

Note: Block numbers start from 0 in the file system.

IMPORTANT: the superblock only specifies the starting block of the FAT and root directory and how many blocks they have; this does NOT mean that these blocks are consecutive in the disk image, i.e., you need to use the FAT to locate the complete FAT and root directory blocks as well.

3.2 Directory Entries

Each directory entry takes up 64 bytes, which implies there are 8 directory entries per 512-byte block. Each directory entry has the following structure:

Description           Size
Status                1 byte
Starting Block        4 bytes
Number of Blocks      4 bytes
File Size (in bytes)  4 bytes
Create Time           7 bytes
Modify Time           7 bytes
File Name             31 bytes
unused (set to 0xFF)  6 bytes

Figure 2: Directory Entry

The description of each field follows:

Status  This is a bit mask that is used to describe the status of the file. Currently only 3 of the bits are used:

Bit 0  set to 0 if this directory entry is available, set to 1 if it is in use
Bit 1  set to 1 if this entry is a normal file
Bit 2  set to 1 if this entry is a directory

Figure 3: Format of Status Field

It is implied that only one of bit 1 or bit 2 can be set to 1. That is, an entry is either a normal file or it is a directory, not both.

Starting Block  This is the location on disk of the first block in the file.

Number of Blocks  The total number of blocks in this file.

File Size  The size of the file, in bytes. The size of this field implies that the largest file we can support is 2^32 bytes long.

Create Time  The date and time when this file was created. The file system stores the system times as integer values in the format YYYYMMDDHHMMSS:

Field  Size
YYYY   2 bytes
MM     1 byte
DD     1 byte
HH     1 byte
MM     1 byte
SS     1 byte

Figure 4: Format of Date-Time Field

Modify Time  The last time this file was modified. Stored in the same format as the Create Time shown above.

File Name  The file name, null terminated.
Because of the null terminator, the maximum length of any filename is 30 bytes. Valid characters are upper and lower case letters (a-z, A-Z), digits (0-9) and the underscore character (_).

3.3 File Allocation Table (FAT)

Each directory entry contains the starting block number for a file; let's say it is block number X. To find the next block in the file, you should look at entry X in the FAT. If the value you find there does not indicate End-of-File (see below), then that value, call it Y, is the next block number in the file. That is, the first block is at block number X; you look in the FAT table at entry X and find the value Y. The second data block is at block number Y. Then you look in the FAT at entry Y to find the next data block number... Continue this until you find the special value in the FAT entry indicating that you are at the last FAT entry of the file.

The FAT is really just a linked list, with the head of the list being stored in the "Starting Block" field of the directory entry, and the 'next pointers' being stored in the FAT entries. FAT entries are 4 bytes long (32 bits), which implies there are 128 FAT entries per block. Special values for FAT entries are described in Figure 5.

Value                  Meaning
0x00000000             This block is available
0x00000001             This block is reserved
0x00000002-0xFFFFFF00  Allocated blocks as part of files
0xFFFFFFFF             This is the last block in a file

Figure 5: Value of FAT entry

4 Byte Ordering

Different hardware architectures store multi-byte data (like integers) in different orders. Consider the large integer 0xDEADBEEF.

On the Intel architecture (Little Endian), it would be stored in memory as:

EF BE AD DE

On the PowerPC (Big Endian), it would be stored in memory as:

DE AD BE EF

Our file system will use Big Endian for storage. This will make debugging the file system by examining the raw data much easier.
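As a concrete sketch, writing a value such as 0xDEADBEEF into a disk-image buffer in Big Endian order, and reading it back, can be done with memcpy() plus the byte-order conversion functions discussed below. The helper names put_be32/get_be32 are my own, not part of the assignment:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Write a 32-bit value into a disk-image buffer in Big Endian order,
   and read it back into host order.  htonl()/ntohl() do the byte
   swapping on Little Endian machines and are no-ops on Big Endian
   ones. */
static void put_be32(uint8_t *buf, uint32_t v)
{
    uint32_t be = htonl(v);
    memcpy(buf, &be, sizeof be);   /* memcpy, never strcpy: binary data */
}

static uint32_t get_be32(const uint8_t *buf)
{
    uint32_t be;
    memcpy(&be, buf, sizeof be);
    return ntohl(be);
}
```

On a Little Endian Intel machine the buffer will then contain DE AD BE EF, matching what xxd shows for the disk images.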
This will mean that you have to convert all your integer values to Big Endian before writing them to disk. There are utility functions in netinet/in.h that do exactly that. (When sending data over the network, it is expected that the data is in Big Endian format.) See the functions htons, htonl, ntohs and ntohl.

A side effect of using these functions is that your code will work on multiple platforms. (On machines that natively store integers in Big Endian format, like the Mac (not the Intel-based ones), the above functions don't actually do anything, but you should still use them!)

5 Submission Requirements

What to hand in: You need to hand in a .tar.gz file containing all your source code and a Makefile that produces the executables for parts 1-4. Please include a readme.txt file that explains any bonus activities that you completed. The file is submitted through the connex.csc.uvic.ca site.

6 Possible Bonuses

There is lots of room for bonus marks in this assignment. As before, you must get permission from your instructor before you begin the bonus activities, at least one week before the due date. Some things you may consider:

1. implementing directories other than the root directory
2. implementing fast searching for directories and/or free space
3. implementing the file system as a device driver for Linux
4. writing a shell to interact with the file system

If you have any other ideas, please email your instructor.

A  An Exercise

Q1 Consider the superblock shown below:

0000000: 4353 4333 3630 4653 0200 0000 1400 0000  CSC360FS........
0000010: 0001 0000 0028 0000 0029 0000 0008 0000  .....(...)......
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................

(a) What block does the FAT start on? How many blocks are used for the FAT?
(b) What block does the root directory start on? How many blocks are used for the root directory?
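For reference, reading the superblock fields needed for Part I can be sketched as follows. The struct and function names are my own, and the field offsets follow Figure 1; this is a sketch, not a prescribed implementation:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Superblock fields from Figure 1 (names are illustrative). */
struct superblock {
    char     fs_id[9];        /* 8 bytes on disk + NUL terminator */
    uint16_t block_size;
    uint32_t fs_size;         /* in blocks */
    uint32_t fat_start, fat_blocks;
    uint32_t root_start, root_blocks;
};

/* Parse the first 30 bytes of block 0.  memcpy() is used because the
   buffer is raw binary data, and ntohs()/ntohl() convert the on-disk
   Big Endian values to host byte order. */
static void parse_superblock(const uint8_t *buf, struct superblock *sb)
{
    uint16_t u16;
    uint32_t u32;

    memcpy(sb->fs_id, buf, 8);
    sb->fs_id[8] = '\0';
    memcpy(&u16, buf + 8, 2);   sb->block_size  = ntohs(u16);
    memcpy(&u32, buf + 10, 4);  sb->fs_size     = ntohl(u32);
    memcpy(&u32, buf + 14, 4);  sb->fat_start   = ntohl(u32);
    memcpy(&u32, buf + 18, 4);  sb->fat_blocks  = ntohl(u32);
    memcpy(&u32, buf + 22, 4);  sb->root_start  = ntohl(u32);
    memcpy(&u32, buf + 26, 4);  sb->root_blocks = ntohl(u32);
}
```

With the default values from Figure 1 this yields exactly the numbers shown in the Part I sample output.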
Q2 Consider the following block from the root directory:

0005200: 0300 0000 3100 0000 0500 000a 0007 d50b  ....1...........
0005210: 0f0c 0000 07d5 0b0f 0c00 0066 6f6f 2e74  ...........foo.t
0005220: 7874 0000 0000 0000 0000 0000 0000 0000  xt..............
0005230: 0000 0000 0000 0000 0000 00ff ffff ffff  ................
0005240: 0300 0000 3600 0000 0a00 0014 0007 d50b  ....6...........
0005250: 0f0c 0000 07d5 0b0f 0c00 0066 6f6f 322e  ...........foo2.
0005260: 7478 7400 0000 0000 0000 0000 0000 0000  txt.............
0005270: 0000 0000 0000 0000 0000 00ff ffff ffff  ................
0005280: 0300 0000 4000 0000 5e00 00bb ff07 d50b  ....@...^.......
0005290: 0f0c 0000 07d5 0b0f 0c00 006d 616b 6566  ...........makef
00052a0: 7300 0000 0000 0000 0000 0000 0000 0000  s...............
00052b0: 0000 0000 0000 0000 0000 00ff ffff ffff  ................
00052c0: 0300 0000 9e00 0000 0100 0000 0807 d50b  ................
00052d0: 0f0c 0000 07d5 0b0f 0c00 0066 6f6f 332e  ...........foo3.
00052e0: 7478 7400 0000 0000 0000 0000 0000 0000  txt.............
00052f0: 0000 0000 0000 0000 0000 00ff ffff ffff  ................

(a) How many files are allocated in this directory? What are their names?
(b) How many blocks does the file makefs occupy on the disk?

Q3 Given the root directory information from the previous question and the FAT table shown below:

0000200: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000210: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000220: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000230: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000240: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000250: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000260: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000270: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000280: 0000 0001 0000 0001 0000 0001 0000 0001  ................
0000290: 0000 0001 0000 0001 0000 0001 0000 0001  ................
00002a0: 0000 0001 0000 002a 0000 002b 0000 002c  .......*...+...,
00002b0: 0000 002d 0000 002e 0000 002f 0000 0030  ...-......./...0
00002c0: ffff ffff 0000 0032 0000 0033 0000 0034  .......2...3...4
00002d0: 0000 0035 ffff ffff 0000 0037 0000 0038  ...5.......7...8
00002e0: 0000 0039 0000 003a 0000 003b 0000 003c  ...9...:...;...<
00002f0: 0000 003d 0000 003e 0000 003f ffff ffff  ...=...>...?....

(a) What blocks does the file foo.txt occupy on the disk?
(b) What blocks does the file foo2.txt occupy on the disk?
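Chains like those in Q3 are followed exactly as described in Section 3.3. A minimal sketch (function names are my own; the FAT is assumed to be already loaded into memory and converted to host byte order with ntohl()):

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_EOF 0xFFFFFFFFu   /* "last block in a file" (Figure 5) */

/* Follow the FAT chain that begins at block `start`, writing up to
   `max` block numbers into `out`.  Returns the number of blocks found. */
static size_t fat_chain(const uint32_t *fat, uint32_t start,
                        uint32_t *out, size_t max)
{
    size_t n = 0;
    uint32_t b = start;

    while (n < max) {
        out[n++] = b;           /* record this data block          */
        if (fat[b] == FAT_EOF)  /* reached the end of the list?    */
            break;
        b = fat[b];             /* 'next pointer' stored in the FAT */
    }
    return n;
}
```

The same loop serves both disklist (counting blocks) and diskget (reading each block of a file in order).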
School of Engineering

Coursework Title: State Driven Design of a Sequential Ladder Diagram Program
Module Name: Automation and IoT
Module Code: 6570USST
Level: 6
Credit Rating: 20
Weighting: Pass/Fail to Support the 20% Quiz Component
Lecturer: Ping Liu
Contact: If you have any issues with this coursework you may contact your lecturer. Contact details are: Room T606
Issue Date: Week 10
Hand-in Date: Week 12
Hand-in Method: Submission on Moodle
Feedback Date: Week 12
Feedback Method: Feedback will be via the associated Moodle Quiz
Programmes: MBEng EEE Suite

Introduction

This assignment is an exercise in State Driven Design using ladder software.

Learning Outcomes to be assessed: LO2, LO3

Coursework Specification

State Driven Design of a Sequential Ladder Diagram Program.

Assignment 2 - Ladder Based State Design Exercise, PUFFIN Crossing

The objective of this exercise is to use State Design methods to produce a simple sequential system using Ladder Diagram programming. A system that should be familiar to most people will be used: the PUFFIN pedestrian crossing. The PUFFIN crossing replaced the earlier PELICAN crossing and is now the standard UK pedestrian crossing. The crossing is called PUFFIN to reflect its behaviour as a Pedestrian User Friendly INtelligent crossing (and clearly to give it a silly bird's name like the previous crossing). The objectives of the change were to:

- Remove the PELICAN crossing's flashing amber / flashing green man state, which led to pedestrian confusion and accidents due to over-aggressive driving.
- Have the same light sequence for car drivers as other UK traffic lights.
- Move the pedestrian lights from the far side of the road to the near side, in a position where pedestrians are more likely to see traffic approaching from the right (fig. 1).
- Introduce sensors to detect when pedestrians have completed crossing, so that drivers are not held waiting at an empty crossing.

Figure 1.
PUFFIN Pedestrian display and control unit

The crossing you are to program is known as a "Mid-block" crossing, which is the simplest type, where the crossing is situated well away from a road junction. You are to make the following assumptions about the crossing hardware:

- The pedestrian buttons are normally open, momentary-action push-buttons. The buttons on each side of the road are connected together in a "wired OR" configuration, so that only a single PLC input is required. TRUE means one of the buttons is currently pressed.
- The pedestrian detection circuit is designed to give a TRUE signal when there are no pedestrians waiting on either side or on the crossing. Again, a single PLC input is used.
- The system is mains powered, single phase 240 VAC. No output requires more than 1 amp.

Figure 2. This State Diagram describes the required behaviour of the system.

Traffic light abbreviations: R = RED, A = AMBER, G = GREEN
Pedestrian display abbreviations: RM = RED MAN, GM = GREEN MAN, W = WAIT
TNen = 1 means enable timer TN; "TN elapsed" means timer TN's Present Value has reached its Set Point.

Task. Use PicoSoft 6 software to program the PUFFIN crossing described by the State Diagram in Figure 2 using the Ladder Programming language. You can add clear comments in your program. If you are unfamiliar with PUFFIN crossings, this video might help: http://58.247.255.82:8125/moodle/mod/resource/view.php?id=42558

Before you start programming, set up the Configuration to assign an appropriate PLC and any extension modules required. Your solution ladder program must produce the behaviour specified by the State Diagram given, even if you think the State Diagram is not a perfect match to the behaviour of a real PUFFIN crossing! It has been simplified deliberately. Don't add any extra "features". No timer Set Points are indicated on the State Diagram. You should choose sensible values.
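Before wiring the ladder rungs, it can help to check your STATE/OUTPUT table against a quick state-machine sketch. The C-style pseudocode below is illustrative only: the state names and transitions are simplified assumptions of mine, not the actual diagram in Figure 2, and your deliverable remains a ladder program:

```c
/* Illustrative state-machine skeleton (NOT the Figure 2 diagram):
   states live in an enum, and a single transition function maps the
   current state plus inputs to the next state, which is the same
   structure a state-driven ladder program encodes with one latch
   per state. */
enum state { VEH_GREEN, VEH_AMBER, VEH_RED_PED_GO, PED_CLEARING };

struct inputs {
    int button;          /* wired-OR pedestrian buttons, TRUE = pressed */
    int crossing_clear;  /* TRUE = no pedestrians waiting or crossing   */
    int timer_elapsed;   /* current state's timer reached its Set Point */
};

static enum state next_state(enum state s, struct inputs in)
{
    switch (s) {
    case VEH_GREEN:      return in.button         ? VEH_AMBER      : s;
    case VEH_AMBER:      return in.timer_elapsed  ? VEH_RED_PED_GO : s;
    case VEH_RED_PED_GO: return in.timer_elapsed  ? PED_CLEARING   : s;
    case PED_CLEARING:   return in.crossing_clear ? VEH_GREEN      : s;
    }
    return s;
}
```

Each state then drives a fixed set of outputs (R/A/G, RM/GM, W, timer enables), exactly as listed in your STATE/OUTPUT table.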
It might be a good idea to use shorter than normal values while developing the system, only changing to realistic values when the system is working properly.

Create a single PDF report that includes:
1) Cover sheet.
2) Choose an appropriate controller and provide reasons.
3) STATE/OUTPUT table.
4) Briefly describe the procedures you used to test your program.
5) Formal printout of your program in ANSI format from PicoSoft.

Note: The total number of pages for sections 2) to 4) should not exceed two pages.

Assessment. This is a Pass/Fail component. You must demonstrate your program and submit your report to Moodle in PDF format to be successful. The associated quiz will test you on the process, programming and advantages/disadvantages etc. of sequential logic design methods.

Guide to Performance Criteria

70% and above: Your work must be of outstanding quality and fully meet the requirements of the coursework specification and learning outcomes stated. You must show independent thinking and apply this to your work, showing originality and consideration of key issues. There must be evidence of wider reading on the subject.

60%-70%: Your work must be of good quality and meet the requirements of the coursework specification and learning outcomes stated. You must demonstrate some originality in your work and show this by applying new learning to the key issues of the coursework. There must be evidence of wider reading on the subject.

50%-60%: Your work must be comprehensive and meet all of the requirements stated by the coursework specification and learning outcomes. You must show a good understanding of the key concepts and be able to apply them to solve the problem set by the coursework. There must be enough depth to your work to provide evidence of wider reading.

40%-50%: Your work must be of a standard that meets the requirements stated by the coursework specification and learning outcomes.
You must show a reasonable level of understanding of the key concepts and principles and you must have applied this knowledge to the coursework problem. There should be some evidence of wider reading.

Below 40%: Your work is of poor quality and does not meet the requirements stated by the coursework specification and learning outcomes. There is a lack of understanding of key concepts and knowledge and no evidence of wider reading.

AHEP4 Outcomes to be assessed

M4 Select and critically evaluate technical literature and other sources of information to solve complex problems.
M6 Apply an integrated or systems approach to the solution of complex problems.
M10 Adopt a holistic and proportionate approach to the mitigation of security risks.
M11 Adopt an inclusive approach to engineering practice and recognise the responsibilities, benefits and importance of supporting equality, diversity and inclusion.

Extenuating Circumstances

If something serious happens that means that you will not be able to complete this assignment, you need to contact the module leader as soon as possible. There are a number of things that can be done to help, such as extensions, waivers and alternative assessments, but we can only arrange this if you tell us. To ensure that the system is not abused, you will need to provide some evidence of the problem. Any coursework submitted late without the prior agreement of the module leader will receive 0 marks.

Academic Misconduct

The University defines Academic Misconduct as 'any case of deliberate, premeditated cheating, collusion, plagiarism or falsification of information, in an attempt to deceive and gain an unfair advantage in assessment'. The School takes Academic Misconduct very seriously and any suspected cases will be investigated through the University's standard policy. If you are found guilty, you may be expelled from the University with no award.
It is your responsibility to ensure that you understand what constitutes Academic Misconduct and to ensure that you do not break the rules. If you are unclear about what is required, please ask.

Cheating includes:
(i) any form of communication with, or copying from, any other source during an examination;
(ii) communicating during an examination with any person other than an authorised member of staff;
(iii) introducing any written, printed or other material into an examination (including electronically stored information) other than that specified in the rubric of the examination paper;
(iv) gaining access to unauthorised material in any way during or before an assessment;
(v) the use of mobile phones or any other communication device during an assessment or examination;
(vi) the submission of false claims of previously gained qualifications, research or experience in order to gain credit for prior learning;
(vii) the falsification of research data, the presentation of another's data as one's own, and any other forms of misrepresentation in order to gain advantage;
(viii) the submission of work for assessment that has already been submitted as all or part of the assessment for another module without the prior knowledge and consent of the Module Leader for the subsequent assessments;
(ix) the submission of material purchased or commissioned from a third party, such as an essay-writing service, as one's own.

Plagiarism is defined as the representation of the work, artefacts or designs, written or otherwise, of any other person, from any source whatsoever, as the student's own.
Examples of plagiarism may be as follows:
(i) the verbatim copying of another's work without clear identification and acknowledgement, including the downloading of materials from the Internet without proper referencing of materials;
(ii) the paraphrasing of another's work by simply changing a few words or altering the order of presentation, without clear identification and acknowledgement;
(iii) the unidentified and unacknowledged quotation of phrases from another's work;
(iv) the deliberate and detailed presentation of another's concept as one's own.

Collusion includes:
(i) the conscious collaboration, without official approval, between two or more students in the preparation and production of work which is ultimately submitted by each in an identical or substantially similar form and/or is represented by each to be the product of his or her individual efforts;
(ii) collusion also occurs where there is unauthorised co-operation between a student and another person in the preparation and production of work which is presented as the student's own.

For more information you are directed to the University web pages.
2. How did Aristotle categorize human connection/friendship, and how does it connect to his view of living well/happiness (eudaimonia)? How has human connection/friendship changed, and what other elements should be considered in developing human connection/friendship in our modern society?

● Reading focus: Nicomachean Ethics, Books I, VIII, and IX.

Requirements:
● You shall develop a coherent and well-structured essay with clear argument(s) and supporting evidence and analysis.
● Your essay should be at least 1000 words.
● You shall include in-text citations and a work-cited page. Please follow the MLA citation format for in-text citations, but for the work-cited page, you just need to list all your references.
● You must include at least TWO course materials assigned in the past four weeks in your writing.
○ I'd be happy to help you with finding additional materials. Just let me know.

Suggestions:
● Please consider using Rollo May, Joseph Campbell, or Roger Schmidt's arguments on the function of myth in your paper.
● Please support your argument with clear textual evidence and in-depth analysis.
● Please use specific examples to demonstrate your point.
● Do not fill your paper with quotes, as your paper is a way of training and demonstrating your thinking ability. Include a few well-chosen quotes that strengthen your writing, but please add some further elaboration and commentary before or after each quotation.
● Be specific in your writing; that is, focus on a few major points, elaborate on them with detailed analysis, and try to use meaningful and information-rich phrases.
● You may need to do some outside research. If so, please use academic databases or credible sources.
● You may want to consider how to address opposing views.
CO3219/CO4219/CO7219 Internet and Cloud Computing
Cloud System Design and Evaluation

Learning Outcomes

The aims of this deliverable are:
1. to develop technological foundations in Cloud Systems; and
2. to develop expertise in designing and developing robust and scalable Cloud Systems.

Assignment Brief

You have to design and document the architecture of a private cloud, implement it, and then deploy on it a whiteboard application. The whiteboard application should run on at least two separate machines/nodes within your private cloud, with each application instance serving a different group of users. You should demonstrate that the distributed application running in your private cloud is consistent, in that all users will see the same application state across different instances. The cloud and application should scale to support additional users, and offer an improved quality of service when compared to one running on a single machine.

The assignment is separated into a set of tasks, and there is a set of activities that should be undertaken for each task. You are not restricted to a particular set of tools or technologies, but you must justify your choices when designing and implementing your architecture. You should critically assess your options before you propose your architecture and carefully evaluate the tools and technologies to implement your architecture.

Task 1: Produce a Design of your Private Cloud [20%]

Your group should design and produce an architecture of a private cloud. You should evaluate available cloud computing technologies and architectures before you propose a suitable architecture for your private cloud. You should put in place mechanisms to achieve consistency, scalability, agility and quality of service. You should also propose a suitable network topology for your private cloud to manage traffic and provide fault tolerance.
In this task, you should demonstrate a clear understanding of the tools and technologies that you will use to implement the design of your private cloud. You should have convincing reasons to justify the technology choices that you have made for the implementation of your architecture. As an individual member of your group, you will be required to explain the design of your group's private cloud and technological choices in your own words. You should also include an architecture diagram of your proposed system.

Task 2: Implement your Private Cloud [30%]

Your group should implement the architecture for the private cloud that has been proposed in Task 1. To implement your private cloud, you should use the set of tools and technologies that you have identified in Task 1. You should be able to demonstrate a working implementation of the private cloud in this task.

Your implementation should enable the private cloud to provide consistent, scalable and low-latency access to your whiteboard application. It should also enable new virtual machines to be provisioned and decommissioned with minimal human intervention. Each group should set up the private cloud on the minimum number of nodes possible. You should deploy the whiteboard application in each virtual machine. However, the distributed application should offer consistency and should look like a single application to outside users. There are no additional marks for setting up a higher number of nodes in your private cloud.

As an individual member of your group, you should explain the implementation, including the environment, components and explanation of any code, in your own words.

Task 3: Develop and Deploy a Distributed Whiteboard Application [20%]

Your group will need to design and develop a whiteboard application that can be shared between multiple nodes of the private cloud. You need to run this whiteboard in a distributed environment: at least two nodes should be running the application.
The whiteboard should support basic drawing features, such as lines and shapes, as well as allowing text to appear anywhere on the whiteboard. The whiteboard should allow multiple users to draw simultaneously on a shared interactive canvas. When a new user joins the system, the user should obtain the current state of the whiteboard so that the same objects are displayed to every active user. At least two virtual machines within the cloud system should host two instances of the whiteboard. However, all the users should see the same state of the whiteboard and should have the ability to perform all the drawing operations.

As an individual member of your group, you should explain the design and implementation of your whiteboard application in your own words. This should include consistency, replication and state management, and how your group has implemented the application.

Task 4: Demonstrate your Private Cloud [10%, shared by all group members]

There are two aspects of demonstration. Each group should produce a video. The video should highlight the functionality of your private cloud and the members' individual contributions in Tasks 1, 2 and 3. The group should highlight how you are monitoring the computing, storage and network resources in your private cloud and should provide information about consistency, performance, availability and scalability of resources. The cloud resource consumption (performance numbers) should go up with more users and whiteboard application instances, and vice versa.

As an individual member of your group, you should demonstrate a component of the private cloud that you have produced in Task 2 by deploying the application that you have produced in Task 3. As an individual member of your group, you should produce a maximum of one page demonstrating your individual contributions.

Task 5: Critical Review of your System [20%]

As an individual member of your group, you should write a critical review of your system, in your own words.
This should be based on your own understanding and contributions. You should cover three areas in your critical review:
1. the architecture of your system and the choices made;
2. the weaknesses and strengths of your design and implementation;
3. suggestions to improve your system's design and implementation, including reasons for how the proposed changes will improve the functionality of your system.
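One way to reason about the consistency requirement in Task 3 (a sketch under my own assumptions, not a prescribed design): if every drawing operation is appended to a single, totally ordered log, and each whiteboard instance applies that log in order, then any two instances that have applied the same prefix render identical state, and a newly joined user simply replays the log to catch up. In C-style pseudocode:

```c
#include <string.h>

/* Illustrative op-log replication sketch.  A real whiteboard would
   carry richer operations (lines, shapes, text) over the network;
   here an operation just sets one character cell. */
struct op { int x, y; char glyph; };

#define W 8
#define H 4
struct board { char cell[H][W]; };

static void board_init(struct board *b)
{
    memset(b->cell, '.', sizeof b->cell);   /* blank canvas */
}

static void apply(struct board *b, const struct op *o)
{
    if (o->x >= 0 && o->x < W && o->y >= 0 && o->y < H)
        b->cell[o->y][o->x] = o->glyph;
}

/* Replay a totally ordered log; two replicas that replay the same
   log always converge to the same state. */
static void replay(struct board *b, const struct op *log, size_t n)
{
    for (size_t i = 0; i < n; i++)
        apply(b, &log[i]);
}
```

The design choice to highlight in your review is how that total order is obtained (a single sequencer, a consensus service, or CRDT-style commutative operations), since it dominates both latency and fault tolerance.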
CS 3233-01 Homework #4 Fall 2024
Due: October 23

Assignment

Write a C++ program that uses OpenGL to draw a lighted 3D scene chosen from this list, with at least five objects in the scene:

• A still life. The primary objects in the scene are items that one might find on a table or desk, such as flowers, pieces of fruit, or common office products.
• A transportation scene. The primary objects in the scene are cars, trucks, boats, or other mechanized conveyances.
• An architectural scene. The primary objects in the scene are buildings. At least one of the buildings should have an open door or window that allows the viewer to see inside. There should be an object inside for the viewer to see.
• Your choice. A scene of your own choosing, subject to my approval.

Your program should incorporate Eck's camera API so the viewer can use a mouse or other pointing device to rotate the scene and view it from various directions. You may use your prism-drawing function if you find it helpful. Your program should set a location for the light source other than the default location.

Also, exercise your own inventiveness to include a feature in your program that will surprise me. Include a README file that tells me what the surprise is. I will test your program and look for the surprise before I read the README.

Submission Requirements

To receive credit for the assignment, your submission must fulfill these requirements:

• It must include all the source code files necessary for your application.
• Each source code file must begin with a box comment that identifies you, the course, the project, and the due date.
• It must include a makefile that builds your application on the SoC server when I enter the one-word command make on the Linux command line. There must be no build-time errors. Building must produce an executable application called a.out.
• The makefile must not auto-run the application after it finishes building.
• The makefile must include a clean entry.
BUSI4528-E1
BUSINESS SCHOOL
A LEVEL 4 MODULE, AUTUMN SEMESTER 2021-2022
QUANTITATIVE RESEARCH METHODS FOR FINANCE AND INVESTMENT

1. a) Consider an experiment on student scores. Students were randomly assigned within schools into three types of classes: small classes with 13-17 students, regular-sized classes with 22-25 students, and regular-sized classes with a full-time teacher aide to assist the home teacher. Student scores on academic tests were recorded, as well as some information about the students, teachers, and schools. The experiment covers two academic years of data.
Score: total exam score of each student.
Small: an indicator variable, Small=1 if the student is in a small class, and 0 otherwise.
Aide: an indicator variable, Aide=1 if there is a full-time teacher aide in the regular-sized class, and 0 otherwise.
Tchexper: teacher's experience in years.
Male: an indicator variable, Male=1 if the student is male, and 0 otherwise.
Tchmasters: an indicator variable, Tchmasters=1 if the teacher has a master's degree, and 0 otherwise.
Absent: number of days the student is absent from school.
tchid: individual teacher's ID.
schid: individual school's ID.
Using Stata, we regress by OLS student score on small class size, teacher's aide, student's gender, teacher's experience and education. The estimated results are shown below:
(i) Write down the regression model and interpret the meaning and significance of each coefficient. Are the signs and relative magnitudes consistent with intuition? [25 marks]
(ii) The table below shows the results of an alternative specification. Compare the results from the table below to those in part (i). In particular, comment on the standard errors. [10 marks]
(iii) Let INCOME = income per capita (in thousands of U.S. dollars) and BACHELOR = percentage of the population with a bachelor's degree or more for selected U.S. states, a total of N=52 observations.
The results from a simple linear regression of INCOME on BACHELOR are:
  INCOME-hat = 12.11 + 1.24·BACHELOR
  (se)         (2.75)  (0.132)
Construct a 99% interval estimate of the coefficient on BACHELOR. Interpret the interval estimate. [15 marks]
b) Explain the setup of the Logit model and how it remedies the shortcomings of the Linear Probability Model (LPM). [35 marks]
c) Explain how to use estimates of the Logit model to calculate marginal effects of continuous explanatory variables and discrete effects of binary explanatory variables. [15 marks]
Total [100 marks]

2. a) Consider the experiment and initial estimation of Question 1.a. We estimate the determinants of Score using a different model, and the results are shown below. Compare the results of the new estimation with those in Question 1.a. State the difference between the two models. [25 marks]
b) What are type I and type II errors in hypothesis testing? [15 marks]
c) What are the main characteristics of the t-distribution, and under what circumstances should we use the t-distribution rather than the normal distribution? [10 marks]
d) For time series data, outline the static model and the ARDL(1,1) model with serially correlated errors. Discuss the consequences for OLS estimators and standard errors. [30 marks]
e) Outline the Dickey-Fuller test for the null hypothesis (H0) of the presence of a unit root in an autoregressive time series model, against the alternative hypothesis (HA) that the autoregressive time series model is stationary around a zero mean. [20 marks]
Total [100 marks]

3. a) Consider an experiment on whether government subsidies reduce the impact of a recession on company bankruptcy. Suppose that, before the recession, the government provided subsidies to companies in some selected cities; companies in the other cities received no subsidies. Before the recession, there were 102 companies receiving subsidies and 153 companies that did not.
After the recession, 93 of the companies that received subsidies survived. Among the companies that did not receive subsidies, 123 survived the recession.
(i) Let the companies without subsidies before the recession be the control group and those that received subsidies be the treatment group. Draw a graph showing the treatment effect. Calculate the magnitude of the treatment effect. [15 marks]
(ii) Suppose that we have data on these two groups for 6 years, including the recession year, so that N=12. Let AFTER_t=1 for years after the recession and AFTER_t=0 for years before the recession. Let TREAT_i=1 for the group whose companies received subsidies and TREAT_i=0 for the group whose companies did not receive subsidies. Let Numcomp_{i,t} be the number of companies in each group in each year. We obtained the estimates below:
  Numcomp-hat_{i,t} = 166 − 2.87·TREAT_i − 47·AFTER_t + 20.3·(TREAT_i × AFTER_t)
  (se)                       (8.8)           (7.6)         (10.6)
What is the treatment effect from the above equation? Is the estimated treatment effect significant at the 5% level? [10 marks]
b) Explain what the Central Limit Theorem is. [10 marks]
c) What are the assumptions of estimating a multiple linear regression model? [15 marks]
d) Outline the Linear Probability Model (LPM) and explain in detail its main limitations. [30 marks]
e) What is meant by saying that a time series is stationary? Is the following time series stationary or non-stationary?
  y_t = y_{t−1} + v_t
where the error terms v_t are i.i.d. with mean zero and variance σ_v², and t denotes the t-th time period. Justify your answer. [20 marks]
Total [100 marks]

4. a) Define the term multicollinearity and explain how it can be detected. Also, list the negative consequences of multicollinearity and explain how they can be mitigated. [50 marks]
b) Consider the following time series model:
  y_t = β_0 + β_1·x_t + β_2·x_{t−1} + γ·y_{t−1} + ε_t
It is suspected that the model suffers from serial correlation in the error term of the form
  ε_t = ρ·ε_{t−1} + u_t
where u_t is an identically and independently distributed error term and t denotes the t-th time period. Describe in detail a test to detect serial correlation of the above form in the model. [25 marks]
c) Explain the Engle-Granger test for cointegration and write down the steps involved in the test. [25 marks]
Total [100 marks]
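As a worked sketch of the interval-estimate construction asked for in Question 1(a)(iii): with the usual degrees of freedom df = N − 2 = 50 and the tabulated two-tailed 99% critical value t ≈ 2.678 (verify against your own t-tables), the interval takes the standard form

```latex
% 99% interval estimate for the BACHELOR coefficient (df = N - 2 = 50)
b_2 \pm t_{(0.995,\,50)} \cdot \mathrm{se}(b_2)
  = 1.24 \pm 2.678 \times 0.132
  = 1.24 \pm 0.354
  \;\Rightarrow\; (0.886,\ 1.594)
```

so with 99% confidence, an extra percentage point of bachelor's-degree holders is associated with between roughly £0.89 and £1.59 thousand of additional income per capita, under this model.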
MGRC30003: Decision Analysis & Simulation, 2024/25 – Summative Individual Coursework

Administration
This is the summative individual assessment for this unit. It accounts for 70% of the overall unit mark. You must answer all of the questions from both cases attached. Please submit your work in a single Word or PDF document. The assessment is to be completed individually and independently.

1st Case: TechPharma's New Vaccine Development Project
It is January 1 of year 0, and Tech Pharma is trying to determine whether to continue development of a new vaccine. The following information is relevant. You can assume that all cash flows occur at the ends of the respective years.

Clinical trials (the trials where the vaccine is tested on humans) are equally likely to be completed in year 1 or 2. There is an 80% chance that clinical trials will succeed. If these trials fail, the FDA will not allow the vaccine to be marketed. The cost of clinical trials is assumed to follow a triangular distribution with best case £100 million, most likely case £150 million, and worst case £250 million. Clinical trial costs are incurred at the end of the year clinical trials are completed.

If clinical trials succeed, the vaccine will be sold for five years, earning a profit of £6 per unit sold. If clinical trials succeed, a plant will be built during the same year trials are completed. The cost of the plant is assumed to follow a triangular distribution with best case £1 billion, most likely case £1.5 billion, and worst case £2.5 billion. The plant cost will be depreciated on a straight-line basis during the five years of sales.

Sales begin the year after successful clinical trials. Of course, if the clinical trials fail, there are no sales. During the first year of sales, Tech Pharma believes sales will be between 100 million and 200 million units.
Sales of 140 million units are assumed to be three times as likely as sales of 120 million units, and sales of 160 million units are assumed to be twice as likely as sales of 120 million units.

Tech Pharma assumes that for years 2 to 5 that the vaccine is on the market, the growth rate will be the same each year. The annual growth in sales will be between 5% and 15%. There is a 25% chance that the annual growth will be 7% or less, a 50% chance that it will be 9% or less, and a 75% chance that it will be 12% or less.

Cash flows are discounted at 15% per year, and the tax rate is 40%.

(A) Use simulation to model Tech Pharma's situation. Based on the simulation output, would you recommend that Tech Pharma continue developing the vaccine? Explain your reasoning. What are the three key drivers of the project's NPV? (20 points)
(B) What is the expected range of sales for the first year of the vaccine's launch, and how do different sales scenarios impact the project's NPV? (10 points)
(C) How do variations in the annual growth rate of sales from years 2 to 5 affect the overall profitability of the project? (10 points)
(D) How sensitive is the project's NPV to changes in the cost of clinical trials and the construction of the plant? (10 points)
(E) If the discount rate were to change from 15% to another rate, how would this affect the recommendation to proceed with the vaccine development? (10 points)

2nd Case: Pixel Palace and Melody Mirage
Two unique entertainment venues stand as the most popular attractions in a town: Pixel Palace, a retro-themed arcade and virtual reality center, and Melody Mirage, an innovative music lounge and holographic concert venue. Every month, both venues record their visitor numbers (refer to the TS_data_2024.csv file), which reflect their popularity and the city's entertainment trends.
Over recent years, monthly visitor numbers have fluctuated for both venues due to technological advancements, changing consumer preferences, and the impact of social media trends. To better understand their visitor patterns and to make decisions about future expansions and event planning, the venues have decided to undertake a time series analysis of their data.

(F) Suppose both venues want to forecast visitor numbers for the next 6 months from the historical data using smoothing methods. Which method should be chosen for each venue, and what are the predicted values? Please write down your analysis using R. (20 points)
(G) Suppose both venues want to forecast visitor numbers for the next 6 months from the historical data using ARIMA models. Which ARIMA models (indicate p, d, and q) should be chosen, and how do you interpret the parameters? Please write down your analysis using R. (20 points)

Reading beyond the course materials and text could be vital. You are expected to provide a comprehensive managerial report on the formulation, the solution, and the interpretation of the output. Use of graphs and diagrams to illustrate your analysis is certainly encouraged. Use of screenshots of the Excel spreadsheet or R model for each of the various scenarios is also recommended for your own convenience. You should also include a discussion of any assumptions made in applying your techniques to this case study. You are reminded that this assignment should be handled as an academic report; therefore, the justification of your answers is considered essential. You should consider integrating any technical details and/or mathematical justification in the appendix. Finally, it is important to include an extended analysis followed by more general suggestions of how this project could be extended, including a statement of requirements to carry out the proposed extension.
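Before building the full Excel/R model for Case 1, the stochastic inputs can be prototyped in a few lines. The sketch below (Python, standard library only) is a deliberately simplified illustration, not the required deliverable: it samples the two triangular cost distributions from the brief, turns the quartile statements about the growth rate into an interpolated inverse CDF, and replaces the discrete first-year sales likelihoods with a uniform draw. It ignores the year-1-vs-year-2 completion timing, depreciation as a tax shield, and plant-cost timing, all of which your full model must handle.

```python
import random

random.seed(1)

def growth_rate(u):
    """Inverse CDF by linear interpolation through the brief's points:
    P(g<=0.05)=0, P(g<=0.07)=0.25, P(g<=0.09)=0.50, P(g<=0.12)=0.75, P(g<=0.15)=1."""
    pts = [(0.0, 0.05), (0.25, 0.07), (0.50, 0.09), (0.75, 0.12), (1.0, 0.15)]
    for (p0, g0), (p1, g1) in zip(pts, pts[1:]):
        if u <= p1:
            return g0 + (g1 - g0) * (u - p0) / (p1 - p0)
    return 0.15

def one_trial():
    # Trial cost in GBP millions; random.triangular takes (low, high, mode).
    trial_cost = random.triangular(100, 250, 150)
    if random.random() > 0.80:          # 20% chance clinical trials fail
        return -trial_cost              # simplification: only trial cost is lost
    plant_cost = random.triangular(1000, 2500, 1500)
    # Placeholder: uniform first-year units instead of the discrete likelihoods.
    units = random.uniform(100, 200)    # millions of units
    g = growth_rate(random.random())
    profit = 0.0
    for year in range(5):               # 5 years of sales at GBP 6 per unit
        profit += 6 * units / 1.15 ** (year + 1)   # discounted at 15%, pre-tax
        units *= 1 + g
    return profit * (1 - 0.40) - trial_cost - plant_cost

npvs = [one_trial() for _ in range(10_000)]
print(sum(npvs) / len(npvs))            # crude mean "NPV" in GBP millions
```

The point of the sketch is only to show how each stochastic assumption in the brief becomes a sampling rule; the recommendation in part (A) should rest on the full model's NPV distribution, not on this rough mean.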
Intended Learning Outcomes (ILOs 1-5):
1) define and identify the problematic area through developing your logical reasoning and creativity;
2) transform a conceptual framework or verbal statement into its equivalent mathematical model;
3) develop and apply fundamental optimization, simulation, and (multi-criteria) decision analysis models towards effective decision-making in practice;
4) demonstrate a sound knowledge of a range of available tools and methodologies for the solution of problems and the robust analysis and interpretation of their outputs;
5) utilise computer-based software which, along with appropriate data, can facilitate the formulation and execution of various models.

Submission Requirements:
a. This individual summative assessment weighs 70% of the overall unit mark and covers all intended learning outcomes (ILOs 1-5) of weeks 7, 8, 9, and 10 of the unit.
b. The report should be uploaded to Turnitin in either Microsoft Word or PDF format. You should use a 12-point font and 1.5 line spacing, with a blank line between paragraphs. Margins should be set to "Normal".
c. The word count guide (1,500 words) includes everything in the main body of the text (including headings, tables, citations, quotes, lists, etc.). The list of references, appendices, and footnotes are NOT included in the word count. There are no set penalties for going over or under this word count guide. If your work is greater than the suggested maximum, you should interpret this as a signal that you have not expressed your ideas concisely or with enough focus. Similarly, if it is less than the suggested minimum, you should interpret this as a signal that your coverage of the subject is insufficiently comprehensive, either in depth or breadth.
d. An outline marking scheme is shown in brackets to the right of each question.
e. Use the Harvard system for referencing.
f. Include relevant tables and equations where necessary.
g.
There is no expected absolute number of sources, as the quality of sources is more important than the quantity. Students are expected to read widely and should consult the recommended journal list in the Unit Handbook for guidance on high-quality sources.
h. Ensure that you take frequent and multiple backups of your work, since excuses concerning lost or corrupted files will not be treated sympathetically.
i. You must submit an electronic copy of the assignment to Turnitin through Blackboard. Regarding the similarity index, you may consult the Undergraduate Studies Office for more information about the academic plagiarism policy. In addition, full submission details can be found in your BSc program Handbook.
j. AI Usage Statement for Assignments: Generative AI (GenAI) tools (like ChatGPT) may be allowed for brainstorming, planning, grammar checking, and proofreading. Specifically, you may use them to generate ideas, develop outlines, create project timelines, identify and correct grammar errors, and improve overall readability. However, using GenAI tools to generate new content is strictly prohibited. All submitted work must be your original creation. Using GenAI for content generation violates academic integrity and will not be accepted.
k. Various websites claim that they help students by showing them what is expected on a typical assignment, such as in Business Analytics. These tacitly encourage plagiarism and copying, which does not demonstrate true understanding. It is more important to develop your own voice and your own abilities in writing and research. All assignments are scanned via plagiarism-detection software.
l. The individual assessment submission deadline is 13:00 on Thursday 5 December 2024.