Advanced NLP: Assignment 1
Feature-Based Semantic Role Labeling (SRL)

A) Introduction:
You will conduct a feature-based machine learning classification experiment in order to automatically label PropBank semantic roles (given the predicate). To accomplish this you will need to: i) get acquainted with the Universal Propbank V1.0 dataset; ii) pre-process this dataset into a workable format for your experiment; iii) motivate and extract three features suitable for automatic SRL; and iv) train and evaluate a logistic regression classifier for SRL as your first model for this course. Please note that this model will be used in the take-home exam.

B) Objectives:
- Gain hands-on experience developing and running a logistic regression classifier for SRL following the specifications of the Universal Propbank V1.0 dataset.
- Gain hands-on experience identifying and motivating features (drawing on linguistic insight) specifically for SRL.
- Gain hands-on experience producing code that is ready and suitable to be used later in interpretability experiments (i.e., challenge datasets).

C) Logistic Regression model for SRL
Imagine that you are participating in the shared task and need to build a Semantic Role Labeller. A classic SRL pipeline consists of four steps:
1. Predicate identification
2. Predicate classification (to be ignored in this assignment)
3. Argument identification (often taken together with step 4)
4. Argument classification
Your assignment is to build a system that performs the last two steps of the SRL task in a single model [argument identification + argument classification together], given the predicate. This means you can assume you know the predicates of each sentence in advance.
Because this model is going to be used later, during your take-home exam, you SHOULD NOT use information about predicate classification available in the dataset. The Universal Propbank dataset includes information that helps disambiguate the predicate (e.g., come.01 is related to motion, come.02 is related to pursue). Even though the dataset includes this information, you should not use it to inform your features.

D) What to do:
1. If you have not done so, download and explore the structure of the Universal Propositions Bank v1.0 English dataset (train and test) from its Git repository.
2. Preprocessing: a sentence may have more than one predicate. To solve this, replicate each sentence as many times as there are predicates, so each training instance has a single labeled argument structure (see "Appendix: SRL task and data" for more detailed information).
3. Provide a set of statistics over the train and test sets including, for each dataset, the number of tokens and number of sentences before and after preprocessing the datasets to deal with sentences containing multiple predicates. You may, but are not required to, produce other statistics. Note that the number of tokens you produce for the test set must match the support in your classification report (i.e., the results).
4. Motivate and extract three features (see "E) Requirements for Feature Extraction" for more information about this step) for SRL classification using logistic regression.
5. Run one classification experiment using Scikit-learn's LogisticRegression by training a token-level SRL classifier on the Universal Propositions Bank v1.0 English dataset.
6. Evaluate the model you produce on the test set. You must use Scikit-learn's classification report to provide Precision, Recall and F1 measures for token-level classification.
Make sure to also include a labeled confusion matrix that supports the classification metrics.
7. Store your model's predictions over the preprocessed test set and save them in a human-readable text format (e.g., as a tsv). Make sure this file contains, at least, the token, the gold label and the predicted label for each prediction.
8. Prepare a ready-to-use function with which one can use the trained model to perform SRL on standalone sentences, given the predicate. Among other necessary arguments (e.g., model, etc.), the function should allow the input of a sentence segmented as a list of strings (e.g., ['Pia', 'asked', 'Luis', 'to', 'write', 'this', 'sentence', '.']) and a list defining the location of the predicate to label (e.g., [0,0,0,0,1,0,0,0] for the predicate 'write'). If you think of another, better way to design this function, that is also acceptable, as long as it is well documented. Provide an example showing that the function runs on a sentence with more than one predicate (you can choose your own sentence(s)!). Note: this function will be important for your take-home exam.
9. Submit one zip file containing a Jupyter notebook (and HTML printout) accompanied by a requirements.txt file and any number of Python modules with helper functions. Also include your model's predictions on the test set (e.g., as a tsv). Do not upload the saved model on Canvas! Provide a link to download the model instead (it needs to be a public link) and make sure the link is available at the top of your notebook. Make sure you run the notebook and save it with the output of all cells. Read more information about the requirements below.

E) Requirements for Feature Extraction:
- You must motivate and extract EXACTLY three features (not more and not less). There is no such thing as "base/core/default" features.
- One of the features you need to motivate and extract is given (same for everyone): a complex feature integrating the directed dependency path from the token to the predicate plus the predicate's lemma.
- Feature extraction must be self-sufficient (i.e., not dependent on information contained in the dataset): you must be able to perform SRL (given the predicate) on any given tokenized sentence (i.e., a list of strings) and predicate position. You cannot assume information like lemmas, POS tags, dependency parses, etc. will be provided with the sentence (i.e., if you use this information to produce features, you must extract it yourself).
- Please note that for training/evaluation you will need to ensure the word tokenization is the same as the one used in the dataset. This poses some challenges when using SpaCy parsing (which defaults to a word tokenization different from the one used in the shared task). Make sure you are able to handle this. We will discuss this briefly in class.
- Each feature must be motivated and be both useful for the task and appropriate for the model (in this case logistic regression). Make sure you describe and motivate each feature in your notebook (you can use a markdown cell or comments to do this). Making sure all features are suitable is a minimum requirement of this assignment (nonsensical or poorly motivated features will lead to failing the assignment).

F) Other Requirements for Jupyter Notebook:
The Python notebook should be formatted in a way that will substitute for a written report. As such, it should be crafted with care, highlight all important steps of the pipeline and, when necessary, include explanatory text and notes about decisions.
Your report must include:
- A (publicly open) link to download the trained model. We recommend using Google Drive to share a zip containing the model. Do not upload your model on Canvas!
- A printed summary of the statistics for both the training and test sets (see above). Make sure your evaluation (and confusion matrix) matches the numbers you have printed in these statistics (i.e., the total number of tokens must match).
- A section motivating and explaining the three extracted features (one paragraph per feature should suffice). For each feature, this paragraph should motivate why the feature is useful for SRL and describe how it is extracted and represented (with examples). The motivation should be specific to SRL.
- A printed example showing the three extracted features for 2-3 sentences (in pre-vectorized state). This should be an excerpt of the data that will later be fed into the model after vectorization. Make sure no gold data is passed into the model. Passing gold data into the model will result in failing the assignment.
- A printed evaluation table using Scikit-learn's classification report, including a labeled confusion matrix. You should also include a couple of paragraphs discussing these results (e.g., Are the results good? Which semantic roles are easiest to identify? Which ones were most difficult? etc.)
- A printed example showing that your function to perform SRL on standalone sentences is working.
Make sure the notebook is sufficiently documented. When in doubt about authorship or lack of understanding, you may be asked for an interview to explain your decisions/code.

G) What to submit:
Each student submits one zip file using the predefined naming convention (e.g., A1-Student Name.zip). Inside the zip you should include:
- A requirements.txt with the necessary installation requirements.
- A Python notebook showcasing the full experiment. This should be submitted both as a notebook (.ipynb) and as an HTML file (.html). Make sure you save the notebook (and the HTML) after running every cell, so the outputs are also saved. You should be able to confirm this by inspecting the HTML.
- Any number of helper Python modules (if needed).
- The model's predictions on the test set as a text file (e.g., as a tsv).

H) Grading:
The assignment will be graded on a Pass/Fail basis, based on the following requirements:
- Produce a running Python notebook, including all steps (corpus preprocessing, feature extraction, training and evaluation) to train and evaluate a logistic regression model for SRL. Note: make sure the code runs and does not depend on any files that are not included with your submission (including preprocessed datasets). The only files your code can (and should) depend on are the original Universal Propbank data.
- Motivate and extract a predefined feature: a complex feature integrating the directed dependency path from the token to the predicate and the predicate's lemma.
- Motivate and extract two features (in addition to the predefined one) suitable for SRL and logistic regression.
- Produce a ready-to-use inference function for your model which allows the model to be tested (this is used during the take-home exam).

Appendix: SRL task and data

The SRL task
Semantic Role Labeling (SRL) can be handled as a token classification task. Here, given an input sentence, we want to identify its predicates, and then, for each predicate, we want to identify and label its corresponding arguments.
Example Sentence: While I read my assignment, the cat sleeps.
Predicate-Argument Structure:

A dataset for SRL
We are using Universal Proposition Banks 1.0, which is in CoNLL format. Remember that this data is labeled only for the syntactic heads (for example, the argument "while I read my assignment" will only have a label on its syntactic head, "read", instead of on the full span). Do note that the notion of head here pertains to the gold dependency parse of the sentence (different theories of grammar may choose different heads).
Instead of having:
SPAN: ['while', 'I', 'read', 'my', 'assignment']
LABELS: ['B-AM-TMP', 'I-AM-TMP', 'I-AM-TMP', 'I-AM-TMP', 'I-AM-TMP']
we will have:
SPAN: ['while', 'I', 'read', 'my', 'assignment']
LABELS: ['O', 'O', 'B-AM-TMP', 'O', 'O']
You will be working with the English data. Here is a screenshot of the data (2 sentences):
The first sentence has one predicate (enjoy.01), and the second sentence has three predicates (compare.01, be.03, gain.02). Beware that there can be examples without any predicate. Also, note that the CoNLL file has 10 main columns. The 10th column has a predicate-sense label if the current token is a predicate, or "_" otherwise. The columns from the 11th to the nth are of variable size, and they depend on the number of predicates a sentence has. A sentence with zero predicates has 10 columns, a sentence with 3 predicates has 13 columns, etcetera. The 11th column corresponds to the argument structure of the first predicate, the 12th column corresponds to the argument structure of the second predicate, and so on.
For the two sentences shown in the example above, you have four predicates. Each of these four predicates can have its own arguments. This means that, during training, the second sentence will be seen by the model three times – once for each of its predicates.
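For illustration, here is a minimal Python sketch of the replication step described above. It assumes each sentence has already been read into a list of rows (one row per token, each row a list of column strings, comment lines removed) and uses the column layout from this appendix (10 core columns, with the predicate-sense label in the 10th, and the token form in the second column as in CoNLL-U); adapt the indices if the actual files differ.

def replicate_by_predicate(sentence_rows):
    """Yield one (tokens, predicate index, labels) instance per predicate."""
    # rows whose 10th column is not "_" mark predicates
    predicate_positions = [i for i, row in enumerate(sentence_rows) if row[9] != "_"]
    for k, pred_index in enumerate(predicate_positions):
        # the argument structure of the k-th predicate sits in the (11 + k)-th column
        labels = [row[10 + k] for row in sentence_rows]
        tokens = [row[1] for row in sentence_rows]  # assumes the form is in column 2
        yield tokens, pred_index, labels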
ECON2131/6034 Public Sector Economics
Tutorial 1: The public sector in a mixed economy

Multiple choice – Review of consumer theory
Note: In the questions below, I stands for income, px for the price of good x and py for the price of good y. (The tangency condition underlying several of these questions is summarised at the end of this tutorial.)
1. Suppose that at current consumption levels an individual's marginal utility of consuming an extra hot dog is 10, whereas the marginal utility of consuming an extra soft drink is 2. Then the number of hot dogs the individual is willing to give up to get one more soft drink is
(a) 5. (b) 2. (c) 1/2. (d) 1/5.
2. An increase in an individual's income without changing relative prices will
(a) rotate the budget constraint about the x-axis. (b) shift the indifference curves outward. (c) shift the budget constraint outward in a parallel way. (d) rotate the budget constraint about the y-axis.
3. The slope of the budget constraint line is
(a) the ratio of the prices (px/py). (b) the negative of the ratio of the prices (px/py). (c) the ratio of income divided by the price of y (I/py). (d) none of the above.
4. If the price of x falls, the budget constraint
(a) shifts outward in a parallel fashion. (b) shifts inward in a parallel fashion. (c) rotates outward about the x-intercept. (d) rotates outward about the y-intercept.
5. If the prices of all goods increase by the same proportion as income, the quantity demanded of good x will
(a) decrease. (b) increase. (c) remain unchanged. (d) change in a way that cannot be determined from the information given.
6. Assume x and y are the only two goods a person consumes. If after a rise in px the quantity demanded of y increases, one could say
(a) the income effect dominates the substitution effect. (b) the substitution effect dominates the income effect. (c) it is still impossible to determine whether the substitution or income effect dominates. (d) none of the answers are correct.
7. An individual's demand curve
(a) represents the various quantities that a consumer is willing to purchase of a good at various price levels. (b) is derived from an individual's indifference curve map. (c) will shift if preferences, prices of other goods, or income change. (d) all of these answers are correct.
8. If the compensated and ordinary demand curves for a good intersect, at that point the ordinary demand curve will be
(a) flatter if this is a normal good. (b) steeper if this is a normal good. (c) flatter if this is an inferior good. (d) horizontal.
9. If an individual's utility function is given by U(x, y) = and I = 100, px = 1, py = 4, his or her preferred consumption bundle will be:
(a) (20, 20) (b) (50, 12.5) (c) (40, 15) (d) (30, 15)
10. If an individual's utility function is given by U(x, y) = 2x + y and px = 2, py = 3, I = 50, this person will choose:
(a) (10, 10). (b) (15, 6.67). (c) (25, 0). (d) (0, 50/3).

Discussion – Topic 1: The public sector in a mixed economy
1. For each of the following programs, identify one or more "unintended" consequences:
(a) Rent control
(b) Minimum wages
(c) Agricultural price supports
(d) Providing health insurance to children who currently are underinsured (Hint: think about the US, where a large proportion of the population is uninsured)
(e) National testing standards for schools
2. Discuss the applications of the following economic principles to the public sector:
(a) Scarcity, choice and opportunity cost
(b) Welfare
(c) Marginal benefit and cost
(d) Price theory
(e) Efficient production
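For reference, the tangency condition mentioned in the note above: at an interior optimum the consumer equates the marginal rate of substitution to the price ratio while spending all income,

\[
\frac{MU_x}{MU_y} = \frac{p_x}{p_y}, \qquad p_x x + p_y y = I .
\]

With linear utility such as U(x, y) = 2x + y this equality generally cannot hold, and the optimum lies at a corner: all income is spent on the good with the higher marginal utility per dollar.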
Digital Systems Laboratory 4/MSc
ELEE10023/PGEE11117

Course Description
Aim: The course aims to produce students who are capable of developing hardware-software digital systems from high-level functional specifications and prototyping them onto FPGA hardware using a standard hardware description language and software programming language.
Pre-requisites: Digital Systems Laboratory 3 (ELEE09018) or Digital Systems Laboratory A (PGEE10017) or equivalent in other schools and outside institutions (see below). Engineering Software 3 or equivalent is advisable but not necessary.
Co-requisites: Undergraduate students must take Digital System Design 4 (ELEE10007).
Prohibited Combinations: None
Visiting Students Pre-requisites: Digital design using Verilog, and embedded system programming.
Keywords: Embedded Digital System Design, Embedded Processor Programming, Verilog, Data path and Control Path design, Hardware-Software Co-design
Default Course Mode of Study: Lab only, 10 weekly 3-hr lab sessions
Default Delivery Period: Semester 2, starting in Week 2.

Learning Outcomes:
1. Knowledge and understanding of:
I. Data paths and control paths, and a number of ways of designing them;
II. Instruction-set based control path design;
III. Control and data path integration;
IV. Capturing the design of hardware-software digital systems in a standard hardware description language;
2. Intellectual
I. Ability to use and choose between different techniques for digital system design and capture;
II. Ability to evaluate implementation results (e.g. speed, area, power) and correlate them with the corresponding high-level design and capture;
3. Practical
I. Ability to use a commercial digital system development tool suite to develop hardware-software digital systems and prototype them onto FPGA hardware;

Lab Content, What You Are Required To Do
You are required to develop a microprocessor-based system on FPGA with a demo application, written in software running on the microprocessor, which controls toy race-cars remotely. Students will be split into groups of two students each to tackle this problem. The final system will allow a user to simulate controlling cars remotely using a mouse, a VGA screen, and the BASYS 3 FPGA board, as illustrated in the figure below.
Figure 1. Proposed System for FPGA-Based Remote Car Control
A user will be able to hover a mouse pointer over a VGA screen, with the position of the mouse pointer on screen commanding the movement of the remote car as illustrated in the following figure.
Figure 2. Car movement (command) depending on mouse pointer position
The FPGA-based system will consist of a simple microprocessor, a VGA interface, and a mouse interface, in addition to other peripherals which will be detailed later in this document. An Infra-Red (IR) transmitter interface can be added (as a bonus) to control a toy car. Each team member will first develop one of the following peripherals individually: the VGA interface or the mouse interface. The whole group will then collaborate to develop the microprocessor at the heart of the proposed system with all necessary peripherals, as well as the complete demo application.
Assessment: The lab will be assessed during lab sessions through a number of checkpoints. The overall lab mark will be split into individual (60%) and group (40%) components for both undergraduate and postgraduate students. All students are encouraged to keep a lab book.
Verification: To support remote working, a verification environment will be provided for each interface.
Details will be provided on Learn.
Assessment: The individual component will consist of two checkpoints:
1. First individual assessment in Week 5, when every team member will be assessed on the particular peripheral interface they developed, i.e. the mouse driver or the VGA interface. Code should be uploaded to Learn prior to the timetabled laboratory in Week 5 for assessment. This will account for 25% of the overall lab mark.
2. Second individual assessment in Week 8, when every team member will demonstrate a working microprocessor + peripheral demo software application. For instance, the team member in charge of mouse driver development will present a demo software application running on the microprocessor with the mouse peripheral. Similarly, the team member in charge of VGA interface development will present a demo software application running on the microprocessor with the VGA interface peripheral. The specification of the individual demo software application will be given later in this document. This second individual assessment will account for 35% of the overall lab mark. Note that while this assessment is individual, it requires prior design of the same microprocessor architecture by all team members. Code should be uploaded to Learn prior to the timetabled laboratory in Week 8 for assessment.
3. The final assessment will be a group assessment in Week 11, where the entire team will demonstrate the complete demo software application running on the complete microprocessor-based system on the BASYS 3 board. There will be an element of peer assessment in this final group-based assignment, where individuals will be asked to evaluate the individual performance of colleagues within the group. More details of this process will be given nearer the time of assessment. Again, all design files should be uploaded to Learn prior to the start of the scheduled laboratory.
Details of the university semester structure can be found on the university webpages. Note that the week between weeks 5 and 6 is usually an unnumbered week, Flexible Learning Week.
The remainder of this document will present the detailed specification of each component of the proposed system. The details of the assessment components will also be presented where appropriate.

1. PS/2 Mouse Driver
The USB connector on the BASYS 3 board can accommodate a USB mouse. Internally, the signals are converted to PS/2-like signals via a microcontroller, as discussed on pages 7 and 8 of the BASYS 3 reference manual. Hence, all we need to do is implement a driver for a PS/2 mouse, as the USB connector can be seen as just a wrapper which has already been implemented. The PIC24 drives several signals into the FPGA – two are used to implement a standard PS/2 interface for communication with a mouse or keyboard (see figure 3 below).
Figure 3: USB Host signals to PS/2 signal conversion
PS/2 devices use a two-wire serial bus (clock and data) to communicate with a host device. Communication is bidirectional and performed in packets of 11-bit words, with each word containing a start, stop and odd parity bit. The following describes the PS/2 mouse protocol in detail (NB: the PS/2 keyboard protocol can be found in the BASYS 3 user manual if you are interested in developing a keyboard interface, but this is not required in this lab). The mouse device only outputs a clock and data signal when it is moved. Otherwise, the clock and data lines remain at logic high (i.e. '1'). Open-collector drivers are usually used to drive the two-wire bus between the mouse and host.
The device can send data to the host only when both data and clock lines are high. Because the host is the bus master, the device must check whether the host is sending data before driving the bus. The clock line is used for this purpose as a "clear to send" signal; if the host pulls the clock line low, the device must not send any data until the clock is released.
Communication is performed in 11-bit words, where each word consists of a '0' start bit, followed by 8 bits of data (LSB first), followed by an odd parity bit (i.e. a bit that is set to '1' if the number of 1's in the 8 bits of data is even, and '0' otherwise), and terminated with a '1' stop bit. The odd-parity bit is used for error detection.
Data sent from a PS/2 device to a host is read on the falling edge of the clock signal, whereas data sent from a host to a PS/2 device is read on the rising edge of the clock signal. The following figure shows PS/2 signal timing for a device-to-host communication. Note the timing requirements, which must be strictly adhered to. The clock frequency, for instance, must lie between 10 and 16.7 kHz.
Figure 3. PS/2 Device to Host Signal Timing
The following figure shows PS/2 signal timing for a host-to-device communication. The host brings the clock line low first, for at least 100 µs. It then brings the data line low and releases the clock line. The host then waits for the PS/2 device to bring the clock line low. After that, it sets or resets the data line with the first data bit, and waits for the device to bring the clock line high. It then waits for the device to bring the clock line low before it sets/resets the data line with the second data bit. This process is repeated until all eight data bits are sent, as well as the odd-parity bit. Next, the host releases the data line, and waits for the device to bring the data line low, and then the clock line low. Finally, the host waits for the device to release the data and clock lines.
Figure 4. Host to PS/2 Device Signal Timing
Now that we have seen the low-level PS/2 protocol, let us look at the high-level host-mouse communication. At power-up, a typical host-mouse communication consists of the following steps:
1) The host sends a Reset command (consisting of byte "FF") to the mouse,
2) The mouse responds with an acknowledgement byte "FA",
3) The mouse then goes through a self-test process and sends "AA" when this is passed. Then a mouse ID byte "00" is sent to the host, after which the host knows that the mouse is functioning well and is ready to transmit data,
4) The host sends byte "F4" to instruct the mouse to "Start Transmitting" its position information,
5) The mouse acknowledges the "Start Transmitting" command by sending byte "FA" back to the host**,
6) After this, the mouse starts transmitting its position information in the form of 3 bytes at a sample rate that can be set by the host (the default is 100 Hz).
**Note, however, that on the BASYS 3 FPGA board, probably due to the USB to PS/2 conversion, F4 instead of FA is returned, and the parity test fails. Hence, in state 8 of the MouseMasterSM module, the acknowledgement code has been changed to F4, and the parity check is skipped.
Thus, each data transmission from the mouse to the host after initialisation consists of 33 bits, where bits 1 (the first bit), 12, and 23 are '0' start bits; bits 10, 21, and 32 are odd-parity bits; and bits 11, 22, and 33 are '1' stop bits. The three-byte data fields contain status and movement data as shown in the figure below.
Figure 5. Mouse Data Format
The mouse reports a relative coordinate system, whereby a move to the right generates a positive number in the X Direction Byte field, and a move to the left generates a negative number in this field. Similarly, a move upwards generates a positive number in the Y Direction Byte field, and a move downwards generates a negative number. Note that the X and Y Direction Bytes represent the magnitude of the rate of mouse movement: the larger the number, the faster the mouse is moving. Bits XS and YS in the Status Byte are the sign bits, whereby a '1' indicates a negative number, whereas the XV and YV bits are movement overflow indicators, whereby a '1' means overflow has occurred. The L and R fields in the Status Byte indicate that the left and right buttons have been pressed, respectively ('1' indicates the button has been pressed). A small decoding sketch is given at the end of this section.

What you are required to do
You are required to design an FPGA PS/2 mouse interface and implement it on the BASYS 3 board. The clock line is physically connected to pin "C17" of the Artix-7 FPGA chip, and the data line is physically connected to pin "B17" of the chip. Note that when connecting these pins in the XDC file, pull-ups need to be set to true by adding a command of the form
set_property PULLUP true [get_ports PS2_CLK]
after the PS/2 clock and PS/2 data port constraints.
The FPGA mouse interface can be built from three modules: a "Transmitter" module, a "Receiver" module and a "State Machine" module to control the FPGA-mouse communication, as shown in Figure 6.
Figure 6. Mouse Interface: Simplified Block Diagram
The following shows Verilog code fragments for the "Receiver" module:

module MouseReceiver(
    //Standard Inputs
    input RESET,
    input CLK,
    //Mouse IO - CLK
    input CLK_MOUSE_IN,
    //Mouse IO - DATA
    input DATA_MOUSE_IN,
    //Control
    input READ_ENABLE,
    output [7:0] BYTE_READ,
    output [1:0] BYTE_ERROR_CODE,
    output BYTE_READY
);
/* Fill in the code */
endmodule

The following shows Verilog code fragments for the "Transmitter" module:

module MouseTransmitter(
    //Standard Inputs
    input RESET,
    input CLK,
    //Mouse IO - CLK
    input CLK_MOUSE_IN,
    output CLK_MOUSE_OUT_EN, // Allows for the control of the Clock line
    //Mouse IO - DATA
    input DATA_MOUSE_IN,
    output DATA_MOUSE_OUT,
    output DATA_MOUSE_OUT_EN,
    //Control
    input SEND_BYTE,
    input [7:0] BYTE_TO_SEND,
    output BYTE_SENT
);
/* Fill in the code */
endmodule

The following shows Verilog code fragments for the Master State Machine module:

module MouseMasterSM(
    input CLK,
    input RESET,
    //Transmitter Control
    output SEND_BYTE,
    output [7:0] BYTE_TO_SEND,
    input BYTE_SENT,
    //Receiver Control
    output READ_ENABLE,
    input [7:0] BYTE_READ,
    input [1:0] BYTE_ERROR_CODE,
    input BYTE_READY,
    //Data Registers
    output [7:0] MOUSE_DX,
    output [7:0] MOUSE_DY,
    output [7:0] MOUSE_STATUS,
    output SEND_INTERRUPT
);
/* Fill in the code */
endmodule

Finally, the above three blocks should be connected in a "Mouse Transceiver" module as suggested in the code fragments below:

module MouseTransceiver(
    //Standard Inputs
    input RESET,
    input CLK,
    //IO - Mouse side
    inout CLK_MOUSE,
    inout DATA_MOUSE,
    // Mouse data information
    output [3:0] MouseStatus,
    output [7:0] MouseX,
    output [7:0] MouseY
);
// X, Y limits of mouse position, e.g. a VGA screen with 160 x 120 resolution
parameter [7:0] MouseLimitX = 160;
parameter [7:0] MouseLimitY = 120;
/* Fill in the code */
endmodule

Note how we deal with the bidirectional ports (inouts) needed here, as the data line and the clock line are both written to and read from.
In general, to make use of an inout port "Port_Inout_Example", you need to create an internal wire ("Port_Inout_Example_In") with the value of the "Port_Inout_Example" port assigned to it. This can be used internally as an input to user logic when the port is in input mode. To write data to the port, we create an enable signal "Port_Inout_Example_Enable" which, when set, causes the output port to take the value of an internal signal "Port_Inout_Example_Out" from user logic. Otherwise, the inout port is set to high impedance. The following shows Verilog code snippets to the above effect.

wire Port_Inout_Example_In;
assign Port_Inout_Example_In = Port_Inout_Example;
// Use Port_Inout_Example_In as an input to your logic
...
// Internal signal Port_Inout_Example_Out can be assigned to the inout port if an enable
// signal (Port_Inout_Example_Enable) is set; otherwise the port is high impedance
assign Port_Inout_Example = (Port_Inout_Example_Enable ? Port_Inout_Example_Out : 1'bZ);

Assessment
You will need to complete the mouse interface code based on the above, synthesise and verify your code with test benches, generate an FPGA bitstream and test it on the BASYS 3 board. The team member in charge of this peripheral development will be tested in Week 5. This individual assessment will account for 25% of your overall lab mark. During the test, you will need to demonstrate a functioning mouse interface on the BASYS 3 board by plugging a USB mouse into the board and showing the various mouse register values (Status Byte, X Direction Byte, and Y Direction Byte) displayed on the LEDs and seven-segment displays of the BASYS 3 board as the mouse is moved around. You will be supplied with the Verilog description of a seven-segment display interface (SevenSeg). A specification of the latter can also be found in the Digital Systems Laboratory 3/A material. During the assessment, you will need to be able to take the assessor through your code, answering any possible query about your design choices, coding style, etc. Your code should have been uploaded to Learn prior to the start of the laboratory session.
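To make the three-byte data format concrete, the following Python sketch (illustrative only, not part of the required Verilog deliverables) decodes a status/X/Y packet according to the bit fields described in the Mouse Data Format section, assuming the standard PS/2 status-byte layout (bit 0 = L, bit 1 = R, bit 4 = XS, bit 5 = YS, bit 6 = XV, bit 7 = YV):

def decode_mouse_packet(status, x_byte, y_byte):
    """Decode one 3-byte PS/2 mouse packet into button and movement values."""
    left = bool(status & 0x01)             # L: left button pressed
    right = bool(status & 0x02)            # R: right button pressed
    x_sign = (status >> 4) & 1             # XS: sign of X movement
    y_sign = (status >> 5) & 1             # YS: sign of Y movement
    x_overflow = bool((status >> 6) & 1)   # XV: X movement overflow
    y_overflow = bool((status >> 7) & 1)   # YV: Y movement overflow
    # sign bit + data byte together form a 9-bit two's-complement movement value
    dx = x_byte - 256 if x_sign else x_byte
    dy = y_byte - 256 if y_sign else y_byte
    return left, right, dx, dy, x_overflow, y_overflow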
Java Lab 21: Reflection
This lab practices with reflection and RTTI.
1. Create a new Java project called Lab21. Download the file Employee.class. Do not copy Employee.class to the src directory – see #2.
2. Create a class Lab21Main with two methods: main() and void classFun(Class c) { } – that is, an empty method (for now). In main, new up a Lab21Main object. Declare the variable Class c = Class.forName("Employee"). IntelliJ will likely complain that this requires a try-catch block, so choose the option that surrounds it with one. Call classFun() with c as the parameter. This will fail – you should get something like a "No such class" error message. This is just to make sure that the production directory with the .class files is created. Now copy Employee.class into the production directory out/production/Lab21 (where Lab21Main.class is stored).
3. In classFun():
- This, and the rest of this method, need to be in a try-catch block. Let IntelliJ create this for you, and later, let it add the new Exception types to the catch. Then, use this class to get and display:
- its canonical name
- all the member data
- all the local constructors, then all the constructors (getDeclaredConstructors versus getConstructors)
- all the local methods, then all the methods – number the output for clarity
4. Construct an Object instance of type Employee using its default constructor from the constructor set (don't do this: Employee e = new Employee();) and the newInstance() method. Then print out whether the thing is an enum, is an interface, and print its toString(). Next, use the find() method, given below, to search through the methods for setSalary. Then use invoke(object, 1000.0) to set the salary to 1000.0. Then find the method getSalary – display what it returns (it should be 1000.0).

// Return the first method whose signature string contains the given text,
// or null if no method matches.
private static Method find(Method[] methods, String what) {
    for (Method m : methods) {
        if (m.toString().contains(what)) {
            return m;
        }
    }
    return null;
}
Advanced NLP: Assignment 2
Transformer-Based Semantic Role Labeling (SRL)

A) Introduction
For this assignment you will fine-tune a transformer-based model for SRL (given the predicate). You can reuse some of the code previously developed to pre-process the corpus and evaluate the results; however, you will need to adapt it as needed. The task consists of: i) familiarizing yourself with the Hugging Face Transformers API; ii) adapting a pre-existing codebase for NER and fine-tuning a BERT-base model for SRL; and iii) evaluating the results.

B) Objectives:
- Familiarize yourself with the Hugging Face Transformers library and its companion libraries, specifically targeting an understanding of how to use and fine-tune existing LLMs to perform sequence labelling.
- Learn about the concept of subword tokenization, inherent to transformer models, and learn to deal with its impact on the input and the output of these models.
- Gain hands-on experience developing/adapting transformer-based classifiers by fine-tuning a BERT-family LLM for token-level SRL.

C) What to do:
1. Start by reading Simple BERT Models for Relation Extraction and Semantic Role Labeling (Peng Shi, Jimmy Lin) and Joint Training with Semantic Role Labeling for Better Generalization in Natural Language Inference (Cemil Cengiz, Deniz Yuret). Note: if you have not done so for a previous course, you can also benefit from reading NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution (Aditya Khandelwal, Suraj Sawant). This is for a different NLP task, but their methods can also be suitable for the task of SRL.
2. Make sure you can run and understand the Python notebook you will be adapting: https://github.com/huggingface/notebooks/blob/main/examples/token_classification.ipynb . We recommend using the adapted version made available for ML4NLP here. This notebook should be your starter code. It is designed for NER, and the goal is to adapt it for SRL.
Note: when you adapt the code, it is possible that you will have questions. If you do, revisit/get better acquainted with Hugging Face's Transformers and companion libraries. These are important libraries for NLP, but they are also quite extensive. You are not expected to know the library by heart, but it is important to be able to navigate the documentation as needed to understand existing codebases. Depending on your background and on the time you have spent with this library, we recommend the following:
- https://huggingface.co/docs/transformers/en/tasks/token_classification (quick introduction to token classification)
- If you need a bit more detail, you may also find it useful to follow some sections of this tutorial: https://huggingface.co/learn/nlp-course/en/chapter1/1 – sections 2 ("Using Transformers") and 3 ("Fine-tuning a pretrained model") should be especially useful.
3. Adapt the notebook for the task of SRL, and fine-tune an LLM from the BERT family for this sequence labeling task:
a) Make sure you are able to explicitly deal with the relation between predicates and the task of SRL. You can follow one of the methods used in any of the papers listed above, or you can choose another suitable method. Using a suitable method is a minimum requirement to pass this assignment. This will require you to adapt the input to the model in some way. If you have questions about the suitability of a new method, ask!
b) We recommend that you use distilbert-base-uncased (the default model in the original notebook) for your work.
This model is smaller and therefore faster to fine-tune (and can also be used on less powerful machines). If you wish to do so, you are allowed to use other BERT-style models for your experiments. Make sure you make it clear which model you are using and why.
c) Consider how you should post-process the output of the model to provide metrics that are suitable for the shared task. In particular, you must ensure that the number of tokens in your confusion matrix matches the number of predictions expected by the test set. This needs to be motivated and dealt with explicitly.
d) Prepare the code to evaluate your model on the evaluation set. Provide Precision, Recall and F1 measures for token-level classification. Make sure to also include a labeled confusion matrix that supports the classification metrics. Store your model's predictions over the preprocessed test set and save them in text format (e.g., as a tsv). Make sure this file contains, at least, the token, the gold label and the predicted label for each prediction.
e) Prepare a ready-to-use function with which one can use the trained model to perform SRL on standalone sentences, given the predicate. Among other necessary arguments (e.g., model, etc.), the function should allow the input of a sentence segmented as a list of strings (e.g., ['Pia', 'asked', 'Luis', 'to', 'write', 'this', 'sentence', '.']) and a list defining the location of the predicate to label (e.g., [0,0,0,0,1,0,0,0] for the predicate 'write'). If you think of another, better way to design this function, that is also acceptable, as long as it is well documented. Provide an example showing that the function runs on a sentence with more than one predicate (you can choose your own sentence(s)!). Note: this function will be important for your take-home exam.
4. Make sure you carefully document the pipeline. Try to make use of a mix of markdown and code comments.
5. Submit one zip file containing a Jupyter notebook (and HTML printout) accompanied by a requirements.txt file and any number of Python modules with helper functions. Also include your model's predictions on the test set (e.g., as a tsv). Do not upload the saved model on Canvas! Provide a link to download the model instead (it needs to be a public link) and make sure the link is available at the top of your notebook. Make sure you run the notebook and save it with the output of all cells. Read more information about the requirements below.

D) Requirements for Jupyter Notebook:
The Python notebook should be formatted in a way that will substitute for a written report. As such, it should be crafted with care, highlight all important steps of the pipeline and, when necessary, include explanatory text and notes about decisions.
Your report must include:
- A (publicly open) link to download the trained model. We recommend using Google Drive to share a zip containing the model. Do not upload your model on Canvas!
- A printed summary of the statistics for both the training and test sets (see above). Make sure your evaluation (and confusion matrix) matches the numbers you have printed in these statistics (i.e., the total number of tokens must match).
- An explanation, with 1 or 2 examples, of how you chose to preprocess the input. Make sure your examples include both human-readable (i.e., using text) examples and machine-readable input (i.e., using sub-word ids). Use prints to show these examples (do not provide them as text/comments).
- A printed example showing an excerpt (1 or 2 sentences) of the data as it is fed into the model (e.g., similar to the output of the function tokenize_and_align_labels() in the starter notebook). Make sure no gold labels are passed into the model as features; they may only be used as the training labels.
- An explanation, with 1 or 2 examples, of how you process the output of the model. Make sure that your system's predictions match the tokenization required by the shared task. Describe the heuristics you use to go from subword-level to token-level predictions (a minimal sketch of the input-side alignment is given at the end of this assignment).
- A printed evaluation table using Scikit-learn's classification report, including a labeled confusion matrix. You should also include a couple of paragraphs discussing these results (similar to A1).
- A printed example showing that your function to perform SRL on standalone sentences is working.
Please note that you can and should delete cells in the starter notebook that are used for demonstration/tutorial purposes (i.e., not specifically for your experiment). The notebook should focus on your experiment.

E) Notes on Computation Power:
We know you might not have the computing power to fully train your model for multiple epochs on a local machine, but using a free service like https://colab.research.google.com/ should be sufficient to run at least one epoch (probably more). Focus on the quality of execution, and not necessarily the quality of the end product.

F) What to submit:
Each student submits one zip file using the predefined naming convention (e.g., A2-Student Name.zip). Inside the zip you should include:
- A requirements.txt with the necessary installation requirements.
- A Python notebook showcasing the full experiment. This should be submitted both as a notebook (.ipynb) and as an HTML file (.html). Make sure you save the notebook (and the HTML) after running every cell, so the outputs are also saved. You should be able to confirm this by inspecting the HTML.
- Any number of helper Python modules (if needed).
- The model's predictions on the test set as a text file (e.g., as a tsv).

G) Grading:
The assignment will be graded on a Pass/Fail basis, based on the following requirements:
- Produce a running Python notebook, including all steps (corpus preprocessing, processing the input for fine-tuning, training, post-processing the output and evaluating the trained model) to train and evaluate a transformer-based model for SRL. Note: make sure the code runs and does not depend on any files that are not included with your submission (including preprocessed datasets). The only files your code can (and should) depend on are the original Universal Propbank data.
- The Python notebook should be structured and documented with explanations about the code pipeline.
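To illustrate the subword-alignment step referenced above, here is a minimal sketch using the standard Transformers API (the name label2id is a placeholder for your label-to-id mapping); it labels only the first subword of each word and marks the rest with -100 so the loss ignores them:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_and_align(tokens, tags, label2id):
    # tokens: e.g. ['Pia', 'asked', 'Luis', ...]; tags: one BIO label per token
    encoding = tokenizer(tokens, is_split_into_words=True, truncation=True)
    aligned, previous = [], None
    for word_id in encoding.word_ids():
        if word_id is None:            # special tokens like [CLS] and [SEP]
            aligned.append(-100)
        elif word_id != previous:      # first subword keeps the word's label
            aligned.append(label2id[tags[word_id]])
        else:                          # later subwords are ignored by the loss
            aligned.append(-100)
        previous = word_id
    encoding["labels"] = aligned
    return encoding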
Department of Applied Mathematics
AMA533 Life Contingencies
Assignment 1
Due time: 23:00, March 7, 2025.

1. (20 pts) Verify by computations that the following recursive equations hold:
2. (10 pts) We know that q50 = 0.04, q51 = 0.06 and q52 = 0.07. Under the UDD assumption within each year, compute the value
3. (10 pts) The lifetime distribution is assumed to follow UDD within each year starting from birth. We also know that the force of mortality satisfies µ60.5 = 0.032, µ61.5 = 0.054 and µ62.5 = 0.078. Compute 2q60.5.
4. (10 pts) The force of mortality for a survival model is given by
Compute the values 20|10q50, 20q50.5 and ˚e50.
5. (10 pts) Under the UDD assumption within each year, verify that and hold.
6. (10 pts) For a given individual (x), let us consider a policy payable at the end of the year of death with a death benefit of 1 in the first year and an unspecified death benefit in the following years. Under qx = 0.06 and i = 0.1, and some given mortality probabilities at age x + 1 and beyond, the APV of the insurance policy is 0.42. If qx is in fact 0.03 and all other mortality probabilities at age x + 1 and later remain the same, what is the new value of the APV?
7. (10 pts) The insurance company has a group of policyholders all at the age of x, 70% of whom are non-smokers and 30% of whom are smokers. For the fixed age x, the insurance company's model for mortality has (non-smoker) and (smoker).
(i) A policyholder is chosen at random from the group. Compute 1|qx and qx+1 for this policyholder.
(ii) Suppose that both non-smoker and smoker mortality follow UDD in each year of age. Compute the value µx+0.2.
8. (10 pts) For a given individual (x), let Z1 be the PVRV for a policy issued to (x) with a death benefit of 1 that is payable at the end of 20 years if (x) dies within 20 years. Let Z2 be the PVRV for a policy issued to (x) with a death benefit of 1 that is payable at the end of 30 years if (x) dies between 10 and 30 years from the issue date. We know that Cov[Z1, Z2] = 0, 10qx = 0.12 and 20qx = 0.38. Compute the value of 30qx.
9. (10 pts) Given: (i) = 0.35; (ii) δ = 0.05; (iii) µx+t = 0.04 for all t. Compute
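For reference, the standard identities under the UDD assumption, which several of the questions above rely on (stated here without proof; these follow directly from assuming deaths are uniformly distributed within each year of age):

\[
{}_{t}q_{x} = t\,q_{x} \quad (0 \le t \le 1), \qquad
\mu_{x+t} = \frac{q_x}{1 - t\,q_x} \quad (0 \le t < 1).
\]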
Assessment 1 – "Controlled Research Argument"
At the centre of this assignment is practicing how to engage with a specific argument from a secondary source. This is a very important skill in academic writing. In this assignment, we go beyond simply quoting from a piece of published research. You will apply the ideas/claims of one piece of critical writing to a new situation, and present further arguments of your own in response to it.
1. Re-read carefully the following article. This article will form the basis of your controlled research assignment paper – this is an essay that addresses the claims found in a limited number of critical sources.
• Clay Calvert, Emma Morehart, and Sarah Papdelias. "Rap Music and the True Threats Quagmire: When Does One Man's Lyric become Another's Crime." Columbia Journal of Law & Arts 38 (2014): 1-27.
2. Write an essay that presents your own argument in direct response to one of the claims made in this article.
• Your controlled research paper should identify a claim in the Calvert, Morehart and Papdelias article. This is your starting point: your task is to challenge or advance this claim, using examples from real music or relevant examples or cases (either the ones we covered in class or others that you find on your own).
• This essay will involve identifying a narrow problem or issue put forward in the article, putting forward an original claim of your own in response to this issue, providing evidence, and linking this evidence to your claim. At the centre of this assignment is practicing how to engage with a specific argument from a secondary source by applying its ideas/claims to a new situation, and by presenting further arguments of your own in response to it.
Length: 600-800 words.
Weighting: 20% of final grade.
Due: Sunday, 9 March, 11.59pm (via Moodle)

Formatting and Submission of Assignment 1
Please follow these guidelines for formatting and author information – this is a good guide to follow for all submitted essays at university:
• Include title, author name/student number and date on the first page
• Double-spaced
• Consistent 3 cm (approx.) margins
• Font: Times New Roman, 12pt (or a similarly professional font in a readable 12pt size)
• Citation system: you may choose any standard citation system/style (e.g. MLA/Chicago/APA) for your references, provided that you are consistent in its use.
Please follow these guidelines for electronic submission:
• Submit assignments via Moodle – there is a link provided
• A good filename convention: e.g.
Here is an example of how the essay's formatting might look, following these rules:

Free Speech Under Threat: The Problem of Pop Music
In their 2014 article "Rap Music and the True Threats Quagmire", Calvert, Morehart and Papdelias offer a number of suggestions for how courts should approach cases where a supposed threat was made via rap lyrics. The authors argue that courts should assume that all listeners understand something about the context of rap music when deciding if it is "reasonable" for a subject to have felt threatened. They write that courts should "attribute some minimal understanding of rap's conventions—the understanding that a hypothetical reasonable person would have—to the rap-ignorant target."1 This essay will …
ECON2131/6034 Public Sector Economics
Tutorial 2: The economic rationale of government

Discussion
For each program listed below, discuss what market failures, including merit good considerations, might be (or are) used as a partial rationale:
(a) National defence
(b) Unemployment compensation
(c) Federally insured mortgages
(d) Laws requiring lenders to disclose the true rate of interest they are charging on loans
(e) Government prohibition of the use of narcotics

Questions
1. Individual 1 and individual 2 have quite different tastes for shirts and beer, such that individual 1 wants many shirts and individual 2 wants large quantities of beer, but at the margin they must be prepared to exchange the same amount of beer for a shirt. Identify the critical assumption about individual preferences that makes this possible. Provide a graphical representation.
2. A consumer views two goods as perfect substitutes one for one (i.e. the consumer considers each unit of good 1 to be worth 1 unit of good 2).
(a) Sketch the indifference curves of the consumer.
(b) If an economy is composed of two consumers with these preferences, demonstrate that any allocation is Pareto efficient.
(c) If an economy has one consumer who views its two goods as perfect substitutes one for one and a second who considers each unit of good 1 to be worth 2 units of good 2, find the Pareto efficient allocations.
3. Consider an economy with two goods, x and y, and two individuals, A and B, with preferences represented by the utility functions:
UA(xA, yA) = xAyA
UB(xB, yB) = xByB
where xi and yi represent the consumption of goods x and y by each individual i = A, B. The initial endowment of the goods is that A has 12 units of x and 2 units of y, while B has 8 units of x and 18 units of good y.
(a) Show how the two goods must be allocated between the two individuals at any Pareto efficient allocation in this economy. Draw the corresponding contract curve in the Edgeworth box (a sketch of the setup is given at the end of this tutorial).
(b) Assume A gets to choose a new allocation to maximize utility subject to the constraint that B's utility is no lower than at the endowment point. Solve formally for the Pareto efficient allocation.
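As a starting point for question 3(a) (a sketch of the setup only, not a full solution): with Ui = xiyi the marginal rate of substitution is yi/xi, and the total endowments are 20 units of each good (12 + 8 and 2 + 18), so interior Pareto efficiency requires

\[
\frac{y_A}{x_A} = \frac{y_B}{x_B} = \frac{20 - y_A}{20 - x_A}
\;\Longrightarrow\; y_A = x_A ,
\]

i.e. the contract curve is the diagonal of the Edgeworth box.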
Department of Computing - 2024/2025 Capstone Project
Project Code: MA2
Project Title: Vision Based Framework for Automatic Interpretation, Classification and Detection of Construction Workers and Site Equipment

Objective of the Project:
The project, "Vision Based Framework for Automatic Interpretation, Classification and Detection of Construction Workers and Site Equipment," is an innovative initiative that aims to streamline construction site management using advanced computer vision techniques. The primary goal of this project is to develop a robust system capable of interpreting, classifying, and detecting construction workers and site equipment in real time.
The system should utilize machine learning algorithms and computer vision to analyze video feeds from surveillance cameras installed at construction sites. It can accurately identify and classify various elements on a construction site, such as different types of equipment, vehicles, and workers. This information can be used to monitor site activities, track equipment usage, and ensure worker safety. Furthermore, the system can detect any unusual activities or potential safety hazards, alerting site supervisors to take immediate action. This not only enhances operational efficiency but also significantly improves safety standards on construction sites.
This project represents a significant advancement in the field of construction site management, demonstrating the potential of artificial intelligence and computer vision in automating and optimizing complex processes. It should pave the way for a new era of smart, safe, and efficient construction practices.
Data Collection: Utilize a dataset containing images of construction workers and site equipment (or a dataset of your choice).
Algorithms: You should implement one or more algorithms of your choice, selecting one from each category and using the same dataset for all algorithms: an ML algorithm or a DL algorithm (Deep Learning is recommended).
Evaluation Metrics: Algorithms should be compared based on various metrics, including but not limited to:
Accuracy: proportion of correctly classified instances;
Precision: ratio of true positives to the total predicted positives;
Recall: ratio of true positives to the total actual positives;
F1-Score: harmonic mean of precision and recall;
mAP: Mean Average Precision, used to analyze the performance of object detection and segmentation systems.
Visualization: Generate comparative graphs and tables to present results, facilitating easier analysis of the performance of each algorithm. This will include plots of accuracy across the different algorithms.
Expected Outcome: The project will culminate in a comprehensive report featuring detailed graphs and tables comparing the results of all implemented algorithms. You are expected to implement the system and process the data using both hardware and software.
Submission: Please read the student handbook as a reference. The report should be formatted according to the student handbook and include elements such as graphs, flowcharts, equations, tables, and references. Ensure that all source code files, datasets, and relevant materials are uploaded to Blackboard. Your code/program must run normally on the OS you specify, and you must include a README file that explains how to execute the code/program.
Knowledge/Skill/Tools Required: Programming language (Python is recommended), algorithm design (Machine Learning / Deep Learning experience is an added advantage)
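As an illustration of the classification metrics listed above, scikit-learn provides standard implementations; a minimal sketch with made-up labels follows (mAP for detection is usually computed with a detection toolkit, e.g. the COCO evaluation tools, rather than scikit-learn):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["worker", "excavator", "worker", "truck"]   # gold labels (illustrative)
y_pred = ["worker", "truck", "worker", "truck"]       # model predictions (illustrative)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")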
STAT0035/0036 Academic Year 2024-2025
Guidelines for the Statistics Project (STAT0035/0036) in 2024-2025

INTRODUCTION
What does a project involve?
A project is a course that allows a student to undertake a major piece of independent work under the guidance of a supervisor. The assessment is based on a written report and an oral presentation. This course is different from other courses in that the content is determined to a large extent by the student. It provides a lot of freedom in choosing what to study, but on the other hand it requires a lot more independent thought and organisational skill than the majority of courses. Many students find the project more demanding than the usual lecture courses. However, it is also more rewarding, and a well-executed project can give confidence and pride in the results. It is also something that can be used to demonstrate ability to potential employers.
Project versions
There are two versions of the course available: STAT0035, which is a 30-credit course, and STAT0036, which is a 15-credit course. Both modules are associated with FHEQ Level 6 or Level 7.

THE PROJECT
Choosing a topic and supervisor
If you decide or need to do a final-year project, you must first find yourself a topic and a supervisor.
● A provisional list of projects will be available (usually in the summer) before you do your project. You should discuss possibilities for projects before the start of Term 1 of the final year.
Undertaking the project
You should agree a series of regular meetings with your supervisor to discuss your progress. The frequency of these meetings may vary over time depending on the nature of the project and on your other commitments, but most students aim to meet with their supervisors every one or two weeks during Terms 1 and 2. While such meetings and supervision during Terms 1 and 2 are expected to be in person, remote sessions are allowed upon agreement between you and your supervisor. It is important to get started on your project early in the first term, even though the deadline for the completed report may seem a long way off. The project coordinator will be available to answer queries about your project. Three workshops will be offered to help with the written report and the oral presentation.
The Christmas vacation can be a good opportunity to do some concentrated work on your project. However, you should not leave anything major to be done over the Easter vacation since (a) you should be revising for your exams then and (b) if you do run into problems at that time it may be difficult to contact your supervisor. You should aim to have the bulk of your report written before the Easter break.
For the STAT0035 project you are expected to undertake approximately 260 hours of study, including the preparation of the report. This equates to around 8-10 hours per week during the term time and vacation time available for your project. The amount of time required should be taken seriously. In particular, if you do not work hard enough in the first term you will regret it later. The STAT0036 project should take approximately 130 hours of study.
It is a good idea to write up the work that you do as you go along. Otherwise, when you get to the end of the project, you may have forgotten the details of some of your earlier work. It is also a useful way to organise the work that you have done and can show up gaps that need to be filled before you move on. If there is a lot of computing involved, it is important to keep good records of what you have done.
It is very frustrating not to be able to reproduce an earlier result. More formal descriptions of the courses are in UCL's Module Catalogue.
Content of the project
For your project you should undertake a substantial piece of work that investigates aspects of a particular problem, presents solutions and discusses them critically and coherently. This investigation may be purely via the study of books and other sources, but more usually it will involve applying mathematical or statistical techniques to analyse a probability model or a set of data. Through the project, it is important to develop and demonstrate your statistical understanding rather than merely collecting or cleaning data. A project is not expected to result in 'new discoveries', as would be the case for a postgraduate thesis. However, you are expected to demonstrate originality in the compilation of your report; for example, wholesale copying of material from books in undigested form is not appropriate.
Writing the report
The main output from your project is the final report, which for STAT0035 projects should typically be in the range of 12,000-15,000 words (excluding computer programs, tables, graphs and other output). For STAT0036 projects this should be 7,000-10,000 words. These lengths are guidelines, not prescriptions, and quality is more important than quantity. If it looks like your project will be either very short or very long compared with the guideline, discuss it with your supervisor, preferably not at the last minute. Note that over-length reports will be penalised.
● You need to think carefully about the structure of what you are writing. In order to make a report of this size readable you need to:
● break the material down into chapters, sections and subsections;
● make sure that the material in each section fits together coherently and that the section titles etc. are an accurate description of the content;
● number sections, figures, tables and important equations so that you can cross-reference them;
● put a caption below or above each diagram, graph or figure to say what it is;
● give a list of references and cite them correctly.
● You should think about the audience for which you are writing. For the project you should attempt to present the material in a form that you would be able to understand and assimilate if you were given it by one of your peers.
● You need to attend to details in the presentation. Make sure the spelling and grammar are correct. Spell-checkers and grammar-checkers can be used, but ultimately these can only be properly checked by a careful reading of the report. Make sure references are accurate, dates are correct and so on.
● You need to organise your time during the writing. If you produce a report in the final 3 days, it will show. You may need to edit the report many times before it is in a fit state to be submitted. You should give your supervisor the opportunity to comment on the plan for your report.
● Before the end of Term 2, you should agree with your supervisor a time frame for providing feedback on your draft report (and presentation slides). Some supervisors may prefer to provide feedback on one chapter at a time; others may prefer to provide feedback on the whole report at once. Either way, you need to provide drafts in plenty of time for your supervisor to read them (e.g. usually at least two weeks before the final submission deadline).
In return, your supervisor will be happy to provide you with concrete and constructive comments and suggestions for you to revise your draft report (and slides).
● It might help if you benchmark your draft (written report and oral presentation slides) against the criteria listed in the marking forms (which you can find in the "Marking scheme" section on Moodle).
Using mathematical word processing software
There is a lot of software currently available that may make the writing up of your project easier, and it is recommended that you spend some time familiarising yourself with the possibilities at an early stage of the project. You will probably find that you need to revise the structure of your report several times, so software that reliably renumbers sections is obviously an advantage. Many word processors (for example WORD) will allow you to organise and cross-reference material as well as to write Greek letters, subscripts and equations. If your report is likely to contain many and/or more complicated equations, you will almost certainly be better off using the mathematical word processing package LaTeX, which is widely used by members of staff. Whatever software and computer you use, remember to back up your files regularly. Disk failures on laptops are not all that rare, and WORD has an uncanny knack of knowing just when it would be most inconvenient to overwrite your file with rubbish. Help with preparing written reports will be given in the workshops.
The oral presentation
At the end of Term 2, we will hold a practice oral presentation. The final (assessed) oral presentation will be held face-to-face in Term 3, with slides submitted online in advance. The oral presentation will involve your giving a talk of 13-15 minutes about your project, plus about 5 minutes of Q&A. Help with preparing and delivering oral presentations will be given in the workshops and the practice oral presentation.
Assessment of the project
The project is assessed on the written report (80%) and the oral presentation (20%). (See the Moodle page of the course for marking schemes.) Electronic versions of the written report and the slides of the final oral presentation should be submitted in two designated areas of the Moodle page of the course (see the Moodle page for updated information, including precise deadlines).
Structure of the written report
● Include a front page with the title of your project, your candidate number, course code (i.e. STAT0035 or STAT0036), word count, and date (e.g. April 2025).
● The second page should contain an Abstract of up to about 300 words in length. This is a brief statement of the aims of the project and a guide to the major results. It is distinct from the Introduction, which gives the background, motivation and aims of the study in more detail.
● Include a Table of Contents (and possibly a Glossary if appropriate), a chapter/section of Introduction, a chapter/section of Conclusions including discussion of limitations and future work, and a list of references.
● The reference list should include all references that have been used to support the work reported, and these references should be cited in the text of the report to indicate where they have been used.
● There are a number of standard ways of referencing books, articles, lecture notes, software, etc., and you should use one of these. You should read the separate guidelines on referencing and discuss with your supervisor what system you are going to use.
● The pages should be clearly numbered and should have a left-hand margin of at least 2 cm.
● Examiners attach considerable importance to accuracy, clarity and overall quality of presentation. Achieving this means starting to write early, so that you are not rushing to write everything up at the last minute, and giving your supervisor the opportunity to give you some feedback in time for it to be useful.
Netflix Case
MGT 180R – Business Finance
Netflix (ticker: NFLX) began as a DVD rental service when it went public in 2002, but it has achieved massive success ever since it began offering a streaming service. There are now many players in the streaming service business, so considerable uncertainty surrounds Netflix's future. The purpose of this case is to apply your capital budgeting skills in a valuation of Netflix.
Think of valuing a company as a big capital budgeting exercise. However, instead of forecasting cash flows for a single project, you will forecast cash flows for an entire company. Likewise, instead of computing the NPV of an individual project, you will compute the NPV of the cash flows of the entire company. There is one key difference: with projects we typically assume they come to an end, but with companies we don't (at least we hope not!). Thus, we need to make an assumption about how the company's cash flows grow in the long run. Typically we forecast individual cash flows for 5-10 years and then make an assumption about how cash flows grow after that (more details on this below).
Here are the steps for the Netflix case:
1. Start with the Netflix excel worksheet I've posted on Canvas (Netflix.xlsx).
2. The spreadsheet gives assumptions about the year-over-year (YOY) growth of revenues and related expenses for 2025-2032. These assumptions come from analyst reports and Netflix management forecasts. Combining these sources gives a bumpy growth forecast for the next few years, which is why the YOY growth estimates do not follow a smooth pattern at first. Use the base case assumptions in the excel sheet to build a forecast of earnings and free cash flows for the years 2025 to 2032. Specifically, start by forecasting revenues each year. Then, using these forecasts, compute the associated "cost of revenue", depreciation and capital expenditures. Cost of revenue comprises all costs of revenue excluding depreciation, so that Revenue – Cost of Revenue = Earnings Before Interest, Taxes, Depreciation and Amortization (EBITDA). Subtract off depreciation to get EBIT. Continue as we did in our capital budgeting exercises until you eventually get to Free Cash Flow. You can assume a tax rate of 21% and negligible changes in net working capital. You can also assume an opportunity cost of capital of 10% for all the FCFs.
3. Beyond 2032, you will need to make an assumption about Netflix's long-term growth once its growth stabilizes.
• Start with the free cash flows you forecast for 2032 and assume that they will grow by 6% to 2033 and continue growing at 6% thereafter. You can then use the growing perpetuity formula to value all the cash flows after 2032. By doing this, you will have what is called a "terminal value" or "continuation value" for Netflix as of the end of 2032 (one year before the first cash flow in the growing perpetuity).
4. To avoid timing complexities, we will assume that it is now the beginning of 2025 and that the first cash flow (the 2025 FCF) will be generated exactly one year from now, at the end of 2025. Discount all cash flows back to the beginning of 2025 using a 10% cost of capital. The sum of these discounted cash flows is the estimated total enterprise value of Netflix at the beginning of 2025 (a short numerical sketch of steps 3-5 follows this case). Recall that Enterprise Value = Equity + Debt – (Cash and Marketable Securities).
5. To arrive at the price per (equity) share, you must subtract Netflix's debt from its enterprise value and then add Netflix's cash and marketable securities.
• Netflix has $15.6 billion in debt and $9.6 billion in cash and marketable securities.
• Netflix has 437.8 million shares outstanding. Use this information to calculate the price per share.
6. Next, perform some sensitivity analysis.
• Given the uncertainty in the competitive environment for streaming services, use the Upside and Downside scenarios for YoY revenue growth to compute the associated Upside and Downside stock prices.
• Also, using the base case, check the sensitivity of your valuation to the assumptions about Netflix's cost of revenue. The sheet assumes that Netflix will be able to stabilize its cost of revenue at 60% by 2032. However, competition for streaming content, higher product support expenses, or increased advertising could eat into this. Calculate Netflix's stock price if Cost of Revenue evolves as in the "Sensitivity CoR" row of the spreadsheet.
7. As of the writing of this case, Netflix's stock price was $950.
• Consider what it would take for your forecast to produce a valuation that equates to a $950 stock price.
• By changing your revenue growth assumptions, find a set of YoY growth assumptions that produces a stock price of approximately $950. (The growth rates you come up with do not have to be the same across the years, and there is no single correct answer: many different sets of growth rates will produce a price of $950.)
• With these revenue assumptions in place, compute the compound annual growth rate (CAGR) in FCF from 2025 to 2032. For example, if your estimate of 2025 FCF is $1,000 and of 2032 FCF is $10,000, then you would use the "RATE" function in Excel: =RATE(7,0,-1000,10000).
8. An alternative way to value a stock is by the use of multiples. NOTE: There is no reason to expect the multiples-based valuation to agree with your discounted cash flow (DCF) valuation. This may be for a couple of reasons: 1) it is difficult to find a reasonable comparison company, and 2) Netflix may command a premium (or discount) valuation depending on its fundamentals.
• Start by valuing Netflix based on your 2025 projected EBITDA and the entertainment industry average Enterprise Value/EBITDA ratio of 16.4. This gives you a total company value that you would use in place of your NPV of FCF in your share value calculation (you still need to subtract debt and add cash and marketable securities before dividing by shares outstanding to get the price per share).
• Do the same with the average EV/EBITDA ratio for Internet-based companies: 19.3.
Overall, what is your estimate of what Netflix's stock price should be, and what range of valuations do you think is reasonable? Defend your conclusion by discussing and referencing the outcomes of the valuations in steps 5 to 8. You will have arrived at a range of prices, and you will need to take a stand on how to interpret this range and which prices and assumptions to weigh more heavily. Your answer should take the form of a memo (maximum 2 single-spaced pages) that references the spreadsheet exhibits (the exhibits do not count toward the 2-page limit). Do NOT write a chronological history of what you did to solve this case and do NOT write it as #5) answer, #6) answer, etc. You are supposed to explain the highlights of your approach, synthesize what you found, draw a conclusion, and defend it.
This is a group case with a maximum of five people per group. Submission is electronic via Canvas with one submission per group. Submissions MUST also include your Excel file ending in .xlsx.
Please make sure your Excel file is not corrupted before submitting, and please list the name of each group member in your submission.
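For orientation, the sketch below strings together the mechanics of steps 3-5 with made-up FCF numbers; the actual forecasts must come from Netflix.xlsx, and the spreadsheet remains the required deliverable.

# Minimal sketch of the DCF mechanics; the fcf list is invented, not the case data.
r, g = 0.10, 0.06                                     # cost of capital, long-run growth
fcf = [5.0, 6.2, 7.1, 8.0, 9.1, 10.3, 11.6, 13.0]     # 2025-2032 FCFs, $bn (placeholders)

# PV of the explicitly forecast FCFs; first cash flow arrives one year from now
pv_fcf = sum(cf / (1 + r) ** t for t, cf in enumerate(fcf, start=1))

# Terminal value at end of 2032 via the growing perpetuity formula,
# then discounted back the 8 years to the beginning of 2025
tv_2032 = fcf[-1] * (1 + g) / (r - g)
pv_tv = tv_2032 / (1 + r) ** len(fcf)

enterprise_value = pv_fcf + pv_tv                     # $bn
equity_value = enterprise_value - 15.6 + 9.6          # subtract debt, add cash ($bn)
price_per_share = equity_value * 1e3 / 437.8          # $bn -> $m, over million shares
cagr = (fcf[-1] / fcf[0]) ** (1 / 7) - 1              # equivalent of =RATE(7,0,-fcf0,fcf7)
print(round(price_per_share, 2), round(cagr, 4))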
ECOS3035: Economics of Political Institutions
Homework I
1. Consider a society of three individuals whose preferences over the four possible alternatives are:
Person 1: d ≻ c ≻ b ≻ a
Person 2: d ≻ c ≻ a ≻ b
Person 3: b ≻ a ≻ c ≻ d
Consider the pairwise majority rule and assume that individuals vote sincerely.
(a) For the profile of preferences above, do social preferences satisfy unanimity?
(b) For the profile of preferences above, are social preferences transitive?
(c) Now consider changing person 2's preferences to a ≻ d ≻ c ≻ b. Are social preferences transitive for the new preference profile?
(d) Relate your answers in parts (b) and (c) to Arrow's theorem. Be brief here.
2. Consider a society of three individuals and three alternatives a, b and c. Consider the pairwise majority rule and assume that individuals vote sincerely. Consider only strict preferences over the alternatives.
(a) How many profiles of preferences are there in this society?
(b) How many profiles of preferences are there in this society where each individual places alternative a at the top of their ranking?
(c) List all the possible profiles of preferences for which the rule above violates transitivity.
3. Consider a society of three individuals A, B, and C, who have to choose over three trade policies: more free trade (F), status quo levels of trade (S), and more protectionist trade (P). Suppose society has the following preferences, with individuals voting sincerely.
A: S ≻ F ≻ P
B: F ≻ S ≻ P
C: P ≻ S ≻ F
(a) Suggest an intuitive way to order the policies above from left to right.
(b) For this order of policies, can individuals be ordered in terms of their preferences from left to right? Do this by checking the definitions that we had in class.
(c) Who is the median voter? What policy do they prefer?
(d) Specify a non-dictatorial aggregation rule under which society's preferences are the same as this median voter's preferences.
(e) Why does this rule satisfy all of Arrow's axioms (in particular, transitivity)?
4. Consider the Hotelling/Downs model of political competition with a unit mass of voters uniformly distributed over the interval [0, 2]. There are two candidates (candidate 1 and candidate 2) who simultaneously choose a policy on the interval [0, 2], and then voters vote for one of the two candidates. Let sj denote candidate j's policy, j = 1, 2. A voter votes for the candidate whose policy is closest to the voter's location. A candidate gets a utility of 1 from winning and −1 from losing. Ties are broken with a fair coin toss.
(a) On a graph with s1 on the horizontal axis and s2 on the vertical axis, plot candidate 2's best response set (that is, specify all the s2's which are optimal for candidate 2, given s1).
(b) On a graph with s1 on the horizontal axis and s2 on the vertical axis, plot candidate 1's best response set (that is, specify all the s1's which are optimal for candidate 1, given s2).
(c) Depict the Nash equilibrium on this graph, where both players are playing a best response to each other. Provide some intuition for why this is a Nash equilibrium.
5. Consider the model of vote buying that we did in class (Groseclose and Snyder). In class we considered an example with 7 legislators. Redo the same example, but this time with 9 legislators. What is the optimal number of votes for party A to buy, and what is its total cost from doing so?
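As a quick self-check (not required by the homework), the transitivity questions in 1(b) and 1(c) can be verified computationally: with an odd number of voters and strict preferences, pairwise majority is transitive exactly when it produces no three-alternative cycle. The sketch below hard-codes Question 1's profile; swapping in person 2's new ranking answers part (c).

# Checks whether sincere pairwise majority voting is transitive for a profile.
from itertools import permutations

profile = {1: ['d', 'c', 'b', 'a'],   # most- to least-preferred
           2: ['d', 'c', 'a', 'b'],
           3: ['b', 'a', 'c', 'd']}

def majority_prefers(x, y):
    """True if a strict majority ranks x above y."""
    votes = sum(p.index(x) < p.index(y) for p in profile.values())
    return votes > len(profile) / 2

# With strict majorities, transitivity fails iff some 3-cycle x > y > z > x exists.
alts = ['a', 'b', 'c', 'd']
cycles = [(x, y, z) for x, y, z in permutations(alts, 3)
          if majority_prefers(x, y) and majority_prefers(y, z)
          and majority_prefers(z, x)]
print("transitive" if not cycles else f"cycle found: {cycles[0]}")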
FV3002 Assignment Brief (2024 – 2025)
The work shall be typed or word-processed in your own words. The deadline for submission is 11:59 p.m. (HKT) on 28 Mar 2025 (Friday).
Learning Outcomes
This piece of assessment will test your ability to meet the learning outcomes described hereunder:
- Understand fire safety strategies and tools that may be adopted in application to buildings and infrastructure, and evaluate their usefulness for a range of applications (Learning Outcome 1)
- Critically evaluate common guidance documents and topical issues current in the industry relating to the use of active fire protection measures (Learning Outcome 4)
Assignment Details
This assignment contains 1 (one) question. Answer it in no more than 1,500 words. The assignment carries a 40% weighting of the total mark of this module.
Submission Details
(1) The deadline for submission is 11:59 p.m. (HKT) on 28 Mar 2025 (Friday). Late submission will be dealt with strictly in accordance with UCLan Regulations.
(2) No hard copy is required to be submitted to the SCOPE counter. This assignment should be submitted through Turnitin to the CityU SCOPE CANVAS assignment submission folder.
(3) In-text citations and referenced publications shall be added to the answer to each question.
(4) Using AI-generated text to complete your assignment is prohibited. Harvard-style citation shall be used for all quoted references. UCLan regards any use of unfair means in an attempt to enhance performance or to influence the standard of award obtained as a serious academic and/or disciplinary offence.
(5) The written assignment shall be type-written and submitted in .pdf or .docx format. The file name of your submission shall follow the format of the example below: FVxxx_CHAN TaiMan_G12345678
(6) Students should take all necessary means to make sure the files are duly submitted via the CANVAS system and check whether the work was successfully uploaded (by downloading the file from CANVAS again). Claims of technical problems without strong evidence of unsuccessful uploading will not be accepted.
(7) It is the students' responsibility to double-check the readability (pdf or docx format) of the submitted files.
(8) The administration team will not remove or replace a student's submitted assignment in CANVAS, nor help students upload soft copies of their assignments to CANVAS.
Question 1
(a) You are requested to design the smoke extraction system for a large compartment of 40 m (length) x 60 m (width) x 12 m (height). All assumptions (such as fire size, height of the smoke layer interface, type of smoke extraction system and details of the proposed smoke extraction system) should be clearly stated and well justified with relevant references.
(b) Referring to the proposed smoke extraction system for the captioned compartment in item (a) above, provide a discussion of how to ensure a good-quality smoke control system.
Marking Criteria
Marks will be allocated according to the following criteria:
- Knowledge of relevant material and grasp of themes: students use their own words to demonstrate awareness and appreciation of key issues. (20 marks)
- Analysis, synthesis and depth of argument: key points identified, justified and put forward clearly and succinctly. (20 marks)
- Engineering principle/calculation: correct application of concepts/formulae with complete accuracy and correct answers. (40 marks)
- Structure: logical structure with introduction, background and executive summary. (20 marks)
Total: 100 marks
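As a rough illustration of the kind of engineering calculation Question 1(a) calls for, the sketch below estimates a required extraction rate from the axisymmetric plume correlation in NFPA 92. Every input here (the 5 MW design fire, 70% convective fraction, 6 m smoke layer interface height) is an assumed placeholder that your own design would need to state and justify against the references you cite.

# First-pass extraction-rate estimate via the NFPA 92 axisymmetric plume.
Q = 5000.0            # total heat release rate, kW (assumed design fire)
Qc = 0.7 * Q          # convective portion, kW (assumed convective fraction)
z = 6.0               # assumed design height of the smoke layer interface, m

zl = 0.166 * Qc**0.4  # mean flame height, m (NFPA 92)
if z > zl:
    m_dot = 0.071 * Qc**(1/3) * z**(5/3) + 0.0018 * Qc   # plume mass flow, kg/s
else:
    raise ValueError("interface below flame height; use the flame-region correlation")

cp, T_amb = 1.0, 293.0                 # kJ/(kg K), ambient temperature in K
T_smoke = T_amb + Qc / (m_dot * cp)    # adiabatic smoke layer temperature, K
rho = 353.0 / T_smoke                  # smoke density, kg/m^3 (ideal-gas approx.)
V_dot = m_dot / rho                    # required volumetric extraction, m^3/s
print(f"mass flow {m_dot:.1f} kg/s, smoke temp {T_smoke - 273:.0f} C, "
      f"extraction {V_dot:.1f} m^3/s")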
DEPARTMENT OF APPLIED MATHEMATICS
AMA529 STATISTICAL INFERENCE
Assignment 2. Due at 11:59pm on 7 Mar, 2025.
1. Let X1, . . . , Xn be a random sample from the N(µ, σ²) distribution, where µ is known and σ² > 0 is unknown.
(a) Calculate E[(X1 − µ)²].
(b) Show that (1/n) Σ (Xi − µ)² is the best unbiased estimator of σ². You may use without proof the fact that Σ (Xi − µ)² is a complete statistic.
2. Let X1, . . . , Xn be a random sample with density f(x; θ) = θ(1 − x)^(θ−1) for 0 < x < 1, where θ > 0.
(a) Show that Π (1 − Xi) is a sufficient statistic for θ.
(b) Find the MLE of 1/θ.
(c) Given that Π (1 − Xi) is a complete statistic, find the best unbiased estimator of 1/θ.
(d) Find the best unbiased estimator of θ. (Hint: you may start with the MLE of θ, and show that Y1 = − log(1 − X1) follows an exponential distribution.)
3. Let X1 and X2 be two independent and identically distributed variables with probability mass function
f(x; θ) =     for x = θ, θ + 1, . . .
where θ ∈ {0, 1, 2, . . .}.
(a) Show that min(X1, X2) is a complete sufficient statistic for θ.
(b) Find P(min(X1, X2) = k) for k = θ, θ + 1, . . ., and E[min(X1, X2)].
(c) Find the best unbiased estimator of θ.
4. Let X1, . . . , Xn be a random sample from the Gamma(α, θ) distribution with density
f(x; θ) = θ^α x^(α−1) e^(−θx) / Γ(α) for x > 0,
where α > 0 is known and θ > 0. Consider the hypothesis test H0 : θ = 1 versus H1 : θ = 3. Show that the rejection region of the most powerful test takes the form R = {x : Σ xi ≤ c} for some constant c.
5. Let X1 and X2 be two independent and identically distributed random variables whose probability mass function under H0 and H1 is given by
Use the Neyman–Pearson lemma to find the most powerful test for H0 versus H1 with size α = 0.09. Also, compute the type II error probability for this test.
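Question 1(b) can also be sanity-checked numerically (this is not a proof and not part of the required submission): under an assumed µ, σ² and n, the average of (Xi − µ)² across many simulated samples should center on σ².

# Numerical sanity check that (1/n) * sum (Xi - mu)^2 is unbiased for sigma^2.
# All parameter values here are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 2.0, 3.0, 10, 200_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
estimates = ((samples - mu) ** 2).mean(axis=1)   # one estimate per simulated sample
print(estimates.mean())                          # should be close to sigma2 = 3.0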
CSC108 Assignment 2: Simulating Canadian Elections
Due Date: Thursday March 6, 2025 before 4:00pm
Goals of this Assignment
· Develop code that uses loops, conditionals, and other earlier course concepts.
· Practice with lists, including looping over lists, using list methods, and list mutation.
· Practice reading problem descriptions written in English, together with provided docstring examples, and implementing function bodies to solve the problems.
· Practice reusing functions to help implement other functions.
· Continue to use Python 3, Wing 101, provided starter code, a checker module, and other tools.
Starter code
For this assignment, we are giving you some files, including some Python starter code files. See the Files to Download section below for details on how to download the files.
Background Information
Voting theory is the study of voting systems. A voting system is an algorithm for computing the winner of an election given a list of candidates and a set of ballots. For example, you may be familiar with the Plurality voting system: each voter marks their ballot for exactly one candidate, and the candidate with the most votes is elected. There are dozens of different voting systems and many different types of ballots. In this assignment, we'll be investigating five systems (Plurality, Approval Voting, Range Voting, Borda Count, and Instant Run-Off) that use four different types of ballots. In case you're curious, we have provided a table showing you where each voting system is used (https://q.utoronto.ca/courses/379470/pages/use-of-different-voting-systems). (This is only for the curious, since everything you need to know about each voting system and Canadian elections is contained within this handout.)
Canadian Elections: Ridings, Members of Parliament, and the House of Commons
Canada is divided into 338 geographical areas called ridings. In a standard Canadian election, one candidate from each political party runs in each riding. Each riding's voters elect one of these candidates to parliament using the Plurality voting system. These elected Members of Parliament (MPs) get a seat in the House of Commons. (So, there are 338 seats in the House of Commons.) At present, there are five parties represented in the House of Commons; four of them put forward candidates in all provinces. To simplify the assignment somewhat, our simulation will include only those four parties:
· the Conservative Party of Canada (CPC) (http://www.conservative.ca)
· the Green Party (http://www.greenparty.ca)
· the Liberals (http://www.liberal.ca)
· the New Democratic Party (NDP) (http://www.ndp.ca)
For the purposes of the assignment, we will not differentiate candidates from their parties; ballots will include only the party names.
Types of Ballots
Four types of ballots are used in this assignment. This section refers to the constant PARTY_ORDER, which is given a value in the Constants section of the voting_systems.py file.
The Data
For this assignment, we will provide starter code that reads voting data from a Comma Separated Value (CSV) file named sample_votes.csv. Each row of this file contains the following information about a single voter, with a comma between each part:
riding number: the number of the riding to which the voter belongs. (Many different voters will vote in the same riding.)
voter number: a number assigned to a voter. (No two voters in a riding have the same voter number.)
rank ballot: the voter's rank ballot with each party separated by the ; character.
range ballot: the voter's range ballot with each party's points separated by the ; character.
approval ballot: the voter's approval ballot with each party's approval/disapproval value separated by the ; character.
Notice that the row of data for each voter does not contain their single-candidate ballot. We will determine their single-candidate ballot from their top choice in their rank ballot.
Custom Data Type
Within our code, the data from each row in sample_votes.csv will be represented as a list that contains ints (for riding number and voter number) and sublists (for representing each of the rank, range and approval ballots). In voting_systems.py, we will refer to this data as VoteData in our type contracts. We use VoteData to represent the type: list[int, int, list[str], list[int], list[bool]]. Here is an example of a list[VoteData]:
[[0, 1, ['CPC', 'LIBERAL', 'NDP', 'GREEN'], [3, 1, 2, 1], [True, False, False, False]],
[0, 2, ['LIBERAL', 'NDP', 'CPC', 'GREEN'], [2, 1, 3, 2], [False, False, True, False]],
[1, 3, ['LIBERAL', 'NDP', 'GREEN', 'CPC'], [1, 2, 3, 3], [False, False, True, True]],
[1, 4, ['LIBERAL', 'GREEN', 'NDP', 'CPC'], [1, 1, 2, 1], [False, False, True, False]],
[1, 5, ['CPC', 'GREEN', 'NDP', 'LIBERAL'], [3, 2, 1, 2], [True, False, False, False]]]
Note that the 'YES' and 'NO' that appear in a CSV file are represented as True and False, respectively, in a type VoteData object.
Constants
We have provided the following constants in the starter code for you to use in your solutions. Read on to see how they should be used in your code. Constants provided in the starter code file voting_systems.py
Files to Download
Please download the Assignment 2 Starter Files (a2.zip) (https://q.utoronto.ca/courses/379470/files/36292538?wrap=1) (https://q.utoronto.ca/courses/379470/files/36292538/download?download_frd=1) and extract the zip archive. A description of each of the files that we have provided is given in the paragraphs below.
Starter code: voting_systems.py
The voting_systems.py file contains some constants, some sample data, and function headers and docstrings for the functions you will write. For each function, read both the handout and the header and docstring (especially the examples) to understand what the function should do.
Data: sample_votes.csv
The sample_votes.csv file contains vote data in comma-separated values (CSV) format. You must not modify this file. Note: do not call open, or read from this file, in your voting_systems.py solution. The file reading tasks are done by us in voting_simulation.py.
Simulation code: voting_simulation.py
The voting_simulation.py file will let you run a simulation that uses the code you write in voting_systems.py. You will not be submitting this file and there is no need to change it. It reads voting data from sample_votes.csv and passes the data as arguments to calls on your functions defined in voting_systems.py.
Checker: a2_checker.py, checker_generic.py and a2_pyta.json
These files provide a checker program that you should use to perform a simple test of your code. See below in the section called CSC108 A2 Checker for more information about a2_checker.py. The checker program requires the files checker_generic.py and a2_pyta.json. You do not need to do anything with these files, other than keep them all in the same folder as your voting_systems.py file.
Tasks
Suppose you want to analyze how different voting systems change the results of elections.
There are multiple voting systems to compare, and hundreds of votes and ridings to consider. To make your life easier, you will write Python functions to help you manage this information.
Docstrings, Preconditions, and Assumptions
You do not need to write any preconditions in this assignment, as we have provided the docstring descriptions for the required functions. All function docstrings should include at least two examples. If you write your own helper functions, you should write complete docstrings for them. Docstrings, including the requirement to have two examples for each function, make up part of the style marks.
To simplify the assignment, you can make the following assumptions:
There will only ever be 4 parties, and the party names will always be given in uppercase.
Note that while the given PARTY_ORDER constant refers to a list that is sorted in alphabetical order, that may not always be the case. (We may test your code with PARTY_ORDER referring to a list of 4 parties given in a different order.)
If ties occur, they should be broken by choosing the party that comes first in PARTY_ORDER. We suggest you write your code ignoring ties at first, then test to see what happens and then think about how to implement this tie-breaking if necessary.
All lists representing a single rank, range, or approval ballot will have the same number of elements as PARTY_ORDER (that is, 4).
Task 0: Creating testing data
We have provided you with docstrings in the starter code in voting_systems.py; however, many of the docstrings only have one example. We have provided you with one sample list of VoteData called SAMPLE_DATA_1. Create a second sample list called SAMPLE_DATA_2 that contains the 3 VoteData items representing the data of the voters in the following image (use the same order):
We have started this off for you, setting SAMPLE_DATA_2 to be an empty list in the starter code. Next, as you work through the following tasks, add a second example that uses SAMPLE_DATA_2 to any docstring that does not contain 2 examples.
Task 1: Data cleaning
The code we provided in voting_simulation.py reads data from a CSV file, separates out the commas, and produces a list[list[str]]. Here is a sample of the kind of list produced by our code:
[ ['0', '1', 'NDP;LIBERAL;GREEN;CPC', '1;4;2;3', 'NO;YES;NO;NO'],
['1', '2', 'LIBERAL;NDP;GREEN;CPC', '2;1;4;2', 'NO;NO;YES;YES'],
['1', '3', 'GREEN;NDP;CPC;LIBERAL', '1;5;1;2', 'NO;YES;NO;YES'] ]
You are to write the function clean_data, which should modify the provided list according to the following rules:
- riding numbers should become ints
- voter numbers should become ints
- rank ballots should become a list of strs
- range ballots should become a list of ints
- approval ballots should become a list of bools
Applying the clean_data function to the example list[list[str]] given above mutates it to contain this list[VoteData]:
[ [0, 1, ['NDP', 'LIBERAL', 'GREEN', 'CPC'], [1, 4, 2, 3], [False, True, False, False]],
[1, 2, ['LIBERAL', 'NDP', 'GREEN', 'CPC'], [2, 1, 4, 2], [False, False, True, True]],
[1, 3, ['GREEN', 'NDP', 'CPC', 'LIBERAL'], [1, 5, 1, 2], [False, True, False, True]] ]
Each of the three sublists above is of type VoteData. You must not use the built-in function eval in clean_data. This function is one of the more challenging functions in A2, because it mutates a list. We suggest that you start with some of the other functions in Tasks 2 and 3, and come back to this one later.
Data cleaning function to implement in voting_systems.py
Function: clean_data(list[list[str]]) -> None
Description: The parameter represents a nested list of strings in the format of data read from a file, as described above. Modify the nested list so that each sublist is of type VoteData.
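For orientation only, here is one way clean_data might look; the starter code's docstring, its examples and the A2 checker are the authoritative specification, and the 'YES'/'NO' convention below comes from the CSV description above.

# A sketch of one possible clean_data (mutates the list in place).
def clean_data(data: list[list[str]]) -> None:
    """Mutate each row of raw CSV strings into a VoteData list."""
    for row in data:
        row[0] = int(row[0])                              # riding number
        row[1] = int(row[1])                              # voter number
        row[2] = row[2].split(';')                        # rank ballot -> list[str]
        row[3] = [int(p) for p in row[3].split(';')]      # range ballot -> list[int]
        row[4] = [v == 'YES' for v in row[4].split(';')]  # approval ballot -> list[bool]

Applied to the sample list[list[str]] shown in Task 1, this produces exactly the list[VoteData] shown there.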
AD 688 Big Data Analytics – Web Analytics
AD 688 Big Data Analytics Syllabus
2 Course Content and Objectives
2.1 Course Description
The Web Analytics for Business course builds on the foundational business analytics course and provides a comprehensive introduction to Big Data, Data Visualization, and Cloud Analytics. Students gain hands-on experience with a variety of tools for the key concepts. Students will explore core concepts of Big Data and Cloud Analytics, including management of big data, massive data stores, cloud analytics, web scraping, and text & web mining, with comprehensive theory and practical application. The course emphasizes hands-on learning, data analytics workflows, and cloud-based tools, preparing students to tackle real-world challenges in business analytics with scalable, data-driven solutions. This course is designed for students aspiring to become competent consultants, entrepreneurs, analysts, machine learning engineers, and data scientists. Upon completion of this course, you will have advanced knowledge of big data and cloud analytics tools.
3 Course Learning Objectives
The database part of this course introduces students to designing a mission-critical database application, including importing and exporting content and analyzing and presenting the information using front-end tools. The web analytics part of this course studies the metrics of websites, their content, user behavior, and reporting. The Google Analytics tool is illustrated as a means of collecting analytics data. The web mining module presents how data is extracted from websites and analyzed. Email analytics and mobile analytics concepts are also introduced in this course. A term project provides an integrated overview of the above concepts.
4 Course Resources
There is no required textbook for this course. All required readings will be provided on the course website through notes and videos on Canvas. The following textbooks are recommended for this course. Some of the books are freely available through the BU library and on the web. Slides and lecture notes are created from a combination of sources:
1. Big Data Hands On
1. Judith S Hurwitz et al., Big Data for Dummies (John Wiley & Sons, 2013).
2. Venkat Ankam, Big Data Analytics (Packt Publishing Ltd, 2016).
3. Thomas Erl, Wajid Khattak, and Paul Buhler, Big Data Fundamentals: Concepts, Drivers & Techniques (Prentice Hall Press, 2016).
4. Scott Haines, "Modern Data Engineering with Apache Spark," n.d.
5. Sridhar Alla, Big Data Analytics with Hadoop 3: Build Highly Effective Analytics Solutions to Gain Valuable Insight into Your Big Data (Packt Publishing Ltd, 2018).
2. Cloud Computing
1. Judith S Hurwitz and Daniel Kirsch, Cloud Computing for Dummies (John Wiley & Sons, 2020).
2. Thomas Erl, Ricardo Puttini, and Zaigham Mahmood, Cloud Computing: Concepts, Technology & Architecture (Pearson Education, 2013).
3. Gautam Shroff, Enterprise Cloud Computing: Technology, Architecture, Applications (Cambridge University Press, 2010).
4. Sandeep Bhowmik, Cloud Computing (Cambridge University Press, 2017).
4.1 Recommended Textbooks
1. Hurwitz et al., Big Data for Dummies.
2. Hurwitz and Kirsch, Cloud Computing for Dummies.
4.2 Optional Textbooks
3. Haines, "Modern Data Engineering with Apache Spark".
4. Alan Anderson, Statistics for Big Data for Dummies (John Wiley & Sons, 2015).
5. Ron Kohavi, Diane Tang, and Ya Xu, Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing (Cambridge University Press, 2020).
6.
Anthony DeBarros, Practical SQL: A Beginner's Guide to Storytelling with Data (No Starch Press, 2022).
Lab #1 (6 pages): MAKE SURE TO USE THE LAB GUIDE and APA RESOURCES. MAINTAIN A FORMAL WRITING STYLE, RESEARCH FOCUSED, THIRD PERSON PERSPECTIVE, NO PERSONAL PRONOUNS. CITE EVERYTHING WITH CORRECT IN-TEXT CITATIONS AND PROOFREAD!!!
Cover Page and Abstract
Intro (approx. 1.5 pgs) Heading is Title of Your Paper
1. General background - broad overview of research on this topic (body image concerns) (KEEP IT RESEARCH FOCUSED). THEORY – gender differences in body image concerns. A majority of the research (historically) focused on body fat, the outcomes, and what this led to. More current research (McCreary, Saucier, & Courtenay, 2005) is challenging this.
2. Previous Research – specific examples – 2 sources used in the article (be careful to cite correctly), McCreary et al. (2005) article in detail - theory, what they were looking for and what they found (RESULTS: 4 important findings: 1) men score higher on the DMS regardless of GBV; 2) the 3 masculine measures are related to the DMS; 3) feminine traits are not related to lower DMS; 4) no differential salience). *Final discussion (showing you know and understand the article).
3. Introduce Current Study - why we're doing it, etc., to address limitations of McCreary et al. (2005): different population and measures. Differences between the current study and McCreary et al. (2005) – for the body image measure – BSQ – Drive for Muscularity Scale (DMS) and BSQS. ***The current study uses measures geared toward body fat and muscularity together. For gender - personality measure BSRI (a different measure, exploring a different aspect of gender role socialization).
· Research Questions (not bullets, paragraph):
· Is there a difference between men and women in body image between concern with muscle mass and concern with body fat/body shape? If so, what is it? Hyp – men will score higher on the DMS, women higher on the BSQS (based upon McCreary's theory and findings).
· What are the relationships between masculinity/femininity and drive for muscularity/body shape concerns? Is masculinity correlated with drive for muscularity? Hyp – yes, a positive correlation between masculinity and DFM (theory and prior research). For DFM and FEM, McCreary theorized a relationship but didn't find it, so the current study wants to replicate and confirm or disprove the original findings. Also investigating correlations for the BSQS.
Method section (approx. 2 pgs):
Participants: A total of xxx participants completed surveys. There were xxx (%) men and xxx (%) women between the ages of xx and xx with an average age of xx years old. (Add other demographic information; percent of population below xx years old, report ALL sexual orientation, ALL class and ethnicity.)
Procedure: How you did it, with detail (use the guide for writing a lab report)…
Measures: Sentences explaining our measures: 1 survey – 2 measures. The BSQ consists of the DMS and BSQS, and Rate Your Traits = BSRI, consisting of the MAS, FEM and AND scales.
Describe Scales:
Drive for Muscularity Scale (DMS). Use your article as a guide to describe this scale (how many items, what it's measuring, how it's scored, WE DID NOT REVERSE CODE, sample questions, validity); and then our specific results for the alphas – overall alpha = XXX
Body Shape Questionnaire Scale (BSQS). Do the same for the BSQS as for the DMS (describe it in the same way as the article did for the DMS; use the BSQ article to help); overall alpha = XXX
Bem Sex Role Inventory (BSRI). Do the same (use the BSRI article) – measure for masculinity; femininity; androgyny; alpha for MAS = XXX and for FEM = XXX. The androgyny scale wasn't used for analysis.
Results (approx. 1.5-2 pgs)
Intro paragraph: explain analysis
Table 1. Means and Standard Deviations for the DMS and BSRI Scale
                  Men (n = xxx)      Women (n = xxx)
                  M        SD        M        SD
DMS               xx       xx        xx       xx
BSQS              xx       xx        xx       xx
BSRI
  Masculinity     xx       xx        xx       xx
  Femininity      xx       xx        xx       xx
Explain chart: descriptive statistics.
Report on the results:
ANOVA for Gender Differences on DMS, BSQS & BSRI:
ANOVA GENDER DMS:
ANOVA GENDER BSQS:
ANOVA GENDER MASC:
ANOVA GENDER FEM:
Correlations:
DMS and MAS:
DMS and FEM:
BSQS and MAS:
BSQS and FEM:
DISCUSSION (approx. 2 pgs): *MOST IMPORTANT SECTION
Where we talk about results; restate main findings without statistics; relate to hypotheses and previous research findings; interpret results.
Limitations: What might have influenced our results?
Future Directions: What should we study next?
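The lab guide does not prescribe analysis software, but if the ANOVAs and correlations above were run in Python, a sketch along these lines shows the shape of the analysis; all scores below are made-up placeholders, and scipy.stats supplies f_oneway (one-way ANOVA) and pearsonr.

# Illustrative only: gender ANOVA on the DMS and a masculinity-DMS correlation.
from scipy.stats import f_oneway, pearsonr

dms_men = [3.1, 2.8, 3.5, 2.9]     # placeholder DMS scores for men
dms_women = [2.0, 2.4, 1.9, 2.2]   # placeholder DMS scores for women

F, p = f_oneway(dms_men, dms_women)          # one-way ANOVA: gender on DMS
print(f"F = {F:.2f}, p = {p:.3f}")

mas = [5.1, 4.2, 4.8, 3.9, 5.6, 4.4, 5.0, 3.7]   # placeholder BSRI masculinity
dms = [3.0, 2.1, 2.9, 1.8, 3.4, 2.5, 3.1, 1.6]   # placeholder DMS
r, p = pearsonr(mas, dms)                    # masculinity vs drive for muscularity
print(f"r = {r:.2f}, p = {p:.3f}")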
MACM 401/MATH 801 Assignment 4, Spring 2025. Due Tuesday March 11th at 11pm. Late penalty: −10% for each hour late.
Question 1: P-adic Lifting (20 marks)
Reference: Sections 6.2 and 6.3.
(a) By hand, determine the p-adic representation of the integer u = 116 for p = 5, first using the positive representation, then using the symmetric representation for Z5.
(b) Theorem 2: Let u, p ∈ Z with p > 2. For simplicity assume p is odd. If − < u
Consider the following Maple session, which generates a random 3 by 3 matrix of quadratic polynomials and computes its determinant:
> P := () -> randpoly(x,degree=2,dense):
> A := Matrix(3,3,P);
> d := LinearAlgebra[Determinant](A);
d := −224262 − 455486 x^2 + 55203 x − 539985 x^4 + 937816 x^3 + 463520 x^6 − 75964 x^5
(a) (15 marks) Let A be an n by n matrix of polynomials in Z[x] and let d = det(A). Develop a modular algorithm for computing d = det(A) ∈ Z[x]. Your algorithm will compute determinants of A modulo a sequence of primes and apply the CRT. For each prime p it will compute the determinant in Zp[x] by evaluation and interpolation. In this way we reduce computation of a determinant of a matrix over Z[x] to many computations of determinants of matrices over Zp, a field, for which ordinary Gaussian elimination, which does O(n^3) arithmetic operations in Zp, may be used. You will need bounds for deg d and ||d||∞. Use the primes p = [101, 103, 107, ...] and use Maple to do the Chinese remaindering. Use x = 1, 2, 3, ... for the evaluation points and use Maple for the interpolations. Present your algorithm as a homomorphism diagram. Implement your algorithm in Maple and test it on the above example.
To reduce the coefficients of the polynomials in A modulo p in Maple use
> B := A mod p;
To evaluate the polynomials in B at x = α modulo p in Maple use
> C := Eval(B,x=alpha) mod p;
To compute the determinant of a matrix C over Zp in Maple use
> Det(C) mod p;
(b) (10 marks) Suppose A is an n by n matrix over Z[x] with deg a_ij ≤ d and |a_ijk| < B^m; that is, A is an n by n matrix of polynomials of degree at most d with coefficients at most m base-B digits long. Assume the primes satisfy B < p < 2B and that arithmetic in Zp costs O(1). Estimate the time complexity of your algorithm in big O notation as a function of n, m and d. Make reasonable simplifying assumptions, such as n < B and d < B, as necessary. State your assumptions. Also helpful: ln n! < n ln n for n > 1.
Question 4: Lagrange Interpolation (20 marks)
In class we stated the following theorem for polynomial interpolation.
Theorem: Let F be a field and let (x1, y1), (x2, y2), . . . , (xn, yn) be n points in F^2. If the xi are distinct, there exists a unique polynomial f(z) in F[z] satisfying deg(f) ≤ n − 1 and f(xi) = yi for 1 ≤ i ≤ n.
Lagrange interpolation is an O(n^2) algorithm for computing f(z). It does:
1. Expand the product M(z) = Π (z − xi).
2. Set Li(z) = M(z)/(z − xi) for 1 ≤ i ≤ n.
3. Set αi = Li(xi) for 1 ≤ i ≤ n.
4. Set βi = yi · αi^(−1) for 1 ≤ i ≤ n.
5. Set f = Σ βi Li(z).
(a) For F = Z7, x = [1, 2, 3, 4] and y = [0, 5, 5, 0], use Maple's Interp(x,y,z) mod p; command to find f(z). Now, using Maple as a calculator, execute Steps 1 to 5 to find the interpolating polynomial f(z). I suggest you use Arrays for L, α and β.
(b) Write a Maple procedure INTERP(x,y,z,p) that uses Lagrange interpolation to interpolate f(z) for the field F = Zp, that is, for the integers modulo p. Please print out the Li polynomials. Test your Maple procedure on the example in part (a).
(c) Show that Steps 1, 2, 3 and 5 do O(n^2) multiplications in F. Since Step 4 does n multiplications and n inverses in F, conclude that Lagrange interpolation does O(n^2) multiplications in F.
Please note the following. An obvious way to code Step 1 in Maple for F = Z7 is
> M := z-x[1] mod 7;
> for i from 2 to n do M := Expand((z-x[i])*M) mod 7; od;
In the loop, at step i, this multiplies (z − xi) by M, where M = z^(i−1) + Σ bk z^k for some coefficients bk ∈ F. This multiplication is special because the factors (z − xi) and M are both monic. To minimize the number of multiplications in F we can use
(z − xi) · M = z · M − xi · M,
which needs only the i − 1 multiplications xi · bk.
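Outside the Maple requirement, the five steps are easy to prototype in another language as a cross-check. Below is a minimal Python sketch (polynomials as coefficient lists, lowest degree first; synthetic division for Step 2; Python's pow(a, -1, p) for the field inverses in Step 4). It is a study aid under those conventions, not a substitute for the required INTERP procedure.

# Lagrange interpolation over Zp, following Steps 1-5; p must be prime.
def poly_mul(a, b, p):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_eval(a, x, p):
    r = 0
    for c in reversed(a):          # Horner's rule
        r = (r * x + c) % p
    return r

def interp(xs, ys, p):
    n = len(xs)
    M = [1]                        # Step 1: M(z) = prod (z - x_i)
    for x in xs:
        M = poly_mul(M, [(-x) % p, 1], p)
    f = [0] * n
    for i in range(n):
        # Step 2: L_i = M / (z - x_i), by synthetic division (remainder is 0)
        L, carry = [0] * n, 0
        for k in range(n, 0, -1):
            carry = (M[k] + xs[i] * carry) % p
            L[k - 1] = carry
        # Steps 3-4: beta_i = y_i * L_i(x_i)^(-1)
        beta = ys[i] * pow(poly_eval(L, xs[i], p), -1, p) % p
        # Step 5: f = sum beta_i * L_i(z)
        for k in range(n):
            f[k] = (f[k] + beta * L[k]) % p
    return f

print(interp([1, 2, 3, 4], [0, 5, 5, 0], 7))   # the part (a) example, p = 7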