Public Finance (ECON4043): Assignment 1

Question 1 [34 marks total]

Suppose that Mark is a single father who spends all of his income and TANF benefits on education for his children (denoted by E). Let H denote the hours Mark works per year and F the free hours he spends on himself. Mark can work a maximum of 2,500 hours per year; assume that he spends all of his remaining time on leisure (H + F = 2,500). Assume further that Mark initially earns $25/hour and that education costs $1/unit. (Round answers to 2 decimal places where necessary.)

a. [4 marks] Write down Mark's initial budget constraint in terms of F and E. What is the price of an hour of leisure for Mark? Draw the budget constraint, indicating the x-intercept (F), the y-intercept (E) and the slope.

b. [6 marks] Now suppose that the government plans to introduce a TANF policy with a benefit guarantee of $12,000 and a benefit reduction rate of 60% for all single parents. What is the price of leisure for Mark under this policy? Draw the budget constraints for this policy and for the initial case on the same graph. What do you predict the impact of this TANF policy will be on Mark's education expenditure (E) and his leisure time (F)? Identify the income and substitution effects separately.

c. [6 marks] Building on part b, the government now decides to decrease the benefit guarantee from $12,000 to $6,000 while keeping the same benefit reduction rate. What is the price of leisure for Mark under this new policy? Draw the budget constraints for this new policy and for the two cases above on the same graph. What do you predict the impact of the TANF policy changes will be on Mark's education expenditure (E) and his leisure time (F)? Identify the income and substitution effects separately.

d. [5 marks] Assume that Mark's utility function takes the following form:

U(E, F) = 150 ln(F) + 75 ln(E)

Set up the utility maximization problem and solve for Mark's optimal expenditure on education (E), leisure (F) and utility (U) under the two TANF policies in parts b and c. How large will the labor supply response to the policy changes be?

e. [4 marks] Assume the labor market of this economy is a competitive market, and that the supply and demand curves are as follows:

Original supply curve: Qs = 150,000 + 8w
New supply curve with TANF $12,000: Qs = 100,000 + 8w
New supply curve with TANF $6,000: Qs = 120,000 + 8w
Demand curve: Qd = 200,000 - 10w

Use the tools of welfare analysis to measure the welfare implications of introducing TANF and of cutting TANF benefits (calculate the deadweight loss and show it in a diagram).

f. [4 marks] Calculate the price elasticity of supply when the wage rises from $25 to $30 and hours worked per year increase from 2,500 to 2,800. Is supply elastic, inelastic, or unit elastic? If the government keeps TANF in place permanently, will the policy have a larger effect one year from now or five years from now?

g. [5 marks] Suppose you are hired by the government to evaluate the impact of the above TANF policy change, for example from a larger TANF benefit to a lower one. What type of sample data would you use? What type of estimation method would you use? Explain.
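A hint for part e: below is a minimal numerical sketch (in Python) of one common approach, which treats the original supply curve as the social marginal cost and measures the deadweight loss of a TANF-induced supply shift as the triangle between demand and the original supply over the lost quantity. The curve parameters come from the question; the welfare convention itself is an assumption, not the only defensible one.

    def equilibrium(a, b, c, d):
        # Solve a + b*w = c - d*w for the market-clearing wage, then quantity.
        w = (c - a) / (b + d)
        return w, a + b * w

    # Original supply Qs = 150,000 + 8w vs. demand Qd = 200,000 - 10w
    w0, q0 = equilibrium(150_000, 8, 200_000, 10)
    # Supply with the $12,000 TANF guarantee: Qs = 100,000 + 8w
    w1, q1 = equilibrium(100_000, 8, 200_000, 10)

    # DWL triangle between inverse demand and the ORIGINAL inverse supply,
    # evaluated over the quantity lost when supply shifts in (q0 - q1).
    w_demand = (200_000 - q1) / 10   # inverse demand at the new quantity
    w_supply = (q1 - 150_000) / 8    # inverse original supply at the new quantity
    dwl = 0.5 * (q0 - q1) * (w_demand - w_supply)
    print(f"w1={w1:.2f}, q1={q1:.2f}, DWL={dwl:.2f}")

The same two calls, with the $6,000-guarantee supply curve substituted in, give the comparison needed for the benefit-cut case.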
Question 2 [23 marks total]

Suppose that a local government starts with a balanced budget and plans to implement a new law to increase transfer payments for medical care to local citizens, because the government believes that this change might improve the health of the citizens and hence promote their productivity and production efficiency in the long run.

a. [3 marks] If the new law is passed and put into force, is the increased transfer payment entitlement spending or discretionary spending? Why?

b. [6 marks] If the government uses dynamic scoring rather than static scoring to evaluate the effects of this new law on the budgetary position, what are the possible positive and negative effects? For the negative effects, give an example on the expenditure side and an example on the revenue side (try your best to use what you learned in class).

c. [6 marks] Suppose that the current year is year 0, and the new law will increase government medical care expenditure by $105 million in EACH of the next 10 years (from year 1 to year 10). Suppose the annual interest rate is 5% and stable. Calculate the present discounted value (PDV, denominated in year-0 dollars) of the increased expenditure over the next 10 years. Show the equation used for the calculation and a simplified, compact expression of the PDV formula.

d. [6 marks] Continue with part c. Suppose that the government plans to increase the local citizens' payroll tax on medical care to fund the increased expenditure caused by the new law. The plan for the tax is as follows: in year 1, the government can increase tax revenue by $100 million, and this tax revenue will grow at an annual rate of 3% from year 2 to year 10. Suppose the annual interest rate is still 5% and stable. Calculate the present discounted value (PDV, denominated in year-0 dollars) of the increased revenue over the next 10 years. Show the equation used for the calculation and a simplified, compact expression of the PDV formula.

e. [2 marks] Continue with parts c and d. If the government uses an intertemporal budget constraint in year 0, does the government have an intertemporal budget surplus or deficit? Why?
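A hint for parts c and d above: two standard PDV identities do the heavy lifting, the level annuity and the growing annuity. Here C is the first year's cash flow, r the interest rate, g the growth rate, and T = 10:

    \text{(c)}\quad \mathrm{PDV} = \sum_{t=1}^{T} \frac{C}{(1+r)^{t}} = \frac{C}{r}\left[1-(1+r)^{-T}\right]

    \text{(d)}\quad \mathrm{PDV} = \sum_{t=1}^{T} \frac{C(1+g)^{t-1}}{(1+r)^{t}} = \frac{C}{r-g}\left[1-\left(\frac{1+g}{1+r}\right)^{T}\right]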
Question 3 [43 marks total]

Part A: Market failures are often caused by the problem of externalities. About externalities, we have the following questions:

a. [6 marks] The production of paper is likely to involve the pollution of water sources. Assuming that there is a competitive market for paper production, use a typical demand–supply diagram to illustrate the effect of the production externality on total social surplus.

b. [10 marks] If a corrective (Pigouvian) tax per unit of pollution is imposed on the producer of paper in part a, use a typical demand–supply diagram to show and explain the changes in consumer surplus, producer surplus and total social surplus before and after the tax is imposed.

Part B:

c. [10 marks] Assume that in a hypothetical country there is only one paper mill, A, which according to a government requirement must reduce its emission of pollutants during the production process. Suppose that the overall cost of pollution reduction for this mill is C(Q) = 4Q², where Q denotes the quantity of pollution reduction, while the social benefit of pollution reduction for the country is B(Q) = 320Q − Q². Calculate the socially optimal level of this mill's pollution reduction; show and explain the calculation process. If a tax per unit of pollution is imposed on this paper mill, can the socially optimal level be achieved? Explain why.

d. [8 marks] In the setting of part c, suppose there also exists another paper mill, B, whose overall cost of pollution reduction is C(Q) = 60Q + 2Q². Calculate the socially optimal levels of pollution reduction for the two mills. If a tax per unit of pollution is levied on both mills, calculate the tax that would lead the two mills to generate the optimal amounts of pollution reduction.

e. [9 marks] In the setting of part d, if the two mills each produce 70 units of pollution, calculate the total amount of pollution at the social optimum. If the two mills are required to reduce their pollution by the same amount, can the socially optimal level of pollution reduction be achieved? Explain why. If the two mills are given the same number of pollution permits and are allowed to trade them, how can the socially optimal level of pollution reduction be achieved?
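A hint for Part B: the social optimum is characterized by the standard marginal conditions, reading the benefit function as applying to total reduction (an assumption consistent with the question's setup). Each mill abates until its marginal cost equals the marginal social benefit, and the common value is also the Pigouvian tax rate t*:

    \text{(c)}\quad B'(Q) = C'(Q): \quad 320 - 2Q = 8Q

    \text{(d)}\quad MC_A(Q_A) = MC_B(Q_B) = B'(Q_A + Q_B) = t^{*}, \quad\text{with } MC_A(Q_A) = 8Q_A,\ \ MC_B(Q_B) = 60 + 4Q_B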
FIT5196-S1-2025 Assessment 1 (35%)

This is a group assessment and is worth 35% of your total mark for FIT5196.

Due date: 11:55 PM, Friday, 11 April 2025

Text documents, such as those derived from web crawling, typically consist of topically coherent content. Within each segment of topically coherent data, word usage exhibits more consistent lexical distributions than in the dataset as a whole. For text analysis tasks such as passage retrieval in information retrieval (IR), document summarization, recommender systems, and learning-to-rank methods, a linear partitioning of texts into topic segments is effective. In this assessment, your group is required to complete all five tasks listed below to achieve full marks.

Task 1: Parsing Raw Files (7/35)

This task touches on the very first step of analysing textual data, i.e., extracting data from semi-structured text files.

Allowed libraries: re, json, pandas, datetime, os

Input files: group.txt, group.xlsx (all input files are in the student_group# zip file)
Output files (submission): task1_.json, task1_.csv, task1_.ipynb
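To give a flavor of Task 1, here is a minimal, hypothetical sketch using only the allowed libraries. The <record> tag, the field layout, and the output file names are invented for illustration; the real delimiters and required output names depend on your group's input files and the assessment specification.

    import json
    import re

    import pandas as pd

    # Hypothetical input: records wrapped in <record>...</record> tags.
    raw = open("group.txt", encoding="utf-8").read()
    records = re.findall(r"<record>(.*?)</record>", raw, flags=re.DOTALL)

    # One dict per extracted record (field names are illustrative).
    rows = [{"id": i, "text": text.strip()} for i, text in enumerate(records)]

    with open("task1_output.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)

    pd.DataFrame(rows).to_csv("task1_output.csv", index=False)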
Synthesis Exercise 2
LINC11 Winter 2025
March 14, 2025

Individual submission due: Wednesday March 19, 23:59 on Quercus
Group submission due: Thursday April 3, 23:59 on Quercus

The following exercises must be completed by uploading a PDF document to Quercus. Below you will find a set of data, some background information, and a set of discussion questions about this data and the topic. Considering the data, answer the questions posed to you in prose format, using diagrams and structure drawings where necessary for your discussion, or where explicitly asked for by the questions.

Your first attempt at this problem is to be completed individually. The problems in the exercise should be challenging, but possible to work out. For your individual attempt, please use only the data provided below. Your answers will be marked for effort and completion.

Your second attempt will be completed in a small group with others. The second attempt should be a more in-depth discussion, taking into account the solutions that each individual group member supplied. This group submission will be marked for depth of discussion and accuracy of the solutions and discussion provided.

1 Welsh i - dirgelwch?

Welsh, the most widely spoken of the three extant Brittonic Celtic languages, is known for its sophisticated inflectional, tense and aspectual system. In this exercise, you'll consider the distribution of the Welsh marker i, a marker that appears clause-initially in some embedded clauses. This marker has been the subject of significant debate because its distribution seems to cross-cut a variety of subordinate clause types.

Welsh is a verb-initial language, and usually appears with VSO word order. A standard Welsh clause might have a pre-verbal particle, followed by the verb, then the subject. Negation-related material appears before objects, and adverbial material appears at the end of the clause. This order is also generally true of tensed complement clauses.

1.1 Dadansoddi (Analysis)

Considering the data presented here, formulate a descriptive summary of the distribution of i in both its inflected and un-inflected forms. Specifically, in what kinds of situations do you observe these elements appearing? What functions do they seem to perform in the clause?

Now, let's analyse the i marker. What category or categories do you think it represents in the syntax? Why do you think this to be true? Can your analysis account for the distribution you observed above? Crucially, consider the raising and control constructions: provide a specific analysis of one or more contrasting pairs of sentences that support your conclusions. Draw several simple but clear structures to support your analysis. Additionally, are there any datapoints above that pose a challenge to your account? If so, can you formulate a hypothesis on how to account for them, even if you might need more data to support it?

Lastly, let's consider the connection to broader syntactic theory. In class, we discussed a variety of different approaches to Control and Raising constructions in English. Is this data in Welsh supportive of a particular type of analysis for these constructions? If so, state specifically how the data above is best analysed using one approach, and the challenges that an alternative analysis might face in accounting for this data.
Assignment 2 - Scene Building, Camera, Textures, and Lighting
Due Mar 17, 2025 11:59 PM
Spring 2025 CSC 305 A01/A02

This assignment tests your understanding of hierarchical transformations, the camera, textures, rasterization, and lighting in the real-time graphics pipeline. This assignment is purposefully open-ended; you are meant to be creative. See past examples at the end of this text, and clarifications in the rubric and notes.

Write a program in JavaScript/WebGL that draws an animated scene with the requirements given below. Template code is provided to handle many of the aspects necessary to get basic objects rendered with textures and moving in a scene. Use the template code found on the course webpage. There are important notes about the template code regarding textures below! You will very likely have to address these, so make sure to read them carefully.

There is a lot of information here; it is for your benefit. Make sure you read the assignment text from beginning to end. If you have questions, consult this assignment text first. In Labs 5 and 7 we modify the assignment basecode to support textures in multiple ways. I strongly suggest using Lab 5 (with help from Lab 7 later if you need it) as your starting point! Lab 5 is in the syllabus; Lab 7 will come later. Note that you will not be able to complete all components immediately. There are two key upcoming labs, 5 and 6 (with additional hints in 7 for those who need additional insights near the due date of the assignment), and several lectures (rasterization, textures, and lighting) that are needed to complete this assignment.

Marking Scheme

Total Marks: 47

1. [4 Marks] At least one hierarchical object of at least three levels in the hierarchy (e.g. a human arm: body -> shoulder -> elbow ...) where joint motion clearly shows the interaction between levels. A good example of this is the legs in A1; note that legs in A2 will not count (you cannot simply use the legs to fulfill this requirement).

2. [4 Marks] 360-degree camera fly-around using lookAt() and setMV() to move the camera in a circle while focusing on a point that the camera is circling. This can be a single fly-around, part of a composed scene, or a loop.

3. [4 Marks] Connection to real-time. You should make sure that your scene runs in real-time on fast enough machines. Real-time means that one simulated second corresponds roughly to one real second.

4. [6 Marks] Make use of at least two textures, either procedural or mapped. You must map them to (an) object(s) in a meaningful way. Using the textures from the lab-modified assignment base code does not count toward the two. Simply placing a texture on a default object using the default object coordinates does not count. Using textures as in the lab code with no meaningful or non-trivial development does not count.

5. [5 Marks] Convert the ADS shader in the assignment base code from a vertex shader to a fragment shader. You need to compute the lighting equation per fragment.

6. [2 Marks] Convert the Phong model to Blinn-Phong in the new fragment shader created in item 5 (see the note following the Clarifications section below).

7. [5 Marks] At least one shader effect designed from scratch to perform a clearly visible novel effect (novel w.r.t. the basecode and labs). This can be directly incorporated in the given shader basecode, or added to the HTML file, loaded, and compiled as an additional shader program that you make use of. Each line of your shader code must be commented, clearly explaining exactly what the following line does and why.
You must clearly identify the purpose and effect the shader produces in the submitted README. Note that some really cool effects require very little code. Note that your effect can use a texture and thus may count as part of your novel texture count above (you need to document this in your readme)! Think about how lighting works, how surfaces work, and how your favourite games, movies, and comics look. Past examples: a spotlight rather than a directional light, cel-shading, a swirl effect, water caustics, blur, glow, edge highlighting, x-ray, CRT (retro gaming TV), etc. Please be careful here: it is very tempting to pull effects from Shadertoy, YouTube gurus, or some other shader resource. You need to complete this item yourself! You are better off working out and coding a simple one-line effect that is well explained than copying someone else's effect (one can get you full marks, and the other can lead to an academic integrity case).

8. [5 Marks] Complexity: scene setup and design, movement of animated elements, and programming.

9. [5 Marks] Creativity: storytelling, scene design, object appearance and other artistic elements.

10. [5 Marks] Quality: attention to detail, modelling quality, rendering quality, motion control.

11. [2 Marks] Programming style.

12. [-2 Marks if not] Make and submit a movie of your complete scene. The movie should be the resolution of your canvas, and in a standard file format/codec such as mp4. Include a cover image (png or jpg) from your movie. You may use any screen capture program that is available (e.g. ShareX). Some additional info below.

13. [-4 Marks if not] Provide a readme.txt that describes what you have done, what you have omitted, and any other information that will help the grader evaluate your work, including what is stated below.

Requirements/Policies

Collaboration: None.

Original Work: The assignment must be done from scratch. Apart from the provided template and labs, you should not use code from any other source, including the previous offering of the class. Note there are many shader sources on the internet. Please develop your own first! You can seek inspiration, but do not use others' code.

Zero Mark: If the code does not run, no objects appear in the window, or only the template code is running properly, no partial marks will be given.

Clarifications

· Note that creativity and complexity are both important for this assignment, but are in many ways subjective. The mark will be applied objectively in a way that reflects whether you've done something non-trivial w.r.t. the given code. We are not subjectively judging your artistic ability.
· Make sure you rotate the hierarchical elements from [1] around the correct point, i.e. where they touch the parent body parts, so bodies do not appear to break apart.
· Start your code in main.js:render() and feel free to write additional functions.
· You must do the assignment from scratch. If you use existing code (e.g. A1, Lab 5, Lab 7) then you must state that clearly in your readme.txt file. You are not allowed to use any piece of code from any other source. Using any piece of code from any source (including previous offerings of the course, the web, etc.) without attribution will be considered plagiarism.
· You can use libraries as stated below. You will not get credit for code you have not written; however, it might help with the visual complexity of your scene. For example, if you use an existing 3D model for required element 1, then you will not get credit for it. See below for more details.
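A note on marking item 6: the Phong-to-Blinn-Phong conversion changes only the specular term. With N the surface normal, L the light direction, V the view direction, k_s the specular coefficient and n the shininess exponent, the standard forms are:

    \text{Phong: } k_s\,\big(\max(R\cdot V,\,0)\big)^{n},\qquad R = 2(N\cdot L)N - L

    \text{Blinn-Phong: } k_s\,\big(\max(N\cdot H,\,0)\big)^{n},\qquad H = \frac{L+V}{\lVert L+V\rVert}

In the fragment shader this amounts to computing the halfway vector H instead of the reflection vector R.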
Project Submission Instructions

1. Make a backup copy of your project folder first to avoid accidentally deleting your work.
2. Rename the parent project folder to include your name.
3. Remove any unnecessary files from the assignment. You need to submit all files required to run your assignment and nothing more.
4. Archive the project folder in a zip folder and call it csc305_assignment_2.zip.
5. Test the archive:
   a. Unzip the archive in a different location.
   b. Make sure everything works as expected.
6. Submit the zip file on Brightspace.

We want to be able to open your zip file, run it, and see your project right away. Also, note that you can add as many files as you want to the project, as well as modify any settings that you need to (see below). However, it would be useful if you stated unusual settings and additional files in the readme.txt.

Movie Submission Instructions

1. There is lots of screen capture software that simplifies screen recording. On Windows, ShareX is very good.
2. Call your movie file <firstname>_<lastname>_movie.[mpg,mp4], replacing <firstname> and <lastname> with your first name and last name.
3. Call your image file <firstname>_<lastname>_image.jpg.
4. Zip the two files in an archive called <firstname>_<lastname>_bundle.zip.
5. Submit the file on Brightspace.

Security Issues with Textures

If the textures do not work, check the console for an error message related to "Cross-origin image". To use textures in WebGL we have to bypass a security issue that is present in most browsers. The easiest way to handle this is with the Python HTTP server approach we discussed in the lab. This is how we will mark your assignment.

1. Open explorer/file manager and navigate to the code folder.
2. Shift-click in the file list and open a command prompt or PowerShell there. Alternatively, you can open a command prompt or PowerShell from the Windows menus and then use cd [folder] to navigate to the correct folder.
3. Now run the following: python -m http.server 8080
4. You should now be able to open a browser and type localhost:8080 in the URL bar.
5. You'll see your files being hosted on a simple local server; you can now click your html file.

FAQ

1. Are there any size limitations to the project? i.e. number of classes, project size, memory, number of textures, texture resolution, etc.?
A: In terms of texture resolution, number of textures, etc., there are:
   a. Hardware and software limitations. There is a maximum number of textures you can have active at the same time, and there is a maximum size for textures.
   b. Practical limitations. Brightspace has a 1 GB limit on what you can upload and a recommended 40 MB maximum. You also want your project to run in real-time.
2. What should be the duration of the movie?
A: However long it takes to capture it generally; if it loops you may want to capture two or three loops. Often these are between 20-60 seconds.
3. Can we use images from the web?
A: As long as the images are free for public use, sure. Do not use assets that are copyrighted. Note that the filters on Google Images are rarely correct w.r.t. usage rights. Check out resources like:
   a. OpenGameArt https://opengameart.org/
   b. Wiki Commons https://commons.wikimedia.org/wiki/Main_Page
   c. PARIS MUSÉES https://www.parismuseescollections.paris.fr/en
   d. Smithsonian Open Access https://www.si.edu/OpenAccess
4. Can we use external libraries?
A: As long as they (a) are free for public use, (b) do not make the required elements unfairly easier for you (do not use an engine, tool, package, module, or any resource that solves a required element for you), and (c) the TA can run your project, you can use additional libraries. However, if the TA cannot run your program because of missing dependencies etc., you will have a problem. Generally, that means they need to be freely available and you need to submit the complete assignment. No setup or downloading on our part should be required to get it to run.

5. Can we use audio?
A: You can; however, 3) and 4) also apply here.

6. How ambitious can we be?
A: I encourage you to take this opportunity to explore your programming abilities and your creative interests. However, I advise you not to aim too high unless you are confident about what you are doing. Above all, FIRST make sure that you have covered the required elements of the assignment. THEN, add complexity incrementally in a way that does not break your base of working elements.

Hints

· Use the timestamp or dt variables to synchronize your animation elements.
· Use setColor(r,g,b) to set the desired colours.
· For the motions, you may want to use functions such as:

  1. Wave functions: x(timestamp) = A * Math.cos(w * timestamp + h), where timestamp is real time, A is the amplitude, Math.cos is the cosine function, w is the angular frequency, and h is the phase.
  2. Euler integration for movement: x(t + dt) = x(t) + x'(t) * dt. The next position in time, x(t + dt), is just the current position x(t) plus the velocity x'(t) scaled by the change in time (dt).

· Common mistakes with camera motion:

  eye[0] = eye[0]*Math.cos(theta) + eye[2]*Math.sin(theta);
  eye[2] = -eye[0]*Math.sin(theta) + eye[2]*Math.cos(theta); // incorrect: uses the newly computed eye[0] instead of the original eye[0]
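A sketch of the corrected update order (written in Python for consistency with the other examples in this document; the same fix applies verbatim to the JavaScript above): save the original coordinates before overwriting them.

    import math

    def orbit(eye, theta):
        # Copy the old coordinates first so the second assignment does not
        # consume the newly written eye[0] (the mistake shown above).
        x0, z0 = eye[0], eye[2]
        eye[0] = x0 * math.cos(theta) + z0 * math.sin(theta)
        eye[2] = -x0 * math.sin(theta) + z0 * math.cos(theta)
        return eye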
ECO1002 Business Economics 2
Assignment 2

Question 1 (10 marks)

a. Classify the following persons as employed, unemployed or not in the labour force. Put a tick (✓) in the appropriate boxes. (5 marks)

Person | Description | Employed | Unemployed | Not in labour force
A | Museum guard; was not at work last week due to illness | | |
B | Part-time sales assistant at ZARA; actively looking for full-time work | | |
C | Unpaid stay-at-home dad; has not looked for a job in several years | | |
D | Retired professor | | |
E | Has never been employed; looked for a job last week | | |

b. "The minimum value of the unemployment rate is zero, and it happens when the economy performs well." Is this statement TRUE or FALSE? Explain. (5 marks)

Question 2 (6 marks)

Suppose that banks do not hold any excess reserves and people hold zero cash. If the central bank buys $10 million of government bonds from the banking system, what will be the effect on the economy's reserves and money supply, given that the reserve requirement for deposits is 15 percent? What type of monetary policy is the central bank adopting in this context? Can you name TWO other monetary policies that would lead to a similar impact on the economy's money supply?

Question 3 (4 marks)

According to the Fisher effect, how does an increase in the inflation rate affect the real interest rate and the nominal interest rate?

Question 4 (10 marks)

Suppose an economy is in its long-run equilibrium.

a. Draw a diagram to illustrate the state of the economy. Be sure to show aggregate demand, short-run aggregate supply, and long-run aggregate supply. Label the axes. (2 marks)

b. Now suppose that an investment tax credit is removed. Use your diagram in (a) to show what happens to output, the price level and the unemployment rate in the short run. (8 marks)
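A hint for Question 2 above: with no excess reserves and no currency held by the public, the textbook money multiplier is 1/rr, so a reserve injection ΔR changes the money supply by

    \Delta M \;=\; \frac{1}{rr}\,\Delta R \;=\; \frac{1}{0.15}\times \$10\text{ million} \;\approx\; \$66.67\text{ million}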
CIS 5530: Project 2
A DHT-based Search Engine
Spring 2025

Milestone 1 due Mar. 31
Full solution due Apr. 16

Directions

You must work in groups of two for this project. Please regularly check Ed throughout this course for clarifications on project specifications. You are required to version control your code, but please only use the GitHub repository created for you by the 5530 staff. You should have a different repo for project 2 vs. the one used in project 1, even if your team members stayed the same. Do not work in public GitHub repositories! Please avoid publishing this project at any time, even post-submission, to observe course integrity policies. If you are caught using code from other groups or any sources from public code repositories, your entire group will receive ZERO for this assignment and will be sent to the Office of Student Conduct, where there will be additional sanctions imposed by the university.

1 Overview

In this project, you will implement a peer-to-peer search engine (PennSearch) that runs over your implementation of the Chord Distributed Hash Table (PennChord). You are expected to read the entire Chord paper carefully, clarify doubts with your TAs or instructor, or post questions on Ed.

You will start by using standard OLSR as your routing protocol. The use of OLSR can be turned on using a command line flag --routing=NS3 to simulator-main.cc. You are then responsible for first developing Chord as an overlay network layered on top of your routing protocol, followed by building the search engine application that uses Chord. As extra credit, you can use your routing protocols from Project 1 and run Project 2 over the Project 1 network layer (using --routing=LS/DV).

To help you get started, files related to Project 2 include:

• simulator-main.cc: In addition to what you learned in project 1, it has SEARCH_LOG() and CHORD_LOG() functions to log all messages relevant to the search engine and Chord overlay, respectively. It also includes modules for CHORD and PENNSEARCH.

• scenarios/pennsearch.sce: An example scenario file that contains a simple 3-node Chord network, the publishing of document keywords by two nodes, and three example queries.

• keys/metadata0.keys, keys/metadata1.keys: These keys are located inside the contrib/upenn-cis553/ directory, not in the scenario directory. Each file contains the meta-data for a set of documents, where each row "DOC T1 T2 T3 ..." is a set of keywords (in this case, T1, T2, and T3) that can be used to search for a document with identifier DOC. Each document identifier can be a web URL or a library catalog number; for the purpose of this project, they are simply a string of bytes that uniquely represents each document. In practice, these keywords are generated by reading/parsing web content or other documents to extract keywords. However, since parsing web pages is not the focus of this project, we have skipped this step and supplied these document keywords to you.

• penn-chord-*.[h/cc]: Skeleton code for your Chord implementation.

• penn-search-*.[h/cc]: Skeleton code for your PennSearch implementation.

• grader-logs.[h/cc]: These files contain the functions you need to invoke for the autograder to correctly parse your submission. Please read the documentation in the header file.

The command to compile and run Project 2 is the same as in Sections 2.3 and 2.4 of the Project 1 code documentation.
Please do not use the --result-check flag in the command, and please comment out calls to checkNeigbhorTableEntry() and checkRoutingTableEntry() if you use your LS and DV implementation. You can also opt to use ns-3's OLSR implementation instead of your own LS and DV. Note that our skeleton code is a starting point; you are allowed to be creative and structure your code based on your own design. For regular credit, however, you are not allowed to make any code changes outside of the upenn-cis553 directory, nor are you allowed to change simulator-main.cc. In addition to our sample test script, we expect you to design additional test cases that demonstrate the correctness of your PennSearch implementation. We encourage you to get started early. To that end, we have included Milestone 1, by which you should have a basic Chord implementation.

2 PennChord

Chord nodes will execute as an overlay network on top of your existing routing protocol, i.e., all messages between any two Chord nodes 1 and 2 will traverse the shortest path computed by the underlying routing protocol. The PennChord overlay network should be started only after the routing protocol has converged (i.e., finished computing all the routing tables). You can assume that the links used in the underlying routing protocol computation do not change while PennChord is executing. Also, not all nodes need to participate in the Chord overlay, i.e., the number of nodes in your underlying network topology may be larger than the number of nodes in the Chord overlay. Your PennChord implementation should include finger tables and the use of 32-bit cryptographic hashes.

Correct Chord Behavior

We will also be using our own test cases to ensure correct Chord behavior. For simplicity, assume each node only maintains one successor (i.e., the closest node in the clockwise direction). You have to implement the actual stabilization functionality described in the Chord paper. In particular, we will check for the following features (a toy illustration of the ring-lookup idea follows this list):

• Correct lookup. All lookups routed via Chord must reach the correct node via the right intermediate nodes. You need to support the basic Chord API, IPAddr ← lookup(k), as described in lecture. This API is not exposed to the scenario file explicitly but is used by PennSearch to locate nodes responsible for storing a given keyword.

• Consistent routing. All nodes agree on lookup(k).

• Well-formed ring. For each node n, its successor's predecessor is itself. See Ring State described in the autograder section.

• Correct storage. Every item k is stored at the correct node (i.e., lookup(k)).

• Performance. For a given network size, you need to compute and output the average hop count required by Chord lookups that occur during your simulation. In other words, you should implement code that captures the number of hops required by each lookup and, when the simulation ends, outputs the average hop count across all lookups using CHORD_LOG. The average hop count must exclude lookups that are generated during periodic finger fixing. After finger fixing is correctly implemented, the average hop count should be log(N) instead of N, where N is the number of nodes in the Chord overlay network.

• Stabilization protocol. Enough debugging messages (not too many!) to show us that periodic stabilization is happening correctly. Since stabilization generates large numbers of messages, you should provide mechanisms to turn such debugging on/off in your code.
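To make the lookup requirements concrete, here is a toy, centralized sketch of the ring-successor rule in Python. It only illustrates the invariant your distributed implementation must satisfy; the names sha32 and ToyRing are invented for this sketch, and the project's own helper is createShaKey.

    import bisect
    import hashlib

    def sha32(s):
        # 32-bit key from a SHA-1 digest (illustrative stand-in for createShaKey).
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2**32)

    class ToyRing:
        def __init__(self, node_names):
            # Nodes sit on the ring at their hashed identifiers.
            self.ids = sorted(sha32(n) for n in node_names)

        def lookup(self, key):
            # successor(k): first node clockwise from hash(key), wrapping around.
            k = sha32(key)
            i = bisect.bisect_left(self.ids, k)
            return self.ids[i % len(self.ids)]

    ring = ToyRing(["node0", "node1", "node2"])
    print(ring.lookup("T1"))  # identifier of the node responsible for keyword T1

Whatever path a distributed lookup takes through intermediate nodes and finger tables, every node must agree with this successor rule for consistent routing and correct storage.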
Summary of Commands

• Start landmark node: Designate a node as your landmark (i.e., node 0). E.g., 0 PENNSEARCH CHORD JOIN 0 will designate node 0 as the landmark node, since the source and landmark nodes are the same.

• Nodes join: A PennChord node joins the overlay via the initial landmark node. E.g., 1 PENNSEARCH CHORD JOIN 0 will allow node 1 to join via the landmark node 0. Once a node has joined the network, items stored at the successor must be redistributed to the new node according to the Chord protocol. For simplicity, you can assume all joins and leaves are sequential, i.e., space all your join events far enough apart that the successors and predecessors are updated before the next join occurs.

• Voluntary node departure: A PennChord node leaves the Chord network by informing its successor and predecessor of its departure. E.g., 1 PENNSEARCH CHORD LEAVE will result in node 1 leaving the PennChord network. All stored data items should be redistributed to neighboring nodes accordingly. For simplicity, you can assume all joins and leaves are sequential.

Cryptographic Hashes

We provide a helper class to generate 32-bit cryptographic hashes from IP addresses and strings. It depends on the OpenSSL libcrypto library. To generate a digest for a message, use:

createShaKey(ipAddress) // or createShaKey(string)

3 PennSearch

To test your Chord implementation, we will require you to write PennSearch, a simple keyword-based search engine.

Basics of Information Retrieval

We first provide some basic knowledge that you will need to understand keyword-based information retrieval. Consider the following three sets of document keywords, one for each of Doc1, Doc2, and Doc3:

Doc1 T1 T2
Doc2 T1 T2 T3 T4
Doc3 T2 T3 T4 T5

Doc1 is searchable by keywords T1 or T2. Doc2 is searchable by T1, T2, T3, and T4, and Doc3 is searchable by T2, T3, T4, and T5. Typically, these searchable keywords are extracted from the actual documents with identifiers Doc1, Doc2, and Doc3.

Based on these keywords, the inverted lists are {Doc1, Doc2} for T1, {Doc1, Doc2, Doc3} for T2, {Doc2, Doc3} for T3, {Doc2, Doc3} for T4, and {Doc3} for T5. Each inverted list for a given keyword stores the set of documents that can be searched using that keyword. In a DHT-based search engine, we store the inverted list for each keyword Tn at the node whose Chord node is responsible for the key range that includes hash(Tn).

A query for keywords "T1 AND T2" will return the document identifiers "Doc1" and "Doc2"; the results are obtained by intersecting the sets {Doc1, Doc2} and {Doc1, Doc2, Doc3}, which are the inverted lists of T1 and T2 respectively (a small code sketch of this construction follows below). We only deal with AND queries in PennSearch, so you can ignore queries such as "T1 OR T2". Note that the query result is not the actual content of the documents, but rather the document identifiers representing documents that include both T1 and T2. In a typical search engine, an extra document retrieval phase occurs at this point to fetch the actual documents. We consider the actual document content retrieval step out of the scope of this project.
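Using the document's own example data, a minimal sketch of building inverted lists and answering an AND query by left-to-right intersection (plain Python sets; in PennSearch the lists live on the nodes that the keywords hash to):

    docs = {
        "Doc1": ["T1", "T2"],
        "Doc2": ["T1", "T2", "T3", "T4"],
        "Doc3": ["T2", "T3", "T4", "T5"],
    }

    # Invert: keyword -> set of document identifiers.
    inverted = {}
    for doc_id, terms in docs.items():
        for t in terms:
            inverted.setdefault(t, set()).add(doc_id)

    def search(*terms):
        # Left-to-right intersection, as the spec prescribes.
        result = set(inverted.get(terms[0], set()))
        for t in terms[1:]:
            result &= inverted.get(t, set())
        return result

    print(search("T1", "T2"))  # {'Doc1', 'Doc2'}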
Summary of Commands

• Inverted list publishing: 2 PENNSEARCH PUBLISH metadata0.keys means that node 2 reads the document metadata file named metadata0.keys. Node 2 then reads each line, which is of the form Doc0 T1 T2 T3 ..., meaning that Doc0 is searchable by T1, T2, or T3. After reading the metadata0.keys file, node 2 constructs an inverted list for each keyword it encounters and then publishes the respective inverted indices for each keyword into the PennChord overlay. For instance, if the inverted list for "T2" is "Doc1, Doc2", the command publishes the inverted list "Doc1, Doc2" to the node that T2 is hashed to. This node can be determined via a Chord lookup on hash(T2). As a simplification, the inverted lists are append-only, i.e., new DocIDs are added to the set of existing document identifiers for a given keyword, but never deleted from an inverted list.

• Search query: 1 PENNSEARCH SEARCH 4 T1 T2 will initiate the search query from node 1, which takes the following steps via node 4: (a) node 1 contacts node 4 with the query "T1 AND T2"; (b) node 4 issues a Chord lookup to find the node that stores the inverted list of T1, i.e., the node that T1 is hashed to (e.g., Node T1), and sends the query "T1 AND T2" to Node T1; (c) Node T1 retrieves the inverted list for T1 from its local store, issues a Chord lookup to find the node that T2 is hashed to (e.g., Node T2), and sends the retrieved inverted list for T1 together with the query "T2" to Node T2; (d) Node T2 sends the intersection of the inverted lists of T1 and T2 as the final result back either directly to node 1, or to node 4 (which forwards the results to node 1). If there are no matching documents, a "no result" is returned to node 1. For simplicity, you can assume there is no search optimization, so inverted lists are intersected based on the left-to-right ordering of search terms. Note that a search may contain an arbitrary number of search terms, e.g., 1 PENNSEARCH SEARCH 4 T1 T2 T3.

NOTES

• You can assume that each document identifier appears in only one metadataX.keys file. For instance, if node 0 publishes metadata0.keys and node 1 publishes metadata1.keys, you can assume that the two files do not contain overlapping document identifiers. This emulates the fact that, in practice, each node will publish inverted indexes for documents that it owns, and one can assume that each node owns a unique set of documents. On the other hand, each searchable keyword may return multiple document identifiers. For instance, there are two documents, Doc2 and Doc3, that can be searched using the keyword T3.

• You can assume that only nodes that participate in the PennSearch overlay have permission to read document keywords and publish inverted indexes. Nodes outside the Chord network may initiate SEARCH by contacting any node that is part of the PennSearch overlay. For instance, in our example command above, 1 PENNSEARCH SEARCH 4 T1 T2 means that node 1 (which may be a node outside the PennSearch overlay) can issue a search query for "T1 AND T2" via node 4, which is already part of the PennSearch overlay.

4 Milestones

• Milestone 1: (15%) (Autograded) We expect a basic Chord implementation where the ring stabilization protocol works correctly. At this stage, the finger table implementation is optional. We require only the ring-state output, to make sure your ring is formed correctly and maintained as nodes enter/leave the Chord overlay.

• Milestone 2a (39%), Milestone 2b (46%) (Both autograded) Complete working implementation of PennChord and PennSearch. The milestone is split into two parts for easier submission.
Criteria for Milestone 2A:

– Nodes join the ring (3 points)
  * event: node 0 - node 19 join the ring
  * result: should see lookup issue, lookup request, and lookup result logs

– Well-formed ring (3 points)
  * event: 3 PENNSEARCH CHORD RINGSTATE
  * result: check for a well-formed ring; can also check the request path above

– Publish keyword metadata (8 points, 2 points each)
  * events:
    2 PENNSEARCH PUBLISH metadata2.keys
    3 PENNSEARCH PUBLISH metadata3.keys
    4 PENNSEARCH PUBLISH metadata4.keys
    5 PENNSEARCH PUBLISH metadata5.keys
  * result: should see publish and store logs

– Search correctness (9 points, 3 points each)
  * event: 1 PENNSEARCH SEARCH 4 Johnny-Depp
  * result: Pirates-of-the-Caribbean Alice-in-Wonderland
  * event: 3 PENNSEARCH SEARCH 10 Johnny-Depp Keira-Knightley
  * result: Pirates-of-the-Caribbean
  * event: 8 PENNSEARCH SEARCH 17 George-Clooney Brad-Pitt Matt-Damon
  * result: Ocean's-Eleven Ocean's-Twelve

– Search consistency (4 points)
  * events:
    2 PENNSEARCH SEARCH 12 Brad-Pitt
    3 PENNSEARCH SEARCH 13 Brad-Pitt
    4 PENNSEARCH SEARCH 14 Brad-Pitt
  * result: should all return Mr-Mrs-Smith Ocean's-Eleven Ocean's-Twelve Fight-Club

– Multiple searches from one node at the same time (9 points, 3 points each)
  * event: 15 PENNSEARCH SEARCH 15 Tom-Hardy
  * result: The-Dark-Knight-Rises Mad-Max
  * event: 15 PENNSEARCH SEARCH 3 Emilia-Clarke
  * result: Game-of-Thrones
  * event: 15 PENNSEARCH SEARCH 12 Chadwick-Boseemann
  * result: No such file

– Non-Chord node issues a search (3 points)
  * event: 25 PENNSEARCH SEARCH 9 Tom-Hanks
  * result: Forrest-Gump Toy-Story
  * event: 21 PENNSEARCH SEARCH 16 Jeremy-Renner
  * result: Arrival

Criteria for Milestone 2B:

Milestone 2B will test correctness for more advanced cases. It is currently designed as an unseen test, but the autograder will give information about any wrong implementations and hints on how to fix them. Important: in this milestone (2b), you will need to change your Ring State implementation to indicate that all ring state messages for the current ring have been printed to stdout, by printing an End of Ring State message after the last node in the ring has printed its ring state message.

Autograder

The autograder parses specific logs from your submission. Please make sure you have all of your own printouts commented out before making your submission. Having your own printouts interleaved with the logs the autograder expects may cause it to crash. You will use your GitHub repo and the Gradescope infrastructure for your submissions. In the following logs, "(...)" means to use the corresponding arguments for the function. Please take a moment to read the comments in grader-logs.h.

• Ring state (MS1): At any node X, a X PENNSEARCH CHORD RINGSTATE command will initiate a ring output message. The node receiving this command should call the following function to log its ring state; after that it should pass a message to its successor, which should also call the following function, and so on until X is reached again.

GraderLogs::RingState(...)

You should see something like this for every node in the ring. Note that if your implementation is correct, the IDs, IPs, and hashes should all form a doubly linked list.
Ring State
Curr
Pred
Succ

The last node in the ring state (X's predecessor) should call the following function after invoking RingState:

GraderLogs::EndOfRingState()

• Lookup issue: Every time a node issues a lookup request, the following should be called:

CHORD_LOG(GraderLogs::GetLookupIssueLogStr(...))

• Lookup forwarding: Every time a node forwards a lookup request, the following should be called:

CHORD_LOG(GraderLogs::GetLookupForwardingLogStr(...))

• Lookup results: Every time a node returns a result in response to a lookup request back to the node that originated the initial lookup request, the following should be called:

CHORD_LOG(GraderLogs::GetLookupResultLogStr(...))

• Inverted list publish: Whenever a node publishes a new inverted list entry (an entry is a key-value pair), the following should be called:

SEARCH_LOG(GraderLogs::GetPublishLogStr(...))

• Inverted list storage: Whenever a node (that the keyword is hashed to) receives a new inverted list entry to be stored, the following should be called:

SEARCH_LOG(GraderLogs::GetStoreLogStr(...))

Note that if CHORD_LOG is turned on, between each Publish and Store output message we should see a series of lookup output messages. Also, note that when a node leaves the ring, this should trigger its keys to be transferred to another node. This transfer should result in new store messages.

• Search: Whenever a node issues a search query with terms T1, T2, ..., Tn, the following should be called:

SEARCH_LOG(GraderLogs::GetSearchLogStr(...))

• Inverted list shipping: For each inverted list being shipped in the process of the search, the following should be called:

SEARCH_LOG(GraderLogs::GetInvertedListShipLogStr(...))

• Search results: At the end of intersecting all keywords (T1, T2, ..., Tn), output the final document list that is being sent back to the initial query node:

SEARCH_LOG(GraderLogs::GetSearchResultsLogStr(...))

• Average hop count: The following function should be called in the destructor of each node's Chord layer. Note that only hops related to searches are counted here (i.e., the search layer, not the Chord layer). For MS1 you should see O(N) hops, and for MS2 you should see O(log(N)) hops. Example: if node 8 gets a lookup related to a search, then forwards it to node 10 (hop 1), and node 10 forwards it to node 15 (hop 2), and node 15 sends the response back to node 8, the response adds no hop (it is a response, not a lookup anymore!).

GraderLogs::AverageHopCount(...)

Hints

• Own address: Use the following API to obtain the current node's IP in the application layer:

Ipv4Address GetLocalAddress();

• Node hash key: Use the following API to obtain a node hash:

uint32_t CreateShaKey(const Ipv4Address &ip)

• Callbacks: For MS2, make sure you understand the callbacks implementation and check how pings are implemented in the Chord layer. Note that these are different pings from project 1, which used network-layer pings; these are overlay network pings.

This is the callback setter in Chord:

void PennChord::SetPingSuccessCallback(Callback pingSuccessFn) {
  m_pingSuccessFn = pingSuccessFn;
}

This is PennSearch using the setter to register a callback (i.e., "Hey Chord, when you have a PingSuccess, please execute my function HandleChordPingSuccess"):

// Configure callbacks with Chord
m_chord->SetPingSuccessCallback(
    MakeCallback(&PennSearch::HandleChordPingSuccess, this));

This is the actual function in PennSearch:

void PennSearch::HandleChordPingSuccess(Ipv4Address destAddress, std::string message) {
  SEARCH_LOG("Chord Ping Success!
...."); // Send ping via search layer SendPennSearchPing(destAddress, message); } This is Chord executing the callback: 8 m_pingSuccessFn(sourceAddress, message.GetPingRsp().pingMessage); Extra Credit We have suggested several avenues for extra credit, to enable students to experiment with the challenges faced in designing the Chord protocol to work 24x7 over the wide-area. Note that doing extra credit is entirely optional. We offer extra-credit problems as a way of providing challenges for those students with both the time and interest to pursue certain issues in more depth. Similar to Project 1, extra credit will be submitted separately from regular credit on Gradescope. You can demo your extra credit to your assigned TA after the deadline. Make sure to document your work, design choices, and test cases. • Demonstrate using your LS/DV from project 1 (5%): If you use your own LS or DV instead of the default ns-3 routing for your demo, you will get 5% extra credit. This may require some modifications to your project 1 code to get this to work properly, in particular, implement routeInput and routeOutput functionality, which is not tested by our autograder. • Chord file system (20%). The Chord File System (CFS) is a fairly sophisticated application that uses Chord. Hence, this has more extra credit than other applications. You need to demonstrate to us that you can take a few fairly large files, store it in CFS, and retrieve them correctly. To earn the entire 20%, you need to implement all the functionalities described in the CFS paper. You can search online for the MIT CFS paper. • Chord performance enhancements (5%): Chord has O(log N) hop performance in the virtual ring. However, the adjacent nodes in the ring may actually be far away in terms of network distance. How can one take advantage of network distance information (gathered via the link/path costs from ns-3 or your project 1 routing implementation) to select better Chord neighbors? Do some research online to find techniques explored by others to solve this problem and implement one such solution. • Churn handling and failure detection (10%): Add support to emulate failures of PennChord nodes, mechanisms to detect failures of PennChord nodes, and repair the routing state in PennChord accordingly when failures occur. To ensure robustness, each node needs to maintain multiple successors. All churn events (node joins, leaves, or failures) can happen concurrently, and a node that fails can rejoins the overlay later. You can, however, not concern yourself with failures at the network layer, i.e., all failures occur at the application-layer node. • PennSearch on mobile devices (20%). Using your Android phones, demonstrate a small (2-3) PennSearch network running on mobile devices. This requires taking the ns-3 code and compiling it on Android. If you do not have an Android, you can use an Android emulator. 9
Assignment 2
Due: Friday 03/26/2025 @ 11:59pm EST

Disclaimer

I encourage you to work together; I am a firm believer that we are at our best (and learn better) when we communicate with our peers. Perspective is incredibly important when it comes to solving problems, and sometimes it takes talking to other humans (or rubber ducks, in the case of programmers) to gain a perspective we normally would not be able to achieve on our own. The only thing I ask is that you report who you work with: this is not to punish anyone, but instead will help me figure out what topics I need to spend extra time on/who to help. Reminder: you are not to share code with others, nor should you use virtual assistants to help in the development of your code in this assignment.

Setup: The Data

When processing languages without standardized spelling rules, or historical language that followed rules that are no longer standard, spelling normalization is a necessary first step. In this assignment, you will build a model that learns how to convert Shakespearean original English spelling to modern English spelling. The data in this assignment comes from the text of Hamlet from Shakespeare's First Folio in original and modern spelling. The training data contains all text up to where Hamlet dies (spoilers!), and the test data is the last 50 or so lines afterwards.

Included with this pdf is a directory called data. Inside this directory, you will find four files:

• data/train.old. This file contains training sequences (one sequence per line) of language that is spelled according to Shakespearean English spelling rules.
• data/train.new. This file contains the same training sequences (one sequence per line) of language that is spelled according to modern English spelling rules.
• data/test.old. This file contains test sequences (one sequence per line) of language that is spelled according to Shakespearean English spelling rules.
• data/test.new. This file contains the same test sequences (one sequence per line) of language that is spelled according to modern English spelling rules.

Section: Starter Code

In addition to the data directory, there are quite a few code files included with this pdf. Some of these files are organized into sub-directories for organization. The files and directories are:

• eval/: This directory contains all code files used to evaluate a model's performance. In this assignment we will use just one flavor of performance, Character Error Rate:
  – eval/cer.py: This file contains a metric called "Character Error Rate". This metric measures the average Levenshtein distance between n sequence pairs. Lower is better.
• vocab.py: This file contains the Vocab class. This class acts as a set of tokens and will allow you to convert between a raw token and its index into the set. This will become important in the future when we work with neural networks; for now this functionality isn't strictly necessary, but I want you to get used to seeing it.
• from_file.py: This file contains some useful functions for loading data from files.
• fst.py: This file contains a few types, the two most important being the Transition and FST classes. The tl;dr of this file is that I have implemented Finite-State Transducers for you, including normalization and composition. If you want to create a FST in code, you should instantiate the FST class. You will then need to set the start and accept states, and provide transitions expressed as Transition objects for the FST instance to update.
Whenever you want to look up transitions, you will need to examine one of three mappings in your FST instance (let's call the instance x):

  – x.transitions_from: This mapping contains all outgoing edges from a state. The key is a state, and the value is a mapping of Transition objects to their weights.
  – x.transitions_to: This mapping contains all incoming edges to a state. The key is a state, and the value is a mapping of Transition objects to their weights.
  – x.transitions_on: This mapping contains all edges that use the same input symbol. The key is an input symbol, and the value is a mapping of Transition objects to their weights.

One final helpful function in this file is called create_seq_fst. This function will, given a sequence, convert that sequence into a FST object.

• topsort.py: This file contains the code necessary to perform a topological ordering of the states of a FST instance. The entry-point function is called fst_topsort.

• models/: This directory contains the meat of this assignment. It contains three files, two of which you will need to complete:
  – models/lm.py: This file contains an implementation of a smoothed Kneser-Ney language model and has code to convert such a language model into a FST.
  – models/modernizer.py: This file contains the model you will be completing in this assignment.

The Typo Model (10 points)

In this section you will learn about the typo model FST we will use to modernize Shakespearean spelling. Specifically, I want you to complete the create_typo_model and init_typo_model methods in the Modernizer class. The typo model you will build is shown in the assignment figure (drawn for a language with only two tokens; yours will have more edges but the same states and topology). You can use the visualize method in the FST class to check that your typo model is constructed with the correct topology.

In create_typo_model I only want you to focus on creating an unweighted typo-model FST (with the correct topology); you will then set the weights of your typo-model FST in init_typo_model. When you initialize the weights of your typo model, keep in mind that most letters in Shakespearean English stay the same in modern spelling, so transitions that consume symbol a and emit symbol a should get a rather large (positive) weight compared to transitions of the form a : b, by a factor of 100 or so. Additionally, you should prefer transitions of the form a : b over ϵ transitions, although the magnitude of this preference isn't as critical as the preference for a : a over a : b transitions. When you are done assigning weights (remember, they must all be positive values), you can call the normalize_cond method on a FST object to normalize all the pmfs (a small sketch of this initialization idea follows).
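A minimal sketch of the weight-initialization idea, using a plain dict-of-dicts as a hypothetical stand-in for the FST's conditional weights (the real assignment sets weights on Transition objects and calls normalize_cond):

    EPS = ""  # epsilon symbol, represented here as the empty string
    alphabet = ["a", "b"]  # your real alphabet is much larger

    weights = {}
    for a in alphabet:
        # Strongly prefer identity transitions a:a over substitutions a:b.
        weights[a] = {b: (100.0 if a == b else 1.0) for b in alphabet}
        # Epsilon transitions are least preferred.
        weights[a][EPS] = 0.1

    # Conditional normalization: each row becomes a pmf over outputs,
    # which is what normalize_cond does on a FST.
    for a, row in weights.items():
        total = sum(row.values())
        for b in row:
            row[b] /= total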
Viterbi and Decoding (20 points)

Inside the Modernizer class there is a method stub called viterbi_traverse. In this assignment you will implement multiple flavors of the Viterbi algorithm, but all flavors share the same graph traversal. As discussed in class, you can separate the functionality into a traversal method (i.e. viterbi_traverse) and a suite of different Viterbi wrapper methods that perform the various flavors of Viterbi. Please see the method stub for viterbi_traverse for a description of the function pointer you should pass as an argument.

Once your viterbi_traverse method is functioning, I want you to complete the viterbi_decode method. This method is where you will implement Viterbi decoding, and it should use your viterbi_traverse method (a generic sketch of the decoding pattern follows below).

Finally, to get decoding working, you should complete the decode_seq method. In this method, you will use your viterbi_decode method to find the largest-weighted path (along with its weight, which is a log-probability) of a FST. You will construct this FST using the following logic:

1. Build a FST for the sequence w that is given as the argument to the decode_seq method. Call this FST Mw.
2. Compose MLM (i.e. the FST for the language model), MTM (i.e. the FST for the typo model) and Mw together to produce FST M. The ordering of the machines is important, but not the order of operations. See which order of operations uses less RAM and time, and go with that. See the method description for more details.
3. Run your Viterbi decoding algorithm on FST M.

Create a script called init.py that constructs your Modernizer, creates and initializes your typo model on the training data, and, using the decode method, prints the first 10 decodings on the test set along with their log-probabilities in a method called decode. This method should take no arguments and should produce your fully-created Modernizer. Include these printouts in your report. Also include in your report the Character Error Rate from decoding the entire test set. For full credit you should get at most 10%.

Forward and Brittle Training (35 points)

Inside the Modernizer class there is a method stub called viterbi_forward. This is where you will implement the forward algorithm described for HMMs, which works identically on FSTs. The good news is that it is only slightly different from viterbi_decode: while decoding calculates the largest-weighted path through the graph, the forward algorithm calculates the sum of all paths through the graph starting at the start state of the FST. This means that your viterbi_forward algorithm will be implemented almost identically to your viterbi_decode.

With your viterbi_forward algorithm working, you can use it to complete the loglikelihood method. This method calculates the log-likelihood of a collection of samples that are assumed to be distributed i.i.d. To calculate this, you should use your forward algorithm on each sequence w in the dataset to calculate Pr[w], log these values, and add up the log-probabilities. This sum of log-probabilities is what your loglikelihood method should return.

Once you can calculate the log-likelihood of the dataset, it is time to train the typo model using hard (brittle) expectation maximization. I am only asking you to implement the E-step, which is contained within the brittle_estep method. In this method, you need to calculate the "hard" counts for each (s, w) token pair, where s is a token in the modern English vocabulary and w is a token in the Shakespearean English vocabulary. To speed up this process (and also to get better results), we are going to leverage parallel data. Therefore, during your E-step, the FST you decode is constructed differently than the one in your decode_seq method. In your brittle_estep method, when considering a Shakespearean sequence w, you also have its modern spelling s:

1. First build Mw like before (i.e. convert w into a FST).
2. Now convert s into a FST to create Ms.
3. Compose together Ms with MTM and Mw. Note that in decode_seq, you used MLM in place of Ms.
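As referenced above, here is a generic sketch of Viterbi decoding over a topologically ordered graph. The function below is a hypothetical stand-in: the assignment's viterbi_traverse/viterbi_decode operate on the provided FST class, whose states you can order with fst_topsort.

    import math

    def viterbi_decode(states, edges, start, accept):
        # states: list of states in topological order
        # edges: dict mapping state -> list of (next_state, label, log_weight)
        best = {s: -math.inf for s in states}
        back = {}
        best[start] = 0.0
        for s in states:  # relax outgoing edges in topological order
            if best[s] == -math.inf:
                continue
            for nxt, label, w in edges.get(s, []):
                if best[s] + w > best[nxt]:
                    best[nxt] = best[s] + w
                    back[nxt] = (s, label)
        # Recover the best path's labels by walking backpointers from accept
        # (assumes accept is reachable from start).
        labels, s = [], accept
        while s != start:
            s, label = back[s]
            labels.append(label)
        return best[accept], labels[::-1]

The forward algorithm replaces the max in the relaxation step with a (log-space) sum, which is why the traversal can be shared across all flavors.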
Once you can calculate the log-likelihood of the dataset, it is time to train the typo model using hard (brittle) expectation maximization. I am only asking you to implement the E-step, which is contained within the brittle_estep method. In this method, you need to calculate the "hard" counts for each (s, w) token pair, where s is a token in the modern English vocabulary and w is a token in the Shakespearean English vocabulary. To speed up this process (and also to get better results), we are going to leverage parallel data. Therefore, during your E-step, the FST you decode is constructed differently than the one in your decode_seq method. In your brittle_estep method, when considering a Shakespearean sequence w, you also have its modern spelling s:
1. First build Mw like before (i.e. convert w into an FST).
2. Now convert s into an FST to create Ms.
3. Compose Ms with MTM and Mw together. Note that in decode_seq, you used MLM in place of Ms.

These hard counts are, as discussed in lecture, calculated by decoding a Shakespearean sequence w to get its most likely correction s* (at least according to the model right now), and treating that s* as if it were ground truth. The mapping of (s, w) pairs to their counts is what your brittle_estep method should return.

You are now ready to train the model using brittle-EM. Write a small script called hard_em.py that constructs your Modernizer, creates and initializes your typo and language models on the training data, and then trains your typo model on the training data until the log-likelihood converges, in a function called train. This function should take no arguments and produce a fully trained Modernizer. You will likely want to do some add-δ smoothing in your M-step (which I have already implemented for you; you just need to set the argument). I have already implemented the brittle_em method for you, so you don't need to implement it from scratch. Report a graph of the log-likelihood as a function of iteration (you may have to modify brittle_em to record the log-likelihoods as a function of iteration) in your report. What is the Character Error Rate of your model on the test set? For full credit it should be better than 7.5%.

Backward and Flexible Training (35 points)
Inside the Modernizer class, there is a method stub for a method called viterbi_backward. This method is where you will implement the backward algorithm, which is a mirror of the forward algorithm. While the forward algorithm calculates the sum of all paths from the start state to a node in the graph, the backward algorithm calculates the sum of all paths starting from a node and ending at the accept state. We calculate this by implementing the forward algorithm in the reverse direction: instead of traversing the vertices in topological order, we do so in reverse topological order, and instead of examining the incoming edges of a state, we examine its outgoing edges. As such, your backward algorithm should look suspiciously similar to your forward algorithm. Remember, they should both calculate Pr[w], so you can easily check correctness!

Once your backward algorithm is finished, you can complete the flexible_estep method. In this method, you still have access to parallel data just like in brittle_estep, so you should still build the same composed FST as in that method. The difference between the two is how you calculate your counts. The mapping you produce should have the same structure, but the values of the counts will differ, as in this method you should calculate "soft" counts instead of "hard" ones; see the sketch at the end of this section.

You are now ready to train the model using flexible-EM. Create a script called soft_em.py, almost identical to the last one, but train a model with soft-EM instead of hard-EM. Report a graph of the log-likelihood as a function of iteration in your report. Which flavor of EM performs better with respect to our Character Error Rate metric? Which flavor performs better with respect to log-likelihood? For full credit, your soft-EM model should also be better than 7.5% according to Character Error Rate.
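The soft-count computation is the usual forward-backward edge posterior. Purely as a hedged illustration (in probability space rather than log space, with hypothetical accessors like trans.dest, trans.input, and trans.output; your FST representation may differ), the expected count of an edge labeled s : w is the mass of all paths through that edge divided by the total mass of all paths:

    def soft_counts(fst, forward, backward, total_prob):
        """Expected (s, w) counts from edge posteriors:
        P(edge) = forward[u] * weight(u -> v) * backward[v] / Pr[w]."""
        counts = {}
        for state, edges in fst.transitions_from.items():  # assumed mapping
            for trans, weight in edges.items():
                post = forward[state] * weight * backward[trans.dest] / total_prob
                key = (trans.output, trans.input)  # assumed fields, as (s, w)
                counts[key] = counts.get(key, 0.0) + post
        return counts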
Submission
Please turn in all of the files in your repository to Gradescope. Due to a student in HW1, you have lost the ability to self-report your performance. Training a model will take tens of minutes apiece (mine takes around 30 minutes for hard-EM and longer for soft-EM), so please do not train on the autograder. Instead, you should save your model to disk (I recommend using pickle; a minimal sketch appears at the end of this assignment). When you turn in your code, please adjust soft_em.train and hard_em.train so that they load these files from disk into fully-trained Modernizers instead of training them from scratch. Please also submit the serialized models. Assume that on Gradescope they will be placed in the same directory as soft_em.py and hard_em.py. In your report you should include console output of having run your code as well as any observations you made during your experiments. If an instruction in this prompt says to report on something, you should include those elements in your pdf. Please do not handwrite these: typeset pdfs only. There will be separate links on Gradescope for your code and your pdf. Your code will be autograded, while your report will be graded by us.
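For the save/load step, something along these lines is enough (the module-level function names here are illustrative, not a required interface):

    import pickle

    def save_modernizer(modernizer, path="modernizer.pkl"):
        with open(path, "wb") as f:
            pickle.dump(modernizer, f)   # serialize the trained model

    def load_modernizer(path="modernizer.pkl"):
        with open(path, "rb") as f:
            return pickle.load(f)        # restore it inside train()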
In this assignment, we want you to apply the techniques that we have learned in character animation to add crouch motion to an animated character. You should continue the work from the tutorial on the character animation controller to add the crouch motions in this assignment. You should:
1. Enhance the character from our tutorial so that it can perform at least crouch idle, turn left or right during crouch, and crouch forward, with animation.
2. Build on what we have finished in the character animator tutorial, i.e. the character should still be able to walk, run, turn left and right, and jump.
3. Toggle crouching on with the "c" key. In the crouched state, pressing the "c" key again toggles it off and the character stands up.
During crouching, the character can perform at least four actions: crouch idle, crouch forward, crouch turn left, and crouch turn right. The implementation is entirely up to you; using a 2D blend tree is a natural choice. The animations needed also come with the standard assets we provided to you in the tutorial, so just take a look.
Submissions
Submit your completed level together with any other needed files (zip the complete project folder) in a single archive to your cloud drive, and send us the public link so that our tutor will be able to download and test it. Remember that you are required to submit the link to our Blackboard assignment page on or before the deadline. http://blackboard.cuhk.edu.hk
Scripting refers to the technique of writing scripts for pawns in 3D games so that the pawns behave as the game design requires. In this assignment, you are required to implement a finite state machine (FSM) in the Unity game engine to control some tanks with various behaviors, and to build a special weapon for the player.
Requirements
You can continue from the level built according to our tutorial document. You should:
1. write the scripts for two different behaviors to apply to two enemy tanks, and
2. write the scripts to produce a formation-based weapon for the player.
I. Finite State Machine
The tanks should have the following common behavior (in addition to the working styles described later):
1) The tank should continuously scan for whether the player is nearby. If the player is visible within a distance of 30 units, the tank should enter attack mode. The scanning process should be able to: I) detect the player within a forward view cone of angle 120 degrees and 30 units distance, and II) check visibility of the player through an unobstructed line of sight, i.e. no walls or other objects in between. (An engine-agnostic sketch of this check appears after the missile requirements below.)
2) In attack mode, the tank will first check whether the player is within attack distance (30 units). If yes, it will orient itself towards the player, fire a missile, wait for 1 second, and repeat step 2. Otherwise (e.g. the player tank has left), the tank will return to its normal (original working) mode.
In addition, the tanks should have two different styles of working:
1) Goal seeking. A goal point is set in the level (the flag of the player). The tank should proceed towards the goal point.
2) Patrol. Three waypoints will be set up in the level (ordinary GameObjects with colliders). The tank should patrol between these waypoints.
Each tank should have 300 life points, and each missile hit, whether from friendly fire or from the player, deducts 10 life points. At zero life points the tank is removed from the game. A fired shell should also be removed from the game if it collides with a tank or the player tank. However, you need not handle fired shells that never collide with a tank.
II. Homing Missiles
Homing missiles are very cool in action. We would like to implement a homing missile in this part. The missile should be able to track a moving target, i.e. the missile should update its course according to the position of the target. Your implementation should be as follows.
The homing missile is triggered when the player presses the right mouse button. Once it is pressed, the player tank should find all current enemy tanks within a radius of 50 units. For each tank within range, a homing missile should be created with that tank as its target, one missile per tank. On colliding with its target tank, the target tank should be eliminated immediately, regardless of its remaining life points.
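The view-cone test in requirement I.1 is just vector math. Here is a hedged, engine-agnostic Python sketch of the logic in 2D; in Unity you would use the corresponding Vector3 operations and a Physics.Raycast for the occlusion query (the blocked callback below stands in for that, and tank_forward is assumed to be a unit vector):

    import math

    def can_see_player(tank_pos, tank_forward, player_pos, blocked,
                       max_dist=30.0, cone_deg=120.0):
        """True if the player is within range, inside the forward view
        cone, and not occluded by level geometry."""
        to_player = (player_pos[0] - tank_pos[0], player_pos[1] - tank_pos[1])
        dist = math.hypot(*to_player)
        if dist > max_dist or dist == 0.0:
            return False
        # angle between the forward vector and the direction to the player
        dot = (tank_forward[0] * to_player[0]
               + tank_forward[1] * to_player[1]) / dist
        angle = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
        return angle <= cone_deg / 2.0 and not blocked(tank_pos, player_pos)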
Remarks
1. This assignment requires extensive scripting and physics-engine knowledge. The official Unity scripting documentation should help.
2. You can learn a lot about Unity by following this nice tutorial series.
3. Waypoints in Unity: http://forum.unity3d.com/threads/a-waypoint-script-explained-in-super-detail.54678/
4. Game-object rotation is a relatively difficult issue. In this assignment, you are not required to handle rotations cleanly; we will only focus on the actions taken by the tanks.
5. Homing missiles are very interesting to implement, though they look a bit complicated at first. You may consult the tutorial by "Dapper Dino" for details.
Submissions
Submit your completed scripts together with any other needed files (zip the complete project folder) in a single archive to your cloud drive, and send us the public link so that our tutor will be able to download and test them. Remember that you are required to submit the link to our Blackboard assignment page on or before the deadline. http://blackboard.cuhk.edu.hk
Level building is extremely important in game development, as the level is the world the player will interact with. This assignment will give you a first experience in level building, in addition to scripting of in-game objects (entities). You are required to build a simple level with the Unreal engine (version 4.24 recommended) which contains the following:
1. the Million Road of CUHK (only the part in the red T region),
2. the CUHK Forum (picture below), and
3. the sculpture outside the University Library (picture below).
For the other buildings, you can simply use a texture to simulate their exteriors. Finally, you need to script the center area of the Forum so that, when triggered by the 'e'/'E' key, it descends into the ground. When the platform descends, you can just build a big room underground to house it (in reality there really is an extension there from the library). On the extension floor, pressing the 'e'/'E' key again makes the platform rise back to the original CUHK Forum level. The moving platform should be only the red-circled region below. To avoid too much work, the structure within the blue lines is NOT required; you only need to build the region marked by the orange lines for the CUHK Forum portion. You may visit this link on Google Earth to get more information for this assignment:
https://earth.google.com/web/@22.41962955,114.20540097,99.19929774a,119.47515909d,35y,-35.32179409h,60t,359.99999879r/data=CmAaXhJYCiUweDM0MDQwODljOTBmMGVjOGY6MHg1ZjA3ODcyYzc1N2M4ZDVmKi9UaGUgQ2hpbmVzZSBVbml2ZXJzaXR5Cm9mIEhvbmcgS29uZyBGb3J1bQpGb3J1bRgCIAE?authuser=0
The marking scheme will be based on:
1. how close your level's appearance is to the real Million Road (70%):
I. major considerations: the CUHK Forum, the sculpture, and the colored bricks and patterns of Million Road;
II. minor details such as tree placement (approximately the same number of trees), water in the CUHK Forum, and other small details that should be as close to the original as possible;
III. none of the buildings along Million Road need to be added except the University Library façade;
2. the lift performance of the CUHK Forum as stated above (30%).
Submissions
Submit your level together with associated material files and whatever additions as a single zip file to the CSCI4120 assignment submission slot on Blackboard.
The final project for CS2401 will be a single large project which is to be submitted in three stages and therefore counts as three projects. For this project we will be implementing the game of OTHELLO, with the final product being a program that can play an intelligent game of OTHELLO against a human opponent, namely your instructor. This game must be derived from the author's game class, and I have made a copy of this class available in ~jdolan/cs2401/projects/game. The files are game.h and game.cc. I have also included in this directory a file called colors.h, which was created by a former student who has given us permission to use it. It allows you to adjust the colors of the screen during a text-based console or ssh session. I have altered the play function in the game class, by commenting out some of the code, so that it will work for the first phase of the project. By the end we will return to the original author's game class with only a couple of small alterations. The game class actually gives us a map of how the project should be developed, and at the end we will find that most of the "AI" has already been written for us in this parent class. If you look at game.h, you will see that there are virtual functions that "must be overridden" and some that "may optionally be overridden." Eventually you will write a child version of all the mandatory overrides, but you probably do not need to override any of the optional ones. (The winning function is an exception: his version doesn't work.)
The rules of this game appear on a separate sheet. Basically the game consists of two-sided playing pieces, and the goal is to out-flank your opponent, which allows you to "capture" his pieces by flipping them over to your color. Pieces are out-flanked when you position a piece in such a way that one or more of your opponent's pieces in a row are between two of your pieces. Every move consists of putting a piece onto the board. Pieces are never moved from their original location, and the game continues until there are no more moves available to either player. At that time the player with the most pieces showing his or her color wins.
The first stage is the design stage. In this part you decide how you will represent the pieces and how you will display the board. Good grades are given for the quality of the design, the attractiveness of the board, and the ease of the user interface. This first stage should be derived from the game class. You know that you will be creating a child class for Othello, and that that class will have some way of storing the board. The board should be a two-dimensional array of spaces, pieces, or pointers[1] to spaces or pieces, where the spaces are another class which you have written. This board becomes the principal private member of the Othello class. The space class should be able to store all the attributes that a space (or you can call it a piece) might have (emptiness, black, white), as well as mutator and accessor functions to transform a piece/space from one state to another. You should then implement your design to the stage where I can see the board displayed and be allowed to make one initial move. The first step in doing this is to write your space/piece class, which will not be derived from anything, but has the ability to change states. (I think a function called "flip" might be useful.)
Then, when you write your Othello class, which will be derived from the author's game class, the best first step, after declaring your board, is to create stubs for all of the author's purely virtual functions. (A stub is a function with an empty implementation, which exists merely to validate a call, or to allow the program to compile.) For this stage you should implement display_status, make_move, and an is_legal which returns true if the move is any of the four allowable initial moves. Notice that the author expects the move to be entered and passed around as a string, and you need to stay with this, as it is an essential part of the design. Think of the string as merely a container for characters. (Remember that you will have inherited the get_user_move function; just go ahead and use it.) Blackboard submission of this 50-point stage (6a) is due at 11:59 p.m. on Mon., April 13th. (Late penalty reduced to 4% per day.)
In the second stage you implement a two-player game which allows two humans to play against each other on the computer. (Basically you will have one of the humans "be" the computer in the terms of the author's game class.) All rules should be enforced. This stage will involve a much more extensive is_legal function, since that function must now embody all the rules of the game. Blackboard submission of this 50-point stage (6b) is due at 11:59 p.m. on Tues., April 21st. (Late penalty reduced to 3% per day.)
In the final stage the computer will play an "intelligent" game against a human opponent. Again, all rules should be enforced and the computer should not cheat. This is frequently called the "AI" stage of the game, and you will find that much of the AI work is done by the author's game class. Electronic submission of this 50-point stage (6c) is due at 11:59 p.m. on Sun., April 26th. (Late penalty reduced to 2% per day.)
The first stage play function will look like this (only the uncommented lines are active):

    restart( );
    // while (!is_game_over( ))
    // {
    display_status( );
    // if (last_mover( ) == COMPUTER)
    make_human_move( );
    // else
    //     make_computer_move( );
    // }
    display_status( );
    // return winning( );
    return HUMAN;
    }

The second stage play function will look like this (in your functions the second human player will be referred to as COMPUTER):

    restart( );
    while (!is_game_over( ))
    {
        display_status( );
        // if (last_mover( ) == COMPUTER)
        make_human_move( );
        // else
        //     make_computer_move( );
    }
    display_status( );
    return winning( );
    }

The third stage play function will be returned to the original:

    restart( );
    while (!is_game_over( ))
    {
        display_status( );
        if (last_mover( ) == COMPUTER)
            make_human_move( );
        else
            make_computer_move( );
    }
    display_status( );
    return winning( );
    }

You change this function for the various stages by commenting out the appropriate lines of code in game.cc. In general this will be the only alteration we make to the author's game class.
[1] Pointers make this a lot harder, so I'm not recommending this for any but the most adventuresome.
CS2401 Project 3: College
For this project I have written a node class that is pretty much identical to the one that appears in Chapter 5 of your text. For the data type I have written a little class that stores information about a single college course, and I have written a main that will allow a uniform program interface to facilitate my grading. You can copy these on your prime account with the command cp /home/jdolan/cs2401/projects/project3/* . or you can download them from Blackboard. (The test.cc file that you find in this directory is just a little file I wrote to make sure that everything compiles and that the input and output are working; you can delete it after you take a quick look at it. You will need the other three files.)
For this project we will be writing another container class. This one will be built with a linked list constructed using the node-class nodes that I have given you. In the first container class that we wrote, the order of the items did not matter, and could be changed by the class itself if that was convenient. The second container that we wrote was a sequence: items stayed in whatever order the programmer had chosen when putting them into the container with the internal iterator. But there are also containers where the order is maintained by the container itself, where there is only a single insert function, and it always puts items into the container at the spot where they would go in an ordered list of those items. Such lists can only be used to store data types and classes that have comparison operators.
So your assignment is to create a class that will have, as private variables, a string for the student's name and a pointer to the head of a linked list which is built from the nodes that I have provided. The list will have the following capabilities: {OVER}
The main that I have written will offer the user a chance to exercise each of these functions, plus a test of the copy constructor, which is done by making a copy of the list, allowing the user to delete a course from the copy, and then allowing the copy to go out of scope. Even though you are only testing one of the Big 3, you are expected to write all three.
There will also be a file backup of all the data. The program should load the student's name (which will be the first thing in the file) and list of courses when it starts up, and save the altered list to the same file when it exits. Again, we will ask for the student's username to determine the name of this file. If no file exists, the program should start with an empty list, since the student may be in their first semester of college.
In your submission please include all the files that I need to compile and run the program, including a sample data file. Submission to Blackboard of this project is due at 11:59 p.m. on Tuesday, March 3rd.
CS2401 Project 2: Your Facebook Friends
The idea of a sequence class is that the programmer can choose where an item is stored in the list, and that the sequence, or order, of the items remains the same, even when things are deleted. In this particular project we are going to pass that capability on to the user, allowing them to order their Facebook friends in any way they choose. (We're just maintaining a list here, not working with your actual Facebook account, and you are allowed to have fictitious friends.) Of course, some people have hundreds of Facebook friends, while others have only a few, so we are going to implement this using a dynamic array.
Begin this project by copying the main.cc, date.h, date.cc, friend.h and fbfriends.h files that I have provided in ~jdolan/cs2401/projects/project2 and on Blackboard. First, I have written a header file for a class called Friend. (Note that we can only call this class Friend if we use capitalization, since the word friend is reserved.) This class has two private variables, one for the friend's name and the other for their birthday, the latter being of type Date, another class that I have given you. You are to write the implementation of this class, including overloaded insertion and extraction operators as well as operators for == and !=. Two friends are "equal" only if they have the same name and the same birthday. (Doctors' offices do this because of the low probability that two people will have both the same name and the same birthday.) Test this class by writing a main of your own that declares two friends, lets you put the information into both, outputs them to the screen, and compares them for equality.
Now, in the main that I have given you, you will find the operations that the application allows. There is also a file backup mechanism; in this case it operates using the person's username for the name of the file. {OVER}
I have given you the header file for the container class that makes all of this possible, fbfriends.h. You are to write the implementation of this as well. The private variables for this class consist of a pointer of type Friend and variables for capacity, used, and current_index. The constructor will begin by allocating a dynamic array of Friends capable of holding only five friends. (This will save memory for those users with few friends.) When an additional friend is added to the list, you should check whether used == capacity, and if it does, do a resize operation that increases the size of the array by five.
This container also has an internal iterator, as the author illustrated in Section 3.2 of the text, which will require that you write the standard iterator functions, and you will find that these are implemented in the same way as they are in the text. You will also need to implement show_all (used mostly for testing purposes), bday_sort, find_friend, and is_friend. Because this is a dynamic array you will need to write a resize function and the Big 3 (destructor, copy constructor, and assignment operator). And because we're providing file backup, we will also have functions for load and save.
Your submission should include a data file of at least four friends, although they can be fictitious. (This is your chance to be friends with your favorite music star, actor, or sports hero.)
It would probably be smart to have this file hold at least six friends, since the initial size of the array is five, and the resizing will happen automatically when the sixth person is added. In the grading I will be using my own file of friends and testing all the different branches of the menu. It would behoove you to do the same with your list of at least six friends.
Submit to Blackboard all the files that are needed to compile and run this program, including the ones that I have given you. Also, be sure that you submit the individual files rather than a zipped version. This project is due at 11:59 p.m. on Sunday, February 16th.
As a runner I have, for years, kept a journal in which I record the time and distance of every run. (I also record some other information, such as whom I was running with and the weather, but we're not going to worry about that here.) For this project I have designed a class, built on the MyTime class that you used in Lab 1, to hold the information about a single run. Your assignment is to develop a running journal, that is, a container class for my runs.
Begin by copying MyTime.h, MyTime.cc, runtime.h, main.cc and runlog.txt into your working directory. You can get them from the Blackboard page for this assignment or by copying them from the directory ~jdolan/cs2401/projects/project1 on prime.
Begin by writing the implementation for all the functions in the Runtime class. Please note that when we add, subtract, multiply and divide runs, we do these operations to both the time and the distance. Equality (==) means that both time and distance are the same. Also, testing equality of floating point numbers can be problematic, so it would be good to have the distance-equality function call all distances that are within a tenth of a mile equal.
Now you are to create a class which has, as its primary private variable, an array capable of holding 200 of these runs. (Please declare the CAPACITY of 200 as a static const in your class.) You will also need a variable to keep track of how many of these spots are filled. Your class will have:
Also, all the data is saved to a backup file that is created when the program quits and then automatically reloaded (if the file is present) when the program restarts, saving the user from re-typing their data each time. This data should be saved in the same format as shown in the runlog.txt file that I have given you.
I have written a main that calls each of these functions, and you are to use my main, not one that you have written yourself. Notice that I have commented out each function call to encourage you to develop the project in stages.
Your submission should include all source files, including the ones that I have given you, and a sample data file of at least ten entries. Blackboard submission of this project is due at 11:59 p.m. on Sunday, February 2nd.
Assignment Goals
Implement a linear regression calculation
Examine the trends in real (messy) data
Summary
Ed Hopkins over at the Wisconsin State Climatology Office maintains a listing of the dates and durations of full-freeze ice covers on our dear Madison lakes, Mendota and Monona. This data goes back to the mid-1800s, thanks to handwritten records in some very old log books (they're behind glass in Ed's office over in AOS). Analyzing this data is an interesting exploration of the changes in our local climate over time, so let's use some machine learning techniques to take a look at the data.
Program Specification
As with most real problems, the data is not as clean or as organized as one would like for machine learning. You'll be retrieving the data for Lake Mendota from http://www.aos.wisc.edu/~sco/lakes/Mendota-ice.html and cleaning it for your program.
Write the following Python functions in a file called ice_cover.py (yes, there are a lot; most of them are pretty short and have sample output):
get_dataset() — takes no arguments and returns the data as described below in an n-by-2 array
print_stats(dataset) — takes the dataset as produced by the previous function and prints several statistics about the data; does not return anything
regression(beta_0, beta_1) — calculates and returns the mean squared error on the dataset given fixed betas
gradient_descent(beta_0, beta_1) — performs a single step of gradient descent on the MSE and returns the derivative values as a tuple
iterate_gradient(T, eta) — performs T iterations of gradient descent starting at (beta_0, beta_1) = (0, 0) with the given learning rate and prints the results; does not return anything
compute_betas() — using the closed-form solution, calculates and returns the values of beta_0 and beta_1 and the corresponding MSE as a three-element tuple
predict(year) — using the closed-form solution betas, returns the predicted number of ice days for that year
iterate_normalized(T, eta) — normalizes the data before performing gradient descent, prints results as in iterate_gradient
sgd(T, eta) — performs stochastic gradient descent, prints results as in iterate_gradient
Get Dataset
The get_dataset() function should return an n-by-2 array of data, where n is the number of winters between 1855 and 2019. Curate a clean data set starting from 1855-56 and ending in 2019-20. Let x be the beginning year: for 1855-56, x = 1855; for 2019-20, x = 2019; and so on. Let y be the total number of ice days in that year: for 1855-56, y = 118; for 2019-20, y = 70; and so on. Note that some years have multiple freeze/thaw cycles, such as 2018-19. That year should be recorded as x = 2018, y = 86.
Although we do not ask you to hand in any visualization code or figures, we strongly advise you to plot the data (using any plotting tool inside or outside Python) and see what it is like. Or you can mess around with PBS Wisconsin's Ice Cover visualization tool, which Hobbes thinks is pretty neat, but that might be because she wrote it. For simplicity, hard-code the data set in your program. For the rest of this assignment we will refer to the year value (e.g. 1855, 1926, 2019) as x and the ice days value (e.g. 118, 103, 70) as y.
>>> get_dataset()
=> [[1855, 118], ... [1926, 103], ... [2019, 70]]
Dataset Statistics
This is just a quick summary function for the above dataset. When called, you should print:
the number of data points
the sample mean
the sample standard deviation
on three lines.
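A minimal sketch of print_stats could look like the following (it already applies the two-decimal formatting described next; using the n-1 denominator for the sample standard deviation is an assumption consistent with the word "sample" in the spec):

    import math

    def print_stats(dataset):
        """Print count, sample mean, and sample standard deviation of y."""
        ys = [y for _, y in dataset]
        n = len(ys)
        mean = sum(ys) / n
        std = math.sqrt(sum((y - mean) ** 2 for y in ys) / (n - 1))
        print(n)
        print('{:.2f}'.format(mean))
        print('{:.2f}'.format(std))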
Please format your output to include only TWO digits after the decimal point. For example (numbers are made up and do not correspond to actual output; you will need to calculate these results yourself):
>>> data = get_dataset()
>>> print_stats(data)
165
123.45
32.10
Linear Regression
This function will perform linear regression with the model
ŷ = beta_0 + beta_1 * x
We first define the mean squared error (MSE) as a function of the betas:
MSE(beta_0, beta_1) = (1/n) * Σ_i (beta_0 + beta_1 * x_i - y_i)²
The two arguments for this function represent these two betas. Return the corresponding MSE as calculated on your dataset (which you should retrieve within your function by calling your get_dataset() function).
>>> regression(0,0) => 10827.78
>>> regression(100,0) => 386.57
>>> regression(300,-.1) => 332.83
>>> regression(400,.1) => 242059.01
>>> regression(200,-.2) => 84167.47
Note that I'm continuing to round these values to two decimal places for clarity; your returned values should be the full floats.
Gradient Descent
This function will perform gradient descent on the MSE. At the current parameter (beta_0, beta_1), the gradient is defined by the vector of partial derivatives:
∂MSE/∂beta_0 = (2/n) * Σ_i (beta_0 + beta_1 * x_i - y_i)
∂MSE/∂beta_1 = (2/n) * Σ_i (beta_0 + beta_1 * x_i - y_i) * x_i
This function returns the corresponding gradient as a tuple, with the partial derivative with respect to beta_0 as the first value.
>>> gradient_descent(0,0) => (-204.41, -395063.04)
>>> gradient_descent(100,0) => (-4.41, -7663.04)
>>> gradient_descent(300,-.1) => (8.19, 16289.42)
>>> gradient_descent(400,.1) => (982.99, 1905384.49)
>>> gradient_descent(200,-.2) => (-579.21, -1121958.11)
Iterate Gradient
Gradient descent starts from the initial parameter (beta_0, beta_1) = (0, 0) and iterates the following update at time t = 1, 2, …, T:
beta^(t) = beta^(t-1) - eta * ∇MSE(beta^(t-1))
The parameters to this function are T, the number of iterations to perform, and eta, the learning rate in the above update. Always begin from the initial parameter (0, 0). Print the following for each iteration on one line, separated by spaces:
the current iteration number, beginning at 1 and ending at T
the current value of beta_0
the current value of beta_1
the current MSE
As before, all floating point values should be rounded to two digits for output.
>>> iterate_gradient(5, 1e-7)
1 0.00 0.04 1079.72
2 0.00 0.05 474.59
3 0.00 0.05 437.03
4 0.00 0.05 434.69
5 0.00 0.05 434.55
>>> iterate_gradient(5, 1e-8)
1 0.00 0.00 9325.63
2 0.00 0.01 8040.58
3 0.00 0.01 6941.27
4 0.00 0.01 6000.84
5 0.00 0.02 5196.33
>>> iterate_gradient(5, 1e-9)
1 0.00 0.00 10672.29
2 0.00 0.00 10519.13
3 0.00 0.00 10368.26
4 0.00 0.00 10219.64
5 0.00 0.00 10073.25
>>> iterate_gradient(5, 1e-6)
1 0.00 0.40 440695.17
2 -0.00 -2.18 18649996.73
3 0.01 14.56 790001058.39
4 -0.05 -94.36 33464645835.46
5 0.32 614.54 1417571655440.93
Note that with eta = 1e-6 gradient descent is diverging! Try different values for eta and a much larger T, and see how small you can make the MSE (optional).
Compute Betas
Instead of using gradient descent, we can compute the closed-form solution for the parameters directly. For ordinary least squares in 1D, this is
beta_1 = Σ_i (x_i - x̄)(y_i - ȳ) / Σ_i (x_i - x̄)²,  beta_0 = ȳ - beta_1 * x̄
where x̄ and ȳ are the sample means of x and y. This function returns the calculated betas and their corresponding MSE in a tuple, as (beta_0, beta_1, MSE). And no, I'm not going to tell you what the answers are. Do the math. A sketch of the closed form appears below.
Predict Ice Cover
Using your closed-form betas, predict the number of ice days for a future year. Return that value. For example:
>>> predict(2021) => 85.85
Food for thought (you don't need to write this up or turn it in, but if you want to discuss it on Piazza, please do): What's the prediction for next year? Five years from now? In which year will the predicted number of ice days become negative? What does this say about our model?
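As a reference for the closed form above, here is a minimal sketch of compute_betas and predict (pure Python, no NumPy; it assumes get_dataset() behaves as specified):

    def compute_betas():
        """Closed-form ordinary least squares in 1D: (beta_0, beta_1, MSE)."""
        data = get_dataset()
        n = len(data)
        xbar = sum(x for x, _ in data) / n
        ybar = sum(y for _, y in data) / n
        beta_1 = (sum((x - xbar) * (y - ybar) for x, y in data)
                  / sum((x - xbar) ** 2 for x, _ in data))
        beta_0 = ybar - beta_1 * xbar
        mse = sum((beta_0 + beta_1 * x - y) ** 2 for x, y in data) / n
        return (beta_0, beta_1, mse)

    def predict(year):
        """Predicted ice days for the given year, via the closed-form betas."""
        beta_0, beta_1, _ = compute_betas()
        return beta_0 + beta_1 * year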
Normalized Gradient Descent
Can't get your iterating gradient descent to match the closed-form solution? You're not alone! The culprit is the scale of the input x compared to the implicit offset feature 1 (think ŷ = beta_0 * 1 + beta_1 * x). Gradient descent converges slowly when these scales differ greatly, a situation known as a bad condition number in optimization. For this function, first normalize your x values (NOT the y values):
x_i ← (x_i - x̄) / σ_x
where x̄ is the sample mean and σ_x the sample standard deviation of x, then proceed exactly as in iterate_gradient():
>>> iterate_normalized(5, 0.1)
1 20.44 -1.85 7036.41
2 36.79 -3.33 4609.88
3 49.88 -4.52 3056.86
4 60.34 -5.47 2062.90
5 68.72 -6.23 1426.75
>>> iterate_normalized(5, 0.01)
1 2.04 -0.18 10410.73
2 4.05 -0.37 10010.20
3 6.01 -0.54 9625.53
4 7.93 -0.72 9256.08
5 9.82 -0.89 8901.27
With a well-chosen eta you should get convergence within 100 iterations. (Note that the betas are now for the normalized version of x, but you can translate them back to the original x with a little algebra. This isn't required for the homework.)
Stochastic Gradient Descent
Now let's have some fun with randomness and implement Stochastic Gradient Descent (SGD). With everything the same as in the previous part (including normalization), modify the gradient as follows: in iteration t, randomly pick ONE of the n items (call it item j), and approximate the gradient using only that item. Print the same information as in the previous function. For example (your results WILL differ because of randomness in the items selected):
>>> sgd(5, 0.1)
1 17.60 14.00 7993.47
2 36.50 24.68 5760.88
3 51.86 8.92 3160.46
4 70.23 -17.22 1380.68
5 82.74 -6.49 682.61
With a well-chosen eta you should approximately converge within a few hundred iterations. Since n is small in our dataset, there is little advantage to SGD over standard gradient descent. However, on large datasets, SGD becomes much more desirable; a sketch of the single-item update follows.
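To make the single-item approximation concrete, here is a hedged sketch of one SGD step on the (already normalized) data; the function name and shape are illustrative, not required:

    import random

    def sgd_step(beta_0, beta_1, data, eta):
        """One SGD update: estimate the gradient from a single random item."""
        x_j, y_j = random.choice(data)   # pick ONE of the n items
        err = beta_0 + beta_1 * x_j - y_j
        grad_0 = 2 * err                 # d/d(beta_0) on that item alone
        grad_1 = 2 * err * x_j           # d/d(beta_1) on that item alone
        return beta_0 - eta * grad_0, beta_1 - eta * grad_1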
Submission
Please submit your code in a file called ice_cover.py. All code should be contained in functions or under an if __name__ == "__main__": check so that it will not run if your code is imported into another program.

Assignment Goals
Implement hierarchical clustering
Process real-world data of particular contemporary relevance
Summary
There is a lot of analysis happening with the various datasets for COVID-19 right now. One of the goals of these analyses is to help figure out which countries are "beating" the pandemic. Using the publicly available Johns Hopkins COVID-19 data, you'll be performing clustering on time series data for different regions in the world. Each region is defined by a row in the data set, which can be a country, a province, etc. The time series data represents the number of (cumulative) confirmed cases on each day. Because different regions have different onsets of the outbreak and differ in magnitude, it is often desirable to convert a raw time series into a shorter feature vector. For this assignment, you will represent a time series by two numbers: the "x" and "y" values in the ten-hundred plot video. After each region becomes that two-dimensional feature vector, you will cluster all regions with HAC.
Program Specification
Download the data in CSV format: time_series_covid19_confirmed_global.csv. This is a snapshot of the data from the morning of April 2, 2020. If you wish to test your code on current data, the Johns Hopkins University git repo has constantly updating data, but we will only be testing your code with the snapshot.
Write the following Python functions:
load_data(filepath) — takes in a string with a path to a CSV file formatted as in the link above, and returns the data (without the lat/long columns but retaining all other columns) in a single structure. A minimal sketch appears after the next subsection.
calculate_x_y(time_series) — takes in one row from the data loaded by the previous function, calculates the corresponding x, y values for that region as specified in the video, and returns them in a single structure. Notes:
The "n/10 day" is the latest day with LESS THAN OR EQUAL TO n/10 cases, and similarly for the "n/100 day".
If the "n/10 day" is day i and today is day j, then x = j - i, not j - i + 1.
Some x or y can be NaN if the time series doesn't contain enough growth.
There is a link to Matlab code in the video description on YouTube; please consult that for the precise definition of x and y.
hac(dataset) — performs single-linkage hierarchical agglomerative clustering on the regions with the (x, y) feature representation, and returns a data structure representing the clustering.
You may implement other helper functions as necessary, but these are the functions we will be testing.
Load Data
Read in the file specified in the argument (the DictReader from Python's csv module will be of use) and return a list of dictionaries, where each row in the dataset is a dictionary with the column headers as keys and the row elements as values. These dictionaries should not include the lat/long columns, as we will not be using them in this program, but should retain the province (possibly empty) and country columns so that data points can be uniquely identified. You may assume the file exists and is a properly formatted CSV.
Calculate Feature Values
This function takes in the data from a single row of the raw dataset as read by the previous function (i.e. a single dictionary, without the lat/long values but retaining all other columns). As explained in the video above, this function should return the x, y values in a tuple, formatted as (x, y).
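Here is a hedged sketch of load_data. The dropped column names ('Lat', 'Long') match the JHU snapshot as best I recall, but verify them against the actual file header:

    import csv

    def load_data(filepath):
        """List of per-region dicts with the lat/long columns removed."""
        with open(filepath, newline='', encoding='utf-8') as f:
            rows = []
            for row in csv.DictReader(f):
                row.pop('Lat', None)   # assumed header names; check the CSV
                row.pop('Long', None)
                rows.append(row)
        return rows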
Perform HAC
For this function, we would like you to mimic the behavior of SciPy's HAC function, linkage(). You may not use this function in your implementation, but we strongly recommend using it to verify your results!
Input: A collection of m observation vectors in n dimensions, passed as an m-by-n array. All elements of the condensed distance matrix must be finite, i.e. no NaNs or infs. If you follow the Matlab code from the YouTube video description, you will occasionally have NaN values; such rows should be filtered out within this function and should not count toward your total number of regions. In our case, m is the number of regions and n is 2: the x and y features for each region.
Using single linkage, perform the hierarchical agglomerative clustering algorithm as detailed on slide 19 of this presentation. Use a standard Euclidean distance function for calculating the distance between two points.
Output: An (m-1)-by-4 matrix Z. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster m + i. A cluster with an index less than m corresponds to one of the m original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster. That is:
Number each of your starting data points from 0 to m-1. These are their original cluster numbers.
Create an (m-1)-by-4 array or list.
Iterate through the array row by row. For each row, determine which two clusters you will merge and put their numbers into the first and second elements of the row. The first cluster listed should be the one with the smaller index.
The single-linkage distance between the two clusters goes into the third element of the row.
The total number of points in the new cluster goes into the fourth element.
If you merge a cluster containing more than one data point, its index (for the first or second element of the row) is m plus the row index in which that cluster was created.
Before returning the data structure, convert it into a NumPy matrix. If you follow these guidelines for input and output, your result should match the result of scipy.cluster.hierarchy.linkage() and you can use that function to verify your results. Be aware that linkage() does not contain code to filter NaN values, so this filtering should be performed before calling it. A skeleton of the merge loop appears at the end of this section.
Tie Breaking
In the event that there are multiple pairs of points with equal distance for the next cluster:
Given a set of pairs with equal distance {(xi, xj)} where i < j, we prefer the pair with the smallest first cluster index i.
If there are still ties (xi, xj), … (xi, xk) where i is that smallest first index, we prefer the pair with the smallest second cluster index.
Be aware that this tie-breaking strategy may not produce results identical to scipy.cluster.hierarchy.linkage().
Challenge Options
If you wish to continue exploring the data, we challenge you to implement a complete-linkage option and compare the results with single linkage. You may also wish to use the results of your HAC algorithm to choose an appropriate k for k-means, and implement that clustering as well. Do you get the same clusters every time? If not, how do they differ? Is there anything meaningful there, do you think?
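Here is a sketch of the single-linkage merge loop under the output conventions above. It is a naive O(m^3)-ish version for clarity, not an efficient implementation, and it assumes dataset is a list of (x, y) tuples:

    import numpy as np

    def hac(dataset):
        """Naive single-linkage HAC; returns an (m-1)-by-4 linkage matrix."""
        pts = [(x, y) for (x, y) in dataset
               if not (np.isnan(x) or np.isnan(y))]   # drop NaN rows first
        m = len(pts)
        clusters = {i: {i} for i in range(m)}         # index -> member points
        Z = []
        for step in range(m - 1):
            best = None
            for a in sorted(clusters):
                for b in sorted(clusters):
                    if a >= b:
                        continue
                    # single linkage: minimum pairwise member distance
                    d = min(np.hypot(pts[p][0] - pts[q][0],
                                     pts[p][1] - pts[q][1])
                            for p in clusters[a] for q in clusters[b])
                    if best is None or d < best[0]:   # strict < keeps the
                        best = (d, a, b)              # smallest (a, b) on ties
            d, a, b = best
            Z.append([a, b, d, len(clusters[a]) + len(clusters[b])])
            clusters[m + step] = clusters.pop(a) | clusters.pop(b)
        return np.asmatrix(Z)

Because the loops visit cluster indices in ascending order and only accept a strictly smaller distance, ties resolve to the smallest first index and then the smallest second index, matching the tie-breaking rule above.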
Submission
Please submit your code in a file called ten_hundred.py. All code should be contained in functions or under an if __name__ == "__main__": check so that it will not run if your code is imported into another program.
Assignment Goals
Practice implementing a minimax algorithm
Develop an internal state representation
Summary
In this assignment you'll be developing an AI game player for a game called Teeko. As you're probably aware, there are certain kinds of games that computers are very good at, and others where even the best computers will routinely lose to the best human players. The class of games for which we can predict the best move from any given position (with enough computing power) are called Solved Games. Teeko is an example of such a game, and this week you'll be implementing a computer player for it.
How to play Teeko
Teeko is very simple: it is a game between two players on a 5x5 board. Each player has four markers, either red or black. Beginning with black, they take turns placing markers (the "drop phase") until all markers are on the board, with the goal of getting four in a row horizontally, vertically, or diagonally, or in a 2x2 box as shown above. If after the drop phase neither player has won, they continue taking turns moving one marker at a time (to an adjacent space only!) until one player wins.
Program Specification
This week we're providing a basic Python class and some driver code, and it's up to you to finish it so that your player is actually intelligent. Here is our partially-implemented game: teeko_player.py. (If your computer doesn't like downloading .py files, grab teeko_player.py.txt and remove the .txt extension.) If you run the game as it stands, you can play as a human player against a very stupid AI. This sample game currently works through the drop phase, and the AI player only plays randomly. First, familiarize yourself with the comments in the code. There are several TODOs that you will complete to make a more "intelligent" player.
Make Move
The make_move(state) method begins with the current state of the board. It is up to you to generate the subtree of depth d under this state, create a heuristic scoring function to evaluate the "leaves" at depth d (you may not make it all the way to a terminal state by depth d, so these may still be internal nodes), propagate those scores back up to the current state, and select and return the best possible next move using the minimax algorithm. You may assume that your program is always the max player.
Generate Successors
Define a successor function (e.g. succ(state)) that takes in a board state and returns a list of the legal successors; a sketch appears below. During the drop phase, this simply means adding a new piece of the current player's type to the board; during continued gameplay, this means moving any one of the current player's pieces to an unoccupied location on the board adjacent to that piece. Note: wrapping around the edge is NOT allowed when determining "adjacent" positions.
Evaluate Successors
Using game_value(state) as a starting point, create a function to score each of the successor states. A terminal state where your AI player wins should have the maximal positive score (1), and a terminal state where the opponent wins should have the minimal negative score (-1). Finish coding the diagonal and 2x2 box checks for game_value(state).
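For the successor function referenced above, here is a hedged sketch. It assumes a 5x5 list-of-lists board with ' ' for empty cells and that the caller passes in whose piece is moving; the provided class may represent state differently:

    def succ(state, piece):
        """Legal successor boards for the player using `piece` ('b' or 'r')."""
        drop_phase = sum(cell != ' ' for row in state for cell in row) < 8
        results = []
        for r in range(5):
            for c in range(5):
                if drop_phase and state[r][c] == ' ':
                    nxt = [row[:] for row in state]   # copy, then drop a piece
                    nxt[r][c] = piece
                    results.append(nxt)
                elif not drop_phase and state[r][c] == piece:
                    for dr in (-1, 0, 1):             # move to an adjacent,
                        for dc in (-1, 0, 1):         # unoccupied square
                            nr, nc = r + dr, c + dc   # (no wrap-around)
                            if (dr or dc) and 0 <= nr < 5 and 0 <= nc < 5 \
                                    and state[nr][nc] == ' ':
                                nxt = [row[:] for row in state]
                                nxt[r][c] = ' '
                                nxt[nr][nc] = piece
                                results.append(nxt)
        return results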
Define a heuristic_game_value(state) function to evaluate non-terminal states. (You should call game_value(state) from this function to determine whether state is a terminal state before you start evaluating it heuristically.) This function should return a float value between 1 and -1.
Implement Minimax
Follow the pseudocode of the recursive functions on slide 14 of this presentation, incorporating a depth cutoff to ensure you terminate in under 5 seconds. Define a Max_Value(state, depth) function where your first call will be Max_Value(curr_state, 0), and every subsequent recursive call will increase the value of depth. When the depth counter reaches your tested depth limit OR you find a terminal state, terminate the recursion. We recommend timing your make_move() method (use Python's time library) to see how deep in the minimax tree you can explore in under five seconds. Time your function with different values for the depth and pick one that will safely terminate in under 5 seconds. A depth-cutoff sketch appears at the end of this assignment.
Testing Your Code
We will be testing your implementation of make_move() under the following criteria:
Your AI must follow the rules of Teeko as described above, including the drop phase and continued gameplay.
Your AI must return its move as described in the comments, without modifying the current state.
Your AI must select each move it makes in five seconds or less.
Your AI must be able to beat a random player in 2 out of 3 matches.
We will be timing your make_move() remotely on the CS Linux machines, to be fair in terms of processing power.
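A hedged sketch of the depth-limited minimax recursion described above. It assumes the succ and heuristic_game_value shapes sketched earlier, plus hypothetical MY_PIECE/OPP_PIECE constants and a DEPTH_LIMIT you tune yourself; adapt to the actual TeekoPlayer class:

    DEPTH_LIMIT = 3  # tune so make_move() stays under 5 seconds

    def max_value(state, depth):
        """Value of `state` when the max player moves next."""
        v = game_value(state)
        if v != 0 or depth >= DEPTH_LIMIT:    # terminal state or cutoff
            return v if v != 0 else heuristic_game_value(state)
        return max(min_value(s, depth + 1) for s in succ(state, MY_PIECE))

    def min_value(state, depth):
        """Value of `state` when the opponent moves next."""
        v = game_value(state)
        if v != 0 or depth >= DEPTH_LIMIT:
            return v if v != 0 else heuristic_game_value(state)
        return min(max_value(s, depth + 1) for s in succ(state, OPP_PIECE))

make_move() would then call min_value on each successor of the current state and return the move whose successor scores highest.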
Summary
In this assignment you'll be tackling two situations:
Probability simulation (code in a file called envelope_sim.py)
Text classification (code in a file called classify.py)
Please submit your files in a zip file called p4_<netID>.zip, where you replace <netID> with your netID (your wisc.edu login). We've discussed both of these problems in lecture, so hopefully you've got a bit of understanding of the problems behind them as you start on your implementation.
Part 1: Probability Simulation
We introduced the idea of conditional probability in class with a situation: choose between two envelopes, each containing two balls. If you select the envelope with only black balls, you get nothing; if you select the envelope with one red and one black ball, you get a payout. Once you select an envelope, you blindly draw one ball; it is black. Do you switch envelopes? The answer is yes; this is a variant of the classic Monty Hall problem, which uses counter-intuitive conditional probabilities to trick unsuspecting people into worse odds. In class we worked through why you should switch according to the probabilities, but let's actually run a simulation and see if our math was valid.
For this part of the program, you must write two (2) Python functions:
pick_envelope(switch, verbose) — this function expects two boolean parameters (whether you switch envelopes or not, and whether you want to see the printed explanation of the simulation) and returns True or False based on whether you selected the correct envelope
run_simulation(n) — this function runs n simulations of envelope picking under both strategies (switch n times, don't switch n times) and prints the percent of times the correct envelope was chosen for each
Write other functions as necessary, but you must include these two.
Pick Envelope
We'll leave the details of this implementation up to you (a sketch appears at the end of Part 1), but be sure to go through the entire simulation process:
Randomly distribute the three black balls and one red ball into two envelopes
Randomly select one envelope
Randomly select a ball from the envelope:
if red, you picked the right envelope; return True
if black, switch or don't switch according to the value of the argument
Determine whether you picked the payout envelope and return True if you did, False otherwise
If the verbose parameter is set to False, this function should not display intermediate output and only return True or False. If the verbose parameter is set to True, format your printed output as follows:
>>> pick_envelope(True, verbose=True)
Envelope 0: b b
Envelope 1: r b
I picked envelope 0 and drew a b
Switch to envelope 1
=> True
>>> pick_envelope(True, verbose=True)
Envelope 0: b r
Envelope 1: b b
I picked envelope 0 and drew a b
Switch to envelope 1
=> False
>>> pick_envelope(False, verbose=True)
Envelope 0: r b
Envelope 1: b b
I picked envelope 1 and drew a b
=> False
>>> pick_envelope(False, verbose=True)
Envelope 0: b b
Envelope 1: b r
I picked envelope 1 and drew a r
=> True
Run Simulation
Given a value for n, run the simulation you just wrote n times under each strategy (switch/don't) with verbose set to False (please). Track how many times you choose the correct envelope, and print the results to the console:
>>> run_simulation(10000)
After 10000 simulations:
Switch successful: 75.25 %
No-switch successful: 50.160000000000004 %
Worth thinking about: why are these numbers roughly 75/50% and not 66/33% as proved in class? What did we change about our setup and/or assumptions?
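One possible shape for pick_envelope, as a hedged sketch (it represents envelopes as lists of 'b'/'r' strings, which is an assumption, and follows the printed format in the transcripts above):

    import random

    def pick_envelope(switch, verbose):
        """Simulate one round; return True iff the payout envelope is chosen."""
        balls = ['b', 'b', 'b', 'r']
        random.shuffle(balls)                 # random distribution of balls
        envelopes = [balls[:2], balls[2:]]    # two balls per envelope
        choice = random.randrange(2)          # random initial envelope
        draw = random.choice(envelopes[choice])
        if verbose:
            print('Envelope 0:', ' '.join(envelopes[0]))
            print('Envelope 1:', ' '.join(envelopes[1]))
            print('I picked envelope', choice, 'and drew a', draw)
        if draw == 'r':
            return True                       # red ball: payout for sure
        if switch:
            choice = 1 - choice
            if verbose:
                print('Switch to envelope', choice)
        return 'r' in envelopes[choice]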
OPTIONAL: String Formatting in Python
If you feel like being fancy and forcing decimal-place precision:
>>> run_simulation(1000)
After 1000 simulations:
Switch successful: 75.80%
No-switch successful: 48.90%
You can use Python's .format() function as follows:
"this is a string {:.2%} of the time".format(1)
which outputs
'this is a string 100.00% of the time'
You do not need to return anything from this function. Note that your results will not (and should not!) exactly match ours here, due to random variation.
Part 2: Document Classification
This next part is where things will get interesting: we'll be reading in a corpus (a collection of documents) with two possible true labels and training a classifier to determine which label a query document is more likely to have. Here's the twist: the corpus is created from your essays about AI100 and the essays on the same topic from 2016, and based on training data from each, you'll be predicting whether an essay was written in 2020 or 2016. (Your classifier will probably be bad at this! It's okay; we're looking for a very subtle difference here.) You will need: corpus.tar.gz
For this program, you must write at least seven (7) Python functions:
train(training_directory, cutoff) — loads the training data, estimates the prior distribution P(label) and the class conditional distributions P(word|label), and returns the trained model
create_vocabulary(training_directory, cutoff) — creates and returns a vocabulary as a list of word types with counts >= cutoff in the training directory
create_bow(vocab, filepath) — creates and returns a bag-of-words Python dictionary from a single document
load_training_data(vocab, directory) — creates and returns the training set (bag-of-words Python dictionary + label) from the files in a training directory
prior(training_data, label_list) — given a training set, estimates and returns the prior probability P(label) of each label
p_word_given_label(vocab, training_data, label) — given a training set and a vocabulary, estimates and returns the class conditional distribution over all words for the given label, using smoothing
classify(model, filepath) — given a trained model, predicts the label for the test document (see below for implementation details including the return value); this high-level function should also use create_bow(vocab, filepath)
Your submitted code should not produce any additional printed output.
Our Toy Example
For some of the smaller helper functions, we'll be using a very simple version of a training data directory. The top-level directory is called EasyFiles/, and it contains two subdirectories (like your actual training data directory will), called 2020/ and 2016/. EasyFiles/2016/ contains two files, 0.txt (hello world) and 1.txt (a dog chases a cat.), and EasyFiles/2020/ contains one file, 2.txt (it is february 19, 2020.). Each of these files has been pre-processed like the corpus, so all words are in lower case and all tokens (including punctuation) are on separate lines:
it
is
february
19
,
2020
.
You may wish to create a similar directory structure on your computer.
Create Vocabulary
Our training directory structure as provided in the corpus is very intentional: under training/ are two subdirectories, 2016/ and 2020/, each of which functions as the label for the files it contains. We've pre-processed the corpus so that every line of the provided files contains a single token.
To create the vocabulary for your classifier, traverse BOTH of these subdirectories under training/ (note: do not include test/) and count the number of times each word type appears in any file in either directory. As a design choice, we will exclude any word types which appear with a frequency strictly less than the cutoff argument (cutoff = 1 means retain all word types you encounter). Return a sorted list of these word types.
>>> create_vocabulary('./EasyFiles/', 1)
=> [',', '.', '19', '2020', 'a', 'cat', 'chases', 'dog', 'february', 'hello', 'is', 'it', 'world']
>>> create_vocabulary('./EasyFiles/', 2)
=> ['.', 'a']
Create Bag of Words
This function takes a path to a text file (assume it is valid, one token per line), reads the file in, creates a bag-of-words representation based on the vocabulary, and returns the bag-of-words in dictionary format. Give all counts of word types not in the vocabulary to OOV (see below).
>>> vocab = create_vocabulary('./EasyFiles/', 1)
>>> create_bow(vocab, './EasyFiles/2016/1.txt')
=> {'a': 2, 'dog': 1, 'chases': 1, 'cat': 1, '.': 1}
>>> create_bow(vocab, './EasyFiles/2020/2.txt')
=> {'it': 1, 'is': 1, 'february': 1, '19': 1, ',': 1, '2020': 1, '.': 1}
If you encounter a word type which does not appear in the provided vocabulary, add the non-string value None as a special key to represent OOV. Collect counts for any such OOV words.
>>> vocab = create_vocabulary('./EasyFiles/', 2)
>>> create_bow(vocab, './EasyFiles/2016/1.txt')
=> {'a': 2, '.': 1, None: 3}
Load Training Data
Once you can create a bag-of-words representation for a single text document, load the entire contents of the training directory into such Python dictionaries, label them with their corresponding subdirectory label ('2016' or '2020'), and return them in a list of length n = number of training documents. Python's os module will be helpful here, particularly its listdir() function.
>>> vocab = create_vocabulary('./EasyFiles/', 1)
>>> load_training_data(vocab, './EasyFiles/')
=> [{'label': '2016', 'bow': {'a': 2, 'dog': 1, 'chases': 1, 'cat': 1, '.': 1}}, {'label': '2016', 'bow': {'hello': 1, 'world': 1}}, {'label': '2020', 'bow': {'it': 1, 'is': 1, 'february': 1, '19': 1, ',': 1, '2020': 1, '.': 1}}]
The dictionaries in this list do not need to be in any particular order. You may assume that the directory string will include a trailing '/' as shown here.
NOTE: All subsequent functions which have a training_data parameter should expect it to be in the format of the output of this function: a list of two-element dictionaries with a label and a bag-of-words.
Prior Log Probability
This method should return the log probability of the labels in the training set, log P(label). In order to calculate these, you will need to count the number of documents with each label in the training data, found in the training/ subdirectory.
>>> vocab = create_vocabulary('./corpus/training/', 2)
>>> training_data = load_training_data(vocab, './corpus/training/')
>>> prior(training_data, ['2020', '2016'])
=> {'2020': -0.31939049933692143, '2016': -1.2967892172518587}
Because we usually have enough training documents in each class, we do not use add-1 smoothing here; instead we use the maximum likelihood estimate (MLE). Note that the return values are the natural log of the probability. A sketch of this helper follows.
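A hedged sketch of prior() under the training_data format above (MLE with natural log, as the example output indicates):

    import math

    def prior(training_data, label_list):
        """MLE log prior: log(count(label) / total documents)."""
        total = len(training_data)
        return {label: math.log(sum(1 for d in training_data
                                    if d['label'] == label) / total)
                for label in label_list}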
In a Naive Bayes implementation, we must contend with the possibility of underflow: this can occur when we take the product of very small floating-point values. As such, all the probabilities in this program will be log probabilities, to avoid this issue.
Log Probability of a Word, Given a Label
This function returns a dictionary consisting of the log conditional probability of every word type in the vocabulary (plus OOV) given a particular class label, log P(word|label). To compute this probability, you must use add-1 smoothing, rather than the MLE (this is different from the prior), to avoid zero probabilities.
>>> vocab = create_vocabulary('./EasyFiles/', 1)
>>> training_data = load_training_data(vocab, './EasyFiles/')
>>> p_word_given_label(vocab, training_data, '2020')
=> {'a': -3.04, 'dog': -3.04, 'chases': -3.04, 'cat': -3.04, '.': -2.35, 'hello': -3.04, 'world': -3.04, 'it': -2.35, 'is': -2.35, 'february': -2.35, '19': -2.35, ',': -2.35, '2020': -2.35, None: -3.04}
>>> p_word_given_label(vocab, training_data, '2016')
=> {'a': -1.99, 'dog': -2.4, 'chases': -2.4, 'cat': -2.4, '.': -2.4, 'hello': -2.4, 'world': -2.4, 'it': -3.09, 'is': -3.09, 'february': -3.09, '19': -3.09, ',': -3.09, '2020': -3.09, None: -3.09}
(I've rounded the float values to two decimal places for readability here; in reality they are much longer.)
Note: In this simple case, we have no words in our training set which are out-of-vocabulary. With the cutoff of 2 in the real set, we will see a number of words in the training set which are still out-of-vocabulary and map to None. These counts should be used when calculating P(word|label), and the existence of the "out-of-vocabulary" word type should be accounted for when calculating all probabilities.
Train
Given the location of the training directory and a cutoff value for the training set vocabulary, use the previous set of helper functions to create the following trained model structure in a Python dictionary:
{
'vocabulary': <the vocabulary list>,
'log prior': <the log prior dictionary>,
'log p(w|y=2016)': <the log conditional distribution for 2016>,
'log p(w|y=2020)': <the log conditional distribution for 2020>
}
For the EasyFiles data and a cutoff of 2, this would give (formatted for readability):
>>> train('./EasyFiles/', 2)
=> { 'vocabulary': ['.', 'a'],
'log prior': {'2016': -0.41, '2020': -1.10},
'log p(w|y=2016)': {'a': -1.30, '.': -1.70, None: -0.61},
'log p(w|y=2020)': {'a': -2.30, '.': -1.61, None: -0.36} }
The values for None are so high in this case because the majority of our training words are below the cutoff and are therefore out-of-vocabulary.
Classify
Given a trained model, this function will analyze a single test document and give its prediction of the label for the document. The return value for the function must have the Python dictionary format
{
'predicted y': <the predicted label>,
'log p(y=2016|x)': <the 2016 score>,
'log p(y=2020|x)': <the 2020 score>
}
Recall that the label for a test document x is the argmax of the following estimate, as defined in lecture:
y* = argmax_y [ log P(y) + Σ_w count(w in x) * log P(w|y) ]
(Recall: use log probabilities! When you take the log of a product, you should sum the logs of the operands.) A sketch of classify appears at the end of this section.
>>> model = train('./corpus/training/', 2)
>>> classify(model, './corpus/test/2016/0.txt')
=> {'log p(y=2020|x)': -3906.35, 'log p(y=2016|x)': -3916.46, 'predicted y': '2020'}
The documents in the test/ subdirectory are for testing your classifier. How many are classified correctly?
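Finally, a hedged sketch of classify built on the model structure above. It assumes create_bow behaves as specified; the argmax formula is the Naive Bayes estimate from the spec:

    def classify(model, filepath):
        """Score both labels for one test document and return the argmax."""
        bow = create_bow(model['vocabulary'], filepath)
        scores = {}
        for label in ('2016', '2020'):
            cond = model['log p(w|y=%s)' % label]
            scores[label] = model['log prior'][label] + \
                sum(count * cond[word] for word, count in bow.items())
        best = max(scores, key=scores.get)
        return {'predicted y': best,
                'log p(y=2016|x)': scores['2016'],
                'log p(y=2020|x)': scores['2020']}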