Assignment Chef – Assignment Catalog

33,401 assignments available

[SOLVED] Community Media VCC390HS Matlab

Community & Media (VCC390HS) Final Project & Presentation (35%)

Project Presentation (5%) – Due: November 22–November 29
Final Project Submission (30%) – Due: November 25

I understand that deadlines can be tough, so please consult with me at least a week in advance if you will have difficulty submitting within this time frame. Presentation dates are firm.

Submission Instructions: Please submit to Quercus.
Length: Digital Humanities projects with works cited or endnotes are included.
Document Specifications: Submit a webpage or a link that combines the digital methods and methodologies discussed in class: photovoice (photography or video), digital storytelling (a well-scripted short video), oral history (a well-scripted podcast, audio piece, or interview), scrollytelling, and Padlet mapping.

Final project description: By now the class should be able to identify their own self-identity as well as their group members' interests, identities, histories of immigration, and cultures. Having achieved this, it is now your turn to help others by creating and building a community in which strangers can belong and feel included! Ready, steady, go!

Project choices

Project Choice 1: Launch a UTM Community Organization for International Students
Submission Format: Web link

Project Overview: Your goal is to launch a new UTM community organization that helps international students transition from their liminal space to a more harmonious space. This organization should be inclusive and supportive of all multicultural international students on campus. Choose an existing UTM student union or campus organization as inspiration for your own group. Create a webpage for this organization, including the following elements:

1. Homepage
• An "About" section introducing your VCC390 community group and the founders of this initiative.
• A "Members" section featuring a diverse selection of students who are new members. Include goals for an online collection, ranging from digitizing videos to migrating digital files, such as photovoice, podcasts, and digital storytelling edits. Note: Be considerate when asking students to share their personal histories, narratives, or content as part of a class assignment.

2. Theme
Develop a theme that reflects a commonly shared narrative among members of this initiative. Write a "Vision" statement on the homepage, with an introduction that provides a historical, political, or social overview, incorporating two class concepts and two readings. Address questions such as (notes will be written in shared documents, as explained in the start kit): What is the purpose of this union or organization? Who benefits from the organization's digital initiatives? What are possible future goals to foster solidarity among members? Reflect on your position (positionality) as an insider vs. outsider and participant vs. objective observer. Be creative in considering how this project could offer long-term benefits to the organization.

3. Theoretical Framework
Use 1–2 course readings to identify relevant concepts and theoretical frameworks that support your project. Draw from class activities throughout the term to guide your creative process.

Assignment Specifications
• Word Limit: Discuss with your group to determine an appropriate script length, e.g., 400 words per photo/video.
• Citation Style: Follow Chicago citation guidelines.
• Platform: Use a webpage builder of your choice, such as Wix, WordPress, or Weebly.
Additional Guidelines
• Opening Line: First impressions matter, so craft a compelling opening.
• Members: 4–5 members in total.
• Collaboration: The number of members in the organization should correspond to the number of students in your group. Each student will produce a digital narrative for one member (e.g., if there are 4 students in the group, the organization should have 4 members).

Method Flexibility: You may mix different storytelling methods (e.g., digital storytelling, scrollytelling, photovoice) as long as they align with the course theme and support the organization's purpose.

Project Choice 2: Critical Investigation of an Existing Organization – Digital Support for Canadian Immigration Organizations

Overview: Research the websites of North American immigration organizations and design a digital project that meets their needs. Select a Canadian immigration organization or a campus organization for newcomers that could benefit from digital support. This project will involve observing, analyzing, and creating.

1. Observe
Examine the organization's vision, mission, and purpose. Consider why it was created, who its members are (e.g., a specific ethnic group or a diverse cultural community), and assess any strengths and areas for improvement. Reflect on whether your group could assist the organization in better achieving its goals, particularly if it archives immigrant stories. Note whether there are opportunities to enhance the representation and promotion of these stories.

2. Analyze
Identify potential opportunities to help the organization achieve online collection goals, such as digitizing videos, migrating digital files, creating photovoice projects, producing podcasts, and editing digital storytelling materials. Be mindful when asking students or participants to share personal histories, narratives, or content within the context of this class assignment.

3. Create
Develop public-facing digital materials that align with the class approach and the organization's needs. Choose a theme for the organization that resonates with commonly expressed student perspectives. On the homepage, write a "Vision" statement for the organization that includes a historical, political, or social overview. Integrate two class concepts and two readings in your introduction to answer questions such as: What is the purpose of this organization? Who benefits from its digital initiatives? What are possible future goals to foster solidarity among members?

Assignment Specifications
• Word Limit: Discuss with your group to determine the length needed for each section, e.g., 400–600 words per photo or video.
• Citation Style: Follow Chicago style citation guidelines.
• Platform: Choose a suitable platform to showcase your project, such as Wix, WordPress, or Weebly.

Additional Guidelines: Each student should contribute 4–5 ideas, with each student responsible for meeting one specific need of the organization.

Possible Themes/Concepts – The chosen theme should be a common idea that emerged from all members invited to the digital organization, not one chosen at random according to your own interests; always remember who is benefiting (not you, apparently!).
• The Proximity of Stranger Bodies to Campus – Sarah Ahmed: How does multiculturalism reinvent "the nation" over the bodies of strangers?
• Typical Canadian and Real Canadian – Sarah Ahmed
• The In-Betweens: Moving Between Cultures – Where are you from? Ying's critical autoethnography tells the story of how she seeks to more fully understand where she is from.
She also wonders, as a Chinese woman living in Aotearoa, if there is anywhere she can call home (Ying Wang, 2023). Using Ying's critical ethnography, arts-based inquiry framework, develop the members' own arts-based journey of exploration into the concept of in-betweenness as it occurs within the process of moving between the roots of one culture and an adopted culture. (No illustrations; use the digital methods mentioned above.)
• New Racism beyond Skin Color (Cultural Racism) – Sylvia Ang
• Bahir and Ghar – Aparna Singh: The outer/inner domain corresponded to the division of the home and the world – ghar and bahir. Whereas the world was the domain of the men, who had to imitate the scientific and technological advance of the West and its rational and "modern methods of statecraft," the home was the truly Indian domain where women preserved the "self-identity of national culture" (Sengupta 2022).
• Valorization and Invisibilization – Christoph Sohn: Emerging shared regional identities

Q: Is it possible to combine concepts? A: Of course, yes!

DH Final Project and Presentation Start Kit
1. Set up a shared folder on Google Docs or Microsoft OneDrive. You'll share this folder with me, the same as you did for Assignment 3.
2. Start a new document where you'll keep notes on your meetings for the project.
3. Begin by establishing who will perform the separate lead roles for the exhibition project. See the roles on the other side of the page. Note that these lead roles still require that everyone participates; the lead role simply means taking the initiative to get the task started.
a. Some will perform multiple roles, so try to distribute the tasks evenly and fairly. In discussion with your group, you can always change roles later if the distribution becomes unexpectedly uneven.
b. Be honest about what you can handle, but don't try to avoid work. Offer suggestions about how you think you can best contribute.
c. Conversely, be sympathetic and understanding about others' abilities and schedules. Be encouraging, respectful, and inclusive. Come see me if you need advice about handling group dynamics.
4. Take a few minutes to share which choice the group will proceed with.
5. Decide together which ideas you like best for the community organization theme, and which digital methodology, methods, and platform will be used.
6. Divide and allocate writing tasks and scenarios.
a. Decide who will write the scenarios and who will take notes on how this project should come together.
b. Decide who will do the research, or whether everyone will work together to find members (Option 1) or to search for relevant organizations (Option 2).
7. Once you've decided on roles, rough out a reasonable set of deadlines or benchmarks for the different stages of the project and decide who will present.

Lead Roles (see step 4 on the other side of the page)
• Meeting notetaker: For every meeting, including this initial meeting, record every topic that is discussed and any decisions that are made about tasks and duties. Hope you're writing everything down, starting now!
• Meeting/message coordinator: This is the point person for communication. This person sends out messages to coordinate meetings or tasks and ensures that everyone is up to date on what is happening.
o Briefly discuss reasonable guidelines for communication that everyone can work with. Preferred method of communication: email, WhatsApp, text message, etc.? Make sure that you keep a record of these messages in case I need to see them to assess participation.
Assignment 3 was small-scale group work; let's see the bigger scale of the final project. How quickly should the group expect a response from each member?
o Briefly discuss general availability for Zoom or in-person meetings. Note: Consider setting up a Doodle poll or some other online scheduler to figure out the best time for everyone to meet.
• Prof. liaison: This person will be responsible for reaching out with any questions the group has or for scheduling a meeting with the professor. Note: As a group, you can decide whether to send a single representative to the meeting or whether some or all of you attend.
• Submitter: This person will be responsible for ensuring that every requirement has been met and will upload the necessary document to Quercus on the group's behalf. Note: Only ONE person in the group will submit the project.

November 22 and November 29 are the dates for presenting and pitching ideas in front of the class. Agree on who will be presenting, and keep it to a strict 2-minute talk, as the class has 63 students and time is limited. Please take pictures throughout the process – pictures or screenshots of the group working together.

Presentation Description: In a 2-minute pitch, present your project idea by addressing the following questions in a way that ensures the class understands its essence.
1. Which project have you chosen, and why does it interest you?
2. What theme emerged, and which concepts do you plan to incorporate to align with the outcomes of this project?
3. How would the chosen community benefit from this project?
4. Show pictures of your group working collaboratively.

Reflection Paper – 20%, Due Dec 6
Learners are individually required to write a short reflection paper (700–900 words) that addresses the following criteria with reference to the relevant scholarly readings:
● A brief description of your final project.
● How much effort did you put into working with your group? What initiative did you take to get communication going? (Provide examples, such as your contributions to emails, discussions, or scheduling.)
● What did you learn from the project and from the course?
● The outcome of your project. It is pertinent to discuss the value of a success and/or a productive failure.
● Did the mini-assignments and class activities allow you to gain different perceptions of yourself and to represent lived experiences as a student and part of a community?
● Did the project come together in the way that you expected?
● If not, what did you learn from this experience?
● What is the contribution of your project to the community?
● What challenges did you find pertaining to this course? How does this course change your perceptions in life?
● What would you do differently in the future?
● What do you wish this course had, or what should be added next time?

$25.00

[SOLVED] Econ7810 Econometrics for Economic Analysis Fall 2024 Homework 3 C/C

Econ7810: Econometrics for Economic Analysis, Fall 2024
Homework #3
Due date: 30 Nov. 2024, 1 pm.

Do not copy and paste the answers from your classmates; two identical homework submissions will be treated as cheating. Do not copy and paste the entire output of your statistical package; report only the relevant part of the output. Please also submit your R script for the empirical part. Please put all your work in one single file and upload it via Moodle.

Part I: Multiple Choice (3 points each, 24 points in total)
Please choose the answer that you think is appropriate.

1.1 The interpretation of the slope coefficient in the model Yi = β0 + β1 ln(Xi) + ui is as follows:
a. a 1% change in X is associated with a β1% change in Y.
b. a 1% change in X is associated with a change in Y of 0.01β1.
c. a change in X by one unit is associated with a 100β1% change in Y.
d. a change in X by one unit is associated with a β1 change in Y.

1.2 In the regression model Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui, where X is a continuous variable and D is a binary variable, to test that the two regressions are identical, you must use the
a. t-statistic separately for β2 = 0, β3 = 0.
b. F-statistic for the joint hypothesis that β0 = 0, β1 = 0.
c. t-statistic separately for β3 = 0.
d. F-statistic for the joint hypothesis that β2 = 0, β3 = 0.

1.3 If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
a. a series of t-tests may or may not give you the same conclusion.
b. the regression is always significant.
c. all of the hypotheses are always simultaneously rejected.
d. the F-statistic must be negative.

1.4 You have estimated the following equation:
TestScore = 607.3 + 3.85 Income − 0.0423 Income²
where TestScore is the average of the reading and math scores on the Stanford 9 standardized test administered to 5th grade students in 420 California school districts in 1998 and 1999, and Income is the average annual per capita income in the school district, measured in thousands of 1998 dollars. The equation
a. suggests a positive relationship between test scores and income for most of the sample.
b. is positive until a value of Income of 610.81.
c. does not make much sense since the square of income is entered.
d. suggests a positive relationship between test scores and income for all of the sample.

1.5 The linear probability model is
a. the application of the multiple regression model with a continuous left-hand side variable and a binary variable as at least one of the regressors.
b. an example of probit estimation.
c. another word for logit estimation.
d. the application of the linear multiple regression model to a binary dependent variable.

1.6 The probit model
a. is the same as the logit model.
b. always gives the same fit for the predicted values as the linear probability model for values between 0.1 and 0.9.
c. forces the predicted values to lie between 0 and 1.
d. should not be used since it is too complicated.

1.7 In the logit model Pr(Y = 1 | X1, X2, ..., Xk) = F(β0 + β1X1 + β2X2 + ... + βkXk),
a. the β's do not have a simple interpretation.
b. the slopes tell you the effect of a unit increase in X on the probability of Y.
c. β0 cannot be negative since probabilities have to lie between 0 and 1.
d. β0 is the probability of observing Y when all X's are 0.

1.8 Your textbook plots the estimated regression function produced by the probit regression of deny on the P/I ratio.
The estimated probit regression function has a stretched "S" shape given that the coefficient on the P/I ratio is positive. Consider a probit regression function with a negative coefficient. The shape would
a. resemble an inverted "S" shape (for low values of X, the predicted probability of Y would approach 1).
b. not exist, since probabilities cannot be negative.
c. remain the "S" shape, as with a positive slope coefficient.
d. have to be estimated with a logit function.

Part II: Short Questions (36 points in total)
Please limit your answer to no more than 5 lines per sub-question.

(8 points) 2.1 Dr. Qin would like to analyze the return to education and the gender gap. The equation below shows the regression result using the 2005 Current Population Survey. lnEarnings refers to the logarithm of monthly earnings; educ refers to years of education; DFemme is a dummy variable equal to 1 if the individual is female; exper is working experience, measured in years; Midwest, South and West are dummy variables indicating the region of residence, while Northeast is the omitted region. Interpret the major results (discuss the estimates for all variables and also address the question that Dr. Qin wants to analyze).

Estimated lnEarnings = 1.215 + 0.0899 × educ − 0.521 × DFemme + 0.0180 × (DFemme × educ) + 0.0232 × exper − 0.000368 × exper² − 0.058 × Midwest − 0.0078 × South − 0.030 × West
(standard errors, in order: 0.018, 0.0011, 0.022, 0.0016, 0.0008, 0.000018, 0.006, 0.006, 0.006)
n = 57,863, R² = 0.242

(15 points) 2.2 Sports economics typically looks at winning percentages of sports teams as one of various outputs, and estimates production functions by analyzing the relationship between the winning percentage and inputs. In Major League Baseball (MLB), the determinants of winning are quality pitching and batting. The sample covers all 100 MLB teams for the 1999 season. Pitching quality is approximated by Team Earned Run Average (teamera), and hitting quality by On Base Plus Slugging Percentage (ops). Your regression output is:

Winpct = −0.19 − 0.099 × teamera + 1.49 × ops,   R² = 0.92
(standard errors, in order: 0.08, 0.008, 0.126)

(a) (5 points) Interpret the regression. Are the results statistically significant and important?

(b) (8 points) There are two leagues in MLB, the American League (AL) and the National League (NL). One major difference is that the pitcher in the AL does not have to bat; instead there is a designated hitter in the hitting line-up. You are concerned that, as a result, the effect of pitching and hitting in the AL differs from that in the NL. To test this hypothesis, you allow the AL regression to have a different intercept and different slopes from the NL regression. You therefore create a binary variable for the American League (DAL) and estimate the following specification:

Winpct = −0.29 + 0.10 × DAL − 0.100 × teamera + 0.008 × (DAL × teamera) + 1.622 × ops − 0.187 × (DAL × ops),   R² = 0.92
(standard errors, in order: 0.12, 0.24, 0.008, 0.018, 0.163, 0.160)

How should you interpret the winning percentage for the AL and the NL? Can you tell the different effect of pitching and hitting between the AL and the NL? If so, how much?

(2 points) (c) You remember that sequentially testing the significance of slope coefficients is not the same as testing for their significance simultaneously.
Hence you ask your regression package to calculate the F-statistic for the hypothesis that all three coefficients involving the binary variable for the AL are zero. Your regression package gives a value of 0.35. Looking at the critical value from the F-table, can you reject the null hypothesis at the 1% level?

(13 points) 2.3 A study analyzed the probability of Major League Baseball (MLB) players "surviving" for another season, or, in other words, playing one more season. The researchers had a sample of 4,728 hitters and 3,803 pitchers for the years 1901–1999. All explanatory variables are standardized. The probit estimation yielded the results shown in the table:

Regression: (1) Hitters | (2) Pitchers
Regression model: probit | probit
constant: 2.010 (0.030) | 1.625 (0.031)
number of seasons played: −0.058 (0.004) | −0.031 (0.005)
performance: 0.794 (0.025) | 0.677 (0.026)
average performance: 0.022 (0.033) | 0.100 (0.036)

where the limited dependent variable takes on a value of one if the player had one more season of "survival" (a minimum of 50 at bats or 25 innings pitched), number of seasons played is measured in years, performance is the batting average for hitters and the earned run average for pitchers, and average performance refers to performance over the career. (Note that all variables are standardized, so that the mean is zero and the variance is 1.)

(4 points) (a) Interpret the two probit equations and calculate survival probabilities for hitters and pitchers at the sample mean.

(4 points) (b) Calculate the change in the survival probability for a player who has a very bad year, performing two standard deviations below the average (assume also that this player has been in the majors for many years, so that his average performance is hardly affected). How does this change the survival probability when compared to the answer in (a)?

(5 points) (c) Since the results seem similar, the researcher could consider combining the two samples. Explain in some detail how this could be done and how you could test the hypothesis that the coefficients are the same.

Part III: Empirical Part (40 points in total)
Please limit your answer to no more than 10 lines per sub-question. PLEASE REPORT YOUR REGRESSION OUTCOMES IN TABLES, NOT SCREENSHOTS.

(20 points) 3.1 Please use VOTE2016.dta to answer the following questions. The following model can be used to study whether campaign expenditures affect election outcomes:

voteA = β0 + β1 log(expendA) + β2 log(expendB) + u                         (1)
voteA = β0 + β1 log(expendA) + β2 log(expendB) + β3 prtystrA + u            (2)

where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign expenditures (in 1000 dollars) by Candidates A and B, and prtystrA is a measure of party strength for Candidate A (the percentage of the most recent presidential vote that went to A's party).

(6 points) (a) Please run regression (1) and report your result in a table. Does A's expenditure affect the outcome, and how? What about B's expenditure? (Hint: you first need to create the variables ln(expendA) and ln(expendB).)

(8 points) (b) Please run regression (2) and report your result in the same table. Does A's expenditure affect the outcome, and how? What about B's expenditure? Compare the results from (a) and (b) and explain whether we should include prtystrA in the regression or not. If we exclude it, in which direction does the coefficient of interest tend to be biased?
(6 points) (c) Can you tell whether a 1% increase in A's expenditures is offset by a 1% increase in B's expenditures? How? Please suggest a regression or test and then answer the question according to your result.

(20 points) 3.2 Please download the data jtrain2.dta from Moodle and answer the following questions. There was a job training experiment for a group of men. Men could enter the program starting in January 1976 through about mid-1977; the program ended in December 1977. A study tried to test whether participation in the job training program had an effect on unemployment probabilities and earnings in 1978. (Note: for each sub-question ((a), (b), (c), (d)), the answer should not be longer than 8 lines.)

Here is the description of the related variables:
unem78: = 1 if unemployed in 1978; 0 otherwise.
train: the job training indicator; = 1 if trained; 0 otherwise.
unem74: = 1 if unemployed in 1974; 0 otherwise.
unem75: = 1 if unemployed in 1975; 0 otherwise.
age: age in 1977.
educ: years of education.
white: = 1 if the individual is white; 0 otherwise.
married: = 1 if married; 0 otherwise.

(3 points) (a) Please run a linear probability model of unem78 on train, unem74, unem75, age, educ, white, and married. Interpret the results.

(8 points) (b) Run a probit and a logit regression of unem78 on train, unem74, unem75, age, educ, white, and married. Report the results. Can you compare the coefficients of train in the two models to conclude which model gives a bigger effect of train? Explain.

(6 points) (c) What is the probability that a single white individual, aged 25, with 12 years of education, who was unemployed in both 1974 and 1975, is unemployed in 1978 if she/he attended the training program? What is the probability if this individual did not attend the training program?

(3 points) (d) Does the training program seem to help a 30-year-old individual who has 16 years of education? If so, what is the effect? Explain.
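A rough sketch of the Part III workflow is shown below. The homework asks for an R script, so this Python/statsmodels version is for illustration only; the file and variable names follow the question, and everything else is an assumption about how the steps might be coded.

```python
# Illustrative sketch only: the homework requires an R script, but the same
# steps are shown here with Python/statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# --- 3.1: campaign expenditures and vote share (VOTE2016.dta) ---
vote = pd.read_stata("VOTE2016.dta")
vote["lexpendA"] = np.log(vote["expendA"])   # create ln(expendA)
vote["lexpendB"] = np.log(vote["expendB"])   # create ln(expendB)

m1 = smf.ols("voteA ~ lexpendA + lexpendB", data=vote).fit()              # equation (1)
m2 = smf.ols("voteA ~ lexpendA + lexpendB + prtystrA", data=vote).fit()   # equation (2)
print(m1.summary())
print(m2.summary())

# (c) test whether a 1% rise in expendA is offset by a 1% rise in expendB,
#     i.e. H0: beta1 + beta2 = 0
print(m2.t_test("lexpendA + lexpendB = 0"))

# --- 3.2: job training and 1978 unemployment (jtrain2.dta) ---
jt = pd.read_stata("jtrain2.dta")
rhs = "train + unem74 + unem75 + age + educ + white + married"
lpm    = smf.ols(f"unem78 ~ {rhs}", data=jt).fit()      # (a) linear probability model
probit = smf.probit(f"unem78 ~ {rhs}", data=jt).fit()   # (b) probit
logit  = smf.logit(f"unem78 ~ {rhs}", data=jt).fit()    # (b) logit

# (c) predicted probabilities for a single, white, 25-year-old with 12 years of
#     education, unemployed in both 1974 and 1975, with and without training
person = pd.DataFrame({"train": [1, 0], "unem74": 1, "unem75": 1,
                       "age": 25, "educ": 12, "white": 1, "married": 0})
print(probit.predict(person))
```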

$25.00

[SOLVED] ENGF0004 Mathematical Modelling and Analysis II 2024/2025 Coursework 1 R

ENGF0004 Mathematical Modelling and Analysis II 2024/2025 – Coursework 1
Release Date: 18 November 2024
Submission Deadline: 6 January 2025, 2 pm UK time
Estimated Coursework Return: 4 working weeks after deadline
Topics Covered: Topics 1–4
Expected Time on Task: 20 hours

Guidelines (failure to follow this guidance might result in a penalty of up to 10% on your marks):
I. Submit a single PDF document with questions in ascending order. This can be produced, for example, in Word, LaTeX or MATLAB Live Script. Explain in detail your reasoning for every mathematical step taken. Include units for final answers where possible.
II. Do not write your name, student number, or any information that might help identify you in any part of the coursework. Do not write your name or student number in the title of your coursework document file. Do not copy and paste the coursework questions into your submission; simply rewrite information where necessary for the sake of your argument.
III. Insert relevant graphs or figures, and describe any figures or tables in your document. All figures must be labelled, with their axes showing relevant parameters and units.
IV. You will need MATLAB coding to solve some questions. Include all code as pasted text in an Appendix at the end of your document. Remember to comment on your code, explaining your steps.

This coursework is worth a total of 100 marks and counts towards 20% of your final ENGF0004 grade.

Coversheet
Please complete this coversheet to declare you have followed good academic practice. Please tick the boxes that apply.
1. I have carefully read and understood Section 9 of the academic manual: Student Academic Misconduct Procedure (https://www.ucl.ac.uk/academic-manual/chapters/chapter-6-student-casework-framework/section-9-student-academic-misconduct-procedure). Yes / No
2. I have made sure to correctly reference external resources, incl. any teaching material. Yes / No
3. Have you used any AI tools to support your assignment? Yes / No
4. If so, which one(s)? Answer:
5. I have made sure to correctly reference the use of any AI tools in the entirety of my report. Yes / No

Declaration
I declare that all the information provided is true and understand that failure to comply with Section 9 of the academic manual may result in penalties as outlined. <to sign>
[The question statement that follows is garbled in this extract; it introduces a bridge vibration model with displacement x(t), input r(t), and a bridge damping coefficient given in kg/s.]

$25.00

[SOLVED] FPST 4333 System Process Safety Analysis Lab Bow Tie Analysis Statistics

FPST 4333 System & Process Safety Analysis Lab – Bow Tie Analysis
Bow-Tie Analysis of a Natural Gas Pipeline
Group Assignment (Project Group)

The pipeline shown in Figure 1 below transports natural gas from GoPokes Refinery to Gas Distribution Company (GDC). Outside the boundaries of the enterprises, the pipeline is grounded and covers an industrial region, passing through a few residential areas. It has a length of approximately 2.25 miles, a nominal diameter of 16″, and an operating pressure of 369.8 psi. The pipe material is carbon steel, constructed in accordance with the American Petroleum Institute (API) standard.

Figure 1: Simplified process flow diagram of the natural gas pipeline

The main dangers inherent in the natural gas flowing in the pipeline are associated with the high flammability of the methane, which could cause fires or explosions. To avoid corrosion by the soil on the outer surface of the pipeline, an anticorrosive coating using triple-layer polypropylene was added. To protect the pipeline against electrochemical corrosion due to possible leakage currents present in the region, the pipeline has a cathodic protection system for all its buried sections.

The Control Center of the pipeline is located at the refinery. The control system has a local indicator and transmitter for pressure and temperature. Flow metering and shutdown valves are located inside the refinery. The control system is responsible for obtaining the information emitted by the pressure and temperature transmitters and flow meters and for transmitting signals to the actuators of the shutdown valves, which have on–off capability. Instruments and cables are connected to a programmable logic controller (PLC), whose information is sent to a digitally distributed control system (DCS). Operators are responsible for visual inspection of the condition of the valves in the field, which includes checking whether valves are open or closed, using specific operational checklists, and confirming that the observation data align with the control room analytics.

If pipeline pressure decreases enough to reach the low-pressure alarm set point, an alarm will sound in the refinery control room. The gas transfer operation can be remotely interrupted by activating the shutdown valves located inside the refinery proper. If the shutdown valves fail to close by remote activation, operators will be required to close the valves manually. To prevent external interference, the pipeline route is marked by signs and standardized landmarks, but these identifiers are only visible in the daytime.

Periodic preventive maintenance is performed to ensure the reliability of the cathodic protection system, shutdown valves, pressure and temperature transmitters, flow meters, firefighting system equipment and pipeline signaling. However, the preventive maintenance plan is outdated. The refinery has personnel assigned to daily observation patrols along the pipeline to the Gas Distribution Company. There is no direct communication between the patrol workforce and local emergency services; patrollers would have to radio back to the control room to initiate the emergency response protocol. Inspections are conducted through visual observation along the pipeline route, seeking anomalies. An annual inspection is performed to evaluate the state of the coating and external corrosion of the pipeline. There is no direct communication between inspection staff and the control room.
Inspectors generate and submit a report to the operations supervisor, who then initiates any needed maintenance work requests.

The refinery and the Gas Distribution Company have the same resources for emergency action registered in the Emergency Response Plan. The refinery is responsible for pipeline integrity and emergency response along the pipeline route, except for the section that is part of the Gas Distribution Company, which is managed by GDC. The Emergency Response Plan has specific procedures for each accidental event. These emergency control procedures establish a set of actions that include a Mutual Assistance Plan between companies in the industrial area and specific actions for a natural gas release.

A PHL/PHA was performed and the main risks to the natural gas pipeline that could lead to a hazardous release were identified. The PHL/PHA did not consider intentional human actions such as terrorism or vandalism, nor occupational hazards such as slips and falls. Main risks to the system included rupture due to internal or external corrosion, external interference such as excavations or utility strikes, damage from lightning, or ground motion from natural disasters such as flooding or a major earthquake. An additional potential threat could be rupture due to overpressure caused by procedural errors such as improper closure of a valve. The consequences of the risks identified have the potential to cause damage to people, assets and the environment, and loss of natural gas supply to customers.

Lab Exercise – Bow Tie Analysis
Develop a Bow Tie Diagram for the top event of a large leak from the natural gas pipeline (a simple illustrative way to organise these elements is sketched after the submission requirements below). The Bow Tie Analysis should include:
• Identification of the threats that could trigger the hazard leading to the top event;
• Identification of consequences of the top event;
• Identification of preventive barriers that prevent or decrease the frequency of the top event;
• Identification of mitigation barriers that limit the consequent effects;
• Identification of degradation factors capable of increasing the likelihood of failure of preventive or mitigation barriers;
• Identification and classification of existing safeguards blocking degradation factors, decreasing the likelihood of failure of a preventive or mitigation barrier;
• Identification of shortfalls or deficiencies in pipeline operation, maintenance, and management. Shortfalls may be related to threats, consequences, preventive or mitigation barriers, degradation factors or safeguards;
• Proposed recommended actions that could ensure maintenance of barrier integrity.

Final Report Submission Requirements
• Bow Tie diagram – computer generated using the images provided in Bow-Tie Analysis Images.ppt
• Identification of critical paths to consequence (if any)
• Table of prevention barriers, degradation factors, and degradation controls
• Table of mitigation barriers, degradation factors, and degradation controls
• Table of recommended corrective actions to mitigate or prevent threats and consequences.
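As referenced above, one simple way to organise the bow-tie elements before drawing the diagram is a nested structure such as the sketch below. It is only an illustration of how threats, preventive barriers, degradation factors, safeguards, mitigation barriers and consequences relate for this pipeline; the specific entries are examples drawn from the scenario, not the required analysis.

```python
# Illustrative organisation of bow-tie elements for the pipeline top event.
# The entries shown are examples only, not the required analysis.
bow_tie = {
    "hazard": "Natural gas under pressure in the pipeline",
    "top_event": "Large leak of natural gas from the pipeline",
    "threats": [
        {
            "threat": "External corrosion of the buried pipe",
            "preventive_barriers": [
                {
                    "barrier": "Cathodic protection system and anticorrosive coating",
                    "degradation_factors": [
                        {"factor": "Outdated preventive maintenance plan",
                         "safeguards": ["Annual coating / external corrosion inspection"]},
                    ],
                },
            ],
        },
        {
            "threat": "Third-party excavation or utility strike",
            "preventive_barriers": [
                {
                    "barrier": "Route signage and standardized landmarks",
                    "degradation_factors": [
                        {"factor": "Identifiers visible only in the daytime",
                         "safeguards": ["Daily observation patrols along the route"]},
                    ],
                },
            ],
        },
    ],
    "consequences": [
        {
            "consequence": "Fire or explosion near residential areas",
            "mitigation_barriers": [
                "Low-pressure alarm and remote shutdown valves (manual closure as backup)",
                "Emergency Response Plan / Mutual Assistance Plan",
            ],
        },
    ],
}
```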

$25.00

[SOLVED] ECM604 Econometrics I ECM651 Economic Data Analysis Autumn Term 2024-2025 Python

ECM604 Econometrics I / ECM651 Economic Data Analysis, Autumn Term 2024–2025
Econometrics Project and Computer Lab Sessions

Overview
This individual project is designed to give you an opportunity to apply the econometric techniques you have learned in this module to real-world data. You will begin your project with a raw dataset and are expected to create the relevant variables, conduct estimations and tests, justify the methods you use, and critically analyze the results you obtain.

The dataset
You are required to use the 2013 Annual Population Survey (APS) dataset for this project. The dataset and related documents are available for download on Blackboard.

Computer lab sessions
Computer lab sessions are a core component of this module and play a vital role in successfully completing the project. These sessions are structured to provide comprehensive instruction, not just on STATA commands for econometric estimations, but also on foundational techniques for generating relevant variables from a raw dataset, conducting estimations, and performing tests, all with the support of ChatGPT as an additional tool. We have a total of eight computer lab sessions, each designed to build your skills progressively:
First four sessions: These sessions will focus on generating relevant variables from a raw dataset in response to a specific research question. You'll learn how to manipulate and prepare data to align it with your analytical needs.
Remaining four sessions: These sessions will centre on performing estimations and conducting various econometric tests. Through practical exercises, you'll apply the techniques you've learned, reinforcing your understanding and ability to implement them in real-world scenarios.

Generative AI
Generative AI, such as ChatGPT, is a valuable tool for researchers, offering assistance in generating ideas, summarizing information, and exploring different perspectives. However, it is important to recognize that it can sometimes provide misleading or incorrect solutions or answers. Therefore, developing the skill to use AI tools critically is essential for conducting effective research. In our computer lab sessions, we will practice using ChatGPT to support your work on this project, with a focus on developing a critical approach to evaluating its outputs.

The research question
Since this is not a dissertation module, I do not expect you to spend excessive time identifying a research question or topic for this project. Instead, you are expected to address the following question using the 2013 APS dataset: "Does marital status affect income? Are there any gender differences in this effect?" If you wish to pursue a different research question, please discuss it with me and obtain approval by the end of October.

What do I expect you to do?
This project is intended to showcase your ability to apply the econometric techniques covered in this module. It is not meant to be a dissertation or a research paper, so there is no need to employ advanced econometric methods beyond those discussed in the course. Effective and careful handling of the data is paramount, rather than the use of complex techniques. You should avoid replicating results from existing research papers. While a comprehensive literature review is not required, reviewing related research papers may offer useful insights. As outlined in the project overview, your tasks are to:
1. Generate the relevant variables.
2. Perform estimations and tests.
3. Justify your methods.
4. Critically review the results obtained.
5. Use ChatGPT to facilitate your research, applying it critically.

During your estimation process, you may face challenges in identifying the effect you are interested in. It is expected that you address these issues using the techniques and knowledge acquired from this module. It is important to acknowledge that not all problems can be resolved and that your results will have limitations and potential weaknesses; these should be clearly and concisely explained in your report. (A rough illustration of one possible specification is sketched at the end of this brief.) Furthermore, while ChatGPT (or other generative AI) can be a valuable tool, it is important to use it critically. Simply copying and pasting ChatGPT output without thoughtful engagement is not acceptable and will not meet the pass requirements for this project.

What do you have to submit?
You are required to submit a single Word file via Turnitin. Your submission should be concise and focused, with a maximum of 800 words for sections 1 through 4 and 200 words for section 5, totalling no more than 4 pages. The document should include the following sections:
1. Introduction: Provide an explanation of the variables used in your analysis and justify your chosen methods.
2. Summary Statistics Table: Include a table presenting the summary statistics of your data.
3. Main Table: Present your STATA estimation results and any test results, if applicable.
4. Results Interpretation and Discussion: Interpret your findings and discuss the limitations of your analysis.
5. Reflection on ChatGPT Usage: Reflect on your use of ChatGPT in the following aspects:
a. What did you ask ChatGPT?
b. The pros and cons of ChatGPT's answers.
c. The limitations of ChatGPT.
d. Cite specific questions and answers from your interactions with ChatGPT, including the page number in section 8.
6. Reference List: Include a list of references, if applicable.
7. Do File: Attach your STATA "do file" containing all commands used in your analysis.
8. ChatGPT Conversations: Include your conversations with ChatGPT.

Please ensure that your Word file is clear, well-organized, and adheres to the specified word limits. For (2) and (3), you should take Tables 1 and 2 in my paper "An economic analysis of tiger parenting: Evidence from child developmental delay or learning disability" as examples. The do file is a STATA script that includes all the commands necessary for your project, from generating the relevant variables to performing estimations and tests. It should be organized so that I can replicate the results presented in your PowerPoint file simply by running the do file with the 2013 APS dataset. Please ensure that your do file is well-organized and tidy. Afterward, copy and paste the contents of your do file into the Word document you are submitting.
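As a rough illustration of the kind of specification the research question points to (income regressed on marital status, gender and their interaction), the fragment below sketches the idea in Python/statsmodels. The project itself must be carried out in STATA with a do file, and the variable and file names used here (aps_2013.dta, income, married, female, age, educ) are assumed constructions from the raw APS data, not the dataset's actual variable names.

```python
# Illustrative only: the project requires STATA and a do file. Variable and
# file names are assumed constructions from the raw 2013 APS data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

aps = pd.read_stata("aps_2013.dta")                                  # assumed file name
aps = aps.dropna(subset=["income", "married", "female", "age", "educ"])
aps["log_income"] = np.log(aps["income"])                            # log earnings as the outcome

# Marital-status effect with a gender interaction: the coefficient on
# married:female indicates whether the marriage "premium" differs for women.
model = smf.ols("log_income ~ married * female + age + I(age**2) + educ",
                data=aps).fit(cov_type="HC1")                        # robust standard errors
print(model.summary())
```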

$25.00

[SOLVED] EBUS306 Sustainable Supply Chain Management C/C

EBUS306 Sustainable Supply Chain Management
Main Individual Assignment 3 – Report (worth 50%)
Deadline: Thursday 12th December 2024 at 2 pm
Sustainable Supply Chains and Stock Availability

Background
You are required to conduct a mini project about on-shelf availability in supermarkets, and to write up your findings in a report not exceeding 1,500 words. Submission is required online via the CANVAS website in a Microsoft Word document.

Assignment Tasks
1. Next time you go to a supermarket, make notes on any items that are out of stock.
2. Conduct an analysis of your findings in which you consider the issues in the supply chain that could cause these products to be out of stock. Do this for 3 products.
3. "Providing such high levels of on-shelf availability in supermarkets is not environmentally sustainable." Critically analyse this statement. Draw on practical examples of the work supermarkets are doing to reduce their impact on the environment.
4. Conclude your report.

Report Structure (1,500 words)
1. Introduction – short, concise
2. Findings – table
3. Data Analysis – explain reasons why these 3 items could be out of stock
4. Environmental sustainability versus on-shelf availability in supermarkets
5. Conclusion

You do not need to speak to any supermarket staff to complete this assignment to a high standard. You are not expected to know for sure why items are out of stock, but you should suggest logical explanations, supported by academic literature. To obtain high marks for your work it is essential that you connect your findings with the academic literature. We will go through the brief during the seminars in week 8, when you can ask questions about this assignment. You can also ask questions about the assignment on the discussion board that has been set up in the assessment folder entitled 'Main Assessment – Individual Report'.

Plagiarism will not be tolerated. The report must present your own analysis, understanding and conclusions rather than a copy of someone else's. You should reference all the sources of information that you use. You may use only 1,500 words, excluding the list of references, and no appendices. This means that you must write succinctly by focusing on the key points you wish to make. There is no need for a long introduction. When you work in industry you will have to produce short reports because senior managers will not have time to read long reports. It is your job to analyse and compact the information, not the reader's.

Marking Criteria
• Quality of Reading and Referencing
• Reasons for Stockouts
• Environmental Sustainability
• Structure, Presentation and Quality of Writing

$25.00

[SOLVED] CENG0037 Dissertation

CENG0037 Dissertation – Literature Review
To be submitted via Moodle by 08:59 am on Monday 27th November 2024

GenAI Category 2: AI tools can be used in an assistive role for the following purposes only: proofreading and structuring your submission. If considerable changes have been made to your content, this could be considered academic misconduct. Background research may be performed using GenAI, but it is expected that any information obtained from GenAI is critically evaluated and validated using literature sources. Where you have used GenAI for proofreading, it is still recommended that you do the final proofreading yourself, as technical terms are often changed, altering the meaning completely. If you do use GenAI, you must acknowledge its use.

In the scientific literature, stand-alone literature reviews represent authoritative, overarching, wide-reaching and up-to-date reports on a specific scientific topic. They rarely include new data, but instead are constructed through the summarising and critical analysis of previously published articles. As discussed in class, one of the primary purposes of a literature review is to highlight gaps in the current knowledge of the field with a view to suggesting areas for further research. The text will be referenced in a style that is particular to the publishing journal.

Task: Write a sub-section of a literature review, to be submitted to the International Journal of Mining Science and Technology, under the OVERALL TITLE: Contributions of the Mining Industry to the UN Sustainable Development Goals

This is a very broad topic and impossible to write about within the word limit. You should therefore select a sub-topic to write about in your review. Your assessment should start with the main title as above, followed by a sub-title that you have created to reflect the topic that you will write about. It is up to you what this sub-topic will be. However, a couple of suggestions are (but are not limited to):
· Focus on one specific aspect of the mining industry (e.g. exploration, processing, economics, resource efficiency, mining innovation or something else of your own choosing) and the impact this aspect has on a variety of different SDGs.
· Focus on one or two select SDGs and how various different aspects of the mining industry might contribute to them.
· A case study focusing on a particular mining company.

This is not an exhaustive list, but as a guideline you should include:
· A short introduction to the topic (briefly outlining the broad context).
· An introduction to your chosen sub-topic (being more specific and highlighting why it is important to discuss).
· Concise critical analysis of the literature within this sub-topic.
· A short concluding statement that brings the sub-section together, highlighting any potential gaps for subsequent work.

Make sure you read the marking rubric to understand how marks will be awarded. The audience you are writing this summary for is someone with a PhD in a science or engineering topic, but with only a superficial understanding of the resources sector. You must write your literature review so they can understand it. You may use a maximum of 15 references. Please use the referencing style of the journal 'International Journal of Mining Science and Technology'. You must find the appropriate style from their author guidelines. The idea here is that you are writing a review paper to be submitted to this journal, and so you are required to follow their referencing guidelines.
If you use the wrong referencing style, even if you have included appropriate references, you will lose marks.

Word limit: 1,500 words, not including the title or bibliography (you must state your word count at the end of the document). There will be no penalty for under-length work, though note that work that is substantially under the word count is unlikely to meet the rubric effectively. Penalties will be applied to assignments that are over length:
· Work up to 5% over length (1,575 words) will receive a penalty of a 10% reduction in the mark (no penalty will cause the mark to go below the pass mark of 50%).

$25.00

[SOLVED] EPPM 1113/ EPPD 1063 Lab Test 1 Statistics

EPPM 1113 / EPPD 1063 Lab Test 1 (Duration: 2 hours 30 minutes)

You have been provided with a dataset containing records of a fictional company's sales data. Follow the instructions below to answer the questions using Excel formulas. Save your answers and submit your Excel file together with your Word file (mail merge question) on UKMFolio.

Questions:
1. Use VLOOKUP to find the region for Product ID 105.
2. Use SUMIF to find the total sales of the product named "AlphaX."
3. Calculate the average units sold across all regions.
4. Determine the maximum units sold in any single transaction.
5. Identify the minimum price per unit for products in the "North" region.
6. Find the total sales for transactions where the salesperson is "James".
7. Use COUNTIF to count transactions in the "South" region.
8. Create a Pivot Chart to display total sales by region.
9. Use the IF function to mark transactions as "High" if total sales exceed 5000; otherwise, mark them as "Low." Create a new column next to Total Sales and name it Transaction Status.
10. Calculate the total revenue by summing all "Total Sales" values. Use the SUM function.
11. Autofill the cells to calculate the difference between each transaction's units sold and the average units sold (the average is based on Question 3).
12. Find the average price per unit for products sold by "Sarah."
13. Use MIN to find the range of sales dates (earliest and latest date).
14. Calculate the sum of units sold for products with IDs above 150.
15. Create a line chart to show total sales trends over time.
16. Find the median price per unit across all products.
17. Count the number of salespeople using COUNTA.
18. Use IF to calculate a 10% bonus for transactions with units sold above 100.
19. Autofill formulas down the column to calculate the final price after applying a 5% discount on all products.
20. Identify the second-highest total sales using LARGE.
21. Calculate the total units sold for product P110 using SUMIF.
22. Use AVERAGEIF to find the average total sales for the "East" region.
23. Use IF with OR to divide total sales by 3 if the region is "East" or units sold are over 50; otherwise, return "N/A".
24. Determine the total number of transactions. Use COUNTA.
25. Use IF with AND to classify transactions as "Medium" if total sales are above 2000 and units sold are under 100; otherwise, return "Check Data."
26. Use the Mail Merge function to create a personalized letter of appreciation for each salesperson based on their sales data. Use the provided template to insert the relevant fields, including the salesperson's name, region, total sales, units sold, and product name. You do not need to submit all 200 letters. Instead, save and submit the template file with the mail merge fields inserted, along with 5 sample letters generated from the merge. Make sure to preview a few records to confirm accuracy in your setup before completing the samples.

$25.00

[SOLVED] BEA3026 Financial Modelling

BEA3026 Financial Modelling
Individual Assignment (100% of Total Module Assessment)
Released: 12:00 noon on 1st November 2024
Submission Deadline: 12:00 noon on 9th December 2024 (Late Submission Penalties Will Apply)
Length: 3,500 words with a permissible deviation of +/− 5%, excluding title page, tables, figures, equations and references

BEA3026 Financial Modelling Individual Assignment (100% of the course mark)

Overview
• This is an assessed piece of work that will account for 100% of the total marks of this course.
• The assignment is in the form of a written report, to be submitted in the course ELE submission folder.
• This is an individual assignment, with a length of 3,500 words and a permissible deviation of +/− 5%, excluding title page, tables, figures, equations, and references.
• A word count must be placed on the front cover; 2 marks will be deducted if there is no word count, and 5 marks will be deducted if the examiner feels that there has been a deliberate attempt to give misleading information about the length of the assignment.
• Penalties will apply if the coursework is submitted late, and details can be found here.

Assignment Task
You should choose ONE of the following two assignment tasks.

Assignment Task Option 1: Financial Statement Modelling
You are a corporate finance analyst in the mergers and acquisitions department of an investment bank. The bank's client is considering making a bid for Caterpillar Inc (ticker CAT). You are tasked with providing a preliminary estimate of the fair market value of equity that will provide the starting point for the client's bid for CAT. You should assume that the company will maintain its current capital structure and dividend pay-out ratio. You should also assume that the company has an annual stock repurchase policy.
Note: a spreadsheet template with CAT's financial statement data for 2023 is available in the Assessment Information folder on ELE. The template also provides the values for the sales growth rate, long-term growth rate, WACC, and stock repurchase policy to be used in the valuation.

Assignment Task Option 2: Risk Measurement
You are a risk analyst for an investment bank and have been tasked with analysing the risk of an investment in options on Caterpillar Inc (ticker CAT) stock. Compute the VaR and CVaR over a range of confidence levels from 50% to 99% for a position in a call option with a maturity that is as close as possible to three months and a strike price of 350. The VaR horizon is the maturity of the option. You should also estimate the corresponding VaR for a position in the underlying stock. You should incorporate the expected return of the stock in the simulation using the CAPM. (A minimal illustrative sketch of such a simulation appears at the end of this brief.)
Note: A spreadsheet template containing two years of historical daily adjusted close price data for CAT, five years of historical monthly adjusted close price data for both CAT and the S&P 500, and the market risk premium and risk-free rate for use in the CAPM is available in the Assessment Information folder on ELE.

Assignment Structure Guide
You should use the following assignment structure for your report:
1. Title page: this should include the assignment title, your student number and the word count.
2. Introduction: this should summarise the objective of the assignment and motivate it in the context of the existing literature (either from the course contents or from your further reading), define the assignment objective, summarise the methodology and the key findings, and outline the structure of your report.
3. Method: this should precisely and succinctly define the methods that you use in the analysis, making use of mathematical formulas where appropriate.
4. Data: this should define the data items used in the analysis, the sources of the data and, where appropriate, the sample period and frequency. You should also report summary statistics of your data.
5. Results: this should present the results of the analysis, including any sensitivity analyses. You should provide a critical discussion of your results, relating them to the objective of the assignment.
6. Conclusion: this should summarise the main findings of the model, critically evaluate any shortcomings of the data and method and their practical implications, and offer some suggestions for future improvement.
7. References: it is important that you include references for all the papers cited and that you use the APA reference system. Further information about referencing other people's work can be found here: https://ele.exeter.ac.uk/course/view.php?id=6748&section=3#module-2714576
8. Assignment cover sheet: to comply with the UEBS GenAI policy, you must submit the assignment cover sheet (see Appendix 1) together with the assignment report.

Guidance on Completion of the Assignment
In writing up your report, you should adhere to the following guidelines:
• Your report should be professionally presented. You should assume that it will be read by the senior management of the company, so it needs to be neat, properly structured and clearly and concisely written. This is an important skill in practice, and you should take this opportunity to develop your skills in this area. Some useful guidance on academic writing can be found here: https://ele.exeter.ac.uk/course/view.php?id=6748
• Tables and charts should be accompanied by detailed explanatory notes. Look at any paper in a good finance journal to see how to present a table of results. I have put a sample of a published paper in the Assessment Information section on the course website to give you some examples of how to present the assignment results.
• You should use Equation Editor in Microsoft Word to format any equations. You should define any variables that are used in equations and explain/discuss the assumptions that are made for any of the input variables.
• You should pay attention to the formatting of your report, particularly with respect to line spacing, paragraphs and section titles.
• Note that there is no word limit for individual sections, only for the overall report.
• You should not include appendices in your report.
• Note that you can use Generative AI tools such as ChatGPT to support your work without using these tools to try to replace or substitute for your own ideas and perspective. If you do use generative AI tools in assisting the completion of this academic work, you must reference and acknowledge the AI tools used in your academic work following the guidance here: https://libguides.exeter.ac.uk/referencing/generativeai.
• Your report should follow the Business School's guidelines on referencing, citation and avoiding plagiarism, which can be found here: https://vle.exeter.ac.uk/course/view.php?id=6748&section=2

Assignment Submission
Deadline: Monday, 09 December 2024, 12:00 noon GMT
Written Report – will be marked
Please submit your written assignment in PDF format via the submission link on the module's ELE 2 page, which will be available up to three weeks before the above deadline.
Supporting Data / Calculation / Programme Files – will be checked
A separate Excel file (or files) containing your modelling data and calculations should be submitted on ELE 2 to the support document submission folder. Without this, your assignment mark will be capped at 50. The supporting Excel file(s) will not be marked, but they are required and will be used to check the originality of your work. Please name your supporting documents in the following format: Student Number xx Support File. If submitting more than one file, please put them in a zipped folder to submit.
Late submission
You will be penalised if you submit your assignment after the deadline. Details of the penalties for late submission can be found here: https://as.exeter.ac.uk/academic-policy-standards/tqa-manual/aph/settingandsubmission/#late
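For Option 2, the spreadsheet template supplies the market data, but the simulation logic is the same whatever tool you build it in. The sketch below (in Python rather than Excel, purely for illustration) uses placeholder values for the spot price, volatility, beta, risk-free rate, market risk premium and option premium; in the actual assignment these would come from the template's historical prices and given parameters. It simulates the stock price at the option's maturity with a CAPM drift and reads VaR and CVaR off the simulated loss distribution for both the call and the underlying stock.

import numpy as np

# Illustrative inputs -- the real values come from the spreadsheet template on ELE.
S0 = 350.0                # current CAT price (placeholder)
K = 350.0                 # strike price
T = 0.25                  # ~3 months in years (VaR horizon = option maturity)
rf = 0.045                # annual risk-free rate (placeholder)
mrp = 0.06                # market risk premium (placeholder)
beta = 1.1                # CAPM beta, e.g. from regressing monthly CAT returns on S&P 500 returns
sigma = 0.30              # annualised volatility, e.g. from the two years of daily returns
call_price_today = 25.0   # current market price of the option (placeholder)
n_sims = 100_000
rng = np.random.default_rng(42)

# CAPM expected return used as the drift of the simulated stock price
mu = rf + beta * mrp

# Simulate the stock price at option maturity under lognormal dynamics
z = rng.standard_normal(n_sims)
S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

# P&L of a long call held to maturity: discounted payoff minus the premium paid
pnl_option = np.exp(-rf * T) * np.maximum(S_T - K, 0.0) - call_price_today
# P&L of a long position in the underlying stock over the same horizon
pnl_stock = S_T - S0

def var_cvar(pnl, confidence):
    """VaR and CVaR (expected shortfall) of the loss distribution at the given confidence level."""
    losses = -pnl
    var = np.quantile(losses, confidence)
    cvar = losses[losses >= var].mean()
    return var, cvar

for c in [0.50, 0.75, 0.90, 0.95, 0.99]:
    v_opt, cv_opt = var_cvar(pnl_option, c)
    v_stk, cv_stk = var_cvar(pnl_stock, c)
    print(f"{c:.0%}: option VaR={v_opt:8.2f} CVaR={cv_opt:8.2f} | stock VaR={v_stk:8.2f} CVaR={cv_stk:8.2f}")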


[SOLVED] EMS702P Statistical Thinking and Applied Machine Learning 2024/2025 Matlab

EMS702P – Statistical Thinking and Applied Machine Learning
Case Study: Artificial Intelligence in Air Traffic Management (ATM) – Data-Centric Engineering in Airport Airside Operations
16/10/2024 Student Pack – 2024/2025

1. Problem description
Accurate taxi time prediction plays an indispensable role in optimising airport airside operations. It not only helps practitioners create more robust schedules and identify choke points between gate and runway, but also helps government analysts estimate the optimal airport capacity and evaluate the impact of regulations. This case study uses taxiing data from Manchester International Airport (MAN), the second busiest airport in the UK. To ensure taxi time prediction accuracy, one should comprehensively consider the relevant features that may affect taxi time. In this case study, the data comes with up to 25 features, aiming to provide a sufficient set of features for taxi time prediction. These features are divided into three categories: (a) aircraft and airport operational factors, (b) airport congestion level, and (c) aircraft average speed.
You will need to complete 4 tasks in this case study:
(i) Collecting/selecting and pre-processing data using the programs downloaded from QM+.
(ii) Applying feature engineering technologies, in particular Principal Component Analysis (PCA), for feature extraction using the dataset that you have collected/selected.
(iii) Applying supervised learning, including the Neural Network (NN), Linear Regression (LR) and Adaptive Neuro-Fuzzy Inference Systems (ANFIS), to predict the taxi time.
(iv) Discussing the pros and cons of the different machine learning tools from the aspects of prediction accuracy, generalisation capability and model transparency.
You may also find Appendix C: Steps to Success helpful to complete the above tasks.

2. Description of tasks
2.1 Data collection/selection and processing
The students will work together in groups. Each group should collect data from https://opensky-network.org/ for MAN or select data over a period of time from the provided dataset. In order to obtain data sets appropriate for machine learning studies, the following rules need to be followed:
Each data set should contain at least 10 data points per useful feature (you need to decide when to start recording/retrieving your data and for how long), but should not exceed 1000 data points (otherwise it will consume a lot of your time).
Data sets collected/selected by different groups should be different (consider different time periods).
The collected/selected data sets should be pre-processed so that they are suitable for machine-learning-based modelling. Data processing is conducted using the programs downloaded from QM+. Details of the data processing programs are shown in Appendix A.
Appropriately divide the data set into sub-sets, including training, validation and testing.

2.2 Feature extraction and selection
The available features in the collected/selected data are explained in Appendix B. Feature extraction is conducted using PCA. PCA is a linear dimensionality reduction technique that can be used to extract useful information from a high-dimensional space by projecting it into a lower-dimensional sub-space. You need to do research on how to use PCA as a tool to select features (e.g., a paper included in the W3.3 slides provides a potential way).
You need to decide how many features you will use and explain the reason for your choice.

2.3 NN & LR & ANFIS
NN and LR are classical machine learning models and the foundation of many other advanced machine learning approaches. ANFIS represents a hybrid intelligent system. In this case study, you need to apply these three models to solve the taxi time prediction problem.
For NN, follow these rules: apply a Back Propagation (BP) NN for prediction; choose the number of hidden layer nodes and explain the reason.
For LR, follow these rules: apply a polynomial basis function for LR modelling; decide the maximum order of the polynomial function and explain the reason.
For ANFIS, follow these rules: apply clustering to construct the initial Fuzzy Rule-based System; decide the learning algorithms used to further train the Fuzzy Rule-based System and explain the reason.
You will also need to decide the training, validation and testing data sets used for the machine learning models. Statistical tests of the obtained regression models are needed.

2.4 Comparison
Compare NN, LR and ANFIS from the aspects of prediction accuracy, generalisation capability and model transparency. The prediction accuracy is quantified by the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Relative Error (MRE), etc. Statistical tests and/or interval estimation are needed as the means to report the model performance/skill.

2.5 Conclusion
Read the following three reports that are provided on QM+. You will gain an understanding of the challenges associated with the safe use of AI, the importance of data to AI, and methods for determining different sub-datasets for training and verification purposes, and you will appreciate the urgent need to develop new technologies, processes, tools, and guidance for assuring the safety of systems based on these technologies. Draw conclusions from your work and results in this coursework, using these understandings of AI in life-critical engineering.
•    Read the report “The FLY AI Report” produced by EUROCONTROL. This report provides an overview of the many ways that AI is already applied in the aviation sector and ATM and assesses its potential to transform the sector.
•    Read the report “AFE 87 – Machine Learning” produced by the Aerospace Vehicle Systems Institute. This report provides an overview of different Machine Learning paradigms and how these emerging technologies present new challenges to existing certification processes in the aviation sector.
•    Read the article “Aircraft taxi time prediction: Comparisons and insights” to get a better idea of how different Machine Learning algorithms have been used and the pros and cons of different methods.

3. Assessment
The first deliverable of this case study is a 5-minute pre-recorded group presentation (15% of the whole module). The second deliverable of this case study is a 10-page group report (25% of the whole module). The main parts and the suggested weighting of each part to be included in the presentation/report are indicated below. Both deliverables will be marked against these main parts.
•    Problem description: description of the problem and how the predictive model can be used.
•    Feature extraction (PCA): including computing the principal components, choosing the appropriate number of principal components, and plotting the selected principal components and the reconstruction loss. 30%
•    NN & LR & ANFIS: description of the steps taken to construct the models, including choice of model structures, which sub-dataset(s) are used for training, validation and testing, quantifying and visualising the performance of the predictive models, conducting the statistical tests of the regression models, and assessing the transparency and generalisation capability of the models.
•    Comparison: differences in prediction accuracy, generalisation capability and model transparency, and visualising the comparison results in the report. 20%
•    Conclusions: drawing conclusions and identifying further crucial steps for machine learning approaches applied in the operational environment at airports. 10%
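The case study itself is run with the tools provided on QM+ (MATLAB-based), but the shape of the workflow in sections 2.2-2.4 (dimensionality reduction, fitting the regressors, reporting RMSE/MAE/MRE on a held-out set) can be summarised in a short sketch. The Python snippet below is illustrative only: the file names features.csv and taxi_times.csv, the 95% explained-variance cut-off, the polynomial order and the hidden-layer size are assumptions you would replace with your own justified choices, and ANFIS is omitted because it has no standard scikit-learn counterpart.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical pre-processed arrays: X holds the candidate features, y the observed taxi times
# (e.g. exported by the QM+ processing programs; file names are placeholders).
X = np.loadtxt("features.csv", delimiter=",")
y = np.loadtxt("taxi_times.csv", delimiter=",")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Standardise, then keep enough principal components to explain ~95% of the variance.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))
print("Retained components:", pca.n_components_,
      "explained variance:", pca.explained_variance_ratio_.sum())

models = {
    # Polynomial-basis linear regression (order 2 chosen here only as an example).
    "LR (poly-2)": (PolynomialFeatures(degree=2), LinearRegression()),
    # Back-propagation NN with one hidden layer (node count is a tuning decision to justify).
    "BP NN": (None, MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)),
}

for name, (basis, model) in models.items():
    Ztr = basis.fit_transform(Z_train) if basis is not None else Z_train
    Zte = basis.transform(Z_test) if basis is not None else Z_test
    model.fit(Ztr, y_train)
    pred = model.predict(Zte)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    mae = mean_absolute_error(y_test, pred)
    mre = np.mean(np.abs((y_test - pred) / y_test))
    print(f"{name}: RMSE={rmse:.2f} MAE={mae:.2f} MRE={mre:.2%}")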


[SOLVED] CS 17700 - Lab 07

Fall 2024 CS 17700 — Lab 07
Submission files: lab07.py
Submission deadline: Monday November 25 @ 11:59 PM
Late deadline: Thursday November 28 @ 11:59 PM (–20% penalty)
Attempts: 10
All times are local to the Purdue West Lafayette campus (Eastern Time)

Objectives:
Read formatted data from a file
Implement classes based on design criteria

Description: As part of a new corporate acquisition, you are now tasked with writing software that will help manage the inventory (stock) of several major grocery stores. This includes adding and removing items from the stock, adjusting prices, and calculating what to charge customers when they check out.

Task 1: Taking stock (20% of grade)
Create a class called Store. The constructor should accept one required positional parameter: stock_file (str). Use the open() function and a for loop to read the contents of the stock_file line-by-line. The file is in comma-separated value (CSV) format:
Item name, Quantity in stock, Price per unit
str, int, float
The constructor should save this data into a dictionary attribute of the class named stock with the following format:
{Item name (str): [Quantity in stock (int), Price per unit (float)], ...}
Warning: you are not permitted to import the csv library; this will earn a score of zero.
Your workarea in Vocareum contains several CSV files which will be used to test your program. An example execution with one of the files is shown below.
Example execution 1:
>>> kroger = Store("s_kroger.csv")
>>> kroger.stock
{'Apple juice': [7, 2.49], 'Peanut butter': [9, 1.99], 'Ranch': [17, 2.29], 'Sour cream': [14, 2.79], 'Ketchup': [20, 2.49], 'Mayo': [18, 3.99], 'Litter': [0, 11.49], 'Cat food': [5, 4.29], 'Bread': [20, 1.79], 'Milk': [6, 2.79], 'Paper towels': [15, 9.99], 'Pizza': [10, 8.99]}

Task 2: Doing business (20% of grade)
Create a Store method named restock which accepts two required positional parameters: item (str) and quantity (int). This will adjust the quantity of item in the Store's stock dictionary. The price is left unchanged. If item did not already exist in the Store's stock dictionary, then it is added with a price of 0.0.
Create a Store method named reprice which accepts two required positional parameters: item (str) and price (float). This will adjust the price of item in the Store's stock dictionary. The quantity is left unchanged. If item did not already exist in the Store's stock dictionary, then it is added with a quantity of 0.
Some example executions are provided below. The auto-grader may perform additional tests with other data.
Example execution 2:
>>> kroger = Store("s_kroger.csv")
>>> kroger.restock("Apple juice", 0)   # sold out
>>> kroger.restock("Popsicles", 4)     # new item: price assumed to be 0.0
>>> kroger.reprice("Ranch", 2.10)      # discount
>>> kroger.reprice("Ice cream", 2.99)  # new item: quantity assumed to be 0
(changes from example execution 1 highlighted based on code executed above)
>>> kroger.stock
{'Apple juice': [0, 2.49], 'Peanut butter': [9, 1.99], 'Ranch': [17, 2.1], 'Sour cream': [14, 2.79], 'Ketchup': [20, 2.49], 'Mayo': [18, 3.99], 'Litter': [0, 11.49], 'Cat food': [5, 4.29], 'Bread': [20, 1.79], 'Milk': [6, 2.79], 'Paper towels': [15, 9.99], 'Pizza': [10, 8.99], 'Popsicles': [4, 0.0], 'Ice cream': [0, 2.99]}

Task 3: Welcome valued customer (25% of grade)
Create a Store method named cost which accepts one required positional parameter named cart (dict) and one optional parameter named checkout (bool, default False).
The cart parameter will be a dictionary of the following form: {Item name (str): Purchase quantity (int), ...}
This method will sum and return the total price (starting from 0.0) of the cart, rounded to two decimal places. Hint: use the built-in round(..., 2) function to perform this rounding.
Hint: the customer cannot purchase more items than there are in stock. If the desired purchase quantity exceeds the quantity in stock for an item, then the customer buys all available stock. Likewise, if the customer wants to purchase an item that is not in stock or completely unknown, skip it and move on to the next item.
If checkout is True, then update the Store's stock dictionary to decrease the quantity in stock of each item the customer has bought. Otherwise, no changes to the stock dictionary are made.
Some example executions are provided below. The auto-grader may perform additional tests with other data.
Example execution 3:
>>> kroger = Store("s_kroger.csv")
>>> kroger.cost({})   # empty cart
0.0
>>> kroger.cost({"Litter": 2, "Bread": 2, "Pizza": 1, "Toothpicks": 10})
12.57
>>> kroger.stock["Bread"] == [20, 1.79]   # quantity did not change
True
>>> kroger.stock["Pizza"] == [10, 8.99]   # likewise
True
>>> kroger.cost({"Litter": 2, "Bread": 2, "Pizza": 1, "Toothpicks": 10},
...             checkout=True)
12.57
(changes from example execution 1 highlighted based on code executed above)
>>> kroger.stock
{'Apple juice': [7, 2.49], 'Peanut butter': [9, 1.99], 'Ranch': [17, 2.29], 'Sour cream': [14, 2.79], 'Ketchup': [20, 2.49], 'Mayo': [18, 3.99], 'Litter': [0, 11.49], 'Cat food': [5, 4.29], 'Bread': [18, 1.79], 'Milk': [6, 2.79], 'Paper towels': [15, 9.99], 'Pizza': [9, 8.99]}

Task 4: Competition (35% of grade)
Create a Store subclass named Costco. The class constructor should accept two required positional parameters: stock_file (str) and discount (float).
Hint: add the following line of code to your new constructor to reuse the constructor from the parent (super) class Store: super().__init__(stock_file)
The value of the discount parameter should be saved as an attribute for use below.
Create a Costco method named cost which accepts one required positional parameter named cart (dict) and one optional parameter named checkout (bool, default False). This method will work the same as the Store.cost method, but the total price will be reduced by the discount passed to the constructor. For example, if discount == 0.2, then a 20% discount is applied to the entire cart (i.e. the total price will be 80% of what it would be at a generic Store).
Hint: use the following piece of code in your new method to reuse the method from the parent (super) class Store: super().cost(cart, checkout)
Hint: as in Task 3, use the built-in round(..., 2) function to round the return value.
Some example executions are provided below. The auto-grader may perform additional tests with other data.
Example execution 4:
>>> costco = Costco("s_kroger.csv", 0.02)
>>> costco.discount
0.02
>>> costco.cost(
...     {"Litter": 2, "Bread": 2, "Pizza": 1, "Toothpicks": 10},
...     checkout=True)
12.32
>>> costco.stock
{'Apple juice': [7, 2.49], 'Peanut butter': [9, 1.99], 'Ranch': [17, 2.29], 'Sour cream': [14, 2.79], 'Ketchup': [20, 2.49], 'Mayo': [18, 3.99], 'Litter': [0, 11.49], 'Cat food': [5, 4.29], 'Bread': [18, 1.79], 'Milk': [6, 2.79], 'Paper towels': [15, 9.99], 'Pizza': [9, 8.99]}

Policies:
Review the syllabus for policies regarding extensions and academic integrity.
All submissions must be made through Vocareum. No work will be accepted via any other method.
Your code must be placed into a file named lab07.py inside of the work/ folder in Vocareum. No variation is permitted.
You may make up to 10 submissions. Each submission will produce feedback from the auto-grader.
NEW: For this assignment, your final score in Vocareum will be the maximum (best) score you obtained on any submission you made, regardless of whether it was the final submission or not.
You may make a private post on Ed if you believe your assignment was incorrectly scored by the auto-grader. This must be done within 5 days of the score entering the Brightspace gradebook.
Unit testing: Please refer to the Lab 04 handout for more information on how your work will be graded.
DO NOT CALL input() or print() ANYWHERE except inside the __main__ block (if any).
Notes on CSV files: Please refer to the Lab 06 handout.
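Pulling the four tasks together, a minimal sketch of the class layout the handout asks for might look like the following. It mirrors the stated behaviour (new items get a default price of 0.0 or quantity of 0, purchases are capped at the available stock, unknown items are skipped, totals are rounded to two decimals), but it is an illustration rather than the official solution, so verify the edge cases against the auto-grader's feedback.

class Store:
    def __init__(self, stock_file):
        # {item name: [quantity in stock, price per unit]}
        self.stock = {}
        with open(stock_file) as f:           # the handout asks for open() plus a for loop
            for line in f:
                name, qty, price = line.strip().split(",")
                self.stock[name] = [int(qty), float(price)]

    def restock(self, item, quantity):
        # New items start at price 0.0; existing items keep their price.
        self.stock.setdefault(item, [0, 0.0])[0] = quantity

    def reprice(self, item, price):
        # New items start at quantity 0; existing items keep their quantity.
        self.stock.setdefault(item, [0, 0.0])[1] = price

    def cost(self, cart, checkout=False):
        total = 0.0
        for item, wanted in cart.items():
            if item not in self.stock:
                continue                       # completely unknown item: skip it
            available, price = self.stock[item]
            bought = min(wanted, available)    # cannot buy more than is in stock
            total += bought * price
            if checkout:
                self.stock[item][0] = available - bought
        return round(total, 2)


class Costco(Store):
    def __init__(self, stock_file, discount):
        super().__init__(stock_file)
        self.discount = discount

    def cost(self, cart, checkout=False):
        # Reuse Store.cost, then apply the flat discount to the whole cart.
        return round(super().cost(cart, checkout) * (1 - self.discount), 2)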


[SOLVED] DSCI 553 Assignment 5: In this assignment, you are going to implement three streaming algorithms.

In this assignment, you are going to implement three streaming algorithms. In the first two tasks, you will generate a simulated data stream with the Yelp dataset and implement the Bloom Filtering and Flajolet-Martin algorithms. In the third task, you will do some analysis using a Fixed Size Sample (Reservoir Sampling).

2.1 Programming Requirements
a. You must use Python and Spark to implement all tasks. There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You are not required to use Spark RDD in this assignment.
c. You can only use standard Python libraries, which are already installed on Vocareum.

2.2 Programming Environment
Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2. We will use the above library versions to compile and test your code. You are required to make sure your code works and runs on Vocareum, otherwise we won't be able to grade it.

2.3 Important things before starting the assignment:
1. If we cannot call myhashs(s) in task1 and task2 in your script to get the hash value list, there will be a 50% penalty.
2. We will simulate your Bloom filter in the grading program simultaneously based on your myhashs(s) outputs. There will be no points if the reported output is largely different from our simulation.
3. Please use the integer 553 as the random seed for task 3, and follow the steps mentioned below to get a random number. If you use the wrong random seed, discard any obtained random number, or the sequence of random numbers is different from our simulation, there will be a 50% penalty.

2.4 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! TAs will combine all the code we can find from the web (e.g., Github) as well as other students' code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism.

For this assignment, you need to use users.txt as the input file. You also need a Python blackbox file to generate data from the input file. Both users.txt and blackbox.py can be found in the publicdata directory on Vocareum. We use the blackbox as a simulation of a data stream. The blackbox will return a list of user ids from the file users.txt every time we call it. Although it is very unlikely that the user ids returned from the blackbox are not unique, you are required to handle duplicates wherever required. Please call the blackbox function like the example in the following figure. If you need to ask the blackbox multiple times, you can do it with the following sample code.

4.1 Task1: Bloom Filtering (2.5 pts)
You will implement the Bloom Filtering algorithm to estimate whether a user_id in the data stream has been seen before. The details of the Bloom Filtering algorithm can be found on the streaming lecture slides. Please find proper hash functions and the number of hash functions for the Bloom Filtering algorithm. In this task, you should keep a global filter bit array whose length is 69997. The hash functions used in a Bloom filter should be independent and uniformly distributed. Some possible hash functions are:
f(x) = (ax + b) % m or f(x) = ((ax + b) % p) % m
where p is any prime number and m is the length of the filter bit array. You can use any combination for the parameters (a, b, p).
The hash functions should remain the same once you create them. As the user_id is a string, you need to convert the user_id to an integer and then apply the hash functions to it. The following code shows one possible way of converting the user_id string to an integer:
import binascii
int(binascii.hexlify(s.encode('utf8')), 16)
(We only treat exactly identical strings as the same user. You do not need to consider aliases.)
Execution Details
To calculate the false positive rate (FPR), you need to maintain a set of previously seen users. The size of a single data stream will be 100 (stream_size), and we will test your code more than 30 times (num_of_asks); your FPRs are only allowed to be larger than 0.5 at most once. The run time should be within 100s for 30 data streams.
Output Results
You need to save your results in a CSV file with the header "Time,FPR". Each line stores the index of the data batch (starting from 0) and the false positive rate for that batch of data. You do not need to round your answer.
You also need to encapsulate your hash functions into a function called myhashs. The input of the myhashs function is a user_id (string) and the output is a list of hash values. For example, if you have three hash functions, the size of the output list should be three and each element in the list corresponds to an output value of one of your hash functions. The figure below is a template of the myhashs function. Our grading program will also import your Python script and call the myhashs function to test the performance of your hash functions and track your implementation.

4.2 Task2: Flajolet-Martin algorithm (2.5 pts)
In task2, you will implement the Flajolet-Martin algorithm (including the step of combining estimations from groups of hash functions) to estimate the number of unique users within a window in the data stream. The details of the Flajolet-Martin algorithm can be found on the streaming lecture slides. You need to find proper hash functions and the number of hash functions for the Flajolet-Martin algorithm.
Execution Details
For this task, the size of the stream will be 300 (stream_size), and we will test your code more than 30 times (num_of_asks). And for your final result, 0.2
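As an illustration of the myhashs contract the grader relies on, here is a minimal sketch of the Task 1 bookkeeping with assumed hash parameters. The (a, b, p) values, the number of hash functions and the batch handling are placeholder choices you would tune yourself; only the function name, the string-to-integer conversion and the 69997-bit filter length come from the handout.

import binascii

FILTER_LEN = 69997
# Illustrative (a, b, p) choices; p = 104729 is a prime larger than the filter length.
PARAMS = [(387, 17, 104729), (911, 53, 104729), (2551, 271, 104729),
          (4099, 733, 104729), (7901, 1021, 104729)]

def myhashs(s):
    """Return one hash value per hash function for user_id s (called directly by the grader)."""
    x = int(binascii.hexlify(s.encode("utf8")), 16)
    return [((a * x + b) % p) % FILTER_LEN for a, b, p in PARAMS]

bit_array = [0] * FILTER_LEN     # global Bloom filter bit array
previous_users = set()           # ground-truth set of users seen so far

def process_batch(stream_users):
    """Update the Bloom filter with one batch and return that batch's false-positive rate."""
    fp = tn = 0
    for user in stream_users:
        positions = myhashs(user)
        looks_seen = all(bit_array[pos] for pos in positions)
        if user not in previous_users:       # genuinely unseen user
            if looks_seen:
                fp += 1                      # filter wrongly claims it was seen
            else:
                tn += 1
        for pos in positions:
            bit_array[pos] = 1
        previous_users.add(user)
    return fp / (fp + tn) if (fp + tn) else 0.0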


[SOLVED] DSCI 553 Assignment 4: In this assignment, you will explore the Spark GraphFrames library as well as implement your own Girvan-Newman algorithm

In this assignment, you will explore the Spark GraphFrames library as well as implement your own Girvan-Newman algorithm using the Spark Framework to detect communities in graphs. You will use the ub_sample_data.csv dataset to find users who have similar business tastes. The goal of this assignment is to help you understand how to use the Girvan-Newman algorithm to detect communities in an efficient way within a distributed environment.

2.1 Programming Requirements
a. For Task 1, you can use the Spark DataFrame and GraphFrames library. For Task 2 you can ONLY use Spark RDD and standard Python or Scala libraries. There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.

2.2 Programming Environment
Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2. We will use these library versions to compile and test your code. There will be no points if we cannot run your code on Vocareum.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! TAs will combine all the code we can find from the web (e.g., Github) as well as other students' code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism.

2.4 What you need to turn in
You need to submit the following files on Vocareum:
a. [REQUIRED] two Python scripts, named: task1.py, task2.py
b1. [OPTIONAL, REQUIRED FOR SCALA] two Scala scripts, named: task1.scala, task2.scala
b2. [OPTIONAL, REQUIRED FOR SCALA] one jar package, named: hw4.jar
c. [OPTIONAL] You can include other scripts called by your main program.
d. You don't need to include your results. We will grade your code with our testing data (data will be in the same format).

We have generated a sub-dataset, ub_sample_data.csv, from the Yelp review dataset, containing user_id and business_id. You can find the data on Vocareum under resource/asnlib/publicdata/.

4.1 Graph Construction
To construct the social network graph, assume that each node is uniquely labeled and that links are undirected and unweighted. Each node represents a user. There should be an edge between two nodes if the number of common businesses reviewed by the two users is greater than or equal to the filter threshold. For example, suppose user1 reviewed the set {business1, business2, business3} and user2 reviewed the set {business2, business3, business4, business5}. If the threshold is 2, there will be an edge between user1 and user2. If a user node has no edge, we will not include that node in the graph. The filter threshold will be given as an input parameter when running your code.

4.2 Task1: Community Detection Based on GraphFrames (2 pts)
In task1, you will explore the Spark GraphFrames library to detect communities in the network graph you constructed in 4.1. The library provides an implementation of the Label Propagation Algorithm (LPA), which was proposed by Raghavan, Albert, and Kumara in 2007. It is an iterative community detection solution whereby information “flows” through the graph based on the underlying edge structure.
In this task, you do not need to implement the algorithm from scratch; you can call the method provided by the library. The following websites may help you get started with Spark GraphFrames:
https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-python.html
https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-scala.html

4.2.1 Execution Detail
The version of GraphFrames should be 0.6.0. (For your convenience, graphframes 0.6.0 is already installed for Python on Vocareum. The corresponding jar package can also be found under the $ASNLIB/public folder.)
For Python (on a local machine):
● [Approach 1] Run "python3.6 -m pip install graphframes" in the terminal to install the package.
● [Approach 2] In PyCharm, add the line below to your code to use the jar package: os.environ["PYSPARK_SUBMIT_ARGS"] = "--packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 pyspark-shell"
● In the terminal, you need to pass the "packages" parameter to spark-submit: --packages graphframes:graphframes:0.8.2-spark3.1-s_2.12
For Scala (on a local machine):
● In IntelliJ IDEA, you need to add the library dependencies to your project: "graphframes" % "graphframes" % "0.8.2-spark3.1-s_2.12" and "org.apache.spark" %% "spark-graphx" % sparkVersion
● In the terminal, you need to pass the "packages" parameter to spark-submit: --packages graphframes:graphframes:0.8.2-spark3.1-s_2.12
For the parameter "maxIter" of the LPA method, you should set it to 5.

4.2.2 Output Result
In this task, you need to save your result of communities in a txt file. Each line represents one community and the format is: 'user_id1', 'user_id2', 'user_id3', 'user_id4', ...
Your result should be sorted first by the size of the community in ascending order, and then by the first user_id in the community in lexicographical order (the user_id is of type string). The user_ids within each community should also be in lexicographical order. If there is only one node in a community, we still regard it as a valid community.
Figure 1: community output file format

4.3 Task 2: Community Detection Based on the Girvan-Newman algorithm (5 pts)
In task 2, you will implement your own Girvan-Newman algorithm to detect the communities in the network graph. You can refer to Chapter 10 of the Mining of Massive Datasets book for the algorithm details. Because your task1 and task2 code will be executed separately, you need to construct the graph again in this task following the rules in section 4.1. For task 2, you can ONLY use Spark RDD and standard Python or Scala libraries. Remember to delete the code that imports graphframes. Usage of Spark DataFrame is NOT allowed in this task.

4.3.1 Betweenness Calculation (2 pts)
In this part, you will calculate the betweenness of each edge in the original graph you constructed in 4.1. Then you need to save your result in a txt file. The format of each line is:
('user_id1', 'user_id2'), betweenness value
Your result should be sorted first by the betweenness value in descending order and then by the first user_id in the tuple in lexicographical order (the user_id is of type string). The two user_ids in each tuple should also be in lexicographical order. For output, you should use the Python built-in round() function to round the betweenness value to five digits after the decimal point. (Rounding is for output only; please do not use the rounded numbers for further calculation.)
IMPORTANT: Please strictly follow the output format since your code will be graded automatically.
We will not regrade because of formatting issues.
Figure 2: betweenness output file format

4.3.2 Community Detection (3 pts)
You are required to divide the graph into suitable communities, which together reach the globally highest modularity. The modularity of a partition is:
Q = (1 / (2m)) * Σ_{i,j} [ A_ij − (k_i * k_j) / (2m) ]
where the sum runs over pairs of nodes i and j that belong to the same community, m is the number of edges in the original graph, A_ij indicates whether an edge exists between i and j in the current graph, and k_i and k_j are the degrees of i and j in the original graph. According to the Girvan-Newman algorithm, after removing one edge, you should re-compute the betweenness. The "m" in the formula represents the edge number of the original graph. (Hint: in each removal step, "m", "k_i" and "k_j" should not be changed, while "A" is calculated based on the updated graph.) In the step of removing the edges with the highest betweenness, if two or more edges have the same (highest) betweenness, you should remove all those edges. If a community has only one user node, we still regard it as a valid community. You need to save your result in a txt file. The format is the same as the output file from task 1. (Hint: for the second part of task 2, you should take precision into account, e.g., stop the modularity calculation only if there is a significant reduction in the new modularity.)

4.4 Execution Format
Execution example:
Python:
spark-submit --packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 task1.py
spark-submit task2.py
Scala:
spark-submit --packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 --class task1 hw4.jar
spark-submit --class task2 hw4.jar
Input parameters:
1. <filter threshold>: the filter threshold used to generate edges between user nodes.
2. <input file path>: the path to the input file including path, file name and extension.
3. <betweenness output file path>: the path to the betweenness output file including path, file name and extension.
4. <community output file path>: the path to the community output file including path, file name and extension.
Execution time: The overall runtime limit of your task1 (from reading the input file to finishing writing the community output file) is 400 seconds. The overall runtime limit of your task2 (from reading the input file to finishing writing the community output file) is 400 seconds. If your runtime exceeds the above limit, there will be no points for this task.

5. About Vocareum
a. The dataset is under the directory $ASNLIB/publicdata/, and the jar package is under $ASNLIB/public/.
b. You should upload the required files under your workspace: work/, and click submit.
c. You should test your scripts on both your local machine and the Vocareum terminal before submission.
d. During the submission period, Vocareum will automatically test task1 and task2.
e. During the grading period, Vocareum will use another dataset that has the same format for testing.
f. We do not test the Scala implementation during the submission period.
g. Vocareum will automatically run both Python and Scala implementations during the grading period.
h. Please start your assignment early! You can resubmit any script on Vocareum. We will only grade your last submission.

6. Grading Criteria (% penalty = % penalty of the possible points you get)
1. You can use your free 5-day extension separately or together.
a. https://forms.gle/edH8jw1mJjrLFRcm8
b. This form will record the number of late days you use for each assignment. We will not count late days if no request is submitted. Remember to submit the request BEFORE the deadline.
2. There will be a 10% bonus if you use both Scala and Python.
3. We will combine all the code we can find from the web (e.g., Github) as well as other students' code from this and other (previous) sections for plagiarism detection.
4. All submissions will be graded on Vocareum.
Please strictly follow the format provided, otherwise you can't get the points even if the answer is correct.
5. If the outputs of your program are unsorted or partially sorted, there will be a 50% penalty.
6. We can regrade your assignments within seven days once the scores are released. No arguments after one week.
7. There will be a 20% penalty for late submission within a week and no points after a week.
8. Only when your results from Python are correct will the bonus for using Scala be calculated. There are no partial points for Scala.

7. Common problems causing failed submissions on Vocareum / FAQ
(If your program seems to run successfully on your local machine but fails on Vocareum, please check these.)
1. Try your program in the Vocareum terminal. Remember to set the Python version to python3.6, use the latest Spark at /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit, and select JDK 8 by running the command "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64".
2. Check the input command line formats.
3. Check the output formats, for example, the headers, tags, typos.
4. Check the requirements for sorting the results.
5. Your program scripts should be named task1.py, task2.py, etc.
6. Check whether your local environment fits the assignment description, i.e. version, configuration.
7. If you implement the core part in plain Python instead of Spark, or implement it with a high time complexity (e.g. searching for an element in a list instead of a set), your program may be killed on Vocareum because it runs too slowly.
8. You are required to only use Spark RDD in order to understand Spark operations more deeply. You will not get any points if you use Spark DataFrame or DataSet. Don't import sparksql.
9. Do not use Vocareum for debugging purposes; please debug on your local machine. Vocareum can be very slow if you use it for debugging.
10. Vocareum is reliable in helping you to check the input and output formats, but its ability to check the correctness of your code is limited. It cannot guarantee the correctness of the code even with a full score in the submission report.
11. Some students encounter an error like: "the output rate ... has exceeded the allowed value ... bytes/s; attempting to kill the process". To resolve this, please remove all print statements and set the Spark logging level so that it limits the logs generated; that can be done using sc.setLogLevel(). Preferably, set the log level to either WARN or ERROR when submitting your code.
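To make the Task 1 pipeline concrete, here is a minimal sketch of the graph construction rule from section 4.1 followed by the GraphFrames LPA call. It is illustrative only: the pairwise edge check is done on the driver for readability (fine for the sample data, not necessarily within the 400-second limit on the grading data), the argument order is an assumption, the header handling assumes the CSV's first line is a header, and the output sorting rules from 4.2.2 are left out.

import sys
from itertools import combinations
from pyspark.sql import SparkSession
from graphframes import GraphFrame

# Assumed argument order: filter threshold, input file path, community output file path.
threshold, input_path, output_path = int(sys.argv[1]), sys.argv[2], sys.argv[3]
spark = SparkSession.builder.appName("task1").getOrCreate()
sc = spark.sparkContext

# user -> set of businesses reviewed (skip the CSV header, assumed to be the first line)
rows = sc.textFile(input_path)
header = rows.first()
user_biz = (rows.filter(lambda r: r != header)
                .map(lambda r: r.split(","))
                .map(lambda p: (p[0], p[1]))
                .groupByKey()
                .mapValues(set)
                .collectAsMap())

# Undirected edges between users sharing >= threshold common businesses
# (checked pairwise on the driver here, which is the simple but slow way).
edges = []
for u1, u2 in combinations(sorted(user_biz), 2):
    if len(user_biz[u1] & user_biz[u2]) >= threshold:
        edges.append((u1, u2))
        edges.append((u2, u1))   # GraphFrames stores directed edge rows

vertices = spark.createDataFrame([(u,) for u in {u for e in edges for u in e}], ["id"])
edge_df = spark.createDataFrame(edges, ["src", "dst"])

g = GraphFrame(vertices, edge_df)
communities = g.labelPropagation(maxIter=5)   # LPA with maxIter=5 as required

# Group users by community label; the lexicographical/size sorting for the file is omitted.
result = (communities.rdd.map(lambda row: (row["label"], row["id"]))
                         .groupByKey()
                         .mapValues(sorted)
                         .values()
                         .collect())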


[SOLVED] DSCI 553 Assignment 2: In this assignment, you will implement the SON algorithm using the Spark Framework.

In this assignment, you will implement the SON Algorithm using the Spark Framework. You will develop a program to find frequent itemsets in two datasets: one simulated dataset and one real-world generated dataset. The goal of this assignment is to apply the algorithms you have learned in class to large datasets more efficiently in a distributed environment.

2.1 Programming Requirements
a. You must use Python to implement all tasks. You can only use standard Python libraries (i.e., external libraries like numpy or pandas are not allowed). There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You are required to only use Spark RDD in order to understand Spark operations. You will not get any points if you use Spark DataFrame or DataSet.
c. Python standard libraries: https://docs.python.org/3/library/

2.2 Programming Environment
Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2. We will use these library versions to compile and test your code. There will be no points if we cannot run your code on Vocareum. On Vocareum, you can call `spark-submit` located at `/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit`. (Do not use the one at /usr/local/bin/spark-submit.) We use `--executor-memory 4G --driver-memory 4G` on Vocareum for grading.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! TAs will combine all the code we can find from the web (e.g., Github) as well as other students' code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism, and severe penalties will be given to students whose submissions are plagiarized.

2.4 What you need to turn in
We will grade all submissions on Vocareum, and submissions on Blackboard will be ignored. Vocareum produces a submission report after you click the "Submit" button (it takes a while since Vocareum needs to run your code in order to generate the report). Vocareum will only grade Python scripts during the submission phase and it will grade both Python and Scala during the grading phase.
a. Two Python scripts, named (all lowercase): task1.py, task2.py
b. [OPTIONAL] hw2.jar and two Scala scripts, named (all lowercase): hw2.jar, task1.scala, task2.scala
c. You don't need to include your results or the datasets. We will grade your code with our testing data (data will be in the same format).
d. Students can submit an unlimited number of times. Only the latest submission will be accepted and graded.

In this assignment, you will use one simulated dataset and one real-world dataset. In task 1, you will build and test your program with a small simulated CSV file that has been provided to you. Then in task 2 you need to generate a subset of the Ta Feng dataset with a structure similar to the simulated data. Figure 1 shows the file structure of the task 1 simulated CSV: the first column is user_id and the second column is business_id.
Figure 1: Input Data Format

In this assignment, you will implement the SON Algorithm to solve all tasks (Task 1 and 2) on top of the Spark Framework. You need to find all the possible combinations of the frequent itemsets in any given input file within the required time.
You can refer to Chapter 6 of the Mining of Massive Datasets book and concentrate on section 6.4 – Limited-Pass Algorithms. (Hint: you can choose either the A-Priori, MultiHash, or PCY algorithm to process each chunk of the data.)

4.1 Task 1: Simulated data (3 pts)
There are two CSV files (small1.csv and small2.csv) on Vocareum under '../resource/asnlib/publicdata'. The small1.csv is just a test file that you can use to debug your code. For task1, we will only test your code on small2.csv. In this task, you need to build two kinds of market-basket models.

Case 1 (1.5 pts): You will calculate the combinations of frequent businesses (as singletons, pairs, triples, etc.) that are qualified as frequent given a support threshold. You need to create a basket for each user containing the business ids reviewed by this user. If a business was reviewed more than once by a reviewer, we consider this product to have been rated only once. More specifically, the business ids within each basket are unique. The generated baskets are similar to:
user1: [business11, business12, business13, ...]
user2: [business21, business22, business23, ...]
user3: [business31, business32, business33, ...]

Case 2 (1.5 pts): You will calculate the combinations of frequent users (as singletons, pairs, triples, etc.) that are qualified as frequent given a support threshold. You need to create a basket for each business containing the user ids that commented on this business. Similar to case 1, the user ids within each basket are unique. The generated baskets are similar to:
business1: [user11, user12, user13, ...]
business2: [user21, user22, user23, ...]
business3: [user31, user32, user33, ...]

Input format:
1. Case number: integer that specifies the case. 1 for Case 1 and 2 for Case 2.
2. Support: integer that defines the minimum count to qualify as a frequent itemset.
3. Input file path: the path to the input file including path, file name and extension.
4. Output file path: the path to the output file including path, file name and extension.
Output format:
1. Runtime: the total execution time from loading the file till finishing writing the output file. You need to print the runtime in the console with the "Duration" tag, e.g., "Duration: 100".
2. Output file:
(1) Intermediate result: you should use "Candidates:" as the tag. For each line you should output the candidates of frequent itemsets you found after the first pass of the SON Algorithm, followed by an empty line after each combination. The printed itemsets must be sorted in lexicographical order (both user_id and business_id are of type string).
(2) Final result: you should use "Frequent Itemsets:" as the tag. For each line you should output the final frequent itemsets you found after finishing the SON Algorithm. The format is the same as the intermediate results. The printed itemsets must be sorted in lexicographical order.
Here is an example of the output file. Both the intermediate results and the final results should be saved in ONE output result file.
Command line Format:
Python: spark-submit task1.py
Scala: spark-submit --class task1 hw2.jar
Command line Example:
/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --executor-memory 4G --driver-memory 4G task1.py 1 4 ../resource/asnlib/publicdata/small1.csv task1_output.txt

4.2 Task 2: Ta Feng data (4 pts)
In task 2, you will explore the Ta Feng dataset to find the frequent itemsets (only case 1). You will use data found here from Kaggle (https://bit.ly/2miWqFS) to find product IDs associated with a given customer ID each day.
Aggregate all purchases a customer makes within a day into one basket. In other words, assume a customer purchases all of the items bought within a day at once. The data file is provided at ../resource/asnlib/publicdata/ta_feng_all_months_merged.csv
Note: be careful when reading the CSV file, as Spark can read the product id numbers with leading zeros. You can manually format Column F (PRODUCT_ID) to numbers (with zero decimal places) in the CSV file before reading it using Spark.

SON Algorithm on Ta Feng data:
You will create a data pipeline where the input is the raw Ta Feng data and the output is the file described under "Output file". You will pre-process the data, and then from this pre-processed data you will create the final output. Your code is allowed to output this pre-processed data during execution, but you should NOT submit homework that includes this pre-processed data.

(1) Data preprocessing
You need to generate a dataset from the Ta Feng dataset with the following steps:
1. Find the date of the purchase (column TRANSACTION_DT), such as December 1, 2000 (12/1/00).
2. At each date, select "CUSTOMER_ID" and "PRODUCT_ID".
3. We want to consider all items bought by a consumer each day as a separate transaction (i.e., a "basket"). For example, if consumers 1, 2, and 3 each bought oranges on December 2, 2000, and consumer 2 also bought celery on December 3, 2000, we would consider these to be 4 separate transactions. An easy way to do this is to rename each CUSTOMER_ID as "DATE-CUSTOMER_ID". For example, if the CUSTOMER_ID is 12321 and this customer bought apples on November 14, 2000, then their new ID is "11/14/00-12321".
4. Make sure each line in the CSV file is "DATE-CUSTOMER_ID1, PRODUCT_ID1".
5. The header of the CSV file should be "DATE-CUSTOMER_ID, PRODUCT_ID".
You need to save the dataset in CSV format. The figure below shows an example of the output file (please note DATE-CUSTOMER_ID and PRODUCT_ID are strings and integers, respectively).
Figure: customer_product file
Do NOT submit the output file of this data preprocessing step, but your code is allowed to create this file.

(2) Apply SON Algorithm
The requirements for task 2 are similar to task 1. However, you will test your implementation with the large dataset you just generated. For this purpose, you need to report the total execution time. For this execution time, we take into account the time from reading the file till writing the results to the output file. You are asked to find the candidate and frequent itemsets (similar to the previous task) using the file you just generated. The following are the steps you need to take:
1. Read the customer_product CSV file into an RDD and then build the case 1 market-basket model.
2. Find out the qualified customer-dates who purchased more than k items (k is the filter threshold).
3. Apply the SON Algorithm code to the filtered market-basket model.
Input format:
1. Filter threshold: integer that is used to filter out qualified users.
2. Support: integer that defines the minimum count to qualify as a frequent itemset.
3. Input file path: the path to the input file including path, file name and extension.
4. Output file path: the path to the output file including path, file name and extension.
Output format:
1. Runtime: the total execution time from loading the file till finishing writing the output file. You need to print the runtime in the console with the "Duration" tag, e.g., "Duration: 100".
2. Output file: the output file format is the same as task 1.
Both the intermediate results and the final results should be saved in ONE output result file.
Command line Format:
Python: spark-submit task2.py
Scala: spark-submit --class task2 hw2.jar
Command line Example:
/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --executor-memory 4G --driver-memory 4G task2.py 20 50 ../resource/asnlib/publicdata/ta_feng_all_months_merged.csv task2_output.txt

6. Evaluation Metric
Task 1:
Input File | Case | Support | Runtime (sec)
small1.csv | 1 | 4 |
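The heart of both tasks is the same two-pass SON structure, sketched below under some simplifying assumptions: baskets_rdd is an RDD whose elements are iterables of item ids (the baskets built in task 1 or from the customer_product file), plain A-Priori is used on each chunk, and the returned candidate/frequent lists still need the lexicographical sorting and "Candidates:"/"Frequent Itemsets:" formatting the handout requires before being written out.

from itertools import combinations

def apriori(baskets, total_count, support):
    """A-Priori on one chunk, using a support threshold scaled to the chunk's share of the data."""
    baskets = [set(b) for b in baskets]
    local_support = support * len(baskets) / total_count
    frequents, size = [], 1
    # frequent singletons
    counts = {}
    for b in baskets:
        for item in b:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= local_support}
    while current:
        frequents.extend(current)
        size += 1
        # candidate itemsets of the next size, pruned so every subset is frequent
        items = sorted({i for s in current for i in s})
        candidates = [frozenset(c) for c in combinations(items, size)
                      if all(frozenset(sub) in current for sub in combinations(c, size - 1))]
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        current = {c for c, cnt in counts.items() if cnt >= local_support}
    return frequents

def son(baskets_rdd, support):
    total = baskets_rdd.count()
    # Pass 1: locally frequent itemsets on each partition become global candidates.
    candidates = (baskets_rdd
                  .mapPartitions(lambda part: apriori(list(part), total, support))
                  .distinct()
                  .collect())
    # Pass 2: count every candidate over the full data and keep the truly frequent ones.
    bc = baskets_rdd.context.broadcast(candidates)
    frequent = (baskets_rdd
                .flatMap(lambda b: [(c, 1) for c in bc.value if c <= set(b)])
                .reduceByKey(lambda a, b: a + b)
                .filter(lambda kv: kv[1] >= support)
                .keys()
                .collect())
    return candidates, frequent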


[SOLVED] DSCI 553 Assignment 1: In assignment 1, you will work on three tasks. The goal of these tasks is to get you familiar with Spark operation types

In assignment 1, you will work on three tasks. The goal of these tasks is to get you familiar with Spark operation types (e.g., transformations and actions) and to explore a real-world dataset: the Yelp dataset (https://www.yelp.com/dataset). If you have questions about the assignment, please ask on Piazza; this helps promote interaction amongst students and will also serve as an FAQ for other students facing similar problems. You have to submit your assignments on Vocareum directly.

2.1 Programming Requirements
a. You must use Python to implement all tasks. You can only use standard Python libraries (i.e., external libraries like numpy or pandas are not allowed) because they are sufficient for this programming assignment. There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You are required to only use Spark RDD in order to understand Spark operations. You will not get any points if you use Spark DataFrame or DataSet.
c. Python standard libraries: https://docs.python.org/3/library/

2.2 Programming Environment
Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2. We will use these library versions to compile and test your code. There will be no points granted if we cannot run your code on Vocareum. On Vocareum, you can call `spark-submit` located at `/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit`. (Do not use the one at /usr/local/bin/spark-submit (2.3.0).) We use `--executor-memory 4G --driver-memory 4G` on Vocareum for grading.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! TAs will combine all the code that can be found on the web (e.g., Github) as well as other students' code from this and other (previous) sections for plagiarism detection. We will report all detected plagiarism, and severe penalties will be given to students whose submissions are plagiarized.

2.4 What you need to turn in
We will grade all submissions on Vocareum. Vocareum produces a submission report after you click the "Submit" button (it takes a while since Vocareum needs to run your code in order to generate the report). Vocareum will only grade Python scripts during the submission phase and it will grade both Python and Scala during the grading phase.
a. [REQUIRED] three Python scripts, named (all lowercase): task1.py, task2.py, task3.py
b1. [OPTIONAL, REQUIRED FOR SCALA] three Scala scripts and the output jar file, named (all lowercase): hw1.jar, task1.scala, task2.scala, task3.scala
c. You don't need to include your results or the datasets. We will grade your code with our testing data (data will be in the same format).
d. Students can submit an unlimited number of times. Only the latest submission will be accepted and graded.

In this assignment, you will explore the Yelp dataset. You can find the data on Vocareum under resource/asnlib/publicdata/. The two files business.json and test_review.json are the files you will work on for this assignment, and they are subsets of the original Yelp Dataset. The submission report you get from Vocareum is for the subsets. For grading, we will use the files from the original Yelp dataset, which is SIGNIFICANTLY larger (e.g. review.json can be 5GB).
You should make sure your code works well on large datasets as well.

4.1 Task1: Data Exploration (3 points)
You will work on test_review.json, which contains the review information from users, and write a program to automatically answer the following questions:
A. The total number of reviews (0.5 point)
B. The number of reviews in 2018 (0.5 point)
C. The number of distinct users who wrote reviews (0.5 point)
D. The top 10 users who wrote the largest numbers of reviews and the number of reviews they wrote (0.5 point)
E. The number of distinct businesses that have been reviewed (0.5 point)
F. The top 10 businesses that had the largest numbers of reviews and the number of reviews they had (0.5 point)
Input format: (we will use the following command to execute your code)
Python: /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --executor-memory 4G --driver-memory 4G task1.py
Scala: spark-submit --class task1 --executor-memory 4G --driver-memory 4G hw1.jar
Output format:
IMPORTANT: Please strictly follow the output format since your code will be graded automatically.
a. The output for Questions A/B/C/E will be a number. The output for Questions D/F will be a list, which is sorted by the number of reviews in descending order. If two user_ids/business_ids have the same number of reviews, please sort the user_ids/business_ids in lexicographical order.
b. You need to write the results in a JSON format file. You must use exactly the same tags (see the red boxes in Figure 2) for answering each question.
Figure 1: JSON output structure for task1

4.2 Task2: Partition (2 points)
Since processing large volumes of data requires performance optimizations, properly partitioning the data for processing is imperative. In this task, you will show the number of partitions for the RDD used for Task 1 Question F and the number of items per partition. Then you need to use a customized partition function to improve the performance of the map and reduce tasks. A comparison of the time taken to execute Task 1 Question F between the system default partitioning and your customized partitioning (an RDD built using your partition function) should also be shown in your results.
Hint: certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's mechanism for redistributing data so that it's grouped differently across partitions. This typically involves copying data across executors and machines, making the shuffle a complex and costly operation. So, designing a partition function that avoids the shuffle will improve the performance a lot.
Input format: (we will use the following command to execute your code)
Python: /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --executor-memory 4G --driver-memory 4G task2.py
Scala: spark-submit --class task2 --executor-memory 4G --driver-memory 4G hw1.jar
Output format:
A. The output for the number of partitions and the execution time will be a number. The output for the number of items per partition will be a list of numbers.
B. You need to write the results in a JSON file. You must use exactly the same tags.
C. Do not round off the execution times.
Figure 3: JSON output structure for task2

4.3 Task3: Exploration on Multiple Datasets (2 points)
In task3, you are asked to explore two datasets together, containing review information (test_review.json) and business information (business.json), and write a program to answer the following questions:
A. What are the average stars for each city? (1 point)
1. (DO NOT use the stars information in the business file.)
2. (DO NOT discard records with an empty "city" field prior to aggregation – this just means that you should not worry about performing any error handling, input data cleanup or handling edge case scenarios.)
3. (DO NOT perform any rounding off of the average stars.)
B. You are required to compare the execution time of using two methods to print the top 10 cities with the highest average stars. Please note that this task (Task 3(B)) is not graded. You will get full points only if you implement the logic to generate the output file required for this task.
1. To evaluate the execution time, start tracking the execution time from the point you load the file. For M1: execution time = loading time + time to create and collect the averages, sort using Python and print the first 10 cities. For M2: execution time = loading time + time to create and collect the averages, sort using Spark and print the first 10 cities. The loading time will stay the same for both methods; the idea is to compare the overall execution time of both methods and understand which method is more efficient for an end-to-end solution. Please note that for Method 1, only the sorting is to be done in Python; creating and collecting the averages needs to be done via RDD. You should store the execution times in the JSON file with the tags "m1" and "m2".
2. Additionally, add a "reason" field and provide a hard-coded explanation for the observed execution times.
3. Do not round off the execution times.
Input format: (we will use the following command to execute your code)
Python: /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --executor-memory 4G --driver-memory 4G task3.py
Scala: spark-submit --class task3 --executor-memory 4G --driver-memory 4G hw1.jar
Output format:
a. You need to write the results for Question A as a text file. The header (first line) of the file is "city,stars". The outputs should be sorted by the average stars in descending order. If two cities have the same stars, please sort the cities in lexicographical order. (See Figure 3, left.)
b. You also need to write the answer for Question B in a JSON file. You must use exactly the same tags for the task.
Figure 3: Question A output file structure (left) and JSON output structure (right) for task3

5. Grading Criteria (% penalty = % penalty of the possible points you get)
1. You can use your free 5-day extension separately or together.
a. https://forms.gle/gs5eDtjd1q18nGEx5
b. This form will record the number of late days you use for each assignment. We will not count late days if no request is submitted.
2. There will be a 10% bonus if you use both Scala and Python and get the expected results.
3. We will combine all the code we can find from the web (e.g., Github) as well as other students' code from this and other (previous) sections for plagiarism detection. If plagiarism is detected, there will be no points for the entire assignment and we will report all detected plagiarism.
4. All submissions will be graded on Vocareum. Please strictly follow the format provided, otherwise you can't get the points even if the answer is correct. You are encouraged to try out your code in the Vocareum terminal.
5. We will grade both the correctness and the efficiency of your implementation. The efficiency is evaluated by processing time and memory usage. The maximum memory allowed is 4G, and the maximum processing time is 1800s for grading.
5. Grading Criteria (% penalty = % penalty of possible points you get)
1. You can use your free 5-day extension separately or together: https://forms.gle/gs5eDtjd1q18nGEx5. This form will record the number of late days you use for each assignment. We will not count late days if no request is submitted. There will be a 10% bonus if you use both Scala and Python and get the expected results.
2. We will combine all the code we can find from the web (e.g., GitHub) as well as other students' code from this and other (previous) sections for plagiarism detection. If plagiarism is detected, there will be no points for the entire assignment and we will report all detected plagiarism.
3. All submissions will be graded on Vocareum. Please strictly follow the format provided; otherwise you cannot get the points even when the answer is correct. You are encouraged to try out your code on the Vocareum terminal.
4. We will grade both the correctness and efficiency of your implementation. The efficiency is evaluated by processing time and memory usage. The maximum memory allowed is 4G, and the maximum processing time is 1800s for grading. The datasets used for grading are larger than the ones that you use for doing the assignment. You will get a *% penalty if your implementation cannot generate correct outputs for large files using 4G memory within 1800s, so please make sure your implementation is efficient enough to process large files.
5. Regrading policy: we can regrade your assignments within seven days once the scores are released. Regrading requests will not be accepted after one week.
6. There will be a 20% penalty for late submission within a week and no points after a week. If you use your late days, there will not be a 20% penalty.
7. The Scala bonus will only be calculated when your results from Python are correct. There is no partial credit for Scala. See the example below:

Example situations
Task  | Score for Python              | Score for Scala (10% of previous column if correct) | Total
Task1 | Correct: 3 points             | Correct: 3 * 10%                                     | 3.3
Task1 | Wrong: 0 points               | Correct: 0 * 10%                                     | 0.0
Task1 | Partially correct: 1.5 points | Correct: 1.5 * 10%                                   | 1.65
Task1 | Partially correct: 1.5 points | Wrong: 0                                             | 1.5

6. Common problems causing failed submissions on Vocareum / FAQ
(If your program runs successfully on your local machine but fails on Vocareum, please check these.)
1. Try your program on the Vocareum terminal. Remember to set the Python version to python3.6, and use the latest Spark.
2. Check the input command line formats.
3. Check the output formats, for example, the headers, tags, and typos.
4. Check the requirements for sorting the results.
5. Your program scripts should be named task1.py, task2.py, etc.
6. Check whether your local environment fits the assignment description, i.e., version and configuration.
7. If you implement the core part in Python instead of Spark, or implement it with a high time complexity (e.g., searching for an element in a list instead of a set), your program may be killed on Vocareum because it runs too slowly.
8. You are required to only use the Spark RDD in order to understand Spark operations more deeply. You will not get any points if you use Spark DataFrame or DataSet. Do not import sparksql.
9. Do not use Vocareum for debugging purposes; please debug on your local machine. Vocareum can be very slow if you use it for debugging.
10. Vocareum is reliable for helping you check the input and output formats, but its ability to check code correctness is limited. It cannot guarantee the correctness of the code even with a full score in the submission report.
11. Some students encounter an error like "the output rate ... has exceeded the allowed value ... bytes/s; attempting to kill the process". To resolve this, remove all print statements and set the Spark logging level so that it limits the logs generated; this can be done using sc.setLogLevel (see the snippet after Section 7). Preferably, set the log level to either WARN or ERROR when submitting your code.

7. Running Spark on Vocareum
We are going to use Spark 3.1.2 and Scala 2.12 for the assignments and the competition project. Here are the things that you need to do on Vocareum and on your local machine to run the latest Spark and Scala:
On Vocareum:
1. Please select JDK 8 by running the command "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64".
2. Please use the spark-submit command as "/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit".
On your local machine:
1. Please download and set up spark-3.1.2-bin-hadoop3.2; the setup steps should be the same as for spark-2.4.4.
2. If you use Scala, please update Scala's version to 2.12 in IntelliJ.
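For reference, a minimal SparkContext setup that applies the logging advice from item 11 of Section 6; the app name is arbitrary, and the local master setting is only for debugging on your own machine, since on Vocareum spark-submit supplies the memory settings from the commands above.

```python
# Minimal local SparkContext setup; on Vocareum, spark-submit provides the
# master and memory settings from the command lines given in the tasks above.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("hw1").setMaster("local[*]")
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")   # keep log output small, per item 11 of Section 6
```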
8. Tutorials for Spark Installation
Here are some useful links to help you get started with the Spark installation:
Tutorial for Ubuntu: https://phoenixnap.com/kb/install-spark-on-ubuntu
Tutorial for Windows: https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c
Windows installation without Anaconda (recommended): https://phoenixnap.com/kb/install-spark-on-windows-10
Tutorial for Mac: https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f
Tutorial for Linux systems: https://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm
Tutorial for using IntelliJ: https://medium.com/@Sushil_Kumar/setting-up-spark-with-scala-development-environment-using-intellij-idea-b22644f73ef1
Tutorial for Jupyter notebook on Windows: https://bigdata-madesimple.com/guide-to-install-spark-and-use-pyspark-from-jupyter-in-windows/
Spark 3.1.2 installation: https://archive.apache.org/dist/spark/spark-3.1.2/


[SOLVED] Dsci 553: competition project – overview of the assignment

In this competition project, you need to improve the performance of your recommendation system from Assignment 3. You can use any method (such as a hybrid recommendation system) to improve the prediction accuracy and efficiency.

2.1 Programming Language and Library Requirements
a. You must use Python to implement the competition project. You can use any external Python libraries as long as they are available on Vocareum.
b. You are required to only use the Spark RDD to understand Spark operations. You will not receive any points if you use Spark DataFrame or DataSet. However, if an external Python library requires a separate data structure, you may use it to load the data into the library, but make sure to do all data pre/post-processing using a Spark RDD.

2.2 Programming Environment
Python 3.6, Scala 2.12, JDK 1.8, and Spark 3.1.2. We will use these library versions to compile and test your code. There will be a 20% penalty if we cannot run your code due to library version inconsistency.

2.3 Write your own code
Do not share your code with other students!! We will combine all the code we can find from the Web (e.g., GitHub) as well as other students' code from this and other (previous) sections for plagiarism detection. We will report all the detected plagiarism.

3. Yelp Data
In this competition, the datasets you are going to use are from: https://drive.google.com/drive/folders/1SIlY40owpVcGXJw3xeXk76afCwtSUx11?usp=sharing
We generated the following two datasets from the original Yelp review dataset with some filters. We randomly took 60% of the data as the training dataset, 20% of the data as the validation dataset, and 20% of the data as the testing dataset.
A. yelp_train.csv: the training data, which only includes the columns user_id, business_id, and stars.
B. yelp_val.csv: the validation data, which are in the same format as the training data.
C. We are not sharing the test dataset.
D. Other datasets providing additional information (such as the average stars or location of a business):
a. review_train.json: review data only for the training pairs (user, business)
b. user.json: all user metadata
c. business.json: all business metadata, including locations, attributes, and categories
d. checkin.json: user check-ins for individual businesses
e. tip.json: tips (short reviews) written by a user about a business
f. photo.json: photo data, including captions and classifications

4. Task (8 points)
In the competition, you need to build a recommendation system to predict the given (user, business) pairs. You can mine interesting and useful information from the datasets provided in the Google Drive folder to support your recommendation system. You must make an improvement to your recommendation system from homework assignment 3 in terms of accuracy. You can use the validation dataset (yelp_val.csv) to evaluate the accuracy of your recommendation system. There are two options to evaluate your recommendation system:
(1) Error Distribution: You can compare your results to the corresponding ground truth and compute the absolute differences. You can divide the absolute differences into 5 levels and count the number for each level, for example:
>=0 and <1: 12345
>=1 and <2: ...
>=2 and <3: ...
>=3 and <4: ...
>=4: 12
This means that there are 12345 predictions with < 1 difference from the ground truth. This way you will be able to know the error distribution of your predictions and to improve the performance of your recommendation system.
(2) RMSE: You can compute the RMSE (Root Mean Squared Error) using the following formula:
RMSE = sqrt((1/n) * Σ_i (Pred_i - Rate_i)^2)
where Pred_i is the prediction for business i, Rate_i is the true rating for business i, and n is the total number of businesses you are predicting.
A minimal sketch of both evaluation options follows.
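This is a small stand-alone sketch of both evaluation options on the validation set; the prediction file name and column order are assumptions, and the printed counts will depend on your model.

```python
# Minimal sketch of the two evaluation options on the validation set.
# File names and column order are assumptions; match your actual outputs.
import csv
import math

def load_pairs(path):
    """Read (user_id, business_id) -> stars from a CSV with a header row."""
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        return {(row[0], row[1]): float(row[2]) for row in reader}

truth = load_pairs("yelp_val.csv")
preds = load_pairs("predictions.csv")   # placeholder name for your output file

# (1) Error distribution over the absolute differences
levels = [0, 0, 0, 0, 0]
for pair, pred in preds.items():
    diff = abs(pred - truth[pair])
    levels[min(int(diff), 4)] += 1     # buckets: [0,1), [1,2), [2,3), [3,4), >=4
print("Error distribution (>=0&<1, >=1&<2, >=2&<3, >=3&<4, >=4):", levels)

# (2) RMSE, as in the formula above
n = len(preds)
rmse = math.sqrt(sum((pred - truth[pair]) ** 2 for pair, pred in preds.items()) / n)
print("RMSE:", rmse)
```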
Input format (we will use the following command to execute your code):
/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit competition.py
Param folder_path: the path of the dataset folder, which contains exactly the same files as the Google Drive folder.
Param test_file_name: the name of the testing file (e.g., yelp_val.csv), including the file path.
Param output_file_name: the name of the prediction result file, including the file path.

Output format:
a. The output file is a CSV file containing all the prediction results for each user and business pair in the validation/testing data. The header is "user_id, business_id, prediction". There is no requirement for the order in this task. There is no requirement for the number of decimals for the similarity values. Please refer to the format in Figure 1.
Figure 1: Output example in CSV
b. You also need to write comments that include a description of your method (fewer than 300 words) in the first part of your program. The description should include an explanation of the models you are using, especially the way you improved the accuracy or efficiency of the system. We look forward to seeing creative methods. Please also report the error distribution, RMSE, and the total execution time on the validation dataset in the description. Figure 2 shows an example of the description file. If the comments are not included or are not informative, there will be a one-point penalty.
Figure 2: An example of the description file

Grading: We will compare your prediction results against the ground truth. We will use our testing data to evaluate your recommendation system and grade based on the accuracy using RMSE. To get full points for the competition project, your RMSE result should beat that of the TAs', which is 0.9800 for the testing data. If your recommendation system only beats 0.9800 for the validation data, you will receive 50% of the points for the competition. The final submission with the highest accuracy will receive an extra 6 points on the final grade, the second place will receive an extra 5 points, the third an extra 4 points, and so on, until the sixth place, which will receive an extra 1 point. To make this more like a competition, you can see a "Leaderboard" button in the "Competition" on Vocareum. Every time you submit your code, your RMSE for the validation data will be scored and shown on the leaderboard. You will have the option to choose your display name on the leaderboard. Partial credit will be given if your RMSE for the testing data cannot reach the threshold: if your homework 3 accuracy is x, your competition accuracy is y, and you do not meet the threshold, you will get (1 - (y - 0.98) / (x - 0.98)) * total score of the competition.

5. Submission
You need to submit your Python script on Vocareum with exactly the same name:
● competition.py

6. Grading Criteria (% penalty = % penalty of possible points you get)
1. You cannot use the extension for the competition. No late submissions will be accepted for the competition.
2. We will combine all the code we can find from the web (e.g., GitHub) as well as other students' code from this and other (previous) sections for plagiarism detection. If plagiarism is detected, you will receive no points for the entire assignment and we will report all detected plagiarism.
3. All submissions will be graded on Vocareum. Please strictly follow the format provided; otherwise you won't receive points even though the answer is correct.
4. Do NOT use Spark DataFrame, DataSet, or sparksql.
5. We will not conduct regrades on competition submissions.
6. No points will be awarded if the total execution time exceeds 25 minutes.

7. Common problems causing failed submissions on Vocareum / FAQ
(If your program seems to run successfully on your local machine but fails on Vocareum, please check these.)
1. Try your program on the Vocareum terminal. Remember to set the Python version to python3.6, and use the latest Spark: /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit
2. Check the input command line format.
3. Check the output format, for example, the header, tags, and typos.
4. Your Python script should be named competition.py.
5. Check whether your local environment fits the assignment description, i.e., version and configuration.
6. If you implement the core part in Python instead of Spark, or implement it in a high time complexity way (e.g., searching for an element in a list instead of a set), your program may be killed on Vocareum because it runs too slowly.


[SOLVED] Homework 5 coms e6998 problem 1 – ssd, onnx model, visualization, inferencing 35 points

In this problem we will be inferencing an SSD ONNX model using ONNX Runtime Server. You will follow the GitHub repo and ONNX tutorials (links provided below). You will start with a pretrained PyTorch SSD model and retrain it for your target categories. Then you will convert this PyTorch model to ONNX and deploy it on the ONNX runtime server for inferencing.
1. Download the pretrained PyTorch MobileNetV1 SSD and test it locally using the Pascal VOC 2007 dataset. Show the test accuracy for the 20 classes. (4)
2. Select any two related categories from the Google Open Images dataset and finetune the pretrained SSD model. Examples include Aircraft and Aeroplane, or Handgun and Shotgun. You can use the open_images_downloader.py script provided in the GitHub repo to download the data. For finetuning you can use the same parameters as in the tutorial below. Compute the accuracy on the test data for these categories before and after finetuning. (5+5)
3. Convert the PyTorch model to ONNX format and save it (see the sketch after the references below). (4)
4. Visualize the model using the net drawer tool. Compile the model using the embed_docstring flag and show the visualization output. Also show the doc string (stack trace for PyTorch) for different types of nodes. (6)
5. Deploy the ONNX model on the ONNX Runtime (ORT) server. You need to set up the environment following the steps listed in the tutorial. Then you need to make an HTTP request to the ORT server. Test the inferencing set-up using 1 image from each of the two selected categories. (6)
6. Parse the response message from the ORT server and annotate the two images. Show the inferencing output (bounding boxes with labels) for the two images. (5)
For parts 1, 2, and 3, refer to the steps in the GitHub repo. For part 4 refer to the ONNX tutorial on visualizing, and for parts 5 and 6 refer to the ONNX tutorial on inferencing.
References
• GitHub repo. Single Shot MultiBox Detector Implementation in Pytorch. Available at https://github.com/qfgaohao/pytorch-ssd
• ONNX tutorial. Visualizing an ONNX Model. Available at https://github.com/onnx/tutorials/blob/master/tutorials/VisualizingAModel.md
• ONNX tutorial. Inferencing SSD ONNX model using ONNX Runtime Server. Available at https://github.com/onnx/tutorials/blob/master/tutorials/OnnxRuntimeServerSSDModel.ipynb
• Google. Open Images Dataset V5 + Extensions. Available at https://storage.googleapis.com/openimages/web/index.html
• The PASCAL Visual Object Classes Challenge 2007. Available at http://host.robots.ox.ac.uk/pascal/VOC/voc2007/
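For step 3, here is a minimal sketch of exporting the finetuned model to ONNX with torch.onnx.export. The import path and loader assume the layout of the qfgaohao/pytorch-ssd repo (create_mobilenetv1_ssd); the checkpoint path, class count, input size, and opset version are placeholders, so adapt them to your own training output and the tutorial's settings.

```python
# Minimal sketch of step 3: exporting a finetuned PyTorch SSD model to ONNX.
# The import path assumes the qfgaohao/pytorch-ssd repo layout; checkpoint path,
# class count, and input size are placeholders.
import torch
from vision.ssd.mobilenetv1_ssd import create_mobilenetv1_ssd  # from the repo

num_classes = 3  # background + your two Open Images categories
model = create_mobilenetv1_ssd(num_classes, is_test=True)
model.load_state_dict(torch.load("models/finetuned-ssd.pth", map_location="cpu"))
model.eval()

dummy_input = torch.randn(1, 3, 300, 300)  # SSD300-style input
torch.onnx.export(
    model,
    dummy_input,
    "ssd_finetuned.onnx",
    input_names=["input"],
    output_names=["scores", "boxes"],
    opset_version=11,
)
```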
In this question you will analyze different ML cloud platforms and compare their service offerings. In particular, you will consider ML cloud offerings from IBM, Google, Microsoft, and Amazon and compare them on the basis of the following criteria:
1. Frameworks: DL framework(s) supported and their versions. (4) Here we are referring to machine learning platforms that have their own inbuilt images for different frameworks.
2. Compute units: type(s) of compute units offered, i.e., GPU types. (2)
3. Model lifecycle management: tools supported to manage the ML model lifecycle. (2)
4. Monitoring: availability of application logs and resource (GPU, CPU, memory) usage monitoring data to the user. (2)
5. Visualization during training: performance metrics like accuracy and throughput. (2)
6. Elastic scaling: support for elastically scaling the compute resources of an ongoing job. (2)
7. Training job description: training job description file format. Show how the same training job is specified in the different ML platforms. Identify similar fields in the training job file for the 4 ML platforms through an example. (6)

In this problem we will follow the Kubeflow-Kale codelab (link below). You will follow the steps as outlined in the codelab to install Kubeflow with MiniKF, convert a Jupyter Notebook to Kubeflow Pipelines, and run Kubeflow Pipelines from inside a Notebook. For each step below you need to show the commands executed, the terminal output, and a screenshot of the visual output (if any). You also need to give a new name to your GCP project and any resource instance you create, e.g., put your initials in the name string.
1. Setting up the environment and installing MiniKF: Follow the steps in the codelab to:
(a) Set up a GCP project. (2)
(b) Install MiniKF and deploy your MiniKF instance. (3)
(c) Log in to MiniKF, Kubeflow, and Rok. (3)
2. Run a Pipeline from inside your Notebook: Follow the steps in the codelab to:
(a) Create a notebook server. (3)
(b) Download and run the notebook: We will be using the pytorch-classification notebook from the examples repo. Note that the codelab uses a different example from the repo (titanic dataset ml.ipynb). (4)
(c) Convert your notebook to a Kubeflow Pipeline: Enable Kale and then compile and run the pipeline from the Kale Deployment Panel. Show output from each of the 5 steps of the pipeline. (5)
(d) Show snapshots of the "Graph" and "Run output" of the experiment. (4)
(e) Cleanup: Destroy the MiniKF VM. (1)
References
• Codelab. From Notebook to Kubeflow Pipelines with MiniKF and Kale. Available at https://codelabs.developers.google.com/codelabs/cloud-kubeflow-minikf-kale
• https://github.com/kubeflow-kale/examples

This question is based on Deep RL concepts discussed in Lecture 8. You need to refer to the papers by Mnih et al., Nair et al., and Horgan et al. to answer this question. All papers are linked below.
1. Explain the difference between episodic and continuous tasks. Give an example of each. (2)
2. What do the terms exploration and exploitation mean in RL? Why do the actors employ an ε-greedy policy for selecting actions at each step? Should ε remain fixed or follow a schedule during Deep RL training? How does the value of ε help balance exploration and exploitation during training (a small illustrative sketch follows the references below)? (1+1+1+1)
3. How is the Deep Q-Learning algorithm different from Q-learning? Follow the steps of the Deep Q-Learning algorithm in Mnih et al. (2013), page 5, and explain each step in your own words. (3)
4. What is the benefit of having a target Q-network? (3)
5. How does experience replay help in efficient Q-learning? (3)
6. What is prioritized experience replay? (2)
7. Compare and contrast GORILA (General Reinforcement Learning Architecture) and the Ape-X architecture. Provide three similarities and three differences. (3)
References
• Mnih et al. Playing Atari with Deep Reinforcement Learning. 2013. Available at https://arxiv.org/pdf/1312.5602.pdf
• Nair et al. Massively Parallel Methods for Deep Reinforcement Learning. 2015. Available at https://arxiv.org/pdf/1507.04296.pdf
• Horgan et al. Distributed Prioritized Experience Replay. 2018. Available at https://arxiv.org/pdf/1803.00933.pdf
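As a small illustration for question 2, here is a sketch of ε-greedy action selection with a linearly annealed ε; the start/end values and the annealing horizon are illustrative choices, not values taken from the papers.

```python
# Minimal sketch of epsilon-greedy action selection with a linear epsilon
# schedule. All numbers are illustrative, not values from the papers.
import random

def epsilon_at(step, eps_start=1.0, eps_end=0.1, anneal_steps=1_000_000):
    """Linearly anneal epsilon from eps_start to eps_end over anneal_steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, step):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    eps = epsilon_at(step)
    if random.random() < eps:
        return random.randrange(len(q_values))                    # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploitation
```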


[SOLVED] Homework 4 coms e6998 problem 1 – transfer learning: shallow learning vs finetuning, pytorch 30 points

In this problem we will train a convolutional neural network for image classification using transfer learning. Transfer learning involves training a base network from scratch on a very large dataset (e.g., Imagenet1K with 1.2M images and 1K categories) and then using this base network either as a feature extractor or as an initialization network for the target task. The two major transfer learning scenarios are thus as follows:
• Finetuning the base model: Instead of random initialization, we initialize the network with a pretrained network, such as one trained on the Imagenet dataset. The rest of the training looks as usual; however, the learning rate schedule for transfer learning may be different.
• Base model as a fixed feature extractor: Here, we freeze the weights for all of the network except those of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.
1. For fine-tuning you will select a target dataset from the Visual Decathlon challenge. Their website (link below) has several datasets which you can download. Select any one of the Visual Decathlon datasets and make it your target dataset for transfer learning. Important: do not select Imagenet1K as the target dataset.
(a) Finetuning: You will first load a pretrained model (Resnet50) and change the final fully connected layer output to the number of classes in the target dataset. Describe your target dataset's features, number of classes, and distribution of images per class (i.e., number of images per class). Show any 4 sample images (belonging to 2 different classes) from your target dataset. (2+2)
(b) First finetune by setting the same value of the hyperparameters (learning rate = 0.001, momentum = 0.9) for all the layers. Keep a batch size of 64 and train for 200-300 epochs or until the model converges well. You will use a multi-step learning rate schedule and decay by a factor of 0.1 (γ = 0.1 in the link below). You can choose the steps at which you want to decay the learning rate, but do 3 drops during the training, so the first drop will bring the learning rate down to 0.0001, the second to 0.00001, and the third to 0.000001. For example, if training for 200 epochs, the first drop can happen at epoch 60, the second at epoch 120, and the third at epoch 180. A minimal sketch of this setup follows the references below. (8)
(c) Next, keeping all the hyperparameters the same as before, change the learning rate to 0.01 and 0.1 uniformly for all the layers. This means keeping all the layers at the same learning rate, so you will be doing two experiments, one keeping the learning rate of all layers at 0.01 and one at 0.1. Again finetune the model and report the final accuracy. How does the accuracy with the three learning rates compare? Which learning rate gives you the best accuracy on the target dataset? (6)
2. When using a pretrained model as a feature extractor, all the layers of the network are frozen except the final layer. Thus, except for the last layer, none of the inner layers' gradients are updated during the backward pass with the target dataset. Since gradients do not need to be computed for most of the network, this is faster than finetuning.
(a) Now train only the last layer with learning rates of 1, 0.1, 0.01, and 0.001, while keeping all the other hyperparameters and settings the same as earlier for finetuning. Which learning rate gives you the best accuracy on the target dataset? (8)
(b) For your target dataset find the best final accuracy (across all the learning rates) from the two transfer learning approaches. Which approach and learning rate is the winner? Provide a plausible explanation to support your observation. (4)
For this problem the following resources will be helpful.
References
• Pytorch blog. Transfer Learning for Computer Vision Tutorial by S. Chilamkurthy. Available at https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
• Notes on Transfer Learning. CS231n Convolutional Neural Networks for Visual Recognition. Available at https://cs231n.github.io/transfer-learning/
• Visual Domain Decathlon
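Here is a minimal PyTorch sketch of the finetuning setup in 1(a)/(b): a pretrained ResNet50, the replaced final layer, and a multi-step learning-rate decay. The data loading, epoch count, and drop epochs are placeholders; the commented lines show the change needed for the feature-extractor variant in part 2.

```python
# Minimal sketch of 1(a)/(b): finetuning a pretrained ResNet50 with a multi-step
# learning-rate schedule. Dataset loading, epoch count, and drop epochs are
# placeholders to be adapted to your target dataset.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 100          # set to the number of classes in your decathlon dataset
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the final layer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Three drops by a factor of 0.1: 1e-3 -> 1e-4 -> 1e-5 -> 1e-6
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 180], gamma=0.1)

# Feature-extractor variant (part 2): freeze everything but the new fc layer.
# for p in model.parameters():
#     p.requires_grad = False
# model.fc = nn.Linear(model.fc.in_features, num_classes)
# optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)

for epoch in range(200):
    # ... one pass over your training DataLoader: forward, loss, backward, optimizer.step() ...
    scheduler.step()
```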
This problem is based on two papers, by Mahajan et al. on weakly supervised pretraining and by Yalniz et al. on semi-supervised learning for image classification. Both of these papers are from Facebook and used 1B images with hashtags. Read the two papers thoroughly and then answer the following questions. You can discuss these papers with your classmates if this helps in clarifying your doubts and improving your understanding. However, no sharing of answers is permitted and all the questions should be answered individually in your own words.
1. Both papers use the same 1B image dataset. However, one does weakly supervised pretraining while the other does semi-supervised pretraining. What is the difference between weakly supervised and semi-supervised pretraining? How do they use the same dataset to do two different types of pretraining? Explain. (2)
2. These questions are based on the paper by Mahajan et al.
(a) Are the models trained using hashtags robust against noise in the labels? What experiments were done in the paper to study this, and what was the finding? Provide numbers from the paper to support your answer. (2)
(b) Why is resampling of the hashtag distribution important during pretraining for transfer learning? (2)
3. These questions are based on the paper by Yalniz et al.
(a) Why are there two models, a teacher and a student, and how does the student model leverage the teacher model? Explain why teacher-student modeling is a type of distillation technique. (2+2)
(b) What are the parameters K and P in stage 2 of the approach, where unlabeled images are assigned classes using the teacher network? What was the idea behind taking P > 1? Explain in your own words. (2+2)
(c) Explain how a new labeled dataset is created using unlabeled images. Can an image in this new dataset belong to more than one class? Explain. (2+2)
(d) Refer to Figure 5 in the paper. Why does the accuracy of the student model first improve as we increase the value of K and then decrease? (2)
References
• Yalniz et al. Billion-scale semi-supervised learning for image classification. Available at https://arxiv.org/pdf/1905.00546.pdf
• Mahajan et al. Exploring the Limits of Weakly Supervised Pretraining. Available at https://arxiv.org/pdf/1805.00932.pdf

This question is based on modeling the execution time of deep learning networks by calculating the floating point operations required at each layer. We looked at two papers in class, one by Lu et al. and the other by Qi et al.
1. Why is achieving peak FLOPs from hardware devices like GPUs a difficult proposition in real systems? How does PPP help the Paleo model capture this inefficiency? (4)
2. Lu et al. showed that the FLOPs consumed by convolution layers in VGG16 account for about 99% of the total FLOPs in the forward pass. We will do a similar analysis for VGG19. Calculate the FLOPs for the different layers in VGG19 and then calculate the fraction of the total FLOPs attributed to the convolution layers (a minimal sketch of this calculation follows). (6)
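A minimal sketch of the per-layer FLOPs calculation for VGG19, assuming 224x224 inputs and counting each multiply-accumulate as 2 FLOPs while ignoring biases, activations, and pooling; the exact percentage you report may differ slightly depending on those conventions.

```python
# Minimal sketch of the per-layer FLOPs calculation for VGG19, counting one
# multiply-accumulate as 2 FLOPs and ignoring biases, activations, and pooling.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def vgg19_flops(input_size=224):
    conv_flops, h, c_in = 0, input_size, 3
    for v in VGG19_CFG:
        if v == "M":                       # 2x2 max pool halves the spatial size
            h //= 2
        else:                              # 3x3 conv, stride 1, padding 1
            conv_flops += 2 * 3 * 3 * c_in * v * h * h
            c_in = v
    # Fully connected layers: 512*7*7 -> 4096 -> 4096 -> 1000
    fc_flops = 2 * (512 * 7 * 7 * 4096 + 4096 * 4096 + 4096 * 1000)
    return conv_flops, fc_flops

conv, fc = vgg19_flops()
print(f"conv FLOPs: {conv:.3e}, fc FLOPs: {fc:.3e}")
print(f"conv fraction of the forward pass: {conv / (conv + fc):.4f}")
```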
3. Study the tables showing timing benchmarks for Alexnet (Table 2), VGG16 (Table 3), Googlenet (Table 5), and Resnet50 (Table 6). Why did the measured time and the sum of layerwise timings for the forward pass not match on GPUs? What approach was adopted in Sec. 5 of the paper to mitigate the measurement overhead on GPUs? (2+2)
4. In Lu et al., FLOPs for the different layers of a DNN are calculated. Use the FLOPs numbers for VGG16 (Table 3), Googlenet (Table 5), and Resnet50 (Table 6), and calculate the inference time (the time for a forward pass with one image) using the published Tflops number for the K80 (refer to NVIDIA TESLA GPU Accelerators). Use this to calculate the peak (theoretical) throughput achieved with the K80 for these 3 models. (6)
References
• Qi et al. PALEO: A Performance Model for Deep Neural Networks. ICLR 2017. Available at https://openreview.net/pdf?id=SyVVJ85lg
• Lu et al. Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices. 2017. Available at https://arxiv.org/pdf/1709.09503.pdf

Peng et al. proposed the Optimus scheduler for deep learning clusters, which makes use of a predictive model to estimate the remaining time of a training job. Optimus assumes a parameter-server architecture for distributed training where synchronization between the parameter server(s) and workers happens after every training step. The time taken to complete one training step on a worker includes the time for doing forward propagation (i.e., loss computation) and backward propagation (i.e., gradient computation) at the worker, the worker pushing gradients to parameter servers, parameter servers updating parameters, and the worker pulling updated parameters from parameter servers, plus extra communication overhead.
The predictive model proposed in Optimus is based on two sub-models: one models the training loss as a function of the number of steps, and the other models the training speed (training steps per unit time) as a function of resources (number of workers and parameter servers). The training loss model is given by Equation (1) in the paper. It has three parameters β0, β1, and β2 that need to be estimated from the data.
1. The first step is to generate data for predictive model calibration. You will train Resnet models with different numbers of layers (18, 20, 32, 44, 56), each with 3 different GPU types (K80, P100, V100). For these runs you will use CIFAR10, a batch size of 128, and run each job for 350 epochs. You need to collect training logs containing data on the training loss and step number for each configuration. The data collection can be done in a group of up to 5 students. If working as a group, each student should pick one of the 5 Resnet models and train it on all three GPU types, so each student in the group will be contributing training data from 3 experiments. If you decide to collaborate in the data collection, please clearly mention the names of the students involved in your submission. For each of these 15 experiments, use all the training data and calibrate a training loss model. You will report 15 models, one for each experimental configuration, and their corresponding parameters (β0, β1, β2). A calibration sketch using scipy follows the notes at the end of this problem. (15)
2. We next study how the learned parameters, β0, β1, and β2, change with the type of GPU and the size of the network. Use a regression model on the data from the 15 models to predict the value of these parameters as a function of the number of layers in the Resnet and the GPU type. From these regression models, predict the training loss curve for Resnet-50.
Note that we are effectively doing prediction for a predictive model. To verify how good this prediction is, you will train Resnet-50 on a K80, P100, and V100 for a target accuracy of 92% and compare the predicted loss curve with the real measurements. Show this comparison in a graph and calculate the percentage error. From the predicted loss curve, get the number of epochs needed to achieve 92% accuracy. Observe that there are three curves for the three different GPU types, but the number of epochs required to reach a particular accuracy (the convergence rate) should be independent of hardware. (8)
3. Using the predicted number of epochs for Resnet-50 along with the resource-speed model (use Equation (4) in Peng et al. along with its coefficients from the paper), obtain the time to accuracy of Resnet-50 (to reach 92% accuracy) in two different settings (with 2 and 4 parameter servers, respectively) as a function of the number of workers. So you will be plotting two curves, one for the 2 and one for the 4 parameter server case. Each curve will show how the time to achieve 92% accuracy (on the y-axis) scales with the number of workers (on the x-axis). (7)
References
• Peng et al. Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters. Available at https://i.cs.hku.hk/~cwu/papers/yhpeng-eurosys18.pdf
Notes
• In 5.2, other than the ResNet layers being different, every other hyperparameter should be the same during the data collection process across different students in a group (i.e., learning rate, optimizer, preprocessing/normalization method, etc.). You should also use the SGD optimizer, since it is one of the key assumptions made by Peng et al.
• When determining the βs, you can use scipy's curve_fit function for regression based on k (effective step number) and l (training loss).
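As mentioned in the notes, here is a minimal calibration sketch using scipy's curve_fit. The functional form below is written from memory of Equation (1) in Peng et al. (loss modeled as 1/(β0·k + β1) + β2), so verify it against the paper before use; the synthetic data stands in for the (step, loss) pairs parsed from your training logs.

```python
# Minimal sketch of calibrating the training-loss model with scipy's curve_fit.
# The functional form is assumed from Equation (1) in Peng et al.; verify it
# against the paper. Synthetic data stands in for your parsed training logs.
import numpy as np
from scipy.optimize import curve_fit

def loss_model(k, b0, b1, b2):
    """Training loss as a function of the effective step number k."""
    return 1.0 / (b0 * k + b1) + b2

# Replace with (step, loss) pairs parsed from your training logs.
k = np.linspace(1, 50000, 200)
l = 1.0 / (2e-4 * k + 0.5) + 0.3 + np.random.normal(0, 0.01, k.shape)

(b0, b1, b2), _ = curve_fit(loss_model, k, l, p0=[1e-4, 1.0, 0.1], maxfev=10000)
print(f"beta0={b0:.6g}, beta1={b1:.6g}, beta2={b2:.6g}")
```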
