Assignment Chef


Assignment catalog

33,401 assignments available

[SOLVED] ST3188 Statistical methods for market research

ST3188 Statistical methods for market research

Introduction

This course has a 30%-weighted coursework component which will require you to act as a market research agency and produce a market research proposal responding to a client's brief, as if being delivered to the client. The deadline for the coursework is Friday 1 March 2024. This individual project work is treated as an open-book examination.

The Research Brief

The research brief accompanies this document and contains the following.

• A short introduction and background of the company or organisation commissioning the research.
• Business objectives: These will be particular problems or challenges they are facing, or they could be more strategic aims and objectives. Examples of these include:
  o develop / launch a new product or service
  o grow market share
  o raise awareness of a product, service, or a particular message
  o increase customer satisfaction.
• Research aims: These will be specific goals which a market research project would help answer. They could be specific questions which the organisation wishes to answer, or they could be information or insights about a particular population. They will be linked to the business objectives. Examples of these include:
  o understand the attitudes and behaviours of consumers / people
  o learn what factors lead to higher customer satisfaction
  o find out what gaps there are in a market
  o understand the image or associations with a brand or product
  o estimate the demand for a new product or service.
• Some information about what, if any, data the business or organisation can supply, for example a customer database, a sampling frame, or operational data.
• An indication of the available budget and required timescale for the research.

The Research Proposal

You will not be expected to conduct any primary research. The report should cover the following areas.

(a) Provide a full summary of the research brief, including the aims of the research.
(b) Demonstrate an understanding of the market or business context as well as any other publicly available research done in this area.
(c) Detail how the fieldwork would be conducted, i.e. face-to-face, telephone, online, focus groups, mixed-mode etc.
(d) Explain the proposed sampling method as well as other sampling methods considered, including details on any sampling frame to be used.
(e) Detail the information that would be gathered and collected by the research.
(f) Explain how you would use any customer or operational data supplied to you by the client.
(g) Describe what multivariate analysis techniques you propose and how these would help the client's research aims. (You are not required to actually conduct any analysis.)
(h) Detail the proposed sample size necessary to construct confidence intervals around the survey estimates.
(i) An appropriate questionnaire which would capture suitable data to perform the proposed multivariate analysis. (You are not required to actually run the questionnaire in practice.)
(j) Proposed further research, i.e. ideas for how some business or organisational objectives might be helped by further and different research.

Marks will be awarded on the basis of the following:

• Demonstration of a full understanding of the client's issue / business problem and the market context.
• A clear explanation of the specific aims of the research.
• Thorough justification of the data collection methods, fieldwork approaches and sampling methods chosen, and also why others were rejected.
• Creativity and imagination in your approach to the research.
• Clear and concise expression of the ideas and your knowledge.
• Demonstration of a clear understanding of the statistical concepts related to sampling and sample size determination.
• Explanation of your chosen statistical analysis techniques and clear examples of how the client will benefit.
• A well-thought-out questionnaire design which reflects the aims of the research and intended statistical analysis.
• Creative and imaginative suggestions for further research.
• The quality and professionalism of the research proposal.

The length of the main report should not exceed 3,000 words. You should also include an executive summary at the beginning of no more than one side of A4, plus a table of contents. The executive summary and table of contents are not included in the 3,000-word limit, and neither is the questionnaire. The word limit does not apply to text not in the main body, such as footnotes and labels. Please note there is no allowance in the word limit: if you exceed the stated word limit you will be penalised. Please also state the word count. You should use the 1.5 line-spacing setting, exactly as per this text. If you wish, you may also include a Technical Appendix at the end of the document (excluded from the word count), but the examiners will not consider anything included there for marking.

All submissions will be checked using the anti-plagiarism software TurnItIn. Any duplicated text which is not adequately cited will be deemed to constitute plagiarism, and proportional penalties will be applied during marking. You should also provide references wherever possible, as this is ultimately a piece of academic work. If finding references for secondary research proves problematic, then please state this and it will be taken into account.
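For item (h), one standard starting point (not mandated by the brief) is the sample-size formula for estimating a proportion with margin of error E at a confidence level with critical value z:

n = \frac{z^2\, p(1-p)}{E^2}, \qquad \text{e.g. } p = 0.5,\; z = 1.96,\; E = 0.03 \;\Rightarrow\; n = \frac{1.96^2 \times 0.25}{0.03^2} \approx 1067.

Using p = 0.5 maximises p(1-p) and therefore gives a conservative sample size when the true proportion is unknown.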

$25.00

[SOLVED] Lab 4: Write a program to compare the performance of the LRU and the Optimal page replacement algorithms

Lab 4

Write a program to compare the performance of the LRU and the Optimal page replacement algorithms. The program will take a reference string and the number of frames as inputs. Assume the maximum length of a reference string is 20 and there are 5 different pages, from page 1 to page 5. The reference string can be randomly generated, and the number of frames is entered through the keyboard. For example, the system generates a reference string 2 1 3 4 5 2 3 ... 5 and you enter the number of frames 3.

Compare the number of page faults generated by the Optimal and LRU algorithms. Print out the page replacement process so you can see how LRU differs from Optimal.

Submission: In order not to lose any files, you'd better zip all your files into a .zip file. Submit your project to TRACS before the deadline. Homework will NOT be accepted through emails. You should write a readme text file telling the grader how to run your programs. Without this file, it is very likely that your project will not be run properly.
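The lab does not fix a language; purely as a sketch of the simulation logic (frame handling, fault counting, and the two eviction rules), here is a minimal Python version:

```python
import random

def simulate(ref, frames, policy):
    """Count page faults for 'lru' or 'opt' on reference string ref."""
    mem, faults = [], 0
    for i, page in enumerate(ref):
        if page in mem:
            if policy == "lru":               # refresh recency on a hit
                mem.remove(page)
                mem.append(page)
            continue
        faults += 1
        if len(mem) >= frames:
            if policy == "lru":
                victim = mem[0]               # least recently used sits at the front
            else:                             # optimal: evict the page used farthest in the future
                future = ref[i + 1:]
                victim = max(mem, key=lambda p: future.index(p)
                             if p in future else len(future) + 1)
            mem.remove(victim)
        mem.append(page)
        print(f"{policy}: ref {page} -> frames {mem}")
    return faults

ref = [random.randint(1, 5) for _ in range(20)]   # 5 distinct pages, length <= 20
frames = int(input("number of frames: "))
print("LRU faults:", simulate(ref, frames, "lru"))
print("OPT faults:", simulate(ref, frames, "opt"))
```

Note that the Optimal policy needs to look at the future of the reference string, which is why it is only realisable in a simulator like this one.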

$25.00

[SOLVED] Lab 3: Simulate the concurrent execution of two threads using a single program

Lab 3

Simulate the concurrent execution of two threads using a single program:

Thread A: prints five subsequent lines of letter A on the printer and keeps looping (about 10 times).
Thread B: prints five subsequent lines of letter B on the printer and keeps looping (about 10 times).

Implement (a) and (b) using two programs, respectively:

(a) Do not use any algorithm for mutual exclusion and show the printout, which should be similar to this:

1: AAAAAAAAA
1: AAAAAAAAA
1: BBBBBBBBB
1: BBBBBBBBB
1: BBBBBBBBB
1: AAAAAAAAA
1: AAAAAAAAA
1: AAAAAAAAA
1: BBBBBBBBB
1: BBBBBBBBB
2: AAAAAAAAA
2: AAAAAAAAA
...

(b) Write the program again considering mutual exclusion. Run the program several times to show that mutual exclusion is guaranteed. To make the threads execute for longer, use some sleep(n) functions in the program. Experiment with n to choose the best value to show the results.

1: AAAAAAAAA
1: AAAAAAAAA
1: AAAAAAAAA
1: AAAAAAAAA
1: AAAAAAAAA
1: BBBBBBBBB
1: BBBBBBBBB
1: BBBBBBBBB
1: BBBBBBBBB
1: BBBBBBBBB
2: AAAAAAAAA
...

Submission: In order not to lose any files, you'd better zip all your files into a .zip file. Submit your project to TRACS before the deadline. Homework will NOT be accepted through emails. You should write a readme text file telling the grader how to run your programs. Without this file, it is very likely that your project will not be run properly.
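A minimal sketch of part (b)'s idea in Python (the course labs are typically written in C with Pthreads; here a lock stands in for whatever mutual exclusion mechanism you choose):

```python
import threading
import time

lock = threading.Lock()  # remove the lock to reproduce the interleaving of part (a)

def worker(letter):
    for i in range(1, 11):          # loop about 10 times
        with lock:                  # mutual exclusion: the 5 lines print as one block
            for _ in range(5):
                print(f"{i}: {letter * 9}")
                time.sleep(0.01)    # lengthen execution so interleaving would show

a = threading.Thread(target=worker, args=("A",))
b = threading.Thread(target=worker, args=("B",))
a.start(); b.start()
a.join(); b.join()
```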

$25.00

[SOLVED] Lab 2: Write a small shell – called shhh – with pipes and redirection

Lab 2

Write a small shell – called shhh – that has the following capabilities:

1. Can execute a command with the accompanying arguments.
2. Recognize multiple pipe requests and handle them.
3. Recognize redirection requests and handle them.
4. Type "exit" to quit the shhh shell.

Sample commands:

shhh> ls
shhh> ls -t -al
shhh> cat file.txt                              (file.txt is an existing file)
shhh> ls -al > output.txt                       (then open output.txt to see if the content is correct or not)
shhh> ls | more | wc
shhh> ./pre < input.txt | ./sort > output.txt   (./pre and ./sort are the executables from proj1; input.txt provides the input and output.txt is the output file)
shhh> exit

The shell shhh should always wait for ALL the commands to finish. The topology of the forked processes should be linear children; i.e., the shell should have as many children as there are processes needed, with pipes connecting adjacent children.

You may assume that any redirection in the command is specified like the third example above: "redirection in" (<) is always specified before the first pipe appears, and "redirection out" (>) is always after the last pipe. To make life easier for you, you may assume that only commands with correct syntax are typed in; in other words, don't worry about errors in the formation of the commands.

The partial program is available in TRACS, lab2.c. The command parsing part is already done in the program. On your part, you need to implement the above functions.

How to submit? Zip all your files (including ./pre and ./sort from the last assignment) and submit to TRACS before the deadline. Homework will NOT be accepted through emails. You should write a readme text file telling the grader how to run your programs. Without this file, it is very likely that your project will not be run properly.
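The lab2.c skeleton and its parser are not reproduced here; purely as a sketch of the linear-children topology with pipes connecting adjacent children, here is a Python version built on the same system calls (fork, pipe, dup2, execvp, wait) a C solution would use. The helper name run_pipeline and its argument layout are invented for the example:

```python
import os

def run_pipeline(stages, infile=None, outfile=None):
    """Run e.g. stages=[["ls","-al"],["wc"]] with optional < infile and > outfile."""
    prev_read, kids = None, []
    for i, argv in enumerate(stages):
        r = w = None
        if i < len(stages) - 1:
            r, w = os.pipe()                      # pipe to the next stage
        pid = os.fork()
        if pid == 0:                              # child process for this stage
            if i == 0 and infile:
                os.dup2(os.open(infile, os.O_RDONLY), 0)   # stdin <- infile
            if prev_read is not None:
                os.dup2(prev_read, 0)             # stdin <- previous pipe
            if w is not None:
                os.close(r)
                os.dup2(w, 1)                     # stdout -> next pipe
            elif outfile:
                fd = os.open(outfile, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
                os.dup2(fd, 1)                    # stdout -> outfile
            os.execvp(argv[0], argv)              # replace the child image
        kids.append(pid)                          # parent bookkeeping
        if prev_read is not None:
            os.close(prev_read)
        if w is not None:
            os.close(w)
        prev_read = r
    for pid in kids:
        os.waitpid(pid, 0)                        # shell waits for ALL commands

run_pipeline([["ls", "-al"], ["wc"]])
```

Each child redirects its stdin/stdout before execvp, and the parent closes its copies of the pipe ends so readers see EOF, then waits for all children as the handout requires.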

$25.00

[SOLVED] Lab 1: Write small C programs – "pre.c" reads in a list of U.S. states

Lab 1

1. Write small C programs.

a) The first program "pre.c" should read in a list of U.S. states and their populations. You can google to find the information. To keep it simple, you can use the abbreviation for each state. We assume there are at most 10 states in the list. Enter the inputs through the keyboard and display the output on the screen. The input ends when an EOF (generated by Ctrl-D) is encountered. The output of the program should display the states whose population is above 10 million.

For example, the following are the inputs to "pre.c" (the unit is million):

TX 26
NC 9
MD 5
NY 19
CA 38
Ctrl-D (press the keys to terminate the inputs)

Then "pre.c" produces the output: TX NY CA

Note: an EOF is usually 'sent' to a process by hitting Ctrl-D. If you type stty -a on your unix command line, you can get info that tells you which keyboard keys mean what. FYI, in C, to put values to standard out use printf(); to get values from standard in use scanf() or getchar().

b) The second program "sort.c" reads in a list of state abbreviations from stdin and displays them in alphabetical order on the screen. Assume there are no more than 10 states and the sequence is read until you press Ctrl-D, which generates an EOF. If the inputs are TX NY CA Ctrl-D, the outputs should be: CA NY TX

2. Write a C program to set up a child-to-parent pipe. The child should 'exec' to perform the "pre" process from the above, and its output should be connected to the pipe connected to the parent, which should 'exec' to perform the "sort" process from the above.

3. Write a program to take a UNIX command from the command line and fork() a child to execute it. The command can be a simple command like $ls or $ps, or it can be a command with options such as $ls -t -l. Use argc and argv[] in the main function to pass parameters. When the child process is executing the command, the parent process simply waits for the termination of the child process. The process IDs of the parent and the child should be printed out using the getpid() and getppid() functions.

Submission: In order not to lose any files, you'd better zip all your files into a .zip file. Submit your project to TRACS before the deadline. Homework will NOT be accepted through emails. You should write a readme text file telling the grader how to run your programs. Without this file, it is very likely that your project will not be run properly.
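For part 2, the child-to-parent pipe maps onto a handful of system calls. A minimal Python sketch (assuming the ./pre and ./sort executables from part 1 exist in the current directory; the C version uses the same pipe/fork/dup2/execvp sequence):

```python
import os

# Child runs ./pre with stdout into the pipe; parent runs ./sort reading the pipe.
r, w = os.pipe()
pid = os.fork()
if pid == 0:                     # child: pre -> pipe
    os.close(r)
    os.dup2(w, 1)                # stdout into the write end
    os.execvp("./pre", ["./pre"])
else:                            # parent: pipe -> sort
    os.close(w)
    os.dup2(r, 0)                # stdin from the read end
    os.execvp("./sort", ["./sort"])
```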

$25.00

[SOLVED] ECSE 427/COMP 310 – Assignment 02: Simple user-level thread scheduler

In this project, you will develop a simple many-to-many user-level threading library with a simple first-come-first-serve (FCFS) thread scheduler. The threading library will have two executors – these are kernel-level threads that run the user-level threads. One executor is dedicated to running compute tasks and the other executor is dedicated to input-output tasks.

The user-level threads are responsible for running tasks. The tasks are C functions. Your threading library is expected to at least run the tasks we provide as part of this assignment. A task is run by the scheduler until it completes or yields. The tasks we provide here do not complete (i.e., they have while (true) in their bodies). The only way a task can stop running once it is started is by yielding the execution. A task calls sut_yield() for yielding (i.e., pausing) its execution. A task that is yielding is put back in the task ready queue (at the end of the queue). Once the running task is put back in the ready queue, the task at the front of the queue is selected to run next by the scheduler.

A newly created task is added to the end of the task ready queue. To create a task we use the sut_create() function, which takes a C function as its sole argument. When the task gets to run on the executor, the user-supplied C function is executed.

All tasks execute in a single process. Therefore, the tasks share the process' memory. In this simple user-level threading system, variables follow the C scoping rules in the tasks. You can make variables local to a task by declaring them in the C function that forms the "main" of the task. The global variables are accessible from all tasks.

The sut_yield() call pauses the execution of a thread until it is selected again by the scheduler. With a FCFS scheduler the task being paused is put at the back of the queue. That means the task would be picked again after running all other tasks that are ahead of it in the queue. We also have a way of terminating a task's execution by calling sut_exit(), which stops the execution of the current task like sut_yield() but does not put it back into the task ready queue for further execution.

So far, we described what happens with the compute tasks. The threading library has two executors (i.e., kernel-level threads). One is dedicated to compute tasks and is responsible for carrying out all that is described above. The other executor is dedicated to input-output (I/O). For instance, a task might want to send a message to outside processes or receive data from outside processes. This is problematic because input-output can be blocking, and when a user-level thread blocks, the whole program would stall. To prevent this problem, we use a dedicated executor for I/O. The idea is to offload the input and output operations to the I/O executor so that the compute executor does not block.

For instance, sut_read() would read data from an external process much like the read() you would do on a socket. The sut_read() call can stall (delay) the task until the data arrives. With user-level threading, stalling the task is not a good idea. We want to receive the data without stalling the executor, using the following approach. When a task issues sut_read(), we send the request to a request queue and the task itself is put in a wait queue. When the response arrives, the task is moved from the wait queue to the task ready queue, and it gets to run at a future time. We have sut_write() to write data to the remote process. However, the write is non-blocking.
To simplify the problem, we don't even wait for an acknowledgement of the sent data. We copy the data to be sent to a local buffer and expect the I/O executor to reliably send it to the destination. In the same way, we also have sut_open(), and we expect the I/O executor to make the connection to the remote process without error. Error conditions, such as the remote process not being found or not being willing to accept connections, are not handled. We assume all is good regarding the remote process and just open the connection and move to the next statement.

Overall Architecture of the SUT Library

The simple user-level threading (SUT) library that you are developing in this assignment has the following major components. It has two kernel-level threads known as the executors. One of them is the compute executor (C-EXEC) and the other is the I/O executor (I-EXEC). The C-EXEC is responsible for most of the activities in the SUT library; the I-EXEC only takes care of the I/O operations. Creating the two kernel-level threads to run C-EXEC and I-EXEC is the first action performed while initializing the SUT library.

The C-EXEC is directly responsible for creating tasks and launching them. Creating a task means we need to create a task structure with the given C task-main function, a stack, and the appropriate values filled into the task structure. Once the task structure is created, it is inserted into the task ready queue. The C-EXEC pulls the first task in the task ready queue and starts executing it. The executing task can take three actions:

1. Execute sut_yield(): this causes the C-EXEC to take over control. That is, the user task's context is saved in a task control block (TCB), and we load the context of C-EXEC and start it. We also put the task at the back of the task ready queue.
2. Execute sut_exit(): this causes the C-EXEC to take over control like the above case. The major difference is that the TCB is not updated, nor is the task inserted back into the task ready queue.
3. Execute sut_read(): this causes the C-EXEC to take over control. We save the user task's context in a task control block (TCB) and we load the context of C-EXEC and start it. We put the task at the back of the wait queue.

When a task executes sut_open(), it sends a message to the I-EXEC by enqueuing the message in the To-IO queue. The message will instruct the I-EXEC to open a connection to the specified destination. In a subsequent statement, the task would issue sut_write() to write the data to the destination process. We do not wait for confirmation from open or write, as a way of simplifying their implementations.

The destination process we open for I/O is used by sut_read() as well. Because we assume a single active remote process, there is no need to pass a connection handle like a file descriptor to the sut_read() or sut_write() call. We will have a sut_close() to terminate the connection we have established with the remote process.

After the SUT library is done with the initializations, it will start creating the tasks and pushing them into the task ready queue. Once the tasks are created, the SUT will pick a task from the task ready queue and launch it. Some tasks can be launched at runtime by user tasks by calling the sut_create() function.

The task scheduler, which uses a very simple scheme – just pick the task at the front of the queue – might find that there are no tasks to run in the task ready queue. For instance, the only task in the task ready queue could issue a read and go into the wait queue.
To reduce the CPU utilization, the C-EXEC will take a short sleep using the nanosleep command in Linux (a sleep of 100 microseconds is appropriate). After the sleep, the C-EXEC will check the task ready queue again.

The I-EXEC is primarily responsible for reading the To-IO message queue and executing the appropriate command. For instance, I-EXEC needs to open a connection to the remote process in response to the sut_open() call. Similarly, it needs to process the sut_write() call as well. With the sut_read() call, I-EXEC gets the data, puts the data in the From-IO message queue, and transfers the task from the wait queue to the task ready queue. This transfer is done by I-EXEC because the task does not need to wait any more: the data the task wants to read has arrived.

The sut_shutdown() call is responsible for cleanly shutting down the thread library. We need to keep the main thread waiting for the C-EXEC and I-EXEC threads, and this is one of the important functions of sut_shutdown(). In addition, you can put any termination-related actions into this function and cleanly terminate the threading library.

The SUT Library API and Usage

The SUT library will have the following API. You need to follow the given API so that testing can be easy.

```c
void sut_init();
bool sut_create(sut_task_f fn);
void sut_yield();
void sut_exit();
void sut_open(char *dest, int port);
void sut_write(char *buf, int size);
void sut_close();
char *sut_read();
void sut_shutdown();
```

[Figure: the two executors (C-EXEC and I-EXEC) together with the task ready queue and the wait queue]

Context Switching and Tasks

You can use makecontext() and swapcontext() to manage user-level thread creation, switching, etc. The sample code provided in the YAUThreads package illustrates the use of user-level context management in Linux.

Important Assumptions

Here are some important assumptions you can make in this assignment. If you want to make additional assumptions, check with the TA-in-charge (Jason) or the professor.

• The read and write tasks are launched by the application. We assume that there is only one outstanding read/write at any given time. With the read API provided in SUT, we could have multiple outstanding read calls; however, processing multiple outstanding read calls can be complicated. Therefore, we assume only one read or write call can be pending at any given time. That also means that the wait queue for I/O can have only one task at any given time (do we actually need a queue?).
• There are no interrupts in the user-level thread management to be implemented in this assignment. A task that starts running only stops for the three reasons given in Section 2.
• You can use libraries for creating queues and other data structures – you don't need to implement them yourself! We have already given you some libraries for implementing data structures.

Grading

Your assignment will be evaluated in stages.

1. Only simple computing tasks. We spawn several simple tasks that just print messages and yield. You need to get this working to demonstrate that you can create tasks and that they can cooperatively switch among themselves.
2. Tasks that spawn other tasks. In this case, we have some tasks that spawn more tasks. In total a bounded number of tasks (not more than 15) will be created. You need to demonstrate that you can have tasks creating other tasks at runtime.
3. Tasks that have I/O in them. We have some tasks write to a remote server. We don't mix write and read, just one type – write from the tasks to an external server.
4. Tasks that have I/O again. This time the tasks have reads in them.
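The assignment expects makecontext()/swapcontext() in C, per the YAUThreads sample. Purely to illustrate the FCFS yield/exit logic of C-EXEC, here is a minimal Python sketch in which a generator stands in for a task context and yield plays the role of sut_yield():

```python
from collections import deque

ready = deque()                 # FCFS task ready queue

def sut_create(task_fn):
    ready.append(task_fn())     # a generator plays the role of a task's TCB/context

def scheduler():
    """Pick the task at the front; a yield puts it at the back (sut_yield);
    returning/exhausting drops it from the queue (sut_exit)."""
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until it yields
            ready.append(task)  # back of the queue, like sut_yield()
        except StopIteration:
            pass                # task called sut_exit() / finished

def hello(name, rounds):
    def task():
        for i in range(rounds):
            print(f"{name}: step {i}")
            yield               # sut_yield(): hand control back to C-EXEC
    return task

sut_create(hello("T1", 3))
sut_create(hello("T2", 2))
scheduler()
```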

$25.00

[SOLVED] ECSE 427/COMP 310 – Assignment 01: Simple remote procedure call service

Consider a simple calculator. It has a read-eval-print loop (REPL), where you enter something like the following.

```
>> add 6 10
16
>> multiply 4 5
20
>>
```

The calculator is structured as two programs: a frontend and a backend. The frontend does not do the actual calculations. It just prints the prompt, gets the input from the user, and parses the input into a message. The frontend passes the message to the backend, which does the actual calculations and passes the result back to the frontend, where it is printed to the user. We write two programs, backend.c and frontend.c. The backend has some functions it wants to expose to the frontend. The frontend needs to call those functions that are exposed by the backend. The remote procedure call service (RPCServ) is responsible for linking the two. You are expected to develop such an RPCServ as part of this assignment.

We start with a very simple implementation: the frontend issues an execution command, and the backend runs it and returns the result. This simple structure would work even with multiple frontends if the commands issued by the frontends can be executed very quickly. If some commands can hold the backend for a long time (like seconds), we have a problem: when a frontend is running a long command, other frontends will find the backend unavailable. That is, if you run sleep 5 in the frontend, the backend is held by that frontend for 5 seconds, and if other frontends try to run a calculation, they will not get any results.

To solve this problem, you will use multi-processing. That is, the backend will create a serving process for each frontend. In this configuration, as soon as the frontend connects to the backend, we create a new serving process that is dedicated to the frontend and let it serve the frontend's requests. Even if the frontend does a sleep 5, it would not cause problems for other frontends; the backend is still available for requests from other frontends. To keep things simple, we limit the number of concurrent frontends to 5.

The pseudo code shown below for the frontend is not complete. It shows the bare essentials. You need to add the missing functions to make the frontend meet all the requirements and make it work with RPCServ and the backend.

```
backend = RPC_Connect(backendIP, backendPort)
while (no_exit) {
    print_prompt()
    line = read_line()
    cmd = parse_line(line)
    RPC_Call(backend, cmd.name, cmd.args)
}
RPC_Close(backend)
```

The frontend does not implement any of the commands the user enters into the shell. It simply relays them to the backend. You will notice that the command entered by the user looks like the following: a command (string) and parameters. We will restrict the parameters to 2 or fewer. You can have commands with no parameters. The parameters can be integers or floating-point numbers.

You need to have an RPCServ interface to send the command and parameters to the backend. In the pseudo code, we show such an interface – RPC_Call(). The frontend will check if the user has entered the exit command. If that is the case, the frontend will stop reading the next command and terminate the association with the backend. The backend is a separate process, so it keeps running even after the frontend has stopped running. The user can enter the shutdown command in the frontend to terminate the backend. With multiple frontends connecting to the backend, the shutdown can be tricky. More on this in the backend requirements.
Some commands entered by the user in the frontend may not be recognized by the backend; in that case the backend will send the NOT_FOUND error message. The frontend needs to display this to the user. We can also have error messages for certain operations, such as division-by-zero errors. These error messages need to be displayed as well.

The pseudo code shown below for the backend is not complete. It shows the bare essentials. You need to add the missing functions to make the backend meet all the requirements and make it work with RPCServ and the frontend.

```
serv = RPC_Init(myIP, myPort)
for_all_functions(name)
    RPC_Register(serv, name, function)
while (no_shutdown) {
    client = accept_on_server_socket(serv)
    serv_client(client)
}
```

The backend sets up a server at myPort on the current machine (you can use "127.0.0.1" to point to the current machine). The server should be set up so it is bound to the given port and is listening for incoming connections from the frontend. Before you start accepting the connections, you need to register all the functions that the RPCServ is willing to offer as a service. For example, if you have the following function to add two integers that you want to expose, you need to register it with the RPCServ as shown below.

```c
int addInts(int x, int y) { return x + y; }
RPC_Register("add", addInts);
```

The RPCServ is responsible for invoking (that is, calling) the addInts function with the appropriate parameters when a request comes from the frontend. The RPCServ keeps running until a shutdown command is issued by the frontend. With a single frontend, this is quite simple: you simply exit the program after closing the sockets in an orderly manner. With multiple frontends, things can get a little bit tricky.

With multiple frontends, the pseudo code shown above needs some revision. Soon after accepting a connection, you need to create a child process and let that child process handle the new connection. That is, the socket connection (client) is passed to the child process, and it will be doing the calculations and sending the results or errors back to the client. The server is free to loop back and accept another connection without waiting for the previous frontend's service to complete.

With multiple frontends, let's consider the situation where the backend receives the shutdown command. The command is received in a child process. For the backend to terminate, we must terminate the parent process that started running the backend. The child that received the shutdown must notify the parent about the reception of the shutdown. For this purpose, the child can use the return value in an exit() system call. The parent gets the return value of the child using the waitpid() system call. The example code below shows how you can return a value from a child to the parent.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int pid;
    int rval;
    if ((pid = fork()) == 0) {
        sleep(10);
        return 10;
    }
    while (1) {
        sleep(1);
        int res = waitpid(pid, &rval, WNOHANG);
        printf("Returned value %d\n", WEXITSTATUS(rval));
    }
}
```

On the server socket, the parent is going to block. It is waiting for new connections from the frontends. Once a connection comes in, the parent is going to unblock and proceed to create a child process to handle the frontend. To handle the shutdown properly, we need to check whether any child processes have already issued a shutdown. If so, we close the client connection that just arrived and stop accepting any more connections.
When all the child processes that are running have completed their execution, the parent terminates. Because the parent is blocking on the server socket, it would not be able to shut down, or even know about the shutdown, when it arrives. The parent would only detect the shutdown at the arrival of the next frontend processing request. Therefore, the backend would keep going even after shutdown has been sent, until the next frontend request.

NOTE: There is a better way of doing the same activity as the above, using epoll() or select(). If you are following the advanced tutorials and are already familiar with C/Linux socket programming, you are strongly encouraged to use those system calls. The design is left to you, but your design needs to meet or exceed the above functionality. For example, with epoll() or select() you could terminate the backend without waiting until the next frontend request.

Backend Functions: You need to provide the following functions in the backend implementation.

```c
int addInts(int a, int b);               // add two integers
int multiplyInts(int a, int b);          // multiply two integers
float divideFloats(float a, float b);    // divide float numbers (report divide-by-zero error)
int sleep(int x);                        // make the calculator sleep for x seconds – this is blocking
uint64_t factorial(int x);               // return factorial of x
```

RPCServ Requirements

We are not going to test your RPCServ implementation with applications other than the calculator. Therefore, we are not standardizing the RPCServ interface. However, you are strongly encouraged to provide at least the following.

```c
rpc_t *RPC_Init(char *host, int port)    // rpc_t is a type defined by you; it holds all
                                         // necessary state and config about the RPC connection
RPC_Register(rpc_t *r, char *name, callback_t fn)   // callback_t is a type defined by you
rpc_t *RPC_Connect(char *name, int port)
RPC_Close(rpc_t *r)
RPC_Call(rpc_t *r, char *name, args...)  // you can have different variations to handle
                                         // different numbers and types of parameters
```

This is a guide for you to organize the RPCServ implementation. You can change the function signatures and have more functions in your RPCServ implementation.

How will the assignment be graded? We will use a shell script to grade your assignment. The shell script will start the backend, start the frontend, and inject different inputs. The script checks the output from your frontend against an expected value and reports any error. The grade distribution will be announced very soon by the TAs.

What do you need to hand in? Source files, plus a Makefile or CMakeLists.txt.

Can I collaborate with my friends? This is an individual assignment. You can brainstorm with your friends in developing RPCServ and other components. The final implementation must be yours only. You cannot do group coding.
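The assignment itself is in C; as a compact illustration of the fork-per-frontend pattern described above, here is a Python sketch. The one-line text protocol and the FUNCS table are invented for the example:

```python
import os
import socket

# Hypothetical one-line wire format: "name arg1 arg2\n" -> "result\n".
FUNCS = {
    "add": lambda a, b: int(a) + int(b),          # stands in for addInts
    "multiply": lambda a, b: int(a) * int(b),     # stands in for multiplyInts
}

def backend(port=5000):
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(5)                         # at most 5 concurrent frontends
    while True:
        client, _ = srv.accept()
        if os.fork() == 0:                # child: serving process dedicated to this frontend
            srv.close()
            f = client.makefile("rw")
            for line in f:
                name, *args = line.split()
                if name == "exit":
                    break
                fn = FUNCS.get(name)
                reply = str(fn(*args)) if fn else "NOT_FOUND"
                f.write(reply + "\n")
                f.flush()
            client.close()
            os._exit(0)                   # the exit status could signal shutdown
        client.close()                    # parent loops back to accept()

def rpc_call(f, name, *args):             # frontend side of RPC_Call()
    f.write(" ".join([name] + [str(a) for a in args]) + "\n")
    f.flush()
    return f.readline().strip()
```

A full solution would also reap the children with waitpid() and use their exit statuses to propagate the shutdown command, as the handout describes.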

$25.00

[SOLVED] Regression and model selection – MATH8050: Homework 9

1. (30pts total, equally weighted) PH Exercise 9.2 (the diabetes data).

2. (60pts total, equally weighted) Consider the diabetes data azdiabetes.dat from class. The goal here is to fit a Bayesian logistic regression model with the variable diabetes as the response and npreg, bp, bmi, pred, and age as the covariates. Suppose that the logistic regression model we consider is of the form

\Pr(Y_i = 1 \mid x_i, \beta, \gamma) = \frac{e^{\theta_i}}{1 + e^{\theta_i}}

where \beta = (\beta_0, \dots, \beta_5), \gamma = (\gamma_1, \dots, \gamma_5) and

\theta_i = \beta_0 + \sum_{j=1}^{5} \beta_j \gamma_j x_{i,j}.

Here \gamma_j = 1 if the jth variable is a predictor of diabetes and 0 otherwise. For example, \gamma = (1, 1, 0, 0, 0) corresponds to the model \theta_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2}. Obtain the posterior distribution of \beta and \gamma, assuming the following independent priors:

\gamma_j \sim \mathrm{Ber}(0.5), \quad \beta_0 \sim \mathrm{Normal}(0, 16), \quad \beta_j \sim \mathrm{Normal}(0, 4) \text{ for each } j > 0.

a. Derive the full conditional distributions for \beta_j with j = 0, 1, \dots, 5 and \gamma_j with j = 1, \dots, 5.
b. Implement a Metropolis-Hastings algorithm to obtain MCMC samples from the joint posterior distribution and perform convergence diagnostics.
c. Report the 95% credible intervals for the parameters \beta_j with j = 0, 1, \dots, 5, and report the posterior inclusion probabilities for each covariate.
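A minimal sketch of part (b), assuming the covariate matrix X (n x 5) and binary response y have already been loaded from azdiabetes.dat; the random-walk step size and the single-flip gamma proposal are illustrative choices, not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(beta, gamma, X, y):
    """Log posterior up to a constant: logistic likelihood plus the stated priors
    beta_0 ~ N(0, 16), beta_j ~ N(0, 4) for j > 0, gamma_j ~ Ber(0.5)."""
    theta = beta[0] + X @ (beta[1:] * gamma)
    loglik = np.sum(y * theta - np.logaddexp(0, theta))   # stable log(1 + e^theta)
    logprior = -beta[0]**2 / (2 * 16) - np.sum(beta[1:]**2) / (2 * 4)
    return loglik + logprior        # the Ber(0.5) prior on gamma is a constant

def mh(X, y, iters=5000, step=0.1):
    n, p = X.shape
    beta, gamma = np.zeros(p + 1), np.ones(p, dtype=int)
    lp = log_post(beta, gamma, X, y)
    samples = []
    for _ in range(iters):
        prop = beta + step * rng.standard_normal(p + 1)   # random walk on beta
        lp_prop = log_post(prop, gamma, X, y)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        j = rng.integers(p)                               # flip one gamma_j
        g = gamma.copy(); g[j] = 1 - g[j]
        lp_prop = log_post(beta, g, X, y)
        if np.log(rng.uniform()) < lp_prop - lp:
            gamma, lp = g, lp_prop
        samples.append((beta.copy(), gamma.copy()))
    return samples
```

The posterior inclusion probability for covariate j in part (c) is then just the post-burn-in average of the sampled gamma_j values.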

$25.00

[SOLVED] Mixture models and Bayesian linear regression – MATH8050: Homework 8

1. (60pts total, equally weighted) Consider a three-component mixture of normal distributions with a common prior on the mixture component means, the error variance, and the variance within mixture component means. The prior on the mixture weights w is a three-component Dirichlet distribution. (The data for this problem can be found in Mixture.csv.)

p(Y_i \mid \mu_1, \mu_2, \mu_3, w_1, w_2, w_3, \varepsilon^2) = \sum_{j=1}^{3} w_j N(\mu_j, \varepsilon^2)
\mu_j \mid \mu_0, \sigma_0^2 \sim N(\mu_0, \sigma_0^2)
\mu_0 \sim N(0, 3)
\sigma_0^2 \sim IG(2, 2)
(w_1, w_2, w_3) \sim \mathrm{Dirichlet}(1, 1, 1)
\varepsilon^2 \sim IG(2, 2),

for i = 1, \dots, n. Specifically:

• w_1, w_2 and w_3 are the mixture weights of mixture components 1, 2 and 3 respectively;
• \mu_1, \mu_2 and \mu_3 are the means of the mixture components;
• \varepsilon^2 is the variance parameter of the error term around the mixture components.

Since we're building a hierarchical model for the means of the individual components, we have a common hyperprior, where \mu_0 is its mean parameter and \sigma_0^2 is its variance parameter. Both of these have priors as well, but the parameters of those priors are fixed: \mu_0 has a Normal prior with mean 0 and variance 3, and \sigma_0^2 has an Inverse-Gamma prior with shape and rate parameters (2, 2). Similarly, \varepsilon^2 has an Inverse-Gamma prior with shape and rate parameters (2, 2); while they have the same parametrisation, they do not share a prior. The mixture weights w_1, w_2, w_3 jointly come from a Dirichlet distribution with parameter vector (1, 1, 1). w_1, w_2, w_3, \mu_1, \mu_2, \mu_3, \varepsilon^2, \mu_0 and \sigma_0^2 are all random variables that we will estimate when we fit the model.

(a) Let \tau = 1/\varepsilon^2 and \phi_0 = 1/\sigma_0^2. Derive the joint posterior p(w_1, w_2, w_3, \mu_1, \mu_2, \mu_3, \varepsilon^2, \mu_0, \sigma_0^2 \mid Y_1, \dots, Y_N) up to a normalizing constant.

(b) Derive the full conditionals for all the parameters up to a normalizing constant:
– p(w_1, w_2, w_3 \mid \mu_1, \mu_2, \mu_3, \varepsilon^2, Y_1, \dots, Y_N) ∝
– p(\mu_1 \mid \mu_2, \mu_3, w_1, w_2, w_3, Y_1, \dots, Y_N, \varepsilon^2, \mu_0, \sigma_0^2) ∝
– p(\mu_2 \mid \mu_1, \mu_3, w_1, w_2, w_3, Y_1, \dots, Y_N, \varepsilon^2, \mu_0, \sigma_0^2) ∝
– p(\mu_3 \mid \mu_1, \mu_2, w_1, w_2, w_3, Y_1, \dots, Y_N, \varepsilon^2, \mu_0, \sigma_0^2) ∝
– p(\varepsilon^2 \mid \mu_1, \mu_2, \mu_3, w_1, w_2, w_3, Y_1, \dots, Y_N) ∝
– p(\mu_0 \mid \mu_1, \mu_2, \mu_3, \sigma_0^2) ∝
– p(\sigma_0^2 \mid \mu_0, \mu_1, \mu_2, \mu_3) ∝

(c) Since neither the joint posterior nor any of the full conditionals involving the likelihood are of a form that's easy to sample, we introduce a data augmentation scheme. A common solution is to introduce an additional set of auxiliary random variables {Z_i}_{i=1}^{N} that assign each observation to one of the mixture components, with the probability of assignment being the respective mixture weight. Re-derive the full conditionals under the data augmentation scheme.

(d) In task (c) you derived all the full conditionals, and due to the data augmentation scheme they are all in a form that is easy to sample. Use these full conditionals to implement Gibbs sampling using the data from "Mixture.csv".

(e) Given tasks (c)–(d), show traceplots for all estimated parameters, and compute means and 95% credible intervals for the marginal posterior distributions of all the parameters except the auxiliary variables. Now suppose you re-run the sampler using 3 different sets of starting values for the parameters; are your results the same? Justify your reasoning with visualizations.
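For question 1(d), a minimal Gibbs sketch under the data augmentation of part (c), assuming the observations y have been read from Mixture.csv; the starting values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs(y, iters=2000):
    """Data-augmented Gibbs for the 3-component normal mixture above.
    tau = 1/eps^2 and phi0 = 1/sigma0^2, with the priors stated in the problem."""
    n = len(y)
    mu = np.array([y.min(), y.mean(), y.max()])   # crude starting values
    w = np.full(3, 1 / 3)
    tau, mu0, phi0 = 1.0, 0.0, 1.0
    draws = []
    for _ in range(iters):
        # z_i | rest: categorical, prob proportional to w_j * N(y_i; mu_j, 1/tau)
        like = w * np.sqrt(tau) * np.exp(-0.5 * tau * (y[:, None] - mu) ** 2)
        probs = like / like.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(3, p=p) for p in probs])
        counts = np.bincount(z, minlength=3)
        # w | z: Dirichlet(1 + n_1, 1 + n_2, 1 + n_3)
        w = rng.dirichlet(1 + counts)
        # mu_j | rest: conjugate normal update
        for j in range(3):
            prec = phi0 + tau * counts[j]
            mean = (phi0 * mu0 + tau * y[z == j].sum()) / prec
            mu[j] = rng.normal(mean, 1 / np.sqrt(prec))
        # tau = 1/eps^2 | rest: Gamma(2 + n/2, rate 2 + SSE/2)
        sse = np.sum((y - mu[z]) ** 2)
        tau = rng.gamma(2 + n / 2, 1 / (2 + sse / 2))
        # mu0 | mu, phi0: normal (prior N(0, 3), i.e. prior precision 1/3)
        prec0 = 1 / 3 + 3 * phi0
        mu0 = rng.normal(phi0 * mu.sum() / prec0, 1 / np.sqrt(prec0))
        # phi0 = 1/sigma0^2 | mu, mu0: Gamma(2 + 3/2, rate 2 + sum (mu_j - mu0)^2 / 2)
        phi0 = rng.gamma(2 + 1.5, 1 / (2 + np.sum((mu - mu0) ** 2) / 2))
        draws.append((w.copy(), mu.copy(), 1 / tau, mu0, 1 / phi0))
    return draws
```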
2. (30pts total, equally weighted) PH Exercise 9.1: The file swim.dat contains data on the amount of time, in seconds, it takes each of four high school swimmers to swim 50 yards. Each swimmer has six times, taken on a biweekly basis.

(a) Perform the following data analysis for each swimmer separately: write down a linear regression model with swimming time as the response and week as the explanatory variable. Complete the prior specification by using the information that competitive times for this age group generally range from 22 to 24 seconds.

(b) Implement a Gibbs sampler to fit each of the models. For each swimmer j, obtain a posterior predictive distribution for Y*_j, the time of swimmer j if they were to swim two weeks from the last recorded time.

(c) The coach has to decide which swimmer should compete in a swimming meet in two weeks. Using your posterior predictive distributions, compute P(Y*_j = max{Y*_1, ..., Y*_4} | Y) for each swimmer j, and based on this make a recommendation to the coach.

$25.00

[SOLVED] Gibbs sampling and Metropolis-Hastings algorithm – MATH8050: Homework 7

Please load all the packages used in the following R chunk before the function sessionInfo():

```
# load packages
sessionInfo()
```

Total points on assignment: 10 (reproducibility) + 45 (Q1) + 45 (Q2). Reproducibility component: 10 points.

1. (45pts in total, equally weighted) Suppose that we want to generate a truncated beta distribution Beta(2.7, 6.3) restricted to the interval (c, d) with c, d ∈ (0, 1). Assume that c = 0.1 and d = 0.9.
(a) Implement a Metropolis-Hastings algorithm based on a Beta(2, 6) proposal, and provide convergence diagnostics and the acceptance ratio.
(b) Implement a Metropolis-Hastings algorithm based on a U(c, d) proposal, and provide convergence diagnostics and the acceptance ratio.
(c) Compute P(X > 0.5) using the samples obtained in parts (a) and (b).

2. (45pts in total, equally weighted) We say X ~ T_\nu is a Student's t random variable with \nu degrees of freedom; that is, its pdf is given by

f(x \mid \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}.

Assume that \nu = 4. Make sure to perform convergence diagnostics.
(a) Implement a Metropolis-Hastings algorithm with the normal distribution N(0, 1) as the proposal distribution.
(b) Implement a Metropolis-Hastings algorithm with the t distribution T_2 as the proposal distribution.
(c) Calculate E(X) and the 95% credible interval for X using the MCMC samplers in parts (a) and (b).
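The homework is set up in R (hence the sessionInfo() chunk); purely as an illustration of the independence sampler in part 1(a), here is a Python sketch with an arbitrary chain length and starting point:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, c, d = 2.7, 6.3, 0.1, 0.9

def target_logpdf(x):
    # truncated Beta(2.7, 6.3) on (c, d), up to the normalizing constant
    return stats.beta.logpdf(x, a, b) if c < x < d else -np.inf

def mh_independence(iters=20000):
    x, chain, accepted = 0.5, [], 0
    for _ in range(iters):
        y = rng.beta(2, 6)                       # independence proposal Beta(2, 6)
        log_alpha = (target_logpdf(y) - target_logpdf(x)
                     + stats.beta.logpdf(x, 2, 6) - stats.beta.logpdf(y, 2, 6))
        if np.log(rng.uniform()) < log_alpha:    # accept with prob min(1, alpha)
            x = y
            accepted += 1
        chain.append(x)
    return np.array(chain), accepted / iters

chain, acc = mh_independence()
print("acceptance ratio:", acc)
print("P(X > 0.5) estimate:", np.mean(chain > 0.5))   # part (c)
```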

$25.00

[SOLVED] CS550 Problem Set 3

Problem 1. In (Newman 2006, PNAS 103(23): 8577–8582)¹, Mark Newman defines the modularity of a network divided into two components as (see the paper or course slides for the notation):

Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j \qquad (1)

We will now get a better intuition on what this quantity means. Consider the network in Figure 1 below.

[Figure 1: the network used in problems 1 and 2]

(a) [10pt] If we remove edge (A, G) and partition the graph into two communities, calculate the modularity of this partition.
(b) [10pt] Now, consider the original network from the figure and the groups identified in (a). Add a link between nodes E and H and recalculate the modularity Q. Did the modularity Q go up or down? Why?
(c) [10pt] Consider the original network from the figure and the groups identified in (a). Now add a link between nodes F and A and recalculate the modularity Q. Did Q go up or down? Why?

¹ Newman ME. Modularity and community structure in networks. Proceedings of the National Academy of Sciences. 2006 Jun 6;103(23):8577-82.

Problem 2. Still considering the graph in Figure 1, assume that every edge in this graph has an equal weight of 1. We run spectral clustering to partition the graph into two communities.

(a) [10pt] Provide the adjacency matrix A, degree matrix D, and Laplacian matrix L of the graph.
(b) [10pt] Using Matlab or Python, compute the eigenvalues and the corresponding eigenvectors of the Laplacian matrix. Rank the eigenvalues in ascending order. [You may refer to problem 1 in homework 2 for some hints on using Python to compute eigenvalues and eigenvectors.]
(c) [10pt] What is the eigenvector corresponding to the second smallest eigenvalue? Using 0 as the boundary, partition the graph into two communities. What is the graph partitioning result?

What to submit: i. The matrices A, D and L in (a). ii. The eigenvalues and eigenvectors in (b), as well as the code for computing them. iii. The graph partitioning result in (c).

Problem 3. Imagine an undirected graph G with nodes 2, 3, 4, ..., 1000000. (Note that there is no node 1.) There is an edge between nodes i and j if and only if i and j have a common factor other than 1. Put another way, the only edges that are missing are those between nodes that are relatively prime; e.g., there is no edge between 15 and 56.

We want to find communities by starting with a clique (not a bi-clique) and growing it by adding nodes. However, when we grow a clique, we want to keep the density of edges at 1; i.e., the set of nodes remains a clique at all times. A maximal clique is a clique for which it is impossible to add a node and still retain the property of being a clique; i.e., a clique C is maximal if every node not in C is missing an edge to at least one member of C.

(a) [10pt] Prove that if i is any integer greater than 1, then the set C_i of nodes of G that are divisible by i is a clique.
(b) [10pt] Under what circumstances is C_i a maximal clique? Prove that your conditions are both necessary and sufficient. (Trivial conditions, like "C_i is a maximal clique if and only if C_i is a maximal clique", will receive no credit.)
(c) [10pt] Prove that C_2 is the unique maximum clique; that is, it is larger than any other clique.
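Equation (1) and the spectral-partition step of problem 2 are both a few lines of numpy. Figure 1's edge list is not recoverable here, so the graph below is a hypothetical stand-in (two triangles joined by one bridge edge):

```python
import numpy as np

def modularity(A, s):
    """Newman's two-community modularity, Eq. (1):
    Q = (1/4m) * sum_ij (A_ij - k_i*k_j/(2m)) * s_i * s_j,
    where s_i = +1/-1 encodes the community of node i."""
    k = A.sum(axis=1)                      # degrees
    m = A.sum() / 2                        # number of edges
    B = A - np.outer(k, k) / (2 * m)       # modularity matrix
    return float(s @ B @ s) / (4 * m)

# Hypothetical stand-in graph: two triangles joined by one bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
s = np.array([1, 1, 1, -1, -1, -1])        # split at the bridge
print(modularity(A, s))                    # about 0.357 for this graph

# Problem 2's spectral partition: sign of the Fiedler vector, 0 as the boundary.
L = np.diag(A.sum(axis=1)) - A             # L = D - A
vals, vecs = np.linalg.eigh(L)             # eigenvalues returned in ascending order
print(np.sign(vecs[:, 1]))                 # eigenvector of the second-smallest eigenvalue
```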

$25.00

[SOLVED] CS550: Massive Data Mining and Learning – Problem Set 1

Questions

1. Map-Reduce (35 pts)

Write a MapReduce program in Hadoop that implements a simple "People You Might Know" social network friendship recommendation algorithm. The key idea is that if two people have a lot of mutual friends, then the system should recommend that they connect with each other.

Input: Use the provided input file hw1q1.zip. The input file contains the adjacency list and has multiple lines in the format <User><TAB><Friends>. Here, <User> is a unique integer ID corresponding to a unique user and <Friends> is a comma-separated list of unique IDs corresponding to the friends of the user with the unique ID <User>. Note that the friendships are mutual (i.e., edges are undirected): if A is a friend of B, then B is also a friend of A. The data provided is consistent with that rule, as there is an explicit entry for each side of each edge.

Algorithm: Let us use a simple algorithm such that, for each user U, the algorithm recommends N = 10 users who are not already friends with U but have the largest number of mutual friends in common with U.

Output: The output should contain one line per user in the format <User><TAB><Recommendations>, where <User> is a unique ID corresponding to a user and <Recommendations> is a comma-separated list of unique IDs corresponding to the algorithm's recommendation of people that <User> might know, ordered by decreasing number of mutual friends. Even if a user has fewer than 10 second-degree friends, output all of them in decreasing order of the number of mutual friends. If a user has no friends, you can provide an empty list of recommendations. If there are multiple users with the same number of mutual friends, ties are broken by ordering them in numerically ascending order of their user IDs.

Also, please provide a description of how you are going to use MapReduce jobs to solve this problem. We only need a very high-level description of your strategy. Note: it is possible to solve this question with a single MapReduce job, but if your solution requires multiple MapReduce jobs, then that is fine too.

What to submit: (i) The source code as a single source code file named as the question number (e.g., question_1.java). (ii) Include in your writeup a short paragraph describing your algorithm to tackle this problem. (iii) Include in your writeup the recommendations for the users with the following user IDs: 924, 8941, 8942, 9019, 9020, 9021, 9022, 9990, 9992, 9993.

2. Association Rules (35 pts)

Association Rules are frequently used for Market Basket Analysis (MBA) by retailers to understand the purchase behavior of their customers. This information can then be used for many different purposes, such as cross-selling and up-selling of products, sales promotions, loyalty programs, store design, discount plans and many others.

Evaluation of item sets: Once you have found the frequent itemsets of a dataset, you need to choose a subset of them as your recommendations. Commonly used metrics for measuring significance and interest for selecting rules for recommendations are:

2a. Confidence (denoted as conf(A → B)): Confidence is defined as the probability of occurrence of B in the basket if the basket already contains A:

\mathrm{conf}(A \to B) = \Pr(B \mid A),

where \Pr(B \mid A) is the conditional probability of finding item set B given that item set A is present.

2b. Lift (denoted as lift(A → B)): Lift measures how much more "A and B occur together" than "what would be expected if A and B were statistically independent":

\mathrm{lift}(A \to B) = \frac{\mathrm{conf}(A \to B)}{S(B)}, \qquad S(B) = \frac{\mathrm{Support}(B)}{N},

where N is the total number of transactions (baskets).
2c. Conviction (denoted as conv(A → B)): it compares the "probability that A appears without B if they were independent" with the "actual frequency of the appearance of A without B":

\mathrm{conv}(A \to B) = \frac{1 - S(B)}{1 - \mathrm{conf}(A \to B)}

(a) [5 pts] A drawback of using confidence is that it ignores Pr(B). Why is this a drawback? Explain why lift and conviction do not suffer from this drawback.

(b) [5 pts] A measure is symmetrical if measure(A → B) = measure(B → A). Which of the measures presented here are symmetrical? For each measure, please provide either a proof that the measure is symmetrical, or a counterexample that shows the measure is not symmetrical.

(c) [5 pts] A measure is desirable if its value is maximal for rules that hold 100% of the time (such rules are called perfect implications). This makes it easy to identify the best rules. Which of the above measures have this property? Explain why.

Product Recommendations: The action or practice of selling additional products or services to existing customers is called cross-selling. Giving product recommendations is one example of cross-selling that is frequently used by online retailers. One simple method to give product recommendations is to recommend products that are frequently browsed together by the customers.

Suppose we want to recommend new products to the customer based on the products they have already browsed on the online website. Write a program using the A-priori algorithm to find products which are frequently browsed together. Fix the support to s = 100 (i.e., product pairs need to occur together at least 100 times to be considered frequent) and find itemsets of size 2 and 3. Use the provided browsing behavior dataset browsing.txt. Each line represents a browsing session of a customer. On each line, each string of 8 characters represents the id of an item browsed during that session. The items are separated by spaces.

Note: for the following questions (d) and (e), the writeup will require a specific rule ordering but the program need not sort the output.

(d) [10pts] Identify pairs of items (X, Y) such that the support of {X, Y} is at least 100. For all such pairs, compute the confidence scores of the corresponding association rules: X ⇒ Y, Y ⇒ X. Sort the rules in decreasing order of confidence scores and list the top 5 rules in the writeup. Break ties, if any, by lexicographically increasing order on the left hand side of the rule.

(e) [10pts] Identify item triples (X, Y, Z) such that the support of {X, Y, Z} is at least 100. For all such triples, compute the confidence scores of the corresponding association rules: (X, Y) ⇒ Z, (X, Z) ⇒ Y, and (Y, Z) ⇒ X. Sort the rules in decreasing order of confidence scores and list the top 5 rules in the writeup. Order the left-hand-side pair lexicographically and break ties, if any, by lexicographical order of the first then the second item in the pair.

What to submit: Include your properly named code file (e.g., question_2.java or question_2.py), and include the answers to the following questions in your writeup: (i) Explanation for 2(a). (ii) Proofs and/or counterexamples for 2(b). (iii) Explanation for 2(c). (iv) Top 5 rules with confidence scores for 2(d). (v) Top 5 rules with confidence scores for 2(e).
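For parts 2(d)–2(e) above, a single-machine sketch of the two A-priori passes for pairs (triples add one more pass restricted to frequent pairs). The file name browsing.txt is from the handout; everything else is illustrative:

```python
from collections import Counter
from itertools import combinations

SUPPORT = 100

def apriori_pairs(path="browsing.txt"):
    baskets = [sorted(set(line.split())) for line in open(path)]
    # pass 1: count single items, keep the frequent ones
    item_count = Counter(i for b in baskets for i in b)
    frequent = {i for i, c in item_count.items() if c >= SUPPORT}
    # pass 2: count only pairs whose members are both frequent
    pair_count = Counter()
    for b in baskets:
        for pair in combinations([i for i in b if i in frequent], 2):
            pair_count[pair] += 1
    # confidence of X => Y and Y => X for each frequent pair
    rules = []
    for (x, y), c in pair_count.items():
        if c >= SUPPORT:
            rules.append((c / item_count[x], x, y))   # conf(X => Y)
            rules.append((c / item_count[y], y, x))   # conf(Y => X)
    rules.sort(key=lambda r: (-r[0], r[1]))           # confidence desc, ties by LHS
    return rules[:5]
```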
3. Locality-Sensitive Hashing (30 pts)

When simulating a random permutation of rows, as described in Sec 3.3.5 of the MMDS textbook, we could save a lot of time if we restricted our attention to a randomly chosen k of the n rows, rather than hashing all the row numbers. The downside of doing so is that if none of the k rows contains a 1 in a certain column, then the result of the min-hashing is "don't know," i.e., we get no row number as a min-hash value. It would be a mistake to assume that two columns that both min-hash to "don't know" are likely to be similar. However, if the probability of getting "don't know" as a min-hash value is small, we can tolerate the situation, and simply ignore such min-hash values when computing the fraction of min-hashes in which two columns agree.

(a) [10 pts] Suppose a column has m 1's and therefore (n − m) 0's. Prove that the probability we get "don't know" as the min-hash value for this column is at most \left(\frac{n-m}{n}\right)^k.

(b) [10 pts] Suppose we want the probability of "don't know" to be at most e^{-10}. Assuming n and m are both very large (but n is much larger than m or k), give a simple approximation to the smallest value of k that will assure this probability is at most e^{-10}. Hints: (1) You can use \left(\frac{n-m}{n}\right)^k as the exact value of the probability of "don't know." (2) Remember that for large x, (1 - 1/x)^x \approx 1/e.

(c) [10 pts] Note: This question should be considered separate from the previous two parts, in that we are no longer restricting our attention to a randomly chosen subset of the rows. When min-hashing, one might expect that we could estimate the Jaccard similarity without using all possible permutations of rows. For example, we could allow only cyclic permutations, i.e., start at a randomly chosen row r, which becomes the first in the order, followed by rows r+1, r+2, and so on, down to the last row, and then continuing with the first row, second row, and so on, down to row r−1. There are only n such permutations if there are n rows. However, these permutations are not sufficient to estimate the Jaccard similarity correctly. Give an example of two columns such that the probability (over cyclic permutations only) that their min-hash values agree is not the same as their Jaccard similarity. In your answer, please provide (a) an example of a matrix with two columns (let the two columns correspond to sets denoted by S1 and S2), (b) the Jaccard similarity of S1 and S2, and (c) the probability that a random cyclic permutation yields the same min-hash value for both S1 and S2.

What to submit: Include the following in your writeup: (i) Proof for 3(a). (ii) Derivation and final answer for 3(b). (iii) Example for 3(c).
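One way to carry out the approximation asked for in 3(b), using the two hints:

\left(\frac{n-m}{n}\right)^{k} = \left[\left(1-\frac{m}{n}\right)^{n/m}\right]^{km/n} \approx e^{-km/n} \le e^{-10} \quad\Longleftrightarrow\quad k \ge \frac{10n}{m},

so the smallest such k is approximately 10n/m.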

$25.00

[SOLVED] CS550 Problem Set 2

In this problem we will explore the relationship between two of the most popular dimensionality reduction techniques, SVD and PCA, at a basic conceptual level. Before we proceed with the question itself, let us briefly recap the SVD and PCA techniques and a few important observations:

• First, recall that the eigenvalue decomposition of a real, symmetric, and square matrix B (of size d × d) can be written as the product

B = Q \Lambda Q^T

where \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d) contains the eigenvalues of B (which are always real) along its main diagonal and Q is an orthogonal matrix containing the eigenvectors of B as its columns.

• Principal Component Analysis (PCA): Given a data matrix M (of size p × q), PCA involves the computation of the eigenvectors of MM^T or M^TM. The matrix of these eigenvectors can be thought of as a rigid rotation in a high-dimensional space. When you apply this transformation to the original data, the axis corresponding to the principal eigenvector is the one along which the points are most spread out. More precisely, this axis is the one along which the variance of the data is maximized. Put another way, the points can best be viewed as lying along this axis, with small deviations from this axis. Likewise, the axis corresponding to the second eigenvector (the eigenvector corresponding to the second-largest eigenvalue) is the axis along which the variance of distances from the first axis is greatest, and so on.

• Singular Value Decomposition (SVD): SVD involves the decomposition of a data matrix M (of size p × q) into a product U \Sigma V^T, where U (of size p × r) and V (of size q × r) are column-orthonormal matrices¹ and \Sigma (of size r × r) is a diagonal matrix. The entries along the diagonal of \Sigma are referred to as the singular values of M. The key to understanding what SVD offers is in viewing the r columns of U, \Sigma, and V as representing concepts that are hidden in the original matrix M.

For answering the questions below, let us define a matrix M (of size p × q) and let us assume this matrix corresponds to a dataset with p data points and q dimensions.

(a) [5pt] Are the matrices MM^T and M^TM symmetric, square and real? Explain.

(b) [5pt] Prove that the eigenvalues of MM^T are the same as those of M^TM. Are their eigenvectors the same?

(c) [5pt] Given that we now understand certain properties of M^TM, write an expression for M^TM in terms of Q, Q^T and \Lambda, where \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d) contains the eigenvalues of M^TM along its main diagonal and Q is an orthogonal matrix containing the eigenvectors of M^TM as its columns. (Hint: check the definition of eigenvalue decomposition provided at the beginning of the question to see if it is applicable.)

(d) [5pt] SVD decomposes the matrix M into the product U \Sigma V^T, where U and V are column-orthonormal and \Sigma is a diagonal matrix. Given that M = U \Sigma V^T, write a simplified expression for M^TM in terms of V, V^T and \Sigma.

(e) In this question, let us experimentally test whether the SVD decomposition of M actually provides us the eigenvectors (PCA dimensions) of M^TM. We strongly recommend students use Python and the suggested functions for this exercise². Initialize the matrix M as follows:

M = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{pmatrix}

• (e)(a) [5pt] Compute the SVD of M (use the scipy.linalg.svd function in Python and set the argument full_matrices to False). The function returns values corresponding to U, \Sigma and V^T. What are the values returned for U, \Sigma and V^T? Note: make sure that the first element of the returned array \Sigma has a greater value than the second element. (A numeric check covering all of part (e) is sketched below.)
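A quick numeric check for part (e), using the scipy functions named in the handout; it also performs the sorting and comparisons asked for in (e)(b)–(e)(d) below:

```python
import numpy as np
from scipy import linalg

M = np.array([[1, 2], [2, 1], [3, 4], [4, 3]], dtype=float)

# (e)(a): SVD of M; singular values come back in descending order
U, S, Vt = linalg.svd(M, full_matrices=False)
print(S)

# (e)(b): eigendecomposition of M^T M, re-sorted into descending order
evals, evecs = linalg.eigh(M.T @ M)      # eigh returns ascending order
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
print(evals)

# (e)(c): the columns of V match the eigenvectors up to sign
print(np.allclose(np.abs(Vt.T), np.abs(evecs)))

# (e)(d): the eigenvalues of M^T M are the squared singular values of M
print(np.allclose(evals, S ** 2))
```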
• (e)(b) [5pt] Compute the eigenvalue decomposition of M^TM (use the scipy.linalg.eigh function in Python). The function returns two parameters: a list of eigenvalues (let us call this list Evals) and a matrix whose columns correspond to the eigenvectors of the respective eigenvalues (let us call this matrix Evecs). Sort the list Evals in descending order such that the largest eigenvalue appears first in the list. Also, re-arrange the columns in Evecs such that the eigenvector corresponding to the largest eigenvalue appears in the first column of Evecs. What are the values of Evals and Evecs (after the sorting and re-arranging process)?

• (e)(c) [5pt] Based on the experiment and your derivations in parts (c) and (d), do you see any correspondence between V produced by SVD and the matrix of eigenvectors Evecs (after the sorting and re-arranging process) produced by eigenvalue decomposition? If so, what is it? (Note: the function scipy.linalg.svd returns V^T, not V.)

• (e)(d) [5pt] Based on the experiment and the expressions obtained in parts (c) and (d) for M^TM, what is the relationship (if any) between the eigenvalues of M^TM and the singular values of M? Explain. Note: the entries along the diagonal of \Sigma (part (d)) are referred to as the singular values of M; the eigenvalues of M^TM are captured by the diagonal elements in \Lambda (part (c)).

What to submit: (i) Written solutions to questions 1(a) to 1(e) with explanations wherever required. (ii) Include the code as a single Python file as question_1.py.

¹ A matrix U ∈ R^{p×q} is column-orthonormal if and only if U^T U = I, where I denotes the identity matrix.
² Other implementations of SVD and PCA might give slightly different results. Besides, you will just need fewer than five Python commands to answer this entire question.

Let the matrix of the Web, M, be an n-by-n matrix, where n is the number of Web pages. The entry m_{ij} in row i and column j is 0, unless there is an arc from node (page) j to node i. In that case, the value of m_{ij} is 1/k, where k is the number of arcs (links) out of node j. Notice that if node j has k > 0 arcs out, then column j has k values of 1/k and the rest 0's. If node j is a dead end (i.e., it has zero arcs out), then column j is all 0's.

Let r = [r_1, r_2, \dots, r_n]^T be (an estimate of) the PageRank vector; that is, r_i is the estimate of the PageRank of node i. Define w(r) to be the sum of the components of r; that is, w(r) = \sum_{i=1}^{n} r_i. In one iteration of the PageRank algorithm, we compute the next estimate r' of the PageRank as r' = Mr; specifically, for each i we compute r'_i = \sum_{j=1}^{n} M_{ij} r_j.

(a) [5pt] Suppose the Web has no dead ends. Prove that w(r') = w(r).

(b) [5pt] Suppose there are still no dead ends, but we use a teleportation probability of 1 − β, where 0 < β < 1. The expression for the next estimate of r_i becomes r'_i = \beta \sum_{j=1}^{n} M_{ij} r_j + (1 − \beta)/n. Under what circumstances will w(r') = w(r)? Prove your conclusion.

(c) Now, let us assume a teleportation probability of 1 − β in addition to the fact that there are one or more dead ends. Call a node "dead" if it is a dead end and "live" if not. Assume w(r) = 1. At each iteration, we will distribute equally to each node the sum of:
1. (1 − β) r_j if node j is live.
2. r_j if node j is dead.

• (c)(a) [5pt] Write the equation for r'_i in terms of β, M, and r.
• (c)(b) [5pt] Then, prove that w(r') is also 1.

What to submit: (i) Proof of 2(a); (ii) Condition for w(r') = w(r) and proof of 2(b); (iii) Equation for r'_i and proof of 2(c).
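Problem 2's update rule is also the core of the implementation task that follows. A sketch, assuming graph.txt holds one "source destination" pair per line as described below:

```python
import numpy as np

n, beta, iters = 100, 0.8, 40

# Build M column-stochastically: M[j-1, i-1] accumulates 1/deg(i) per edge i -> j,
# so multi-edges are handled rather than ignored.
edges = np.loadtxt("graph.txt", dtype=int)       # columns: source, destination
deg = np.bincount(edges[:, 0], minlength=n + 1)  # out-degrees, indexed by node ID
M = np.zeros((n, n))
for src, dst in edges:
    M[dst - 1, src - 1] += 1.0 / deg[src]

r = np.full(n, 1.0 / n)                          # r^(0) = (1/n) 1
for _ in range(iters):
    r = (1 - beta) / n + beta * (M @ r)          # r^(i) = ((1-beta)/n) 1 + beta M r^(i-1)

ranked = np.argsort(r) + 1                       # node IDs, ascending by score
print("bottom 5:", ranked[:5])
print("top 5:", ranked[::-1][:5])
```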
In this problem, you will learn how to implement the PageRank algorithm. You will be experimenting with the provided graph (assume the graph has no dead ends), which is stored in the file graph.txt. It has n = 100 nodes (numbered 1, 2, ..., 100) and m = 1024 edges, 100 of which form a directed cycle through all the nodes, which ensures that the graph is connected. It is easy to see that the existence of such a cycle ensures that there are no dead ends in the graph. There may be multiple edges between a pair of nodes; your program should handle these rather than ignore them. The first column in graph.txt refers to the source node, and the second column refers to the destination node.

Assume the directed graph G = (V, E) has n nodes (numbered 1, 2, ..., n) and m edges, all nodes have positive out-degree, and M = [M_ji]_{n×n} is the n × n matrix defined in class such that, for any i, j ∈ [1, n]:

M_ji = 1/deg(i) if (i → j) ∈ E, and 0 otherwise.

Here, deg(i) is the number of outgoing edges of node i in G. By the definition of PageRank, taking 1 − β to be the teleport probability and denoting the PageRank vector by the column vector r, we have the following equation:

r = ((1 − β)/n) · 1 + β M r,

where 1 is the n × 1 vector with all entries equal to 1. Based on this equation, the iterative procedure to compute PageRank works as follows:
1. Initialize: r^(0) = (1/n) · 1.
2. For i from 1 to k, iterate: r^(i) = ((1 − β)/n) · 1 + β M r^(i−1).

Run the aforementioned iterative process for 40 iterations (assuming β = 0.8) and obtain the PageRank vector r. Compute the following:
(a) [10pt] List the top 5 node IDs with the highest PageRank scores.
(b) [10pt] List the bottom 5 node IDs with the lowest PageRank scores.

What to submit: (i) the 5 node IDs with the highest and the 5 with the lowest PageRank scores in your writeup; (ii) your code as a single source code file such as question_2.py. Note: this problem requires substantial computing time, so don't start it at the last minute. (A sketch of the iteration follows.)
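A compact sketch of the iteration described above, assuming NumPy and that graph.txt contains one whitespace-separated "source destination" pair per line, as stated in the problem; names are illustrative:

# Sketch of the PageRank iteration for this problem. Duplicate edges are
# counted rather than ignored, matching the problem statement.
import numpy as np

n, beta, iters = 100, 0.8, 40
M = np.zeros((n, n))
deg = np.zeros(n)

edges = np.loadtxt("graph.txt", dtype=int)
for src, dst in edges:
    M[dst - 1, src - 1] += 1.0      # multi-edges accumulate
    deg[src - 1] += 1.0
M /= deg                             # scale column j by 1/deg(j); no dead ends

r = np.full(n, 1.0 / n)              # r^(0) = (1/n) * 1
for _ in range(iters):
    r = (1 - beta) / n + beta * (M @ r)

order = np.argsort(r)                # ascending by score
print("top 5 node IDs:", (order[-5:][::-1] + 1).tolist())
print("bottom 5 node IDs:", (order[:5] + 1).tolist())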
This problem will help you understand the nitty-gritty details of implementing clustering algorithms on Hadoop. In addition, it will help you understand the impact of various initialization strategies in practice.

Let X be a set of n data points in the d-dimensional space R^d. Given the number of clusters k and a set of k centroids C, we now proceed to define the distance metric and the corresponding cost function that we minimize.

Euclidean distance: Given two points A and B in d-dimensional space, A = [a1, a2, ..., ad] and B = [b1, b2, ..., bd], the Euclidean distance between A and B is defined as:

||A − B|| = sqrt( Σ_{i=1}^{d} (a_i − b_i)^2 ).

The corresponding cost function φ that is minimized when we assign points to clusters using the Euclidean distance metric is given by:

φ = Σ_{x∈X} min_{c∈C} ||x − c||^2.

Iterative k-means algorithm: We learned the basic k-means algorithm in class, which is as follows: k centroids are initialized, each point is assigned to the nearest centroid, and the centroids are recomputed based on the assignments of points to clusters. In practice, the above steps are run for several iterations. We present the resulting iterative version of k-means in Algorithm 1.

Iterative k-means clustering on Hadoop: Implement iterative k-means using MapReduce, where a single MapReduce step completes one iteration of the k-means algorithm. So, to run k-means for i iterations, you will have to run a sequence of i MapReduce jobs. Please use the provided dataset hw2-q4-kmeans.zip for this problem. The zip contains 4 files:
1. data.txt contains the dataset, which has 4601 rows and 58 columns. Each row is a document represented as a 58-dimensional vector of features; each component in the vector represents the importance of a word in the document.
2. c1.txt contains k initial cluster centroids. These centroids were chosen by selecting k = 10 random points from the input data.
3. c2.txt contains initial cluster centroids that are as far apart as possible. (You can obtain such centroids by choosing the first centroid c1 randomly, then finding the point c2 that is farthest from c1, then selecting c3 which is farthest from c1 and c2, and so on.)
4. vocab.txt is the vocabulary file containing the words; it is only for your reference and you do not need it for the experiment.

Set the number of iterations (MAX_ITER) to 20 and the number of clusters k to 10 for all experiments carried out in this question.

Hint about job chaining: We need to run a sequence of Hadoop jobs where the output of one job will be the input for the next one. There are multiple ways to do this and you are free to use any method you are comfortable with. One simple way to handle such a multi-stage job is to configure the output path of the first job to be the input path of the second, and so on. The following pseudocode demonstrates job chaining:

var inputDir
var outputDir
var centroidDir
for i in number-of-iterations:
    configure job here with all params
    set job input directory  = inputDir
    set job output directory = outputDir + i
    run job
    centroidDir = outputDir + i

You will also need to share the location of the centroid file with the mapper. There are many ways to do this and you can use any method you find suitable. One way is to use the Hadoop Configuration object: set the location as a property in the Configuration object and retrieve the property value in the Mapper's setup function. For more details, see:
• http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/conf/Configuration.html#set(java.lang.String,java.lang.String)
• http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/conf/Configuration.html#get(java.lang.String)
• http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/Mapper.html#setup(org.apache.hadoop.mapreduce.Mapper.Context)

(a) [10pt] Using the Euclidean distance as the distance measure, compute the cost function φ(i) for every iteration i. This means that, for your first MapReduce job iteration, you'll be computing the cost function using the initial centroids located in one of the two text files. Run k-means on data.txt using c1.txt and using c2.txt. Generate a graph where you plot the cost function φ(i) as a function of the number of iterations i = 1, 2, ..., 20 for c1.txt and also for c2.txt. (Hint: you do not need to write a separate MapReduce job to compute φ(i); you can incorporate the computation of φ(i) into the Mapper/Reducer. A local sketch of this computation appears after this problem.)

(b) [10pt] What is the percentage change in cost after 10 iterations of the k-means algorithm when the cluster centroids are initialized using c1.txt vs. c2.txt? Is random initialization of k-means using c1.txt better than initialization using c2.txt in terms of the cost φ(i)? Explain your reasoning.
What to submit: (i) a plot of cost vs. iteration for the two initialization strategies for 4(a); (ii) the percentage improvement values and your explanation for 4(b); (iii) your code, named question_4.py or question_4.java.
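For intuition only (the assignment requires a MapReduce implementation), here is a local single-process sketch of the per-iteration cost φ(i) under the stated conventions, assuming NumPy and whitespace-separated vectors in data.txt and c1.txt:

# Local (non-Hadoop) sketch of k-means with the cost phi computed under
# the current centroids at each iteration, as part 4(a) asks.
import numpy as np

data = np.loadtxt("data.txt")        # shape (4601, 58)
centroids = np.loadtxt("c1.txt")     # shape (10, 58)

for i in range(20):                  # MAX_ITER = 20
    # Squared Euclidean distance from every point to every centroid.
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    phi = d2.min(axis=1).sum()       # phi(i) under the current centroids
    print(f"iteration {i + 1}: phi = {phi:.2f}")
    # Recompute each centroid as the mean of its assigned points.
    for c in range(len(centroids)):
        pts = data[assign == c]
        if len(pts):
            centroids[c] = pts.mean(axis=0)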


[SOLVED] Homework 3 COMS 4771

Problem 1 (Features; 10 points). It is common to pre-process the feature vectors in R^d before passing them to a learning algorithm. Two simple and generic ways to pre-process are as follows.

• Centering: Subtract the mean μ̂ := (1/|S|) Σ_{(x,y)∈S} x (of the training data) from every feature vector: x ↦ x − μ̂.

• Standardization: Perform centering, and then divide every feature by the per-feature standard deviation σ̂_i := sqrt( (1/|S|) Σ_{(x,y)∈S} (x_i − μ̂_i)^2 ):

(x1, x2, ..., xd) ↦ ( (x1 − μ̂1)/σ̂1, (x2 − μ̂2)/σ̂2, ..., (xd − μ̂d)/σ̂d ).

For each of the following learning algorithms, and each of the above pre-processing transformations, does the transformation affect the learning algorithm?

(a) The classifier based on the generative model where the class-conditional distributions are multivariate Gaussian distributions with a fixed covariance equal to the identity matrix I. Assume MLE is used for parameter estimation.
(b) The 1-NN classifier using Euclidean distance.
(c) The greedy decision tree learning algorithm with axis-aligned splits. (For concreteness, assume the Gini index is used as the uncertainty measure, and the algorithm stops after 20 leaf nodes.)
(d) Empirical Risk Minimization: the (intractable) algorithm that finds the linear classifier (both the weight vector and threshold) that has the smallest training error rate.

To make this more precise, consider any training data set S from X × Y, and let f̂_S : X → Y be the classifier obtained from the learning algorithm with training set S. Let φ : X → X' be the pre-processing transformation, and let f̃_{S'} : X' → Y be the classifier obtained from the learning algorithm with training set S', where S' is the data set from X' × Y containing (φ(x), y) for each (x, y) in S. We say φ affects the learning algorithm if, for any training data S, the classifier f̂_S is not the same as x ↦ f̃_{S'}(φ(x)).

You should assume the following: (i) the per-feature standard deviations are never zero; (ii) there are never any "ties" whenever you compute an arg max or an arg min; (iii) there are no issues with numerical precision or computational efficiency.

What to submit: "yes" or "no" for each pre-processing and learning algorithm pair, along with a brief statement of justification. (A sketch of the two transformations follows.)
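For concreteness, a small sketch of the two transformations defined above, assuming NumPy; the arrays are made up for illustration:

# Sketch of centering and standardization from Problem 1.
# X holds the training feature vectors, one per row.
import numpy as np

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

mu = X.mean(axis=0)                # per-feature mean over the training set
X_centered = X - mu                # centering: x -> x - mu

sigma = X.std(axis=0)              # per-feature std (the 1/|S| version)
X_standardized = (X - mu) / sigma  # standardization

# A new point must be transformed with the *training* mu and sigma.
x_new = np.array([2.0, 30.0])
x_new_std = (x_new - mu) / sigma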
Problem 2 (More features; 30 points). Download the review data set reviews_tr.csv (training data) and reviews_te.csv (test data) from Courseworks. This data set is comprised of reviews of restaurants in Pittsburgh; the label indicates whether or not the reviewer-assigned rating is at least four (on a five-point scale). The data are in CSV format (where the first line is the header); the first column is the label ("label"), and the second column is the review text ("text"). The text has been processed to remove non-alphanumeric symbols and to make all letters lowercase.

Write a script that, using only the training data, tries different methods of data representation and different methods for learning a linear classifier, and ultimately selects, via five-fold cross validation, one combination of these methods and uses it to train a final classifier. (You can think of the choice of these methods as "hyperparameters".)

The data representations to try are the following.

1. Unigram representation. In this representation, there is a feature for every word w, and the feature value associated with a word w in a document d is tf(w; d) := the number of times word w appears in document d. (tf is short for term frequency.)

2. Term frequency-inverse document frequency (tf-idf) weighting. This is like the unigram representation, except the feature associated with a word w in a document d from a collection of documents D (e.g., the training data) is tf(w; d) × log10(idf(w; D)), where tf(w; d) is as defined above and idf(w; D) := |D| / (number of documents in D that contain word w). This representation puts more emphasis on rare words and less emphasis on common words. (There are many variants of tf-idf that are unfortunately all referred to by the same name.) Important: when you apply this representation to a new document (e.g., a document in the test set), you should still use the idf defined with respect to D. This, however, becomes problematic if a word w appears in a new document but did not appear in any document in D: in this case, idf(w; D) = |D|/0 = ∞. It is ambiguous what should be done in these cases, but a safe recourse, which you should use in this assignment, is to simply ignore words w that do not appear in any document in D. (A sketch of this representation appears after this list.)

3. Bigram representation. In addition to the unigram features, there is a feature for every pair of words (w1, w2) (called a bigram), and the feature value associated with a bigram (w1, w2) in a given document d is tf((w1, w2); d) := the number of times the bigram (w1, w2) appears consecutively in document d. In the sequence of words "a rose is a rose", the bigrams that appear are: (a, rose), which appears twice; (rose, is); and (is, a).

4. Another data representation of your own choosing or design. Some examples:
• Extend the bigram representation to n-gram representations (for any positive integer n) to consider n-tuples of words appearing consecutively in documents.
• Extend to k-skip-n-grams: instead of just counting the number of times the n-tuple (w1, w2, ..., wn) appears consecutively in the document, also count occurrences where wi and wi+1 are at most k words apart.
• Use variants of tf-idf weighting, such as one that replaces tf(w; d) with 1 + log10(tf(w; d)). This is better for documents where words often appear in bursts.
• Select specific words or n-grams you think are informative for the classification task.
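A rough sketch of the tf-idf weighting described in item 2, in plain Python; the function and variable names are made up for illustration:

# Sketch of item 2: weight of word w in document d is
# tf(w; d) * log10(|D| / df(w)), with unseen words ignored.
import math
from collections import Counter

def make_tfidf(train_docs):
    """train_docs: list of documents, each a list of lowercase words."""
    n_docs = len(train_docs)
    df = Counter()                        # document frequency of each word
    for doc in train_docs:
        df.update(set(doc))

    def represent(doc):
        tf = Counter(doc)
        # Words that appear in no training document are ignored,
        # per the problem statement.
        return {w: c * math.log10(n_docs / df[w])
                for w, c in tf.items() if w in df}

    return represent

represent = make_tfidf([["a", "rose", "is", "a", "rose"],
                        ["a", "tulip", "is", "red"]])
print(represent(["a", "red", "rose", "daisy"]))   # "daisy" is dropped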
The learning methods to try are the following.

1. Averaged-Perceptron (with some modifications described below). Averaged-Perceptron is like Voted-Perceptron (which uses Online Perceptron), except that instead of forming the final classifier as a weighted-majority vote over the various linear classifiers, you simply form a single linear classifier by taking a weighted average of the weight vectors and thresholds of all the linear classifiers used by Online Perceptron. Use the following two modifications:
• Run Online Perceptron to make two passes through the data. Before each pass, randomly shuffle the order of the training examples. Note that a total of 2n + 1 linear classifiers are considered during the run of the algorithm (where n is the number of training data).
• Instead of averaging the weights and thresholds of all 2n + 1 linear classifiers, average them only for the final n + 1 linear classifiers.
You should use Averaged-Perceptron with all four data representations. (A sketch of the averaging step appears after this problem.)

2. Naïve Bayes classifier with parameter estimation as in the previous homework assignment. Note that with this learning method, a word that appears more than once in a document is treated the same as appearing exactly once. You only need to use Naïve Bayes with the unigram representation.

3. (Optional.) Any other learning method you like.

You must write your own code to implement Averaged-Perceptron. The code should be easy to understand (e.g., by using sensible variable names and comments). For other things (e.g., Naïve Bayes, cross validation), you can use your own or existing library implementations in MATLAB (+ Statistics/ML library) and Python (+ numpy, scipy, sklearn), but you are responsible for the correctness of these implementations as per the specifications from the course lectures. Provide references for any such third-party implementations you use (e.g., specific scikit-learn or MATLAB functions that perform cross validation).

What to submit: 1. A concise and unambiguous description of the fourth data representation you try (and also of any additional learning methods you try). 2. Cross-validation error rates for all data representation / learning method combinations. 3. The name of the method ultimately selected by the cross-validation procedure. 4. Training and test error rates of the classifier learned by the selected method. 5. Source code and scripts that you write yourself (in separate files).
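A bare-bones sketch of Averaged-Perceptron with the two modifications above, assuming NumPy, dense feature vectors (the review data would realistically call for a sparse representation), and labels in {+1, −1}; names are illustrative:

# Averaged-Perceptron sketch: two shuffled passes, averaging only the
# final n + 1 of the 2n + 1 classifiers.
import numpy as np

def averaged_perceptron(X, y, passes=2):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    w_sum, b_sum, count = np.zeros(d), 0.0, 0
    seen = 0
    for _ in range(passes):
        for i in np.random.permutation(n):   # shuffle before each pass
            if y[i] * (X[i] @ w + b) <= 0:   # mistake: Perceptron update
                w, b = w + y[i] * X[i], b + y[i]
            seen += 1
            if seen >= n:                    # classifiers n through 2n
                w_sum, b_sum, count = w_sum + w, b_sum + b, count + 1
    return w_sum / count, b_sum / count      # the averaged classifier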
Problem 3 (MLE; 10 points). (a) Consider the statistical model P = {P_{μ,σ²} : μ ∈ R^d, σ² > 0}, where P_{μ,σ²} is the multivariate Gaussian distribution with mean μ and covariance matrix σ²I. Give a formula for the MLE of σ² given the data {x_i}_{i=1}^{n} from R^d, which is regarded as an iid sample. Also give a clear derivation of the formula in which you briefly justify each step.

(b) Consider the statistical model P = {P_θ : θ ∈ N}, where P_θ is the distribution of the random pair (X, Y) in N × {0, 1} such that:
• X is uniformly distributed on {1, 2, ..., θ};
• for any x ∈ {1, 2, ..., θ}, the conditional distribution of Y given X = x is specified by P_θ(Y = 1 | X = x) = x/θ.
Give a formula for the MLE of θ given a single observation (x, y) ∈ N × {0, 1}. Also give a clear derivation of the formula in which you briefly justify each step.

[SOLVED] Homework 2 COMS 4771

Problem 1 (Naïve Bayes; 30 points). Download the "20 Newsgroups data set" news.mat from Courseworks. The training feature vectors/labels and test feature vectors/labels are stored as data/labels and testdata/testlabels. Each data point corresponds to a message posted to one of 20 different newsgroups (i.e., message boards). The representation of a message is a (sparse) binary vector in X := {0, 1}^d (for d := 61188) that indicates the words present in the message. If the j-th entry in the vector is 1, the message contains the word given on the j-th line of the text file news.vocab. The class labels are Y := {1, 2, ..., 20}, where the mapping from classes to newsgroups is in the file news.groups (which we won't actually need).

In this problem, you'll develop a classifier based on a Naïve Bayes generative model. Here, we use class-conditional distributions of the form

P_μ(x) = Π_{j=1}^{d} μ_j^{x_j} (1 − μ_j)^{1−x_j} for x = (x1, x2, ..., xd) ∈ X,

where μ = (μ1, μ2, ..., μd) ∈ [0, 1]^d is the parameter vector from the parameter space [0, 1]^d. Since there are 20 classes, the generative model is actually parameterized by 20 such vectors, μ_y = (μ_{y,1}, μ_{y,2}, ..., μ_{y,d}) for each y ∈ Y, as well as the class prior parameters π_y for each y ∈ Y. The class prior parameters, of course, must satisfy π_y ∈ [0, 1] for each y ∈ Y and Σ_{y∈Y} π_y = 1.

(a) Give the formula for the MLE of the parameter μ_{y,j} based on training data {(x_i, y_i)}_{i=1}^{n}. (Remember, each unlabeled point is a vector: x_i = (x_{i,1}, x_{i,2}, ..., x_{i,d}) ∈ {0, 1}^d.)

(b) MLE is not a good estimator for the class-conditional parameters if the estimate turns out to be zero or one. An alternative is the following estimator, based on a technique called Laplace smoothing:

μ̂_{y,j} := (1 + Σ_{i=1}^{n} 1{y_i = y} x_{i,j}) / (2 + Σ_{i=1}^{n} 1{y_i = y}) ∈ (0, 1).

Write code for training and testing a classifier based on the Naïve Bayes generative model described above. Use Laplace smoothing to estimate the class-conditional distribution parameters, and MLE for the class prior parameters. You should not use or look at any existing implementation (e.g., those that may be provided as library functions). Using your code, train and test a classifier with the data from news.mat. Your code should be easy to understand (e.g., by using sensible variable names and comments). (A sketch of the estimation step appears below.)

What to submit: (1) training and test error rates, (2) source code (in a separate file).
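A minimal sketch of the Laplace-smoothed estimates and the log-posterior prediction rule, assuming NumPy with dense 0/1 arrays (for brevity; news.mat actually stores sparse data); names are illustrative:

# Naive Bayes with Laplace smoothing, as in part (b).
# X is an (n, d) 0/1 array; y holds labels in {1, ..., 20}.
import numpy as np

def train_nb(X, y, n_classes=20):
    n, d = X.shape
    mu = np.empty((n_classes, d))
    prior = np.empty(n_classes)
    for c in range(1, n_classes + 1):
        Xc = X[y == c]
        mu[c - 1] = (1.0 + Xc.sum(axis=0)) / (2.0 + len(Xc))  # Laplace
        prior[c - 1] = len(Xc) / n                            # MLE prior
    return mu, prior

def predict_nb(X, mu, prior):
    # log P(y) + sum_j [x_j log mu_{y,j} + (1 - x_j) log(1 - mu_{y,j})]
    scores = (X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T
              + np.log(prior))
    return scores.argmax(axis=1) + 1      # labels are 1-based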
(c) Consider the binary classification problem where newsgroups {1, 16, 20} comprise the "negative class" (class 0) and newsgroups {17, 18, 19} comprise the "positive class" (class 1). Newsgroups {1, 16, 20} are "religious" topics, and newsgroups {17, 18, 19} are "political" topics. Modify the data in news.mat to create the training and test data sets for this problem. Using these data and your code from part (b), train and test a Naïve Bayes classifier. What to submit: training and test error rates. Save the learned classifier for part (d)!

(d) The classifier you learn is ultimately a linear classifier, which means it has the following form:

x ↦ 0 if α0 + Σ_{j=1}^{d} α_j x_j ≤ 0, and 1 if α0 + Σ_{j=1}^{d} α_j x_j > 0,

for some real numbers α0, α1, ..., αd. Determine the values of these α_j's for your learned classifier from part (c). Then report the vocabulary words whose indices j ∈ {1, 2, ..., d} correspond to the 20 largest (i.e., most positive) α_j values, and also the vocabulary words whose indices j ∈ {1, 2, ..., d} correspond to the 20 smallest (i.e., most negative) α_j values. Don't report the indices j, but rather the actual vocabulary words (from news.vocab). What to submit: two ordered lists (appropriately labeled) of 20 words each.

Problem 2 (Cost-sensitive classification; 10 points). Suppose you face a binary classification problem with input space X = R and output space Y = {0, 1}, where it is c times as bad to commit a "false positive" as it is to commit a "false negative" (for some real number c ≥ 1). To make this concrete, let's say that if your classifier predicts 1 but the correct label is 0, you incur a penalty of $c; if your classifier predicts 0 but the correct label is 1, you incur a penalty of $1. (And you incur no penalty if your classifier predicts the correct label.) Assume the distribution you care about has a class prior with π0 = 2/3 and π1 = 1/3, and the class-conditional densities are N(0, 1) for class 0 and N(2, 1/4) for class 1. Let f* : R → {0, 1} be the classifier with the smallest expected penalty.

(a) Assume 1 ≤ c ≤ 14. Specify precisely (and with a simple expression involving c) the region in which the classifier f* predicts 1.
(b) Now instead assume c ≥ 15. Specify precisely the region in which the classifier f* predicts 1.

Problem 3 (Covariance matrices; 10 points). Let X be a mean-zero random vector in R^d (so E(X) = 0). Let Σ := E(X X^T) be the covariance matrix of X, and suppose its eigenvalues are λ1 ≥ λ2 ≥ ... ≥ λd. Let σ > 0 be a positive number.

(a) What are the eigenvalues of Σ + σ²I?
(b) What are the eigenvalues of (Σ + σ²I)^{−2}?

In both cases, give your answers in terms of σ and the eigenvalues of Σ.
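If it helps to build intuition for Problem 3, the kind of identity it asks about can be checked numerically, assuming NumPy; the covariance matrix below is made up for illustration:

# Numerical check: shifting a symmetric matrix by sigma^2 I shifts its
# eigenvalues, and matrix inverses/powers act on eigenvalues.
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
sigma = 0.3

lam = np.linalg.eigvalsh(Sigma)                  # eigenvalues of Sigma
shifted = np.linalg.eigvalsh(Sigma + sigma**2 * np.eye(2))
print(np.allclose(shifted, lam + sigma**2))      # True

inv_sq = np.linalg.eigvalsh(np.linalg.matrix_power(
    np.linalg.inv(Sigma + sigma**2 * np.eye(2)), 2))
print(np.allclose(np.sort(inv_sq), np.sort((lam + sigma**2) ** -2)))  # True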


[SOLVED] Homework 1 COMS 4771

Problem 1 (Nearest neighbors; 20 points). Download the OCR image data set ocr.mat from Courseworks, and load it into MATLAB:

load('ocr.mat')

or Python:

from scipy.io import loadmat
ocr = loadmat('ocr.mat')

The unlabeled training data (i.e., feature vectors) are contained in a matrix called data (one point per row), and the corresponding labels are in a vector called labels. The test feature vectors and labels are in, respectively, testdata and testlabels. In MATLAB, you can view an image (say, the first one) in the training data with the following command:

imagesc(reshape(data(1,:),28,28)');

If the colors are too jarring for you, try:

colormap(1 - gray);

In Python, to view the first image, try the following (ideally from IPython or a Jupyter Notebook):

import matplotlib.pyplot as plt
from matplotlib import cm
plt.imshow(ocr['data'][0].reshape((28,28)), cmap=cm.gray_r)
plt.show()

Write a function that implements the 1-nearest-neighbor classifier with Euclidean distance. Your function should take as input a matrix of training feature vectors X and a vector of the corresponding labels Y, as well as a matrix of test feature vectors test. The output should be a vector of predicted labels preds for all the test points. Naturally, you should not use (or look at the source code for) any library functions for computing Euclidean distances, nearest neighbor queries, and so on. If in doubt about what is okay to use, just ask. Note that for efficiency, you should use vector operations rather than, say, a bunch of for-loops (see http://www.mathworks.com/help/matlab/matlab_prog/vectorization.html). (A vectorized sketch appears after this problem.)

Instead of using your 1-NN code directly with data and labels as the training data, do the following. For each value n ∈ {1000, 2000, 4000, 8000}:
• Draw n random points from data, together with their corresponding labels. In MATLAB, use sel = randsample(60000,n) to pick the n random indices, and data(sel,:) and labels(sel) to select the examples; in Python, use sel = random.sample(xrange(60000),n) (after import random), ocr['data'][sel], and ocr['labels'][sel].
• Use these n points as the training data and testdata as the test points, and compute the test error rate of the 1-NN classifier.

A plot of the error rate (on the y-axis) as a function of n (on the x-axis) is called a learning curve. We get an estimate of this curve by using the test error rate in place of the (true) error rate. Repeat the (random) process described above ten times, independently. Produce an estimate of the learning curve plot using the average of these test error rates (that is, averaging over the ten repetitions). Add error bars to your plot that extend one standard deviation above and below the means. Ensure the plot axes are properly labeled.

What to submit: (1) learning curve plot, (2) source code (in a separate file).
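A vectorized sketch of the 1-NN classifier using only plain NumPy arithmetic (no library distance or nearest-neighbor routines, per the problem's restrictions); names are illustrative:

# Vectorized 1-NN via the expansion ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2,
# so no per-point Python loops are needed.
import numpy as np

def nn_classify(X, Y, test):
    # Squared Euclidean distances between every test and training point.
    d2 = (np.sum(test**2, axis=1)[:, None]
          - 2.0 * test @ X.T
          + np.sum(X**2, axis=1)[None, :])
    return Y[np.argmin(d2, axis=1)]   # label of each nearest neighbor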
Problem 2 (Prototype selection; 20 points). Prototype selection is a method for speeding up nearest-neighbor search that replaces the training data with a smaller subset of prototypes (which could be data points themselves). For simplicity, assume that 1-NN is used with Euclidean distance. So a prototype selection method simply takes as input:
• the training data {(x_i, y_i)}_{i=1}^{n} from R^d × {0, 1, ..., 9} (say), and
• a positive integer m;
it should return m labeled pairs {(x̃_i, ỹ_i)}_{i=1}^{m}, each from R^d × {0, 1, ..., 9}.

Design a method for choosing prototypes, where the goal is for the 1-NN classifier based on the prototypes to have good test accuracy. Implement your algorithm; use it to select prototypes for the OCR data set, and evaluate the test error rate of the 1-NN classifier based on the selected prototypes. You should use the whole training data set as input (i.e., all n = 60000 data points), but vary the number of selected prototypes m in the set {1000, 2000, 4000, 8000}. If your procedure is randomized, repeat it at least ten times (for each m) to properly assess its performance.

What to submit: 1. A brief description of your method (in words). 2. Concise and unambiguous pseudocode for your algorithm. 3. A table of the test error rates for the different values of m you try. (Report averages and standard deviations if your procedure is randomized.) 4. Source code (in a separate file).

Problem 3 (Probability; 10 points). Suppose you have an urn containing 100 colored balls. Each ball is painted with one of five possible colors from the color set C := {red, orange, yellow, green, blue}. For each c ∈ C, let n_c denote the number of balls in the urn with color c.

(a) Suppose you pick two balls uniformly at random, with replacement, from the urn. What is the probability that they have different colors? Briefly explain your answer.
(b) If you could paint each ball in the urn (with any color from C), what would you do to maximize the probability from part (a)? In other words, for each color in C, how many balls in the urn would you paint with that color? Briefly explain your answer.


[SOLVED] Project 1: linear regression

Project 1 is to implement the basic linear regression algorithm as described in video lectures 2.1-2.5. Your algorithm should assume that an input file has m (the number of lines of data) and n (the number of features) on the first line of the file. Each line after that contains n features and the value associated with those features. For example, from lecture 2.5, an input file for the house data might be:

15 3
2 3 1060 119,000
4 2 1195 125,000
4 2 1199 125,000
1 1 925 131,000
3 2 1014 175,000
3 3 1197 175,000
3 2 1008 187,400
3 1 1352 194,000
3 2 1773 200,000
4 3 1625 225,000
4 4 1827 228,000
3 4 1325 235,000
3 3 2120 250,000
4 3 2700 274,500
5 4 2659 319,900

Your program should prompt the user for a training file. Using the training file, it should compute and print to the screen the computed weights and the J value. Next, your program should ask the user for a test file. Using the weights computed from the training file, it should then print out J for the test file. All output should be clearly labelled. (A sketch of the overall flow appears below.)

Your Python program should be named yourlastname_yourfirstname_P1.py, then zipped and uploaded to Canvas.
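A sketch of the overall flow, with two stated assumptions: the weights are fit here with the closed-form least-squares solution (the lectures may instead prescribe gradient descent), and J is taken to be the common J = (1/2m) Σ (h(x) − y)² cost; commas in the values are stripped before parsing:

# Sketch of the Project 1 flow under the assumptions stated above.
import numpy as np

def read_data(path):
    with open(path) as f:
        m, n = map(int, f.readline().split())
        rows = [line.replace(",", "").split() for line in f if line.strip()]
    rows = np.array(rows, dtype=float)
    X = np.hstack([np.ones((m, 1)), rows[:, :n]])   # prepend bias column
    return X, rows[:, n]

def cost_J(X, y, w):
    r = X @ w - y
    return r @ r / (2 * len(y))          # assumed J = (1/2m) * sum of squares

X, y = read_data(input("Training file: "))
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit (assumption)
print("weights:", w)
print("training J:", cost_J(X, y, w))

Xt, yt = read_data(input("Test file: "))
print("test J:", cost_J(Xt, yt, w))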
