Assignment Chef


33,401 assignments available

[SOLVED] ECSE 222 VHDL assignment #3: adders and critical path

In this assignment, you will build upon previously implemented circuits to design a more complex circuit. You will learn how to design and simulate useful adder circuit blocks. If you need any help regarding the lab materials, you can:
• Ask the TA for help during lab sessions and office hours.
• Refer to the textbook. In case you are not aware, Appendix A "VHDL Reference" provides detailed information on VHDL.
• Refer to the tutorials on Quartus and ModelSim provided by Intel.
It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax.

4 VHDL Description of Adder Circuits

In this section, you will design and simulate the following two adder circuits: (a) a 4-bit ripple-carry adder; and (b) a one-digit binary-coded-decimal (BCD) adder. Details of the assignments are described below.

4.1 Ripple-Carry Adder (RCA)

In this section, you will implement a structural description of a 4-bit ripple-carry adder using basic addition components: half-adders and full-adders.

4.1.1 Structural Description of a Half-Adder in VHDL

A half-adder is a circuit that takes two binary digits as inputs and produces the result of the addition of the two bits in the form of a sum signal and a carry signal. The carry signal represents an overflow into the next digit of a multi-digit addition. Using the following entity definition for your VHDL code, implement a structural description of the half-adder.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity half_adder is
        port (a : in  std_logic;
              b : in  std_logic;
              s : out std_logic;
              c : out std_logic);
    end half_adder;

After you have described your structural style of the half-adder in VHDL, you are required to test your circuit. Write a testbench and perform an exhaustive test of your VHDL description of the half-adder.

4.1.2 Structural Description of a Full-Adder in VHDL

Unlike the half-adder, a full-adder adds binary digits while accounting for a value carried in (from a previous stage of the addition). Write a structural VHDL description for the full-adder circuit using the half-adder circuit that you designed in the previous section. Use the following entity declaration for your structural VHDL description of the full-adder.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity full_adder is
        port (a     : in  std_logic;
              b     : in  std_logic;
              c_in  : in  std_logic;
              s     : out std_logic;
              c_out : out std_logic);
    end full_adder;

After you have described your circuit in VHDL, write a testbench and perform an exhaustive test of your VHDL description of the full-adder. A sketch of one possible structural full-adder is shown below.
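For reference, here is a minimal sketch of a structural full-adder built from two half-adder instances plus an OR gate. The architecture bodies are an illustration of the technique, not the required solution; only the entity declarations above are prescribed by the handout.

    architecture structural of half_adder is
    begin
        s <= a xor b;  -- sum bit
        c <= a and b;  -- carry-out bit
    end structural;

    architecture structural of full_adder is
        signal s1, c1, c2 : std_logic;  -- intermediate sum and carries
    begin
        ha1 : entity work.half_adder port map (a => a,  b => b,    s => s1, c => c1);
        ha2 : entity work.half_adder port map (a => s1, b => c_in, s => s,  c => c2);
        c_out <= c1 or c2;  -- a carry out of either half-adder stage propagates
    end structural;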
4.1.3 Structural Description of a 4-bit Ripple-Carry Adder (RCA) in VHDL

Using the half-adder and full-adder circuits implemented in the two previous sections, implement a 4-bit ripple-carry adder. Write structural VHDL code for the 4-bit RCA using the following entity declaration.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity rca_structural is
        port (A : in  std_logic_vector(3 downto 0);
              B : in  std_logic_vector(3 downto 0);
              S : out std_logic_vector(4 downto 0));
    end rca_structural;

Note that S(4) contains the carry-out of the 4-bit adder. After you have described your circuit in VHDL, write a testbench and perform an exhaustive test of your VHDL structural description of the 4-bit RCA.

4.1.4 Behavioral Description of a 4-bit RCA in VHDL

In this part, you are required to implement the 4-bit RCA using a behavioral description. One way to obtain a behavioral description is to use arithmetic operators in VHDL (i.e., "+"). Write behavioral VHDL code for the 4-bit RCA using the following entity declaration.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity rca_behavioral is
        port (A : in  std_logic_vector(3 downto 0);
              B : in  std_logic_vector(3 downto 0);
              S : out std_logic_vector(4 downto 0));
    end rca_behavioral;

After you have described your circuit in VHDL, write a testbench and perform an exhaustive test of your VHDL behavioral description of the 4-bit RCA.

4.2 VHDL Description of a One-Digit BCD Adder

In this section, you will implement a one-digit BCD adder in VHDL. A one-digit BCD adder adds two four-bit numbers represented in BCD format. The result of the addition is a BCD-format 4-bit output representing the decimal sum, and a carry that is generated if this sum exceeds a decimal value of 9 (see slides of Lecture #11).

4.2.1 Structural Description of a BCD Adder in VHDL

In this part, you are required to implement the BCD adder using a structural description (the sub-blocks it is built from may be coded in either behavioral or structural style). Use the following entity declaration.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity bcd_adder_structural is
        port (A : in  std_logic_vector(3 downto 0);
              B : in  std_logic_vector(3 downto 0);
              S : out std_logic_vector(3 downto 0);
              C : out std_logic);
    end bcd_adder_structural;

After you have implemented the one-digit BCD adder in VHDL, you are required to test your circuit. Write a testbench and perform an exhaustive test of your VHDL structural description of the one-digit BCD adder.

4.2.2 Behavioral Description of a BCD Adder in VHDL

In this part, you are required to implement the BCD adder using a behavioral description. You are encouraged to base your code on the VHDL code in Section 5.7.3 of the textbook, so that you learn about conditional signal assignments (these are explained in detail in the same section as well as in Section A.7.4 of the Appendix). Use the following entity declaration.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity bcd_adder_behavioral is
        port (A : in  std_logic_vector(3 downto 0);
              B : in  std_logic_vector(3 downto 0);
              S : out std_logic_vector(3 downto 0);
              C : out std_logic);
    end bcd_adder_behavioral;

A minimal behavioral sketch using a conditional signal assignment is shown below.
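For illustration, here is one possible behavioral description using conditional signal assignments. It is a sketch of the standard add-6 BCD correction, not necessarily the textbook's Section 5.7.3 code:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    architecture behavioral of bcd_adder_behavioral is
        -- 5-bit binary sum of the two BCD digits (maximum 9 + 9 = 18)
        signal sum : unsigned(4 downto 0);
    begin
        sum <= resize(unsigned(A), 5) + resize(unsigned(B), 5);
        -- if the binary sum exceeds 9, add 6 to correct into BCD and assert the carry
        S <= std_logic_vector(resize(sum + 6, 4)) when sum > 9 else
             std_logic_vector(sum(3 downto 0));
        C <= '1' when sum > 9 else '0';
    end behavioral;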
After you have implemented the one-digit BCD adder in VHDL, write a testbench and perform an exhaustive test of your VHDL behavioral description of the one-digit BCD adder.

5 Critical Path of Digital Circuits

In this part, you will learn how to use the Quartus CAD tool to determine the delay of a given path in a digital circuit. To this end, we use the ripple-carry adder circuit that you designed above as the "circuit under examination". Follow the instructions described in VHDL Assignment #1 to create a project. Make sure to select the Cyclone V family of FPGAs, with part number 5CSEMA5F31C6, when creating the project. Once created, import the VHDL description of your digital circuit into the project and compile it to make sure there are no syntax errors in your design.

The critical path is the longest path in the circuit and limits the circuit's speed. The speed of a digital circuit is measured in terms of latency and throughput. Latency is the time needed for the circuit to produce an output for a given input (i.e., the total propagation delay from input to output), and it is expressed in units of time. Throughput, in contrast, refers to the rate at which data can be processed. In this assignment, we only consider latency as the metric of circuit speed.

In general, digital circuits are subject to timing constraints dictated by the target application. Whether a circuit meets these timing constraints can only be known after the circuit is synthesized. After synthesis is performed, the designer can analyze the circuit using the notion of slack. Slack is the margin by which a timing requirement is met or not met; it is the difference between the required arrival time and the actual arrival time. A positive slack value indicates the margin by which a requirement was met; a negative slack value indicates the margin by which a requirement was not met.

To insert timing constraints in Quartus, select "Synopsys Design Constraints File" from the "File > New" menu. The maximum delay can be specified in the Synopsys Design Constraints (.sdc) file using a command of the following form:

    set_max_delay -from [get_ports <input ports>] -to [get_ports <output ports>] <delay>

For example, we can specify a maximum delay of 12 ns for all possible paths from the inputs of the ripple-carry adder to its outputs with a constraint such as (the exact port names depend on your design):

    set_max_delay -from [get_ports *] -to [get_ports *] 12

Once the timing constraints are inserted, save the file with the name "firstname_lastname_sdc.sdc". Recompile your design by double-clicking on "Timing Analysis" in the Tasks window of Quartus. Before recompilation, make sure that the .sdc file is added to the project. The Timing Analyzer will read the .sdc file and use the constraint information when performing timing analysis. Once a green check mark appears next to "Timing Analysis", double-click on "Timing Analyzer" under "Timing Analysis" to open the Timing Analyzer tool. In the Tasks window of the Timing Analyzer tool, double-click on the "Create Timing Netlist", "Read SDC File" and "Update Timing Netlist" icons, in that order. Once completed, the icons turn green. Before measuring the delay of your design, you should specify the operating conditions of the FPGA device. This can be set by selecting one of the six possible operating conditions listed under "Set Operating Conditions" in the Timing Analyzer tool. To report the delay of different paths from inputs to outputs, select "Report Timing..." from the "Reports > Custom Reports" menu.
Since no clock signal is associated with your design, we only specify the beginning of the target path(s) by clicking on "From" and the end of the target path(s) by clicking on "To" under the section labeled "Targets". In the "Name Finder" window that pops up, click on "List" to list all the I/O signals of your design. Select (double-click on) a signal (or signals) to determine the beginning of the path that you want to examine and click "OK". Repeat the same procedure to determine the end of the path. As an example, we can examine the path from the LSB of the input A (i.e., A(0)) to the LSB of the output S (i.e., S(0)).

Now, click on "Report Timing" to obtain timing information for the specified path. The delay of the specified path is denoted "Data Delay" in the section entitled "Summary of Paths". A positive slack value denotes the difference between the delay of the path (i.e., the Data Delay) and the timing constraint inserted in the .sdc file (i.e., 12 ns). This information is also visualized under the "Waveform" tab.

To find the critical path of your design, you could examine all possible paths from all inputs to all outputs and find the one with the longest delay. However, this "exhaustive search" method becomes very time consuming as the number of I/O ports increases. To limit the number of paths under examination, we instead reduce the target delay value in the .sdc file so that a timing violation occurs. For instance, we reduce the target delay constraint of the "circuit under examination" from 12 ns to 5 ns and recompile the design by double-clicking on "Timing Analysis" in Quartus. In case of a timing violation, Quartus reports the violating path(s) in the compilation summary of the Timing Analyzer tool. The search for the critical path is then limited to the violating paths; examining these paths in the Timing Analyzer will reveal the critical path.

6 Questions

1. Briefly explain your VHDL code implementation of all circuits.
2. Show representative simulation plots of the half-adder circuit for all possible input values.
3. Show representative simulation plots of the full-adder circuit for all possible input values.
4. Show representative simulation plots of both behavioral and structural descriptions of the 4-bit RCA for all possible input values.
5. Show representative simulation plots of both behavioral and structural descriptions of the one-digit BCD adder circuit for all possible input values.
6. Perform timing analysis and find the critical path(s) of the one-digit BCD adder circuit for the Fast 1,100 mV 85C Model. Show the obtained timing waveform(s) of the critical path(s) that you found.
7. Report the number of pins and logic modules used to fit your designs on the FPGA board, filling in a table of the following form:

                                  RCA                       One-digit BCD adder
                                  Structural   Behavioral   Structural   Behavioral
    Logic utilization (in LUTs)
    Total pins

7 Deliverables

You are required to submit the following deliverables on MyCourses. Please note that a single submission is required per group (by one of the group members).
• Lab report.
The report should include the following parts: (1) names and McGill IDs of group members, (2) an executive summary (a short description of what you have done in this VHDL assignment), (3) answers to all questions in the previous section (if applicable), (4) legible figures (screenshots) of schematics and simulation results, where all inputs, outputs, signals, and axes are marked and visible, (5) an explanation of the results obtained in the assignments (mark important points on the simulation plots), and (6) conclusions. Note: students are encouraged to take the reports seriously; points will be deducted for sloppy submissions. Please also note that even if some of the waveforms look the same, you still need to include them separately in the report.
• Project files. Create a single .zip file named VHDL#_firstname_lastname (replace # with the number of the current VHDL assignment and firstname_lastname with the name of the submitting group member). The .zip file should include the working directory of the project.

$25.00

[SOLVED] ECSE 222 lab #2: describing sequential circuits in VHDL

1 Introduction

In this lab you will learn how to describe sequential logic circuits in VHDL. You will design a stopwatch measuring time in increments of 10 milliseconds. You will also use pushbuttons and 7-segment LEDs to control the stopwatch when running on the Altera DE1-SoC board.

2 Learning Outcomes

After completing this lab you should know how to:
• Design a counter in VHDL
• Perform functional simulation of the counter using ModelSim
• Design a stopwatch measuring time in increments of 10 milliseconds
• Test the stopwatch on the Altera board

3 Counters

A counter is a special sequential circuit. When counting up (by one), we require a circuit capable of "remembering" the current count and adding 1 the next time a count is requested. When counting down (by one), we require a circuit capable of "remembering" the current count and subtracting 1 the next time a count is requested. Counters use a clock signal to keep track of time: each increment (or decrement) occurs when one clock period has passed. Since counters are the main building blocks of stopwatches, we will first design a 4-bit up-counter with an asynchronous reset (which should be active low) and an enable signal. The counter counts up when the enable signal is high; otherwise, the counter holds its previous value. Use the following entity declaration for your VHDL description of the counter:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity gNN_counter is
        port (enable : in  std_logic;
              reset  : in  std_logic;
              clk    : in  std_logic;
              count  : out std_logic_vector(3 downto 0));
    end gNN_counter;

Note that the up-counter that you design in this section will be used later in Section 5 to build a stopwatch. Once you have your circuit described in VHDL, you should simulate it. Write a testbench and perform a functional simulation of your VHDL description of the counter.

4 Clock Divider

A clock divider is a circuit that generates a signal that is asserted once every T clock cycles. This signal can be used as a condition to enable the counters in the stopwatch circuit. (The handout shows example clock and enable waveforms for T = 4: the internal count repeatedly steps 3, 2, 1, 0, and the enable output pulses high during the cycle where the count is 0.) Implementing the clock divider circuit requires a counter counting clock periods. The counter counts down from T − 1 to 0. Upon reaching a count of 0, the clock divider asserts its output and the count is reset to T − 1. For other values of the counter, the output signal of the clock divider remains 0.

In this lab, we want to design a stopwatch counting in increments of 10 milliseconds. In other words, we need to assert an enable signal every 10 milliseconds. First, find the value of T for the clock divider circuit to generate an enable signal every 10 milliseconds. Note that the PLL, the device which supplies the clock for your design on the DE1-SoC board, works at a frequency of 50 MHz. Then, describe the clock divider circuit in VHDL using the following entity declaration:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity gNN_clock_divider is
        port (enable : in  std_logic;
              reset  : in  std_logic;
              clk    : in  std_logic;
              en_out : out std_logic);
    end gNN_clock_divider;

Hint: the clock divider can be structured around a down-counter counting from T − 1 to 0 (the handout's block diagram shows such a counter with clk, reset, and enable inputs and an en_out output). A minimal sketch of one possible description is given below.
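As an illustration only (the handout asks you to derive T yourself), here is one way the divider could be described, assuming a 50 MHz clock and a 10 ms period, i.e., T = 500,000:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    architecture behavioral of gNN_clock_divider is
        constant T   : natural := 500_000;  -- 10 ms at 50 MHz (derive this value yourself)
        signal   cnt : natural range 0 to T - 1;
    begin
        process (clk, reset)
        begin
            if reset = '0' then                  -- asynchronous, active-low reset
                cnt <= T - 1;
            elsif rising_edge(clk) then
                if enable = '1' then
                    if cnt = 0 then
                        cnt <= T - 1;            -- wrap around after reaching 0
                    else
                        cnt <= cnt - 1;
                    end if;
                end if;
            end if;
        end process;
        en_out <= '1' when cnt = 0 else '0';     -- one-cycle pulse every T cycles
    end behavioral;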
Also, note that the down-counter inside the clock divider circuit is different from the up-counter that you designed in Section 3. Once you have described your circuit in VHDL, write a testbench and perform a functional simulation of your VHDL description of the clock divider.

5 Stopwatch

In this part, you will design a simple stopwatch using the counter and clock divider circuits. You will use the pushbuttons to control the stopwatch and the 7-segment displays to show the elapsed time in decimal. Pushbuttons PB0, PB1 and PB2 are used to start (or resume), pause and reset the stopwatch, respectively. When these buttons are released, the circuit has to remain in the new state denoted by their corresponding function. For example, when PB1 is pushed and then released, the stopwatch circuit pauses the count until told otherwise by pushing one of the other pushbuttons. Therefore, you need a memory element to hold the operating state (e.g., running, paused) of the stopwatch. Note that the output of a pushbutton is high when the button is not being pushed, and low when the button is being pushed.

The first two 7-segment displays (i.e., HEX1-0), the second two (i.e., HEX3-2) and the last two (i.e., HEX5-4) are used to show time in centiseconds, seconds and minutes, respectively. You will need to create six instances of the gNN_counter from this lab and of the gNN_7_segment_decoder you created in Lab Assignment #1, one pair for each decimal digit in the stopwatch. Since we measure time in increments of 10 milliseconds, the counter measuring time in centiseconds increments only when the output signal of the clock divider circuit is high. (The handout's high-level architecture figure shows the clock divider enabling a chain of six counters, counter #0 through counter #5, each feeding a HEX decoder that drives HEX0 through HEX5: centiseconds, seconds and minutes.) Note that counters #0, #1, #2, and #4 count from 0 to 9, while counters #3 and #5 count from 0 to 5.

Describe the stopwatch circuit in VHDL using the following entity declaration:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity gNN_stopwatch is
        port (start : in  std_logic;
              stop  : in  std_logic;
              reset : in  std_logic;
              clk   : in  std_logic;
              HEX0  : out std_logic_vector(6 downto 0);
              HEX1  : out std_logic_vector(6 downto 0);
              HEX2  : out std_logic_vector(6 downto 0);
              HEX3  : out std_logic_vector(6 downto 0);
              HEX4  : out std_logic_vector(6 downto 0);
              HEX5  : out std_logic_vector(6 downto 0));
    end gNN_stopwatch;

You will now test your stopwatch circuit using the DE1-SoC board. Compile the circuit in the Quartus software. Once you have compiled the stopwatch circuit, it is time to map it onto the Altera DE1-SoC board. Perform the pin assignment for both the HEX displays and the pushbuttons according to the DE1 user's manual. Make sure that you connect the clock signal of your design to the 50 MHz clock (see the DE1 user's manual for the pin location of the 50 MHz clock). Program the board and demonstrate your stopwatch to the TA.
You should be able to stop, start and reset your stopwatch circuit using the pushbuttons.

6 Deliverables and Grading

6.1 Demo

Once completed, you will demo your project to the TA. You will be expected to:
• fully explain how the HDL code works,
• perform functional simulation using ModelSim, and
• demonstrate that the stopwatch circuit is functioning properly using the pushbuttons and 7-segment LEDs on the DE1-SoC board.

6.2 Written report

You are also required to submit a written report and your code on myCourses. Your report must include:
• A description of the counter and clock divider circuits. Explain why these two circuits are considered sequential designs.
• An explanation of why, even though we could build a clock divider using an up-counter, it is easier to build the divider using a down-counter.
• A discussion of how the counter and clock divider circuits were tested, showing representative simulation plots. How do you know that these circuits work correctly?
• A description of the stopwatch circuit. Explain why you created six instances of the counter circuit in your design.
• A discussion of how the stopwatch circuit was tested.
• A summary of the FPGA resource utilization (from the Compilation Report's Flow Summary) and the RTL schematic diagram for the stopwatch circuit. Clearly specify which part of your code maps to which part of the schematic diagram.

Finally, when you prepare your report keep the following in mind:
• The title page must include the lab number, the name and student ID of each student, and the group number.
• All figures and tables must be clearly visible.
• The report should be submitted in PDF format.
• It should document every design choice clearly.
• The grader should not have to struggle to understand your design. That is,
  – everything should be organized so the grader can easily reproduce your results by running your code through the tools, and
  – the code should be well-documented and easy to read.

Grading Sheet

Group Number:        Name 1:        Name 2:

Task                                                       Grade /Total    TA Signature
VHDL code for the counter circuit                          /15
Creating testbench code for the counter circuit            /5
Functional simulation of the counter circuit               /5
VHDL code for the clock divider circuit                    /15
Creating testbench code for the clock divider circuit      /5
Functional simulation of the clock divider circuit         /5
VHDL code for the stopwatch circuit                        /25
Testing the stopwatch circuit on the DE1-SoC board         /25
Total                                                      /100

$25.00

[SOLVED] ECSE 222 lab #3: finite state machines

1 Introduction

In this lab you will learn how to describe finite state machines (FSMs) in VHDL. Specifically, you will design a special kind of 4-bit counter using an FSM. This counter is not a regular up-counter. In addition to being bi-directional, i.e., counting both up and down, it goes through the following sequence of numbers:

1↔2↔4↔8↔3↔6↔12↔11↔5↔10↔7↔14↔15↔13↔9 (↔1↔2↔4…)

Note that the right-arrow (i.e., →) denotes the upward direction whereas the left-arrow (i.e., ←) denotes the downward direction of the counter. See Section 8.7.5 of the textbook for a similar counter. Pushbuttons, slide switches and 7-segment LEDs will be used during this lab to control the counter when running on the Altera DE1-SoC board.

2 Learning Outcomes

After completing this lab you should know how to:
• Design an FSM for a counter
• Perform functional simulation of the FSM using ModelSim
• Test the FSM on the Altera board

3 Finite State Machine

As in the previous lab, the FSM circuit operates when the enable signal is high. The direction signal denotes the counting direction. Note that the counter should be initialized according to its direction when the asynchronous active-low reset signal is asserted: use the leftmost value (1) in the counting sequence for the upward direction and the rightmost value (9) for the downward direction. First, draw the state diagram for the counter. Then, using the state diagram, describe the counter in VHDL. Use the following entity declaration to describe a Moore-style FSM implementing the above counter:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity gNN_FSM is
        port (enable    : in  std_logic;
              direction : in  std_logic;
              reset     : in  std_logic;
              clk       : in  std_logic;
              count     : out std_logic_vector(3 downto 0));
    end gNN_FSM;

See Section 8.4 of the textbook for a discussion on how to design FSMs using VHDL. Once you have your circuit described in VHDL, you should simulate it. Write a testbench and perform a functional simulation of your VHDL description of the FSM. A sketch of one possible Moore-style description is shown below.
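For illustration, here is a minimal sketch of one possible Moore-style description. It encodes the state as an index into the counting sequence; a textbook-style version with 15 explicitly named states would work equally well. The architecture body and the convention that direction = '1' means counting up are assumptions, not the handout's prescription.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    architecture moore of gNN_FSM is
        type seq_t is array (0 to 14) of natural;
        constant SEQ : seq_t := (1, 2, 4, 8, 3, 6, 12, 11, 5, 10, 7, 14, 15, 13, 9);
        signal idx : natural range 0 to 14;  -- current state = position in the sequence
    begin
        process (clk, reset)
        begin
            if reset = '0' then
                -- initialize per direction: 1 (index 0) for up, 9 (index 14) for down;
                -- sampling direction during reset is a simplification of this sketch
                if direction = '1' then idx <= 0; else idx <= 14; end if;
            elsif rising_edge(clk) then
                if enable = '1' then
                    if direction = '1' then
                        if idx = 14 then idx <= 0; else idx <= idx + 1; end if;  -- count up
                    else
                        if idx = 0 then idx <= 14; else idx <= idx - 1; end if;  -- count down
                    end if;
                end if;
            end if;
        end process;
        -- Moore output: depends only on the current state
        count <= std_logic_vector(to_unsigned(SEQ(idx), 4));
    end moore;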
4 Multi-Mode Counter

In this part, you will test your FSM circuit on the DE1-SoC FPGA board using the clock divider and 7-segment decoder circuits from the previous labs. You will also use pushbuttons and slide switches to control the functionality of the FSM. Pressing pushbuttons PB0, PB1 and PB2 starts/resumes, stops/pauses and resets the counter, respectively. When these buttons are released, the circuit has to remain in the new state denoted by their corresponding function. For example, when PB1 is pushed and then released, the FSM circuit pauses the count until told otherwise by pushing another pushbutton. Therefore, you need a memory element to hold the state entered via the pushbuttons. Note that the output of a pushbutton is high when the button is not being pushed, and low when the button is being pushed. The counter direction is entered from one of the slide switches on the board. The first two 7-segment displays (i.e., HEX1-0) are used to show the count in decimal format. Use the clock divider circuit from the previous lab to update the displayed count every 1 second. To test the FSM circuit on the FPGA board, you will need to create instances of your gNN_clock_divider and gNN_7_segment_decoder to display the current value of the counter every 1 second.

Describe the system for testing the FSM circuit on the FPGA board, hereafter referred to as the multi-mode counter, in VHDL using the following entity declaration:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity gNN_multi_mode_counter is
        port (start     : in  std_logic;
              stop      : in  std_logic;
              direction : in  std_logic;
              reset     : in  std_logic;
              clk       : in  std_logic;
              HEX0      : out std_logic_vector(6 downto 0);
              HEX1      : out std_logic_vector(6 downto 0));
    end gNN_multi_mode_counter;

Compile the multi-mode counter circuit in the Quartus software. Once you have compiled it, it is time to map it onto the Altera DE1-SoC board. Perform the pin assignment for the HEX displays, slide switches and pushbuttons according to the DE1 user's manual. Make sure that you connect the clock signal of your design to the 50 MHz clock. Program the board and demonstrate your work to the TA. You should be able to stop, start and reset your counter using the pushbuttons. Also, you should be able to change the counting direction using the slide switch on the board.

5 Deliverables and Grading

5.1 Demo

Once completed, you will demo your project to the TA. You will be expected to:
• fully explain how the HDL code works,
• perform functional simulation using ModelSim, and
• demonstrate that the FSM circuit is functioning properly using the pushbuttons, slide switches and 7-segment LEDs on the DE1-SoC board.

5.2 Written report

You are also required to submit a written report and your code on myCourses. Your report must include:
• The state diagram of your FSM.
• A description of the FSM circuit.
• A discussion of how the FSM circuit was tested, showing representative simulation plots. How do you know that these circuits work correctly?
• A description of the multi-mode counter circuit.
• A discussion of how the multi-mode counter circuit was tested on the FPGA board.
• A summary of the FPGA resource utilization (from the Compilation Report's Flow Summary) and the RTL schematic diagram for the multi-mode counter circuit. Clearly specify which part of your code maps to which part of the schematic diagram.

Finally, when you prepare your report keep the following in mind:
• The title page must include the lab number, the name and student ID of each student, and the group number.
• All figures and tables must be clearly visible.
• The report should be submitted in PDF format.
• It should document every design choice clearly.
• The grader should not have to struggle to understand your design. That is,
  – everything should be organized so the grader can easily reproduce your results by running your code through the tools, and
  – the code should be well-documented and easy to read.

Grading Sheet

Group Number:        Name 1:        Name 2:

Task                                                          Grade /Total    TA Signature
VHDL code for the FSM circuit                                 /40
Creating testbench code for the FSM circuit                   /5
Functional simulation of the FSM circuit                      /5
VHDL code for the multi-mode counter circuit                  /10
Testing the multi-mode counter circuit on the DE1-SoC board   /40
Total                                                         /100

$25.00

[SOLVED] ECSE 222 lab #1: getting started with VHDL coding

1 Introduction

In this lab you will learn the basics of the Altera Quartus II FPGA design software by following a step-by-step tutorial, and use it to implement combinational logic circuits described in VHDL. You will also learn the basics of digital simulation using the ModelSim simulation program.

2 Learning Outcomes

After completing this lab you should know how to:
• Run the Intel Quartus software
• Create the framework for a new project
• Design and perform functional simulation of a binary-to-7-segment LED decoder circuit
• Design a 5-bit adder using VHDL
• Test the adder on the Altera board

3 Run Intel Quartus

In this course you will be using commercial FPGA design software: the Intel Quartus Prime program and the Mentor Graphics ModelSim simulation program. Quartus Prime and ModelSim are installed on the computers in the lab. You can also obtain a slightly restricted version, the Quartus Lite edition, from the Intel web site (https://www.intel.com/content/www/us/en/programmable/downloads/download-center.html). The program restrictions will not affect any designs you will be doing in this course. You can (and should) install the applications on your personal computer to work on your project outside of the lab. You should use version 18.0 of the program, as this is the latest version that supports the prototyping board (the Altera DE1-SoC board) that you will be using.

To begin, start Quartus Prime by selecting it in the Windows Start menu. A startup window will appear (the tutorial screenshots show version 18.0 downloaded from Intel's web site; the versions on the lab computers may look slightly different).

Intel Quartus Prime employs a project-based approach. The goal of a Quartus project is to develop a hardware implementation of a specific function, targeted to an FPGA (Field Programmable Gate Array) device. Typically, the project will involve a (large) number of different circuits, each designed individually or taken from circuit libraries. Project management is therefore important. The Quartus Prime program aids in project management by providing a project framework that keeps track of the various components of the project, including design files (such as schematic block diagrams or VHDL descriptions), simulation files, compilation reports, FPGA configuration or programming files, project-specific program settings and assignments, and many others. The first step in designing a system using the Quartus Prime approach is therefore to create the project framework. The program simplifies this by providing a "Wizard" which guides you through a step-by-step setting of the most important options. To run the Project Wizard, click on the File menu and select the New Project Wizard entry.

4 Creating a New Project

The New Project Wizard involves going through a series of windows. The first window is an introduction listing the settings that can be applied. After reading the text in this window, click on "Next" to proceed. In the second window, you should give the project the following name: gNN_lab1, where NN is your 2-digit group number. The working directory for your project will be different from the one shown in the tutorial screenshot; use your network drive for your project files. We don't have a project template at this point, so select Empty project and proceed. You will add files later, so for now, just click on "Next". In this lab, you will be downloading a design to an FPGA device on the DE1-SoC board.
These devices belong to the Cyclone V family of FPGAs, with the following part number: 5CSEMA5F31C6. To ensure proper configuration of the FPGA, select this device in the wizard. The dialog box in the next window permits the designer to specify 3rd-party tools to use for various parts of the design process. We will be using a 3rd-party simulation tool called ModelSim-Altera, so select this item from the Simulation drop-down menu. The final page in the New Project Wizard is a summary. Check it over to make sure everything is OK (e.g., the project name, directory, and device assignment), then click Finish.

Your project framework is now ready. In the File menu, click on New, and then select VHDL File from the list. You should now have a VHDL editor open in your framework. You will write and edit your code in this editor.

5 Design a Binary to 7-Segment LED Decoder

A 7-segment LED display has 7 individual LED segments. By turning on different segments at any one time we can obtain different characters or numbers. There are six of these displays on the DE1-SoC board, which you will use later in your full implementation of the adder to display the result. In this part of the lab, you will design a circuit that will be used to drive the 7-segment LEDs on the DE1 board. It takes as input a 4-bit binary code representing the 16 hexadecimal digits between 0 and F, and generates the appropriate 7-segment display pattern associated with the input code. Note that the outputs should be made active-low. This is convenient, as many LED displays, including the ones on the DE1 board, turn on when their segment inputs are driven low. Note that active-low means "1" is off and "0" is on. To implement the 7-segment LED decoder, write a VHDL description using a single selected signal assignment statement. Use the following entity declaration, replacing the NN in gNN_7_segment_decoder with your group's number (e.g., g08).

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity gNN_7_segment_decoder is
        port (code     : in  std_logic_vector(3 downto 0);
              segments : out std_logic_vector(6 downto 0));
    end gNN_7_segment_decoder;

A sketch of the selected-signal-assignment style is shown below.
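For reference, here is a minimal sketch of the selected-signal-assignment style. The active-low segment patterns assume the common segments(6 downto 0) = g f e d c b a ordering; that ordering is an assumption of this sketch, so verify the patterns against the DE1-SoC manual before using them.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    architecture dataflow of gNN_7_segment_decoder is
    begin
        -- active-low patterns, ordered segments(6 downto 0) = g f e d c b a
        with code select
            segments <= "1000000" when "0000",  -- 0
                        "1111001" when "0001",  -- 1
                        "0100100" when "0010",  -- 2
                        "0110000" when "0011",  -- 3
                        "0011001" when "0100",  -- 4
                        "0010010" when "0101",  -- 5
                        "0000010" when "0110",  -- 6
                        "1111000" when "0111",  -- 7
                        "0000000" when "1000",  -- 8
                        "0010000" when "1001",  -- 9
                        "0001000" when "1010",  -- A
                        "0000011" when "1011",  -- b
                        "1000110" when "1100",  -- C
                        "0100001" when "1101",  -- d
                        "0000110" when "1110",  -- E
                        "0001110" when others;  -- F
    end dataflow;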
6 Simulation of the Circuit Using ModelSim

Once you have your circuit described in VHDL, you should simulate it. The purpose of simulation is generally to determine: (1) whether the circuit performs the desired function, and (2) whether timing constraints are met. In the first case, we are only interested in the functionality of our implementation; we do not care about propagation delays and other timing issues. Because of this, we do not have to map our design to target hardware. This type of simulation is called functional simulation, and it is the type we will learn about in this lab. The other form of simulation is called timing simulation. It requires that the design be mapped onto a target device, such as an FPGA. Based on the model of the device, the simulator can predict propagation delays and provide a simulation that takes these into account. Thus, the timing simulation may produce results that are quite different from the purely functional simulation.

In this course, you will be using the ModelSim simulation software, created by the company Mentor Graphics (actually, you will use a version of it specific to Quartus, called ModelSim-Altera). The ModelSim software operates on a Hardware Description Language (HDL) description of the circuit to be simulated, written in VHDL, Verilog, or SystemVerilog. You will use VHDL.

Double-click on the ModelSim desktop icon to run the ModelSim program. Select File > New > Project and, in the window that appears, give the project the name gNN_lab1. Once you click OK, another dialog box will appear allowing you to add files to the project. Click on "Add Existing File" and select the VHDL file that was created earlier (gNN_7_segment_decoder). You can also add files later. The ModelSim window will now show your VHDL file in the Project pane.

To simulate the design, ModelSim must analyze the VHDL files, a process known as compilation. The compiled files are stored in a library; by default, this is named "work". You can see this library in the Library pane of the ModelSim window. The question marks in the Status column of the Project tab indicate that either the files have not been compiled into the project or a source file has changed since the last compilation. To compile the files, select Compile > Compile All, or right-click in the Project window and select Compile > Compile All. If the compilation is successful, the question marks in the Status column turn into check marks, and a success message appears in the Transcript pane. The compiled VHDL files will now appear in the library "work".

Since all of the inputs are undefined, if you ran the simulation now, the outputs would be undefined. You therefore need a means of setting the inputs to certain patterns and of observing the outputs' responses to these inputs. In ModelSim, this is done using a special VHDL entity called a testbench. A testbench is special VHDL code that generates the different inputs to be applied to your circuit, so that you can automate the simulation of your circuit and see how its outputs respond. Note that the testbench is only used in ModelSim for the purposes of simulating your circuit. You will eventually synthesize your circuits into a real hardware chip called an FPGA; however, you will NOT synthesize the testbench into real hardware. Because of its special purpose (and because it will not be synthesized), the testbench entity is unique in that it has NO inputs or outputs, and it uses some special statements that are only used in testbenches. These special statements are not used when describing circuits that you will later synthesize to an FPGA. The testbench contains a single component instantiation statement that instantiates the module to be tested (in this case the gNN_7_segment_decoder module), as well as some statements that describe how the test inputs are generated.

After you gain more experience you will be able to write VHDL testbenches from scratch. However, Quartus has a convenient built-in process, called the Test Bench Template Writer, which produces a VHDL template from your design that will get you started. To get the template, go back to the Quartus program, making sure that you have the gNN_7_segment_decoder project loaded. Then, in the Processing menu, select Start > Start Test Bench Template Writer.
This will generate a VHDL file named gNN_7_segment_decoder.vht and place it in the simulation/modelsim directory. Open the template in Quartus. Note that the template already includes the instantiation of the circuit under test (i.e., the gNN_7_segment_decoder component). It also includes the skeletons of two "process" blocks, one labeled "init" and the other labeled "always". It is not important to understand process blocks at this point; we will learn about them later. The init process block can be deleted. You should edit the "always" process block to suit your needs; in this case it will be used to generate the code signal waveform. You will notice that inside the process block, the code signal is assigned multiple times! This may not make sense right now: if a signal were assigned multiple times using concurrent signal assignment statements, it would be an error. However, the rules for statements inside a process block are different. We will discuss process blocks later in the course. The "wait for x ns" statement is a special VHDL statement that is only used in VHDL testbenches, and not in VHDL descriptions of synthesizable circuits intended to be implemented in real hardware; we never indicate time this way in synthesizable VHDL.

There are 2^4 = 16 possible input patterns for the gNN_7_segment_decoder circuit, so complete testing of the circuit requires you to simulate all of them. In order to run through all 16 cases, we use a FOR loop that increments the value of the code signal, for example (the 10 ns wait time is an arbitrary choice for functional simulation):

    generate_test : PROCESS
    BEGIN
        FOR i IN 0 TO 15 LOOP  -- loop over all code values
            code <= std_logic_vector(to_unsigned(i, 4));
            WAIT FOR 10 ns;
        END LOOP;
        WAIT;  -- stop the process after one pass
    END PROCESS generate_test;

Save the completed testbench and add it to your ModelSim project via Project > Add to Project > Existing File.... Once the testbench file has been added to the project, select it in the Project pane and click on Compile Selected from the Compile menu. This will compile the testbench file.

Now everything is ready for you to actually run a simulation! Select "Start Simulation" from the Simulate menu in the ModelSim program. In the window that appears, select the gNN_7_segment_decoder_tst entity and click OK. At first, the "Wave" window will not have any signals in it. You can drag signals from the "Objects" window by clicking on a signal, holding down the mouse button, and dragging the signal over to the Wave window. Do this for all the signals. (Figure 1 of the handout shows the resulting signal waveform in ModelSim.) Now, to actually run the simulation, click on the "Run All" icon in the toolbar. Check the output of your implementation for every single case. If you get an incorrect output waveform, you will have to go back and look at your design. If you make a correction to your VHDL code, you will have to re-run the compilation of the changed files in ModelSim. Finally, to rerun the simulation, first click on the "Restart" button, then click on the "Run All" button.

7 Design a 5-bit Adder

In this part of the lab, you will design a circuit performing addition of two 5-bit inputs A and B. It also displays the inputs and the result of the addition in hexadecimal format on the 7-segment LEDs of the DE1-SoC board. Note that you should use the binary-to-7-segment LED decoder to obtain the appropriate 7-segment display codes. Moreover, each signal requires two hexadecimal digits for its representation; therefore, you will need to use all six 7-segment LEDs on the board in this lab.
Use the following entity declaration to write a VHDL description of the adder circuit. Note that you have to instantiate the gNN_7_segment_decoder circuit in your VHDL description.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity gNN_adder is
        port (A, B           : in  std_logic_vector(4 downto 0);
              decoded_A      : out std_logic_vector(13 downto 0);
              decoded_B      : out std_logic_vector(13 downto 0);
              decoded_AplusB : out std_logic_vector(13 downto 0));
    end gNN_adder;

8 Testing the Adder on the Altera Board

You will now test the adder circuit you designed in Section 7. Compile the design in the Quartus software. Once you have compiled the adder circuit, it is time to map it onto the target hardware, in this case the Cyclone V chip on the Altera DE1-SoC board. Please begin by reading over the DE1-SoC user's manual, which can be found on the myCourses lab experiments page. Since you will now be working with an actual device, you have to be concerned with which device package pins the various inputs and outputs of the project are connected to. In particular, you will want to connect the LED segment outputs from the instances of the gNN_7_segment_decoder circuit (i.e., the outputs of the adder circuit) to the corresponding segments of the six 7-segment LED displays on the board. The mapping of the board's 7-segment LED segments to the pins on the Cyclone FPGA device is listed in Table 3-9 on page 24 of the DE1-SoC Development and Education Board User's Manual. You will also want to connect, for testing purposes, 5 of the slide switches on the DE1-SoC board to the input A and another 5 to the input B of the gNN_adder circuit. The mapping of the slide switches to the FPGA pins is given in Table 3-6 on page 23 of the DE1 user's manual.

You can tell the compiler your choices of pin assignments for your inputs and outputs by opening the Pin Planner, which can be done by choosing the Pins item in the Assignments menu. Once you have assigned all of the inputs and outputs of your circuit to appropriate device pins, re-compile your design. Your design is now ready to be downloaded to the target hardware. Read Section 4.1 of the DE1-SoC user's manual for information on configuring (programming) the Cyclone V FPGA on the board. You will be using the JTAG mode to configure the device. Take the board out of the kit box, and connect the USB cable to the computer's USB port and to the USB connector on the board. Next, select the Programmer item from the Tools menu. Click Auto Detect and then select the correct device (5CSEMA5). Both the FPGA device and the HPS should be detected. Next, double-click the FPGA device (5CSEMA5), and from the window that opens add the .sof file created by Quartus. Finally, check the "Program/Configure" box beside the 5CSEMA5 device, and then click "Start". Now, you should be able to use the slide switches to enter values for inputs A and B. The 7-segment LEDs should display the inputs and the output in hexadecimal format.

9 Deliverables and Grading

9.1 Demo

Once completed, you will demo your project to the TA. You will be expected to:
• fully explain how the HDL code works,
• perform functional simulation using ModelSim, and
• demonstrate that the adder circuit is functioning properly using the slide switches and 7-segment LEDs on the DE1-SoC board.
9.2 Written report

You are also required to submit a written report and your code on myCourses. Your report must include:
• A description of the 7-segment decoder circuit. Explain why you used the selected signal assignment instead of the conditional signal assignment.
• A discussion of how the 7-segment decoder circuit was tested, showing representative simulation plots. How do you know that the circuit works correctly?
• A description of the adder circuit. How many 7-segment decoder instances did you use in your design, and why?
• A discussion of how the adder circuit was tested.
• A summary of the FPGA resource utilization (from the Compilation Report's Flow Summary) and the RTL schematic diagram for both the 7-segment decoder and the adder circuits. Clearly specify which part of your code maps to which part of the schematic diagram.

Finally, when you prepare your report keep the following in mind:
• The title page must include the lab number, the name and student ID of each student, and the group number.
• All figures and tables must be clearly visible.
• The report should be submitted in PDF format.
• It should document every design choice clearly.
• The grader should not have to struggle to understand your design. That is,
  – everything should be organized so the grader can easily reproduce your results by running your code through the tools, and
  – the code should be well-documented and easy to read.

Grading Sheet

Group Number:        Name 1:        Name 2:

Task                                                          Grade /Total    TA Signature
Creating project                                              /10
VHDL code for the 7-segment decoder circuit                   /20
Creating testbench code for the 7-segment decoder circuit     /10
Functional simulation of the 7-segment decoder circuit        /10
VHDL code for the adder circuit                               /20
Testing the adder circuit on the DE1-SoC board                /30
Total                                                         /100

$25.00

[SOLVED] ECSE 4320/6320 course project #4: implementation of dictionary codec

1. Introduction

The objective of this project is to implement a dictionary codec. As discussed in class, dictionary encoding is widely used in real-world data analytics systems to compress data with relatively low cardinality and to speed up search/scan operations. In essence, a dictionary encoder scans the to-be-compressed data to build a dictionary consisting of all the unique data items, and replaces each data item with its dictionary ID (a single-threaded sketch of this encoding step is given at the end of this listing). To accelerate dictionary look-up, one may use an indexing data structure such as a hash table or a B-tree to better manage the dictionary. In addition to reducing the data footprint, dictionary encoding makes it possible to apply SIMD instructions to significantly speed up the search/scan operations.

2. Requirement

Your implementation should support the following operations:
(1) Encoding: given a file consisting of raw column data, carry out dictionary encoding and generate an encoded column file consisting of both the dictionary and the encoded data column. Your code must support a multi-threaded implementation of dictionary encoding.
(2) Query: enable users to query an existing encoded column file. Your implementation should allow users to (i) check whether a data item exists in the column and, if it exists, return the indices of all the matching entries in the column; and (ii) given a prefix, search and return all the unique matching data items and their indices. Your implementation must support the use of SIMD instructions to speed up the search/scan.
(3) Baseline: your code should also support vanilla column search/scan (i.e., without dictionary encoding), which will be used as a baseline for the speed performance comparison.

In addition to the source code, your GitHub site should contain:
(1) A readme that clearly explains the structure/usage of your code, and how your code utilizes multi-threading and SIMD to speed up processing.
(2) Experimental results that show the performance of your implementation (both encoding performance and query performance). When measuring the performance, do not count the time of loading the file into memory or writing the file to SSD. The performance results must contain: (i) encoding speed under different numbers of threads; (ii) single-item search speed of your vanilla baseline, of the dictionary without SIMD, and of the dictionary with SIMD; and (iii) prefix scan speed of your vanilla baseline, of the dictionary without SIMD, and of the dictionary with SIMD.
(3) Analysis and conclusion.

Note: use the raw column data file at the following address for your speed performance evaluation: https://drive.google.com/file/d/192_suEsMxGInZbob_oJJt-SlKqa4gehh/view?usp=share_link
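To make the encoding step concrete, here is a minimal single-threaded C++ sketch of the core idea. The type and function names are illustrative only, and the project itself additionally requires multi-threading, SIMD-accelerated query, prefix search, and an on-disk file format.

    #include <string>
    #include <unordered_map>
    #include <vector>

    // Dictionary-encode a raw column: each unique string gets an integer ID,
    // and the column is rewritten as a vector of IDs.
    struct EncodedColumn {
        std::unordered_map<std::string, int> dict; // unique item -> dictionary ID
        std::vector<int> ids;                      // encoded column data
    };

    EncodedColumn encode(const std::vector<std::string>& column) {
        EncodedColumn enc;
        enc.ids.reserve(column.size());
        for (const auto& item : column) {
            auto it = enc.dict.find(item);
            if (it == enc.dict.end())              // first occurrence: assign next ID
                it = enc.dict.emplace(item, (int)enc.dict.size()).first;
            enc.ids.push_back(it->second);
        }
        return enc;
    }

    // Query: return the indices of all entries equal to `needle`.
    std::vector<size_t> find_matches(const EncodedColumn& enc, const std::string& needle) {
        std::vector<size_t> out;
        auto it = enc.dict.find(needle);
        if (it == enc.dict.end()) return out;      // item not in the column at all
        for (size_t i = 0; i < enc.ids.size(); ++i)
            if (enc.ids[i] == it->second) out.push_back(i);  // scan over small integer IDs
        return out;
    }

Note how the scan in find_matches compares fixed-width integers rather than strings; this is exactly the layout that a SIMD compare (and multiple threads over disjoint ranges) can later accelerate.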

$25.00

[SOLVED] ECSE 4320/6320 course project #3: memory and storage performance profiling

1. Introduction

The objective of this project is to develop first-hand knowledge and a deeper understanding of the performance of modern memory and storage devices. You do not need to write any code for this project. Instead, you will use publicly available software packages to carry out comprehensive experiments measuring the read/write latency and throughput of your memory and storage devices under various data access loads. You should observe a clear trade-off between access latency and throughput (as revealed by the queueing theory discussed in class): as you increase the memory/storage access queue depth (and hence the data access workload stress), memory/storage devices achieve higher resource utilization and hence higher throughput, but the latency of each data access request grows longer.

2. Requirement

For this project, your GitHub site only needs to host a detailed report that describes your experimental environment/settings/results and presents your analysis and conclusions. Your experiments should cover a wide range of settings in terms of read vs. write intensity ratio (e.g., read-only, write-only, 70%:30% read vs. write), data access size (e.g., 64B/256B for memory and 4KB/32KB/128KB for SSD), and throughput vs. latency. Your report must include some discussion that uses queueing theory to explain the throughput vs. latency results you have captured. Below are two software packages you may use (an example invocation appears at the end of this listing):
• Cache and memory: Intel Memory Latency Checker; details and download at https://software.intel.com/content/www/us/en/develop/articles/intelr-memory-latency-checker.html
• Storage: Flexible I/O tester (FIO), available at https://github.com/axboe/fio. It may already be included in your Linux distribution, and the man page is at https://linux.die.net/man/1/fio

Warning: FIO may overwrite an entire drive partition, so you may want to create an empty partition on your SSD just for FIO testing. Carelessly running FIO on your existing partition may destroy your data!

The specification of the Intel Data Center NVMe SSD D7-P5600 (1.6TB) lists a random write-only 4KB IOPS of 130K. Compare your results with this Intel enterprise-grade SSD, and try to explain any unexpected observations (e.g., your client-grade SSD may show higher IOPS than such an expensive enterprise-grade SSD; why?).
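As a sketch of the kind of experiment the report should cover, a random-read FIO run at one queue depth might look like the following; the flags are standard FIO options, while the job name, target file, size, and runtime are illustrative choices to adjust for each setting you sweep:

    fio --name=randread-qd32 \
        --filename=/mnt/fiotest/testfile --size=4G \
        --rw=randread --bs=4k \
        --ioengine=libaio --direct=1 \
        --iodepth=32 --numjobs=1 \
        --runtime=60 --time_based \
        --group_reporting

Sweeping --iodepth (e.g., 1, 4, 16, 32, 64) traces out the latency vs. throughput curve that queueing theory predicts, and --rw=randrw with --rwmixread=70 gives the 70%:30% read/write mix mentioned above.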

$25.00

[SOLVED] ECSE 4320/6320 course project #2: matrix-matrix multiplication with SIMD instructions & cache miss minimization

1. Introduction

The objective of this design project is to implement a C/C++ module that carries out high-speed matrix-matrix multiplication by explicitly utilizing x86 SIMD instructions and by minimizing the cache miss rate via restructuring of data access patterns. Matrix-matrix multiplication is one of the most important data processing kernels in numerous real-life applications, e.g., machine learning, computer vision, signal processing, and scientific computing. This project aims to help you gain hands-on experience with (1) SIMD programming and (2) cache access optimization. It will help you develop a deeper understanding of the importance of exploiting data-level parallelism and minimizing cache miss rate.

2. Requirement

Your implementation should support (1) a configurable matrix size that can be much larger than the on-chip cache capacity, and (2) both fixed-point and floating-point data. Each group should create a GitHub site that hosts the code/results of all the projects through this semester. Other than the source code, your GitHub site should contain:
(1) A readme that clearly explains the structure/installation/usage of your code.
(2) Experimental results that show the performance of your code under different matrix sizes (at least including 1,000x1,000 and 10,000x10,000) and different data precisions (4-byte floating-point, 2-byte fixed-point).
(3) A comparison with a naive implementation of matrix-matrix multiplication (i.e., without any cache optimization or explicit use of SIMD instructions).
(4) Analysis and conclusion.

3. Additional Information

The easiest way to use SIMD instructions is to call the intrinsic functions in your C/C++ code. The complete reference of the intrinsic functions can be found at https://software.intel.com/sites/landingpage/IntrinsicsGuide/, and you can find many online materials about their usage. A minimal sketch combining the two required techniques is given below. Moreover, matrix-matrix multiplication has been well studied in industry; one well-known library is the Intel Math Kernel Library (MKL), which can be a good reference for you.
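For illustration, here is a minimal sketch of the two techniques working together: loop tiling (blocking) to keep sub-matrices cache-resident, and AVX2/FMA intrinsics to process eight floats per instruction. It assumes row-major float matrices with n divisible by both the block size and 8, and must be compiled with AVX2/FMA enabled (e.g., gcc -O2 -mavx2 -mfma); a real submission also needs edge handling, a fixed-point variant, and tuning.

    #include <immintrin.h>
    #include <cstddef>

    // Blocked single-precision matmul, C += A * B.
    // A, B, C are n x n row-major; n assumed divisible by BLOCK and by 8.
    void matmul_blocked(const float* A, const float* B, float* C, std::size_t n) {
        const std::size_t BLOCK = 64;  // tile edge; sized so tiles stay in cache
        for (std::size_t ii = 0; ii < n; ii += BLOCK)
          for (std::size_t kk = 0; kk < n; kk += BLOCK)
            for (std::size_t jj = 0; jj < n; jj += BLOCK)
              // i-k-j order inside the tile: B and C rows are streamed contiguously
              for (std::size_t i = ii; i < ii + BLOCK; ++i)
                for (std::size_t k = kk; k < kk + BLOCK; ++k) {
                    __m256 a = _mm256_set1_ps(A[i * n + k]);  // broadcast A(i,k)
                    for (std::size_t j = jj; j < jj + BLOCK; j += 8) {
                        __m256 b = _mm256_loadu_ps(&B[k * n + j]);
                        __m256 c = _mm256_loadu_ps(&C[i * n + j]);
                        // C(i, j..j+7) += A(i,k) * B(k, j..j+7), fused multiply-add
                        c = _mm256_fmadd_ps(a, b, c);
                        _mm256_storeu_ps(&C[i * n + j], c);
                    }
                }
    }

Comparing this kernel against the naive triple loop at 1,000x1,000 and 10,000x10,000 gives exactly the baseline comparison the requirement asks for.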


[SOLVED] Ecse 4320/6320 course project #1: programming with multiple threads

1. Introduction
The objective of this project is to implement a C/C++ module that uses multiple threads to compress an input data stream. It aims to help you gain hands-on experience with multithreaded programming, which could very much help your job hunting. Compression can be done by calling the ZSTD library (https://facebook.github.io/zstd/), which was open-sourced by Facebook and has been widely adopted in industry. Each 16KB block in the input data stream should be compressed by ZSTD individually (i.e., all the 16KB blocks are compressed independently from each other). All the compressed blocks are written in order to one output file (i.e., the 1st 16KB in the input stream is compressed and written to the output file as the 1st block, the 2nd 16KB in the input stream is compressed and written to the file right after the 1st block, etc.). You can use a big compressible file (e.g., tens of GBs) as the source of the input data stream. Your code should use one thread to read data from the file, dispatch 16KB blocks to other worker threads for compression, receive compressed blocks from those worker threads, and write the compressed blocks (in the correct order) to the output file.

2. Requirement
Your implementation should support a configurable number of worker threads. In addition to the source code, your Github site should contain (1) a Readme that clearly explains the structure/usage of your code, (2) experimental results that show the performance of your multi-threaded compression module under different numbers of worker threads, and (3) analysis and conclusions.

3. Additional Information
C and C++ have traditionally relied on the underlying operating system for threading support (C++11 introduced std::thread). Linux provides the pthread library for multithreaded programming; you can find a very nice tutorial on pthreads at https://computing.llnl.gov/tutorials/pthreads/. Microsoft also provides support for multithreaded programming (e.g., see https://docs.microsoft.com/en-us/windows/win32/procthread/multiple-threads). You are highly encouraged to program on Linux, since Linux-based programming experience will help you most on the job market. The overall pipeline structure is sketched below.
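To make the required structure concrete, here is a hedged Python sketch of the same pipeline (the assignment itself must be done in C/C++ with pthreads). It assumes the third-party zstandard package; the key point is that executor.map returns results in submission order, which is exactly the in-order write requirement.

    from concurrent.futures import ThreadPoolExecutor
    import zstandard                                  # third-party: pip install zstandard

    BLOCK = 16 * 1024

    def read_blocks(path):
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(BLOCK)
                if not chunk:
                    return
                yield chunk                           # one 16KB block at a time

    def compress_file(src, dst, workers=4):
        compress = lambda block: zstandard.ZstdCompressor().compress(block)
        with ThreadPoolExecutor(max_workers=workers) as pool, open(dst, 'wb') as out:
            # map() yields results in submission order, keeping the compressed blocks in order
            for blob in pool.map(compress, read_blocks(src)):
                out.write(blob)

    compress_file('input.bin', 'output.zst', workers=8)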


[SOLVED] Deep learning systems (engr-e 533) homework 1 to 4 solutions

1. Replicate the test accuracy graph on M02-S09.
2. Show me your weight visualization, too.
3. Please do not use any advanced optimization methods (Adam, batch norm, dropout, etc.) or initialization methods (Xavier and so on). Plain SGD should just work.
4. In TF 2.x, you can do something like this to download the MNIST dataset: mnist = tf.keras.datasets.mnist
In PT, you can use these lines (don't worry about the batch size and normalization; you can go with your own options for them):

    import torchvision
    mnist_train = torchvision.datasets.MNIST('mnist', train=True, download=True,
        transform=torchvision.transforms.Compose([
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.1307,), (0.3081,))
        ]))
    mnist_test = torchvision.datasets.MNIST('mnist', train=False, download=True,
        transform=torchvision.transforms.Compose([
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.1307,), (0.3081,))
        ]))

Problem 2: Autoencoders [4 points]
1. Replicate the test accuracy graph on M02-S12.
2. That means you also want to show the figures in M02-S11.
3. Note that your encoder weights are frozen; you only update the softmax layer weights (the 100 × 10 matrix and the bias).

Problem 3: A shallow NN [3 points]
1. Replicate the test accuracy graph on M02-S14.
2. I don't have to see the visualization of the first layer. Just show me your graphs.

Problem 4: Full BP on both layers [6 points]
1. Replicate the test accuracy graph on M02-S17.

Replicate the figures in M03 Adult Optimization, slide 22, using the following details:
1. Use the same network architecture and train five different network instances in five different setups. The architecture has to be a fully connected network (a regular network, not a CNN or RNN) with five hidden layers and 512 hidden units per layer.
2. Create five different networks that share the same architecture as follows:
(a) Activation function: the logistic sigmoid; initialization: random numbers drawn from the normal distribution (µ = 0, σ = 0.01)
(b) Activation function: the logistic sigmoid; initialization: Xavier initializer
(c) Activation function: ReLU; initialization: random numbers drawn from the normal distribution (µ = 0, σ = 0.01)
(d) Activation function: ReLU; initialization: Xavier initializer
(e) Activation function: ReLU; initialization: Kaiming He's initializer
3. You don't have to implement your own initializers. Both TF and PT come with pre-implemented initializers.
4. Train them with traditional SGD. Do not improve SGD by introducing momentum or any other advanced additions. Your goal is to replicate the figures on slide 22. Feel free to use a pre-implemented SGD optimizer (see the sketch below).
5. In practice, you will need to investigate different learning rates for SGD, which will give you different convergence behaviors.
6. Don't worry if your graphs are slightly different from mine. We will give a full mark if your graphs show the same trend.
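For reference, a bare-bones PyTorch loop in the spirit of "plain SGD should just work" might look like the sketch below; it reuses the mnist_train object from the snippet above, and the layer sizes, learning rate, and epoch count are placeholders rather than the graded specification.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader

    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 100), nn.Sigmoid(),
                          nn.Linear(100, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)   # plain SGD: no momentum, no Adam
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(mnist_train, batch_size=64, shuffle=True)

    for epoch in range(20):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()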
Problem 1: Speech Denoising Using Deep Learning [3 points]
1. If you took my MLSP course, you may think that you've seen this problem. But it's actually somewhat different from what you did before, so read carefully. And this time you SHOULD implement a DNN with at least two hidden layers.
2. When you attended IUB, you took a course taught by Prof. K. Since you really liked his lectures, you decided to record them without the professor's permission. You felt awkward, but you did it anyway because you really wanted to review his lectures later.
3. Although you meant to review the lecture every time, it turned out that you never listened to the recordings. After graduation, you realized that a lot of the concepts you face at work were actually covered by Prof. K's class. So you decided to revisit the lectures and study the materials once again using the recordings.
4. You should have reviewed your recordings earlier. It turned out that a fellow student who used to sit next to you always ate chips in the middle of class, right beside your microphone. So Prof. K's beautiful deep voice was contaminated by the annoying chip-eating noise.
5. But you vaguely recall that you learned some things about speech denoising and source separation in Prof. K's class. So you decided to build a simple deep learning-based speech denoiser that takes a noisy speech spectrum (speech plus chip-eating noise) and produces a cleaned-up speech spectrum.
6. Since you don't have Prof. K's clean speech signal, I prepared this male speech data recorded by other people. train_dirty_male.wav and train_clean_male.wav are the noisy speech and its corresponding clean speech that you are going to use for training the network. Take a listen to them. Load them and convert them into spectrograms, the matrix representation of signals. To do so, you'll need to install librosa and use it as follows:

    !pip install librosa  # in Colab, you'll need to install this
    import librosa
    s, sr = librosa.load('train_clean_male.wav', sr=None)
    S = librosa.stft(s, n_fft=1024, hop_length=512)
    sn, sr = librosa.load('train_dirty_male.wav', sr=None)
    X = librosa.stft(sn, n_fft=1024, hop_length=512)

This gives you two matrices S and X of size 513 × 2459. This procedure is called the Short-Time Fourier Transform (STFT).
7. Take their magnitudes by using np.abs() or whatever other suitable method, because S and X are complex-valued. Let's call them |S| and |X|.
8. Train a fully-connected deep neural network. A couple of hidden layers should work, but feel free to try out whatever structure, activation function, and initialization scheme you'd like. The input to the network is a column vector of |X| (a 513-dim vector) and the target is its corresponding column in |S|. You may want to do some mini-batching for this. Make use of whatever functions TensorFlow or PyTorch provides.
9. But remember that your network should predict nonnegative magnitudes as output. Use a proper activation function in the last layer to make sure of that. I don't care which activation function you use in the middle layers.
10. test_01_x.wav is the noisy signal for validation. Load it and apply the STFT as before. Feed the magnitude spectra of this test mixture |X_test| to your network and predict the clean magnitude spectra |Ŝ_test|. Then you can recover the (complex-valued) speech spectrogram of the test signal this way:

    Ŝ_test = (X_test / |X_test|) ⊙ |Ŝ_test|,    (1)

which means you take the phase information of the input noisy signal, X_test / |X_test|, and use it to recover the clean speech. ⊙ stands for the Hadamard product, and the division is element-wise, too.
11. Recover the time-domain speech signal by applying an inverse STFT to Ŝ_test, which will give you a vector. Let's call this cleaned-up test speech signal ŝ_test. I'll calculate something called the Signal-to-Noise Ratio (SNR) by comparing it with the ground-truth speech I didn't share with you. It should be reasonably good. You can write the result out using the following code:

    librosa.output.write_wav('test_s_01_recons.wav', sh_test, sr)

or

    import soundfile as sf
    sf.write('test_s_01_recons.wav', sh_test, sr)
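Putting items 10 and 11 together, a minimal sketch of the test-time reconstruction could look like this; model stands in for your trained magnitude-to-magnitude network (not defined here), and the small constant guards the element-wise division:

    import numpy as np
    import librosa
    import torch

    x, sr = librosa.load('test_01_x.wav', sr=None)
    X = librosa.stft(x, n_fft=1024, hop_length=512)      # complex, 513 x T
    mag = np.abs(X)

    # model: placeholder for your trained network (one 513-dim spectrum per row)
    S_hat_mag = model(torch.tensor(mag.T, dtype=torch.float32)).detach().numpy().T

    S_hat = (X / (mag + 1e-10)) * S_hat_mag              # Eq. (1): reuse the noisy phase
    s_hat = librosa.istft(S_hat, hop_length=512)         # back to the time domain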
12. You can compute the SNR if you know the ground-truth source. Load test_01_s.wav; this is the ground-truth clean signal buried in test_01_x.wav. Compute the SNR of the predicted validation signal by comparing it to test_01_s.wav, but do not include this example in your training process. Once the training process is done, or even in the middle of training epochs, you apply your model to this validation example and compute the SNR value. That way, you can simulate the testing environment, although it doesn't guarantee that the model will work well on the test example, because the validation example can differ from the test set. This approach is related to the early stopping technique explained in M03 S37. Use this validation signal to prevent overfitting. By the way, the SNR is defined as follows:

    SNR = 10 log₁₀ ( Σ_t s²(t) / Σ_t (s(t) − ŝ(t))² ),    (2)

where s(t) and ŝ(t) are the ground-truth clean speech and the recovered one in the time domain, respectively. Be careful with the division and the logarithm: you don't want the denominator, or anything inside the log, to be zero. Adding a very small number, e.g., 1e-20, is a good way to prevent that.
13. Do the same testing procedure for test_02_x.wav, which actually contains Prof. K's voice along with the chip-eating noise. Enjoy his enhanced voice using your DNN.
14. Grading will be based on the denoised version of test_02_x.wav, so submit the audio file.

Problem 2: Speech Denoising Using 1D CNN [4 points]
1. As an audio guy it's sad to admit, but a lot of audio signal processing problems can be solved in the time-frequency domain, i.e., an image version of the audio signal. You learned how to do it in the previous homework by using the STFT and its inverse.
2. What that means is that nothing stops you from applying a CNN to the same speech denoising problem. In this question, I'm asking you to implement a 1D CNN that does the speech denoising job in the STFT magnitude domain. A 1D CNN here means a variant of CNN that does the convolution operation along only one of the axes; in our case it's the frequency axis.
3. As you did in Problem 1, install/load librosa and take the magnitude spectrograms of the dirty signal and the clean signal, |X| and |S|.
4. In both TensorFlow and PyTorch, you had better transpose these matrices so that each row is a spectrum. Your 1D CNN will take one of these row vectors as an example, i.e., |X|ᵀ_{i,:}. Since this is not an RGB image with three channels, and you won't use any information other than the magnitude during training, your input image has only one channel (depth-wise). Coupled with your choice of minibatch size, the dimensionality of your minibatch will be [(batch size) × (number of channels) × (height) × (width)] = [B × 1 × 1 × 513]. Note that, depending on the implementation of the 1D CNN layers in TF or PT, it's okay to omit the height dimension. Carefully read the definition of the function you'll use.
5. You'll also need to define the size of the kernel, which will be 1 × D, or simply D depending on the implementation (because we know that there's no convolution along the height axis).
6. If you define K kernels in the first layer, the output feature map's dimension will be [B × K × 1 × (513 − D + 1)]. You don't need too many kernels, but feel free to investigate. You don't need too many hidden layers, either. A quick shape check is sketched below.
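To see the bookkeeping in items 4-6 concretely, here is a small PyTorch shape check (the values of D and K are arbitrary examples, not requirements):

    import torch
    import torch.nn as nn

    B, D, K = 8, 16, 32
    x = torch.randn(B, 1, 513)              # one channel: a batch of magnitude spectra
    conv = nn.Conv1d(in_channels=1, out_channels=K, kernel_size=D)
    h = torch.relu(conv(x))
    print(h.shape)                          # torch.Size([8, 32, 498]), since 513 - 16 + 1 = 498
    fc = nn.Linear(K * (513 - D + 1), 513)  # flatten, then project back down to 513 bins
    y = torch.relu(fc(h.flatten(1)))        # nonnegative magnitudes out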
You don’t need too many hidden layers, either. 7. In the end, you know, you have to produce an output matrix of [B × 513], which are the approximation of the clean magnitude spectra of the batch. It’s a dimension hard to match using CNN only, unless you take care of the edges by padding zeros (let’s not do zero-padding for this homework). Hence, you may want to flatten the last feature map as a vector, and add a regular linear layer to reduce that dimensionality down to 513. 8. Meanwhile, although this flattening-followed-by-linear-layer approach should work in theory, the dimensionality of your flattened CNN feature map might be too large. To handle this issue, we will used the concept we learned in class, striding: usually, a stride larger than 1 can reduce the dimensionality after each CNN layer. You could consider this option in all convolutional layers to reduce the size of the feature maps gradually, so that the input dimensionality of the last fully-connected (FC) layer is manageable. Maxpooling, coupled with the striding technique, would be something to consider. 9. Be very careful about this dimensionality, because you have to define the input and output dimensionality of the FC layer in advance. For example, a stride of 2 pixels will reduce the feature dimension down to roughly 50%, though not exactly if the original dimensionality is an odd number. 10. Don’t forget to apply the activation function of your choice, at every layer, especially in the last layer. 11. Try whatever optimization techniques you’ve learned so far. 12. Check on the quality of the test signal you used in P1. Submit the denoised signal. Problem 3: Data Augmentation [4 points] 1. CIFAR10 is a pretty straightforward image classification task, that consists of 10 visual object classes. 2. Download them from here1 and be ready to use it. Both PyTorch and Tensorflow have options to conveniently load them, but I chose to download them directly and mess around because I found it easier. 3. Set aside 5,000 training examples for validation. 4. Build your baseline CNN classifier. (a) The images need to be reshaped into 32 × 32 × 3 tensor. (b) Each pixel is an integer with 8bit encoding (from 0 to 255). Transform them down to a floating point with a range [0, 1]. 0 means a black pixel and 1 is a white one. 1https://www.cs.toronto.edu/ kriz/cifar.html 4 (c) People like to rescale the pixels to [-1, 1] so that the input to the CNN is well centered around 0, instead of 0.5. (d) I know you are eager to try out a fancier net architecture, but let’s stick to this simple one: 1st 2d conv layer: there are 10 kernels whose size is 5x5x3; stride=1 Maxpooling: 2×2 with stride=2 1st 2d conv layer: there are 10 kernels whose size is 5x5x10; stride=1 Maxpooling: 2×2 with stride=2 1st fully-connected layer: [flattened final feature map] x 20 2st fully-connected layer: 20 x 10 Softmax on the 10 classes Let’s stick to ReLU for activation and the He initializer. (e) Train this net with an Adam optimizer with a default initial learning late (i.e. 0.001). Check on the validation accuracy at the end of every epoch. Report your validation accuracy over the epochs as a graph. This is the performance of your baseline system. 5. Build another classifier using augmented dataset. Prepare four different datasets out of the original CIFAR10 training set (except for the 5,000 you set aside for validation): (a) I know you already changed the scale of the pixels from 0—255 to -1—+1. Let’s go back to the intermediate range, 0—1. 
5. Build another classifier using an augmented dataset. Prepare four different datasets out of the original CIFAR10 training set (excluding the 5,000 images you set aside for validation):
(a) I know you already changed the scale of the pixels from [0, 255] to [-1, +1]. Let's go back to the intermediate range, [0, 1].
(b) Augmented dataset #1: brighten every pixel in every image by 10%, e.g., by multiplying by 1.1. Make sure, though, that the values don't exceed 1. For example, you may want to do something like np.minimum(1.1*X, 1).
(c) Augmented dataset #2: darken every pixel in every image by 10%, e.g., by multiplying by 0.9.
(d) Augmented dataset #3: flip all images horizontally (not upside down), as if they were mirrored.
(e) Augmented dataset #4: the original training set.
(f) Merge the four datasets into one gigantic training set. Since there are 45,000 images in the original training set (after excluding the validation set), after the augmentation you have 45,000×4 = 180,000 images. Each original image has four different versions: brighter, darker, horizontally flipped, and original. Note that the four share the same label: a darker frog is still a frog.
(g) Don't forget to scale back to [-1, +1].
(h) You had better visualize a few images after the augmentation to make sure what you did is correct.
(i) Train a fresh new network with the same architecture, but using this augmented dataset. Record the validation accuracy over the epochs.
6. Overlay the validation accuracy curve from the baseline with the new curve recorded from the augmented dataset. I ran 200 epochs for both experiments and was able to see convincing results (i.e., the data augmentation improves the validation performance).
7. In theory you have to conduct a test run on the test set, but let's forget about it.

Problem 4: Self-Supervised Learning via Pretext Tasks [4 points]
1. Suppose that you have only 50 labeled examples per class for your CIFAR10 classification problem, totaling 500 training images. Presumably it would be tough to achieve high performance in this situation.
2. Set aside 500 examples from your training set (I chose the last 500 examples).
3. The pretext task (a NumPy sketch of the dataset construction follows this item):
(a) We will assume that the remaining 49,500 training examples are unlabeled. We will create a bogus classification problem using them. Let these unlabeled examples (i.e., the examples whose original labels you disregard) be "class 0".
(b) "class 1": create a new class by flipping all the images vertically, upside down.
(c) "class 2": create another class by rotating the images 90 degrees counter-clockwise.
(d) Now you have three classes, each of which contains 49,500 labeled examples.
(e) This is not a classification problem one can take seriously, but the idea is that a classifier trained to solve it may learn features that are helpful for the original CIFAR10 classification problem.
(f) Train a network with the same setup/architecture described in Problem 3. In theory you need to validate every now and then to prevent overfitting, but who cares about this dummy problem? Let's forget about it and just run about a hundred epochs.
(g) Store your model somewhere safe. Both TF and PT provide a nice way to save the net parameters.
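A NumPy sketch of the pretext dataset construction in item 3, assuming X_unlab (a placeholder name) holds the 49,500 unlabeled images as a [49500, 32, 32, 3] array, so that axis 1 is the vertical image axis:

    import numpy as np

    X0 = X_unlab                                   # class 0: the images as-is
    X1 = np.flip(X_unlab, axis=1)                  # class 1: flipped upside down
    X2 = np.rot90(X_unlab, k=1, axes=(1, 2))       # class 2: rotated 90 deg counter-clockwise

    X_pretext = np.concatenate([X0, X1, X2])       # [148500, 32, 32, 3]
    y_pretext = np.repeat([0, 1, 2], len(X_unlab)) # bogus labels, one block per class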
(b) Let’s cheat here and use the test set of 10,000 examples as if they are our validation set. If you check on the test accuracy at every 100th epoch, you will see it overfit at some point. Record the accuracy values over iterations. 5. The transfer learning task: (a) Train our third classifier on the 500 CIFAR10 dataset you set aside in the begining. Again, note that they are for the original 10-class classification problem. (b) Instead of using an initializer, you will reload the weights from the pretext network. Yes, that’s exactly the definition of transfer learning. But, because you learned it from an unlabeled set, and had to create a pretext task to do so, it falls in the category of self-supervised learning. 6 (c) Note that you can trasfer all the parameters in except for the final softmax layer, as the pretext task is only with 3 classes. Let’s randomly initialize the last layer parameters with He. (d) You need to reduce the learning rates for transfer learning in general. More importantly, for the ones you transfer in, they have to be substantially lower than 1 × 10−3 , e.g. 1 × 10−5or1 × 10−6 . Meanwhile, the last softmax layer will prefer the default learning rate 1 × 10−3 , as it’s randomly initialized. (e) Report your test accuracy at every 100th epoch. 6. Draw two graphs from the two experiments, the baseline and the finetuning method, and compare the results. For your information, I ran both of them 10,000 epochs, and recorded the validation accuracy (actually, the test accuracy as I used the test set) at every 100th epoch. Of course, the point is that the self-supervised features should give improvement.Problem 1: Network Compression Using SVD [2 points] 1. Train a fully-connected net for MNIST classification. It should be with 5 hidden layers each of which is with 1024 hidden units. Feel free to use whatever techniques you learned in class. You should be able to get the test accuracy above 98%. Let’s call this network “baseline”. You can reuse the one from the previous homework if its accuracy is good enough. Otherwise, this would be a good chance for you to improve your “baseline” MNIST classifier. 2. You learned that Singular Value Decomposition (SVD) can compress the weight matrices (Module 6). You have 6 different weight matrices in your baseline network, i.e. W(1) ∈ R 784×1024 ,W(2) ∈ R 1024×1024 , · · · ,W(5) ∈ R 1024×1024 ,W(6) ∈ R 1024×10. Run SVD on each of them, except for W(6) which is too small already, to approximate the weight matrices: W(l) ≈ Wc (l) = U (l)S (l)V (l)⊤ (1) For this, feel free to use whatever implementation you can find. tf.svd or torch.svd will serve the purpose. Note that we don’t compress bias (just because we’re lazy). 3. If you look into the singular value matrix S (l) , it should be a diagonal matrix. Its values are sorted in the order of their contribution to the approximation. What that means is that you can discard the least important singular values by sacrificing the approximation performance. For example, if you choose to use only D singular values and if the singular values are sorted in the descending order, W(l) ≈ Wc (l) = U (l) :,1:DS (l) 1:D,1:D  V (l) :,1:D ⊤ . (2) You may expect the Wc (l) in (2) is a worse approximation of W(l) than the one in (1) due to the missing components. But, by doing so you can do some compression. 1 4. Vary your D from 10, 20, 50, 100, 200, to Dfull, where Dfull is the original size of S (l) (so D = Dfull means you use (1) instead of (2)). For example, Dfull = 784 when l = 1 and 1024 when l > 1. 
4. Vary your D over 10, 20, 50, 100, 200, and D_full, where D_full is the original size of S^(l) (so D = D_full means you use (1) instead of (2)). For example, D_full = 784 when l = 1 and 1024 when l > 1. Now you have 6 differently compressed versions that use Ŵ^(l) for feedforward; each of the 6 networks uses one of the 6 D values of your choice. Report the test accuracy of the six approximated networks (perhaps as a graph whose x-axis is D and y-axis is the test accuracy). You'll see that when D = D_full the test accuracy is almost as good as the baseline, while D = 10 gives you the worst performance. Note, however, that D = D_full doesn't give you any compression, while smaller choices of D can reduce the amount of computation during feedforward.
5. Report the test accuracies of the six SVDed versions along with your baseline performance. Report the number of parameters of your SVDed networks and compare them to the baseline's. Be careful with the S^(l) matrices: they are diagonal, meaning there are only D nonzero elements.
6. Note that you don't have to run the SVD algorithm multiple times to vary D. Run it once and extract the different versions by varying D. That's what's good about SVD.

Problem 2: Network Compression Using SVD [2 points]
1. Now you have learned that the low-rank approximation of W^(l) gives you some compression. However, you might not like the performance of the very small D values. From now on, fix D = 20 and let's improve its performance.
2. Define a NEW network whose weight matrices W^(l) are factorized. Again, this is a new network, different from your baseline in P1. In this new network you don't estimate W^(l) directly anymore, but rather its factor matrices, reconstructing W^(l) as W^(l) = U^(l) V^(l)^T.
3. In other words, the feedforward pass is now defined like this:

    x^(l+1) ← g( U^(l) V^(l)^T x^(l) + b^(l) )    (3)

4. But instead of randomly initializing these factor matrices, initialize them using the P1 SVD results for the D = 20 case:

    U^(l) ← U^(l)_{:,1:20},    V^(l)^T ← S^(l)_{1:20,1:20} (V^(l)_{:,1:20})^T    (4)

5. Again, note that U and V are the new variables you need to estimate via optimization. They are fancier, though, because they are initialized using the SVD results. If you stop here, you'll get the same test performance as in P1.
6. Finetune this network. It has new parameters to update, i.e., U^(l) and V^(l) (as well as the bias terms). Update them using BP. Since you initialized the new parameters with SVD, which is a pretty good starting point, you may want to use a smaller-than-usual learning rate.
7. Report the test-time classification accuracy.

Problem 3: Network Compression Using SVD [3 points]
1. Another way to improve the D = 20 case is to inform the training process of the SVD approximation. This is different from P1, where SVD was performed once after the network training was completed. This time, we do SVD at every epoch.
2. Initialize W^(l) using the "baseline" model. We will finetune it.
3. This time, the feedforward pass never uses W^(l) itself. Instead, you do SVD at every iteration and make sure the feedforward pass always uses Ŵ^(l) = U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})^T.
4. What that means for the training algorithm is that you should think of the low-rank SVD procedure as an approximation function W^(l) ≈ f(W^(l)) = U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})^T.
5. Hence, the update for W^(l) involves the derivative f′(W^(l)) due to the chain rule (see M6 S15, where I explained this in the quantization context). You can naïvely assume that your SVD approximation is near perfect (although it's not). Then, at least for the BP, you don't have to worry about the gradients, as the derivative will be just one everywhere, because f(x) = x. By doing so, you can feedforward using Ŵ^(l) while the updates are done on W^(l):

    Feedforward:    (5)
        Perform SVD: W^(l) ≈ U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})^T    (6)
        Perform feedforward: x^(l+1) ← g( U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})^T x^(l) + b^(l) )    (7)
    Backpropagation:    (8)
        Update parameters: W^(l) ← W^(l) − η (∂L/∂f(W^(l))) (∂f(W^(l))/∂W^(l))    (9)

Note that ∂f(W^(l))/∂W^(l) = 1 everywhere due to our identity assumption. One way to implement this in PyTorch is sketched below.
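One way (an assumption on my part, not prescribed by the assignment) to realize Eqs. (5)-(9) in PyTorch is a straight-through construction: the forward pass uses the rank-20 approximation, while the backward pass treats f as the identity.

    import torch

    def svd_straight_through(W, D=20):
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        W_hat = (U[:, :D] * S[:D]) @ Vh[:D, :]        # rank-D reconstruction
        # Forward pass returns W_hat; backward treats f as the identity (dW_hat/dW = 1),
        # so the loss gradient flows straight into W itself, as in Eq. (9).
        return W_hat.detach() + W - W.detach()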
6. As the feedforward pass always uses the SVDed version of the weights, the network is aware of the additional error introduced by the compression and can deal with it during training. Implementing this technique requires you to define an SVD routine that runs in the middle of the feedforward process. Both TF and PT provide SVD implementations you can use:
    TensorFlow 2.x: https://www.tensorflow.org/api_docs/python/tf/linalg/svd
    PyTorch: https://pytorch.org/docs/stable/generated/torch.svd.html
Although it takes more time to train (because you need to do SVD at every iteration), I like this method, as I can boost the performance of the D = 20 compressed network up to around 97%. Considering the amount of memory saved (after the compression it uses only about 2%!), this is a great way to compress your network.

Problem 4: Speaker Verification [4 points]
1. In this problem, we are going to build a speaker verification system. It takes two utterances as input and predicts whether they were spoken by the same speaker (positive class) or not (negative class).
2. trs.pkl contains a 500×16,180 matrix, each row of which is a speech signal with 16,180 samples. These are the vectors returned by the librosa.load function. Similarly, tes.pkl holds a 200×22,631 matrix.
3. The training matrix is ordered by speaker. Each speaker has 10 utterances, and there are 50 such speakers (that's why there are 500 rows). Similarly, the test set has 20 speakers, each with 10 utterances.
4. Randomly sample L pairs of utterances from the ten utterances of the first speaker. In theory, there are C(10, 2) = 45 pairs you can sample from (the order of the two utterances within a pair doesn't matter). You can use all 45 of them if you want. These are the positive examples in your first minibatch.
5. Let's construct L negative pairs as well. First, randomly sample L utterances from the other 49 training speakers. Second, randomly sample another L utterances from the first speaker (the speaker you sampled the positive pairs from). Using these two sets, each with L examples, form another set of L pairs. If L > 10, you'll need to reuse the first speaker's utterances (i.e., sample with replacement). This set is your negative examples; each pair contains an utterance from the first speaker and a random utterance spoken by a different speaker.
6. The L positive pairs and L negative pairs form your first minibatch; you have 2L pairs of utterances in total.
7. Repeat this process for the other training speakers, so that each speaker is represented by L positive pairs and L negative pairs. By doing so, you can form 50 minibatches with a balanced number of positive and negative pairs (a sampling sketch follows below).
8. Train a Siamese network that tries to predict 1 for the positive pairs and 0 for the negative ones. In a minibatch, since you have L positive and L negative pairs, your net must predict L ones and L zeros, respectively.
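A NumPy sketch of the pair sampling in items 4-7, assuming signals is the 500×16,180 training matrix and speaker k owns rows 10k through 10k+9 (names and the choice of L are placeholders):

    import numpy as np
    rng = np.random.default_rng(0)

    def speaker_batch(signals, k, L=20):
        own = np.arange(10 * k, 10 * k + 10)
        others = np.setdiff1d(np.arange(len(signals)), own)
        pos = [signals[rng.choice(own, 2, replace=False)] for _ in range(L)]   # same speaker
        neg = [np.stack([signals[rng.choice(own)], signals[rng.choice(others)]])
               for _ in range(L)]                                              # different speakers
        pairs = np.stack(pos + neg)                  # [2L, 2, 16180]
        labels = np.concatenate([np.ones(L), np.zeros(L)])
        return pairs, labels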
9. I found that an STFT of the signals serves as the initial feature extraction. Therefore, your Siamese network will take as input TWO spectrograms, each of size 513 × T. I wouldn't care too much about your choice of network architecture this time (as long as it works), but it has to somehow predict a fixed-length feature vector for a given sequence of spectra (consequently, TWO fixed-length vectors for the pair of input spectrograms). Using the inner product of the two latent embedding vectors as the input to a sigmoid function, you'll do logistic regression. Use your imagination and employ whatever techniques you learned in class to design/train this network.
10. Construct similar batches from the test set, and test the verification accuracy of your network. Report your test-time speaker verification performance. I was able to get a decent result (~70%) with a reasonable network architecture (e.g., a GRU working on STFT), which converged in a reasonable amount of time (i.e., in an hour).
11. Submit your code and your accuracy on the test examples.

Problem 5: Speech Denoising Using RNN [4 points]
1. Audio signals naturally contain temporal structure that a model can exploit for prediction, and speech denoising is a good example. In this problem, we'll come up with a reasonably complicated RNN implementation for the speech denoising job.
2. homework3.zip contains a folder tr. There are 1,200 noisy speech signals in there (from trx0000.wav to trx1199.wav). To create this dataset, I started from 120 clean speech signals spoken by 12 different speakers (10 sentences per speaker) and mixed each of them with 10 different kinds of noise signals. For example, trx0000.wav through trx0009.wav all say the same sentence spoken by the same person, but are contaminated by different noise signals. I also provide the original clean speech (trs0000.wav to trs1199.wav) and the noise sources (trn0000.wav to trn1199.wav) in the same folder. For example, adding up the two signals trs0000.wav and trn0000.wav makes up trx0000.wav, although you don't have to do that because I already did it for you.
3. Load all of them and convert them into spectrograms as you did in homework 2. Don't forget to take their magnitudes. For the mixtures (trxXXXX.wav), you'll see that there are 1,200 nonnegative matrices whose number of rows is 513, while the number of columns depends on the length of the original signal. Ditto for the speech and noise sources. Eventually, you'll construct three lists of magnitude spectrograms with variable lengths: |X^(l)_tr|, |S^(l)_tr|, and |N^(l)_tr|, where l indexes one of the 1,200 examples.
4. The |X^(l)_tr| matrices are your input to the RNN for training. An RNN (either GRU or LSTM is fine) will consider each one as a sequence of 513-dimensional spectra. For each of the spectra, you want to make a prediction for the speech denoising job.
5. The target of the training procedure is the Ideal Binary Mask (IBM). You can easily construct an IBM matrix per spectrogram as follows:

    M^(l)_{f,t} = 1 if |S^(l)_tr|_{f,t} > |N^(l)_tr|_{f,t};  0 if |S^(l)_tr|_{f,t} ≤ |N^(l)_tr|_{f,t}    (10)

The IBM assumes that each time-frequency bin (f, t), an element of the |X^(l)_tr| matrix, comes from either speech or noise. Although this is not the case in the real world, it works like a charm most of the time via this operation:

    S^(l)_tr ≈ Ŝ^(l)_tr = M^(l) ⊙ X^(l)_tr.    (11)

Note that the masking is applied to the complex-valued input spectrograms. Also, since masking is elementwise, M^(l) and X^(l)_tr have the same size (see the sketch below). Eventually, your RNN will learn a function that approximates this relationship:

    M^(l)_{:,t} ≈ M̂^(l)_{:,t} = RNN(|X^(l)_tr|_{:,1:t}; W),    (12)

where W denotes the network parameters to be estimated.
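For one training triple, with S, N, and X as the complex STFTs of the clean speech, noise, and mixture, Eqs. (10)-(11) boil down to a few lines:

    import numpy as np
    import librosa

    M = (np.abs(S) > np.abs(N)).astype(np.float32)    # Eq. (10): 1 where speech wins the bin
    S_hat = M * X                                     # Eq. (11): mask the complex mixture
    s_hat = librosa.istft(S_hat, hop_length=512)      # time-domain estimate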
6. Train your RNN using this training dataset. Feel free to use whatever LSTM or GRU cells are available in TensorFlow or PyTorch. I find dropout helpful, but you may want to be gentle with the dropout ratio. I didn't need an overly complicated network structure to beat a fully-connected network.
7. Implementation note: in theory you should be able to feed an entire sentence (one of the X^(l)_tr matrices) as an input sequence; in RNNs, a sequence is one input sample. On top of that, you still want to do mini-batching, so your mini-batch is a 3D tensor, not a matrix. For example, in my implementation I collect ten spectrograms, e.g., X^(0)_tr to X^(9)_tr, to form a 513 × T × 10 tensor (where T is the number of columns per matrix). So you can think of the mini-batch size as 10, while each example in the batch is not a multidimensional feature vector but a sequence of them. This tensor is the mini-batch input to my network. Instead of feeding the full sequence as input, you can segment the input matrix into smaller pieces, say 513 × T_trunc × N_mb, where T_trunc is a fixed truncation length for the input sequences and N_mb is the number of such truncated sequences in a mini-batch, so that the recurrence is limited to T_trunc steps during training. In practice this doesn't make a big difference, so either way is fine. Note that at test time the recurrence runs from the beginning of the sequence to the end (which means you don't need truncation for testing and validation).
8. I also provide a validation set in the folder v. Check the performance of your network on this dataset. Of course you'll need to watch the validation loss, but eventually you'll need to check the SNR values. For example, for a recovered validation sequence in the STFT domain, Ŝ^(l)_v = M̂^(l) ⊙ X^(l)_v, you'll perform an inverse STFT using librosa.istft to produce a time-domain waveform ŝ(t). Normally for this dataset, a well-tuned fully-connected net gives slightly above 10 dB SNR, so your validation set should give you a number larger than that. Once again, you don't need to come up with a very large network. Start from a small one.
9. We'll test the performance of your network on the test data. I provide some test signals in te, but not their corresponding sources, so you can't calculate SNR values for the test signals. Submit your recovered test speech signals, i.e., the speech denoising results on the signals in te, in a zip file. We'll calculate the SNR based on the ground-truth speech we set aside from you.

Problem 1: RNNs as a generative model [4 points]
1. We will train an RNN (LSTM or GRU; you choose) that can predict the bottom half of an MNIST image given the top half. So, yes, this is a generative model that can "draw" handwritten digits.
2. As the first step, let's divide every training image into 16 smaller patches. Since the original images are 28×28 pixels, this means you need to chop each image into 7×7 patches, with no overlap between patches. (The accompanying example figure from my implementation showed an image of the number "5", chopped into 16 patches.)
3. Let's give the patches an order, from the top-left corner to the bottom-right corner. (The accompanying figure showed the resulting patch order.)
4. Now that we have an order, we'll use it to turn each MNIST image into a sequence of smaller patches. One could keep the patches as-is in sequence, but I wouldn't do exactly that, because then it's a sequence of 2D arrays, not vectors.
5. While keeping the same order, to simplify our model architecture, we will vectorize each patch from 7×7 into a 49-dimensional vector. Finally, our sequence is a matrix X ∈ R^{16×49}, where 16 is the number of "time" steps. This is one input sequence for training your RNN (see the sketch below).
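A NumPy sketch of the image-to-sequence step in items 2-5; the reshape/transpose indices just carve out the 7×7 blocks in reading order:

    import numpy as np

    def to_patch_sequence(img):                      # img: [28, 28]
        blocks = img.reshape(4, 7, 4, 7)             # [patch-row, y, patch-col, x]
        return blocks.transpose(0, 2, 1, 3).reshape(16, 49)

    X = to_patch_sequence(np.arange(784).reshape(28, 28))
    print(X.shape)                                   # (16, 49): top-left patch first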
6. With a proper batch size, say 100, an input tensor is then a 3D array of size 100 × 16 × 49. But I'll ignore the batch size in the equations below to keep the notation uncluttered.
7. Train an RNN on these sequences. There should be 50,000 such sequences, or 500 minibatches if your batch size is 100. I tried a couple of different model architectures and both worked quite well. The smallest one I tried was a 2×64 LSTM. I didn't do anything fancy like gradient clipping, as the longest sequence length is still just 16.
8. Remember to add a dense layer so that you can convert whatever LSTM or GRU hidden dimension you choose back to 49. You may also want to use an activation function on your output units so that the output is bounded.
9. You need to train your RNN so that it can predict the next patch from the so-far-observed patches. To this end, the LSTM should predict the next patch in the following manner:

    (Y_{t,:}, C_{t+1,:}, H_{t+1,:}) = LSTM(X_{t,:}, C_{t,:}, H_{t,:}),    (1)

where C and H denote the memory cell and hidden state, respectively (with a GRU, C is omitted), both 0 when t = 0. To work as a predictive model, during training you compare Y_{t,:} (the prediction) with X_{t+1,:} (the next patch) and compute the loss (I used MSE, as I'm lazy).
10. In other words, you will feed the input sequence X_{1:15,:} (the full sequence except for the last patch) to the model, whose output Y ∈ R^{15×49} is compared to X_{2:16,:}, vector by vector, to compute the loss:

    L = Σ_{t=2..16} Σ_{d=1..49} D(X_{t,d} || Y_{t−1,d}),    (2)

where D(·||·) is a distance metric of your choice.
11. Let's use the test set to validate the model at every epoch, to see if it overfits. If it starts to overfit, stop the training process early. It took from a few to tens of minutes to train the network.
12. Once your net converges, let's move on to the fun "generation" part. Pick a test image that belongs to a digit class, and feed its first 8 patches to the trained model. It will generate eight patches (Y ∈ R^{8×49}) and two other vectors as the last memory cell and hidden states: C_{9,:}, H_{9,:}. Note that the dimensions of the C and H vectors depend on your choice of model complexity.
13. Then run the model frame by frame, feeding the last memory cell state, the last hidden state, and the last predicted output as if it were the new input. You will need to run this 7 times using a for loop, instead of feeding a sequence. Remember, you don't know what to use as the input at t = 9, because we pretend we don't know X_{9,:} until you predict Y_{8,:}:

    (Y_{9,:},  C_{10,:}, H_{10,:}) = LSTM(Y_{8,:},  C_{9,:},  H_{9,:})    (3)
    (Y_{10,:}, C_{11,:}, H_{11,:}) = LSTM(Y_{9,:},  C_{10,:}, H_{10,:})    (4)
    (Y_{11,:}, C_{12,:}, H_{12,:}) = LSTM(Y_{10,:}, C_{11,:}, H_{11,:})    (5)
    ...    (6)
    (Y_{15,:}, C_{16,:}, H_{16,:}) = LSTM(Y_{14,:}, C_{15,:}, H_{15,:})    (7)

A runnable version of this loop is sketched below.
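A sketch of this warm-up-then-feed-back loop with a single nn.LSTMCell; lstm and dense stand in for your trained recurrent stack and output layer, and the hidden size 64 is an arbitrary choice:

    import torch
    import torch.nn as nn

    lstm = nn.LSTMCell(49, 64)
    dense = nn.Sequential(nn.Linear(64, 49), nn.Sigmoid())

    x_top = torch.randn(8, 49)               # the 8 known top-half patches of one image
    h = c = torch.zeros(1, 64)
    outputs = []
    with torch.no_grad():
        for t in range(8):                   # warm up on the known patches
            h, c = lstm(x_top[t:t+1], (h, c))
            y = dense(h)                     # after the loop, y predicts patch 9
        for t in range(7):                   # then feed each prediction back in
            outputs.append(y)
            h, c = lstm(y, (h, c))
            y = dense(h)
        outputs.append(y)
    bottom = torch.cat(outputs)              # [8, 49]: predicted patches 9..16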
14. Note that Y_{15,:} is the prediction of your 16th patch, and, e.g., Y_{8,:} is the prediction of your 9th patch, and so on. We will discard Y_{1:7,:}, as those are predictions of patches that are already given (i.e., t < 9). Once again, we pretend that the top half (patches 1 to 8) is given, while the rest (patches 9 to 16) are NOT known.
15. Combine the known top half X_{1:8,:} and the predicted patches Y_{8:15,:} into a sequence of 16 patches. We are curious about how the bottom half looks, as those are the generated patches.
16. Reshape the synthesized matrix [X_{1:8,:}, Y_{8:15,:}] back into a 28 × 28 image. Repeat this experiment on 10 chosen images from the same digit class.
17. Repeat the experiment for all 10 digit classes. You will generate 100 images in total.
18. My model's example figures showed the generated bottom halves on the left and the originals (whose top halves were fed to the LSTM) on the right. I can see that the model does a pretty good job, but at the same time there are some interesting failure cases (marked with blue boxes). For example, if the upper arch of a "3" is too large, the LSTM thinks it's a 2 and draws a 2. Or, for some reason, if some 5's don't make a sharp corner at the top left, the LSTM thinks they're 6's. Same story for a tilted 7 that the LSTM thinks is a 2. So, my point is, if I had to guess the bottom half of these images, I'd have been confused as well.
19. Submit the 10 × 10 images that your LSTM generated, and submit their original images as well. Your figures should look like mine in terms of quality. Feel free to do better and embarrass me, but you'll get a full mark if the generated images look like mine. Note that these have to be sampled from your test set, not the training set.
(Figure: (a) LSTM-generated images; (b) the original images.)

Problem 2: Variational Autoencoders on Poor Sevens [3 points]
1. tr7.pkl contains 6,265 MNIST digits from the training set, but not all ten digits: I selected only 7's. It is therefore a rank-3 tensor of size 6,265 × 28 × 28. Similarly, te7.pkl contains 1,028 7's.
2. The digit images in this problem are special, because I added a special effect to them, so they differ from the original 7's in the MNIST dataset. I want you to find out what I did to the poor 7's.
3. Instead of eyeballing all those images, you need to implement a VAE that finds a few latent dimensions, one of which should reveal the effect I added.
4. Once again, I wouldn't care too much about your network architecture. This could be a good chance for you to check out the performance of a CNN encoder followed by a decoder with deconvolution (transposed convolution) layers, but do something else if you feel like it. I found that fully-connected networks work just fine.
5. What's important in the VAE is that it needs a hidden layer dedicated to learning the latent embedding. In this layer, each hidden unit is governed by a standard normal distribution as its a priori information. Also, be careful with the reparameterization technique and the loss function (sketched below).
6. You'll need to limit the number of hidden units K in your code layer (the embedding vector) to a small number (e.g., smaller than 5) to reduce your search space. Among the K dimensions, there must be one that explains the effect I added.
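The reparameterized code layer and the KL term of the VAE loss, as a hedged PyTorch sketch; mu and logvar stand for the two heads of your encoder, which are not defined here:

    import torch

    def reparameterize(mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(mu)        # z ~ N(mu, sigma^2), still differentiable

    def kl_term(mu, logvar):
        # KL( N(mu, sigma^2) || N(0, I) ), summed over the K code units, averaged over the batch
        return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()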
7. One way to prove that you found the latent dimension of interest is to show me the digits generated by the decoder. More specifically, you may want to "generate" new 7's by feeding the decoder a few randomly generated code vectors, i.e., random samples from the K normal distributions your VAE learned. But those alone won't be enough to show which dimension takes care of my added effect; your random code vectors should be formed in a special way.
8. What I'd do is generate code vectors that fix K − 1 dimensions to the same values across all codes, while varying only the remaining one.
9. For example, if K = 3 and you're interested in the third dimension, your codes should look as follows:

    Z = [ 0.23  −0.18  −5.0
          0.23  −0.18  −4.5
          0.23  −0.18  −4.0
          0.23  −0.18  −3.5
          0.23  −0.18  −3.0
          ...
          0.23  −0.18   4.5
          0.23  −0.18   5.0 ]    (8)

Note that the first two columns are randomly sampled from the normal distributions once, then shared by all the codes, so that the variation found in the decoded output relies solely on the third dimension.
10. You'll want to examine all K dimensions by generating samples along each of them. Show me the ones you like. They should show a bunch of similar-looking 7's with a gradually changing effect. Generated samples that show, for example, a gradual change in stroke thickness are not a good answer, because that's not the effect I added, but something already present in the dataset.
11. Submit your notebook with figures and code.

Problem 3: Conditional GAN [3 points]
1. Let's develop a GAN model that can generate MNIST digits based on auxiliary input from the user indicating which digit to create.
2. To this end, the generator has to be trained to receive two different kinds of input: the random vector and the class label.
3. The random vector is easy to create. It should be a d-dimensional vector sampled from a standard normal distribution N(0, 1). d = 100 worked just fine for me.
4. As for the conditioning vector, you somehow need to inform the network of your intention. For example, if you want to generate a "0", you need to give that information to the generator. There are many ways to condition a neural network at various stages, but this time let's use a simple one: we will convert the digit label into a one-hot vector. For example, if you want to generate a "7", the conditioning vector is [0, 0, 0, 0, 0, 0, 0, 1, 0, 0].
5. Then we need to combine these two kinds of information. Again, there are many ways, but let's stick to a simple solution: concatenate the d-dimensional random vector and the 10-dimensional one-hot vector. The input to your generator therefore has d + 10 dimensions; if d = 100, the input dimension is 110.
6. You are free to choose whatever network architecture you want to practice with. Here's the fully-connected one I found to be a good starting point: 110 × 200 × 400 × 784. I used ReLU as the activation function, except for the last layer, where I used tanh. That means I interpret −1 as a black pixel and +1 as a white pixel.
7. The discriminator has a similar architecture: 794 × 400 × 200 × 100 × 1. The reason it takes a 794-dim vector is that it needs to know what the image sample is conditioned on. Also note that it does binary classification to discern whether the conditioned input image is a real or a fake example, i.e., you will need to set up the last layer as a logistic regression function. The input assembly for both networks is sketched below.
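Assembling the conditional inputs from items 3-5 and 7, as a sketch; generator here is a placeholder for your 110 × 200 × 400 × 784 network, which is not defined in this snippet:

    import torch
    import torch.nn.functional as F

    B, d = 100, 100
    labels = torch.randint(0, 10, (B,))               # intended digit classes
    onehot = F.one_hot(labels, num_classes=10).float()

    z = torch.randn(B, d)                             # noise from N(0, 1)
    g_in = torch.cat([z, onehot], dim=1)              # [B, 110] generator input
    fake = generator(g_in)                            # [B, 784], in [-1, 1] via tanh
    d_in = torch.cat([fake, onehot], dim=1)           # [B, 794]: same labels appended again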
8. To train this GAN model, sample a minibatch of B examples from your MNIST dataset. These are your real examples. But instead of feeding them directly to your discriminator, you'll append their label information, turned into the one-hot representation. Don't forget to match the scale: the pixels have to be in [−1, +1] instead of [0, 1], as that's how the generator defines pixel intensity.
9. Accordingly, generate a class-balanced set of fake examples by feeding B random vectors to your generator. Again, each of your random vectors needs to be appended with a randomly chosen one-hot vector. For example, if B = 100, you may want to generate ten ones, ten twos, and so on. Although the generated images no longer carry any label information, you know that each should belong to a particular digit class based on your conditioning vector. Therefore, when you feed these fake examples to the discriminator, you need to append the one-hot vectors once again. Of course, the one-hot vectors should match the ones you used to inform the generator.
10. To summarize, the input to your generator is a (d + 10)-dim vector. The last 10 elements should be copied to augment the fake example generated by the generator, constructing a 794-dim vector; you have B fake examples like this. The real examples have the same size, but their first 784 elements come from real MNIST images, accompanied by the last 10 elements representing the class to which the image belongs.
11. Train this GAN model. I used Adam with lower-than-usual learning rates. Dropout helped the discriminator. My plot of the classification accuracy over the epochs (red for real and blue for fake examples) showed convergence to the Nash equilibrium, as the discriminator seemed to be confused.
12. For the test examples I generated by feeding new random vectors (plus the intended class labels), I placed ten examples per class in a row. These are of course not the best MNIST digits I can imagine, but they look fine given the simple structure and algorithm I used.
13. Please feel free to try out whatever else you want, such as WGAN, but if your results are decent (like mine), we'll give the full score.
14. Report both the convergence graph and the generated examples.

Problem 4: Missing Value Imputation Using Conditional GAN [5 points]
1. We've already seen in P1 that an LSTM can act as a generative model that "predicts future patches" given the "past patches".
2. This time, we'll do something similar, but using a GAN, and it works like a missing value imputation system. We assume that only the center part of the image is known, while the generator has to predict the surrounding pixels.
3. We'll formulate this as a conditional GAN. First, take a batch of MNIST images, take their center 10 × 10 patches, and flatten them. These are your 100-dimensional conditioning vectors. Since there are 28 × 28 pixels in each image, you'll do something like X[:, 9:19, 9:19] to take the center patch. This forms a B × 100 matrix for your batch of B conditioning vectors.
4. Append this matrix to your 100-dimensional random vectors drawn from the standard normal distribution. The resulting B × 200 matrix is the input to your generator.
5. The generator takes these 200-dimensional vectors and synthesizes MNIST-looking digits. You will need to prepare another set of B real examples. Eventually, you feed 2B examples in total to your discriminator as a minibatch.
6. If both the discriminator and the generator are trained properly, you will see that the results look like MNIST digits. But I found that the generator simply ignores the conditioning vector and generates whatever it wants. The outputs certainly look like MNIST digits, but the conditioning part doesn't work: my generated images and the ground-truth images from which I extracted the center patches were completely different from each other.
7. So, even though I fed the center patch as the conditioning vector to the generator, it ignored it and generated something totally different. It's because, I think, the generator has no way to know that the conditioning vector is actually the center patch of the digit it must generate. In other words, the generator is generating the whole image, although it doesn't have to generate the center patch, which is known to me. Instead, I wanted it to generate the surrounding pixels, i.e., the missing values.
8. As a remedy, I added another regularizer to my generator so that it functions as an autoencoder at least for the center pixels. In an ordinary GAN setup, the generator loss penalizes the discriminator's decision that classifies the fake examples into the fake class (i.e., when the generator fails to fool the discriminator). On top of this ordinary generator loss, I add a simple mean squared error term that penalizes the difference between the conditioning vector and the center patch of the generated image, since they essentially have to be the same.
9. Since it's a regularizer, I needed to investigate different λ values to control its contribution to the generator's total loss. It turned out that the generator is not too sensitive to this choice, although it does generate "less conditioned" examples when λ is too small. I compared the two sets of examples produced with λ = 0.1 and λ = 10.
10. Replicate what I did with the regularized model, and submit your code and generated examples (i.e., you don't have to replicate my failed model with no regularization). Once again, you can try other fancy models and different ways to condition the model, but we'll give you the full score if your results are as good as mine. A sketch of the regularized generator loss follows.
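A hedged sketch of that regularized generator objective; disc, fake, and cond are placeholders for your discriminator (with a sigmoid output), the generator output ([B, 784] in [−1, 1]), and the B × 100 conditioning matrix:

    import torch
    import torch.nn.functional as F

    lam = 0.1                                         # regularizer weight λ to tune
    imgs = fake.view(-1, 28, 28)                      # generator output as images
    center = imgs[:, 9:19, 9:19].reshape(-1, 100)     # the 10x10 region the user supplied

    ones = torch.ones(fake.size(0), 1)                # "real" labels: try to fool D
    adv = F.binary_cross_entropy(disc(torch.cat([fake, cond], dim=1)), ones)
    g_loss = adv + lam * F.mse_loss(center, cond)     # autoencode the known center patch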

$25.00 View

[SOLVED] Deep learning systems (engr-e 533) homework 4

Problem 1: RNNs as a generative model [4 points] 1. We will train an RNN (LSTM or GRU; you choose one) that can predict the rest of the bottom half of an MNIST image given the top half. So, yes, this is a generative model that can “draw” handwritten digits. 1 2. As the first step, let’s divide every training image into 16 smaller patches. Since the original images are with 28×28 pixels, what I mean is that you need to chop off the image into 7 × 7 patches. There is no overlap between those patches. Above is an example from my implementation. It’s an image of number “5” obviously, but it’s just chopped into 16 patches. 3. Let’s give them an order. Let’s do it from the top left corner to the bottom right corner. Below is going to be the order of the patches. 4. Now that we have an order, we’ll use it to turn each MNIST image into a sequence of smaller patches. Although, below would be a potential way to turn these patches into a sequence, I wouldn’t use this way exactly, because then it is a sequence of 2d arrays, not vectors. 5. While I’ll keep the same order, to simplify our model architecture, we will vectorize each patch from 7×7 to a 49-dimensional vector. Finally, our sequence is a matrix X ∈ R 16×49, where 16 is the number of “time” steps. This is an input sequence to your RNN for training. 6. With a proper batch size, say 100, now an input tensor is defined as a 3D array of size 100 × 16 × 49. But, I’ll ignore the batch size in the equations below to keep the notation uncluttered. 7. Train an RNN out of these sequences. There must be 50,000 such sequences, or 500 minibatches if your batch size is 100. I tried a couple of different model architectures but both worked quite well. The smallest one I tried was a 2×64 LSTM. I didn’t do any fancy things like gradient clipping, as the longest sequence length is still just 16. 8. Remember to add a dense layer, so that you can convert whatever choice of the LSTM or GRU hidden dimension back to 49. You may also want to use an activation function for your output units so that the output is bounded. 9. You need to train your RNN in a way that it can predict the next patch out of the so-far-observed patches. To this end, the LSTM should predict the next patch in the following manner: (Yt,: , Ct+1,: , Ht+1,:) = LSTM(Xt,: , Ct,: , Ht,:), (1) where C and H denote the memory cell and hidden state, respectively (or with GRU C will be omitted), that are 0 when t = 0. To work as a predictive model, when you train, you need to compare Yt,: (the prediction) with Xt+1,: (the next patch) and compute the loss (I used MSE as I’m lazy). 2 10. In other words, you will feed the input sequence X1:15,: (the full sequence except for the last patch) to the model, whose output Y ∈ R 15×49 will need to be compared to X2:16,: , vector-by-vector, to compute the loss: L = X 16 t=2 X 49 d=1 D(Xt,d||Yt−1,d), (2) where D(·||·) is a distance metric of your choice. 11. Let’s use the test set to validate the model at every epoch, to see if it overfits. If it starts to overfit, stop the training process early. It took from a few to tens of minutes to train the network. 12. Once your net converges, let’s move on to the fun “generation” part. Pick up a test image that belongs to a digit class, and feed its first 8 patches to the trained model. It will generate eight patches (Y ∈ R 8×49), and two other vectors as the last memory cell and hidden states: C9,: , H9,: . Note that the dimension of C and H vectors depends on your choice of model complexity. 13. 
12. Once your net converges, let's move on to the fun "generation" part. Pick a test image that belongs to a digit class, and feed its first 8 patches to the trained model. It will generate eight patches (Y ∈ R^{8×49}) and two other vectors as the last memory cell and hidden states: C_{9,:} and H_{9,:}. Note that the dimension of the C and H vectors depends on your choice of model complexity.
13. Then, run the model frame by frame, feeding the last memory cell state, the last hidden state, and the last predicted output as if it were the new input. You will need to run this 7 times using a for loop, instead of feeding a sequence. Remember, for example, you don't know what to use as an input at t = 9, because we pretend we don't know X_{9,:} until you predict Y_{8,:}:

(Y_{9,:}, C_{10,:}, H_{10,:}) = LSTM(Y_{8,:}, C_{9,:}, H_{9,:})   (3)
(Y_{10,:}, C_{11,:}, H_{11,:}) = LSTM(Y_{9,:}, C_{10,:}, H_{10,:})   (4)
(Y_{11,:}, C_{12,:}, H_{12,:}) = LSTM(Y_{10,:}, C_{11,:}, H_{11,:})   (5)
...   (6)
(Y_{15,:}, C_{16,:}, H_{16,:}) = LSTM(Y_{14,:}, C_{15,:}, H_{15,:})   (7)

14. Note that Y_{15,:} is the prediction for your 16th patch, Y_{8,:} is the prediction for your 9th patch, and so on. We discard Y_{1:7,:}, as they are predictions of patches that are already given (i.e., t < 9). Once again, we pretend that the top half (patches 1 to 8) is given, while the rest (patches 9 to 16) is NOT known.
15. Combine the known top half X_{1:8,:} and the predicted patches Y_{8:15,:} into a sequence of 16 patches. We are curious how the bottom half looks, as those are the generated patches.
16. Reshape the synthesized matrix [X_{1:8,:}, Y_{8:15,:}] back into a 28×28 image. Repeat this experiment on 10 chosen images from the same digit class.
17. Repeat the experiment for all 10 digit classes. You will generate 100 images in total. (A sketch of this generation loop follows below.)
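A matching sketch of the generation loop in steps 12-17, reusing the hypothetical patchify and model from the previous sketch:

x = patchify(test_img)                          # test_img: a (28, 28) test-set image
with torch.no_grad():
    y, state = model(x[:8].unsqueeze(0))        # feed the known top half (patches 1-8)
    patch = y[:, -1]                            # Y_8: the prediction for patch 9
    preds = [patch]
    for _ in range(7):                          # predict patches 10..16 one step at a time
        patch, state = model(patch.unsqueeze(1), state)
        patch = patch[:, -1]
        preds.append(patch)
full = torch.cat([x[:8], torch.cat(preds, dim=0)], dim=0)           # [X_{1:8}; Y_{8:15}]
img = full.reshape(4, 4, 7, 7).permute(0, 2, 1, 3).reshape(28, 28)  # back to a 28x28 image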
18. Below are examples from my model. On the left, you see examples whose bottom half is "generated" by the LSTM model, while the right images are the originals (whose top half was fed to the LSTM). I can see that the model does a pretty good job, but at the same time there are some interesting failure cases (blue boxes). For example, if the upper arch of a "3" is too large, the LSTM thinks it's a 2 and draws a 2. Or, for some reason, if some 5's don't make a sharp corner on their top left, the LSTM thinks they are 6's. Same story for a tilted 7 that the LSTM takes for a 2. So, my point is, if I had to guess the bottom half of these images, I'd have been confused as well.
19. Submit the 10×10 images that your LSTM generated. Submit their original images as well. Your figures should look like mine (the two figures shown below) in terms of quality. Feel free to do better and embarrass me, but you'll get a full mark if the generated images look like mine. Note that these have to be sampled from your test set, not the training set.

[Figure: (a) LSTM-generated images; (b) the original images.]

Problem 2: Variational Autoencoders on Poor Sevens [3 points]

1. tr7.pkl contains 6,265 MNIST digits from the training set, but not all ten digits: I only selected 7's. It is therefore a rank-3 tensor of size 6,265×28×28. Similarly, te7.pkl contains 1,028 7's.
2. The digit images in this problem are special, because I added a special effect to them, so they differ from the original 7's in the MNIST dataset. I want you to find out what I did to the poor 7's.
3. Instead of eyeballing all those images, you need to implement a VAE that finds a few latent dimensions, one of which should show you the effect I added.
4. Once again, I wouldn't care too much about your network architecture. This could be a good chance for you to check out the performance of a CNN encoder followed by a decoder with deconvolution layers (or transposed convolution layers), but do something else if you feel like it. I found that fully-connected networks work just fine.
5. What's important here is that, as a VAE, it needs a hidden layer dedicated to learning the latent embedding. In this layer, each hidden unit is governed by a standard normal distribution as its a priori information. Also, be careful about the re-parameterization technique and the loss function.
6. You'll need to limit the number of hidden units K in your code layer (the embedding vector) to a small number (e.g., smaller than 5) to reduce your search space. Out of the K dimensions, there must be one that explains the effect I added.
7. One way to prove that you found the latent dimension of interest is to show me the digits generated by the decoder. More specifically, you may want to "generate" new 7's by feeding a few randomly generated code vectors, i.e., random samples from the K normal distributions that your VAE learned. But they won't be enough to show which dimension takes care of my added effect. Therefore, your random code vectors should be formed specially.
8. What I'd do is generate code vectors by fixing K−1 of the dimensions to the same values across all the codes, while varying only the remaining one.
9. For example, if K = 3 and you're interested in the third dimension, your codes should look as follows:

Z = [ 0.23  −0.18  −5.0
      0.23  −0.18  −4.5
      0.23  −0.18  −4.0
      0.23  −0.18  −3.5
      0.23  −0.18  −3.0
      ...
      0.23  −0.18   4.5
      0.23  −0.18   5.0 ]   (8)

Note that the first two columns are randomly sampled from the normal distributions once, but then shared by all the codes, so that the variation found in the decoded output relies solely on the third dimension.
10. You'll want to examine all K dimensions by generating samples from each of them. Show me the ones you like. They should show a bunch of similar-looking 7's with a gradually changing effect. Generated samples that show a gradual change in stroke thickness, for example, are not a good answer, because that's not the effect I added but something that was already in the dataset. (A sketch of this traversal appears after this problem.)
11. Submit your notebook with figures and code.
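A sketch of the two VAE pieces emphasized above, with enc and dec standing in for whatever encoder and decoder you build (hypothetical names): first the re-parameterization plus the KL term of the loss, then the one-dimension-at-a-time traversal of step 9:

mu, logvar = enc(x)                             # per-dimension mean and log-variance, K dims each
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)       # re-parameterization trick
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
# total loss = reconstruction error + kl

# traversal: fix K-1 dimensions, sweep the remaining one (here K = 3, probing dimension 3)
K, probe = 3, 2
z = torch.randn(1, K).repeat(21, 1)             # one random code shared across all rows
z[:, probe] = torch.linspace(-5.0, 5.0, 21)     # vary only the probed dimension, as in eq. (8)
with torch.no_grad():
    sweep = dec(z).reshape(-1, 28, 28)          # decoded 7's changing in a single factor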
Problem 3: Conditional GAN [3 points]

1. Let's develop a GAN model that can generate MNIST digits based on auxiliary input from the user indicating which digit to create.
2. To this end, the generator has to be trained to receive two different kinds of input: the random vector and the class label.
3. The random vector is easy to create. It should be a d-dimensional vector sampled from a standard normal distribution N(0, 1). d = 100 worked just fine for me.
4. As for the conditioning vector, you somehow need to inform the network of your intention. For example, if you want to generate a "0", you need to give that information to the generator. There are many different ways to condition a neural network at various stages, but this time let's use a simple one. We will convert the digit label into a one-hot vector. For example, if you want to generate a "7", the conditioning vector is [0, 0, 0, 0, 0, 0, 0, 1, 0, 0].
5. Then, we need to combine these two different kinds of information. Again, there are many different ways, but let's just stick to a simple solution. We will concatenate the d-dimensional random vector and the 10-dimensional one-hot vector. Therefore, the input to your generator has d + 10 dimensions. If your d = 100, the input dimension is 110.
6. You are free to choose whatever network architecture you want to practice with. Here's the fully-connected one I found to be a good starting point: 110×200×400×784. I used ReLU as the activation function, but for the last layer I used tanh. That means I interpret −1 as a black pixel and +1 as a white pixel.
7. The discriminator has a similar architecture: 794×400×200×100×1. The reason it takes a 794-dim vector is that it wants to know what the image sample is conditioned on. Also note that it does binary classification to discern whether the conditioned input image is a real or a fake example, i.e., you will need to set up the last layer as a logistic regression function.
8. To train this GAN model, sample a minibatch of B examples from your MNIST dataset. These are your real examples. But instead of feeding them directly to your discriminator, you'll append their label information by turning it into the one-hot representation. Don't forget to match the scale: the pixels have to range from −1 to +1 instead of [0, 1], as that's how the generator defines the pixel intensity.
9. Accordingly, generate a class-balanced set of fake examples by feeding B random vectors to your generator. Again, each of your random vectors needs to be appended with a randomly chosen one-hot vector. For example, if B = 100, you may want to generate ten ones, ten twos, and so on. Although the generated images no longer carry any label information, you know that each should belong to a particular digit class based on your conditioning vector. Therefore, when you feed these fake examples to the discriminator, you need to append the one-hot vectors once again. Of course, the one-hot vectors should match the ones you used to inform the generator as input.
10. To summarize, the input to your generator is a (d + 10)-dim vector. The last 10 elements should be copied to augment your fake example, generated by the generator, to construct a 794-dim vector. You have B such fake examples. The real examples have the same size, but their first 784 elements come from the real MNIST images, accompanied by the last 10 elements representing the class to which the image belongs. (A sketch of this batch construction follows below.)
11. Train this GAN model. I used Adam with lower-than-usual learning rates. Dropout helped the discriminator. Below is the figure that shows the change of the classification accuracy over the epochs (red for real and blue for fake examples). I can see that it converged to the Nash equilibrium, as the discriminator seems to be confused.
12. Below are the test examples that I generated by feeding new random vectors (plus the intended class labels). I placed ten examples per class in a row. These are of course not the best MNIST digits I can imagine, but they look fine given the simple structure and algorithm I used.
13. Please feel free to try out whatever other things you want, such as WGAN, but if your results are decent (like mine) we'll give away the full score.
14. Report both the convergence graph and the generated examples.
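A condensed sketch of the batch construction in steps 8-10; G, D, and loader stand in for your generator, discriminator, and MNIST loader (assumed, not given by the assignment), and real images are assumed to arrive in [0, 1]:

import torch
import torch.nn.functional as F
d, B = 100, 100
labels_fake = torch.randint(0, 10, (B,))        # intended classes for the fake examples
onehot = F.one_hot(labels_fake, 10).float()
z = torch.randn(B, d)
fake = G(torch.cat([z, onehot], dim=1))         # (B, 784), tanh output in [-1, 1]
real, labels = next(iter(loader))
real = real.view(B, 784) * 2 - 1                # match the generator's [-1, 1] pixel scale
real_in = torch.cat([real, F.one_hot(labels, 10).float()], dim=1)   # (B, 794)
fake_in = torch.cat([fake, onehot], dim=1)      # same one-hots as were fed to G
d_loss = F.binary_cross_entropy(D(real_in), torch.ones(B, 1)) \
       + F.binary_cross_entropy(D(fake_in.detach()), torch.zeros(B, 1))
g_loss = F.binary_cross_entropy(D(fake_in), torch.ones(B, 1))       # generator tries to fool D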
Problem 4: Missing Value Imputation Using Conditional GAN [5 points]

1. We've already seen in P1 that an LSTM can act like a generative model that "predicts future patches" given the "past patches".
2. This time we'll do something similar, but using a GAN, and it works like a missing value imputation system. We assume that only the center part of the image is known, while the generator has to predict what the surrounding pixels are.
3. We'll formulate this as a conditional GAN. First, take a batch of MNIST images. Take their center 10×10 patches, and then flatten them. This is your 100-dimensional conditioning vector. Since there are 28×28 pixels in each image, you'll do something like X[:, 9:19, 9:19] to take the center patch. This will form a B×100 matrix, for your batch of B conditioning vectors.
4. Append this matrix to your random vectors of 100 dimensions drawn from the standard normal distribution. This B×200 matrix is the input to your generator.
5. The generator takes these 200-dimensional vectors and synthesizes MNIST-looking digits. You will need to prepare another set of B real examples. Eventually, you feed 2B examples in total to your discriminator as a minibatch.
6. If both the discriminator and generator are trained properly, you can see that the results are some MNIST-looking digits. But I found that the generator simply ignores the conditioning vector and generates whatever it wants. The results all certainly look like MNIST digits, but the conditioning part doesn't work. Below are the generated images (left) and the ground-truth images from which I extracted the center patches (right). They are completely different from each other.
7. So, even though I did feed the center patch as the conditioning vector to the generator, it ignores it and generates something totally different. I think it's because the generator has no way to know that the conditioning vector is actually the center patch of the digit it must generate. In other words, the generator is generating the whole image, although it doesn't have to generate the center patch, which is known to me. Instead, I wanted it to generate the surrounding pixels, which are the missing values.
8. As a remedy, I added another regularizer to my generator so that it functions as an autoencoder, at least for the center pixels. You know, in an ordinary GAN setup, the generator loss has to penalize the discriminator's decision that classifies the fake examples into the fake class (i.e., when the generator fails to fool the discriminator). On top of this ordinary generator loss, I add a simple mean squared error term that penalizes the difference between the conditioning vector and the center patch of the generated image, as they essentially have to be the same.
9. Since it's a regularizer, I needed to investigate different λ values to control its contribution to the total loss of the generator. It turned out that the generator is not too sensitive to this choice, although it does generate a "less conditioned" example when λ is too small. Below are the two sets of examples for λ = 0.1 (left) and λ = 10 (right).
10. Replicate what I did with the regularized model and submit your code and generated examples (i.e., you don't have to replicate my failed model with no regularization). Once again, you can try some other fancy models and different ways to condition the model. But we'll give you a full score if your results are as good as mine. (A sketch of the regularized generator loss follows below.)
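A sketch of the regularized generator loss from steps 8-9; lam is the λ above, and G, D, and the [0, 1]-scaled batch `real` are assumed from the earlier sketches:

center = real.view(B, 28, 28)[:, 9:19, 9:19].reshape(B, 100)   # known center patch = condition
z = torch.randn(B, 100)
fake = G(torch.cat([z, center], dim=1))                        # (B, 784)
fake_center = fake.view(B, 28, 28)[:, 9:19, 9:19].reshape(B, 100)
adv = F.binary_cross_entropy(D(fake), torch.ones(B, 1))        # ordinary generator loss
# (append the condition to D's input as well, if you condition the discriminator as in P3)
g_loss = adv + lam * F.mse_loss(fake_center, center)           # autoencode the known center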


[SOLVED] Deep learning systems (engr-e 533) homework 3

Problem 1: Network Compression Using SVD [2 points]

1. Train a fully-connected net for MNIST classification. It should have 5 hidden layers, each with 1024 hidden units. Feel free to use whatever techniques you learned in class. You should be able to get the test accuracy above 98%. Let's call this network the "baseline". You can reuse the one from the previous homework if its accuracy is good enough. Otherwise, this would be a good chance for you to improve your "baseline" MNIST classifier.
2. You learned that Singular Value Decomposition (SVD) can compress the weight matrices (Module 6). You have 6 different weight matrices in your baseline network, i.e., W^(1) ∈ R^{784×1024}, W^(2) ∈ R^{1024×1024}, ..., W^(5) ∈ R^{1024×1024}, W^(6) ∈ R^{1024×10}. Run SVD on each of them, except for W^(6), which is too small already, to approximate the weight matrices:

W^(l) ≈ Ŵ^(l) = U^(l) S^(l) V^(l)⊤   (1)

For this, feel free to use whatever implementation you can find; tf.svd or torch.svd will serve the purpose. Note that we don't compress the biases (just because we're lazy).
3. If you look into the singular value matrix S^(l), it should be a diagonal matrix. Its values are sorted in the order of their contribution to the approximation. That means you can discard the least important singular values by sacrificing approximation quality. For example, if you choose to use only D singular values, and if the singular values are sorted in descending order,

W^(l) ≈ Ŵ^(l) = U^(l)_{:,1:D} S^(l)_{1:D,1:D} (V^(l)_{:,1:D})⊤.   (2)

You may expect that the Ŵ^(l) in (2) is a worse approximation of W^(l) than the one in (1) due to the missing components, but by doing so you get some compression.
4. Vary your D over 10, 20, 50, 100, 200, and D_full, where D_full is the original size of S^(l) (so D = D_full means you use (1) instead of (2)). For example, D_full = 784 when l = 1 and 1024 when l > 1. Now you have 6 differently compressed versions that use Ŵ^(l) for feedforward; each of the 6 networks uses one of the 6 D values. Report the test accuracy of the six approximated networks (perhaps as a graph whose x-axis is D and y-axis is the test accuracy). You'll see that when D = D_full the test accuracy is almost as good as the baseline, while D = 10 will give you the worst performance. Note, however, that D = D_full doesn't give you any compression, while smaller choices of D can reduce the amount of computation during feedforward.
5. Report the test accuracies of the six SVDed versions along with your baseline performance. Report the number of parameters of your SVDed networks and compare them to the baseline's. Be careful with the S^(l) matrices: they are diagonal, meaning there are only D nonzero elements.
6. Note that you don't have to run the SVD algorithm multiple times to vary D. Run it once and extract the different versions by varying D. That's what's good about SVD. (A sketch follows below.)
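A sketch of the truncation in steps 2-4 for one weight matrix W, using torch.svd as the prompt suggests:

import torch
U, S, V = torch.svd(W)        # W ≈ U @ torch.diag(S) @ V.T, singular values in descending order
def truncate(D):
    return U[:, :D] @ torch.diag(S[:D]) @ V[:, :D].T   # eq. (2): the rank-D approximation
# run SVD once, then slice for each D in {10, 20, 50, 100, 200, D_full};
# the parameter count per layer becomes D*(m + n + 1) instead of m*n for an m-by-n W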
Problem 2: Network Compression Using SVD [2 points]

1. Now you've learned that the low-rank approximation of W^(l) gives you some compression. However, you might not like the performance for the smallest D values. From now on, fix D = 20 and let's improve its performance.
2. Define a NEW network whose weight matrices W^(l) are factorized. Again, this is a new network, different from your baseline in P1. In this new network, you don't estimate W^(l) directly anymore, but its factor matrices, which reconstruct W^(l) as follows: W^(l) = U^(l) V^(l)⊤.
3. In other words, the feedforward is now defined like this:

x^(l+1) ← g(U^(l) V^(l)⊤ x^(l) + b^(l))   (3)

4. But instead of randomly initializing these factor matrices, initialize them using the P1 SVD results for the D = 20 case:

U^(l) ← U^(l)_{:,1:20},   V^(l)⊤ ← S^(l)_{1:20,1:20} (V^(l)_{:,1:20})⊤   (4)

5. Again, note that U and V are the new variables that you need to estimate via optimization. They are fancier though, because they are initialized using the SVD results. If you stop here, you'll get the same test performance as in P1.
6. Finetune this network. It has new parameters to update, i.e., U^(l) and V^(l) (as well as the bias terms). Update them using BP. Since you initialized the new parameters with SVD, which is a pretty good starting point, you may want to use a smaller-than-usual learning rate.
7. Report the test-time classification accuracy. (A sketch of such a factorized layer follows below.)
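A sketch of one factorized layer, initialized from the P1 SVD of the corresponding baseline matrix W; the class name is mine, and I use the batch-of-rows convention rather than the column vectors of eq. (3):

import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    def __init__(self, W, bias, D=20):          # W: (m, n) from the trained baseline
        super().__init__()
        U, S, V = torch.svd(W)
        self.U = nn.Parameter(U[:, :D].detach().clone())    # eq. (4): U <- U_{:,1:20}
        self.SV = nn.Parameter((torch.diag(S[:D]) @ V[:, :D].T).detach().clone())  # (D, n)
        self.b = nn.Parameter(bias.detach().clone())
    def forward(self, x):                       # x: (B, m); activation g applied by the caller
        return x @ self.U @ self.SV + self.b

# finetune with a smaller-than-usual rate, e.g. torch.optim.Adam(net.parameters(), lr=1e-4)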
Problem 3: Network Compression Using SVD [3 points]

1. Another way to improve our D = 20 case is to inform the training process of the SVD approximation. This differs from P1, where SVD was performed once after the network training was completed. This time, we do SVD at every epoch.
2. Initialize W^(l) using the "baseline" model. We will finetune it.
3. This time, for the feedforward pass, you never use W^(l) directly. Instead, you do SVD at every iteration and make sure the feedforward pass always uses Ŵ^(l) = U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})⊤.
4. What that means for the training algorithm is that you should think of the low-rank SVD procedure as an approximation function W^(l) ≈ f(W^(l)) = U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})⊤.
5. Hence, the update for W^(l) involves the derivative f′(W^(l)) due to the chain rule (see M6 S15, where I explained this in the quantization context). You can naïvely assume that your SVD approximation is near perfect (although it's not). Then, at least for BP, you don't have to worry about the gradients, as the derivative will be just one everywhere, because f(x) = x. By doing so, you can feedforward using Ŵ^(l) while the updates are done on W^(l):

Feedforward:   (5)
Perform SVD: W^(l) ≈ U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})⊤   (6)
Perform feedforward: x^(l+1) ← g(U^(l)_{:,1:20} S^(l)_{1:20,1:20} (V^(l)_{:,1:20})⊤ x^(l) + b^(l))   (7)
Backpropagation:   (8)
Update parameters: W^(l) ← W^(l) − η (∂L/∂f(W^(l))) (∂f(W^(l))/∂W^(l))   (9)

Note that ∂f(W^(l))/∂W^(l) = 1 everywhere due to our identity assumption.
6. As the feedforward always uses the SVD'ed version of the weights, the network is aware of the additional error introduced by the compression and can deal with it during training. Implementing this technique requires you to define an SVD routine running in the middle of the feedforward process. Both TF and PT provide SVD implementations you can use:

Tensorflow 2.x: https://www.tensorflow.org/api_docs/python/tf/linalg/svd
PyTorch: https://pytorch.org/docs/stable/generated/torch.svd.html

Although it takes more time to train (because you need to do SVD at every iteration), I like it, as I can boost the performance of the D = 20 compressed network up to around 97%. Considering the amount of memory saving (after the compression it uses only about 2%!), this is a great way to compress your network. (A sketch of this feedforward follows below.)
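One way to realize the identity-gradient trick of steps 3-5 in PyTorch (a sketch; the detach makes the forward pass use Ŵ while the backward pass treats f as the identity, exactly as eq. (9) assumes):

def svd_ff(W, D=20):
    U, S, V = torch.svd(W)
    W_hat = U[:, :D] @ torch.diag(S[:D]) @ V[:, :D].T   # eq. (6)
    return W + (W_hat - W).detach()     # forward value: W_hat; gradient w.r.t. W: identity

# inside the model's forward, per layer:  x = g(x @ svd_ff(W) + b)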
Problem 4: Speaker Verification [4 points]

1. In this problem, we are going to build a speaker verification system. It takes two utterances as input and predicts whether they were spoken by the same speaker (positive class) or not (negative class).
2. trs.pkl contains a 500×16,180 matrix, each of whose rows is a speech signal with 16,180 samples. They are the vectors returned by the librosa.load function. Similarly, tes.pkl holds a 200×22,631 matrix.
3. The training matrix is ordered by speaker. Each speaker has 10 utterances, and there are 50 such speakers (that's why there are 500 rows). Similarly, the test set has 20 speakers, each with 10 utterances.
4. Randomly sample L pairs of utterances from the ten utterances of the first speaker. In theory, there are C(10, 2) = 45 pairs you can sample from (the order of the two utterances within a pair doesn't matter). You can use all 45 of them if you want. These are the positive examples in your first minibatch.
5. Let's construct L negative pairs as well. First, randomly sample L utterances from the other 49 training speakers. Second, randomly sample another L utterances from the first speaker (the speaker you sampled the positive pairs from). Using these two sets, each with L examples, form another set of L pairs. If L > 10, you'll need to reuse the first speaker's utterances (i.e., sampling with replacement). This set is your negative examples, each of whose pairs contains an utterance from the first speaker and a random utterance spoken by a different speaker.
6. The L positive pairs and L negative pairs form your first minibatch. You have 2L pairs of utterances in total.
7. Repeat this process for the other training speakers, so that each speaker is represented by L positive pairs and L negative pairs. By doing so, you can form 50 minibatches with a balanced number of positive and negative pairs.
8. Train a Siamese network that tries to predict 1 for the positive pairs and 0 for the negative ones. In a minibatch, since you have L positive and L negative pairs, your net must predict L ones and L zeros, respectively.
9. I found that STFT on the signals serves as the initial feature extraction process. Therefore, your Siamese network will take as input TWO spectrograms, each of size 513×T. I wouldn't care too much about your choice of network architecture this time (if it works anyway), but it has to somehow predict a fixed-length feature vector for the given sequence of spectra (consequently, TWO fixed-length vectors for the pair of input spectrograms). Using the inner product of the two latent embedding vectors as the input to the sigmoid function, you'll do a logistic regression. Use your imagination and employ whatever techniques you learned in class to design/train this network. (A sketch follows below.)
10. Construct similar batches from the test set, and test the verification accuracy of your network. Report your test-time speaker verification performance. I was able to get a decent result (~70%) with a reasonable network architecture (e.g., a GRU working on STFT), which converged in a reasonable amount of time (i.e., in an hour).
11. Submit your code and accuracy on the test examples.
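A rough sketch of the Siamese verifier in steps 8-9; the architecture is my choice (a GRU summarizing each sequence of 513-dim spectra into one embedding), not the required one:

import torch
import torch.nn as nn

class Siamese(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.gru = nn.GRU(513, hidden, batch_first=True)
    def embed(self, spec):                      # spec: (B, T, 513), |STFT| frames as rows
        _, h = self.gru(spec)
        return h[-1]                            # (B, hidden): fixed-length utterance embedding
    def forward(self, a, b):
        return torch.sigmoid((self.embed(a) * self.embed(b)).sum(dim=1))  # inner product -> sigmoid

# train with binary cross-entropy: target 1 for same-speaker pairs, 0 otherwise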
Problem 5: Speech Denoising Using RNN [4 points]

1. Audio signals naturally contain temporal structure to make use of for the prediction job. Speech denoising is a good example. In this problem, we'll come up with a reasonably complicated RNN implementation for the speech denoising job.
2. homework3.zip contains a folder tr. There are 1,200 noisy speech signals (from trx0000.wav to trx1199.wav) in there. To create this dataset, I started from 120 clean speech signals spoken by 12 different speakers (10 sentences per speaker), and then mixed each of them with 10 different kinds of noise signals. For example, trx0000.wav to trx0009.wav all say the same sentence spoken by the same person, while they are contaminated by different noise signals. I also provide the original clean speech (from trs0000.wav to trs1199.wav) and the noise sources (from trn0000.wav to trn1199.wav) in the same folder. For example, if you add up the two signals trs0000.wav and trn0000.wav, that will make up trx0000.wav, although you don't have to do it because I already did it for you.
3. Load all of them and convert them into spectrograms like you did in homework 2. Don't forget to take their magnitudes. For the mixtures (trxXXXX.wav), you'll see that there are 1,200 nonnegative matrices whose number of rows is 513, while the number of columns depends on the length of the original signal. Ditto for the speech and noise sources. Eventually, you'll construct three lists of magnitude spectrograms with variable lengths: |X^(l)_tr|, |S^(l)_tr|, and |N^(l)_tr|, where l denotes one of the 1,200 examples.
4. The |X^(l)_tr| matrices are your input to the RNN for training. An RNN (either GRU or LSTM is fine) will consider each as a sequence of 513-dimensional spectra. For each of the spectra, you want to make a prediction for the speech denoising job.
5. The target of the training procedure is something called Ideal Binary Masks (IBM). You can easily construct an IBM matrix per spectrogram as follows:

M^(l)_{f,t} = 1 if |S^(l)_tr|_{f,t} > |N^(l)_tr|_{f,t};  0 if |S^(l)_tr|_{f,t} ≤ |N^(l)_tr|_{f,t}.   (10)

The IBM assumes that each time-frequency bin at (f, t), an element of the |X^(l)_tr| matrix, comes from either speech or noise. Although this is not the case in the real world, it works like a charm most of the time via this operation:

S^(l)_tr ≈ Ŝ^(l)_tr = M^(l) ⊙ X^(l)_tr.   (11)

Note that the masking is applied to the complex-valued input spectrograms. Also, since masking is element-wise, M^(l) and X^(l)_tr have the same size. Eventually, your RNN will learn a function that approximates this relationship:

M^(l)_{:,t} ≈ M̂^(l)_{:,t} = RNN(|X^(l)_tr|_{:,1:t}; W),   (12)

where W denotes the network parameters to be estimated.
6. Train your RNN using this training dataset. Feel free to use whatever LSTM or GRU cells are available in Tensorflow or PyTorch. I find dropout helpful, but you may want to be gentle about the dropout ratio. I didn't need a too-complicated network structure to beat a fully-connected network.
7. Implementation note: In theory, you must be able to feed an entire sentence (one of the X^(l)_tr matrices) as an input sequence; you know, in RNNs a sequence is an input sample. On top of that, you still want to do mini-batching. Therefore, your mini-batch is a 3D tensor, not a matrix. For example, in my implementation, I collect ten spectrograms, e.g., X^(0)_tr to X^(9)_tr, to form a 513×T×10 tensor (where T is the number of columns in the matrix). So you can think of the mini-batch size as 10, while each example in the batch is not a multidimensional feature vector but a sequence of them. This tensor is the mini-batch input to my network. Instead of feeding the full sequence as an input, you can segment the input matrix into smaller pieces, say 513×T_trunc×N_mb, where T_trunc is the fixed length at which to truncate the input sequences and N_mb is the number of such truncated sequences in a mini-batch, so that the recurrence is limited to T_trunc during training. In practice this doesn't make a big difference, so either way is fine. Note that during test time the recurrence runs from the beginning of the sequence to the end (which means you don't need truncation for testing and validation).
8. I also provide a validation set in the folder v. Check the performance of your network on this dataset. Of course you'll need to watch the validation loss, but eventually you'll need to check the SNR values. For example, for a recovered validation sequence in the STFT domain, Ŝ^(l)_v = M̂^(l) ⊙ X^(l)_v, you'll perform an inverse STFT using librosa.istft to produce a time-domain waveform ŝ(t). Normally for this dataset, a well-tuned fully-connected net gives slightly above 10 dB SNR, so your validation set should give you a number larger than that. Once again, you don't need to come up with a too-large network. Start from a small one. (A sketch of the IBM targets and this reconstruction follows below.)
9. We'll test the performance of your network on the test data. I provide some test signals in te, but not their corresponding sources, so you can't calculate SNR values for the test signals. Submit your recovered test speech signals in a zip file, i.e., the speech denoising results on the signals in te. We'll calculate SNR based on the ground-truth speech we set aside from you.
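A sketch of the IBM targets (eq. (10)) and the reconstruction in step 8. The 513 rows imply n_fft=1024; the hop length is my assumption, matching homework 2, and M_hat stands for the RNN's predicted mask:

import numpy as np
import librosa
S = librosa.stft(s, n_fft=1024, hop_length=512)     # clean speech, complex (513, T)
N = librosa.stft(n, n_fft=1024, hop_length=512)     # noise
X = librosa.stft(x, n_fft=1024, hop_length=512)     # mixture; the RNN input is np.abs(X)
M = (np.abs(S) > np.abs(N)).astype(np.float32)      # Ideal Binary Mask, the training target
# at validation/test time, with the predicted mask M_hat of shape (513, T):
s_hat = librosa.istft(M_hat * X, hop_length=512)    # eq. (11) masking, back to a waveform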


[SOLVED] Deep learning systems (engr-e 533) homework 2

Problem 1: Speech Denoising Using Deep Learning [3 points]

1. If you took my MLSP course, you may think that you've seen this problem. But it's actually somewhat different from what you did before, so read carefully. And this time you SHOULD implement a DNN with at least two hidden layers.
2. When you attended IUB, you took a course taught by Prof. K. Since you really liked his lectures, you decided to record them without the professor's permission. You felt awkward, but you did it anyway because you really wanted to review his lectures later.
3. Although you meant to review the lecture every time, it turned out that you never listened to it. After graduation, you realized that a lot of concepts you face at work were actually covered by Prof. K's class. So, you decided to revisit the lectures and study the materials once again using the recordings.
4. You should have reviewed your recordings earlier. It turned out that a fellow student who used to sit next to you always ate chips in the middle of the class right beside your microphone. So, Prof. K's beautiful deep voice was contaminated by the annoying chip-eating noise.
5. But you vaguely recall that you learned some things about speech denoising and source separation in Prof. K's class. So, you decided to build a simple deep learning-based speech denoiser that takes a noisy speech spectrum (speech plus chip-eating noise) and produces a cleaned-up speech spectrum.
6. Since you don't have Prof. K's clean speech signal, I prepared this male speech data recorded by other people. train_dirty_male.wav and train_clean_male.wav are the noisy speech and its corresponding clean speech you are going to use for training the network. Take a listen to them. Load them and convert them into spectrograms, the matrix representation of signals. To do so, you'll need to install librosa and use it as follows:

!pip install librosa  # in colab, you'll need to install this
import librosa
s, sr = librosa.load('train_clean_male.wav', sr=None)
S = librosa.stft(s, n_fft=1024, hop_length=512)
sn, sr = librosa.load('train_dirty_male.wav', sr=None)
X = librosa.stft(sn, n_fft=1024, hop_length=512)

This will give you two matrices S and X of size 513×2459. This procedure is called the Short-Time Fourier Transform.
7. Take their magnitudes by using np.abs() or whatever other suitable method, because S and X are complex-valued. Let's call them |S| and |X|.
8. Train a fully-connected deep neural network. A couple of hidden layers would work, but feel free to try out whatever structure, activation function, or initialization scheme you'd like. The input to the network is a column vector of |X| (a 513-dim vector) and the target is its corresponding column in |S|. You may want to do some mini-batching for this. Make use of whatever functions are in Tensorflow or Pytorch.
9. But remember that your network should predict nonnegative magnitudes as output. Use a proper activation function in the last layer to make sure of that. I don't care which activation function you use in the middle layers. (A sketch follows below.)
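A minimal PyTorch counterpart of steps 8-9; the depth and widths are my choices, and any nonnegative-output activation works for the last layer:

import torch.nn as nn
net = nn.Sequential(
    nn.Linear(513, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 513), nn.Softplus(),   # nonnegative magnitude output (ReLU also works)
)
# train with, e.g., MSE between net(|X|.T) and |S|.T, one spectrum per row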
10. test_01_x.wav is the noisy signal for validation. Load it and apply the STFT as before. Feed the magnitude spectra of this test mixture, |X_test|, to your network and predict the clean magnitude spectra |Ŝ_test|. Then, you can recover the (complex-valued) speech spectrogram of the test signal in this way:

Ŝ_test = (X_test ⊘ |X_test|) ⊙ |Ŝ_test|,   (1)

which means you take the phase information of the input noisy signal, X_test ⊘ |X_test|, and use it to recover the clean speech. ⊙ stands for the Hadamard product, and the division ⊘ is element-wise, too.
11. Recover the time-domain speech signal by applying an inverse STFT to Ŝ_test, which will give you a vector. Let's call this cleaned-up test speech signal ŝ_test. I'll calculate something called the Signal-to-Noise Ratio (SNR) by comparing it with the ground-truth speech I didn't share with you. It should be reasonably good. You can actually write it out by using the following code:

librosa.output.write_wav('test_s_01_recons.wav', sh_test, sr)

or

import soundfile as sf
sf.write('test_s_01_recons.wav', sh_test, sr)

12. You can compute SNR if you know the ground-truth source. Load test_01_s.wav. This is the ground-truth clean signal buried in test_01_x.wav. Compute the SNR of the predicted validation signal by comparing it to test_01_s.wav, but do not include this example in your training process. Once the training process is done, or even in the middle of training epochs, apply your model to this validation example and compute the SNR value. That way, you can simulate the testing environment, although it doesn't guarantee that the model will work well on the test example, because the validation example can be different from the test set. This approach is related to the early stopping technique explained in M03 S37. Use this validation signal to prevent overfitting. By the way, SNR is defined as follows:

SNR = 10 log_10 ( Σ_t s²(t) / Σ_t (s(t) − ŝ(t))² ),   (2)

where s(t) and ŝ(t) are the ground-truth clean speech and the recovered one in the time domain, respectively. Be careful with the division and logarithm: you don't want your denominator, or anything inside the log function, to be zero. Adding a very small number, e.g., 1e−20, is a good idea to prevent that. (A sketch follows below.)
13. Do the same testing procedure for test_02_x.wav, which actually contains Prof. K's voice along with the chip-eating noise. Enjoy his enhanced voice using your DNN.
14. Grading will be based on the denoised version of test_02_x.wav, so submit that audio file.
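A numpy sketch of eqs. (1)-(2), where S_mag_hat stands for the network's predicted |Ŝ_test| (my name, not the assignment's):

import numpy as np
import librosa
S_hat = X_test / (np.abs(X_test) + 1e-20) * S_mag_hat   # eq. (1): noisy phase, clean magnitude
s_hat = librosa.istft(S_hat, hop_length=512)

def snr(s, s_hat):                                      # eq. (2)
    n = min(len(s), len(s_hat))                         # the istft may trim a few samples
    s, s_hat = s[:n], s_hat[:n]
    return 10 * np.log10(np.sum(s**2) / (np.sum((s - s_hat)**2) + 1e-20))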
Problem 2: Speech Denoising Using 1D CNN [4 points]

1. As an audio guy it's sad to admit, but a lot of audio signal processing problems can be solved in the time-frequency domain, an image version of the audio signal. You've learned how to do it in the previous homework by using the STFT and its inverse.
2. What that means is that nothing stops you from applying a CNN to the same speech denoising problem. In this question, I'm asking you to implement a 1D CNN that does the speech denoising job in the STFT magnitude domain. 1D CNN here means a variant of CNN that convolves along only one of the axes; in our case it's the frequency axis.
3. Like you did in Problem 1, install/load librosa. Take the magnitude spectrograms of the dirty signal and the clean signal, |X| and |S|.
4. Both in Tensorflow and PyTorch, you'd better transpose this matrix, so that each row of the matrix is a spectrum. Your 1D CNN will take one of these row vectors as an example, i.e., |X|⊤_{i,:}. Since this is not an RGB image with three channels, nor will you use any information other than the magnitudes during training, your input image has only one channel (depth-wise). Coupled with your choice of minibatch size, the dimensionality of your minibatch would be [(batch size) × (number of channels) × (height) × (width)] = [B × 1 × 1 × 513]. Note that, depending on the implementation of the 1D CNN layers in TF or PT, it's okay to omit the height dimension. Carefully read the definition of the function you'll use.
5. You'll also need to define the size of the kernel, which will be 1×D, or simply D, depending on the implementation (because we know there's no convolution along the height axis).
6. If you define K kernels in the first layer, the output feature map's dimension will be [B × K × 1 × (513 − D + 1)]. You don't need too many kernels, but feel free to investigate. You don't need too many hidden layers, either.
7. In the end, you have to produce an output matrix of [B × 513], the approximation of the clean magnitude spectra of the batch. It's a dimension hard to match using a CNN alone, unless you take care of the edges by padding zeros (let's not do zero-padding for this homework). Hence, you may want to flatten the last feature map into a vector, and add a regular linear layer to reduce that dimensionality down to 513.
8. Meanwhile, although this flattening-followed-by-linear-layer approach should work in theory, the dimensionality of your flattened CNN feature map might be too large. To handle this issue, we will use a concept we learned in class, striding: a stride larger than 1 reduces the dimensionality after each CNN layer. You could consider this option in all convolutional layers to gradually reduce the size of the feature maps, so that the input dimensionality of the last fully-connected (FC) layer is manageable. Max-pooling, coupled with the striding technique, would be something to consider.
9. Be very careful about this dimensionality, because you have to define the input and output dimensionality of the FC layer in advance. For example, a stride of 2 pixels will reduce the feature dimension down to roughly 50%, though not exactly if the original dimensionality is an odd number.
10. Don't forget to apply the activation function of your choice at every layer, especially the last layer.
11. Try whatever optimization techniques you've learned so far.
12. Check the quality of the test signal you used in P1. Submit the denoised signal. (A shape-checked sketch follows below.)
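One shape-consistent example of the pipeline in steps 4-9; the kernel sizes, strides, and channel counts are my choices, and only the 513-in/513-out contract is fixed by the problem:

import torch.nn as nn
cnn = nn.Sequential(                         # input: (B, 1, 513), one channel per spectrum
    nn.Conv1d(1, 16, kernel_size=16, stride=2), nn.ReLU(),   # -> (B, 16, 249)
    nn.Conv1d(16, 32, kernel_size=16, stride=2), nn.ReLU(),  # -> (B, 32, 117)
    nn.Flatten(),                                            # -> (B, 3744)
    nn.Linear(32 * 117, 513), nn.Softplus(),                 # -> (B, 513), nonnegative
)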
Problem 3: Data Augmentation [4 points]

1. CIFAR10 is a pretty straightforward image classification task consisting of 10 visual object classes.
2. Download the data from https://www.cs.toronto.edu/~kriz/cifar.html and be ready to use it. Both PyTorch and Tensorflow have options to conveniently load it, but I chose to download it directly and mess around, because I found that easier.
3. Set aside 5,000 training examples for validation.
4. Build your baseline CNN classifier.
(a) The images need to be reshaped into 32×32×3 tensors.
(b) Each pixel is an integer with 8-bit encoding (from 0 to 255). Transform them down to floating point with a range of [0, 1]: 0 means a black pixel and 1 a white one.
(c) People like to rescale the pixels to [−1, 1] so that the input to the CNN is well centered around 0 instead of 0.5.
(d) I know you are eager to try out a fancier net architecture, but let's stick to this simple one:
  1st 2D conv layer: 10 kernels of size 5×5×3; stride = 1
  Max-pooling: 2×2 with stride = 2
  2nd 2D conv layer: 10 kernels of size 5×5×10; stride = 1
  Max-pooling: 2×2 with stride = 2
  1st fully-connected layer: [flattened final feature map] × 20
  2nd fully-connected layer: 20 × 10
  Softmax on the 10 classes
Let's stick to ReLU for activation and the He initializer.
(e) Train this net with an Adam optimizer with the default initial learning rate (i.e., 0.001). Check the validation accuracy at the end of every epoch. Report your validation accuracy over the epochs as a graph. This is the performance of your baseline system.
5. Build another classifier using an augmented dataset. Prepare four different datasets out of the original CIFAR10 training set (except for the 5,000 you set aside for validation):
(a) I know you already changed the scale of the pixels from [0, 255] to [−1, +1]. Let's go back to the intermediate range, [0, 1].
(b) Augmented dataset #1: Brighten every pixel in every image by 10%, e.g., by multiplying by 1.1. Make sure, though, that they don't exceed 1. For example, you may want to do something like np.minimum(1.1*X, 1).
(c) Augmented dataset #2: Darken every pixel in every image by 10%, e.g., by multiplying by 0.9.
(d) Augmented dataset #3: Flip all images horizontally (not upside down), as if they were mirrored.
(e) Augmented dataset #4: The original training set.
(f) Merge the four augmented datasets into one gigantic training set. Since there are 45,000 images in the original training set (after excluding the validation set), after the augmentation you have 45,000×4 = 180,000 images. Each original image has four different versions: brighter, darker, horizontally flipped, and original. Note that the four share the same label: a darker frog is still a frog.
(g) Don't forget to scale back to [−1, +1]. (A sketch of this augmentation follows after this list.)
(h) You'd better visualize a few images after the augmentation to make sure what you did is correct.
(i) Train a fresh new network with the same architecture, but using this augmented dataset. Record the validation accuracy over the epochs.
6. Overlay the validation accuracy curve from the baseline with the new curve recorded from the augmented dataset. I ran 200 epochs for both experiments and was able to see convincing results (i.e., the data augmentation improves the validation performance).
7. In theory you should conduct a test run on the test set, but let's forget about it.
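A compact version of the augmentation in step 5, assuming X holds the 45,000 training images as an (N, 32, 32, 3) float array in [0, 1] and y holds the labels:

import numpy as np
bright = np.minimum(1.1 * X, 1.0)           # dataset #1: 10% brighter, clipped at 1
dark = 0.9 * X                              # dataset #2: 10% darker
mirror = X[:, :, ::-1, :]                   # dataset #3: horizontal flip (reverse the width axis)
X_aug = np.concatenate([bright, dark, mirror, X]) * 2 - 1   # 180,000 images, rescaled to [-1, 1]
y_aug = np.concatenate([y, y, y, y])        # a darker frog is still a frog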
Problem 4: Self-Supervised Learning via Pretext Tasks [4 points]

1. Suppose that you have only 50 labeled examples per class for your CIFAR10 classification problem, totaling 500 training images. Presumably it would be tough to achieve high performance in this situation.
2. Set aside 500 examples from your training set (I chose the last 500 examples).
3. The pretext task:
(a) On the other hand, we will assume that the rest of the 49,500 training examples are unlabeled. We will create a bogus classification problem using them. Let these unlabeled examples (i.e., the examples whose original labels you disregard) be "class 0".
(b) "class 1": Create a new class by vertically flipping all the images upside down.
(c) "class 2": Create another class by rotating the images 90 degrees counterclockwise.
(d) Now you have three classes, each of which contains 49,500 labeled examples. (A sketch of this construction follows after this problem.)
(e) This is not a classification problem one can be serious about, but the idea here is that a classifier trained to solve it may need to learn some features that will be helpful for the original CIFAR10 classification problem.
(f) Train a network with the same setup/architecture described in Problem 3. In theory you need to validate every now and then to prevent overfitting, but who cares about this dummy problem? Let's forget about it and just run about a hundred epochs.
(g) Store your model somewhere safe. Both TF and PT provide a nice way to save the net parameters.
4. The baseline:
(a) Train a classifier from scratch on the 500 CIFAR10 examples you set aside in the beginning. Note that they are for the original 10-class classification problem, and you ARE doing the original CIFAR10 classification, except that you use a ridiculously small dataset. Let's stick to the same architecture/setup. You may need to choose a reasonable initializer, e.g., the He initializer. You know, since the training set is so small, you may not even have to do batching.
(b) Let's cheat here and use the test set of 10,000 examples as if it were our validation set. If you check the test accuracy at every 100th epoch, you will see it overfit at some point. Record the accuracy values over the iterations.
5. The transfer learning task:
(a) Train a third classifier on the 500 CIFAR10 examples you set aside in the beginning. Again, note that they are for the original 10-class classification problem.
(b) Instead of using an initializer, you will reload the weights from the pretext network. Yes, that's exactly the definition of transfer learning. But because you learned it from an unlabeled set, and had to create a pretext task to do so, it falls into the category of self-supervised learning.
(c) Note that you can transfer in all the parameters except for the final softmax layer, as the pretext task has only 3 classes. Let's randomly initialize the last layer's parameters with He.
(d) You need to reduce the learning rates for transfer learning in general. More importantly, for the parameters you transfer in, the rate has to be substantially lower than 1×10⁻³, e.g., 1×10⁻⁵ or 1×10⁻⁶. Meanwhile, the last softmax layer will prefer the default learning rate of 1×10⁻³, as it's randomly initialized.
(e) Report your test accuracy at every 100th epoch.
6. Draw two graphs from the two experiments, the baseline and the finetuning method, and compare the results. For your information, I ran both of them for 10,000 epochs and recorded the validation accuracy (actually, the test accuracy, as I used the test set) at every 100th epoch. Of course, the point is that the self-supervised features should give an improvement.
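A sketch of the pretext classes in step 3 and the two learning rates in step 5(d); X_unlab, transferred_layers, and final_layer are my placeholder names for your data and sub-networks:

import numpy as np
import torch
X0 = X_unlab                                  # "class 0": the 49,500 originals, (N, 32, 32, 3)
X1 = X_unlab[:, ::-1, :, :]                   # "class 1": upside-down (reverse the height axis)
X2 = np.rot90(X_unlab, k=1, axes=(1, 2))      # "class 2": 90 degrees counterclockwise
# ... train on these three classes, save the weights, reload them for the transfer run ...
opt = torch.optim.Adam([
    {'params': transferred_layers.parameters(), 'lr': 1e-5},  # transferred-in weights
    {'params': final_layer.parameters(), 'lr': 1e-3},         # fresh softmax layer
])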


[SOLVED] Deep learning systems (engr-e 533) homework 1

1. Replicate the test accuracy graph on M02-S09.
2. Show me your weight visualization, too.
3. Please do not use any advanced optimization methods (Adam, batch norm, dropout, etc.) or initialization methods (Xavier and so on). Plain SGD should just work.
4. In TF 2.x, you can do something like this to download the MNIST dataset:

mnist = tf.keras.datasets.mnist

In PT, you can use these lines (don't worry about the batch size and normalization; you can go with your own options for them):

import torchvision
mnist_train = torchvision.datasets.MNIST('mnist', train=True, download=True,
    transform=torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize((0.1307,), (0.3081,))
    ]))
mnist_test = torchvision.datasets.MNIST('mnist', train=False, download=True,
    transform=torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize((0.1307,), (0.3081,))
    ]))

Problem 2: Autoencoders [4 points]

1. Replicate the test accuracy graph on M02-S12.
2. That means you also want to show the figures in M02-S11.
3. Note that your encoder weights are frozen; you only update the softmax layer weights (the 100×10 matrix and the bias).

Problem 3: A shallow NN [3 points]

1. Replicate the test accuracy graph on M02-S14.
2. I don't have to see the visualization of the first layer. Just show me your graphs.

Problem 4: Full BP on both layers [6 points]

1. Replicate the test accuracy graph on M02-S17.

Replicate the figures in M03 Adult Optimization, slide 22, using the following details:

1. Use the same network architecture and train five different network instances in five different setups. The architecture has to be a fully-connected network (a regular network, not a CNN or RNN) with five hidden layers and 512 hidden units per layer.
2. Create five different networks that share the same architecture, as follows (one way to set these up is sketched after this problem):
(a) Activation function: the logistic sigmoid; initialization: random numbers generated from the normal distribution (µ = 0, σ = 0.01)
(b) Activation function: the logistic sigmoid; initialization: Xavier initializer
(c) Activation function: ReLU; initialization: random numbers generated from the normal distribution (µ = 0, σ = 0.01)
(d) Activation function: ReLU; initialization: Xavier initializer
(e) Activation function: ReLU; initialization: Kaiming He's initializer
3. You don't have to implement your own initializer. Both TF and PT come with pre-implemented initializers.
4. Train them with traditional SGD. Do not improve SGD by introducing momentum or any other advanced stuff. Your goal is to replicate the figures on slide 22. Feel free to use a pre-implemented SGD optimizer.
5. In practice, you will need to investigate different learning rates for SGD, which will give you different convergence behaviors.
6. Don't worry if your graphs are slightly different from mine. We will give a full mark if your graphs show the same trend.
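One way to set up the five activation/initializer pairs of the last problem in PyTorch (both TF and PT ship all three initializers):

import torch.nn as nn

def make_net(act, init):
    layers, dim = [], 784
    for _ in range(5):                        # five hidden layers, 512 units each
        lin = nn.Linear(dim, 512)
        init(lin.weight)
        layers += [lin, act()]
        dim = 512
    layers.append(nn.Linear(512, 10))
    return nn.Sequential(*layers)

normal01 = lambda w: nn.init.normal_(w, mean=0.0, std=0.01)
setups = [make_net(nn.Sigmoid, normal01),
          make_net(nn.Sigmoid, nn.init.xavier_uniform_),
          make_net(nn.ReLU, normal01),
          make_net(nn.ReLU, nn.init.xavier_uniform_),
          make_net(nn.ReLU, nn.init.kaiming_uniform_)]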


[SOLVED] Csci 3104 problem set 10

1. (15 pts total) A matching in a graph G is a subset E_M ⊆ E(G) of edges such that each vertex touches at most one of the edges in E_M. Recall that a bipartite graph is a graph G on two sets of vertices, V1 and V2, such that every edge has one endpoint in V1 and one endpoint in V2. We sometimes write G = (V1, V2; E) for this situation. For example:

[Figure: a bipartite graph with V1 = {1, 2, 3, 4, 5, 6} drawn above V2 = {7, 8, 9, 10, 11}.]

The edges in the above example consist of all the lines, whether solid or dotted; the solid lines form a matching. The bipartite maximum matching problem is to find a matching in a given bipartite graph G that has the maximum number of edges among all matchings in G.

(a) Prove that a maximum matching in a bipartite graph G = (V1, V2; E) has size at most min{|V1|, |V2|}.

(b) Show how you can use an algorithm for max-flow to solve bipartite maximum matching on undirected simple bipartite graphs. That is, give an algorithm which, given an undirected simple bipartite graph G = (V1, V2; E), (1) constructs a directed, weighted graph G′ (which need not be bipartite) with weights w : E(G′) → R as well as two vertices s, t ∈ V(G′), (2) solves max-flow for (G′, w), s, t, and (3) uses the solution for max-flow to find the maximum matching in G. Your algorithm may use any max-flow algorithm as a subroutine. (A sketch of the standard construction follows below.)

(c) Show the weighted graph constructed by your algorithm on the example bipartite graph above.
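For part (b), a sketch of the standard construction in Python, using networkx's max-flow as the subroutine; it assumes edges are given as (u, v) pairs with u ∈ V1, and that vertex names differ from 's' and 't':

import networkx as nx

def max_matching(V1, V2, edges):
    G = nx.DiGraph()
    G.add_edges_from((('s', u) for u in V1), capacity=1)   # source to every left vertex
    G.add_edges_from(edges, capacity=1)                    # left to right, capacity 1 each
    G.add_edges_from(((v, 't') for v in V2), capacity=1)   # every right vertex to sink
    value, flow = nx.maximum_flow(G, 's', 't')
    return [(u, v) for (u, v) in edges if flow[u][v] == 1] # saturated middle edges = matching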
2. (20 pts total) In the review session for his Deep Wizarding class, Dumbledore reminds everyone that the logical definition of NP requires that the number of bits in the witness w is polynomial in the number of bits of the input n. That is, |w| = poly(n). With a smile, he says that in beginner wizarding, witnesses are usually only logarithmic in size, i.e., |w| = O(log n).

(a) Because you are a model student, Dumbledore asks you to prove, in front of the whole class, that any such property is in the complexity class P.

(b) Well done, Dumbledore says. Now, explain why the logical definition of NP implies that any problem in NP can be solved by an exponential-time algorithm.

(c) Dumbledore then asks the class: "So, is NP a good formalization of the notion of problems that can be solved by brute force? Discuss." Give arguments for both possible answers.

3. (30 pts total) The Order of the Phoenix is trying to arrange to watch all the corridors in Hogwarts, to look out for any Death Eaters. Professor McGonagall has developed a new spell, Multi-Directional Sight, which allows a person to get a 360-degree view of where they are currently standing. Thus, if they are able to place a member of the Order at every intersection of hallways, they'll be able to monitor all hallways. In order not to spare any personnel, they want to place as few people as possible at intersections, while still being able to monitor every hallway. (And they really need to monitor every hallway, since Death Eaters could use Apparition to teleport into an arbitrary hallway in the middle of the school.) Call a subset S of intersections "safe" if, by placing a member of the Order at each intersection in S, every hallway is watched.

(a) Formulate the above as an optimization problem on a graph. Argue that your formulation is an accurate reflection of the problem. In your formulation, show that the following problem is in NP: given a graph G and an integer k, decide whether there is a safe subset of size ≤ k.

(b) Consider the following greedy algorithm to find a safe subset:

S = empty
mark all hallways unwatched
while there is an unwatched hallway
    pick any unwatched hallway; let u,v be its endpoints
    add u to S
    for all hallways h with u as one of its endpoints
        mark h watched
    end
end

Although this algorithm need not find the minimum number of people needed to cover all hallways, prove that it always outputs a safe set, and prove that it always runs in polynomial time.

(c) Note that, in order to be polynomial-time, an algorithm for this problem cannot simply try all possible subsets of intersections. Prove why not.

(d) Give an example where the algorithm from (3b) outputs a safe set that is strictly larger than the smallest one. In other words, give a graph G, give a list of vertices in the order in which they are picked by the algorithm, and a safe set in G which is strictly smaller than the safe set output by the algorithm.

(e) Consider the following algorithm:

S = empty
mark all hallways unwatched
while there is an unwatched hallway
    pick any unwatched hallway; let u,v be its endpoints
    add u,v to S
    for all hallways h with u or v as one of their endpoints
        mark h watched
    end
end

Prove that this algorithm always returns a safe set, and runs in polynomial time. (A transcription into Python follows at the end of this problem.)

(f) In any safe set of intersections, each hallway is watched by at least one member of the Order. Use this to show that the algorithm from (3e) always outputs a safe set whose size is no more than twice the size of the smallest safe set. Note: you don't need to know what the smallest safe set is to prove this! All you need is the fact stated here. This is called a "2-approximation algorithm," because it is guaranteed to output a solution that is no worse than a factor of 2 times an optimal solution.

(g) Does the algorithm from (3b) always produce a safe set no bigger than that produced by the algorithm in (3e)? If so, give a proof; if not, give a counterexample. A counterexample here consists of a graph, and for each algorithm the list of vertices it chooses in the order it chooses them, such that the safe set output by algorithm (3b) is at least as large as the safe set output by algorithm (3e). If you are unable to give either a proof or a counterexample, then for partial credit give a plausible intuitive argument for your answer.

(h) Compare the greedy algorithm from (3e) with the greedy algorithm from (3b). Which runs faster asymptotically? Which of these two algorithms would you rather use to solve the Order of the Phoenix's problem, and why?

(i) This problem is, in fact, NP-complete. Why does the 2-approximation polynomial-time algorithm from (3e) not show that P=NP?¹

¹Interestingly, it is known that if there were a 1.3606…-approximation algorithm for this problem in polynomial time, then it would follow that P=NP, but that is a very nontrivial theorem. Under a standard complexity-theoretic assumption, even a 1.99999-approximation algorithm in polynomial time would imply P=NP; but this assumption remains a conjecture, and opinion in the research community is divided on whether it is true or false. We will provide references to these results after the problem set has been handed in.
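For concreteness, here is a Python transcription of the algorithm in (3e), with G given as an edge list; the nested scan keeps it obviously polynomial:

def safe_set(edges):
    S, watched = set(), set()
    for (u, v) in edges:                  # "pick any unwatched hallway"
        if (u, v) not in watched:
            S.update([u, v])              # add both endpoints
            watched.update(e for e in edges if u in e or v in e)
    return S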
4. (20 pts extra credit) Every young wizard learns the classic NP-complete problem of determining whether some unweighted, undirected graph G = (V, E) contains a simple path of length at least k (where both G and k are part of the input to the problem), known as the Longest Path Problem. Recall that a simple path is a path (v_1, v_2, …, v_ℓ) where each (v_i, v_{i+1}) in the path is an edge and all the v_i are distinct; its length is ℓ − 1 (the number of edges in the path).

(a) Ginny Weasley is working on a particularly tricky instance of this problem for her Witchcraft and Algorithms class, and she believes she has written down a "witness" for a particular input (G, k) in the form of a path P on its vertices. Explain how she should verify in polynomial time whether P is or is not a simple path of length ≥ k. (And hence, demonstrate that the Longest Path problem is in the complexity class NP.)

(b) For the final exam in Ginny's class, each student must visit the Oracle's Well in the Forbidden Forest. For every bronze Knut a young wizard tosses into the Well, the Oracle will give a yes or no response as to whether, given an arbitrary graph G and an integer k, G contains a simple path of length ≥ k. Ginny is given an arbitrary graph G and must find the longest simple path in G. First, she realizes it would be useful to determine the length of the longest simple path. Describe an algorithm that will allow Ginny to use the Oracle to find the length of the longest simple path in G by asking it a series of questions, each involving a modified version of the original graph G and a number k. Her solution must not cost more Knuts than a number that grows polynomially as a function of the number of vertices in G. (Hence, prove that if we can solve the Longest Path decision problem in polynomial time, we can solve its optimization problem as well.)

(c) Next, once she knows the length ℓ of the longest simple path in G, Ginny must use the Oracle to actually find a path of length ℓ. Describe an algorithm that will allow Ginny to use the Oracle to find the longest simple path in G by asking it a series of questions, each involving a modified version of the original graph G and a number k of her choosing (for each question she can ask about a different graph G and a different number k). Her solution must not cost more Knuts than a number that grows polynomially as a function of the number of vertices in G. (Hence, prove that if we can solve the Longest Path decision problem in polynomial time, we can solve its search problem as well.)
5. (20 pts extra credit) Recall that the MergeSort algorithm (Chapter 2.3 of CLRS) is a sorting algorithm that takes Θ(n log n) time and Θ(n) space. In this problem, you will implement and instrument MergeSort, then perform a numerical experiment that verifies this asymptotic analysis. There are two functions to implement and one experiment to run.

(i) MergeSort(A,n) takes as input an unordered array A of length n, and returns both an in-place sorted version of A and a count t of the number of atomic operations performed by MergeSort.

(ii) randomArray(n) takes as input an integer n and returns an array A such that for each 0 ≤ i < n, A[i] is a uniformly random integer between 1 and n. (It is okay if A is a random permutation of the first n positive integers; see the end of Chapter 5.3.)

(a) From scratch, implement the functions MergeSort and randomArray. You may not use any library functions that make their implementation trivial. You may use a library function that implements a pseudorandom number generator in order to implement randomArray. Submit a paragraph that explains how you instrumented MergeSort, i.e., which operations you counted and why these are the correct ones to count.

(b) For each n in {2^4, 2^5, …, 2^26, 2^27}, run MergeSort(randomArray(n), n) five times and record the tuple (n, ⟨t⟩), where ⟨t⟩ is the average number of operations your function counted over the five repetitions. Use whatever software you like to make a line plot of these 24 data points; overlay on your data a function of the form T(n) = A·n·log n, where you choose the constant A so that the function is close to your data. Hint 1: to make the trend easier to see, use a log-log plot. Hint 2: make sure that your MergeSort implementation uses only two arrays of length n to do its work. (For instance, don't do recursion with pass-by-value.)
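One reasonable instrumentation, sketched below in Python, counts element comparisons and array writes inside the merge; this particular choice of atomic operations is our assumption, and your write-up must justify whichever choice you make.

    import random

    def randomArray(n):
        """n uniformly random integers in [1, n]."""
        return [random.randint(1, n) for _ in range(n)]

    def MergeSort(A, n):
        """Sorts A in place; returns (A, t), where t counts comparisons and
        writes. Uses a single auxiliary buffer of length n (no pass-by-value)."""
        buf = [0] * n
        t = 0
        def sort(lo, hi):              # sorts the slice A[lo:hi]
            nonlocal t
            if hi - lo <= 1:
                return
            mid = (lo + hi) // 2
            sort(lo, mid)
            sort(mid, hi)
            i, j = lo, mid             # merge the two sorted halves into buf
            for k in range(lo, hi):
                t += 1                 # one comparison (or boundary check)
                if i < mid and (j >= hi or A[i] <= A[j]):
                    buf[k] = A[i]; i += 1
                else:
                    buf[k] = A[j]; j += 1
                t += 1                 # one write
            A[lo:hi] = buf[lo:hi]      # copy the merged run back
        sort(0, n)
        return A, t

    # e.g., one data point of the experiment in (b):
    n = 2 ** 10
    avg = sum(MergeSort(randomArray(n), n)[1] for _ in range(5)) / 5
    print(n, avg)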


[SOLVED] Csci 3104 problem set 9

1. (10 pts) Let G = (V,E) be a graph with an edge-weight function w, and let the tree T ⊆ E be a minimum spanning tree on G. Now suppose that we modify G slightly by decreasing the weight of exactly one of the edges (x,y) ∈ T in order to produce a new graph G′. Here, you will prove that the original tree T is still a minimum spanning tree for the modified graph G′. To get started, let k be a positive number and define the weight function w′ as

    w′(u,v) = w(u,v)        if (u,v) ≠ (x,y)
    w′(u,v) = w(x,y) − k    if (u,v) = (x,y)

Now prove that the tree T is a minimum spanning tree for G′, whose edge weights are given by w′.

2. (20 pts) Professor Snape gives you the following unweighted graph and asks you to construct a weight function w on the edges, using positive integer weights only, such that the following conditions hold regarding minimum spanning trees and single-source shortest-path trees:

• The MST is distinct from any of the seven SSSP trees.
• The order in which Jarník/Prim's algorithm adds the safe edges is different from the order in which Kruskal's algorithm adds them.
• Borůvka's algorithm takes at least two rounds to construct the MST.

Justify your solution by (i) giving the edge weights, (ii) showing the corresponding MST and all the SSSP trees, and (iii) giving the order in which edges are added by each of the three algorithms. (For Borůvka's algorithm, be sure to denote which edges are added simultaneously in a single round.)

[Figure: the seven-vertex graph on a, b, c, d, e, f, g.]

3. (10 pts extra credit) Crabbe and Goyle think they have come up with a way to get rich by playing the foreign-exchange markets in the wizarding world. Their idea is to exploit exchange rates in order to transform one unit of British wizarding money into more than one unit of British wizarding money, through a sequence of money exchanges. For instance, suppose 1 British wizarding penny buys 0.82 French wizarding pennies, 1 French wizarding penny buys 129.7 Russian wizarding pennies, and finally 1 Russian wizarding penny buys 0.0008 British wizarding pennies. By converting these coins, Crabbe and Goyle think they could start with 1 British wizarding penny and buy 0.82 × 129.7 × 12 × 0.0008 ≈ 1.02 British wizarding pennies, thereby making a 2% profit! The problem is that those goblins at Gringotts charge a transaction cost for each exchange.

Suppose that Crabbe and Goyle start with knowledge of n wizard monies c1, c2, …, cn and an n×n table R of exchange rates, such that one unit of wizard money ci buys R[i,j] units of wizard money cj. A traditional arbitrage opportunity is thus a cycle in the induced graph such that the product of the edge weights is greater than unity; that is, a sequence of currencies ⟨c_i1, c_i2, …, c_ik⟩ such that R[i1,i2] × R[i2,i3] × ··· × R[i(k−1),ik] × R[ik,i1] > 1. Each transaction, however, must pay Gringotts a fraction α of the total transaction value, e.g., α = 0.01 for a 1% rate.

(a) When given R and α, give an efficient algorithm that can determine if an arbitrage opportunity exists. Analyze the running time of your algorithm. Hermione's hint: it is possible to solve this problem in O(n³). Recall that Bellman-Ford can be used to detect negative-weight cycles in a graph.

(b) For an arbitrary R, explain how varying α changes the set of arbitrage opportunities that exist and that your algorithm might identify.
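Consistent with Hermione's hint (though the details here are our own sketch, not the official solution): after fees, a cycle is profitable iff the product of (1−α)·R[i,j] around it exceeds 1, i.e., iff the cycle has negative total weight under w(i,j) = −log((1−α)·R[i,j]). Bellman-Ford then detects such a cycle in O(n³):

    import math

    def has_arbitrage(R, alpha):
        """True iff some exchange cycle is profitable after Gringotts' fee alpha.
        R is an n-by-n list of positive rates; runs in O(n^3) time."""
        n = len(R)
        w = [[-math.log((1 - alpha) * R[i][j]) for j in range(n)] for i in range(n)]
        dist = [0.0] * n                       # zero sources: reach every currency
        for _ in range(n - 1):                 # n-1 rounds of relaxation
            for i in range(n):
                for j in range(n):
                    if i != j and dist[i] + w[i][j] < dist[j]:
                        dist[j] = dist[i] + w[i][j]
        # any edge still relaxable implies a negative-weight (profitable) cycle
        return any(i != j and dist[i] + w[i][j] < dist[j] - 1e-12
                   for i in range(n) for j in range(n))

    # made-up rates: the 2-cycle 0 -> 1 -> 0 multiplies money by 1.08 before fees
    R = [[1.0, 0.9], [1.2, 1.0]]
    print(has_arbitrage(R, 0.0), has_arbitrage(R, 0.05))   # True False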
4. (40 pts) Bidirectional breadth-first search is a variant of standard BFS for finding a shortest path between two vertices s, t ∈ V(G). The idea is to run two breadth-first searches simultaneously, one starting from s and one starting from t, and stop when they "meet in the middle" (that is, whenever a vertex is encountered by both searches). "Simultaneously" here doesn't assume you have multiple processors at your disposal; it's enough to alternate iterations of the searches: one iteration of the loop for the BFS that started at s, then one iteration of the loop for the BFS that started at t. As we'll see, although the worst-case running times of BFS and bidirectional BFS are asymptotically the same, in practice bidirectional BFS often performs significantly better. Throughout this problem, all graphs are unweighted, undirected, simple graphs.

(a) Give examples to show that, in the worst case, the asymptotic running time of bidirectional BFS is the same as that of ordinary BFS. Note that because we are asking for asymptotic running time, you actually need to provide an infinite family of examples (Gn, sn, tn) such that sn, tn ∈ V(Gn), the asymptotic running times of BFS and bidirectional BFS are the same on inputs (Gn, sn, tn), and |V(Gn)| → ∞ as n → ∞.

(b) Recall that in ordinary BFS we used a state array (see Lecture Notes 8) to keep track of which nodes had been visited before. In bidirectional BFS we'll need two state arrays, one for the BFS from s and one for the BFS from t. Why? Give an example to show what can go wrong if there's only one state array. In particular, give a graph G and two vertices s, t such that some run of a bidirectional BFS says there is no path from s to t when in fact there is one.

(c) Implement from scratch a function BFS(G,s,t) that performs an ordinary BFS in the (unweighted, directed) graph G to find a shortest path from s to t. Assume the graph is given as an adjacency list; for the list of neighbors of each vertex, you may use any data structure you like (including those provided in standard language libraries). Have your function return a pair (d,k), where d is the distance from s to t (−1 if there is no s-to-t path), and k is the number of nodes popped off the queue during the entire run of the algorithm.

(d) Implement from scratch a function BidirectionalBFS(G,s,t) that takes in a(n unweighted, directed) graph G and two of its vertices s, t, and performs a bidirectional BFS. As with the previous function, it should return a pair (d,k), where d is the distance from s to t (−1 if there is no path from s to t) and k is the number of vertices popped off of both queues during the entire run of the algorithm.

(e) For each of the following families of graphs Gn, write code to execute BFS and BidirectionalBFS on these graphs, and produce the following output:

• in text, the tuples (n, d1, k1, d2, k2), where n is the index of the graph, (d1,k1) is the output of BFS, and (d2,k2) is the output of BidirectionalBFS;
• a plot with n on the x-axis, k on the y-axis, and two line charts, one for the values of k1 and one for the values of k2.

i. Grids. Gn is an n×n grid, where each vertex is connected to its neighbors in the four cardinal directions (N, S, E, W). Vertices on the boundary of the grid have only 3 neighbors, and corners have only 2. Let sn be the midpoint of one edge of the grid and tn the midpoint of the opposite edge. (When n is even, sn and tn can be either "midpoint," since there are two.) [Figure: the 3×3 grid, with s3 and t3 at the midpoints of opposite sides.] Produce output for n = 3, 4, 5, …, 20.
ii. Trees. Gn is a complete binary tree of depth n, sn is the root, and tn is any leaf. [Figure: the depth-3 tree, with s3 at the root and t3 at a leaf.] Produce output for n = 3, 4, 5, …, 15.

iii. Random graphs. Gn is a graph on n vertices constructed as follows: for each pair of vertices (i,j), draw a random boolean value; if it is true, include the edge (i,j), otherwise do not. Let sn be vertex 1 and tn be vertex 2 (food for thought: why does it not matter, on average, which vertices we take s and t to be?). For each n, produce 50 such random graphs and report just the average values of (d1, k1, d2, k2) over those 50 trials. Produce this output for n = 3, 4, 5, …, 20.
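A minimal Python sketch of part (d)'s two-front search (our rendering: it synchronizes level by level and, for brevity, treats G as undirected; the directed case additionally needs a reversed adjacency list for the backward search):

    def BidirectionalBFS(G, s, t):
        """G: dict mapping vertex -> list of neighbors. Returns (d, k): d is
        dist(s,t) or -1; k counts vertices popped from both frontiers."""
        if s == t:
            return 0, 0
        dist_s, dist_t = {s: 0}, {t: 0}    # two state arrays, one per search
        front_s, front_t = [s], [t]
        popped, turn = 0, 0
        while front_s and front_t:
            front, dist = (front_s, dist_s) if turn == 0 else (front_t, dist_t)
            nxt = []                       # expand one whole level of one search
            for u in front:
                popped += 1
                for v in G.get(u, ()):
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            if turn == 0: front_s = nxt
            else:         front_t = nxt
            meet = dist_s.keys() & dist_t.keys()   # encountered by both searches?
            if meet:
                return min(dist_s[v] + dist_t[v] for v in meet), popped
            turn ^= 1
        return -1, popped

    # demo on the path 1-2-3-4-5: distance 4
    G = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    print(BidirectionalBFS(G, 1, 5))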


[SOLVED] Csci 3104 problem set 8

1. (10 pts) Ginevra Weasley is playing with the network given below. Help her calculate the number of paths from node 1 to node 14. Hint: assume a "path" must have at least one edge in it to be well defined, and use dynamic programming to fill in a table that counts the number of paths from each node j to 14, starting from 14 and working down to 1.

[Figure: a directed network on nodes 1 through 14.]

2. (10 pts) Ginny Weasley needs your help with her wizardly homework. She's trying to come up with an example of a directed graph G = (V,E), a start vertex s ∈ V, and a set of tree edges ET ⊆ E such that for each vertex v ∈ V, the unique path in the graph (V,ET) from s to v is a shortest path in G, yet the set of edges ET cannot be produced by running a depth-first search on G, no matter how the vertices are ordered in each adjacency list. Include an explanation of why your example satisfies the requirements.

3. (15 pts) Prof. Dumbledore needs your help to compute the in- and out-degrees of all vertices in a directed multigraph G. However, he is not sure how to represent the graph so that the calculation is most efficient. For each of the three possible representations, express your answers in asymptotic notation (the only notation Dumbledore understands), in terms of V and E, and justify your claim.

(a) An adjacency matrix representation. Assume the size of the matrix is known.
(b) An edge list representation. Assume vertices have arbitrary labels.
(c) An adjacency list representation. Assume the vector's length is known.

4. (30 pts) Deep in the heart of the Hogwarts School of Witchcraft and Wizardry, there lies a magical grey parrot that demands that any challenger efficiently convert directed multigraphs into directed simple graphs. If the wizard can correctly solve a series of arbitrary instances of this problem, the parrot will unlock a secret passageway.

Let G = (V,E) denote a directed multigraph. A directed simple graph is a graph G′ = (V,E′) such that E′ is derived from the edges in E so that (i) every directed multi-edge, e.g., {(u,v),(u,v)} or even simply {(u,v)}, has been replaced by a single directed edge {(u,v)}, and (ii) all self-loops (u,u) have been removed.

[Figure: an example of transforming G = (V,E) into G′ = (V,E′); parallel edges are collapsed and self-loops removed.]

Describe and analyze an algorithm (explain how it works, give pseudocode if necessary, derive its running time and space usage, and prove its correctness) that takes O(V + E) time and space to convert G into G′, and thereby will solve any of the parrot's questions. Assume both G and G′ are stored as adjacency lists. Hermione's hints: don't assume the adjacencies Adj[u] are ordered in any particular way, and remember that you can add edges to the list and then remove ones you don't need.
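One standard O(V + E) approach (our sketch, not necessarily the intended solution): scan each adjacency list and use a V-length "last seen" array to drop duplicate edges and self-loops without sorting.

    def simplify(adj):
        """adj: dict u -> list of heads v, for a directed multigraph on 0..n-1.
        Returns a simple-graph adjacency list: duplicates and self-loops removed.
        O(V + E) time and space."""
        n = len(adj)
        last_seen = [-1] * n          # last_seen[v] == u means (u, v) already kept
        simple = {}
        for u in range(n):
            out = []
            for v in adj.get(u, []):
                if v != u and last_seen[v] != u:
                    last_seen[v] = u
                    out.append(v)
            simple[u] = out
        return simple

    # parallel edges and a self-loop disappear:
    adj = {0: [1, 1, 0, 2], 1: [2], 2: []}
    print(simplify(adj))   # {0: [1, 2], 1: [2], 2: []}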
5. (15 pts extra credit) Professor McGonagall has provided the young wizard Ron with three magical batteries whose sizes are 42, 27, and 16 morts, respectively. (A mort is a unit of wizard energy.) The 27-mort and 16-mort batteries are fully charged (containing 27 and 16 morts of energy, respectively), while the 42-mort battery is empty, with 0 morts. McGonagall says that Ron is only allowed to use, repeatedly if necessary, the mort-transfer spell when working with these batteries. This spell transfers all the morts in one battery to another battery, and it halts the transfer either when the source battery has no morts remaining or when the destination battery is fully charged (whichever comes first).

McGonagall challenges Ron to determine whether there exists a sequence of mort-transfer spells that leaves exactly 12 morts in either the 27-mort or the 16-mort battery.

(a) Ron knows this is actually a graph problem. Give a precise definition of how to model this problem as a graph, and state the specific question about this graph that must be answered.
(b) What algorithm should Ron apply to solve the graph problem?
(c) Apply that algorithm to McGonagall's question. Report and justify your answer.
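To make the modeling in (a) concrete, here is a hedged sketch: each graph vertex is a state (a, b, c) of battery contents with capacities (42, 27, 16), each edge is one pour, and BFS answers reachability. The representation and names are our own, and the printed result is left for part (c).

    from collections import deque

    CAP = (42, 27, 16)          # battery capacities in morts
    START = (0, 27, 16)         # the 42-mort battery starts empty

    def pours(state):
        """All states reachable by one mort-transfer spell."""
        for i in range(3):
            for j in range(3):
                if i == j or state[i] == 0:
                    continue
                amount = min(state[i], CAP[j] - state[j])
                s = list(state)
                s[i] -= amount
                s[j] += amount
                yield tuple(s)

    def goal_reachable():
        """BFS from START: is some state with exactly 12 morts in the
        27-mort or 16-mort battery reachable?"""
        seen, queue = {START}, deque([START])
        while queue:
            state = queue.popleft()
            if state[1] == 12 or state[2] == 12:
                return True
            for nxt in pours(state):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    print(goal_reachable())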


[SOLVED] Csci 3104 problem set 7

1. (45 pts) Recall that the string alignment problem takes as input two strings x and y, composed of symbols xi, yj ∈ Σ for a fixed symbol set Σ, and returns a minimal-cost set of edit operations for transforming the string x into the string y. Let x contain nx symbols, let y contain ny symbols, and let the set of edit operations be those defined in the lecture notes (substitution, insertion, deletion, and transposition). Let the cost of indel be 1, the cost of swap be 1/3 (plus the cost of the two sub ops), and the cost of sub be 1/2, except when xi = yj, which is a "no-op" and has cost 0. In this problem, we will implement and apply three functions.

(i) alignStrings(x,y) takes as input two ASCII strings x and y, and runs a dynamic programming algorithm to return the cost matrix S, which contains the optimal costs for all the subproblems for aligning these two strings.

    alignStrings(x, y):               // x, y are ASCII strings
        S = table of size nx by ny    // for memoizing the subproblem costs
        initialize S                  // fill in the base cases
        for i = 1 to nx
            for j = 1 to ny
                S[i,j] = cost(i,j)    // optimal cost for aligning x[0..i] and y[0..j]
        return S

(ii) extractAlignment(S,x,y) takes as input an optimal cost matrix S and the strings x, y, and returns a vector a that represents an optimal sequence of edit operations to convert x into y. This optimal sequence is recovered by finding a path on the implicit DAG of decisions made by alignStrings to obtain the value S[nx,ny], starting from S[0,0].

    extractAlignment(S, x, y):    // S is an optimal cost matrix from alignStrings
        initialize a              // empty vector of edit operations
        [i,j] = [nx, ny]          // begin the search for a path back to S[0,0]
        while i > 0 or j > 0
            a[i] = determineOptimalOp(S, i, j, x, y)   // what was an optimal choice?
            [i,j] = updateIndices(S, i, j, a)          // move to the next position
        return a

When storing the sequence of edit operations in a, use a special symbol to denote no-ops.

(iii) commonSubstrings(x,L,a) takes as input the ASCII string x, an integer 1 ≤ L ≤ nx, and an optimal sequence a of edits to x that would transform x into y. This function returns each of the substrings of length at least L in x that aligns exactly, via a run of no-ops, to a substring in y.

(a) From scratch, implement the functions alignStrings, extractAlignment, and commonSubstrings. You may not use any library functions that make their implementation trivial. Within your implementation of extractAlignment, ties must be broken uniformly at random. Submit (i) a paragraph for each function that explains how you implemented it (describe how it works and how it uses its data structures), and (ii) your code implementation, with code comments. Hint: test your code by reproducing the APE / STEP and the EXPONENTIAL / POLYNOMIAL examples in the lecture notes (to do this exactly, you'll need to use unit costs instead of the ones given above).

(b) Using asymptotic analysis, determine the running time of the call commonSubstrings(x, L, extractAlignment(alignStrings(x,y), x, y)). Justify your answer.

(c) (15 pts extra credit) Describe an algorithm for counting the number of optimal alignments, given an optimal cost matrix S. Prove that your algorithm is correct, and give its asymptotic running time. Hint: convert this problem into a form that allows us to apply an algorithm we've already seen.

(d) String alignment algorithms can be used to detect changes between different versions of the same document (as in version control systems) or to detect verbatim copying between different documents (as in plagiarism detection systems). The two data string files for PS7 (see the class Moodle) contain actual documents recently released by two independent organizations. Use your functions from (1a) to align the text of these two documents. Present the results of your analysis, including a report of all the substrings in x of length L = 9 or more that could have been taken from y, and briefly comment on whether these documents could reasonably be considered original works under CU's academic honesty policy.
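A compact Python sketch of the cost-matrix recurrence (our illustration: it covers substitution, insertion, and deletion with the costs above, and omits transposition for brevity):

    def alignStrings(x, y):
        """DP cost matrix S for transforming x into y.
        Costs: indel = 1, substitution = 1/2 (0 on a match). Transposition omitted."""
        nx, ny = len(x), len(y)
        S = [[0.0] * (ny + 1) for _ in range(nx + 1)]
        for i in range(1, nx + 1):
            S[i][0] = i                    # delete i characters
        for j in range(1, ny + 1):
            S[0][j] = j                    # insert j characters
        for i in range(1, nx + 1):
            for j in range(1, ny + 1):
                sub = 0.0 if x[i-1] == y[j-1] else 0.5
                S[i][j] = min(S[i-1][j] + 1,        # delete x[i-1]
                              S[i][j-1] + 1,        # insert y[j-1]
                              S[i-1][j-1] + sub)    # substitute (or no-op)
        return S

    print(alignStrings("APE", "STEP")[-1][-1])   # optimal total cost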
2. (20 pts) Ron and Hermione are having a competition to see who can compute the nth Pell number Pn more quickly, without resorting to magic. Recall that the nth Pell number is defined as Pn = 2Pn−1 + Pn−2 for n > 1, with base cases P0 = 0 and P1 = 1. Ron opens with the classic recursive algorithm:

    Pell(n):
        if n == 0 { return 0 }
        else if n == 1 { return 1 }
        else { return 2*Pell(n-1) + Pell(n-2) }

which he claims takes R(n) = R(n−1) + R(n−2) + c = O(φ^n) time.

(a) Hermione counters with a dynamic programming approach that "memoizes" (a.k.a. memorizes) the intermediate Pell numbers by storing them in an array P[n]. She claims this allows an algorithm to compute larger Pell numbers more quickly, and writes down the following algorithm (see Footnote 1):

    MemPell(n) {
        if n == 0 { return 0 }
        else if n == 1 { return 1 }
        else {
            if (P[n] == undefined) { P[n] = 2*MemPell(n-1) + MemPell(n-2) }
            return P[n]
        }
    }

i. Describe the behavior of MemPell(n) in terms of a traversal of a computation tree. Describe how the array P is filled.
ii. Determine the asymptotic running time of MemPell. Prove your claim is correct by induction on the contents of the array.

(b) Ron then claims that he can beat Hermione's dynamic programming algorithm in both time and space with another dynamic programming algorithm, which eliminates the recursion completely and instead builds up directly to the final solution by filling the P array in order. Ron's new algorithm (see Footnote 2) is

    DynPell(n):
        P[0] = 0, P[1] = 1
        for i = 2 to n { P[i] = 2*P[i-1] + P[i-2] }
        return P[n]

Determine the time and space usage of DynPell(n). Justify your answers and compare them to the answers in part (2a).

Footnote 1: Ron briefly whines about Hermione's P[n] = undefined trick ("an unallocated array!"), but she points out that MemPell(n) can simply be wrapped within a second function that first allocates an array of size n, initializes each entry to undefined, and then calls MemPell(n) as given.

Footnote 2: Ron is now using Hermione's undefined-array trick; assume he also uses her solution of wrapping this function within another that correctly allocates the array.

(c) With a gleam in her eye, Hermione tells Ron that she can do everything he can do better: she can compute the nth Pell number even faster, because intermediate results do not need to be stored. Over Ron's pathetic cries, Hermione says

    FasterPell(n):
        a = 0, b = 1
        for i = 2 to n
            c = 2*a + b
            a = b
            b = c
        end
        return a

Ron giggles and says that Hermione has a bug in her algorithm. Determine the error, give its correction, and then determine the time and space usage of FasterPell(n). Justify your claims.

(d) In a table, list each of the four algorithms as columns, and for each give its asymptotic time and space requirements, along with the implied or explicit data structures that each requires.
Briefly discuss how these different approaches compare, and where the improvements come from. (Hint: what data structure do all recursive algorithms implicitly use?)

(e) (5 pts extra credit) Implement FasterPell, then compute Pn, where n is the four-digit number representing your MMDD birthday, and report the first five digits of Pn. Now, assuming that it takes one nanosecond per operation, estimate the number of years required to compute Pn using Ron's classic recursive algorithm, and compare that to the clock time required to compute Pn using FasterPell.
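For part (e), a minimal Python sketch of a constant-space iterative Pell computation (this reflects one plausible correction; identifying Hermione's bug is still part (2c)):

    def faster_pell(n):
        """Pell numbers in O(n) arithmetic operations and O(1) extra space:
        P(0) = 0, P(1) = 1, P(i) = 2*P(i-1) + P(i-2)."""
        if n == 0:
            return 0
        a, b = 0, 1                  # a = P(i-2), b = P(i-1)
        for _ in range(2, n + 1):
            a, b = b, 2 * b + a      # advance the recurrence one step
        return b

    # the first few Pell numbers: 0, 1, 2, 5, 12, 29, 70
    print([faster_pell(i) for i in range(7)])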


[SOLVED] Csci 3104 problem set 5

1. (15 pts) Bellatrix Lestrange is writing a secret message to Voldemort and wants to prevent it from being understood by meddlesome young wizards and Muggles. She decides to use Huffman encoding to encode the message. Magically, the symbol frequencies of the message are given by the Pell numbers, a famous sequence of integers known since antiquity and related to the Fibonacci numbers. The nth Pell number is defined as Pn = 2Pn−1 + Pn−2 for n > 1, with base cases P0 = 0 and P1 = 1.

(a) For an alphabet Σ = {a,b,c,d,e,f,g,h} with frequencies given by the first |Σ| non-zero Pell numbers, give an optimal Huffman code and the corresponding encoding tree for Bellatrix to use.

(b) Generalize your answer to (1a) and give the structure of an optimal code when the frequencies are the first n non-zero Pell numbers.

2. (30 pts) A good hash function h(x) behaves in practice very much like the uniform hashing assumption analyzed in class, yet is a deterministic function; that is, h(x) = k every time x is used as an argument to h(). Designing good hash functions is hard, and a bad hash function can cause a hash table to quickly exit the sparse-loading regime by overloading some buckets and underloading others. Good hash functions often rely on beautiful and complicated insights from number theory, and have deep connections to pseudorandom number generators and cryptographic functions. In practice, most hash functions are moderate to poor approximations of uniform hashing.

Consider the following hash function. Let U be the universe of strings composed of characters from the alphabet Σ = [A,…,Z], and let the function f(xi) return the index of a letter xi ∈ Σ, e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string x ∈ Σ^m, define h(x) = (f(x1) + f(x2) + ··· + f(xm)) mod ℓ, where ℓ is the number of buckets in the hash table. That is, our hash function sums the index values of the characters of a string x and maps that value onto one of the ℓ buckets.

(a) The following list contains US Census-derived last names: https://www2.census.gov/topics/genealogy/1990surnames/dist.all.last Using these names as input strings, first choose a uniformly random 50% of the name strings and then hash them using h(x). Produce a histogram showing the corresponding distribution of hash locations when ℓ = 200. Label the axes of your figure. Briefly describe what the figure shows about h(x), and justify your results in terms of the behavior of h(x). Do not forget to append your code. Hint: the raw file includes information other than name strings, which will need to be removed; and think about how you can count hash locations without building or using a real hash table.

(b) Enumerate at least 4 reasons why h(x) is a bad hash function relative to the ideal behavior of uniform hashing.

(c) Produce a plot showing (i) the length of the longest chain (were we to use chaining for resolving collisions under h(x)) as a function of the number n of these strings that we hash into a table with ℓ = 200 buckets, (ii) the exact upper bound on the depth of a red-black tree with n items stored, and (iii) the length of the longest chain were we to use a uniform hash instead of h(x). Include a guide of the form c·n. Then comment (i) on how much shorter the longest chain would be under a uniform hash than under h(x), and (ii) on the value of n at which the red-black tree becomes a more efficient data structure than h(x), and separately than a uniform hash.
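A quick Python sketch of h(x) and the counting trick from the hint (our code; cleaning the census file is elided, and the five names below are placeholders):

    from collections import Counter

    def h(name, buckets=200):
        """Sum-of-letter-indices hash: f('A') = 1, ..., f('Z') = 26, then mod ell.
        Assumes `name` contains only the letters A-Z."""
        return sum(ord(ch) - ord('A') + 1 for ch in name.upper()) % buckets

    # count hash locations directly, without building a real hash table
    names = ["SMITH", "JOHNSON", "WILLIAMS", "JONES", "BROWN"]
    histogram = Counter(h(name) for name in names)
    print(histogram)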
3. (15 pts) Draco Malfoy is struggling with the problem of making change for n cents using the smallest number of coins. Malfoy has coin values v1 < v2 < ··· < vr for r coin types, where each coin's value vi is a positive integer. His goal is to obtain a set of counts {di}, one for each coin type, such that d1 + d2 + ··· + dr = k and k is minimized.

(a) A greedy algorithm for making change is the cashier's algorithm, which all young wizards learn. Malfoy writes the following pseudocode on the whiteboard to illustrate it, where n is the amount of money to make change for and v is a vector of the coin denominations:

    wizardChange(n, v, r):
        d[1 .. r] = 0    // initial histogram of coin types in solution
        while n > 0 {
            k = 1
            while ( k < r and v[k] > n ) { k++ }
            if k == r { return 'no solution' }
            else { n = n - v[k] }
        }
        return d

Hermione snorts and says Malfoy's code has bugs. Identify the bugs and explain why each would cause the algorithm to fail.

(b) Sometimes the goblins at Gringotts Wizarding Bank run out of coins (see the footnote below) and make change using whatever is left on hand. Identify a set of U.S. coin denominations for which the greedy algorithm does not yield an optimal solution. Justify your answer in terms of optimal substructure and the greedy-choice property. (The set should include a penny, so that there is a solution for every value of n.)

(c) (8 pts extra credit) On the advice of computer scientists, Gringotts has announced that they will be changing all wizard coin denominations into a new set of coins denominated in powers of c, i.e., denominations of c^0, c^1, …, c^ℓ for some integers c > 1 and ℓ ≥ 1. (This will be done by a spell that will magically transmute old coins into new coins, before your very eyes.) Prove that the cashier's algorithm will always yield an optimal solution in this case. Hint: first consider the special case of c = 2.

Footnote: It's a little-known secret, but goblins like to eat the coins. It isn't pretty for the coins, in the end.
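For reference alongside Malfoy's version, a hedged sketch of the cashier's algorithm as usually stated (largest coin first; this rendering is ours, and whether it is optimal is exactly what parts (b) and (c) probe):

    def cashiers_change(n, v):
        """Greedy change-making: v is a sorted list of coin values, v[0] < ... < v[-1].
        Returns a histogram d with d[i] = number of coins of value v[i], or None."""
        d = [0] * len(v)
        for i in range(len(v) - 1, -1, -1):   # largest denomination first
            d[i], n = divmod(n, v[i])
        return d if n == 0 else None          # leftover n > 0 means no exact change

    # US coins: 68 cents -> two quarters, one dime, one nickel, three pennies
    print(cashiers_change(68, [1, 5, 10, 25]))   # [3, 1, 1, 2]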
