Assignment Chef
Assignment catalog (33,401 assignments available)

[SOLVED] Homework 3 COMS E6998, Problem 1 – Adaptive Learning Rate Methods, CIFAR-10 (20 points)

We will consider five methods – AdaGrad, RMSProp, RMSProp+Nesterov, AdaDelta, and Adam – and study their convergence on the CIFAR-10 dataset. We will use a multi-layer neural network model with two fully connected hidden layers of 1000 hidden units each, ReLU activations, and a minibatch size of 128.

1. Write the weight update equations for the five adaptive learning rate methods. Explain each term clearly. What are the hyperparameters in each policy? Explain how AdaDelta and Adam differ from RMSProp. (5+1)
2. Train the neural network using all five methods with L2-regularization for 200 epochs each and plot the training loss vs. number of epochs. Which method performs best (lowest training loss)? (5)
3. Add dropout (probability 0.2 for the input layer and 0.5 for hidden layers) and train the neural network again using all five methods for 200 epochs. Compare the training loss with that in part 2. Which method performs best? For the five methods, compare the training time to finish 200 epochs with dropout to the training time in part 2 (200 epochs without dropout). (5)
4. Compare the test accuracy of the trained models for all five methods from parts 2 and 3. Note that to calculate the test accuracy of a model trained using dropout, you need to scale the weights appropriately (by the dropout probability). (4)

References: • The CIFAR-10 Dataset.
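For concreteness, a minimal NumPy sketch of three of the update rules (AdaGrad, RMSProp, Adam) in their standard textbook form; lr, rho, beta1, beta2, and eps are the usual hyperparameters, g is the minibatch gradient with respect to the weights w, and r, m, v are the running accumulators:

import numpy as np

def adagrad_step(w, g, r, lr=0.01, eps=1e-8):
    # accumulate squared gradients over all past steps
    r = r + g * g
    return w - lr * g / (np.sqrt(r) + eps), r

def rmsprop_step(w, g, r, lr=0.001, rho=0.9, eps=1e-8):
    # exponentially decaying average of squared gradients
    r = rho * r + (1 - rho) * g * g
    return w - lr * g / (np.sqrt(r) + eps), r

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # bias-corrected first and second moment estimates (t starts at 1)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

AdaDelta differs in that it replaces the global learning rate with a running RMS of past parameter updates, which is the key difference part 1 asks about.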
In this problem we will compare strong scaling and weak scaling in distributed training using tf.distribute.Strategy in TensorFlow 2.0. tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. In strong scaling, each worker computes with (batch size / # workers) training examples, whereas in weak scaling the effective batch size of SGD grows as the number of workers increases. For example, in strong scaling, if the batch size with 1 worker is 256, with 2 workers it will be 128 per worker and with 4 workers 64 per worker, keeping the effective batch size at 256. In weak scaling, if the batch size with 1 worker is 64, with 2 workers it will still be 64 per worker (an effective batch size of 128 with 2 workers), so the effective batch size increases linearly with the number of workers. Thus the amount of compute per worker decreases in strong scaling, whereas in weak scaling it remains constant.

Using the FashionMNIST dataset and ResNet50 you will run distributed training using tf.distribute.Strategy and compare strong and weak scaling scenarios. Using an effective batch size of 256 you will run training jobs with 1, 2, 4, 8, and 16 learners (each learner is a K80 GPU). For 8 or fewer learners, all the GPUs can be allocated on the same node (GCP provides 8 K80s on one node). You will run each training job for 10 epochs and measure average throughput, training time, and training cost. In total you will be running 10 training jobs: 5 (with 1, 2, 4, 8, 16 GPUs) for weak scaling and 5 for strong scaling. For single-node (single-worker) training using multiple GPUs you will use tf.distribute.MirroredStrategy with the default all-reduce. For training with two or more workers you will use tf.distribute.experimental.MultiWorkerMirroredStrategy with CollectiveCommunication.AUTO.

1. Plot throughput vs. number of learners for weak and strong scaling. (5)
2. Plot training time vs. number of learners for weak and strong scaling. (5)
3. Plot training cost vs. number of learners for weak and strong scaling. The training cost can be estimated using the GPU cost per hour and the training time. (2)
4. For weak scaling, calculate scaling efficiency, defined via the increase in time to finish one iteration at a learner as the number of learners increases. Plot scaling efficiency vs. number of learners for weak scaling. (5)
5. MirroredStrategy uses NVIDIA NCCL (tf.distribute.NcclAllReduce) as the default all-reduce. Change this to tf.distribute.HierarchicalCopyAllReduce and tf.distribute.ReductionToOneDevice and compare the throughput of the three all-reduce implementations. You will do this for 1, 2, 4, and 8 GPU single-node training, so you will be running 8 new training jobs (4 with HierarchicalCopyAllReduce and 4 with ReductionToOneDevice). For NcclAllReduce you can reuse the results from part 1 of the question. (8)
6. Change MultiWorkerMirroredStrategy to use CollectiveCommunication.NCCL and CollectiveCommunication.RING and repeat the experiment with 2 nodes. You will be running two new training jobs (one with RING and one with NCCL); for AUTO you can reuse the throughput from part 1 of the question. Compare the throughput of the three all-reduce methods (AUTO, NCCL, RING). Does AUTO give the best throughput? (5)

References: • TensorFlow Blog. Distributed Training with TensorFlow.
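A minimal sketch (not the required solution) of how the strategies named above are constructed in the TF 2.0/2.1-era API; the model choice, the input pipeline, and the per-worker TF_CONFIG setup are placeholders you would fill in:

import tensorflow as tf

# Single node, multiple GPUs: synchronous all-reduce across devices.
strategy = tf.distribute.MirroredStrategy()  # default: NCCL all-reduce
# Variants for part 5:
# strategy = tf.distribute.MirroredStrategy(
#     cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
# strategy = tf.distribute.MirroredStrategy(
#     cross_device_ops=tf.distribute.ReductionToOneDevice())
# Multiple workers (TF_CONFIG must be set on each worker):
# strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
#     tf.distribute.experimental.CollectiveCommunication.AUTO)

with strategy.scope():
    # the model must be built and compiled inside the strategy scope;
    # FashionMNIST images need resizing/3-channel stacking for ResNet50,
    # which is omitted here
    model = tf.keras.applications.ResNet50(weights=None, classes=10)
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
# model.fit(train_dataset, epochs=10)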
In this problem we will study and compare different convolutional neural network architectures. We will calculate the number of parameters (weights to be learned) and the memory requirement of each network. We will also analyze inception modules and understand their design.

1. Calculate the number of parameters in AlexNet. You will have to show the calculation for each layer and then sum to obtain the total number of parameters in AlexNet. When calculating, you will need to account for all the filters (size, strides, padding) at each layer. Look at Sec. 3.5 and Figure 2 in the AlexNet paper (see references). Points will only be given when explicit calculations are shown for each layer. (5)
2. VGG (Simonyan et al.) has an extremely homogeneous architecture that performs only 3x3 convolutions with stride 1 and pad 1 and 2x2 max pooling with stride 2 (and no padding) from beginning to end. However, VGGNet is very expensive to evaluate and uses a lot of memory and parameters. Refer to the VGG19 architecture in Table 1 on page 3 of the paper by Simonyan et al. Complete Table 1 below, calculating the activation units and parameters at each layer in VGG19 (without counting biases). It has been partially filled in for you. (6)
3. VGG architectures have smaller filters but deeper networks compared to AlexNet (3x3 compared to 11x11 or 5x5). Show that a stack of N convolution layers, each of filter size F x F, has the same receptive field as one convolution layer with a filter of size (NF - N + 1) x (NF - N + 1). Use this to calculate the receptive field of 3 stacked filters of size 5x5. (4)
4. The original GoogLeNet paper (Szegedy et al.) proposes two architectures for the Inception module, shown in Figure 2 on page 5 of the paper, referred to as naive and dimensionality-reduction respectively.
(a) What is the general idea behind designing an inception module (parallel convolutional filters of different sizes with a pooling, followed by concatenation) in a convolutional neural network? (3)

Layer      | Number of Activations (Memory) | Parameters (Compute)
Input      | 224*224*3 = 150K               | 0
CONV3-64   | 224*224*64 = 3.2M              | (3*3*3)*64 = 1,728
CONV3-64   | 224*224*64 = 3.2M              | (3*3*64)*64 = 36,864
POOL2      | 112*112*64 = 800K              | 0
CONV3-128  |                                |
CONV3-128  |                                |
POOL2      | 56*56*128 = 400K               | 0
CONV3-256  |                                |
CONV3-256  | 56*56*256 = 800K               | (3*3*256)*256 = 589,824
CONV3-256  |                                |
CONV3-256  |                                |
POOL2      |                                | 0
CONV3-512  | 28*28*512 = 400K               | (3*3*256)*512 = 1,179,648
CONV3-512  |                                |
CONV3-512  | 28*28*512 = 400K               |
CONV3-512  |                                |
POOL2      |                                | 0
CONV3-512  |                                |
CONV3-512  |                                |
CONV3-512  |                                |
CONV3-512  |                                |
POOL2      |                                | 0
FC 4096    |                                |
FC 4096    |                                | 4096*4096 = 16,777,216
FC 1000    |                                |
TOTAL      |                                |

Table 1: VGG19 memory and weights (blank cells are to be filled in).

(b) Assuming the input to the inception module (referred to as "previous layer" in Figure 2 of the paper) has size 32x32x256, calculate the output size after filter concatenation for the naive and dimensionality-reduction inception architectures, with the numbers of filters given in Figure 1. (4)
(c) Next calculate the total number of convolutional operations for each of the two inception architectures, again assuming the input to the module has dimensions 32x32x256 and the numbers of filters given in Figure 1. (4)
(d) Based on the calculations in part (c), explain the problem with the naive architecture and how the dimensionality-reduction architecture helps (hint: compare computational complexity). How much is the computational saving? (2+2)

Figure 1: Two types of inception module with numbers of filters and input size for the calculations in Questions 3.4(b) and 3.4(c). (Figure not reproduced in this listing.)

References:
• (AlexNet) Alex Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
• (VGG) Karen Simonyan et al. Very Deep Convolutional Networks for Large-Scale Image Recognition. https://arxiv.org/pdf/1409.1556.pdf
• (GoogLeNet) Christian Szegedy et al. Going Deeper with Convolutions. https://arxiv.org/pdf/1409.4842.pdf
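As a sanity check for part 3: a single F x F layer sees F input pixels along each axis, and each additional stride-1 F x F layer extends the receptive field by F - 1, giving F + (N - 1)(F - 1) = NF - N + 1. So two stacked 3x3 layers see a 5x5 region, three stacked 3x3 layers see a 7x7 region (same as one 7x7 filter), and three stacked 5x5 layers see (3*5 - 3 + 1) = 13, i.e. a 13x13 receptive field.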
In this problem we will implement large-batch SGD using batch augmentation techniques. In batch augmentation, multiple instances of samples within the same batch are generated with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, improving both generalization and performance scaling. One such augmentation scheme uses Cutout regularization, where additional samples are generated by occluding random portions of an image.

1. Explain Cutout regularization and its advantages compared to simple dropout (as argued in the paper by DeVries et al.) in your own words. Select any 2 images from CIFAR-10 and show how these images look after applying Cutout: apply a square, fixed-size zero-mask at a random location of each image and generate its Cutout version. Refer to the paper by DeVries et al. (Section 3) and the associated GitHub repository. (2+4)
2. Using the CIFAR-10 dataset and ResNet-44, first apply simple data augmentation as in He et al. (see Section 4.2 of He et al.) and train the model with batch size 64. Note that testing is always done with original images. Plot validation error vs. number of training epochs. (4)
3. Next use Cutout for data augmentation in ResNet-44 as in Hoffer et al., train the model with the same experimental setup, and plot validation error vs. number of epochs for different values of M (2, 4, 8, 16, 32), where M is the number of instances generated from an input sample by applying Cutout M times, effectively increasing the batch size to M*B (B being the original batch size before Cutout augmentation). You should obtain a figure similar to Figure 3(a) in the paper by Hoffer et al. Also compare the number of epochs and the wall-clock time needed to reach 94% accuracy for the different values of M. Do not run any experiment for more than 100 epochs; if after 100 epochs of training you have not reached 94%, just report the accuracy you obtained and the corresponding wall-clock time to train for 100 epochs. Before attempting this question it is advisable to read the paper by Hoffer et al., especially Section 4.1. (5+5)

You may reuse code from the GitHub repository associated with Hoffer et al.'s work for answering parts 2 and 3 of this question.

References:
• DeVries et al. Improved Regularization of Convolutional Neural Networks with Cutout. https://arxiv.org/pdf/1708.04552.pdf. Code: https://github.com/uoguelph-mlrg/Cutout
• Hoffer et al. Augment Your Batch: Better Training with Larger Batches, 2019. https://arxiv.org/pdf/1901.09335.pdf. Code: https://github.com/eladhoffer/convNet.pytorch/tree/master/models
• He et al. Deep Residual Learning for Image Recognition. https://arxiv.org/abs/1512.03385
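A minimal NumPy sketch of the square zero-mask described in part 1; the mask size `length` is an assumed parameter, and clipping the square at the image border is an assumption consistent with the paper's description:

import numpy as np

def cutout(img, length=8, rng=np.random):
    # zero out a length x length square centered at a random pixel;
    # the square is clipped where it crosses the image border
    h, w = img.shape[0], img.shape[1]
    cy, cx = rng.randint(h), rng.randint(w)
    y1, y2 = max(0, cy - length // 2), min(h, cy + length // 2)
    x1, x2 = max(0, cx - length // 2), min(w, cx + length // 2)
    out = img.copy()
    out[y1:y2, x1:x2, ...] = 0
    return out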
A feedforward network with as few as two layers and sufficiently many hidden units can approximate any arbitrary function, so one can trade off between deep and shallow networks for the same problem. In this problem we will study this tradeoff using the Eggholder function, defined as:

f(x1, x2) = -(x2 + 47)*sin(sqrt(|x1/2 + x2 + 47|)) - x1*sin(sqrt(|x1 - (x2 + 47)|))

Let y(x1, x2) = f(x1, x2) + N(0, 0.3) be the function that we want to learn with a neural network through regression, with -512 <= x1 <= 512 and -512 <= x2 <= 512. Draw a dataset of 100K points from this function (sampling uniformly over the ranges of x1 and x2) and do an 80/20 training/test split.

1. Assume the total budget for the number of hidden units in the network is 512. Train feedforward neural networks with 1, 2, and 3 hidden layers to learn the regression function. For each network you may use a different number of hidden units per hidden layer, as long as the total number of hidden units does not exceed 512. We recommend working with 16, 32, 64, 128, 256, or 512 hidden units per layer. So a one-hidden-layer network can have at most 512 units in that layer; a two-hidden-layer network can have any combination of hidden units in each layer (e.g., 16 and 256, or 64 and 128) such that the total is at most 512. Plot the RMSE (root mean square error) on the test set for networks with different numbers of hidden layers as a function of the total number of hidden units. If there is more than one network with the same total number of hidden units (say, a two-hidden-layer network with 16 units in the first layer and 128 in the second, and another with 128 in the first and 16 in the second), use the average RMSE. You will thus have a figure with three curves, one each for 1-, 2-, and 3-layer networks, with the x-axis being the total number of hidden units. Also plot another set of curves with the x-axis being the number of parameters (weights) that must be learned in the network. (20)
2. Comment on the tradeoff between the number of parameters and RMSE as you go from deeper (3 hidden layers) to shallower (1 hidden layer) networks. Also measure the wall-clock time for training each configuration and plot training time vs. number of parameters. Do you see a similar tradeoff in training time? (10)

For networks with 2 and 3 hidden layers, use batch normalization as regularization. For hidden layers use ReLU activation, and for training use SGD with Nesterov momentum. Use a batch size of 1000 and train for 2000 epochs. You can pick other hyperparameter values (momentum, learning rate schedule) or use the default values in the framework implementation.
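A sketch of the dataset construction for this problem (NumPy); whether 0.3 in N(0, 0.3) is a standard deviation or a variance is ambiguous in the statement, so standard deviation is assumed here:

import numpy as np

def eggholder(x1, x2):
    # f(x1,x2) = -(x2+47) sin(sqrt(|x1/2 + x2 + 47|)) - x1 sin(sqrt(|x1 - (x2+47)|))
    return (-(x2 + 47) * np.sin(np.sqrt(np.abs(x1 / 2 + x2 + 47)))
            - x1 * np.sin(np.sqrt(np.abs(x1 - (x2 + 47)))))

n = 100_000
x = np.random.uniform(-512, 512, size=(n, 2))           # uniform over the square
y = eggholder(x[:, 0], x[:, 1]) + np.random.normal(0, 0.3, size=n)

split = int(0.8 * n)                                    # 80/20 train/test split
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]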


[SOLVED] Homework 2 COMS E6998, Problem 1 – Perceptron (15 points)

Consider a 2-dimensional data set in which all points with x1 > x2 belong to the positive class and all points with x1 <= x2 belong to the negative class. The true separator of the two classes is therefore the linear hyperplane (line) defined by x1 - x2 = 0. Now create a training data set of 20 points randomly generated inside the unit square in the positive quadrant, labeling each point by whether or not its first coordinate x1 is greater than its second coordinate x2.

Now consider the following loss function for a training pair (X̄, y) and weight vector W̄: L = max{0, a - y(W̄ · X̄)}, where test instances are predicted as ŷ = sign{W̄ · X̄}. For this problem, W̄ = [w1, w2], X̄ = [x1, x2], and ŷ = sign(w1x1 + w2x2). A value of a = 0 corresponds to the perceptron criterion and a value of a = 1 corresponds to hinge loss.

1. Implement the perceptron algorithm without regularization, train it on the 20 points above, and test its accuracy on 1000 randomly generated points inside the unit square. Generate the test points using the same procedure as the training points. (6)
2. Change the perceptron criterion to hinge loss in your training implementation and repeat the accuracy computation on the same test points. Regularization is not used. (5)
3. In which case do you obtain better accuracy, and why? (2)
4. In which case do you think the classification of the same 1000 test instances would not change significantly if a different set of 20 training points were used? (2)
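A minimal NumPy sketch of both training variants (a = 0 gives the perceptron criterion, a = 1 hinge loss); the learning rate and epoch count are arbitrary choices:

import numpy as np

def train(X, y, a=0.0, lr=0.1, epochs=100):
    # minimize max(0, a - y * (w . x)) by stochastic (sub)gradient descent
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) < a:      # margin violated -> update
                w += lr * yi * xi
    return w

# 20 training points in the unit square, labeled by sign(x1 - x2)
X = np.random.rand(20, 2)
y = np.where(X[:, 0] > X[:, 1], 1.0, -1.0)
w = train(X, y, a=1.0)                      # a=0.0 for the perceptron criterion
X_test = np.random.rand(1000, 2)
y_true = np.where(X_test[:, 0] > X_test[:, 1], 1.0, -1.0)
print("test accuracy:", np.mean(np.sign(X_test @ w) == y_true))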
Read the two blogs on weight initialization, one by Andre Perunicic and the other by Daniel Godoy. You will reuse the code from the GitHub repo linked in the blog to explain vanishing and exploding gradients. You can use the same 5-layer neural network model as in the repo, and the same dataset.

1. Explain the vanishing gradients phenomenon using standard normal initialization with different values of the standard deviation and with tanh and sigmoid activation functions. Then show how Xavier (a.k.a. Glorot normal) initialization of the weights helps deal with this problem. Next use ReLU activation and show that, instead of Xavier initialization, He initialization works better for ReLU. You can plot the activations at each of the 5 layers to answer this question. (10)
2. The dying ReLU is a kind of vanishing-gradient problem in which ReLU neurons become inactive and output 0 for any input. In the worst case, the ReLU neurons at a certain layer are all dead, i.e., the entire network dies; this is referred to as a dying ReLU neural network in Lu et al. (reference below). A dying ReLU neural network collapses to a constant function. Show this phenomenon using any one of the three 1-dimensional functions on page 11 of Lu et al. Use a 10-layer ReLU network of width 2 (hidden units per layer). Use minibatches of 64 and draw training data uniformly from [-√7, √7]. Perform 1000 independent training simulations, each with 3,000 training points. Out of these 1000 simulations, what fraction resulted in network collapse? Is your answer close to the over-90% figure reported in Lu et al.? (10)
3. Instead of ReLU, consider the Leaky ReLU activation defined by: φ(z) = z if z > 0, and φ(z) = 0.01z if z <= 0. Run the 1000 training simulations of part 2 with Leaky ReLU activation, keeping everything else the same. Again calculate the fraction of simulations that resulted in network collapse. Did Leaky ReLU help prevent dying neurons? (10)

References:
• Andre Perunicic. Understand Neural Network Weight Initialization. https://intoli.com/blog/neural-network-initialization/
• Daniel Godoy. Hyper-parameters in Action Part II — Weight Initializers.
• Initializers – Keras documentation. https://keras.io/initializers/
• Lu Lu et al. Dying ReLU and Initialization: Theory and Numerical Examples.

Batch normalization and dropout are both used as effective regularization techniques. However, it is not clear which one should be preferred, or whether their benefits add up when they are used together. In this problem we will compare batch normalization, dropout, and their combination using MNIST and LeNet-5 (see e.g. https://engmrk.com/lenet-5-a-classic-cnn-architecture/). LeNet-5 is one of the earliest convolutional neural networks developed for image classification, and implementations are available in all major frameworks. You can refer to the Lecture 3 slides for definitions of standardization and batch normalization.

1. Explain the terms co-adaptation and internal covariate shift. Use examples if needed. You may need to refer to the two papers mentioned below to answer this question. (5)
2. Batch normalization is traditionally used in hidden layers; for the input layer, standard normalization is used. In standard normalization the mean and standard deviation are calculated over the entire training dataset, whereas in batch normalization these statistics are calculated per mini-batch. Train LeNet-5 with standard normalization for the input and batch normalization for the hidden layers. What are the learned batch norm parameters for each layer? (5)
3. Next, use batch normalization for the input layer as well (instead of standard normalization) and train the network. Plot the distribution of learned batch norm parameters for each layer (including the input) using violin plots. Compare the train/test accuracy and loss for the two cases. Did batch normalization on the input layer improve performance? (5)
4. Train the network without batch normalization, but this time use dropout. For hidden layers use a dropout probability of 0.5, and for the input layer 0.2. Compare the test accuracy using dropout to the test accuracy obtained using batch normalization in parts 2 and 3. (5)
5. Now train the network using both batch normalization and dropout. How does the performance (test accuracy) of the network compare with dropout alone and with batch normalization alone? (5)

References:
• N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf
• S. Ioffe, C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
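A sketch of a LeNet-5-style Keras model with switches for the two regularizers; the layer sizes and tanh activations follow the common LeNet-5 description linked above, and the dropout rates come from the problem statement:

import tensorflow as tf
from tensorflow.keras import layers

def lenet5(batch_norm=False, dropout=False):
    m = tf.keras.Sequential()
    m.add(layers.InputLayer(input_shape=(28, 28, 1)))
    if dropout:
        m.add(layers.Dropout(0.2))              # input-layer dropout
    m.add(layers.Conv2D(6, 5, activation="tanh", padding="same"))
    if batch_norm:
        m.add(layers.BatchNormalization())
    m.add(layers.AveragePooling2D())
    m.add(layers.Conv2D(16, 5, activation="tanh"))
    if batch_norm:
        m.add(layers.BatchNormalization())
    m.add(layers.AveragePooling2D())
    m.add(layers.Flatten())
    m.add(layers.Dense(120, activation="tanh"))
    if dropout:
        m.add(layers.Dropout(0.5))              # hidden-layer dropout
    m.add(layers.Dense(84, activation="tanh"))
    if dropout:
        m.add(layers.Dropout(0.5))
    m.add(layers.Dense(10, activation="softmax"))
    return m

The learned batch norm parameters (gamma, beta) of each BatchNormalization layer can be read off with layer.get_weights(), which is what parts 2 and 3 ask you to report and plot.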
Recall the cyclical learning rate policy discussed in Lecture 4. The learning rate changes in a cyclical manner between lr_min and lr_max, which are hyperparameters that need to be specified. For this problem you first need to read the article referenced below carefully, as you will be making use of the code there (in Keras) and modifying it as needed. For those who want to work in PyTorch, open-source implementations of this policy are available which you can easily find and build on. You will work with the FashionMNIST dataset and MiniGoogLeNet (described in the reference).

1. Summarize the FashionMNIST dataset: total dataset size, training set size, validation set size, number of classes, and number of images per class. Show any 3 representative images from any 3 classes in the dataset. (3)
2. Fix the batch size to 64, start with 10 candidate learning rates between 10^-9 and 10^1, and train your model for 5 epochs. Plot the training loss as a function of the learning rate. You should see a curve like Figure 3 in the reference below. From that figure, identify the values of lr_min and lr_max. (5)
3. Use the cyclical learning rate policy (with exponential decay) and train your network using batch size 64 and the lr_min and lr_max values obtained in part 2. Plot the train/validation loss and accuracy curves (similar to Figure 4 in the reference). (5)
4. Fix the learning rate to lr_min and train your network starting with batch size 64 and going up to 8192. If your GPU cannot handle large batch sizes, you can employ the effective-batch-size approach discussed in Lecture 3 to simulate large batches. Plot the training loss as a function of batch size. Do you see behavior of the training loss with respect to batch size similar to what you saw in part 2 with respect to learning rate? (5)
5. Can you identify b_min and b_max from the figure in part 4 for devising a cyclical batch size policy? Create an algorithm for automatically determining the batch size and show its steps in a block diagram, as in Figure 1 of the reference. (4)
6. Use the b_min and b_max values identified in part 5 and devise a cyclical batch size policy such that the batch size changes in a cyclical manner between b_min and b_max. In part 3 we decreased the learning rate exponentially as training progressed; what is the analogous trajectory for batch size as training progresses, exponential increase or decrease? Use the cyclical batch size policy (with the appropriate trajectory) and train your network using learning rate lr_min. (6)
7. Compare the best accuracy from the two cyclical policies. Which policy gives you the best accuracy? (2)

PS: In part 3 of this problem we are doing a cyclical learning rate with exponential decay. The code under "Keras Learning Rate Finder" in the blog implements the triangular policy; you may need to change it to exponential decay as described in the first reference below. For parts 4 and 6, you will be writing your own Python project, a "Keras Batch Finder".

References:
1. Leslie N. Smith. Cyclical Learning Rates for Training Neural Networks. https://arxiv.org/abs/1506.01186
2. Keras implementation of the cyclical learning rate policy. https://www.pyimagesearch.com/2019/08/05/keras-learning-rate-finder/
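A sketch of the "exp_range"-style cyclical schedule (a triangular wave with exponentially decayed amplitude) from Smith's paper; the gamma value here is a typical choice, not one prescribed by the assignment:

import numpy as np

def cyclical_lr(it, lr_min, lr_max, step_size, gamma=0.99994):
    # triangular wave between lr_min and lr_max over 2*step_size iterations,
    # with amplitude decayed by gamma ** iteration
    cycle = np.floor(1 + it / (2 * step_size))
    x = np.abs(it / step_size - 2 * cycle + 1)
    return lr_min + (lr_max - lr_min) * max(0.0, 1 - x) * gamma ** it

The analogous cyclical batch size policy in part 6 can reuse the same triangle, swapping lr_min/lr_max for b_min/b_max and choosing the decay direction your part-6 reasoning calls for.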


[SOLVED] Homework 1 COMS E6998, Problem 1 – Linear Separability (10 points)

Consider a dataset with two features x1 and x2 in which the points (-1, -1), (1, 1), (-3, -3), (4, 4) belong to one class and (-1, 1), (1, -1), (-5, 2), (4, -8) belong to the other.

1. Is this dataset linearly separable? Can a linear classifier be trained using features x1 and x2 to classify this data set? You can plot the dataset points and argue from the plot. (2)
2. Can you define a new 1-dimensional representation z in terms of x1 and x2 such that the dataset is linearly separable in the 1-dimensional representation corresponding to z? (4)
3. What does the separating hyperplane look like? (2)
4. Explain the importance of nonlinear transformations in classification problems. (2)

1. Derive the bias-variance decomposition for a regression problem, i.e., prove that the expected mean squared error of a regression problem can be written as E[MSE] = Bias² + Variance + Noise. Hint: Let y(x) = f(x) + ε be the true (unknown) relationship and ŷ = g(x) the model's predicted value of y. Then the MSE over test instances xi, i = 1, ..., t, is given by MSE = (1/t) Σᵢ₌₁ᵗ (f(xi) + ε - g(xi))². (5)
2. Consider the case y(x) = x + sin(1.5x) + N(0, 0.3); here f(x) = x + sin(1.5x) and ε = N(0, 0.3). Create a dataset of 20 points by randomly generating samples from y. Display the dataset and f(x), using a scatter plot for y and a smooth line plot for f(x). (5)
3. Use a weighted sum of polynomials as the estimator function for f(x); in particular, let the estimator have the form gn(x) = β0 + β1x + β2x² + ... + βnxⁿ. Consider three candidate estimators, g1, g3, and g10. Estimate the coefficients of each of the three estimators using the sampled dataset and plot f(x), g1(x), g3(x), and g10(x). Which estimator is underfitting? Which one is overfitting? (10)
4. Generate 100 datasets (each of size 50) by randomly sampling from y. Partition each dataset into training and test sets (80/20 split). Next fit estimators of varying complexity, i.e., g1, g2, ..., g15, on the training set of each dataset. Then calculate and display the squared bias, variance, and test-set error for each of the estimators, showing the tradeoff between bias and variance with model complexity. Can you identify the best model? (10)
5. One way to increase model bias is to use regularization. Take the order-10 polynomial and apply L2 regularization. Compare the bias, variance, and MSE of the regularized model with the unregularized order-10 polynomial model. Does the regularized model have higher or lower bias? What about MSE? Explain. (10)
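A sketch of the sampling and polynomial fitting used in parts 2 and 3 (NumPy); the x-range for sampling is an assumption, since the problem statement does not fix one:

import numpy as np

def f(x):
    return x + np.sin(1.5 * x)

def sample(n, x_lo=0.0, x_hi=10.0, noise_std=0.3, rng=np.random):
    # y = f(x) + N(0, 0.3); pick any sensible x-range
    x = rng.uniform(x_lo, x_hi, n)
    return x, f(x) + rng.normal(0, noise_std, n)

x, y = sample(20)
for degree in (1, 3, 10):
    coeffs = np.polyfit(x, y, degree)       # least-squares fit of g_degree
    g = np.poly1d(coeffs)
    # plot x vs g(x) against f(x) to judge under/overfitting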
OpenML (https://www.openml.org) has thousands of datasets for classification tasks. Select any 2 datasets from OpenML with different numbers of output classes.

1. Summarize the attributes of each dataset: number of features, number of instances, number of classes, number of numerical features, and number of categorical features. (5)
2. For each dataset, select 80% of the data as the training set and the remaining 20% as the test set. Generate 10 different subsets of the training set by randomly subsampling 10%, 20%, ..., 100% of it. Use each of these subsets to train two different classifiers, Random Forest and Gradient Boosting, measuring the wall-clock training time of each run. After each training run, evaluate the accuracy of the trained model on the test set. Report model accuracy and training time for each of the 10 subsets and generate a learning curve for each classifier; a learning curve shows how accuracy changes with increasing training-data size. Also create a curve showing the training time of the classifiers with increasing training-data size. So for each dataset you will have two figures: the first showing the learning curves (x-axis: training data size, y-axis: accuracy) for the two classifiers, and the second showing training time for the two classifiers as a function of training data size. (15)
3. Study the scaling of training time and accuracy with training-data size using the two figures generated in part 2. Compare the performance of the classifiers in terms of training time and accuracy and write down 3 main observations. Which gives better accuracy? Which has shorter training time? (5)

This question is based on two papers, one from ICML 2006 and the other from NIPS 2015 (details below). The ICML paper discusses the relationship between ROC and Precision-Recall (PR) curves and shows a one-to-one correspondence between them. The NIPS paper introduces Precision-Recall-Gain (PRG) curves. You need to refer to the two papers to answer the following questions.

1. Do true negatives matter for either the ROC or the PR curve? Argue why each point on the ROC curve corresponds to a unique point on the PR curve. (5)
2. Select one OpenML dataset with 2 output classes. Use two binary classifiers (AdaBoost and Logistic Regression) and create ROC and PR curves for each of them. You will have two figures: one containing the two ROC curves and the other containing the two PR curves. Show the point where an all-positive classifier lies on the ROC and PR curves. (10)
3. The NIPS paper defines the PR-Gain curve. Calculate AUROC (area under ROC), AUPR (area under PR), and AUPRG (area under PRG) for the two classifiers and compare. Do you agree with the NIPS paper's conclusion that practitioners should use PR-Gain curves rather than PR curves? (10)

Related papers:
• Jesse Davis, Mark Goadrich. The Relationship Between Precision-Recall and ROC Curves. ICML 2006. https://www.biostat.wisc.edu/~page/rocpr.pdf
• Peter A. Flach, Meelis Kull. Precision-Recall-Gain Curves: PR Analysis Done Right. NIPS 2015. https://papers.nips.cc/paper/5867-precision-recall-gain-curves-pr-analysis-done-right
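A sketch of the curve construction in part 2 of the ROC/PR question (scikit-learn); the bundled breast-cancer dataset stands in here for whichever 2-class OpenML dataset you choose:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, precision_recall_curve, auc
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # stand-in binary dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for clf in (AdaBoostClassifier(), LogisticRegression(max_iter=1000)):
    scores = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)          # points of the ROC curve
    prec, rec, _ = precision_recall_curve(y_te, scores)  # points of the PR curve
    print(type(clf).__name__, "AUROC:", auc(fpr, tpr))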


[SOLVED] CS 212 Project 4 – Frequency Count of Words

Write a program that will read from a file and display to the console all the words in the file in alphabetical order, along with the number of times each word appeared in the file. For example, if the input file is:

This is an example, of an input file for project four, as an example.

the output to the console would be:

an – 3
as – 1
example – 2
file – 1
for – 1
four – 1
input – 1
is – 1
of – 1
project – 1
This – 1

To get the individual words from each line of the file, use a regular expression (perhaps with the split method of class String), as shown in lecture. Note that the input file may contain reasonable punctuation marks separating the words.

Use a TreeMap to store the words and their counts (that is, a TreeMap<String, Integer>). You will need to use the wrapper class Integer to hold the counts, as TreeMaps do not store primitives. Allow the user to select the input file using a JFileChooser.

Since this project can be done with one class, you do not need to create a jar file. You can submit the file Project4.java to Blackboard. Make sure you upload the correct file by the due date (which is also the cutoff date), as there will be no opportunity to resubmit projects.
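The counting logic, sketched in Python for illustration only (the project itself must be written in Java with a TreeMap; the sorted() call with a case-insensitive key emulates the TreeMap's sorted keys):

import re

def word_counts(path):
    counts = {}
    with open(path) as fh:
        for line in fh:
            # split on anything that is not a letter, like the regex split would
            for word in re.split(r"[^A-Za-z]+", line):
                if word:
                    counts[word] = counts.get(word, 0) + 1
    for word in sorted(counts, key=str.lower):   # alphabetical, ignoring case
        print(word, "-", counts[word])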


[SOLVED] CS 212 Project 3 – Continuing with the Word Puzzle

Winning the game. Let the player know if he or she has won the game by guessing all the words on the solutions list. Show a MessageDialog in this case, and ask the user if he or she would like to play again.

Create a File Menu in your GUI. Add a file menu to your game GUI with an option to open any file for reading (processing the file as in Project 2) and an option to quit the program. You will need a FileMenuHandler class to handle the events from the FileMenu. Be sure to use getAbsolutePath(), not getName(), when getting the file from the JFileChooser.

Handle exceptions. Create an exception called IllegalWordException (by extending IllegalArgumentException, as shown in lecture) and have the constructor of Word throw it. A Word is illegal if it does not consist entirely of lowercase letters. Use a try/catch statement to catch this exception in your program, and show the erroneous Words in the console. A data file will be provided that contains illegal words.

Create a jar file called Project3.jar and submit it to Blackboard by the due date for full credit. Be sure your jar file contains .java files, not .class files.


[SOLVED] CS 212 Project 2 – Improving on the Word Game

Add the following improvements to the word game. (1) The first letter of the subject letters (the first line of the input file) must be contained in all correctly guessed words. (2) If a guessed word contains ALL of the subject letters, it is worth 3 points. (3) Display the correctly guessed words in alphabetical order.

Create a class called WordNode which has fields for the data (a Word) and next (WordNode) instance variables. Include a one-argument constructor which takes a Word as a parameter (for hints, see the PowerPoint on "Static vs. Dynamic Structures"):

public WordNode(Word w) { ... }

The instance variables should have protected access.

Create an abstract linked list class called WordList. This should be a linked list with a head node, as described in lecture, modified so that the data type in the nodes is Word. The no-argument constructor should create an empty list with first and last pointing to an empty head node and length equal to zero. Include an append method in this class. Then create two more linked list classes that extend the abstract class WordList: one called UnsortedWordList and one called SortedWordList, each with an appropriate no-argument constructor. Each of these classes should have a method add(Word) that adds a new node to the list. UnsortedWordList adds it to the end of the list by calling the append method in the superclass; SortedWordList inserts the node in the proper position to keep the list sorted, as sketched below.

Instantiate two linked lists, one sorted and one unsorted. Add the solutions from the input file to the unsorted linked list; this list will be searched to see if a guessed word matches. As words are correctly guessed, add them to the sorted list and display the contents of that list in the TextArea for the guessed words. Check the method setText in class TextArea for updating the contents of the TextArea.

Submit a jar file. Rather than uploading all the files for this project separately, we will use Java's facility to create the equivalent of a zip file, known as a Java ARchive, or "jar", file. Instructions on how to create a jar file using Eclipse are on Blackboard. Be sure to include source files (.java files), not class files, in your jar. Create a jar file called Project2.jar and submit that.
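The sorted insertion for SortedWordList, sketched in Python for illustration only (the project must be written in Java); head plays the role of the empty head node described above, and words are assumed to compare lexicographically:

class Node:
    def __init__(self, data, nxt=None):
        self.data = data
        self.next = nxt

def sorted_add(head, word):
    # walk until the next node's data is >= word, then splice the new node in
    cur = head                      # head is a dummy node with no data
    while cur.next is not None and cur.next.data < word:
        cur = cur.next
    cur.next = Node(word, cur.next)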


[SOLVED] CS 212 Project 1 – Word Game

This project is loosely based on a word puzzle called the Spelling Beehive, found in the Sunday New York Times magazine. In it, a player is given a set of seven letters and has to find as many words as possible using some portion, but at least five, of those seven letters. Letters may be used more than once. Each correct word earns one point.

The input file. For a simple example, suppose the player is given just four letters (instead of the seven we will use for this project) and has to make words of at least three letters. The first line of the input file will be the letters to use, and the rest of the input file will contain solutions that would be hidden from the user. For example:

PRTA
PART
TARP
ART
RAT
APART
TRAP
etc.

Your program should read the first line into a String variable for the letters, and the rest of the file into an array of Strings against which the user's guesses can be matched.

Create a GUI for the puzzle with a grid layout of one row and two columns. In the left column put the puzzle letters; in the right column display the words that the user has found so far (words the user has guessed and your program has found on the solutions list) and the user's score. Accept words from the user via a JOptionPane. MessageDialogs should be shown to the user in the following cases: 1. The user has used a letter that is not one of the seven letters given. 2. The user's guess is less than 5 letters long. 3. The user's guess is not in the solutions list.

Submitting the project. You should have two files to submit for this project: Project1.java and PuzzleGUI.java. Upload your project files to Blackboard by the due date for full credit.


[SOLVED] CSCI 212 – Object-Oriented Programming in Java, Project 0

This project is intended for you to take programming concepts you learned in CSCI 111 (decision statements, loop statements) and apply them in a simple Java program (using some of the classes covered in lecture and lab). In addition, you will submit the project through Blackboard to make sure it is clear how to do that. We will look at your coding style, documentation (comments), and, of course, whether the project works. Check out the grading criteria in the Projects folder in Blackboard. Ignore Javadoc for this project.

Write a Java program that will:
1. Ask the user to type in a sentence, using a JOptionPane.showInputDialog().
2. Examine each letter in the string and count how many times the upper-case letter 'E' appears and how many times the lower-case letter 'e' appears. The key here is to use the charAt method in class String.
3. Using a JOptionPane.showMessageDialog(), tell the user how many upper- and lower-case e's were in the string.
4. Repeat this process until the user types the word "Stop". (Check out the method equalsIgnoreCase in class String to cover all upper/lower-case spellings of the word "STOP".)

Program submission:
• The name of your file must be Project0.java, with upper/lower case exact.
• Your program (the .java file, not the .class file) should be submitted through Blackboard by uploading the file. In the "Comments" field of the assignment, put your name and lab section.
• Note that Blackboard lets you submit the assignment only once. If you make a mistake, you will have to ask Dr. Lord to clear the project, and you may lose time.
• The program is due by midnight on the date indicated at the top of this handout and will not be accepted after the cutoff date. The date of submission to Blackboard is the official date for your project.


[SOLVED] CSE 4360 / 5364 – Autonomous Robots, Project 3

The final robot projects will use the same Lego EV3 robot. For the final project, groups have a choice among several different projects. If you would like to modify a project or propose your own, you need to have the modified project approved by the instructor first.

Option 1: Clean-up robot. The goal of this project is to design a robot that can move through a room environment similar to the one used for the second project (again, its layout is unknown beforehand) and clean up a number of colored blocks into the appropriate locations. For this task, a number of blocks (approximately 5cm x 5cm x 5cm in size) of two different colors (red and blue) will be distributed at random locations in a room environment. Blocks will initially sit on top (and in the center) of a colored floor tile. In addition, there will be two corners in the environment that are color-coded (red and blue) and serve as the deposit areas (their locations are known beforehand). The task of the robot is to find the blocks, bring each one to the matching corner, and leave it there. (An example environment figure accompanied the original handout.)

Option 2: Drawing robot arm. The goal of this project is to build and program a robot arm that can draw polygons using a marker. The arm should move a marker mounted at its end across a piece of paper to draw arbitrary polygons given to it (at compile time). Given a polygon, the robot should move the marker to the first corner (without drawing a line) and then trace the shape on the paper. Drawing will be limited to a 15cm x 10cm area that can be located based on the kinematic characteristics of the robot constructed. (An example scenario figure accompanied the original handout.)

Option 3: Stair-climbing robot. The goal of this project is to build and program a robot that can climb up and down a set of stairs with 10cm steps. The depth of a step can be variable but its height is fixed. In addition, steps can wind upward (i.e., each step can be at an angle with respect to the previous one). The ends of each step will be marked in black.

Option 4: Robot soccer. For this project it is necessary that at least 2 teams choose it so they can play against each other. The goal of the project is to build a robot that can play a simplified version of soccer using an IR ball and IR seeker sensors. Two teams play against each other on a field that has 3 zones: two defense zones that only the defending team's robot can be in, and a middle zone that both teams' robots can be in. The goal for each team is to have the ball cross the other team's base line in order to score a goal. Once a goal is scored, the robots are moved into their teams' defense zones and the ball is placed in the center; the game is then restarted, with the team that was scored on getting a 1 s head start. Each participating team will receive an IR seeker sensor that provides a direction signal towards the ball, which is equipped with a set of IR LEDs. Each robot has to fit within 0.75 ft x 0.75 ft.


[SOLVED] CSE 4360 / 5364 – Autonomous Robots, Project 2

The goal of this project is to design a behavior-based object-finding and removal robot that is able to move from an unknown position in an indoor environment to look for an object, raise an alarm, and push the object off its location on the floor. The robot should use a set of behaviors, including "wander" (search) and "wall following".

The walls of the rooms will be marked with 2" wide (1.88" is OK) blue painter's tape, and the object will be an empty soda can sitting on top of a red square made with red painter's tape. There will be no "door" openings to the outside of the house. The robot is to find the object, move towards it, indicate when it is within 1 foot of the object (using either a tone or a light), and then push it off the marking it is sitting on. (An example environment figure accompanied the original handout.)

The behavioral repertoire of your robot should include "wander", i.e., a behavior that enables the robot to move through freespace looking for either a wall or the object; "wall following", which should permit the robot to move along the wall to reach all the rooms in the environment (you might want to implement only one direction, i.e., clockwise or counterclockwise wall following); "goal finding", which should allow you to detect the object and move to it; and a "clearing" behavior, which can remove the can from the mark on the floor.

As the walls are represented by blue painter's tape, it is OK for the wall detection sensor (the color sensor) to cross on top of the wall. However, the center of the robot is not allowed to ever move on top of the blue tape. All walls will be either horizontal or vertical (i.e., all angles in the environment will be right angles), any piece of wall will be at least 1 foot long, and the marking on the floor under the can will be an area of red painter's tape covering 1 square foot.

At the end of the project each group has to hand in a report, the code, and a recording of your system, and give a short demonstration of their robot. During this demonstration you should provide a short description of the robot and of the details of your behavior-based control system.

1. Build a mobile robot for this task. Using the parts in your robot kit, build a mobile robot for the task. (In this assignment the robot has to be able to detect and follow "walls" and to detect the object. Robot localization, on the other hand, is not important, since the start location of the robot and the location of the object will not be known. One way to perform "wall following" in the given environment would be to use the color sensor to keep track of the wall.) Your project report should include a short description of your robot design, including the critical design choices made.

2. Implement "wander", "wall following", "goal finding", and "clearing" on the robot. To address the given task you have to implement a "wander" (search), a "wall following", a "goal finding" (and identification), and a "clearing" behavior for your robot. "Wander" is intended here to move the robot through freespace to a wall; "wall following" is intended to permit the robot to move between rooms; "goal finding" is intended to locate the object; and "clearing" is intended to move the object off of its current location on the floor. To integrate these behaviors you also have to implement a behavior coordination mechanism (e.g., subsumption, weighted averaging, etc.). Once the object has been found and your system has moved closer than 1 foot, your robot should indicate this by starting an alarm, and it should then attempt to clear the object from the location (the easiest approach is simply to push it for at least 1 foot to make sure it clears the area it is standing on). Your report should contain a description of the important components of your control system. The submission should also contain the actual code for the robot and a recording of the system performing the task.
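One simple coordination scheme (fixed-priority arbitration in the spirit of subsumption), sketched in Python; the behavior names and trigger functions are placeholders for whatever interfaces your robot code exposes:

def arbitrate(behaviors):
    # behaviors: list of (condition, action) pairs, highest priority first;
    # the first behavior whose trigger condition holds wins the actuators
    for condition, action in behaviors:
        if condition():
            return action
    return None

# Example priority order: clearing > goal finding > wall following > wander.
# behaviors = [(can_on_mark, clear), (object_seen, goal_find),
#              (wall_detected, wall_follow), (lambda: True, wander)]
# while True:
#     arbitrate(behaviors)()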


[SOLVED] CSE 4360 / 5364 – Autonomous Robots, Project 1

The objective of this assignment is to navigate a mobile robot through an obstacle course to a goal location. The start position of the robot as well as the locations of all obstacles and of the goal are given before the robot is started.

The workspace for the robot is a rectangular area, 4.88 m x 3.05 m in size (this corresponds to exactly 16 x 10 floor tiles in the lab). Obstacles are black cardboard squares 0.305 m x 0.305 m in size (the size of 1 floor tile) which will be placed in the workspace. The goal is a black circle with a radius of 0.305 m. To simplify experiments, the centers of the goal area, the obstacles, and the start will coincide with the intersection point of four floor tiles, and their orientation will be aligned with the tiles. (An example obstacle course figure accompanied the original handout.)

Obstacles, goal, and start location of the robot will be selected arbitrarily, with the restriction that if a path exists, there will always be a path that is at least 0.61 m (2 tiles) wide. The locations of obstacles, start, and goal will be provided at compile time, so it is not necessary to write interactive input routines. One possibility is to include them as a header file giving the coordinates of their centers. For the obstacle course shown above this could look as follows:

#define MAX_OBSTACLES 25                 /* maximum number of obstacles */

int num_obstacles = 13;                  /* number of obstacles */
double obstacle[MAX_OBSTACLES][2] =      /* obstacle locations */
    {{0.61, 2.743},{0.915, 2.743},{1.219, 2.743},{1.829, 1.219},
     {1.829, 1.524},{1.829, 1.829},{1.829, 2.134},{2.743, 0.305},
     {2.743, 0.61},{2.743, 0.915},{2.743, 2.743},{3.048, 2.743},
     {3.353, 2.743},
     {-1,-1},{-1,-1},{-1,-1},{-1,-1},{-1,-1},{-1,-1},
     {-1,-1},{-1,-1},{-1,-1},{-1,-1},{-1,-1},{-1,-1}};
double start[2] = {0.305, 1.219};        /* start location */
double goal[2]  = {3.658, 1.829};        /* goal location */

At the end of the project each group has to hand in a report and give a short demonstration of their robot. During this demonstration you should provide a short description of the robot and navigation system, and be prepared to answer some basic questions.

The Project
1. Build a mobile robot for this task. Using the parts in your robot kit, build a mobile robot for the navigation task. (Since the robot has to navigate through the course, an important design criterion might be that the robot is able to keep track of its position, and perhaps to detect the goal once it is reached.) Your project report should include a short description of your robot design, including the critical design choices made.
2. Implement a navigation strategy to address the task. Implement a navigation strategy which will permit your robot to accomplish the navigation task with arbitrary obstacle, goal, and start configurations (subject to the constraints described above). The speed of your robot and the length of the path generated by your robot are not of major importance here; the main objective is reaching the goal while not hitting any obstacles (i.e., without crossing over one of the obstacle tiles). In addition, the robot has to stay within the assigned workspace. Your report should contain a description of the important components of your navigation strategy and the actual code for the robot.

Words of caution: Dead reckoning (i.e., keeping track of the position of the robot using only internal sensors) can be relatively imprecise on the Lego robots, and navigation strategies might therefore sometimes fail. Don't get discouraged if your robot does not succeed every time. Also, moving at slower speeds can improve the overall precision.
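One possible planning approach, sketched in Python: breadth-first search over the 16 x 10 tile grid. The discretization into 0.305 m tiles and the 4-connected neighborhood are assumptions, not requirements of the handout:

from collections import deque

def bfs_path(start, goal, blocked, cols=16, rows=10):
    # start, goal: (col, row) tile coordinates; blocked: set of obstacle tiles
    prev = {start: None}
    queue = deque([start])
    while queue:
        c, r = queue.popleft()
        if (c, r) == goal:
            path, node = [], goal          # walk back through predecessors
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in ((c + 1, r), (c - 1, r), (c, r + 1), (c, r - 1)):
            if (0 <= nxt[0] < cols and 0 <= nxt[1] < rows
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = (c, r)
                queue.append(nxt)
    return None                            # no path exists

The returned tile sequence would then be converted back to metric waypoints (tile index times 0.305 m) for the dead-reckoning controller.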


[SOLVED] CSE1320-004/005 HW 3 – Purpose: Use bit operations to extract information

Assignment: Computers sometimes use a format called binary coded decimal, or "BCD". Good background information is available from https://en.wikipedia.org/wiki/Binary-coded_decimal, but it goes far deeper than you need for this assignment (read at your own risk; the section marked "Basics" covers all you need to know at this point in time). BCD encodes each decimal digit into 4 binary digits, so a byte holds 2 BCD digits and a 32-bit word can contain 8 packed BCD digits.

Examples: The decimal number 4660, expressed as hex, is 0x1234; in BCD it is 0100 0110 0110 0000, i.e. 4660 BCD, whose bit pattern read as plain binary is 18016 base 10. The decimal number 9999 as hex is 0x270F; in BCD it is 1001 1001 1001 1001, i.e. 9999 BCD, or 39,321 base 10.

Write a program that accepts an integer input (base 10) from stdin. This input value is in packed BCD format. Using the packed BCD format, display the number to stdout as a sequence of characters simulating a calculator's display (each digit 5 characters wide and 5 characters tall, with a space between digits).

Grading criteria:

Example output (the seven-segment-style renderings for the inputs 4660, 305419896, and 39321 in the original listing did not survive text extraction and are omitted here):

$ gcc -Wall -std=c99 -g hw4.c
$ ./a.out
4660
[five-row seven-segment rendering of the packed BCD digits]
305419896
[five-row seven-segment rendering of the packed BCD digits]
39321
[five-row seven-segment rendering of the packed BCD digits]
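The digit-extraction logic, sketched in Python for illustration (the assignment itself is in C, which would use the same shifts and masks):

def unpack_bcd(word, digits=8):
    # extract `digits` 4-bit BCD digits from the word, most significant first
    out = []
    for shift in range(4 * (digits - 1), -4, -4):
        out.append((word >> shift) & 0xF)
    return out

# 4660 decimal == 0x1234; interpreted as packed BCD its 8 digits are 0,0,0,0,1,2,3,4
print(unpack_bcd(4660))     # [0, 0, 0, 0, 1, 2, 3, 4]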


[SOLVED] CSE1320 HW2 – Purpose: Experience using arrays and numeric functions

Write a program that demonstrates the correct operation of the described sin_() function:

float sin_(float input_angle)

The calculation of this trigonometric value is performed in a multi-stage process. First, a lookup table (array) is built storing sin() for all integer degree values between 0 and 359, computed with a Taylor series (see https://en.wikipedia.org/wiki/Taylor_series for background and the algorithm to be used). This lookup table is created once and then used, for the life of the program, whenever the function is called. The sin_() function uses the lookup table to linearly interpolate between two values of the table to return the answer (see https://en.wikipedia.org/wiki/Linear_interpolation for background and the equation to be used). The advantage of this design is that the function call is very fast, requiring only some addition and division; slower calculations, like computing the sines in the lookup table, are done only once during initialization.

An example of using a lookup table and interpolation, with table entries (5, 25) and (10, 100) for the function Y = X*X: to solve for X = 7,

Y = (25*(10-7) + 100*(7-5)) / (10-5) = (75 + 200) / 5 = 55

An X value of 7 would actually give 49, so the error in using the lookup table with interpolation is: Error = Correct - Measured = 49 - 55 = -6. You will find the sin() function calculated as described above to be much closer to the actual value.

Specific requirements for the sin_() function:
Specific requirements for the program:
Grading criteria:

Example output (note that numerical accuracy may differ depending on your code, the specific compiler used, and the size of the operands; these examples are not exhaustive and do not test all of the functionality described for this homework assignment):

[bdavis@localhost hw2]$ ./a.out
-1
[bdavis@localhost hw2]$ ./a.out
0 0.000000 0.000000 0.000000 0.000000
90 90.000000 1.000004 1.000000 -0.000004
180 180.000000 -0.006925 -0.000000 0.006925
270 270.000000 -1.000004 -1.000000 0.000004
359 359.000000 -0.017453 -0.017453 -0.000000
-1
[bdavis@localhost hw2]$ ./a.out ofile
[bdavis@localhost hw2]$ gnuplot
gnuplot> plot "ofile" using 1, "ofile" using 2, "ofile" using 3, "ofile" using 4
gnuplot> quit
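The same two-stage design, sketched in Python for illustration (the assignment itself is in C): build the table once from the Taylor series, then interpolate between adjacent entries on each call:

import math

# one-time table: sin(d degrees) for d = 0..359, via the Taylor series
# sin(x) = x - x^3/3! + x^5/5! - ...
TABLE = []
for d in range(360):
    x = d * math.pi / 180.0
    s, term = 0.0, x
    for n in range(1, 10):                  # 9 terms is ample at this range
        s += term
        term *= -x * x / ((2 * n) * (2 * n + 1))
    TABLE.append(s)

def sin_(angle):
    a = angle % 360.0
    lo = int(a)
    hi = (lo + 1) % 360                     # wraps 359 -> 0, where sin is ~0
    frac = a - lo
    # linear interpolation between the two bracketing table entries
    return TABLE[lo] + frac * (TABLE[hi] - TABLE[lo])

print(sin_(90.0), sin_(45.5))               # quick check against math.sin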


[SOLVED] CSE1320 HW1 – Purpose: Gain experience in the mechanics of editing and compiling

Write a program that will calculate and display the mean, median, sum, max, and min of a provided sequence of 5 numbers. The return types of these functions should be consistent with the result; for example, the sum of two integers is always an integer, so an 'int' would be the correct return value for the "sum" function. It is acceptable for functions to be used by other functions you have written for this assignment.

Example runs:

25 5 0 0 2
min = 0  max = 25  median = 2  sum = 32  mean = 6.400000

0 0 0 0 0
min = 0  max = 0  median = 0  sum = 0  mean = 0.000000

100 100 100 100 100
min = 100  max = 100  median = 100  sum = 500  mean = 100.000000

1 2 3 4 5
min = 1  max = 5  median = 3  sum = 15  mean = 3.000000

5 4 3 2 1
min = 1  max = 5  median = 3  sum = 15  mean = 3.000000

Grading criteria:


[SOLVED] CSE 590-54/59 Mid-Term, Programming Portion

Data description: The Metropolitan Museum of Art presents over 5,000 years of art from around the world for everyone to experience and enjoy. The Museum lives in three iconic sites in New York City: The Met Fifth Avenue, The Met Breuer, and The Met Cloisters. Millions of people also take part in The Met experience online. Since it was founded in 1870, The Met has always aspired to be more than a treasury of rare and beautiful objects. Every day, art comes alive in the Museum's galleries and through its exhibitions and events, revealing both new ideas and unexpected connections across time and across cultures. The Metropolitan Museum of Art provides select datasets of information on more than 470,000 artworks in its collection for unrestricted commercial and noncommercial use.

Critical details and instructions:
iii. For problems 1-5, you can manipulate the data-frames/dictionaries as you see fit, using whatever functions/libraries you want. However, it is critically important that your end results for each problem match the provided variable names (e.g., the result of problem 1 is called df_init) so that they are accessible for grading.

You should upload your exam via the File Response dialogue through the Blackboard exam, but if you cannot do so, email it to me ASAP. Note that if you are submitting a .py file you are highly encouraged to include a README explaining what should be run to produce the required structures for problems 1-5 and the graphs for problem 6.


[SOLVED] CSE 590 Assignment 4

1. (Machine Learning (Classification))

a. Choose one of the toy classification datasets bundled with sklearn other than the digits dataset.

b. Train three distinct sklearn classification estimators on the chosen dataset and compare the results to see which one performs best when using 2-fold cross-validation. Note that you should use three distinct classification models here (not just tweak underlying parameters). A relatively complete listing of the available estimators can be found here (https://scikit-learn.org/stable/supervised_learning.html), but make sure you only use classifiers! Unless you have an inclination to do otherwise, I recommend using the model default parameters when available.

c. Repeat b. for 20-fold cross-validation. Explain in a paragraph the difference in your results when using 20-fold vs. 2-fold cross-validation (if any).

d. Construct a confusion matrix for your most accurate model across the three estimators and two cross-fold options. Which class in your dataset is most accurately predicted to have the correct label by the best classifier, and which is most likely to be confused with one or more of the wrong classes?

2 (Option I). (Trends, Searches, and Sentiment)

a. Use the Twitter Trends API to determine the available trending topics for a city of your choice, assigning a tweet volume of 5000 to any trend with no volume provided.

b. After sorting the trends in descending order by volume, create a bar graph with each (sorted) trend on the x-axis against its volume on the y-axis.

c. Use the Twitter Search API to find 20 tweets for each of the three most popular trends in the chosen city, and preprocess their associated tweet text (preferring extended tweet text, if available) in a manner appropriate for tweets.

d. Use TextBlob to determine the sentiment for each set of 20 tweets. i. Do you notice a substantial difference in the proportion of positive and negative sentiment for the three trends? Try to theorize why or why not. ii. Do you believe the sentiment analysis to be reliable for any or all of the trends? Explain why or why not.

2 (Option II). (Machine Learning (Regression))

a. Locate a non-proprietary, small-scale dataset suitable for regression online. There are countless sources and repositories that you can use for this task, but if you have trouble finding one, I recommend starting via Kaggle (https://www.kaggle.com/code/rtatman/datasets-for-regression-analysis/notebook). Explain briefly what the dataset represents, what target variable you will be using, and what other features are present. You may want or need to apply preprocessing to your data to ensure it can be used properly with the regression models (e.g., making every feature numeric through transformation or by dropping some).

b. Train three distinct sklearn regression estimators on the chosen dataset and compare the results to see which one performs best when using 10-fold cross-validation, utilizing the R-squared score to gauge performance. Note that you should use three distinct regression models here (not just tweak underlying parameters). A relatively complete listing of the available estimators can be found here (https://scikit-learn.org/stable/supervised_learning.html), but make sure you only use regression models! Unless you have an inclination to do otherwise, I recommend using the model default parameters when available.

c. Repeat part b utilizing the Mean Squared Error to gauge performance. Briefly research the difference between the two metrics (MSE and R2), and explain in a paragraph or two i. the difference between them and ii. when each one is the preferable metric to use. (A minimal cross-validation sketch follows this listing.)
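The following is a minimal sketch of the cross-validation workflow in problem 1, not the official solution: the wine dataset and the three estimators are illustrative choices, and any classifier/dataset combination permitted by the assignment would follow the same pattern.

# Compare three sklearn classifiers under 2-fold and 20-fold cross-validation,
# then build a confusion matrix for one configuration.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix

X, y = load_wine(return_X_y=True)

estimators = {
    "logistic": LogisticRegression(max_iter=5000),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
}

for folds in (2, 20):
    for name, est in estimators.items():
        scores = cross_val_score(est, X, y, cv=folds)  # accuracy by default
        print(f"{folds:2d}-fold {name}: mean accuracy = {scores.mean():.3f}")

# Confusion matrix for one (hypothetically best) configuration, using
# out-of-fold predictions so every sample is scored exactly once.
best = LogisticRegression(max_iter=5000)
y_pred = cross_val_predict(best, X, y, cv=20)
print(confusion_matrix(y, y_pred))

For 2 (Option II), the same cross_val_score pattern applies to regressors with scoring="r2" or scoring="neg_mean_squared_error" (sklearn negates MSE so that larger scores are always better).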


[SOLVED] CSE 590 Assignment 3

This assignment deals with using TextBlob and other open-source libraries to perform NLP-based analysis on documents using Python. All parts should use the same three documents (as outlined in Part 1 below). In addition to your .ipynb and/or .py files, you must submit a report document (in .doc or .pdf format) that answers the various questions below.

Part 1: Select and download three texts of your choosing that represent different media or writing formats (for example, you could choose i. a novel, movie script, and play script, or ii. a short story, poem, and novel, etc.). Make sure you briefly describe your documents and explain the differences between them in a paragraph.

Part 2: (a) Compute word counts for each of your documents after excluding English stop words (and, optionally, performing lemmatization). (b) Create and display a bar plot for each document that includes word counts for the 25 most frequent words (after the above processing). (c) Create and display a word cloud for each document (using a mask image of your choice) that includes only the 100 most frequent words. Note that you'll likely want to use the approach outlined in Session 25 that utilizes the fit_words method, since you will want data consistent with those for part (b). (d) Do you see any notable difference between the documents with respect to (b) and/or (c) above? Try to explain why or why not, and whether you would expect such a difference.

Part 3: (a) Use Textatistic to compute the average of the Flesch-Kincaid, Gunning Fog, SMOG, and Dale-Chall scores for each document. (b) Are there noticeable differences among your documents' readability scores, and do you suspect any difference is present (or should be present)?

Part 4: (a) Use spaCy to compute the pairwise similarity between your documents (i.e., doc. 1 to doc. 2, doc. 1 to doc. 3, doc. 2 to doc. 3). (b) Do any of these similarity scores seem higher or lower than you would expect? Explain your response.

Part 5: (a) Use spaCy to find the named entities in your documents. (b) Produce a bar plot for each document that includes the counts for the 20 most common named entities (by name). (c) Produce a second bar plot per document based on the counts of every named entity type (PERSON, ORG, etc.). (d) Do you notice any meaningful differences (or similarities) among the documents with respect to these plots? If so, explain what they are. (A short spaCy sketch for Parts 4 and 5 follows.)
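Here is a minimal sketch for Parts 4 and 5, not the official solution. The three file paths are placeholders, and it assumes a vector-equipped model such as en_core_web_md is installed, since Doc.similarity() is only meaningful when word vectors are available.

# Pairwise document similarity and named-entity counts with spaCy.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_md")
nlp.max_length = 2_000_000  # long novels can exceed the default 1,000,000-character limit

docs = []
for path in ("doc1.txt", "doc2.txt", "doc3.txt"):  # hypothetical file names
    with open(path, encoding="utf-8") as f:
        docs.append(nlp(f.read()))

# Part 4: pairwise similarity between the three documents.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(f"doc {i + 1} vs doc {j + 1}: {docs[i].similarity(docs[j]):.3f}")

# Part 5: entity counts by entity text and by entity type.
for i, doc in enumerate(docs, start=1):
    by_name = Counter(ent.text for ent in doc.ents)
    by_type = Counter(ent.label_ for ent in doc.ents)
    print(f"doc {i} top entities:", by_name.most_common(20))
    print(f"doc {i} entity types:", by_type.most_common())

The two Counters map directly onto the bar plots required in Part 5 (b) and (c), e.g., via matplotlib's plt.bar.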


[SOLVED] CSE 590 Assignment 2

A small but important aspect of text mining and natural language processing is measuring word frequency. This assignment deals with a heavily boiled-down exercise in loading a text file into Python and computing word frequency statistics. It requires usage of text files, strings, and data-frames, so you are heavily encouraged to take a look at the relevant sessions (14-17) if you have not already done so.

(a) Locate a movie script, play script, poem, or book of your choice in .txt format. You are free to choose nearly any novel, movie script, or play that you like, with the qualification that your chosen document must have a minimum of 5 chapters, scenes, and/or acts that distinguish one portion of the document's narrative from another. For example, the novel "Great Expectations" has 59 chapters, the script for "Jaws" has about 27 scenes, and all or almost all Shakespearean plays have exactly five acts. It is important for part (e) that these segments exist in your document. Project Gutenberg is a great resource for this if you're not sure where to start.

(b) Load the words of this document in sequential order of appearance into a one-dimensional Python list (i.e., the first word should be the first element in the list, while the last word should be the last element) that is case insensitive. It's up to you how to deal with special characters: you can remove them manually, ignore them during the loading process, or even count them as words, for example. Make sure you have this list clearly assigned to a variable, so we can evaluate it during grading.

(c) Use your list to create and print a two-column pandas data-frame with the following properties: i. the first column at each index should contain the word in question; ii. the second column should contain the number of times that particular word appears in the text; iii. the rows of the data-frame should be ordered according to the first occurrence of each word; iv. it's up to you whether or not your data-frame will include an index per row. Make sure you have this data-frame clearly assigned to a variable, so we can evaluate it during grading. For example, if the first word in your text is "the", which occurs 500 times, and the second is "balcony", which appears only twice, your data-frame should begin like the following (again, the indices are optional):

   Word       Count
1  "the"        500
2  "balcony"      2
...              ...

(d) Stop words are commonly used words in a given language that often fail to communicate useful summative information about its content. The attached stop_words.py file has a simple list of common stop words assigned to a variable. For this part of the assignment, you are to create a modified copy of the data-frame from (c) with the following modifications: i. all stop words have been removed from the data-frame, and ii. the data-frame rows have been sorted in decreasing order of frequency counts. Again, make sure you have this data-frame clearly assigned to a variable, so we can evaluate it during grading.

(e) While total word counts can provide a useful measure of the content of a document, they cannot reveal much about its underlying trends. In the context of document analysis, the term trend implies a direction (in terms of theme, mood, etc.) in which the content changes throughout the narrative. For example, some works of fiction begin with a comedic tone and take on a more serious tone in later stages, or vice versa.

For the last part of your assignment, you are going to modify the approach taken in part (d) to address individual segments of the document. More specifically, you are to divide the raw document into partitions according to the chapters, acts, etc. that are present, and then produce a list of data-frames, where each list element is a single data-frame containing word frequencies for a single segment, with the same format as the data-frame from part (d) outlined above. You are free to use whatever means you prefer to split the text into chapters and construct the list of data-frames, but one option is to use regular expressions on the raw document. Once again, you must ensure your list is readily accessible to us in the form of a variable.

You can use .py files, .ipynb files, or a combination of the two in your solution. Zip these file(s), along with a simple README telling me what to run to generate the list and data-frames, into a zip file whose name is built from LN and FN, where LN is your last name and FN is your first name, and submit this file to Blackboard. (A minimal sketch of parts (b)-(d) follows.)
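For parts (b)-(d), the main wrinkle is preserving first-occurrence order, which pandas' value_counts does not do on its own. The sketch below uses assumed names throughout: the input file name is a placeholder, and the stop_words import is a guess at what the assignment's stop_words.py defines.

# (b) Load words in order of appearance, lowercased; here special characters
# are simply stripped, which is one of the permitted options.
import re
import pandas as pd
from stop_words import stop_words  # assumption: stop_words.py defines a list named stop_words

with open("great_expectations.txt", encoding="utf-8") as f:  # hypothetical file
    words = re.findall(r"[a-z']+", f.read().lower())

# (c) Frequency table ordered by first occurrence: count manually while
# preserving insertion order (dicts preserve insertion order in Python 3.7+).
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
df = pd.DataFrame({"Word": list(counts.keys()), "Count": list(counts.values())})

# (d) Drop stop words, then sort by descending frequency.
df_no_stop = (
    df[~df["Word"].isin(stop_words)]
    .sort_values("Count", ascending=False)
    .reset_index(drop=True)
)
print(df_no_stop.head(10))

Part (e) then amounts to splitting the raw text into segments (e.g., with re.split on a chapter-heading pattern) and running the same (b)-(d) pipeline once per segment, collecting the resulting data-frames in a list.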
