### Neural network What are some good resources for learning about Artificial Neural Networks?

I'm really interested in Artificial Neural Networks, but I'm looking for a place to start. What resources are out there and what is a good starting project?

Five metal rods of the same length each have a hole (of the same diameter) in a different place. Using a neural network, we have to detect where the hole is from the sound produced when the rod is knocked. Note: a plastic bottle might be used instead of a metal rod.

I'm running some experiments on various classification datasets using WEKA's MultilayerPerceptron implementation. I was expecting to be able to observe overfitting as the number of training iterations (epochs) increased. However, despite letting the number of epochs grow fairly large (15k), I haven't seen it yet. How should I interpret this? Note that I'm not achieving 100% accuracy on the train or test sets, so it's not that the problem is too simplistic. Some ideas I came up with are: I simpl

When I study neural networks, the mathematical derivations always use the sigmoid function in both the hidden layer and the output layer. But the NN toolbox from MathWorks suggests using a sigmoid in the hidden layer and purelin in the output layer. Can anyone tell me why the output layer can be purelin? I just can't see the reason for this activation function. http://imgur.com/9V2HIlF // the traditional back propagation formula Going by the formula, if I use the purelin function, the result will be very di

Which training algorithm is faster: resilient propagation or quick propagation? Judging by the name, quick propagation should be faster, but in my experiments it wasn't always. To be honest, the split was about 50/50. I'd like to know whether I'm alone in this conclusion, or whether somebody else has seen similar results.

I'm working on implementing a back propagation algorithm. Initially I worked on training my network to solve XOR to verify that it works correctly before using it for my design. After reading this I decided to train it to solve the AND gate first. I'm using sigmoid as the transfer function and MSE to calculate the total error. I used different learning rates that ranged between 0.01 and 0.5. I trained the network several times, each time for a different number of iterations ranging from 100 iterations to 1000 iterat

For logistic regression: are the initial theta values for the hypothesis guessed? If so, what range of values should theta take (perhaps between the min and max output values)?
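
For reference, a minimal sketch of one common convention, assuming plain batch gradient descent on the log loss. Because that loss is convex in theta, the starting point is not critical, and zeros or small random values are typical choices. The toy data, learning rate, and function names below are illustrative assumptions and do not come from any particular course or library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(theta, xs, ys):
    """Mean negative log-likelihood of labels ys under the logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(theta[0] + theta[1] * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def fit(theta, xs, ys, lr=0.5, steps=2000):
    """Batch gradient descent on the log loss, starting from the given theta."""
    theta = list(theta)
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(theta[0] + theta[1] * x) - y
            g0 += err        # gradient w.r.t. the intercept
            g1 += err * x    # gradient w.r.t. the slope
        theta[0] -= lr * g0 / n
        theta[1] -= lr * g1 / n
    return theta

# Toy, deliberately non-separable 1-D data (hypothetical).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

random.seed(0)
theta_zero = fit([0.0, 0.0], xs, ys)
theta_rand = fit([random.uniform(-0.5, 0.5) for _ in range(2)], xs, ys)
```

On this toy problem both starting points should end up at essentially the same theta, which is the practical reason the initial guess matters little for logistic regression (unlike for multi-layer networks, where all-zero weights are usually avoided).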

I know that there are similar questions like this, but I want to know the plain basics. Let's assume I have some data (x,y) -> z where z can be 0 or 1 and x,y in [0,1]. Now I want to train a neural network with that data, and my desired output should be a boundary (a line or a curve) in the x,y space that splits the zeros from the ones (e.g. male/female or whatever). So, I want to have one hidden layer. I guess I somehow understand how to feed the network: feed it with X = (x,y) to the

I have a fairly simple ANN using TensorFlow and AdamOptimizer for a regression problem, and I am now at the point of tuning all the hyperparameters. So far I have found many different hyperparameters that I have to tune: learning rate (initial learning rate, learning rate decay); the AdamOptimizer's 4 arguments (learning-rate, beta1, beta2, epsilon), so we need to tune them, at least epsilon; batch size; number of iterations; lambda, the L2-regularization parameter; number of neurons, number of layers; what

I am using pycaffe to do a multilabel classification task. When I run solver.solve() or solver.step(2), only one iteration is executed, then the current process is killed somehow. The IPython console says the kernel died unexpectedly. No other error information is provided. Then I used a terminal to run the command "python Test.py" and got a "Floating point exception (core dumped)" message. Besides, the net.forward() and net.backward() methods both work fine. What is the reason? And how to solv

Why does word2vec outperform other neural network methods? Word2vec is shallower than other neural network methods (NNLM, RNNLM, etc.). Can this be explained? I also want to know whether it suffers any drawbacks because the word2vec model does not contain a hidden layer (with an activation function like sigmoid, etc.).

I am fine-tuning the VGG16 network with Keras 2.0.2 and Theano 0.9.0 as the backend on Windows 10 64-bit with Anaconda 2, following this blog post: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html I found that someone else had the same issue in the pull requests, and it was fixed by changing a few lines of code (link: https://github.com/Theano/Theano/pull/2075). However, that was an old version of Theano (the PR was in 2014). Theano 0.9.0 has already changed the code and I still have

I'm learning about ANNs, and I wrote two scripts (in Fortran 90 and Python) for a simple binary classification problem. I first did it without a bias, and I got good convergence. But after adding a bias for each node it does not converge anymore (either everything goes near 0 or everything goes near 1). The bias input is 1 and has a specific weight for each node. The weight is randomly initialized and then updated by adding a delta, like the other weights. I have tried changing the gradient step size but it still does the same thing. Som

I'm trying to predict whether a player of a video game will stop playing the game (0/1 for not-stopping/stopping) within the next month based on the game data from matches they've had so far. Each match a player plays generates (X) data points, however, each player may have played a different number of matches to date (M), thus when a player's data is put into one long vector, the length of their vector will be X*M. I'm very new to how neural networks work, but it is my understanding that eac

Is it possible to invert an avgPool2d operation in PyTorch, like maxunpool2d for a maxpool2d operation, and if so, how could that be done? I've already checked the documentation, and there isn't an option to return the indices, like in the maxpool2d operation, so I assume the unpooling won't be possible in a similar way. EDIT: I found a document by Intel which describes how the unpooling works. After checking the math regarding the avgpool2d function the unpooling seems to be pretty straight f

I understand that implementing a fully connected layer as a convolution layer reduces the number of parameters, but does it also increase computational speed? If yes, then why do people still use fully connected layers?

Recently, I was asked how to pre-train a deep neural network with unlabeled data, meaning that instead of initializing the model weights with small random numbers, we set the initial weights from a model pretrained on unlabeled data. Well, intuitively, I kind of get it: it probably helps with the vanishing gradient issue and shortens the training time when there is not much labeled data available. But still, I don't really know how it is done. How can you train a neural network with unlabele

I am struggling to get my neural network working. I have a dataset of pictures of cells that have malaria or not (https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria). I arranged my data like this: X_training a matrix of dimension 30000×2668, type : Array{Float64,2} Y_training a matrix of dimension 1×2668, type : Array{Float64,2} same with X_tests and Y_tests My simple neural network : function simple_nn(X_tests, Y_tests, X_training, Y_training) input = 100*100*3 hl1

I have been following a course about neural networks on Coursera and came across this model: I understand that the values of z1, z2 and so on are the values from the linear regression that will be put into an activation function. My problem is with the author's statement that there should be one matrix of weights and a vector of the inputs, like this: I know that the vector of Xs has dimension 3 x 1 because there are three inputs, but why does the array of Ws have dimensions 4 x 3?

I know PyTorch has supported TensorBoard since version 1.1. But I am wondering whether it is possible to use the debugger plugin for TensorBoard with PyTorch. I didn't find any information about this. If PyTorch could also support the TensorBoard debugger, it would be extremely convenient and could save us a lot of time.

My network just refuses to train. To make code reading less of a hassle, I abbreviate some complicated logic. Would update more if needed. model = DistMultNN() optimizer = optim.SGD(model.parameters(), lr=0.0001) for t in range(500): e1_neg = sampling_logic() e2_neg = sampling_logic() e1_pos = sampling_logic() r = sampling_logic() e2_pos = sampling_logic() optimizer.zero_grad() y_pred = model(tuple(zip(e1_pos, r, e2_pos)), e1_neg, e2_neg) loss = model.loss(y_pred

I am trying to code an OCR for shop tickets (in Java). I have good results with image dictionary distance, but not for skewed text or bad scans. I heard that neural networks are perfect for this. Question: which type of neural network do you recommend for character detection on shop tickets? Thanks

I had a hard time working with Caffe and HDF5 on image classification and regression tasks. For some reason, training on HDF5 always fails right at the beginning: the test and train loss very quickly drop to close to zero. After trying all the tricks, such as reducing the learning rate and adding ReLU and dropout, nothing worked, so I started to suspect that the HDF5 data I am feeding to Caffe is wrong. So currently I am working on the universal dataset (Oxford 102 category flo

I am solving a classification problem. I train my unsupervised neural network on a set of entities (using the skip-gram architecture). The way I evaluate is to search the training data for the k nearest neighbours of each point in the validation data. I take a weighted sum (weights based on distance) of the labels of the nearest neighbours and use that as the score of each point of the validation data. Observation - As I increase the number of epochs (model1 - 600 epochs, model 2- 1400 epochs and model 3 - 2000 epochs), my AU

I am trying to perform an experiment in Caffe with a very simple single hidden layer NN. I am using the MNIST dataset trained with a single hidden layer (of 128 nodes). I have all the weights from the fully trained network already. However, during the feed forward stage I would like to use only a smaller subset of these nodes, i.e. 32 or 64. So for example, I would like to calculate the activations of 64 nodes during the feed forward pass and save them. Then during the next run, calculate the ac

Looking at the GoogLeNet architecture, you can see blocks like this: the convolution operation is tf.nn.conv2d() and pooling is tf.nn.max_pool(). But I cannot find in the examples and tutorials how Filter Concatenation is implemented in TF.
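
For context, "filter concatenation" in an Inception block is usually described as joining the branch outputs along the channel axis; in TensorFlow that is typically done with tf.concat on the channel dimension. The sketch below only illustrates the shape bookkeeping in plain Python, assuming every branch keeps the same height and width. The helper names and toy sizes are made up for illustration.

```python
def concat_channels(feature_maps):
    """Join feature maps of shape [H][W][C_i] along the channel axis."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    for fm in feature_maps:
        # All branches must agree on spatial size (e.g. via 'SAME' padding).
        assert len(fm) == height and all(len(row) == width for row in fm)
    return [
        [
            [value for fm in feature_maps for value in fm[i][j]]
            for j in range(width)
        ]
        for i in range(height)
    ]

def constant_map(height, width, channels, value):
    """Toy stand-in for the output of one branch (conv or pooling)."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]

# Hypothetical branch outputs on a 4x4 grid: 1x1 conv, 3x3 conv, 5x5 conv, pool.
branches = [
    constant_map(4, 4, 2, 1.0),
    constant_map(4, 4, 3, 2.0),
    constant_map(4, 4, 1, 3.0),
    constant_map(4, 4, 2, 4.0),
]
block_output = concat_channels(branches)
```

The output keeps the 4x4 spatial size and has 2 + 3 + 1 + 2 = 8 channels, which is the whole effect of the concatenation step.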

I am new to DNNs and TensorFlow. I have a problem using a NN for binary classification. As input data I have a text dataset, which was transformed by TF-IDF into numerical vectors. The number of rows in the training dataset is 43 000. The number of features is 4235. I tried the TFlearn library and then Keras, but the result is the same: the NN predicts only one label, 0 or 1, and gives worse accuracy than Random Forest. I will add the script which I use for building the NN. Please, tell me what is wr

I am finetuning a network. In a specific case I want to use it for regression, which works. In another case, I want to use it for classification. For both cases I have an HDF5 file, with a label. With regression, this is just a 1-by-1 numpy array that contains a float. I thought I could use the same label for classification, after changing my EuclideanLoss layer to SoftmaxLoss. However, then I get a negative loss as so: Iteration 19200, loss = -118232 Train net output #0: loss = 39.318

I am a beginner in Caffe and Python. I installed Caffe and compiled it successfully in ubuntu 16.04. I created an environment in anaconda 2 and used Cmake for compiling. I ran this code and it printed caffe version. $ python -c "import caffe;print caffe.__version__" 1.0.0-rc3 So I suppose that I have installed correctly. I wanted to have my first experience in caffe, so I followed the instructions in this link. But I am not really familiar with this. It is giving me this error: ~/deeplearn

I am doing FCN32 semantic segmentation on my data. I ran the algorithm to fine-tune on my data (grayscale images with only one channel) for 80,000 iterations; however, the loss and accuracy are fluctuating and the output image is completely black. The loss is still high even after 80,000 iterations. I thought the classifier cannot train well on my data, so I am going to train from scratch. On the other hand, my data has imbalanced class members. The background pixels are more than the othe

I want to recreate the result of this paper. They use the term convolutional ply for the neural network they apply to the audio spectrogram. I am not sure I understand what a convolutional ply is, and how it differs from an ordinary convolutional neural network (CNN). The paper states this as being the difference: A convolution ply differs from a standard, fully connected hidden layer in two important aspects, however. First, each convolutional unit receives input only from a local area

I am working on some problems in room design. I have a lot of room design samples and would like to produce new designs by studying these samples. The very first problem is to decide what kinds of furniture, and how many pieces, should appear in a room. For a specific design sample, I know its room function, e.g. bedroom or living room. I can also count the pieces of furniture of each category in this room, say one sofa, one tea table and two chairs. I built a neural network whose input is the one-hot

I am wondering why we stack basically identical activation maps on top of each other? Since it's always the same filter applied on the same input, wouldn't it be always the same activation map? If that's the case, we wouldn't even need to recompute the activation map, but just copy the activation map N times. What additional information does this provide us? Yes, we create again a layer with depth (output volume), but if it's the same value, what is the rationale behind it? Src: http://cs231n

Can someone help me understand this problem a bit better? I must train a neural network which should output 200 mutually independent categories, where each category is a value ranging from 0 to 1. To me this seems like a binary_crossentropy problem, but every example I see on the internet uses binary_crossentropy with a single output. Since I should have 200 outputs, would it be correct to apply binary_crossentropy? This is what I have in mind, is that a correct approach or should

I need to go (upsample) from a layer input = HxWxn1, where H: height, W:width and n1: number of filters, to a layer output = 2Hx2Wxn2, where 2H = 2*Height etc, and n2=n1/2: number of new filters. One way of achieving this is by using transposed convolution operators. However, it is known that deconvolution (transposed convolution) operators can lead to the checkerboard artifacts. One way to overcome this problem is to perform resize and then apply a convolution map. E.g. output = transpose_con

I'm trying to decide on the best architecture for a multilayerPerceptron in Apache Spark and am wondering whether I can use cross-validation for that. Some code: // define layers int[] layers = new int[] {784, 78, 35, 10}; int[] layers2 = new int[] {784, 28, 28, 10}; int[] layers3 = new int[] {784, 84, 10}; int[] layers4 = new int[] {784, 392, 171, 78, 10}; MultilayerPerceptronClassifier mlp = new MultilayerPerceptronClassifier() .setMaxIter(25) .setLayers(layers4); ParamMap[

I've read a few papers about speech recognition based on neural networks, the Gaussian mixture model and the hidden Markov model. During my research, I came across the paper "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition" by George E. Dahl, Dong Yu, et al. I think I understand most of the presented idea, however I still have trouble with some details. I would really appreciate it if someone could enlighten me. As I understand it, the procedure consist

I am trying to create the simplest possible neural network and train it with some data. For that I created a test.csv with the following pattern: number,number+1; number2,number2+1 ... I am trying to do a linear regression with the network, but I cannot find a way to load the data; DataSetIterator does not work. How do I fit the data, and how do I test on it?

When running an ANN model in R, can we use a categorical variable as input to the model?
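
A common way to feed a categorical variable to a neural network, in R or elsewhere, is to one-hot encode it first. A minimal sketch of the idea in Python, with a made-up `color` column as the categorical feature (the function name is also just for illustration):

```python
def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

# Hypothetical categorical column.
color = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(color)
```

Each row of `encoded` can then be used as numeric input, with one input neuron per category.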

I'm trying to write a neural network for binary classification in PyTorch and I'm confused about the loss function. I see that BCELoss is a common function specifically geared for binary classification. I also see that an output layer of N outputs for N possible classes is standard for general classification. However, for binary classification it seems like it could be either 1 or 2 outputs. So, should I have 2 outputs (1 for each label) and then convert my 0/1 training labels into [1,0] and [

I load my previously trained model and want to classify a single (test) image from the disk through this model. All the operations in my model are carried out on my GPU. Hence, I move the numpy array of the test image to GPU by calling cuda() function. When I call the forward() function of my model with the numpy array of the test image, I get the RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'weight'. Here is the code I use to load the image from disk and ca

I'm writing a program to predict when something will happen. I don't know which activation function to use to get the output as a day of the week (1-7). I tried the sigmoid function, but then I need to input the predicted day and it outputs the probability of it; I don't want it to work this way. I expect the activation function to return values from 0 to infinity. Is ReLU the best activation function for this task? EDIT: also, what if I wanted an output of more than 7 days, for example, x will happen on the 9th day from today, or the 15th day from

Take a simple neural network that takes in data of dimension NxF, and output NxC where the N, F, and C represent number of samples, features, and C output neurons respectively. Needless to say, softmax function with cross-entropy is used given we are dealing with multi-class classification problem. I have some problem with my understanding on how gradients are calculated for backpropagation. I have given below the gradient calculation steps. Could someone please clarify where I am going wrong.

Is there a neural network architecture that I can use to find a low dimensional mapping for documents comprised of multiple sentences such that the mapping is invariant to sentence order? So, if Doc 1 is: I like dogs. Cats are very nice. and Doc 2 is: Cats are very nice. I like dogs. then in the new space they would be represented by the same point?
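
One family of approaches that has this property encodes each sentence separately and then combines the sentence vectors with a symmetric operation such as a sum or mean. A minimal sketch, assuming a toy stand-in for the sentence encoder (`embed_sentence` is hypothetical; in practice it would be a learned network):

```python
def embed_sentence(sentence, dim=8):
    """Toy stand-in for a learned sentence encoder: integer letter-bucket counts."""
    vec = [0] * dim
    for ch in sentence.lower():
        if ch.isalpha():
            vec[ord(ch) % dim] += 1
    return vec

def embed_document(sentences):
    """Mean-pool the sentence vectors; a sum is symmetric, so order cannot matter."""
    dim = len(embed_sentence(sentences[0]))
    totals = [0] * dim
    for s in sentences:
        for k, v in enumerate(embed_sentence(s)):
            totals[k] += v
    return [t / len(sentences) for t in totals]

doc1 = ["I like dogs.", "Cats are very nice."]
doc2 = ["Cats are very nice.", "I like dogs."]
same_point = embed_document(doc1) == embed_document(doc2)
```

Here `same_point` is true because the pooling step never sees the sentence order. The trade-off is that any information carried by the order is discarded.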

I'm just trying to find out how the convolution layers are trained in a CNN. Unfortunately, the relevant tutorials are silent about it or are very vague. What I found out: If I have understood correctly, the backpropagation method is used here just like with a multilayer perceptron (MLP). The only difference is that a weight change is calculated for each kernel position in the feature map and then an average value is calculated over all these weight changes. Is this statement correct? So you ave

What is the concept behind taking the derivative? To teach a system, we have to adjust its weights. But why do we do this using the derivative of the transfer function? What is it about the derivative that helps us? I know the derivative is the slope of a continuous function at a given point, but what does it have to do with the problem?
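
A minimal sketch of the idea, assuming a single weight and a made-up squared-error function: the derivative gives the slope of the error with respect to the weight, so stepping against it lowers the error. In a real network the transfer function's derivative enters this slope through the chain rule.

```python
def error(w):
    """Toy error surface with its minimum at w = 3."""
    return (w - 3.0) ** 2

def d_error(w):
    """Slope of the error: positive means 'decrease w', negative means 'increase w'."""
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
history = [error(w)]
for _ in range(100):
    w -= learning_rate * d_error(w)  # step downhill, against the slope
    history.append(error(w))
```

After the loop, `w` sits very close to 3 and the recorded error never increases from one step to the next.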

I am trying to implement a general SOM with batch training, and I have a doubt regarding the formula for batch training. I have read about it at the following links: http://cs-www.cs.yale.edu/c2/images/uploads/HR15.pdf https://notendur.hi.is//~benedikt/Courses/Mia_report2.pdf I noticed that the weight updates are assigned rather than added at the end of an epoch. Wouldn't that overwrite the whole network's previous values? Also, the update formula did not include the previous weights of the nodes,

Currently I am training a YOLO model to detect objects, but I have noticed that sometimes the loss seems to go around in a loop. For example, over 20 minutes of training my loss stayed between 0.2 and 0.5: each time it decreased to 0.2 it went straight back up to 0.5, and it kept looping like that. My question is: do I need to change my learning rate if the loss loops?

I am getting the following output when trying to run neuraltalk2. What is going wrong here? Do I need to run this code in Torch?

I have done some research and calculation, and if I understand correctly, stochastic gradient descent with the "Adam optimiser" is basically ordinary gradient descent, with the one difference that it selects a random, smaller portion of the training dataset, to avoid the NN being caught in a gap that does not necessarily reflect the minimum of the loss function? Thank you
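
For comparison, a minimal sketch of a single Adam update as it is usually written, assuming the commonly cited default hyperparameters (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8). The random mini-batch sampling is the "stochastic" part of the training loop; Adam itself only changes how a gradient is turned into a step. The function name and toy gradient are illustrative.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at time step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
w_first, m_first, v_first = adam_step(w, 2 * (w - 3.0), m, v, t=1)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.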