Neural network How should I interpret a neural network that won't overfit?

I'm running some experiments on various classification datasets using WEKA's MultilayerPerceptron implementation. I was expecting to be able to observe overfitting as the number of training iterations (epochs) increased. However, despite letting the number of epochs grow fairly large (15k), I haven't seen it yet. How should I interpret this? Note that I'm not achieving 100% accuracy on the train or test sets, so it's not that the problem is too simplistic. Some ideas I came up with are: I simpl

Neural network Why does the NNtool box suggest pureline as the output activation function of a neural network?

When I study neural networks, the mathematical derivations always use the sigmoid function in the hidden layer and the output layer. But the NNtool box from Mathworks suggests using sigmoid in the hidden layer and pureline in the output layer. Can anyone tell me why the output layer can be pureline? I just can't grasp the reason for this activation function. http://imgur.com/9V2HIlF // the traditional back propagation formula As the formula shows, if I use the pureline function, the result will be very di

Neural network QPROP vs RPROP in neural networks

Which neural network trains faster: resilient propagation or quick propagation? Judging by the name, quick should be faster, but in my experiments it wasn't always; to be honest, the split was about 50/50. I'd like to know whether I'm alone in this conclusion, or whether somebody else has achieved similar results?

Neural network Neural network to solve AND

I'm working on implementing a back propagation algorithm. Initially I worked on training my network to solve XOR to verify that it works correctly before using it for my design. After reading this I decided to train it to solve AND gate first. I'm using sigmoid as transfer function and MSE to calculate the total error. I used different learning rates that ranged between 0.01 and 0.5. I trained the network several times each time for different iterations ranging from 100 iterations to 1000 iterat
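Just as a point of comparison (not the asker's code): a minimal numpy sketch of a single sigmoid unit trained on the AND truth table with a squared-error gradient and a learning rate in the range mentioned above. All constants below are illustrative only.

    import numpy as np

    # Minimal sketch: one sigmoid unit learning AND by gradient descent on MSE.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 0, 0, 1], dtype=float)

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.5, size=2)
    b = 0.0
    lr = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(5000):
        out = sigmoid(X @ w + b)            # forward pass
        err = out - y                       # derivative of 0.5*(out - y)^2 w.r.t. out
        grad = err * out * (1.0 - out)      # chain rule through the sigmoid
        w -= lr * (X.T @ grad) / len(X)     # average gradient over the four patterns
        b -= lr * grad.mean()

    print(np.round(sigmoid(X @ w + b), 3))  # should approach [0, 0, 0, 1]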

Neural network Basics: How does a neural network work? (Decision)

I know that there are similar questions to this, but I want to know the plain basics. Let's assume I have some data (x,y) -> z where z can be 0 or 1 and x,y in [0,1]. Now I want to train a neural network with that data, and my desired output should be a boundary (a line or curved line) in the x,y space that splits the zeros from the ones (e.g. male/female or whatever). So I want to have one hidden layer. I guess I somehow understand how to feed the network: feed it with X = (x,y) to the

Neural network In what order should we tune hyperparameters in Neural Networks?

I have a fairly simple ANN using Tensorflow and AdamOptimizer for a regression problem and I am now at the point of tuning all the hyperparameters. So far, I have seen many different hyperparameters that I have to tune:
- Learning rate: initial learning rate, learning rate decay
- The AdamOptimizer needs 4 arguments (learning-rate, beta1, beta2, epsilon), so we need to tune them - at least epsilon
- batch size
- number of iterations
- Lambda, the L2-regularization parameter
- Number of neurons, number of layers
what
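For reference, a minimal sketch of how those Adam arguments and a decaying learning rate can be wired up, shown with tf.keras (the question's AdamOptimizer is the older TF1 API); all values below are placeholders, not recommendations.

    import tensorflow as tf

    # Hypothetical values purely for illustration.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3,   # initial learning rate
        decay_steps=10000,            # learning rate decay
        decay_rate=0.96)
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=lr_schedule,
        beta_1=0.9, beta_2=0.999,     # usually left at their defaults
        epsilon=1e-7)                 # the argument most often worth tuning, per the list above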

Neural network When using pycaffe to run solver.solve(), only one iteration is executed, then the current process is killed

I am using pycaffe to do a multilabel classification task. When I run solver.solve() or solver.step(2), only one iteration is executed, then the current process is killed somehow. The IPython console says the kernel died unexpectedly; no other error information is provided. Then I use a terminal to run the command "python Test.py", and get the "Floating point exception (core dumped)" message. Besides, the net.forward() and net.backward() methods are all OK. What is the reason? And how to solv

Neural network crash on the GPU with {inc,set}_subtensor and broadcasting the value

I am fine-tuning the vgg16 network with keras 2.0.2 and theano 0.9.0 as backend on Windows 10 64-bit, Anaconda 2, following this blog: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html I found that someone else had the same issue in the pull requests and it was fixed by changing a few lines of code (link: https://github.com/Theano/Theano/pull/2075). However, that was an old version of theano (the PR was in 2014). Theano 0.9.0 has already changed that code, and I still have

Neural network With bias, ANN does not converge anymore

I'm learning ANNs; I wrote two scripts (in fortran90 and python) for a simple binary classification problem. I first tried it without bias, and I got good convergence. But after adding a bias for each node it does not converge anymore (either everything goes near 0 or everything goes near 1). The bias is 1 and has a specific weight for each node. It is randomly initialized and then updated by adding a delta, like the other weights. I have tried changing the gradient step size but it keeps doing the same thing. Som

Neural network Unequal input vector lengths for Neural Network

I'm trying to predict whether a player of a video game will stop playing the game (0/1 for not-stopping/stopping) within the next month based on the game data from matches they've had so far. Each match a player plays generates (X) data points, however, each player may have played a different number of matches to date (M), thus when a player's data is put into one long vector, the length of their vector will be X*M. I'm very new to how neural networks work, but it is my understanding that eac

Neural network How to Invert AvgPool2d?

Is it possible to invert an avgPool2d operation in PyTorch, like maxunpool2d for a maxpool2d operation, and if so, how could that be done? I've already checked the documentation, and there isn't an option to return the indices, like in the maxpool2d operation, so I assume the unpooling won't be possible in a similar way. EDIT: I found a document by Intel which describes how the unpooling works. After checking the math regarding the avgpool2d function the unpooling seems to be pretty straight f
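One hedged sketch of what such an "inverse" could look like in PyTorch, assuming the goal is simply to spread each pooled value back over its window (average pooling is lossy, so the original activations cannot be recovered exactly):

    import torch
    import torch.nn.functional as F

    x = torch.arange(16.0).reshape(1, 1, 4, 4)
    pooled = F.avg_pool2d(x, kernel_size=2)                           # 1x1x2x2 of window means
    unpooled = F.interpolate(pooled, scale_factor=2, mode="nearest")  # repeat each mean over its 2x2 window
    # Divide by the window size (here 4) instead if you want the gradient-style
    # redistribution of the value rather than plain repetition.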

Neural network How to pre-train a deep neural network (or RNN) with unlabeled data?

Recently, I was asked how to pre-train a deep neural network with unlabeled data, meaning that instead of initializing the model weights with small random numbers, we set the initial weights from a model pretrained with unlabeled data. Well, intuitively, I kind of get it: it probably helps with the vanishing gradient issue and shortens the training time when there is not much labeled data available. But still, I don't really know how it is done; how can you train a neural network with unlabele

Neural network Julia Flux, Out of Memory with accuracy function

I'm struggling to make my neural network work. I have a dataset of pictures of cells that have malaria or not (https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria). And I arranged my data like this: X_training, a matrix of dimension 30000×2668, type: Array{Float64,2}; Y_training, a matrix of dimension 1×2668, type: Array{Float64,2}; same with X_tests and Y_tests. My simple neural network: function simple_nn(X_tests, Y_tests, X_training, Y_training) input = 100*100*3 hl1

Neural network weight matrix dimension intuition in a neural network

I have been following a course about neural networks on Coursera and came across this model: I understand that the values of z1, z2 and so on are the values from the linear regression that will be put into an activation function. The problem that I have is when the author says that there should be one matrix of weights and a vector of the inputs, like this: I know that the vector of Xs has a dimension of 3 x 1 because there are three inputs, but why is the array of Ws of dimension 4 x 3?
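A quick shape check may help; this assumes (as in the course screenshot, which is not reproduced here) 3 inputs and 4 units in the first hidden layer, so each of the 4 units owns one row of 3 weights:

    import numpy as np

    x = np.ones((3, 1))     # 3 x 1: one value per input feature
    W = np.ones((4, 3))     # 4 x 3: one row of 3 weights for each of the 4 hidden units
    b = np.zeros((4, 1))
    z = W @ x + b           # (4 x 3) @ (3 x 1) = 4 x 1, i.e. z1..z4, one per hidden unit
    print(z.shape)          # (4, 1)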

Neural network Weights of network stay the same after optimizer step

My network just refuses to train. To make code reading less of a hassle, I abbreviate some complicated logic. Would update more if needed.

    model = DistMultNN()
    optimizer = optim.SGD(model.parameters(), lr=0.0001)
    for t in range(500):
        e1_neg = sampling_logic()
        e2_neg = sampling_logic()
        e1_pos = sampling_logic()
        r = sampling_logic()
        e2_pos = sampling_logic()
        optimizer.zero_grad()
        y_pred = model(tuple(zip(e1_pos, r, e2_pos)), e1_neg, e2_neg)
        loss = model.loss(y_pred

Neural network OCR and Neural Networks?

I am trying to code an OCR for shop tickets (in Java). I have good results with image dictionary distance, but not for skewed texts or bad scans. I heard that neural networks are perfect for this. Question: which type of neural network do you recommend for shop ticket character detection? Thanks

Neural network How to feed image data into HDF5 for caffe, or are there existing examples?

I had a hard time working with caffe and HDF5 on image classification and regression tasks; for some reason, training on HDF5 always fails right at the beginning, in that the test and train loss very soon drop to close to zero. After trying all the tricks such as reducing the learning rate, adding ReLU, and dropout, nothing started to work, so I started to doubt that the HDF5 data I am feeding to caffe is wrong. So currently I am working on the universal dataset (Oxford 102 category flo

Neural network Evaluating performance of Neural Network embeddings in kNN classifier

I am solving a classification problem. I train my unsupervised neural network for a set of entities (using a skip-gram architecture). The way I evaluate is to search for the k nearest neighbours of each point in the validation data, from the training data. I take a weighted sum (weights based on distance) of the labels of the nearest neighbours and use that as the score of each point of validation data. Observation - As I increase the number of epochs (model 1 - 600 epochs, model 2 - 1400 epochs and model 3 - 2000 epochs), my AU
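For what it's worth, the evaluation described above can be sketched with scikit-learn's distance-weighted kNN; the array names and sizes below are placeholders standing in for the skip-gram embeddings and binary labels, not the asker's variables:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import roc_auc_score

    # Placeholder data purely for illustration.
    train_embeddings = np.random.rand(1000, 64)
    train_labels = np.random.randint(0, 2, 1000)
    val_embeddings = np.random.rand(200, 64)
    val_labels = np.random.randint(0, 2, 200)

    knn = KNeighborsClassifier(n_neighbors=10, weights="distance")
    knn.fit(train_embeddings, train_labels)
    scores = knn.predict_proba(val_embeddings)[:, 1]   # distance-weighted label score
    print(roc_auc_score(val_labels, scores))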

Neural network CAFFE: Run forward pass with fewer nodes in FC layer

I am trying to perform an experiment in Caffe with a very simple single-hidden-layer NN. I am using the MNIST dataset trained with a single hidden layer (of 128 nodes). I have all the weights from the fully trained network already. However, during the feed-forward stage I would like to use only a smaller subset of these nodes, i.e. 32 or 64. So for example, I would like to calculate the activations of 64 nodes during the feed-forward pass and save them. Then during the next run, calculate the ac

Neural network Neural Network in TensorFlow works worse than Random Forest and predicts the same label each time

I am new to DNNs and TensorFlow. I have a problem with a NN used for binary classification. As input data I have a text dataset, which was transformed by TF-IDF into numerical vectors. The number of rows in the training dataset is 43,000 and the number of features is 4,235. I tried to use the TFlearn library and then Keras. But the result is the same - the NN predicts only one label, 0 or 1, and gives worse accuracy than Random Forest. I will add the script which I use for building the NN. Please tell me what is wr

Neural network Caffe classification labels in HDF5

I am finetuning a network. In a specific case I want to use it for regression, which works. In another case, I want to use it for classification. For both cases I have an HDF5 file, with a label. With regression, this is just a 1-by-1 numpy array that contains a float. I thought I could use the same label for classification, after changing my EuclideanLoss layer to SoftmaxLoss. However, then I get a negative loss, like so: Iteration 19200, loss = -118232 Train net output #0: loss = 39.318
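A small h5py sketch of the usual label layout (shapes assumed purely for illustration): SoftmaxWithLoss expects an integer class index per sample, which the HDF5Data layer still reads as a float array, whereas EuclideanLoss compares against an arbitrary float target.

    import h5py
    import numpy as np

    data = np.random.rand(10, 3, 32, 32).astype(np.float32)             # N x C x H x W, illustrative
    labels = np.random.randint(0, 5, size=(10, 1)).astype(np.float32)   # class indices 0..4 stored as floats

    with h5py.File("train.h5", "w") as f:
        f.create_dataset("data", data=data)
        f.create_dataset("label", data=labels)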

Neural network Issue running the first example in caffe

I am a beginner in Caffe and Python. I installed Caffe and compiled it successfully on Ubuntu 16.04. I created an environment in Anaconda 2 and used CMake for compiling. I ran this code and it printed the caffe version. $ python -c "import caffe;print caffe.__version__" 1.0.0-rc3 So I suppose that I have installed it correctly. I wanted to have my first experience with caffe, so I followed the instructions in this link. But I am not really familiar with this. It is giving me this error: ~/deeplearn

Neural network Is there any example of using weighted loss for pixel-wise segmentation/classification tasks?

I am doing FCN32 semantic segmentation on my data. I ran the algorithm to fine-tune it for my data (grayscale images with only one channel) up to 80,000 iterations; however, the loss and accuracy are fluctuating and the output image is completely black. Even after 80,000 iterations the loss is still very high. I thought the classifier cannot train well on my data. So, I am going to train from scratch. On the other hand, my data has imbalanced class members. The background pixels are more than the othe

Neural network What is a convolutional ply?

I want to recreate the result of this paper. They use the term convolutional ply for the neural network they apply to the audio spectrogram. I am not sure I understand what a convolutional ply is, and how it differs from an ordinary convolutional neural network (CNN). The paper states this as being the difference: A convolution ply differs from a standard, fully connected hidden layer in two important aspects, however. First, each convolutional unit receives input only from a local area

Neural network How to use neural networks to decide what kind of and how much furniture should appear in a room?

I am working on some problems in room design. I have a lot of room design samples and would like to produce new designs by studying these samples. The very first problem is to decide what kind of and how much furniture should appear in a room. For a specific design sample, I know its room function, e.g. bedroom or living room. I can also count the number of pieces of furniture of different categories in this room, say one sofa, one tea table and two chairs. I built a neural network whose input is the one-hot

Neural network CNN: Why stack same activation maps on top of each other

I am wondering why we stack basically identical activation maps on top of each other. Since it's always the same filter applied to the same input, wouldn't it always be the same activation map? If that's the case, we wouldn't even need to recompute the activation map, but just copy the activation map N times. What additional information does this provide us? Yes, we create again a layer with depth (output volume), but if it's the same value, what is the rationale behind it? Src: http://cs231n

Neural network Keras multiple binary outputs

Can someone help me understand this problem a bit better? I must train a neural network which should output 200 mutually independent categories, each of these categories being a percentage ranging from 0 to 1. To me this seems like a binary_crossentropy problem, but every example I see on the internet uses binary_crossentropy with a single output. Since my output should be 200, if I apply binary_crossentropy, would that be correct? This is what I have in mind; is that a correct approach or should
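A minimal Keras sketch of that idea, assuming the 200 outputs are independent probabilities: one sigmoid output per category and element-wise binary_crossentropy. The input size and hidden width below are placeholders.

    from tensorflow import keras
    from tensorflow.keras import layers

    input_dim = 100                                        # placeholder feature count
    inputs = keras.Input(shape=(input_dim,))
    x = layers.Dense(256, activation="relu")(inputs)
    outputs = layers.Dense(200, activation="sigmoid")(x)   # one independent probability per category
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")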

Neural network Practical difference between deconvolution and resize followed by convolution for upscaling

I need to go (upsample) from a layer input = HxWxn1, where H: height, W: width and n1: number of filters, to a layer output = 2Hx2Wxn2, where 2H = 2*Height etc., and n2 = n1/2: number of new filters. One way of achieving this is by using transposed convolution operators. However, it is known that deconvolution (transposed convolution) operators can lead to checkerboard artifacts. One way to overcome this problem is to perform a resize and then apply a convolution map. E.g. output = transpose_con
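The two upscaling options can be sketched in PyTorch roughly as follows (kernel sizes and filter counts are illustrative; the question itself is framework-agnostic):

    import torch.nn as nn

    n1, n2 = 64, 32   # example filter counts with n2 = n1 // 2

    # Option A: transposed convolution, which can produce checkerboard artifacts
    deconv = nn.ConvTranspose2d(n1, n2, kernel_size=4, stride=2, padding=1)

    # Option B: resize (nearest or bilinear) followed by an ordinary convolution
    resize_conv = nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(n1, n2, kernel_size=3, padding=1),
    )
    # Both map H x W x n1 feature maps to 2H x 2W x n2.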

Neural network Using cross-validation to choose network-architecture for multilayer perceptron in Apache Spark

I'm trying to decide on the best architecture for a MultilayerPerceptron in Apache Spark and am wondering whether I can use cross-validation for that. Some code:

    // define layers
    int[] layers = new int[] {784, 78, 35, 10};
    int[] layers2 = new int[] {784, 28, 28, 10};
    int[] layers3 = new int[] {784, 84, 10};
    int[] layers4 = new int[] {784, 392, 171, 78, 10};
    MultilayerPerceptronClassifier mlp = new MultilayerPerceptronClassifier()
        .setMaxIter(25)
        .setLayers(layers4);
    ParamMap[
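For what it's worth, a cross-validation over the layer settings can be sketched like this, shown in PySpark rather than the Java API and assuming a prepared DataFrame named train with "features" and "label" columns:

    from pyspark.ml.classification import MultilayerPerceptronClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    mlp = MultilayerPerceptronClassifier(maxIter=25)
    grid = (ParamGridBuilder()
            .addGrid(mlp.layers, [[784, 78, 35, 10],
                                  [784, 28, 28, 10],
                                  [784, 84, 10],
                                  [784, 392, 171, 78, 10]])
            .build())
    cv = CrossValidator(estimator=mlp,
                        estimatorParamMaps=grid,
                        evaluator=MulticlassClassificationEvaluator(metricName="accuracy"),
                        numFolds=3)
    model = cv.fit(train)          # model.bestModel holds the winning layer setting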

Neural network Understanding relation between Neural Networks and Hidden Markov Model

I've read a few papers about speech recognition based on neural networks, Gaussian mixture models and hidden Markov models. In my research, I came across the paper "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition" by George E. Dahl, Dong Yu, et al. I think I understand most of the presented idea, however I still have trouble with some details. I would really appreciate it if someone could enlighten me. As I understand it, the procedure consist

Neural network DeepLearning4J - Acquiring Data and Train Model

I'm trying to create the simplest possible neural network and train it with some data. To do so, I created a test.csv with the following pattern: number,number+1; number2,number2+1 ... I am trying to do a linear regression with the network... But I cannot find a way to acquire the data; DataSetIterator does not work. How do I fit the data, and how do I test it?

Neural network Loss Function & Its Inputs For Binary Classification PyTorch

I'm trying to write a neural Network for binary classification in PyTorch and I'm confused about the loss function. I see that BCELoss is a common function specifically geared for binary classification. I also see that an output layer of N outputs for N possible classes is standard for general classification. However, for binary classification it seems like it could be either 1 or 2 outputs. So, should I have 2 outputs (1 for each label) and then convert my 0/1 training labels into [1,0] and [
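The two setups mentioned above can be sketched side by side with dummy tensors (batch of 8), which also shows what label format each loss expects:

    import torch
    import torch.nn as nn

    # Option A: a single output per sample, labels stay as 0/1 floats
    logits_1 = torch.randn(8, 1)
    targets_1 = torch.randint(0, 2, (8, 1)).float()
    loss_a = nn.BCEWithLogitsLoss()(logits_1, targets_1)

    # Option B: two outputs (one per class), labels are class indices, not one-hot
    logits_2 = torch.randn(8, 2)
    targets_2 = torch.randint(0, 2, (8,))
    loss_b = nn.CrossEntropyLoss()(logits_2, targets_2)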

Neural network PyTorch - RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'weight'

I load my previously trained model and want to classify a single (test) image from disk through this model. All the operations in my model are carried out on my GPU. Hence, I move the numpy array of the test image to the GPU by calling the cuda() function. When I call the forward() function of my model with the numpy array of the test image, I get the RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'weight'. Here is the code I use to load the image from disk and ca
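A sketch of the usual resolution, under the assumption that the model's weights were never moved to the GPU: the error says the weights live on the CPU while the input tensor is on CUDA, so both must end up on the same device. The names model and np_image below are placeholders standing in for the asker's objects.

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)                                # model loaded elsewhere
    img = torch.from_numpy(np_image).float().unsqueeze(0)   # np_image loaded elsewhere
    img = img.to(device)
    with torch.no_grad():
        output = model(img)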

Neural network Activation function to get day of week

I'm writing a program to predict when something will happen. I don't know which activation function to use to get an output as a day of the week (1-7). I tried the sigmoid function, but I need to input the predicted day and it outputs its probability; I don't want it to work this way. I expect the activation function to return 0 to infinity; is ReLU the best activation function for this task? EDIT: also, what if I wanted to output more than 7 days, for example, x will happen on the 9th day from today, or the 15th day from

Neural network Dimension of gradients in backpropagation

Take a simple neural network that takes in data of dimension NxF and outputs NxC, where N, F, and C represent the number of samples, features, and output neurons respectively. Needless to say, the softmax function with cross-entropy is used, given that we are dealing with a multi-class classification problem. I have some problems with my understanding of how gradients are calculated for backpropagation. I have given the gradient calculation steps below. Could someone please clarify where I am going wrong?
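For reference, a shape-consistent version of the gradients for a single linear layer followed by softmax cross-entropy (the asker's own steps are not reproduced here); with X of size N x F, W of size F x C, and one-hot targets Y of size N x C:

    Z = XW + b \in \mathbb{R}^{N \times C}, \qquad \hat{Y} = \mathrm{softmax}(Z)

    \frac{\partial L}{\partial Z} = \frac{1}{N}\,(\hat{Y} - Y) \in \mathbb{R}^{N \times C}, \qquad
    \frac{\partial L}{\partial W} = X^{\top} \frac{\partial L}{\partial Z} \in \mathbb{R}^{F \times C}, \qquad
    \frac{\partial L}{\partial b} = \sum_{n=1}^{N} \left(\frac{\partial L}{\partial Z}\right)_{n} \in \mathbb{R}^{1 \times C}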

Neural network Neural network for text documents invariant to sentence order

Is there a neural network architecture that I can use to find a low-dimensional mapping for documents comprised of multiple sentences, such that the mapping is invariant to sentence order? So, if Doc 1 is: I like dogs. Cats are very nice. and Doc 2 is: Cats are very nice. I like dogs. then in the new space they would be represented by the same point?

Neural network Understanding/confirmation of the training of the convolution layers and the effect of the pooling layer

I'm just trying to find out how the convolution layers are trained in a CNN. Unfortunately, the relevant tutorials are silent about it or are very vague. What I found out: if I have understood correctly, the backpropagation method is used here just like with a multilayer perceptron (MLP). The only difference is that a weight change is calculated for each kernel position in the feature map and then an average value is calculated over all these weight changes. Is this statement correct? So you ave

Neural network Why do we take the derivative of the transfer function in calculating back propagation algorithm?

What is the concept behind taking the derivative? It's interesting that for somehow teaching a system, we have to adjust its weights. But why do we do this using the derivative of the transfer function? What is it about the derivative that helps us? I know the derivative is the slope of a continuous function at a given point, but what does it have to do with the problem?
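In short, the derivative appears because learning is framed as gradient descent on the error, and the chain rule has to pass through the transfer function. A one-neuron version with squared error makes this visible:

    y = f(z), \quad z = \sum_i w_i x_i, \qquad E = \tfrac{1}{2}(y - t)^2

    \frac{\partial E}{\partial w_i} = (y - t)\, f'(z)\, x_i, \qquad
    w_i \leftarrow w_i - \eta\, \frac{\partial E}{\partial w_i}

The factor f'(z) measures how strongly a change in the weighted input moves the neuron's output, which is exactly what gradient descent needs to know when deciding how much to adjust each weight.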

Neural network Batch training in SOM?

I am trying to implement a general SOM with batch training, and I have a doubt regarding the formula for batch training. I have read about it at the following links: http://cs-www.cs.yale.edu/c2/images/uploads/HR15.pdf https://notendur.hi.is//~benedikt/Courses/Mia_report2.pdf I noticed that the weight updates are assigned rather than added at the end of an epoch - wouldn't that overwrite the whole network's previous values? And the update formula did not include the previous weights of the nodes,
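For comparison, the batch SOM update found in standard references (whether it matches the notation of the two linked reports is an assumption) does assign the new weights directly: each node is set to a neighborhood-weighted mean of all the data, with the previous weights entering only through the choice of best-matching units.

    w_j(t+1) = \frac{\sum_{i=1}^{N} h_{j,\,c(i)}(t)\, x_i}{\sum_{i=1}^{N} h_{j,\,c(i)}(t)}

Here c(i) is the best-matching unit of sample x_i computed with the current weights w(t), and h is the neighborhood function.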

Neural network adjust Learning Rate in a deep neural network

Currently I am training a YOLO model to detect objects, but I have noticed that sometimes the loss in the output seems to be in a loop; for example, "in 20 minutes of training my loss was between 0.2 and 0.5; each time my loss decreased to 0.2 it automatically increased back to 0.5, and it loops like that". My question is: do I need to change my learning rate if the loss loops like this?
