Ask Ghassem - Recent questions tagged deep-learning

Step-by-Step Hidden State Calculation in a Recurrent Neural Network

Mon, 01 Dec 2025 18:32:24 +0000

Consider a simplified Recurrent Neural Network (RNN) with a single input and a single output. The hidden state is updated using the recurrence:

$$ h_t = \text{ReLU}(W_{ih} \cdot x_t + W_{hh} \cdot h_{t-1}) $$

Assume the following:

$ x_t = 3 $ for every time step
$ h_0 = 0 $
$ W_{ih} = 0.4 $
$ W_{hh} = 0.6 $
Activation function: ReLU

Compute the value of the hidden state $ h_4 $ at time $ t = 4 $.
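
The recurrence can be unrolled directly. The following is a minimal sketch using the values stated in the question; the helper name `hidden_states` is made up for illustration.

```python
def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def hidden_states(x, w_ih, w_hh, h0, steps):
    """Return [h_1, ..., h_steps] for a constant scalar input x."""
    states = []
    h = h0
    for _ in range(steps):
        # h_t = ReLU(W_ih * x_t + W_hh * h_{t-1})
        h = relu(w_ih * x + w_hh * h)
        states.append(h)
    return states


states = hidden_states(x=3.0, w_ih=0.4, w_hh=0.6, h0=0.0, steps=4)
for t, h in enumerate(states, start=1):
    print(f"h_{t} = {h:.4f}")
```

Because every pre-activation is positive here, ReLU never clips, and the sequence is 1.2, 1.92, 2.352, 2.6112.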

How to calculate feed-forward (forward-propagation) in neural network for classification?

Wed, 02 Oct 2024 14:47:26 +0000

For the following neural network, calculate the classification accuracy given these settings

Bankruptcy prediction and credit card

Sun, 10 Apr 2022 05:50:14 +0000

Hello everyone, newbie data scientist here.
I'm working on a project to predict companies' probability of bankruptcy (probability of default) and to assign them a credit rating/score based on that.
For example, a probability below 50% is good and above is bad (just for the example).
I have a dataset that contains financial ratios and a class indicating whether the company went bankrupt or not (0 or 1).
I'm planning to use these models:
logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.

The question is, and I know it is a dumb question:
Do those models return a probability that I can transform into labels? I saw that in a thesis and I'm not sure about it.

Otherwise, any guidance or tips will be appreciated.
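
If scikit-learn is the library in use (an assumption, the post does not say), most of the listed classifiers expose a `predict_proba` method, and `SVC` does so only when constructed with `probability=True`. Turning a probability into a label or rating is then a thresholding step. The sketch below shows only that step; the rating bands and cutoffs are hypothetical and would need to be chosen for the actual project.

```python
def to_label(p_default: float, threshold: float = 0.5) -> int:
    """Map a default probability to a 0/1 label (1 = predicted bankrupt)."""
    return 1 if p_default >= threshold else 0


def to_rating(p_default, bands=((0.10, "A"), (0.30, "B"), (0.50, "C"))):
    """Map a default probability to a rating; the bands are illustrative only."""
    for upper, rating in bands:
        if p_default < upper:
            return rating
    return "D"


# Hypothetical model outputs (probability of default per company).
probs = [0.03, 0.27, 0.50, 0.91]
labels = [to_label(p) for p in probs]
ratings = [to_rating(p) for p in probs]
print(labels, ratings)
```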

How many samples do we need to test image segmentation using synthetic data?

Mon, 21 Jun 2021 12:26:32 +0000

Hello,

I trained a CNN using synthetic data to perform a segmentation task on human faces. To evaluate the network's predictions at test time, I used 200 examples from the database to compute precision and recall.

Is this number sufficient, given that I control the data generator myself and build the database by randomly drawing the elements from centered Gaussian distributions?

Thank you,
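
One rough way to judge whether 200 examples are enough is to look at the width of a confidence interval around a measured proportion. The sketch below uses the normal approximation for a binomial proportion and assumes each test example contributes one independent success/failure outcome, which is a simplification for pixel-level segmentation metrics.

```python
import math


def ci_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


for p_hat in (0.5, 0.9):
    print(f"p_hat={p_hat}, n=200 -> +/- {ci_half_width(p_hat, 200):.3f}")
```

Under these assumptions, a measured value of 0.9 on 200 examples carries an uncertainty of roughly plus or minus 0.04; halving that width requires about four times as many examples.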

Binary Classification and neutral tag

Sat, 30 Jan 2021 10:08:01 +0000

I am trying to create a sentiment analysis model using a binary classification loss. I have a batch of tweets, some tagged as positive (labeled 1) and some as negative (labeled 0). I managed to gather some tweets tagged as neutral, but there are fewer of them than positive and negative ones. My thinking is to label them 0.5 to balance the classification probability. Is this legitimate?
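
Assuming the loss is binary cross-entropy (the post does not name it explicitly), a soft target of 0.5 is mathematically well defined: the loss for that example is minimized when the model predicts 0.5. The sketch below checks this numerically. Whether this helps in practice, compared with a three-class softmax, is a separate modeling question.

```python
import math


def bce(target: float, p: float) -> float:
    """Binary cross-entropy for one example with a (possibly soft) target."""
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Scan predictions on a grid and find the one that minimizes the loss
# for a neutral tweet labeled 0.5.
grid = [i / 100 for i in range(1, 100)]
best_p = min(grid, key=lambda p: bce(0.5, p))
print(best_p, bce(0.5, best_p))
```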

"Rare words" on vocabulary

Sat, 30 Jan 2021 09:57:31 +0000

I am trying to create a sentiment analysis model and I have a question.

After I preprocessed my tweets and created my vocabulary, I noticed that I have words that appear fewer than 5 times in my dataset (many of them appear only once). Many of them are real words and not gibberish. My thinking is that if I keep those words, they will get wrong "sentiment" weights and make my model worse.
Is my thinking right or am I missing something?

My vocabulary size is around 40,000 words and the "rare" ones number around 10,000. Should I "sacrifice" them?

Thanks in advance.
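
A common alternative to deleting rare words outright is to map them to a shared unknown token, so that the model still sees that a word was present. The following is a minimal sketch; the `<UNK>` token name, the `min_count` cutoff, and the toy tweets are assumptions for illustration.

```python
from collections import Counter


def build_vocab(tokenized_tweets, min_count=5, unk_token="<UNK>"):
    """Assign ids to words seen at least min_count times; id 0 is the unknown token."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    vocab = {unk_token: 0}
    for tok, count in counts.items():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tweet, vocab, unk_token="<UNK>"):
    """Replace each token by its id, falling back to the unknown id."""
    return [vocab.get(tok, vocab[unk_token]) for tok in tweet]


tweets = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["good", "serendipity"]]
vocab = build_vocab(tweets, min_count=2)
encoded = encode(["good", "serendipity"], vocab)
print(vocab, encoded)
```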

How to update the weights in backpropagation algorithm when the activation function is not linear?

Mon, 10 Aug 2020 21:55:19 +0000

The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.

Assume that for the following neural network the inputs are [$i_1,i_2$] = [0.05, 0.10], the target outputs are [$o_1$,$o_2$] = [0.01, 0.99], and the learning rate is $\alpha=0.5$.
In addition, the activation function for the hidden layer (both $h_1$ and $h_2$) is sigmoid (logistic):

$S(x)=\frac{1}{1+e^{-x}}$

https://i.imgur.com/cnY5feu.png

Hint:
$w_{new} = w_{old} - \alpha \frac{\partial E}{\partial w}$

$E_{\text {total}}=\sum \frac{1}{2}(\text {target}-\text {output})^{2}$

a) Show a step-by-step solution to calculate weights $w_1$ to $w_8$ after one update, filling in the table below.
b) Calculate the initial error and the error after one update (assume the biases $[b_1,b_2]$ do not change during the updates).

Updating weights in backpropagation algorithm
Weights	Initialization	New weights after one step
$w_1$	0.15	?
$w_2$	0.20	?
$w_3$	0.25	?
$w_4$	0.30	?
$w_5$	0.40	?
$w_6$	0.45	?
$w_7$	0.50	?
$w_8$	0.55	?
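
With a non-linear activation, the only change to the update rule is an extra factor from the chain rule: the derivative of the activation, which for the sigmoid is $S'(x) = S(x)(1 - S(x))$. The sketch below shows that factor for a single sigmoid unit whose output is compared directly with the target. It does not reproduce the network in the figure, whose wiring and bias values are not given in the text, and the function names are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x: float) -> float:
    """Derivative of the sigmoid, expressed through its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)


def weight_gradient(target: float, net: float, x_in: float) -> float:
    """dE/dw for E = 0.5 * (target - out)^2, with out = sigmoid(net) and
    net depending on w through the term w * x_in."""
    out = sigmoid(net)
    return (out - target) * sigmoid_prime(net) * x_in


# Example: a unit with net input 0.0 fed by an input of 1.0, target 0.01.
print(weight_gradient(target=0.01, net=0.0, x_in=1.0))
```

For weights in an earlier layer, the same pattern repeats: each layer the error signal passes through contributes its own weight and activation-derivative factors.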

Pre-trained word embeddings and preprocessing

Fri, 10 Apr 2020 12:08:09 +0000

How should I preprocess my data if I am going to use pre-trained word embeddings like GloVe or word2vec? Should I use stemming or stopword removal techniques?
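
One practical check is to measure how many tokens are actually found in the embedding vocabulary after each preprocessing choice. The sketch below uses a toy vocabulary and Porter-style stems as hypothetical data; with real embeddings, `embedding_vocab` would be the set of words loaded from the embedding file.

```python
def coverage(tokens, embedding_vocab) -> float:
    """Fraction of tokens that have an entry in the embedding vocabulary."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in embedding_vocab)
    return hits / len(tokens)


# Toy stand-in for the vocabulary of a pre-trained embedding (hypothetical).
toy_vocab = {"the", "movie", "was", "happily", "surprising"}

raw = ["the", "movie", "was", "happily", "surprising"]
stemmed = ["the", "movi", "wa", "happili", "surpris"]  # Porter-style stems

print(coverage(raw, toy_vocab), coverage(stemmed, toy_vocab))
```

If stemming lowers coverage like this on the real data, it suggests the embeddings were trained on unstemmed text and the preprocessing should match that.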

How to calculate convolutions on a CONV layer for a Convolutional Neural Network?

Wed, 26 Jun 2019 08:54:12 +0000

Assume we have a $5\times5$ px RGB image with 3 channels, for R, G, and B respectively, with the following values:

R
2	0	0	0	0
1	2	0	0	1
2	0	1	0	2
1	2	1	0	1
0	1	0	2	0

G
0	2	1	2	2
1	1	1	0	0
0	0	2	2	0
2	0	0	2	0
0	2	1	1	1

B
0	1	0	0	1
1	1	2	0	1
1	0	2	0	2
1	0	1	1	0
1	2	1	1	2

We have one $3\times3$ px kernel (filter) with 3 channels as follows:

Filter - R
0	0	1
1	0	1
1	0	0

Filter - G
0	0	-1
1	0	0
1	-1	0

Filter - B
1	0	1
0	1	-1
1	-1	0

a) If Stride = 2, and Zero-padding = 1, and Bias = 1, what will be the result of convolution?

b) What is the result after applying a ReLU layer ($\max(z,0)$) to the result of part a, keeping the same size as that result?

c) Calculate the output of applying a max-pooling layer of size $2\times2$ with Stride = 1 to the output of part b. (Hint: max-pooling layers, here and usually, do not use any zero-padding.)

d) What is the result after applying flatten on the output of part c and creating a vector?

e) Assume the vector you created contains m elements. Consider it as the input vector for a Softmax Regression classifier (fully connected, without any hidden layers or biases). Assume there are 2 classes, 0 and 1. For all the weights from each element in the feature vector, the optimized weights are 1 for odd elements and 2 for even elements. For example, if the feature vector is [10,11,12,13,14], all the weights from 10 are 1 (because 10 is element 1 and 1 is odd), all the weights from 11 are 2, all the weights from 12 are 1, all the weights from 13 are 2, all the weights from 14 are 1, and so on. Draw the Softmax Regression network and determine whether the predicted class is 0 or 1.

Hint:
Softmax Regression: $p_{i}=\frac{e^{z_{i}}}{\sum_{j=1}^{c} e^{z_{j}}}$
where $p_{i}$ is the probability of class $i$ and $c$ is the number of classes.
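
Parts a to e can be traced in plain Python. This sketch makes several assumptions: the convolution is computed as cross-correlation (no kernel flip, the usual CNN convention), the bias is added once per output position, and element positions in part e are counted from 1 as in the question's own example.

```python
import math

R = [[2, 0, 0, 0, 0], [1, 2, 0, 0, 1], [2, 0, 1, 0, 2], [1, 2, 1, 0, 1], [0, 1, 0, 2, 0]]
G = [[0, 2, 1, 2, 2], [1, 1, 1, 0, 0], [0, 0, 2, 2, 0], [2, 0, 0, 2, 0], [0, 2, 1, 1, 1]]
B = [[0, 1, 0, 0, 1], [1, 1, 2, 0, 1], [1, 0, 2, 0, 2], [1, 0, 1, 1, 0], [1, 2, 1, 1, 2]]
FR = [[0, 0, 1], [1, 0, 1], [1, 0, 0]]
FG = [[0, 0, -1], [1, 0, 0], [1, -1, 0]]
FB = [[1, 0, 1], [0, 1, -1], [1, -1, 0]]


def conv2d(channels, kernels, stride, pad, bias):
    """Multi-channel 2D cross-correlation with implicit zero-padding."""
    n, k = len(channels[0]), len(kernels[0])
    out_size = (n + 2 * pad - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = bias
            for img, ker in zip(channels, kernels):
                for a in range(k):
                    for b in range(k):
                        r, c = i * stride + a - pad, j * stride + b - pad
                        if 0 <= r < n and 0 <= c < n:  # outside the image is zero
                            total += img[r][c] * ker[a][b]
            row.append(total)
        out.append(row)
    return out


def relu(m):
    return [[max(v, 0) for v in row] for row in m]


def max_pool(m, size, stride):
    """Max-pooling without padding."""
    out_size = (len(m) - size) // stride + 1
    return [[max(m[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_size)] for i in range(out_size)]


conv = conv2d([R, G, B], [FR, FG, FB], stride=2, pad=1, bias=1)  # part a
act = relu(conv)                                                 # part b
pooled = max_pool(act, size=2, stride=1)                         # part c
flat = [v for row in pooled for v in row]                        # part d

# Part e: weight 1 for odd positions, 2 for even positions (1-based),
# and the same weights go to both classes, so both logits are equal.
weights = [1 if pos % 2 == 1 else 2 for pos in range(1, len(flat) + 1)]
z = sum(w * v for w, v in zip(weights, flat))
logits = [z, z]
exps = [math.exp(v - max(logits)) for v in logits]
probs = [e / sum(exps) for e in exps]
print(conv, act, pooled, flat, probs, sep="\n")
```

Since every feature sends the same weight to both classes, the two logits are identical whatever the feature values are, so the softmax assigns 0.5 to each class and the prediction is a tie.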

What loss function to use in CNN-SVM model

Sat, 08 Jun 2019 09:24:21 +0000

I am using Matlab R2018b and am trying to integrate an SVM classifier into a CNN. My plan is to use the CNN only as a feature extractor and the SVM as the classifier. I know people already implemented this a few years back, either in TensorFlow or on other platforms. While implementing it, I got stuck at backward propagation: I am puzzled about which loss function I need to implement to compute the gradients and update the parameters.

A few points came up during this:

1. My feeling is that I should implement the hinge loss here. But which form of hinge loss should I implement? Should I move on to the second form of hinge loss for calculating the loss during backward propagation?

2. Besides calculating the backward loss, should I calculate the forward loss as well, to find out the loss incurred by the model?

Any advice on this CNN-SVM integration will be appreciated, as I am unable to find any such material implemented in Matlab.

Thanks.
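
For reference, the two usual forms are the standard hinge loss and the squared hinge loss (L2-SVM); reading "second form" as the squared hinge is an assumption, since the post does not define it. The sketch below is in Python rather than Matlab and covers the binary case with labels in {-1, +1}. The forward loss value is used for monitoring, while the gradient with respect to the score is what gets propagated back into the CNN.

```python
def hinge(y: int, score: float) -> float:
    """Standard hinge loss for label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def hinge_grad(y: int, score: float) -> float:
    """Subgradient of the hinge loss with respect to the score."""
    return -float(y) if y * score < 1.0 else 0.0


def squared_hinge(y: int, score: float) -> float:
    """Squared hinge loss (L2-SVM), differentiable at the margin."""
    return max(0.0, 1.0 - y * score) ** 2


def squared_hinge_grad(y: int, score: float) -> float:
    """Gradient of the squared hinge loss with respect to the score."""
    margin = 1.0 - y * score
    return -2.0 * y * margin if margin > 0.0 else 0.0


# A positive example scored 0.3 violates the margin; a negative example
# scored -2.0 is safely on the correct side and contributes nothing.
print(hinge(1, 0.3), hinge_grad(1, 0.3))
print(squared_hinge(1, 0.3), squared_hinge_grad(1, 0.3))
print(hinge(-1, -2.0), hinge_grad(-1, -2.0))
```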

Is it impossible to predict a minute-level time series from an hourly time series?

Wed, 01 May 2019 13:11:26 +0000

https://stackoverflow.com/questions/55930051/is-impossible-predict-hours-time-series-to-minutes-time-series

I want to turn this hourly time series prediction model into a minute-level prediction model.

How to update weights in backpropagation algorithm (a numerical example)?

Thu, 11 Apr 2019 17:02:04 +0000

Assume we have the following neural network and all activation functions are $f(z)=z$. If the weights are initialized with the values shown in the table below, what will the updated weights be after one step if the learning rate is $\alpha = 0.05$?

Assume the input values are [$i_1$,$i_2$] = [2,3] and target value $out = 1$.

Hint:
$w_{new} = w_{old} - \alpha \frac{\partial E}{\partial w}$

$E_{\text {total}}=\sum \frac{1}{2}(\text {target}-\text {output})^{2}$

Updating weights in backpropagation algorithm
Weights	Initialization	New weights after one step
$w1$	0.11	?
$w2$	0.21	?
$w3$	0.12	?
$w4$	0.08	?
$w5$	0.14	?
$w6$	0.15	?

https://i.imgur.com/v0RMeOQ.png
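
The update can be sketched in code once the wiring is fixed. The figure is not reproduced in the text, so the topology below is an assumption: $w_1, w_2$ feed hidden unit $h_1$ from $i_1, i_2$; $w_3, w_4$ feed $h_2$; and $w_5, w_6$ connect $h_1, h_2$ to the single output, with no biases. If the figure differs, the gradient expressions need to be adjusted accordingly.

```python
def one_step(w, inputs, target, lr):
    """One gradient-descent step for an assumed 2-2-1 linear network."""
    i1, i2 = inputs
    w1, w2, w3, w4, w5, w6 = w
    # Forward pass with identity activations f(z) = z.
    h1 = w1 * i1 + w2 * i2
    h2 = w3 * i1 + w4 * i2
    out = w5 * h1 + w6 * h2
    # E = 0.5 * (target - out)^2, so dE/d(out) = out - target.
    delta = out - target
    grads = [
        delta * w5 * i1, delta * w5 * i2,  # dE/dw1, dE/dw2
        delta * w6 * i1, delta * w6 * i2,  # dE/dw3, dE/dw4
        delta * h1, delta * h2,            # dE/dw5, dE/dw6
    ]
    new_w = [wi - lr * g for wi, g in zip(w, grads)]
    return out, new_w


out, new_w = one_step(
    w=[0.11, 0.21, 0.12, 0.08, 0.14, 0.15], inputs=(2.0, 3.0), target=1.0, lr=0.05
)
print(round(out, 3), [round(v, 4) for v in new_w])
```

Note that all gradients are computed from the old weights before any weight is changed, which is why the hidden-layer gradients use the original $w_5$ and $w_6$.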

How to calculate feed-forward (forward-propagation) in neural network?

Thu, 04 Apr 2019 15:54:17 +0000

In the figure below, a neural network is shown. Calculate the following:

1) How many neurons do we have in the input layer and the output layer?

2) How many hidden layers do we have?

3) If all the weights are initialized to 1 ($w1=w2=w3=...=w19=1$), what is the output of this network after feed-forward for the sample shown in the figure (X = (x1,x2,x3) = (2,5,3) and y=10)? What is the error of the network ($\text { Error }=\frac{1}{2}(\hat{y}-y)^{2}$)? Assume the activation function for all neurons except the output neuron is $f(z)=z$.

4) If we change the activation function of all the neurons in the second hidden layer to Sigmoid ($S(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{e^{x}+1}$), what would be the output of the network after this change? Calculate the error as well.

https://i.imgur.com/rtqPiRa.jpg

How to update weights using gradient descent algorithm?

Thu, 28 Mar 2019 17:17:39 +0000

For the neural network below, imagine we are going to use the backpropagation algorithm to update the weights. Assume the bias (b) in this problem is always 0 (ignore the bias when you solve the problem), the dataset has only one record with $x=2$ and target value $y=5$, as shown in the following table, and the activation function is defined as $f(z) = z$.

feature (x)	Target (y)
2	5

1) Define the cost function, $J(w)$, based on the error in backpropagation algorithm: $J(w) = E = \frac{1}{2}(predicted - target)^2$, and draw it

2) Initialize the weight by $w=3$, and calculate the error

3) Calculate the updated weights using the gradient descent algorithm after three updates for each of the following values of the learning rate ($\alpha$)

$\alpha$ = 1
$\alpha$ = 0.1
$\alpha$ = 0.5

Hint: $w_{new} = w_{old} - \alpha \frac{\partial E}{\partial w}$

https://i.imgur.com/uohFS6l.png
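
Assuming the figure shows a single weight with prediction $w \cdot x$ (the figure is not reproduced in the text, so this is an assumption), the cost is $J(w) = \frac{1}{2}(wx - y)^2$ with gradient $\frac{\partial J}{\partial w} = (wx - y)\,x$. The sketch below applies three updates for each learning rate.

```python
def cost(w: float, x: float, y: float) -> float:
    """J(w) = 0.5 * (predicted - target)^2 with predicted = w * x."""
    return 0.5 * (w * x - y) ** 2


def gradient_descent(w0: float, x: float, y: float, lr: float, steps: int):
    """Return the list of weights [w0, w1, ..., w_steps]."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = (w * x - y) * x  # dJ/dw
        w = w - lr * grad
        history.append(w)
    return history


print("initial error:", cost(3.0, 2.0, 5.0))
for lr in (1.0, 0.1, 0.5):
    print(lr, gradient_descent(w0=3.0, x=2.0, y=5.0, lr=lr, steps=3))
```

Under this assumption the three learning rates behave very differently: $\alpha = 1$ diverges (3, 1, 7, -11), $\alpha = 0.5$ oscillates between 3 and 2, and $\alpha = 0.1$ moves steadily toward the minimum at $w = 2.5$ (3, 2.8, 2.68, 2.608).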

Determine weights on the paths that connect to the different data points in a neural network?

Mon, 18 Mar 2019 23:35:25 +0000

How do you determine the weight values that connect to the other data points when solving for our output in neural networks?

Passing variable length sentences to Tensorflow LSTM

Mon, 11 Feb 2019 05:06:27 +0000

I have a TensorFlow LSTM model for predicting sentiment. I built the model with a maximum sequence length of 150 (the maximum number of words). For making predictions, I have written the code below:

import numpy as np

batchSize = 32
maxSeqLength = 150

# cleanSentences() and wordsList are defined elsewhere in the asker's code.
def getSentenceMatrix(sentence):
    # Only row 0 of the batch is filled; the remaining rows stay zero.
    sentenceMatrix = np.zeros([batchSize, maxSeqLength], dtype='int32')
    cleanedSentence = cleanSentences(sentence)
    cleanedSentence = ' '.join(cleanedSentence.split()[:150])
    split = cleanedSentence.split()
    for indexCounter, word in enumerate(split):
        try:
            sentenceMatrix[0, indexCounter] = wordsList.index(word)
        except ValueError:
            sentenceMatrix[0, indexCounter] = 399999  # Index of the vector for unknown words
    return sentenceMatrix

input_text = "example data"
inputMatrix = getSentenceMatrix(input_text)

In the code I'm truncating my input text to 150 words and ignoring the remaining data. Because of this, my predictions are wrong.

cleanedSentence = ' '.join(cleanedSentence.split()[:150])

I know that if the input is shorter than the sequence length we can pad it with zeros. What should we do if the input is longer? Can you suggest the best way to handle this? Thanks in advance.
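
One option, offered here as a sketch rather than the best way, is to split a long text into windows of at most `maxSeqLength` tokens, score each window separately, and combine the per-window predictions (for example by averaging). The helper below only does the splitting; `chunk_tokens` and its `overlap` parameter are hypothetical names, not part of TensorFlow.

```python
def chunk_tokens(tokens, max_len, overlap=0):
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that context at the
    window boundaries is not lost entirely.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


tokens = [f"w{i}" for i in range(400)]  # stand-in for a cleaned 400-word text
chunks = chunk_tokens(tokens, max_len=150)
print([len(c) for c in chunks])
```

Each window can then be padded to the fixed length and passed through the existing matrix-building code, so the model itself does not need to change.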

What is the difference between a batch and an epoch in a Neural Network?

Tue, 30 Oct 2018 14:45:56 +0000

Both the batch size and the number of epochs are integer values and seem to do the same thing in stochastic gradient descent. What are these two hyper-parameters of this learning algorithm, and how do they differ?
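
The relationship between the two can be made concrete with a little arithmetic: the batch size sets how many samples are used per weight update, and one epoch is one full pass over the training set. The sample counts below are made-up values for illustration.

```python
import math


def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of weight updates in one pass over the data (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


def total_updates(n_samples: int, batch_size: int, epochs: int) -> int:
    """Total number of weight updates over the whole training run."""
    return iterations_per_epoch(n_samples, batch_size) * epochs


print(iterations_per_epoch(200, 5))   # 200 samples in batches of 5
print(total_updates(200, 5, 1000))    # the same setup trained for 1000 epochs
```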

What is the difference between machine learning and deep learning?

Tue, 30 Oct 2018 11:29:38 +0000

Using Tensorflow.DNNClassifier, getting Error: assertion failed: [Labels must >= 0]

Wed, 24 Oct 2018 03:12:33 +0000

Hi All,

I am writing a simple program using TensorFlow and DNNClassifier. The training data consists of 9 pixels with four spectral bands each, i.e. 4*9=36 features. Each data point is mapped to a class (from 1 to 7).

The last value in each row is the class label.

A data point looks like this:

67,75,77,62,67,79,81,62,75,87,89,71,66,79,88,63,66,79,84,63,66,79,80,59,67,84,86,68,71,84,86,64,67,81,82,64,7

But I got the error below:

InvalidArgumentError (see above for traceback): assertion failed: [Labels must >= 0] [Condition x >= 0 did not hold element-wise:] [x (dnn/head/labels:0) = ] [[3][3][3]...]

I am sure there is no data point with a label less than 0. Would you please advise?

import numpy as np
import pandas as pd
import tensorflow as tf

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

print('** DNN Classification *******************************************************')

landsatData = pd.read_csv("./resources/landsat/lantsat.1.csv")

landsatData.describe()

X_landSatAllFeatures = landsatData.iloc[:, np.arange(36)].copy()

y_midPixelAsTarget = landsatData.iloc[:, 36].copy()

# Testing and training sentences splitting (stratified + shuffled) based on the index (sentence ID)
allFeaturesIndexes = X_landSatAllFeatures.index
targetData = y_midPixelAsTarget
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=42)

for train_index, test_index in sss.split(allFeaturesIndexes, targetData):
    train_ind, test_ind = allFeaturesIndexes[train_index], allFeaturesIndexes[test_index]

Test_Matrix = X_landSatAllFeatures.loc[test_ind]
Test_Target_Matrix = y_midPixelAsTarget.loc[test_ind]
Train_Matrix = X_landSatAllFeatures.loc[train_ind]
Train_Target_Matrix = y_midPixelAsTarget.loc[train_ind]

scaler = StandardScaler().fit(Train_Matrix)
Train_Matrix, Test_Matrix = scaler.transform(Train_Matrix), scaler.transform(Test_Matrix)

def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

X_train = Train_Matrix
y_train = Train_Target_Matrix
X_test = Test_Matrix
y_test = Test_Target_Matrix

xx, yy = Train_Matrix.shape
#training phase
feature_cols = [tf.feature_column.numeric_column("X", shape=[36])]
dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300,100], n_classes=8, feature_columns=feature_cols)
# dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300,100], n_classes=10)


input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"X": X_train}, y=y_train, num_epochs=40, batch_size=64, shuffle=True)
dnn_clf.train(input_fn=input_fn)

#testing phase
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"X": X_test}, y=y_test, shuffle=False)
eval_results = dnn_clf.evaluate(input_fn=test_input_fn)
print("The prediction result is : {0:.2f}%".format(100*eval_results['accuracy']))
y_pred_iter = dnn_clf.predict(input_fn=test_input_fn)
y_pred = list(y_pred_iter)
y_pred[0]


print('**********************************************************************************')

What are the main branches of Deep Learning algorithms?

Mon, 15 Oct 2018 02:52:59 +0000

Why do we need big data to train Deep Neural Networks?

Mon, 08 Oct 2018 12:15:46 +0000

Why should we use Machine learning instead of deep learning?

Mon, 08 Oct 2018 03:54:29 +0000

I am wondering why we should use machine learning instead of deep learning. We know that deep learning is very powerful. Anything a machine learning algorithm can do, deep learning can also achieve.

Plus, using deep learning we don't have to worry about feature extraction, data cleaning, etc.

So why should we use machine learning algorithms instead of deep learning?

What are the best resources for studying Deep Learning?

Sun, 26 Aug 2018 07:43:30 +0000

I am wondering if anyone can suggest the best resources for studying Deep Learning?