<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged classification</title>
<link>https://ask.ghassem.com/tag/classification</link>
<description>Powered by Question2Answer</description>
<item>
<title>Bankruptcy prediction and credit card</title>
<link>https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</link>
<description>Hello everyone, newbie data scientist here.&lt;br /&gt;
I&amp;#039;m working on a project to predict companies&amp;#039; bankruptcy probability (probability of default) and to assign each company a credit rating/score based on it:&lt;br /&gt;
For example, a probability below 50% is good and above is bad (just as an example).&lt;br /&gt;
I have a dataset that contains financial ratios and a class label indicating whether the company went bankrupt or not (0 or 1).&lt;br /&gt;
I&amp;#039;m planning to use these models:&lt;br /&gt;
Logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.&lt;br /&gt;
&lt;br /&gt;
The question is (and I know it is a dumb question):&lt;br /&gt;
Do these models return a probability that I can transform into labels? I saw that in a thesis and I&amp;#039;m not sure about it.&lt;br /&gt;
&lt;br /&gt;
Otherwise, any guidance or tips would be appreciated.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</guid>
<pubDate>Sun, 10 Apr 2022 05:50:14 +0000</pubDate>
</item>
<item>
<title>How to perform a classification or regression using k-NN?</title>
<link>https://ask.ghassem.com/658/how-to-perform-a-classification-or-regression-using-k-nn</link>
<description>&lt;p&gt;Suppose you are given the following dataset, where x and y are the two features and the color (Red or Blue) is the target variable.&lt;/p&gt;

&lt;p&gt;a) A new data point $x=1$ and $y=1$ is given. Using Euclidean distance in 3-NN, what color do you predict for this data point?&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;0&quot; style=&quot;height:300px; width:200px&quot;&gt;
&lt;caption&gt;Dataset&lt;/caption&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;x&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;y&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Color&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
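
&lt;p&gt;For part (a), a minimal sketch in plain Python, assuming a simple majority vote with no tie-breaking rule. The names points and knn_classify are illustrative and not part of the original question. The three nearest neighbours of (1, 1) are (0, 1), (1, 0) and (1, 2), each at distance 1.&lt;/p&gt;

```python
import math
from collections import Counter

# Dataset from the table above: (x, y, color)
points = [
    (-1, 1, "Red"), (0, 1, "Blue"), (0, 2, "Red"), (1, -1, "Red"),
    (1, 0, "Blue"), (1, 2, "Blue"), (2, 2, "Red"), (2, 3, "Blue"),
]

def knn_classify(query, data, k=3):
    """Majority vote among the k points closest to query (Euclidean distance)."""
    by_distance = sorted(data, key=lambda p: math.dist(query, p[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((1, 1), points))  # Blue
```

&lt;p&gt;All three neighbours are Blue, so the vote is unanimous.&lt;/p&gt;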

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;b) Now assume we have the following dataset, where the target value is the price. A new data point $x=1$ and $y=1$ is given. Using Euclidean distance in 3-NN, what would be the estimated price?&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;0&quot; style=&quot;height:300px; width:200px&quot;&gt;
&lt;caption&gt;Dataset&lt;/caption&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;x&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;y&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;$40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;$30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;$70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/658/how-to-perform-a-classification-or-regression-using-k-nn</guid>
<pubDate>Thu, 27 Jun 2019 02:54:42 +0000</pubDate>
</item>
<item>
<title>Passing variable length sentences to Tensorflow LSTM</title>
<link>https://ask.ghassem.com/561/passing-variable-length-sentences-to-tensorflow-lstm</link>
<description>&lt;p&gt;I have a TensorFlow LSTM model for predicting sentiment. I built the model with a maximum sequence length of 150 (the maximum number of words). To make predictions, I wrote the code below:&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
import numpy as np

batchSize = 32
maxSeqLength = 150

def getSentenceMatrix(sentence):
    # Only row 0 holds the sentence; the other rows stay zero to fill the batch.
    sentenceMatrix = np.zeros([batchSize, maxSeqLength], dtype=&#039;int32&#039;)
    # cleanSentences and wordsList are defined elsewhere in the program.
    cleanedSentence = cleanSentences(sentence)
    # Keep only the first 150 words; anything beyond that is dropped.
    cleanedSentence = &#039; &#039;.join(cleanedSentence.split()[:150])
    split = cleanedSentence.split()
    for indexCounter, word in enumerate(split):
        try:
            sentenceMatrix[0, indexCounter] = wordsList.index(word)
        except ValueError:
            sentenceMatrix[0, indexCounter] = 399999  # Vector for unknown words
    return sentenceMatrix

input_text = &quot;example data&quot;
inputMatrix = getSentenceMatrix(input_text)&lt;/pre&gt;

&lt;p&gt;In the code, I&#039;m truncating my input text to 150 words and ignoring the remaining data. Because of this, my predictions are wrong.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
cleanedSentence = &#039; &#039;.join(cleanedSentence.split()[:150]) &lt;/pre&gt;

&lt;p&gt;I know that if the input is shorter than the sequence length, we can pad it with zeros. What should we do if the input is longer? Can you suggest the best way to handle this? Thanks in advance.&lt;/p&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/561/passing-variable-length-sentences-to-tensorflow-lstm</guid>
<pubDate>Mon, 11 Feb 2019 05:06:27 +0000</pubDate>
</item>
<item>
<title>Using Tensorflow.DNNClassifier, getting Error: assertion failed: [Labels must &gt;= 0]</title>
<link>https://ask.ghassem.com/440/tensorflow-dnnclassifier-getting-assertion-failed-labels</link>
<description>&lt;p&gt;Hi All,&lt;/p&gt;

&lt;p&gt;I am writing a simple program using TensorFlow and DNNClassifier. Each training example covers 9 pixels with four spectral bands, i.e. 4*9=36 features, and each data point is mapped to a class (from 1 to 7).&lt;/p&gt;

&lt;p&gt;The last value on each line is the class label.&lt;/p&gt;

&lt;p&gt;A single data point looks like this:&lt;/p&gt;

&lt;pre&gt;
67,75,77,62,67,79,81,62,75,87,89,71,66,79,88,63,66,79,84,63,66,79,80,59,67,84,86,68,71,84,86,64,67,81,82,64,7&lt;/pre&gt;

&lt;p&gt;But I got the error below:&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
InvalidArgumentError (see above for traceback): assertion failed: [Labels must &amp;gt;= 0] [Condition x &amp;gt;= 0 did not hold element-wise:] [x (dnn/head/labels:0) = ] [[3][3][3]...]&lt;/pre&gt;

&lt;p&gt;I am sure there is no data point with a label less than 0. Would you please advise?&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
import numpy as np
import pandas as pd
import tensorflow as tf

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

print(&#039;** DNN Classification *******************************************************&#039;)

landsatData = pd.read_csv(&quot;./resources/landsat/lantsat.1.csv&quot;)

landsatData.describe()

X_landSatAllFeatures = landsatData.iloc[:, np.arange(36)].copy()

y_midPixelAsTarget = landsatData.iloc[:, 36].copy()

# Stratified, shuffled train/test split based on the row index
allFeaturesIndexes = X_landSatAllFeatures.index
targetData = y_midPixelAsTarget
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=42)

for train_index, test_index in sss.split(allFeaturesIndexes, targetData):
    train_ind, test_ind = allFeaturesIndexes[train_index], allFeaturesIndexes[test_index]

Test_Matrix = X_landSatAllFeatures.loc[test_ind]
Test_Target_Matrix = y_midPixelAsTarget.loc[test_ind]
Train_Matrix = X_landSatAllFeatures.loc[train_ind]
Train_Target_Matrix = y_midPixelAsTarget.loc[train_ind]

scaler = StandardScaler().fit(Train_Matrix)
Train_Matrix, Test_Matrix = scaler.transform(Train_Matrix), scaler.transform(Test_Matrix)

def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

X_train = Train_Matrix
y_train = Train_Target_Matrix
X_test = Test_Matrix
y_test = Test_Target_Matrix

xx, yy = Train_Matrix.shape
#training phase
feature_cols = [tf.feature_column.numeric_column(&quot;X&quot;, shape=[36])]
dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300,100], n_classes=8, feature_columns=feature_cols)
# dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300,100], n_classes=10)


input_fn = tf.estimator.inputs.numpy_input_fn(
    x={&quot;X&quot;: X_train}, y=y_train, num_epochs=40, batch_size=64, shuffle=True)
dnn_clf.train(input_fn=input_fn)

#testing phase
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={&quot;X&quot;: X_test}, y=y_test, shuffle=False)
eval_results = dnn_clf.evaluate(input_fn=test_input_fn)
print(&quot;The prediction result is : {0:.2f}%&quot;.format(100*eval_results[&#039;accuracy&#039;]))
y_pred_iter = dnn_clf.predict(input_fn=test_input_fn)
y_pred = list(y_pred_iter)
y_pred[0]


print(&#039;**********************************************************************************&#039;)&lt;/pre&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/440/tensorflow-dnnclassifier-getting-assertion-failed-labels</guid>
<pubDate>Wed, 24 Oct 2018 03:12:33 +0000</pubDate>
</item>
<item>
<title>What are Training set, Validation set, Test set, and Gold set in supervised and unsupervised machine learning?</title>
<link>https://ask.ghassem.com/294/training-validation-supervised-unsupervised-machine-learning</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/294/training-validation-supervised-unsupervised-machine-learning</guid>
<pubDate>Mon, 08 Oct 2018 11:48:29 +0000</pubDate>
</item>
<item>
<title>What are the most important machine learning algorithms?</title>
<link>https://ask.ghassem.com/292/what-are-the-most-important-machine-learning-algorithms</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/292/what-are-the-most-important-machine-learning-algorithms</guid>
<pubDate>Mon, 08 Oct 2018 11:43:59 +0000</pubDate>
</item>
<item>
<title>Is Naive Bayes a good classifier?</title>
<link>https://ask.ghassem.com/264/is-naive-bayes-a-good-classifier</link>
<description>&lt;p&gt;Here is an example of training a model using the Naïve Bayes classifier on the Glass dataset (from UCI). The objective is to predict the type of glass based on the 9 parameters. The metrics used to evaluate the classification result are the confusion matrix and the classification report.&lt;/p&gt;

&lt;p&gt;The program is available &lt;a rel=&quot;nofollow&quot; href=&quot;https://repl.it/@haroldj/NaiveBayesClassifier&quot;&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Few Observations/Questions&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;By varying the ‘random_state’ value inside the function train_test_split, we observe different accuracy values. Is this behavior correct?&lt;/li&gt;
	&lt;li&gt;The StratifiedShuffle method of train_test_split also produces random results on every run. Is there a bug in the Naïve Bayes classifier implementation?&lt;/li&gt;
&lt;/ol&gt;
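
&lt;p&gt;As a rough illustration of observation 1, the sketch below uses only Python's standard library to mimic a seeded shuffle-and-split. The helper split_indices is hypothetical and is not scikit-learn's train_test_split. It only shows that the seed decides which rows land in the test set, so a different seed gives a different split to train and score on.&lt;/p&gt;

```python
import random

def split_indices(n_samples, test_fraction, seed):
    """Shuffle row indices with a seeded RNG, then cut off a test portion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 214 is the number of rows in the UCI Glass dataset
train_a, test_a = split_indices(214, 0.3, seed=0)
train_b, test_b = split_indices(214, 0.3, seed=0)
train_c, test_c = split_indices(214, 0.3, seed=1)

print(test_a == test_b)  # same seed: identical test rows
print(test_a == test_c)  # different seed: a different set of test rows
```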

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/264/is-naive-bayes-a-good-classifier</guid>
<pubDate>Thu, 04 Oct 2018 00:23:00 +0000</pubDate>
</item>
<item>
<title>Explain Cross-validation and why should we use it in Machine Learning?</title>
<link>https://ask.ghassem.com/180/explain-cross-validation-and-why-should-use-machine-learning</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/180/explain-cross-validation-and-why-should-use-machine-learning</guid>
<pubDate>Fri, 28 Sep 2018 15:44:58 +0000</pubDate>
</item>
<item>
<title>What is Bayes’ Theorem? How is it useful in machine learning? Where should we use it?</title>
<link>https://ask.ghassem.com/165/what-bayes-theorem-how-useful-machine-learning-where-should</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/165/what-bayes-theorem-how-useful-machine-learning-where-should</guid>
<pubDate>Thu, 27 Sep 2018 05:27:55 +0000</pubDate>
</item>
</channel>
</rss>