<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged linear-regression</title>
<link>https://ask.ghassem.com/tag/linear-regression</link>
<description>Powered by Question2Answer</description>
<item>
<title>How to calculate the residual errors, MSE, MAE, and RMSE?</title>
<link>https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse</link>
<description>&lt;p&gt;Given the following sample dataset with 5 samples and 2 features:&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Sample&lt;/th&gt;
&lt;th&gt;Feature 1&lt;/th&gt;
&lt;th&gt;Feature 2&lt;/th&gt;
&lt;th&gt;Actual Value&lt;/th&gt;
&lt;th&gt;Predicted Value&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
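
&lt;p&gt;As a rough sketch of how the residuals and the MSE, MAE, and RMSE follow from the Actual Value and Predicted Value columns above: the snippet assumes the residual is defined as actual minus predicted (the opposite sign convention is also common), and it does not use Feature 1 or Feature 2 because the predictions are already given. The variable names are illustrative only.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
```python
import math

# Actual and predicted values copied from the table above.
actual = [4, 5, 6, 7, 8]
predicted = [6, 6, 7, 8, 9]

# Assumed convention: residual = actual - predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

n = len(residuals)
mse = sum(r ** 2 for r in residuals) / n   # mean squared error
mae = sum(abs(r) for r in residuals) / n   # mean absolute error
rmse = math.sqrt(mse)                      # root mean squared error

print("residuals:", residuals)
print("MSE:", mse, "MAE:", mae, "RMSE:", round(rmse, 4))
```
&lt;/pre&gt;

&lt;p&gt;MSE and RMSE penalize the one larger miss (sample 1) more heavily than MAE does, which is why RMSE comes out slightly above MAE for this table.&lt;/p&gt;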

&lt;p&gt;Calculate the residual errors, mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE) using a sample model.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse</guid>
<pubDate>Fri, 27 Jan 2023 04:09:28 +0000</pubDate>
</item>
<item>
<title>Bankruptcy prediction and credit card</title>
<link>https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</link>
<description>Hello everyone, newbie data scientist here.&lt;br /&gt;
I&amp;#039;m working on a project to predict the probability of bankruptcy (probability of default) for companies and to assign them a credit rating/score based on it:&lt;br /&gt;
For example, a probability below 50% is good and above it is bad (just as an example).&lt;br /&gt;
I have a dataset that contains financial ratios and a class label indicating whether the company went bankrupt or not (0 or 1).&lt;br /&gt;
I&amp;#039;m planning to use these models:&lt;br /&gt;
Logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.&lt;br /&gt;
&lt;br /&gt;
The question is, and I know it is a dumb question:&lt;br /&gt;
Do those models return a probability, which I can then transform into labels? I saw that in a thesis and I&amp;#039;m not sure about it.&lt;br /&gt;
&lt;br /&gt;
Otherwise, any guidance or tips will be appreciated.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</guid>
<pubDate>Sun, 10 Apr 2022 05:50:14 +0000</pubDate>
</item>
<item>
<title>How to calculate residual errors for linear regression and interpret regression metrics?</title>
<link>https://ask.ghassem.com/829/calculate-residual-regression-interpret-regression-metrics</link>
<description>Assuming we have a linear regression equation and some data points (a sample), how can we calculate the residual error for each data point, and the total cost based on metrics such as MAE, MSE, RMSE, MAPE, or MPE, if we have their formulas?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/829/calculate-residual-regression-interpret-regression-metrics</guid>
<pubDate>Tue, 18 Feb 2020 18:30:51 +0000</pubDate>
</item>
<item>
<title>How to define a function for cross_val_score in a linear regression problem?</title>
<link>https://ask.ghassem.com/674/create-crossvalscore-related-linear-regression-problem</link>
<description>&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
def cross_val_score(estimator,X,y,scoring,cv):
&amp;nbsp; &amp;nbsp; scores=cross_val_score
&amp;nbsp; &amp;nbsp; scores_rmse=np.sqrt(-scores)
&amp;nbsp; &amp;nbsp; print(&#039;Scores: &#039;,scores_rmse)
&amp;nbsp; &amp;nbsp; print(&quot;Mean:&quot;, scores_rmse.mean())
&amp;nbsp; &amp;nbsp; print(&quot;Standard deviation:&quot;, scores_rmse.std())&lt;/pre&gt;

&lt;p&gt;This is the function I defined, and I called it as shown below:&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
cross_val_score(SGDRegressor,X_train,y_train,scoring=&#039;neg_mean_squared_error&#039;,cv=5) &lt;/pre&gt;

&lt;p&gt;I am getting the error below:&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
ValueError                                Traceback (most recent call last)
&amp;lt;ipython-input-181-275d240df219&amp;gt; in &amp;lt;module&amp;gt;()
----&amp;gt; 1 plot_validation_curve(lin_reg_SGD,X_train,y_train,&#039;alpha&#039;, [0.001,0.01],scoring=&#039;neg_mean_squared_error&#039;,cv=5)
3 frames

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    203     if len(uniques) &amp;gt; 1:
    204         raise ValueError(&quot;Found input variables with inconsistent numbers of&quot;
--&amp;gt; 205                          &quot; samples: %r&quot; % [int(l) for l in lengths])
    206 
    207 

ValueError: Found input variables with inconsistent numbers of samples: [13903, 13903, 22]
&lt;/pre&gt;


</description>
<category>Programming</category>
<guid isPermaLink="true">https://ask.ghassem.com/674/create-crossvalscore-related-linear-regression-problem</guid>
<pubDate>Wed, 03 Jul 2019 13:37:35 +0000</pubDate>
</item>
<item>
<title>How do I plot the linear classifier calculated with LIBLINEAR using sklearn?</title>
<link>https://ask.ghassem.com/629/plot-linear-classifier-calculated-liblinear-using-sklearn</link>
<description>Make a scatter plot where the x-axis is the height of the citizens and the y-axis is the weight of the citizens. The color of the points needs to be different for males and females. In the same figure, plot the linear classifier calculated with LIBLINEAR using sklearn.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/629/plot-linear-classifier-calculated-liblinear-using-sklearn</guid>
<pubDate>Thu, 16 May 2019 08:13:06 +0000</pubDate>
</item>
<item>
<title>How can I convert an LSTM model to a linear regression model?</title>
<link>https://ask.ghassem.com/624/how-can-i-convert-lstm-model-to-linear-regression-model</link>
<description>&lt;p&gt;Here is an LSTM prediction model, and I want to convert it to linear regression.&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;...
model.fit(x_train, y_train, epochs=10, batch_size=16)

trainPredict = model.predict(x_train)
testPredict = model.predict(x_test)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([y_train])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([y_test])
&lt;/code&gt;
&lt;/pre&gt;

&lt;p&gt;I tried,&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;y = trainPredict
x = range(0,len(y))
XGBModel = XGBRegressor()
XGBModel.fit(x,y, verbose=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the result is :&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Check failed: preds.Size() == info.labels_.Size() (1 vs. 56969) labels are not correctly providedpreds.size=1, label.size=56969&#039;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I don&#039;t know why this error occurs. How can I solve this problem?&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/624/how-can-i-convert-lstm-model-to-linear-regression-model</guid>
<pubDate>Mon, 29 Apr 2019 11:50:07 +0000</pubDate>
</item>
<item>
<title>How to calculate univariate linear regression?</title>
<link>https://ask.ghassem.com/610/how-to-calculate-univariate-linear-regression</link>
<description>&lt;p&gt;For the following dataset, calculate the regression equation $\hat{y} = ax+b$&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;height:246px; width:213px; border-spacing: 1px;&quot;&gt;
&lt;caption&gt;dataset&lt;/caption&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;x&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;y&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;42&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
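
&lt;p&gt;One way to approach this, assuming ordinary least squares is the intended method (the question does not say so explicitly), is to use $a = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - a\bar{x}$. The sketch below applies these formulas to the dataset above; the helper name fit_line is hypothetical.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
```python
# Data points copied from the table above.
xs = [1, 3, 10, 16, 26, 36]
ys = [42, 50, 75, 100, 150, 200]


def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared errors of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator and denominator of the least-squares slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x   # the fitted line passes through the means
    return a, b


a, b = fit_line(xs, ys)
print("slope a =", round(a, 3), "intercept b =", round(b, 3))
```
&lt;/pre&gt;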


</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/610/how-to-calculate-univariate-linear-regression</guid>
<pubDate>Thu, 11 Apr 2019 16:46:47 +0000</pubDate>
</item>
<item>
<title>How to update weights using gradient descent algorithm?</title>
<link>https://ask.ghassem.com/596/how-to-update-weights-using-gradient-decent-algorithm</link>
<description>&lt;p&gt;For the&amp;nbsp;neural network below, imagine we are going to use&amp;nbsp;the&amp;nbsp;&lt;strong&gt;backpropagation algorithm&lt;/strong&gt; to update the weights. The bias (b) in this problem is always 0 (ignore the bias when you solve the problem), the dataset has only one record with $x=2$ and a target value of $y=5$, as you can see in the following table,&amp;nbsp;and the activation function&amp;nbsp;is defined as $f(z) = z$.&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:200px&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;feature (x)&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;Target (y)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;1) Define the cost function, $J(w)$, based on the error in the backpropagation algorithm: $J(w) = E = \frac{1}{2}(predicted - target)^2$, and plot it&lt;/p&gt;

&lt;p&gt;2) Initialize the weight by $w=3$, and calculate the error&lt;/p&gt;

&lt;p&gt;3) Calculate the updated weights using the gradient&amp;nbsp;descent algorithm &lt;strong&gt;after three updates &lt;/strong&gt;if we have the following values for the learning rate ($\alpha$)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$\alpha$ = 1&lt;/li&gt;
&lt;li&gt;$\alpha$ = 0.1&lt;/li&gt;
&lt;li&gt;$\alpha$ = 0.5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hint:&amp;nbsp; &amp;nbsp;$w_{new} = w_{old} - \alpha \frac{\partial E}{\partial w}$&amp;nbsp;&lt;/p&gt;
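
&lt;p&gt;The update rule in the hint can be sketched as follows. This assumes the network is a single neuron with one weight, so that $predicted = f(wx) = wx$ and, by the chain rule, $\frac{\partial E}{\partial w} = (wx - y)\,x$; the linked image is not reproduced here, so that architecture is an assumption. The function name gradient_descent is hypothetical.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
```python
def gradient_descent(w, x, y, alpha, steps):
    """Apply w_new = w_old - alpha * dE/dw for E = 0.5 * (w*x - y)**2."""
    history = []
    for _ in range(steps):
        predicted = w * x            # identity activation, bias ignored
        grad = (predicted - y) * x   # dE/dw under the single-weight assumption
        w = w - alpha * grad
        history.append(w)
    return history


# Start from w = 3 with the single record x = 2, y = 5.
for alpha in (1.0, 0.1, 0.5):
    print("alpha =", alpha, "weights:", gradient_descent(3.0, 2.0, 5.0, alpha, 3))
```
&lt;/pre&gt;

&lt;p&gt;Under these assumptions the error is zero at $w = 2.5$. With $\alpha = 1$ the weight moves further away from 2.5 on every update, with $\alpha = 0.5$ it alternates between 2 and 3, and with $\alpha = 0.1$ it moves steadily toward 2.5.&lt;/p&gt;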

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/uohFS6l.png&quot;&gt;https://i.imgur.com/uohFS6l.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/596/how-to-update-weights-using-gradient-decent-algorithm</guid>
<pubDate>Thu, 28 Mar 2019 17:17:39 +0000</pubDate>
</item>
<item>
<title>Looking for guidance on whether I have the necessary data to answer a Regression question</title>
<link>https://ask.ghassem.com/595/looking-guidance-whether-necessary-answer-regression-question</link>
<description>&lt;p&gt;Hi everyone.&lt;/p&gt;

&lt;p&gt;I&#039;m currently working on my final project for a Data Science degree and after a month of literature review, exploratory analysis and model testing,&amp;nbsp;I&#039;m not sure if the questions I set out to answer are suitable for&amp;nbsp;the data I have.&lt;/p&gt;

&lt;p&gt;This is a very broad question I&#039;m asking here, as it&#039;s more guidance than anything else, so if this is not the place to ask, I would appreciate it if you could redirect me to the right place.&lt;/p&gt;

&lt;p&gt;You can find the data sets and code on my github &lt;a rel=&quot;nofollow&quot; href=&quot;https://github.com/TomGoncalves/IAQ-Project&quot;&gt;here&lt;/a&gt;.&amp;nbsp;The code is messy but working; I only picked up programming last year.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;The data&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Indoor Air Quality data recorded hourly through 4 sensors (Kitchen, Bedroom, Living Room, Bathroom) for 7 days in each of 3 houses. For 6 of those days, each sensor was in a different room, and on the last day all sensors were together (so we could see how spread apart their signals were and account for that). So here&amp;nbsp;I have 9 continuous variables: Temperature, Relative Humidity, CO, CO2, TVOC, PM2.5, NO2, Ozone and Air Pressure.&lt;/p&gt;

&lt;p&gt;I then got 3 manually-filled questionnaires on Occupant Activity, one for each house, such as &quot;Door open/closed&quot;, &quot;Window open/closed&quot;, &quot;Heating On/off&quot;, &quot;Frying&quot;, &quot;Boiling&quot;, &quot;Hoovering&quot;, &quot;Mopping&quot;, etc. Now, these logs were missing a lot of data.&lt;/p&gt;

&lt;p&gt;These questionnaires were a mess and a lot of the missing values had to be imputed. This data is reported in binary format, such as &quot;Did Activity X occur at hour Y? - Yes(1)/No(0)&quot;.&lt;/p&gt;

&lt;p&gt;With this project I&#039;ve chosen to&amp;nbsp;predict a sensor data variable (in this case CO2), based on activities.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Models&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Just to get a feel for the data, I&amp;nbsp;ran Linear Regression, Decision Tree and Random Forest models on individual rooms of each house, once with only Occupant Activity predictors and once with both Occupant Activity and other sensor variables as predictors, and the results are just atrocious in every case. Cross-validation shows the models&#039; performance to be all over the place, and looking at features for statistical significance gives me different significant features in every room of every house; it&#039;s like I&#039;m playing feature roulette. The problem with some features such as Mopping, Frying, Boiling and Hoovering is that there will be a lot of &quot;0&quot;s in comparison to &quot;1&quot;s due to the nature of the feature, so one or two &quot;1&quot;s in the wrong place is enough to give a misguided correlation.&lt;/p&gt;

&lt;p&gt;As you can tell and see from this, I&#039;m still a Data Scientist in-training here, having only done a few models in the past and rather new-ish to programming (1 year experience).&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;What I&#039;m looking for&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;I suppose that more than anything, I&#039;m asking for guidance on whether pursuing this as a Regression problem is feasible or not.&lt;/p&gt;

&lt;p&gt;I&#039;m very short on time but if this won&#039;t work, I can look into alternatives. For instance,&amp;nbsp;Air Pollutants have safety thresholds. I could create a class feature on whether the value is over the threshold or not and turn it into a classification problem, or even a clustering one to identify the room based on activities and air pollutants.&lt;/p&gt;

&lt;p&gt;Bottom line is that I have a 12,500 word paper to deliver in a month and I&#039;ve been at this for a month already with nothing to show for it, so I&#039;m hoping someone with more experience under their belt could see if I&#039;m chasing a dead end.&amp;nbsp;Any help in the form of guidance would be so very much appreciated; I&#039;ve run out of ideas here.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Tom&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/595/looking-guidance-whether-necessary-answer-regression-question</guid>
<pubDate>Sun, 24 Mar 2019 23:40:09 +0000</pubDate>
</item>
<item>
<title>What is residual in the context of linear regression?</title>
<link>https://ask.ghassem.com/484/what-is-residual-in-the-context-of-linear-regression</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/484/what-is-residual-in-the-context-of-linear-regression</guid>
<pubDate>Tue, 30 Oct 2018 11:25:42 +0000</pubDate>
</item>
<item>
<title>Please explain Linear Regression with an example?</title>
<link>https://ask.ghassem.com/336/please-explain-linear-regression-with-an-example</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/336/please-explain-linear-regression-with-an-example</guid>
<pubDate>Fri, 12 Oct 2018 02:14:00 +0000</pubDate>
</item>
</channel>
</rss>