<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions</title>
<link>https://ask.ghassem.com/questions</link>
<description>Powered by Question2Answer</description>
<item>
<title>Step-by-Step Hidden State Calculation in a Recurrent Neural Network</title>
<link>https://ask.ghassem.com/1049/step-step-hidden-state-calculation-recurrent-neural-network</link>
<description>&lt;p&gt;Consider a simplified Recurrent Neural Network (RNN) with a single input and a single output. The hidden state is updated using the recurrence:&lt;/p&gt;

&lt;p&gt;$$ h_t = \text{ReLU}(W_{ih} \cdot x_t + W_{hh} \cdot h_{t-1}) $$&lt;/p&gt;

&lt;p&gt;Assume the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;\( x_t = 3 \) for every time step&lt;/li&gt;
&lt;li&gt;\( h_0 = 0 \)&lt;/li&gt;
&lt;li&gt;\( W_{ih} = 0.4 \)&lt;/li&gt;
&lt;li&gt;\( W_{hh} = 0.6 \)&lt;/li&gt;
&lt;li&gt;Activation function: ReLU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Compute the value of the hidden state \( h_4 \) at time \( t = 4 \).&lt;/strong&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1049/step-step-hidden-state-calculation-recurrent-neural-network</guid>
<pubDate>Mon, 01 Dec 2025 18:32:24 +0000</pubDate>
</item>
<item>
<title>How to calculate feed-forward (forward propagation) in a neural network for classification?</title>
<link>https://ask.ghassem.com/1047/calculate-forward-forward-propagation-network-classification</link>
<description>&lt;p&gt;For the following neural network, calculate the classification accuracy, given these settings:&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;&quot; height=&quot;1831&quot; src=&quot;https://i.imgur.com/nEyM4qU.jpeg&quot; width=&quot;2179&quot;&gt;&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1047/calculate-forward-forward-propagation-network-classification</guid>
<pubDate>Wed, 02 Oct 2024 14:47:26 +0000</pubDate>
</item>
<item>
<title>How to analyse an imbalanced categorical column in a dataset</title>
<link>https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
I have a dataset with a categorical column that contains three categories. One of the categories represents 98% of the data, while the remaining 2% are distributed between the other two categories, with a few (maybe around 50) in each. It is worth mentioning that the output for these 50 rows is the same, which suggests that these data points may be important.&lt;br /&gt;
&lt;br /&gt;
However, the data is obviously imbalanced, and I am unable to perform any analysis. Should I drop the entire column, or perform a chi-square test on the data as-is?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1042/how-to-analyse-imbalanced-categorical-colum-in-dataset</guid>
<pubDate>Sat, 24 Jun 2023 17:55:23 +0000</pubDate>
</item>
<item>
<title>When to one-hot encode a category and when to segment by category?</title>
<link>https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</link>
<description>When preprocessing data for machine learning, is there any difference between using one-hot encoding to turn categorical variables into numeric ones and segmenting the data (and the model used) along the category? Say you run a multivariate regression model on data covering 5 cities. Would a single model with one variable for each city be better or worse than 5 models, one specific to each city? Or is there no difference? Or does it depend on certain factors and intuition?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1034/when-to-use-one-hot-encode-category-and-when-segment-category</guid>
<pubDate>Wed, 22 Feb 2023 20:30:38 +0000</pubDate>
</item>
<item>
<title>How to calculate the residual errors, (MSE),(MAE), and (RMSE)?</title>
<link>https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse</link>
<description>&lt;p&gt;Given the following sample dataset with 5 samples and 2 features:&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th&gt;Sample&lt;/th&gt;
&lt;th&gt;Feature 1&lt;/th&gt;
&lt;th&gt;Feature 2&lt;/th&gt;
&lt;th&gt;Actual Value&lt;/th&gt;
&lt;th&gt;Predicted Value&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
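For reference, these metrics can be computed mechanically from the Actual and Predicted columns above (a minimal sketch; residual taken as actual minus predicted):

```python
import math

actual    = [4, 5, 6, 7, 8]
predicted = [6, 6, 7, 8, 9]

# Residuals: actual minus predicted, one per sample.
residuals = [a - p for a, p in zip(actual, predicted)]
mse  = sum(r ** 2 for r in residuals) / len(residuals)
mae  = sum(abs(r) for r in residuals) / len(residuals)
rmse = math.sqrt(mse)
```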

&lt;p&gt;&lt;br&gt;
Calculate the residual errors, mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE) for this sample model&#039;s predictions.&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1031/how-to-calculate-the-residual-errors-mse-mae-and-rmse</guid>
<pubDate>Fri, 27 Jan 2023 04:09:28 +0000</pubDate>
</item>
<item>
<title>Can you verify the validity of this chart comparing the review scores for Marvel Phase 4?</title>
<link>https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</link>
<description>&lt;p&gt;I have some skepticism about the validity of the charts below comparing the critic and audience reviews for Phase 4 of the MCU to the previous 3 phases. There are over 18 movies and TV shows in Phase 4, compared to the 6 movies in Phases 1 &amp;amp; 2 and the 11 movies in Phase 3. Also, there are far fewer critic reviews for the Phase 4 TV shows than for the Phase 4 movies. For example, on Rotten Tomatoes there are only 40 critic reviews for The Falcon and the Winter Soldier but 452 critic reviews for Black Widow. Could this uneven and inconsistent number of reviews between TV shows and movies in Phase 4 be inflating the overall averages above what they should be? Or do you agree with the conclusions presented in the charts?&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1059948060194652230/image.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&quot;&gt;https://cdn.discordapp.com/attachments/997145183172964435/1049356020469739520/image.png&lt;/a&gt;&lt;/p&gt;</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1030/verify-validity-chart-comparing-review-scores-marvel-phase</guid>
<pubDate>Mon, 09 Jan 2023 16:29:14 +0000</pubDate>
</item>
<item>
<title>List of free Qwiklabs labs</title>
<link>https://ask.ghassem.com/1028/list-of-free-qwiklabs-labs</link>
<description>Provide a short list of free cloud labs on Qwiklabs</description>
<category>Cloud Computing</category>
<guid isPermaLink="true">https://ask.ghassem.com/1028/list-of-free-qwiklabs-labs</guid>
<pubDate>Fri, 28 Oct 2022 23:36:55 +0000</pubDate>
</item>
<item>
<title>Which code has the best runtime, and why? (the commented one or the other one)</title>
<link>https://ask.ghassem.com/1027/which-code-has-best-runtime-and-why-the-one-commented-the-other</link>
<description>&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
# for key, value in dict.items():
#     if value &amp;gt;= long:
#         long = value
#         long_name = key
#     if value &amp;lt; short:
#         short = value
#         short_name = key
long = max(dict.values())
long_name = max(dict, key=dict.get)
short = min(dict.values())
short_name = min(dict, key=dict.get)&lt;/pre&gt;</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1027/which-code-has-best-runtime-and-why-the-one-commented-the-other</guid>
<pubDate>Fri, 02 Sep 2022 14:39:49 +0000</pubDate>
</item>
<item>
<title>Creating tables from unstructured texts about stock market</title>
<link>https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</link>
<description>&lt;div&gt;
&lt;div&gt;
&lt;div&gt;
&lt;p&gt;I am trying to extract information such as profits, revenues, and other figures, along with their corresponding dates and quarters, from unstructured text about the stock market, and convert it into a report in table form. But since the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents but is not enough. Is there any unsupervised way of linking entities with their corresponding dates, values, and quarters?&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1026/creating-tables-from-unstructured-texts-about-stock-market</guid>
<pubDate>Tue, 02 Aug 2022 00:47:49 +0000</pubDate>
</item>
<item>
<title>How do I compare the count of a value in each year while having a different sample size each year?</title>
<link>https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</link>
<description>How do I accurately compare the number of something a survey measures from my employees each year, given varying survey engagement and employee headcount?&lt;br /&gt;
&lt;br /&gt;
Suppose I measure employee satisfaction over the years by collecting a survey from them each year, asking whether they are satisfied or not, and then comparing the yes answers across years. But the number of employees who answer is not the same each year, and the number of employees increases every year. How do I correctly compare this across years?&lt;br /&gt;
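One way to remove the engagement effect is to compare proportions rather than raw counts; a minimal sketch with hypothetical numbers:

```python
# Hypothetical yearly survey results (made-up numbers for illustration).
surveys = {
    2020: {"yes": 120, "respondents": 200},
    2021: {"yes": 180, "respondents": 300},
}
# Satisfaction rate per year: yes answers divided by respondents.
rates = {year: d["yes"] / d["respondents"] for year, d in surveys.items()}
```

Here both years come out at a 60% satisfaction rate even though the raw yes counts differ, which is the point of normalizing by the number of respondents.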
&lt;br /&gt;
In other words, how do I remove the effect of the survey engagement rate when calculating the results?</description>
<category>general</category>
<guid isPermaLink="true">https://ask.ghassem.com/1025/compare-count-value-each-year-while-having-different-sanple</guid>
<pubDate>Wed, 08 Jun 2022 10:32:33 +0000</pubDate>
</item>
<item>
<title>Is it possible to make a forecast of a future value of Air Temperature using Fast Fourier Transform?</title>
<link>https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</link>
<description>Is it possible to make a forecast of a future value of air temperature using the Fast Fourier Transform? If yes, what should the process be, or how would you do it? Thank you!</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1024/possible-forecast-future-value-temperature-fourier-transform</guid>
<pubDate>Thu, 02 Jun 2022 16:10:26 +0000</pubDate>
</item>
<item>
<title>forecast log transformed fitted values for 2 years using ARMA model</title>
<link>https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</link>
<description>The input is a stock price series under an exponential (log) transformation. We are asked to forecast the fitted values for 2 years using the ARMA results.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1023/forecast-transformed-fitted-values-years-using-arma-model</guid>
<pubDate>Wed, 04 May 2022 20:31:44 +0000</pubDate>
</item>
<item>
<title>Kmeans clustering in python - Giving original labels to predicted clusters</title>
<link>https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</link>
<description>&lt;p&gt;I have a dataset with 7 labels in the target variable.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
X = data.drop(&#039;target&#039;, axis=1)
Y = data[&#039;target&#039;]
Y.unique()&lt;/pre&gt;

&lt;p&gt;array([&#039;Normal_Weight&#039;, &#039;Overweight_Level_I&#039;, &#039;Overweight_Level_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_I&#039;, &#039;Insufficient_Weight&#039;, &#039;Obesity_Type_II&#039;,&lt;br&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&#039;Obesity_Type_III&#039;], dtype=object)&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
km = KMeans(n_clusters=7, init=&quot;k-means++&quot;, random_state=300)
km.fit_predict(X)
np.unique(km.labels_)&lt;/pre&gt;

&lt;p&gt;array([0, 1, 2, 3, 4, 5, 6])&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;After running the KMeans clustering algorithm with the number of clusters set to 7, the resulting clusters are labeled 0, 1, 2, 3, 4, 5, 6. But how do I know which real label matches which predicted label?&lt;/p&gt;
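A common way to line cluster ids up with the original labels is to choose the id-to-label assignment with maximum overlap (a sketch, not the only option; it assumes scipy is available and that cluster ids run from 0 to k-1):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def map_clusters(true_labels, cluster_ids):
    """Return {cluster_id: original_label} maximizing total overlap."""
    classes = np.unique(true_labels)
    k = len(classes)
    # Contingency table: rows are true classes, columns are cluster ids.
    overlap = np.zeros((k, k), dtype=int)
    for i, c in enumerate(classes):
        for j in range(k):
            overlap[i, j] = np.sum((true_labels == c) * (cluster_ids == j))
    # Hungarian algorithm: best one-to-one assignment of clusters to classes.
    rows, cols = linear_sum_assignment(overlap, maximize=True)
    return {int(cols[i]): classes[rows[i]] for i in range(k)}
```

With this mapping applied to km.labels_, comparing against Y (e.g. via a confusion matrix or accuracy) becomes meaningful.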

&lt;p&gt;In other words, I want to know how to give the original label names to the newly predicted labels, so that I can compare them and see how many values are clustered correctly (accuracy).&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1022/kmeans-clustering-python-giving-original-predicted-clusters</guid>
<pubDate>Wed, 27 Apr 2022 05:32:54 +0000</pubDate>
</item>
<item>
<title>Bankruptcy prediction and credit card</title>
<link>https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</link>
<description>Hello everyone, newbie data scientist here.&lt;br /&gt;
I&amp;#039;m working on a project to predict the bankruptcy probability (probability of default) of companies and to assign them a credit rating/score based on it:&lt;br /&gt;
For example, below a probability of 50% is good and above is bad (just for the example).&lt;br /&gt;
I have a dataset containing financial ratios and a class indicating whether the company is bankrupt or not (0 or 1).&lt;br /&gt;
I&amp;#039;m planning to use these models:&lt;br /&gt;
Logistic regression, linear discriminant analysis, decision trees, random forest, ANN, AdaBoost, SVM.&lt;br /&gt;
&lt;br /&gt;
The question is (and I know it is a dumb question):&lt;br /&gt;
Do those models return a probability which I can then transform into labels? I saw that in a thesis and I&amp;#039;m not sure about it.&lt;br /&gt;
&lt;br /&gt;
Otherwise, any guidance or tips will be appreciated.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1021/bankruptcy-prediction-and-credit-card</guid>
<pubDate>Sun, 10 Apr 2022 05:50:14 +0000</pubDate>
</item>
<item>
<title>how to output f1-score instead of accuracy</title>
<link>https://ask.ghassem.com/1019/how-to-output-f1-score-instead-of-accuracy</link>
<description>&lt;p&gt;I have the code below, outputting the accuracy. How can I output the F1-score instead? Thanks in advance,&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt; clf.fit(data_train,target_train)  
preds = clf.predict(data_test)  
# accuracy for the current fold only     
r2score = clf.score(data_test,target_test)&lt;/code&gt;&lt;/pre&gt;</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1019/how-to-output-f1-score-instead-of-accuracy</guid>
<pubDate>Sat, 02 Apr 2022 13:04:21 +0000</pubDate>
</item>
<item>
<title>I cannot get this code to work. Please help.</title>
<link>https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</link>
<description>&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.model_selection import train_test_split

model = Sequential()
model.add(LSTM(10, input_shape=(1, 1)))
model.add(Dense(1, activation=&quot;linear&quot;))
model.compile(loss=&quot;mse&quot;, optimizer=&quot;adam&quot;)

X, y = get_data()

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
X_train_2, X_val, y_train_2, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

model.fit(X_train, y_train, epochs=800, validation_data=(X_val, y_val), shuffle=False)&lt;/pre&gt;
</description>
<category>Python</category>
<guid isPermaLink="true">https://ask.ghassem.com/1018/i-cannot-get-this-code-to-work-please-help</guid>
<pubDate>Mon, 21 Mar 2022 05:59:53 +0000</pubDate>
</item>
<item>
<title>Battery data projects</title>
<link>https://ask.ghassem.com/1017/battery-data-projects</link>
<description>Where can I find projects related to battery data?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1017/battery-data-projects</guid>
<pubDate>Wed, 02 Mar 2022 18:11:57 +0000</pubDate>
</item>
<item>
<title>How can you build dynamic pricing model with data only from rigid pricing?</title>
<link>https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</link>
<description>I want to build a dynamic pricing model, which means: if a product is too expensive for a client and there is a risk that we might lose the client, we lower the price for them; but if the client doesn&amp;#039;t care that much about the price, we might increase the price a little.&lt;br /&gt;
&lt;br /&gt;
All the articles I&amp;#039;ve seen describe some kind of A/B testing for the pricing and then create a model.&lt;br /&gt;
&lt;br /&gt;
I want to build a model only on the existing rigid pricing data. So I have the prices offered to customers, and I know who bought the product and who went to another company.&lt;br /&gt;
&lt;br /&gt;
How can I do the increasing price part?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1016/build-dynamic-pricing-model-with-data-only-from-rigid-pricing</guid>
<pubDate>Fri, 21 Jan 2022 06:44:31 +0000</pubDate>
</item>
<item>
<title>What analytical software would be good for a company to use?</title>
<link>https://ask.ghassem.com/1015/what-analytical-software-would-be-good-for-a-company-to-use</link>
<description>This would be for a company that is just now looking into using a software to track data for wine making.</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/1015/what-analytical-software-would-be-good-for-a-company-to-use</guid>
<pubDate>Fri, 14 Jan 2022 16:46:38 +0000</pubDate>
</item>
<item>
<title>Do you usually collect your own data, or is there always a resource available for you? Or does it depend on the company?</title>
<link>https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</guid>
<pubDate>Sun, 09 Jan 2022 22:13:34 +0000</pubDate>
</item>
<item>
<title>When dealing with categorical values, should the &#039;year&#039; column be encoded using OHE or OrdinalEncoder?</title>
<link>https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder</link>
<description>It&amp;#039;s a car prices dataset, so I&amp;#039;m assuming that the more recent the car, the more value it should have. The values in the &amp;#039;year&amp;#039; column simply consist of years from 1995 to 2020.&lt;br /&gt;
I am trying to predict the selling price of the car.&lt;br /&gt;
&lt;br /&gt;
I&amp;#039;m a bit new to ML, currently still doing my undergraduate so any help / tips are appreciated. Thank you.</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1012/dealing-categorical-values-should-encoded-ordinalencoder</guid>
<pubDate>Sat, 18 Dec 2021 18:46:07 +0000</pubDate>
</item>
<item>
<title>How to use Genetic Algorithm to optimize a function?</title>
<link>https://ask.ghassem.com/1010/how-to-use-genetic-algorithm-to-optimize-a-function</link>
<description>&lt;p&gt;Assume the function is defined as $f(x,y)=x^2+y^2-4xy$, with $1\leq x \leq 4$ and $1\leq y \leq 4$. The Genetic Algorithm is selected to maximize the function. The first population of pairs $(x,y)$ is defined as $S=\{A=(1,2), B=(2,1), C=(2,2), D=(2,3), E=(3,1)\}$.&lt;/p&gt;
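For part (a), the fitness is a direct substitution into $f$; a minimal Python sketch of the evaluation:

```python
def f(x, y):
    # Fitness function f(x, y) = x^2 + y^2 - 4xy
    return x ** 2 + y ** 2 - 4 * x * y

population = {"A": (1, 2), "B": (2, 1), "C": (2, 2), "D": (2, 3), "E": (3, 1)}
fitness = {name: f(x, y) for name, (x, y) in population.items()}
```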

&lt;p&gt;&lt;strong&gt;a) &lt;/strong&gt;Calculate the fitness of each of the individuals (A, B, C, D, E) in the population if: $\text{fitness function}=f(x,y)$&lt;br&gt;
&lt;strong&gt;b)&lt;/strong&gt; Calculate the probability of each individual&amp;nbsp;and sort them in descending order. Which individual has the maximum fitness (probability)? $p_{i}=\frac{f_{i}}{\sum_{j=1}^{N} f_{j}}$&lt;br&gt;
&lt;strong&gt;c) &lt;/strong&gt;Draw&lt;strong&gt;&amp;nbsp;&lt;/strong&gt;the roulette wheel and calculate&amp;nbsp;the boundaries for each individual&lt;br&gt;
&lt;strong&gt;d) &lt;/strong&gt;If we use two individuals and their arithmetic mean for crossover each time,&amp;nbsp;and for mutation, we add 0.1 to x and subtract 0.1 from y for each individual created after crossover, what will be the next population with five members?&lt;br&gt;
For part (d), use the following random numbers in order whenever you need them in the selection process:&lt;br&gt;
$\text{random numbers} =&amp;nbsp; \{0.780,0.220,0.776,0.507,0.822,0.765,0.288,0.881,0.895,0.421\}$&lt;br&gt;
&amp;nbsp;&lt;/p&gt;</description>
<category>Artificial Intelligence</category>
<guid isPermaLink="true">https://ask.ghassem.com/1010/how-to-use-genetic-algorithm-to-optimize-a-function</guid>
<pubDate>Tue, 07 Dec 2021 23:53:51 +0000</pubDate>
</item>
<item>
<title>How to create a Decision Tree using the ID3 algorithm?</title>
<link>https://ask.ghassem.com/1008/how-to-create-a-decision-tree-using-the-id3-algorithm</link>
<description>&lt;p&gt;NASA wants to be able to discriminate between Martians (M) and Humans (H) based on the&lt;br&gt;
following characteristics: Green ∈{N, Y }, Legs ∈{2, 3}, Height ∈{S, T}, Smelly ∈{N, Y }.&lt;br&gt;
Our available training data is as follows:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://i.imgur.com/3bC391L.png&quot;&gt;https://i.imgur.com/3bC391L.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a)&amp;nbsp;&lt;/strong&gt;Greedily learn a decision tree using the ID3 algorithm and draw the tree.&lt;br&gt;
&lt;strong&gt;b)&amp;nbsp;&lt;/strong&gt;Write the learned concept for Martian as a set of conjunctive rules (e.g., if (green=Y&lt;br&gt;
and legs=2 and height=T and smelly=N), then Martian; else if ... then Martian; ...; else&lt;br&gt;
Human).&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1008/how-to-create-a-decision-tree-using-the-id3-algorithm</guid>
<pubDate>Wed, 01 Dec 2021 11:26:02 +0000</pubDate>
</item>
<item>
<title>How do I know which encoder to use to convert from categorical variables to numerical?</title>
<link>https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</link>
<description>So say I have a column with categorical data like different styles of temperature: &amp;#039;Lukewarm&amp;#039;, &amp;#039;Hot&amp;#039;, &amp;#039;Scalding&amp;#039;, &amp;#039;Cold&amp;#039;, &amp;#039;Frostbite&amp;#039;,... etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I know that we can use pd.get_dummies to convert the column to numerical data within the dataframe, but I also know that there are other &amp;#039;converters&amp;#039; (not sure if that&amp;#039;s the correct terminology) that we can use, i.e. OneHotEncoder from Sk-learn (like I could use the pipeline module to make a nice pipeline and feed my dataframe through the pipeline to also get my categorical data encoded to numerical).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
How do I know which to use? Does it matter? If it does matter, when does it matter the most (i.e. what types of problems? When there are lots of categorical variables, or few?) If anyone can give me any pointers on this type of stuff I&amp;#039;d greatly appreciate it.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</guid>
<pubDate>Mon, 29 Nov 2021 04:09:06 +0000</pubDate>
</item>
<item>
<title>ValueError: Length mismatch: Expected axis has 60 elements, new values have 2935849 elements</title>
<link>https://ask.ghassem.com/1005/valueerror-length-mismatch-expected-elements-2935849-elements</link>
<description>&lt;p&gt;I&#039;m creating a new data frame with the most-used items grouped together, but I got the following error when grouping by ID and items: ValueError: Length mismatch: Expected axis has 60 elements, new values have 2935849 elements.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
df = sales_df[sales_df[&#039;shop_id&#039;].duplicated(keep=False)]
df[&#039;Grouped&#039;] = sales_df.groupby(&#039;shop_id&#039;)[&#039;item_name&#039;].transform(lambda x: &#039;,&#039;.join(x))
df2 = df[[&#039;shop_id&#039;, &#039;Grouped&#039;]].drop_duplicates()&lt;/pre&gt;
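A sketch of an alternative that builds the grouped frame directly (with a hypothetical miniature sales_df for illustration), avoiding the assignment of a transformed column into a filtered copy:

```python
import pandas as pd

# Hypothetical miniature sales frame for illustration.
sales_df = pd.DataFrame({
    "shop_id":   [1, 1, 2],
    "item_name": ["tea", "mug", "pen"],
})

# Join each shop's item names into one comma-separated string,
# producing one row per shop_id.
grouped = (
    sales_df.groupby("shop_id")["item_name"]
    .agg(",".join)
    .rename("Grouped")
    .reset_index()
)
```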

&lt;p&gt;In the aforementioned code, I&#039;m making a data frame with respect to shop id and then grouping the shop items. My objective here is to group items with the same ID.&lt;/p&gt;</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1005/valueerror-length-mismatch-expected-elements-2935849-elements</guid>
<pubDate>Fri, 26 Nov 2021 06:09:16 +0000</pubDate>
</item>
<item>
<title>Text Mining, Artificial Neural Networks, Speech Processing, Cloud Computing in DS? Essential for a good Data Scientist ?</title>
<link>https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</link>
<description></description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1004/artificial-networks-processing-computing-essential-scientist</guid>
<pubDate>Wed, 27 Oct 2021 19:15:16 +0000</pubDate>
</item>
<item>
<title>Classification of data object might be incorrect</title>
<link>https://ask.ghassem.com/1003/classification-of-data-object-might-be-incorrect</link>
<description>&lt;p&gt;I am learning a new Salesforce product (Evergage) for the company I work for. In the program&#039;s documentation they have listed a set of data objects as an example. It appears to me that the classification might be incorrect. Their system makes a division between &#039;catalog objects&#039; and &#039;profile objects&#039; and the example they have given is a banking institution. They classified &lt;em&gt;Customer Credit Card &lt;/em&gt;as a &lt;em&gt;profile objec&lt;/em&gt;t and &lt;em&gt;Credit Card Level &lt;/em&gt;as a &lt;em&gt;catalog object. &lt;/em&gt;Seems to me that it should be the other way i.e &lt;em&gt;Customer Credit Card = catalog &lt;/em&gt;&lt;em&gt;object &lt;/em&gt;and &lt;em&gt;Credit Card Level &lt;/em&gt;=&amp;nbsp;&lt;em&gt;profile objec&lt;/em&gt;t. Maybe I am not reading the context correctly?&lt;/p&gt;

&lt;p&gt;Here is a link to an image with the complete classification: &lt;a rel=&quot;nofollow&quot; href=&quot;https://drive.google.com/file/d/1nG4aX4Ty_NoHxm04AQo1Ow61m3MZ3pXm/view?usp=sharing&quot;&gt;https://drive.google.com/file/d/1nG4aX4Ty_NoHxm04AQo1Ow61m3MZ3pXm/view?usp=sharing&lt;/a&gt;&lt;/p&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1003/classification-of-data-object-might-be-incorrect</guid>
<pubDate>Mon, 25 Oct 2021 15:26:46 +0000</pubDate>
</item>
<item>
<title>Can Data Science solve this problem?</title>
<link>https://ask.ghassem.com/1002/can-data-science-solve-this-problem</link>
<description>So, I live in Brazil, and I have a task for college that I don&amp;#039;t know what data science method to use, if any, to solve. My idea is the following: we Brazilians have the Real (BRL) as our currency, and we of course follow the dollar exchange rate to see &amp;quot;how many Reais a dollar is worth&amp;quot;. What I wanted to do was research whether the country&amp;#039;s news has any influence over this price. So, for example, if Bolsonaro, our president, says some dumb stuff, the dollar goes up in price, and vice versa. I wanted to collect all the dollar values and their variance over a set time interval, and use web scraping to get the news from some economy sites. Here&amp;#039;s my question then: how can I correlate the news with the dollar&amp;#039;s variance over a set time? Can data science do that? How do I preprocess this, if at all? Do I need to use bag-of-words? At least I heard so... Please help, and thank you for reading.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/1002/can-data-science-solve-this-problem</guid>
<pubDate>Sun, 24 Oct 2021 15:43:11 +0000</pubDate>
</item>
<item>
<title>Which algorithm is best to detect anomalies within a data set of 5k+ user-login events?</title>
<link>https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</link>
<description>I am trying to build an unsupervised ML model to detect anomalies within 5000+ users&amp;#039; login data. I selected 5 features contained in each user-login event (e.g. IP, hour of day, day of week, device_id, OS). I am looking for the best algorithm to use. I am considering using a density function to determine the probabilities of the feature values and whether an event is an outlier. The problem is that feature values are only relevant to the specific user. For example, you cannot compare login IPs across users; a login IP is only meaningful for that user.&lt;br /&gt;
Ultimately, I want to detect events that represent changes in a user&amp;#039;s login behavior, like a different IP, day, hour, device_id, or OS, where the more features that have changed, the higher the probability of an outlier.&lt;br /&gt;
At this point, I am not sure how to build a model with data that contains multiple users, because I don&amp;#039;t know how to separate the user data so that the model is trained per user and finds anomalies within the individual user&amp;#039;s features.&lt;br /&gt;
&lt;br /&gt;
I also don&amp;#039;t have any labeled data to use for testing; should I fabricate some?&lt;br /&gt;
&lt;br /&gt;
Any advice is greatly appreciated.&lt;br /&gt;
&lt;br /&gt;
Thank you!</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/1000/which-algorithm-best-detect-anomalies-within-login-events</guid>
<pubDate>Tue, 05 Oct 2021 17:45:38 +0000</pubDate>
</item>
<item>
<title>How do we incorporate polylines in machine learning tools?</title>
<link>https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</link>
<description>Suppose I have to predict the traffic on a road segment based on available data such as the number of houses and businesses along the segment. Which machine learning tool can incorporate the road segments (polylines) through coordinates in the attributes?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/999/how-we-incorporate-the-polyline-in-machine-learnning-tools</guid>
<pubDate>Wed, 29 Sep 2021 06:16:30 +0000</pubDate>
</item>
<item>
<title>Should I start as a data analyst and then move into data science?</title>
<link>https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</link>
<description>Should I start as a data analyst and then move into data science?&lt;br /&gt;
&lt;br /&gt;
I am a second-year Bachelor&amp;#039;s student in Computer Science and want to become a Data Scientist.&lt;br /&gt;
&lt;br /&gt;
However, when I try to apply for internships/jobs, most of them require a Master&amp;#039;s/Ph.D.&lt;br /&gt;
&lt;br /&gt;
But a Data Analyst role has fewer requirements.&lt;br /&gt;
&lt;br /&gt;
Do you recommend starting off as a Data Analyst and then changing to Data Science?</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/994/should-i-start-as-a-data-analyst-then-data-science</guid>
<pubDate>Mon, 21 Jun 2021 20:31:04 +0000</pubDate>
</item>
<item>
<title>How many samples do we need to test image segmentation using synthetic data?</title>
<link>https://ask.ghassem.com/993/many-samples-need-test-image-segmentation-using-synthetic</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
I trained a CNN on synthetic data to perform a segmentation task on human faces. To evaluate the network&amp;#039;s predictions at test time, I used 200 examples from the database to compute precision and recall.&lt;br /&gt;
&lt;br /&gt;
Is this number sufficient, given that I control the data generator myself and build the database by randomly drawing elements from centered Gaussian distributions?&lt;br /&gt;
&lt;br /&gt;
Thank you,</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/993/many-samples-need-test-image-segmentation-using-synthetic</guid>
<pubDate>Mon, 21 Jun 2021 12:26:32 +0000</pubDate>
</item>
<item>
<title>How best to ensure data quality?</title>
<link>https://ask.ghassem.com/990/how-best-to-ensure-data-quality</link>
<description></description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/990/how-best-to-ensure-data-quality</guid>
<pubDate>Tue, 08 Jun 2021 22:02:23 +0000</pubDate>
</item>
<item>
<title>Can we have multiple target values in a ML problem dataset for supervised learning?</title>
<link>https://ask.ghassem.com/989/multiple-target-values-problem-dataset-supervised-learning</link>
<description></description>
<category>Machine Learning Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/989/multiple-target-values-problem-dataset-supervised-learning</guid>
<pubDate>Sun, 30 May 2021 16:31:48 +0000</pubDate>
</item>
<item>
<title>Searching for movie dataset containing movie synopses/plots?</title>
<link>https://ask.ghassem.com/988/searching-for-movie-dataset-containing-movie-synopses-plots</link>
<description>Hello,&lt;br /&gt;
To build a hybrid recommendation system, I used the MovieLens 1M dataset for the collaborative-filtering part. Now I&amp;#039;m looking for a database/dataset that contains descriptions/summaries/details/synopses/plots of movies for the content-based recommendation.&lt;br /&gt;
Could someone tell me where I can find such a dataset?&lt;br /&gt;
Thank you in advance.</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/988/searching-for-movie-dataset-containing-movie-synopses-plots</guid>
<pubDate>Thu, 27 May 2021 09:57:31 +0000</pubDate>
</item>
<item>
<title>Discrete Mathematics (Algorithm)</title>
<link>https://ask.ghassem.com/986/intermittent-mathematics-logarim</link>
<description>&lt;p&gt;&lt;strong&gt;The old telephone keypad has 10 numbers (10 keys). It allows the user to enter text by pressing a certain key several times in quick succession. You need to draw a graph of entering a text input using this keypad, and then devise an algorithm that finds the length of the path needed to enter a given text.&lt;br&gt;
Example:&lt;br&gt;
aaa&amp;nbsp; &amp;nbsp;--&amp;gt; 6&lt;br&gt;
aba&amp;nbsp; &amp;nbsp;--&amp;gt; 5&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The link below shows the phone keypad:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://commons.wikimedia.org/wiki/File:Telephone-keypad.png&quot;&gt;https://commons.wikimedia.org/wiki/File:Telephone-keypad.png&lt;/a&gt;&lt;br&gt;
&amp;nbsp;&lt;/p&gt;</description>
<category>Web Development</category>
<guid isPermaLink="true">https://ask.ghassem.com/986/intermittent-mathematics-logarim</guid>
<pubDate>Wed, 05 May 2021 12:16:20 +0000</pubDate>
</item>
<item>
<title>Discrete mathematics: entering text with the old telephone keypad</title>
<link>https://ask.ghassem.com/984/the-old-keypad-of-the-telephone-has-sume-yypad-after-that-yout</link>
<description>This is a discrete mathematics question.&lt;br /&gt;
&lt;br /&gt;
The old telephone keypad has 10 numbers (10 keys). It allows the user to enter text by pressing a certain key several times in quick succession. You need to draw a graph of entering a text input using this keypad, and then devise an algorithm that finds the length of the path needed to enter a given text.</description>
<category>Web Development</category>
<guid isPermaLink="true">https://ask.ghassem.com/984/the-old-keypad-of-the-telephone-has-sume-yypad-after-that-yout</guid>
<pubDate>Tue, 04 May 2021 14:39:49 +0000</pubDate>
</item>
<item>
<title>How to calculate average with deviating sensors?</title>
<link>https://ask.ghassem.com/983/how-to-calculate-average-with-deviating-sensors</link>
<description>I have 3 sensors, each reporting lots of values individually, and one sensor might be off. The average of the 2 trustworthy sensors should be reported; the third, which needs recalibration, should be ignored. I need an (Excel) formula that looks at three columns and, row by row, detects a significant deviation from the others and calculates the average of the most trustworthy values.&lt;br /&gt;
Example:&lt;br /&gt;
48.1 ; 45.2 ; 45.4 =&amp;gt; 45.3, as sensor 1 is way off&lt;br /&gt;
36.0 ; 37.0 ; 45.0 =&amp;gt; 36.5, as sensor 3 is way off&lt;br /&gt;
36.0 ; 36.5 ; 37.0 =&amp;gt; 36.5, as the deviation is too small to be considered an anomaly, so all values are valid for the average.&lt;br /&gt;
&lt;br /&gt;
Working over long periods of time, the readings might be trustworthy for a few weeks but defective from some moment X up until now, so simply ruling out one sensor is not really an option either. What is the best way forward?&lt;br /&gt;
Please help. Highly appreciated.</description>
<category>Data Science</category>
<guid isPermaLink="true">https://ask.ghassem.com/983/how-to-calculate-average-with-deviating-sensors</guid>
<pubDate>Tue, 04 May 2021 14:39:14 +0000</pubDate>
</item>
<item>
<title>design a computer-based system that will encourage autistic children to communicate and express themselves better.</title>
<link>https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</link>
<description>a) A company has been asked to design a computer-based system that will encourage autistic children to communicate and express themselves better.&lt;br /&gt;
&lt;br /&gt;
b) What type of interaction would be appropriate to use at the interface for this particular user group?</description>
<category>Human Computer Interaction</category>
<guid isPermaLink="true">https://ask.ghassem.com/982/computer-encourage-autistic-children-communicate-themselves</guid>
<pubDate>Thu, 01 Apr 2021 07:04:59 +0000</pubDate>
</item>
<item>
<title>Very short text classification when category text should be replaced by another category text?</title>
<link>https://ask.ghassem.com/980/classification-category-should-replaced-another-category</link>
<description>&lt;div style=&quot;max-width:800px&quot;&gt;
&lt;div style=&quot;color:#1A1A1B&quot;&gt;
&lt;p&gt;I need some tool to classify articles based on short category text which consists of two or three words separated by &#039;-&#039;. The RSS/XML tag content is for example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Foreign - News&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;Football - Foreign&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I created my own categories in DB and now I need to classify categories from parsed RSS of this news source, so it fits news categories defined by me.&lt;/p&gt;

&lt;p&gt;I would, for example, need all articles whose category contains &quot;football&quot; to be identified as the category &lt;em&gt;Sport&lt;/em&gt;, but sometimes those category XML tags contain an exact match; for instance, &lt;em&gt;Foreign - News&lt;/em&gt; should belong in the DB to the category I defined as &lt;em&gt;Foreign&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Since I have so far only used trained decision-tree frameworks from AI, and only for another project, I would like advice on a suitable AI-based approach, technique, or particular framework I can use to solve this problem. I don&#039;t want to end up in a dead-end street through my own decision, inexperienced as I am in the field of AI.&lt;/p&gt;

&lt;p&gt;While it can be solved with many ifs and a &#039;contains&#039; function, that does not seem like a very good solution to me.&lt;/p&gt;

&lt;p&gt;TL;DR: I basically need something like a &quot;clever, flexible and universal if-elseif&quot;.&lt;/p&gt;

&lt;p&gt;NOTE: I can also use the article description text if necessary, but it seems to me that the category text alone is unambiguous enough for this kind of problem.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
<category>Artificial Intelligence</category>
<guid isPermaLink="true">https://ask.ghassem.com/980/classification-category-should-replaced-another-category</guid>
<pubDate>Thu, 11 Feb 2021 12:48:47 +0000</pubDate>
</item>
<item>
<title>Terminology clarification in Spark</title>
<link>https://ask.ghassem.com/979/terminology-clarification-in-spark</link>
<description>&lt;p&gt;I have a hard time distinguishing the terminology around Spark SQL. While Spark SQL is quite flexible in terms of abstraction layers, it is really difficult for a beginner to navigate those options.&lt;/p&gt;

&lt;p&gt;1. When we say &quot;using Spark SQL to perform ...&quot;, does it mean that we can use any API/abstraction layer such as Scala, Python, or HiveQL to query? As long as the core dataframe is in Spark, we should be fine?&lt;/p&gt;

&lt;p&gt;2. Can we manipulate data in both PySpark and Scala sequentially?&lt;/p&gt;

&lt;p&gt;For example, may I clean up the data in Scala, then perform follow-up manipulation in PySpark, then go back to Scala?&lt;/p&gt;

&lt;p&gt;3. As demonstrated in the tutorial, we can query with a SQL command by using the API spark.sql(&quot;My SQL command&quot;). Does that count as SQL or as Spark?&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;Image result for spark sql&quot; src=&quot;http://cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/f4a5b21d-66fa-4885-92bf-c4e81c06d916/Image/753ace3c801b53535077d9474ecc5f1e/odi_spark_sql_databricks.jpg&quot;&gt;&lt;/p&gt;</description>
<category>Big Data Tools</category>
<guid isPermaLink="true">https://ask.ghassem.com/979/terminology-clarification-in-spark</guid>
<pubDate>Sat, 06 Feb 2021 18:03:32 +0000</pubDate>
</item>
<item>
<title>Binary Classification and neutral tag</title>
<link>https://ask.ghassem.com/978/binary-classification-and-neutral-tag</link>
<description>&lt;p&gt;I am trying to create a sentiment analysis model using binary classification as the loss. I have a batch of tweets, some tagged as positive (labeled 1) and some as negative (labeled 0). I managed to gather some tweets tagged as neutral, but there are fewer of them than positive and negative ones. My thinking is to label them 0.5 to balance the classification probability. Is this legitimate?&lt;/p&gt;</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/978/binary-classification-and-neutral-tag</guid>
<pubDate>Sat, 30 Jan 2021 10:08:01 +0000</pubDate>
</item>
<item>
<title>&quot;Rare words&quot; on vocabulary</title>
<link>https://ask.ghassem.com/977/rare-words-on-vocabulary</link>
<description>I am trying to create a sentiment analysis model and I have a question.&lt;br /&gt;
&lt;br /&gt;
After I preprocessed my tweets and created my vocabulary, I noticed that some words appear fewer than 5 times in my dataset (and many appear only once). Many of them are real words, not gibberish. My thinking is that if I keep those words, they will get wrong &amp;quot;sentimental&amp;quot; weights and make my model worse.&lt;br /&gt;
Is my thinking right, or am I missing something?&lt;br /&gt;
&lt;br /&gt;
My vocabulary size is around 40,000 words, and the &amp;quot;rare&amp;quot; ones number around 10k. Should I &amp;quot;sacrifice&amp;quot; them?&lt;br /&gt;
&lt;br /&gt;
Thanks in advance.</description>
<category>Deep Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/977/rare-words-on-vocabulary</guid>
<pubDate>Sat, 30 Jan 2021 09:57:31 +0000</pubDate>
</item>
<item>
<title>My GloVe word embeddings contain sentiment?</title>
<link>https://ask.ghassem.com/972/my-glove-word-embeddings-contain-sentiment</link>
<description>&lt;p&gt;I&#039;ve been researching sentiment analysis with word embeddings. I read papers that state that word embeddings ignore sentiment information of the words in the text. One paper states that among the top 10 words that are semantically similar, around 30 percent of words have opposite polarity e.g. happy - sad.&lt;/p&gt;

&lt;p&gt;So, I computed word embeddings on my dataset (Amazon reviews) with the GloVe algorithm in R. Then, I looked at the most similar words with cosine similarity and I found that actually every word is sentimentally similar. (E.g. beautiful - lovely - gorgeous - pretty - nice - love). Therefore, I was wondering how this is possible since I expected the opposite from reading several papers. What could be the reason for my findings?&lt;/p&gt;

&lt;p&gt;Two of the many papers I read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Yu, L. C., Wang, J., Lai, K. R. &amp;amp; Zhang, X. (2017). Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3), 671-681.&lt;/li&gt;
&lt;li&gt;Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T. &amp;amp; Qin, B. (2014). Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1: Long Papers, 1555-1565.&lt;/li&gt;
&lt;/ul&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/972/my-glove-word-embeddings-contain-sentiment</guid>
<pubDate>Sun, 03 Jan 2021 14:09:37 +0000</pubDate>
</item>
<item>
<title>Do  I need to save the standardization transformation?</title>
<link>https://ask.ghassem.com/970/do-i-need-to-save-the-standardization-transformation</link>
<description>I standardized my data when I created my model. Do I need to save the standardization transformation when I want to use my model to predict on new data?</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/970/do-i-need-to-save-the-standardization-transformation</guid>
<pubDate>Tue, 15 Dec 2020 13:06:48 +0000</pubDate>
</item>
<item>
<title>Why should I use Dynamic Time Warping over GMM for time series clustering?</title>
<link>https://ask.ghassem.com/962/why-should-dynamic-time-warping-over-timer-series-clustering</link>
<description></description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/962/why-should-dynamic-time-warping-over-timer-series-clustering</guid>
<pubDate>Fri, 04 Dec 2020 03:19:16 +0000</pubDate>
</item>
<item>
<title>is it possible to derive a new 95% CI from two separate 95% CIs?</title>
<link>https://ask.ghassem.com/961/is-it-possible-to-derive-a-new-95-ci-from-two-separate-95-cis</link>
<description></description>
<category>Statistics</category>
<guid isPermaLink="true">https://ask.ghassem.com/961/is-it-possible-to-derive-a-new-95-ci-from-two-separate-95-cis</guid>
<pubDate>Mon, 23 Nov 2020 14:45:19 +0000</pubDate>
</item>
<item>
<title>How to predict from unseen data?</title>
<link>https://ask.ghassem.com/954/how-to-predict-from-unseen-data</link>
<description>&lt;p&gt;Hi. I have a question about model-based predictions when the data is only available after the fact. Let me give you an example: I try to predict the result of a match (HOME, AWAY, or DRAW) based on data like the number of shots, ball possession, number of fouls, etc.&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot;&gt;TARGET&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;TEAM 1&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;TEAM 2&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;
&lt;p&gt;possession&lt;/p&gt;

&lt;p&gt;team 1&lt;/p&gt;
&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;
&lt;p&gt;possession&lt;/p&gt;

&lt;p&gt;team 2&lt;/p&gt;
&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;
&lt;p&gt;shots&lt;/p&gt;

&lt;p&gt;team 1&lt;/p&gt;
&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;
&lt;p&gt;shots&lt;/p&gt;

&lt;p&gt;team 2&lt;/p&gt;
&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;
&lt;p&gt;fouls&lt;/p&gt;

&lt;p&gt;team 1&lt;/p&gt;
&lt;/th&gt;
&lt;th scope=&quot;col&quot;&gt;
&lt;p&gt;fouls&lt;/p&gt;

&lt;p&gt;team 2&lt;/p&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HOME&lt;/td&gt;
&lt;td&gt;Arsenal&lt;/td&gt;
&lt;td&gt;Chelsea&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Let&#039;s say I have already trained the model and I want to see if I can predict an upcoming match. However, this match is a few days away, and I want the model&#039;s result today. I understand that if the match had already taken place and I had the data, I could run it through the model and get a result. The goal is for the model to predict what will happen before the match.&lt;/p&gt;

&lt;p&gt;Is this possible at all? What are my options? Should I only select pre-match variables, for example recent form, the match referee, etc.? Or should I aggregate the variables and include average possession, average shots, and average fouls from recent matches?&lt;/p&gt;</description>
<category>Machine Learning</category>
<guid isPermaLink="true">https://ask.ghassem.com/954/how-to-predict-from-unseen-data</guid>
<pubDate>Tue, 17 Nov 2020 16:18:28 +0000</pubDate>
</item>
<item>
<title>Probability of a bus arrived in its destination based on weather condition</title>
<link>https://ask.ghassem.com/953/probability-arrived-destination-based-weather-condition</link>
<description>A bus is making its way to a destination. If the weather conditions are favorable today, the likelihood of delay is 3%. If the weather conditions are not favorable today, the likelihood of delay is 50%. The forecast predicts that it is 20% likely that the weather conditions will be favorable today.&lt;br /&gt;
&lt;br /&gt;
1. What is the likelihood that the bus will be delayed?&lt;br /&gt;
&lt;br /&gt;
2. The bus has arrived, but it was delayed. Given that the bus was delayed, what is the likelihood that the weather conditions were favorable?</description>
<category>Discrete Mathematics</category>
<guid isPermaLink="true">https://ask.ghassem.com/953/probability-arrived-destination-based-weather-condition</guid>
<pubDate>Mon, 09 Nov 2020 13:06:47 +0000</pubDate>
</item>
<item>
<title>How to remove unwanted Jupyter notebook kernels?</title>
<link>https://ask.ghassem.com/947/how-to-remove-unwanted-jupyter-notebook-kernels</link>
<description>Whenever I run Jupyter notebook, there are some kernels that no longer exist on the system and generate errors. How can I remove them?</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/947/how-to-remove-unwanted-jupyter-notebook-kernels</guid>
<pubDate>Fri, 30 Oct 2020 17:15:17 +0000</pubDate>
</item>
</channel>
</rss>