<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent activity in Data Science Interview Questions</title>
<link>https://ask.ghassem.com/activity/data-science-interview-questions</link>
<description>Powered by Question2Answer</description>
<item>
<title>Do you usually collect your own data, or is there always a resource available for you? Or does it depend on the company?</title>
<link>https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/1014/usually-collect-always-resource-available-depends-company</guid>
<pubDate>Sun, 09 Jan 2022 22:13:34 +0000</pubDate>
</item>
<item>
<title>Data manipulation problem study resources</title>
<link>https://ask.ghassem.com/715/data-manipulation-problem-study-resources</link>
<description>&lt;p&gt;A colleague of mine is&amp;nbsp;studying for tech roles, and they&#039;re asked to solve a&amp;nbsp;consistent&amp;nbsp;type of problem&amp;nbsp;during phone screenings: manipulating data (sets, hash tables/dictionaries, arrays/lists, strings). These questions aren’t necessarily difficult and tend to require very little logic; they are more about having a good understanding of the data types listed above. I&#039;ve provided some examples at this link:&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://imgur.com/a/ITVeVnr&quot;&gt;https://imgur.com/a/ITVeVnr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I&#039;m wondering if there are resources for studying these questions. They aren&#039;t really Leetcode questions or the kind of thing found on Reddit daily programmer, which is where I&#039;m most often&amp;nbsp;directed when I ask around.&amp;nbsp;Even a textbook would be incredibly handy. And to be clear, I&#039;m not looking for a hack or golden secret, just resources for studying. Thank you for any help!&lt;/p&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/715/data-manipulation-problem-study-resources</guid>
<pubDate>Wed, 28 Aug 2019 12:53:53 +0000</pubDate>
</item>
<item>
<title>Answered: Do you have a cheatsheet for Data Science?!</title>
<link>https://ask.ghassem.com/573/do-you-have-a-cheatsheet-for-data-science?show=574#a574</link>
<description>&lt;p&gt;A great cheatsheet&amp;nbsp;is available at &lt;a rel=&quot;nofollow&quot; href=&quot;https://github.com/ml874/Data-Science-Cheatsheet&quot;&gt;this link&lt;/a&gt;, and can be downloaded directly from &lt;a rel=&quot;nofollow&quot; href=&quot;https://github.com/ml874/Data-Science-Cheatsheet/raw/master/data-science-cheatsheet.pdf&quot;&gt;here&lt;/a&gt;. &quot;The cheatsheet is loosely based off of&amp;nbsp;&lt;em&gt;The Data Science Design Manual&lt;/em&gt;&amp;nbsp;by Steven S. Skiena and&amp;nbsp;&lt;em&gt;An Introduction to Statistical Learning&lt;/em&gt;&amp;nbsp;by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.&quot;&lt;/p&gt;

&lt;p&gt;Some screenshots:&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://raw.githubusercontent.com/ml874/Data-Science-Cheatsheet/master/Screenshots/screenshot1.png&quot;&gt;https://raw.githubusercontent.com/ml874/Data-Science-Cheatsheet/master/Screenshots/screenshot1.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://raw.githubusercontent.com/ml874/Data-Science-Cheatsheet/master/Screenshots/screenshot2.png&quot;&gt;https://raw.githubusercontent.com/ml874/Data-Science-Cheatsheet/master/Screenshots/screenshot2.png&lt;/a&gt;&lt;/p&gt;


</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/573/do-you-have-a-cheatsheet-for-data-science?show=574#a574</guid>
<pubDate>Wed, 27 Feb 2019 05:59:34 +0000</pubDate>
</item>
<item>
<title>Answered: What is summary statistics?</title>
<link>https://ask.ghassem.com/500/what-is-summary-statistics?show=501#a501</link>
<description>Summary statistics are the information that gives a quick and simple description of the data. They include the mean, median, mode, minimum value, maximum value, range, standard deviation, etc.</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/500/what-is-summary-statistics?show=501#a501</guid>
<pubDate>Thu, 01 Nov 2018 19:45:32 +0000</pubDate>
</item>
<item>
<title>Answered: What is the purpose of randomization in statistics?</title>
<link>https://ask.ghassem.com/486/what-is-the-purpose-of-randomization-in-statistics?show=499#a499</link>
<description>The main purpose of randomization in an experiment is to control for lurking variables.&lt;br /&gt;
&lt;br /&gt;
Randomization is the most reliable method of creating comparable treatment groups without introducing bias or relying on the experimenter&#039;s judgment.</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/486/what-is-the-purpose-of-randomization-in-statistics?show=499#a499</guid>
<pubDate>Thu, 01 Nov 2018 19:26:53 +0000</pubDate>
</item>
<item>
<title>Answered: What is the difference between univariate and multivariate analysis?</title>
<link>https://ask.ghassem.com/491/what-difference-between-univariate-multivariate-analysis?show=495#a495</link>
<description>&lt;p&gt;&lt;strong&gt;Univariate analysis&lt;/strong&gt; is the simplest form of analyzing data. “Uni” means “one”, so in other words,&amp;nbsp;your data has only one variable. It doesn&#039;t&amp;nbsp;deal with causes or relationships (unlike regression) and its major purpose is to describe.&lt;/p&gt;

&lt;p&gt;For example, the distribution of the&amp;nbsp;educational background of students involves only one variable, so its analysis can be referred to as univariate analysis.&lt;/p&gt;

&lt;p&gt;To know more:&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.statisticshowto.datasciencecentral.com/univariate/&quot;&gt;https://www.statisticshowto.datasciencecentral.com/univariate/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multivariate analysis&lt;/strong&gt;&amp;nbsp;(&lt;strong&gt;MVA&lt;/strong&gt;) involves observation and analysis of more than one statistical outcome variable at a time. The technique is used across multiple dimensions while taking into account the effects of all variables on the responses of interest, and the techniques are especially valuable when working with correlated variables. One example mentioned in class is Factor Analysis.&lt;/p&gt;

&lt;p&gt;Specifically, analyzing the relationship between two variables at a time is called&amp;nbsp;bivariate analysis.&lt;/p&gt;

&lt;p&gt;To know more:&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/multivariate-analysis/&quot;&gt;https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/multivariate-analysis/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/multivariate-analysis&quot;&gt;https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/multivariate-analysis&lt;/a&gt;&lt;/p&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/491/what-difference-between-univariate-multivariate-analysis?show=495#a495</guid>
<pubDate>Tue, 30 Oct 2018 11:54:20 +0000</pubDate>
</item>
<item>
<title>Answered: How will you create a classification to identify key customer trends in unstructured data?</title>
<link>https://ask.ghassem.com/461/create-classification-identify-customer-trends-unstructured?show=465#a465</link>
<description>A model does not hold any value if it cannot produce actionable results. An experienced data analyst will vary the strategy based on the type of data being analysed. For example, if a customer complaint was retweeted, should that data be included or not? Also, any sensitive customer data needs to be protected, so it is advisable to consult with the stakeholder to ensure that you are following all the compliance regulations of the organization and disclosure laws, if any.&lt;br /&gt;
&lt;br /&gt;
You can answer this question by stating that you would first consult with the stakeholder of the business to understand the objective of classifying this data. Then, you would use an iterative process by pulling new data samples and modifying the model accordingly and evaluating it for accuracy. You can mention that you would follow a basic process of mapping the data, creating an algorithm, mining the data, visualizing it and so on. However, you would accomplish this in multiple segments by considering the feedback from stakeholders to ensure that you develop an enriching model that can produce actionable results.</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/461/create-classification-identify-customer-trends-unstructured?show=465#a465</guid>
<pubDate>Sun, 28 Oct 2018 11:50:59 +0000</pubDate>
</item>
<item>
<title>Answered: Mention some common problems that data analysts encounter during analysis.</title>
<link>https://ask.ghassem.com/460/mention-common-problems-analysts-encounter-during-analysis?show=464#a464</link>
<description>&lt;ul&gt;
	&lt;li&gt;Having a poorly formatted data file, for instance CSV data with un-escaped newlines and commas in columns.&lt;/li&gt;
	&lt;li&gt;Having inconsistent and incomplete data.&lt;/li&gt;
	&lt;li&gt;Misspellings and duplicate entries, which are common data quality problems that most data analysts face.&lt;/li&gt;
	&lt;li&gt;Having different value representations and misclassified data.&lt;/li&gt;
&lt;/ul&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/460/mention-common-problems-analysts-encounter-during-analysis?show=464#a464</guid>
<pubDate>Sun, 28 Oct 2018 11:50:17 +0000</pubDate>
</item>
<item>
<title>Answered: Explain the typical data analysis process.</title>
<link>https://ask.ghassem.com/459/explain-the-typical-data-analysis-process?show=463#a463</link>
<description>&lt;p&gt;Data analysis deals with collecting, inspecting, cleaning, transforming and modeling data to glean valuable insights and support better decision making in an organization. The various steps involved in the data analysis process include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Exploration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Having identified the business problem, a data analyst has to go through the data provided by the client to analyse the root cause of the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Preparation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the most crucial step of the data analysis process, wherein any data anomalies (such as missing values or outliers) have to be detected and handled appropriately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Modelling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The modelling step begins once the data has been prepared. Modelling is an iterative process wherein the model is run repeatedly for improvements. Data modelling ensures that the best possible result is found for a given business problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this step, the model provided by the client and the model developed by the data analyst are validated against each other to find out if the developed model will meet the business requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation of the Model and Tracking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the final step of the data analysis process wherein the model is implemented in production and is tested for accuracy and efficiency.&lt;/p&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/459/explain-the-typical-data-analysis-process?show=463#a463</guid>
<pubDate>Sun, 28 Oct 2018 11:49:51 +0000</pubDate>
</item>
<item>
<title>Answered: What is the difference between Data Mining and Data Analysis?</title>
<link>https://ask.ghassem.com/458/what-is-the-difference-between-data-mining-and-data-analysis?show=462#a462</link>
<description>&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;border-spacing: 1px;&quot;&gt;
	&lt;caption&gt;Data Mining vs Data Analysis&lt;/caption&gt;
	&lt;tbody&gt;
		&lt;tr&gt;
			&lt;td&gt;
			&lt;h3&gt;&lt;strong&gt;Data Mining&lt;/strong&gt;&lt;/h3&gt;
			&lt;/td&gt;
			&lt;td&gt;
			&lt;h3&gt;&lt;strong&gt;Data Analysis&lt;/strong&gt;&lt;/h3&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;Data mining usually does not require any hypothesis.&lt;/td&gt;
			&lt;td&gt;Data analysis begins with a question or an assumption.&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;Data Mining depends on clean and well-documented data.&lt;/td&gt;
			&lt;td&gt;Data analysis involves data cleaning.&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;Results of data mining are not always easy to interpret.&lt;/td&gt;
			&lt;td&gt;Data analysts interpret the results and convey them to the stakeholders.&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;Data mining algorithms automatically develop equations.&lt;/td&gt;
			&lt;td&gt;Data analysts have to develop their own equations based on the hypothesis.&lt;/td&gt;
		&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/458/what-is-the-difference-between-data-mining-and-data-analysis?show=462#a462</guid>
<pubDate>Sun, 28 Oct 2018 11:49:13 +0000</pubDate>
</item>
<item>
<title>Answered: What is TF-IDF algorithm?</title>
<link>https://ask.ghassem.com/296/what-is-tf-idf-algorithm?show=457#a457</link>
<description>&lt;p&gt;Tf-idf stands for&amp;nbsp;term frequency-inverse document frequency, and the tf-idf weight is a weight often used in information retrieval and text mining.&lt;/p&gt;

&lt;p&gt;This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus.&lt;/p&gt;

&lt;p&gt;Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document&#039;s relevance given a user query.&lt;/p&gt;

&lt;p&gt;One of the simplest ranking functions is computed by summing the tf-idf for each query term; many more sophisticated ranking functions are variants of this simple model.&lt;/p&gt;

&lt;p&gt;Tf-idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to Compute:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Typically, the tf-idf weight is composed of two terms:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;the first computes the normalized Term Frequency (TF), i.e., the number of times a word appears in a document divided by the total number of words in that document;&lt;/li&gt;
	&lt;li&gt;the second term is the Inverse Document Frequency (IDF), computed as the logarithm of the number of the documents in the corpus divided by the number of documents where the specific term appears.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;TF: Term Frequency, which measures how frequently a term occurs in a document.&lt;/strong&gt; Since documents differ in length, a term may appear many more times in long documents than in shorter ones. Thus, the term frequency is often divided by the document length (i.e., the total number of terms in the document) as a way of normalization:&amp;nbsp;&lt;/p&gt;

&lt;p&gt;TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IDF: Inverse Document Frequency, which measures how important a term is.&lt;/strong&gt; While computing TF, all terms are considered equally important. However, it is known that certain terms, such as &quot;is&quot;, &quot;of&quot;, and &quot;that&quot;, may appear a lot of times but have little importance. Thus we need to weigh down the frequent terms while scaling up the rare ones, by computing the following:&amp;nbsp;&lt;/p&gt;

&lt;p&gt;IDF(t) = log_e(Total number of documents / Number of documents with term t in it).&lt;/p&gt;
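
&lt;p&gt;The two formulas above can be sketched in a few lines of Python. This is only a minimal illustration, not a production implementation: the function names, the whitespace tokenization and the tiny four-document corpus are assumptions made for this example, and the sketch assumes the term occurs in at least one document.&lt;/p&gt;

```python
import math


def term_frequency(term, document_tokens):
    # TF(t) = (times t appears in the document) / (total terms in the document)
    return document_tokens.count(term) / len(document_tokens)


def inverse_document_frequency(term, corpus, log=math.log):
    # IDF(t) = log(total documents / documents containing t)
    # Assumption: the term occurs in at least one document,
    # otherwise this would divide by zero.
    containing = sum(1 for tokens in corpus if term in tokens)
    return log(len(corpus) / containing)


def tf_idf(term, document_tokens, corpus, log=math.log):
    # The tf-idf weight is the product of the two quantities.
    tf = term_frequency(term, document_tokens)
    idf = inverse_document_frequency(term, corpus, log)
    return tf * idf


# Hypothetical toy corpus, tokenized by splitting on whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "birds sing in the morning".split(),
    "fish swim in the sea".split(),
]
doc = corpus[0]

cat_weight = tf_idf("cat", doc, corpus)  # (1 / 6) * ln(4 / 2)
the_weight = tf_idf("the", doc, corpus)  # "the" is in every document, so idf = ln(1) = 0
print(cat_weight, the_weight)
```

&lt;p&gt;In this toy corpus the word &quot;the&quot; receives a weight of zero because it appears in every document, while &quot;cat&quot; keeps a positive weight. Passing math.log10 as the log argument switches to a base-10 logarithm.&lt;/p&gt;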

&lt;p&gt;See below for a simple example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider a document containing 100 words wherein the word cat appears 3 times. The term frequency (i.e., tf) for cat is then (3 / 100) = 0.03. Now, assume we have 10 million documents and the word cat appears in one thousand of these. Then, using a base-10 logarithm, the inverse document frequency (i.e., idf) is calculated as log10(10,000,000 / 1,000) = 4. Thus, the tf-idf weight is the product of these quantities: 0.03 * 4 = 0.12. (With the natural logarithm used in the IDF formula above, the idf would instead be ln(10,000) ≈ 9.21 and the weight ≈ 0.28; the choice of base only rescales the weights.)&lt;/p&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/296/what-is-tf-idf-algorithm?show=457#a457</guid>
<pubDate>Sun, 28 Oct 2018 11:26:52 +0000</pubDate>
</item>
<item>
<title>Answered: What are Natural Language Processing (NLP) and its applications?</title>
<link>https://ask.ghassem.com/297/what-are-natural-language-processing-nlp-and-applications?show=454#a454</link>
<description>&lt;p&gt;The&amp;nbsp;&lt;strong&gt;majority of activities performed by humans are done through language&lt;/strong&gt;, whether communicated directly or reported using natural language. As technology is increasingly making the methods and platforms on which we communicate ever more accessible, there is an even greater need to&amp;nbsp;&lt;strong&gt;understand the languages we use to communicate&lt;/strong&gt;. By combining the power of&amp;nbsp;&lt;strong&gt;artificial intelligence, computational linguistics, and computer science&lt;/strong&gt;,&amp;nbsp;Natural Language Processing (NLP) helps machines “read” text by simulating the human&amp;nbsp;&lt;strong&gt;ability to understand language&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NLP is everywhere even if we don’t realize it&lt;/strong&gt;. Does your email application automatically warn you when you try to send an email without the attachment that you referenced in the text of the email?&amp;nbsp;&lt;strong&gt;This is a natural language processing application at work&lt;/strong&gt;.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some examples of the most widely used NLP applications&lt;/strong&gt;:&lt;/p&gt;

&lt;h2&gt;Machine translation&lt;/h2&gt;

&lt;p&gt;As the&amp;nbsp;&lt;strong&gt;amount of information available online is growing&lt;/strong&gt;, the need to access it becomes increasingly important and the value of natural language processing&amp;nbsp;applications becomes clear. Machine translation helps us conquer language barriers that we often encounter by translating technical manuals, support content or catalogs at a significantly reduced cost. The challenge with&amp;nbsp;&lt;strong&gt;machine translation technologies is not in translating words, but in understanding the meaning&lt;/strong&gt;&amp;nbsp;of sentences to provide a true translation.&lt;/p&gt;

&lt;h2&gt;Automatic summarization&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Information overload is a real problem&lt;/strong&gt;&amp;nbsp;when we need to access a specific, important piece of information from a huge knowledge base. Automatic summarization is relevant not only for summarizing&lt;strong&gt; the meaning of documents and information&lt;/strong&gt; but also for&amp;nbsp;&lt;strong&gt;understanding the emotional meanings inside the information&lt;/strong&gt;, such as in collecting data from social media.&amp;nbsp;&lt;strong&gt;Automatic summarization&lt;/strong&gt;&amp;nbsp;is especially relevant when used to provide an overview of a news item or blog posts while avoiding redundancy from&amp;nbsp;&lt;strong&gt;multiple sources and maximizing the diversity of content obtained&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Sentiment analysis&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The goal of sentiment analysis is to identify sentiment among several posts&lt;/strong&gt;&amp;nbsp;or even in the same post&amp;nbsp;&lt;strong&gt;where emotion is not always explicitly expressed&lt;/strong&gt;. Companies use natural language processing applications,&amp;nbsp;such&amp;nbsp;as&amp;nbsp;&lt;strong&gt;sentiment analysis&lt;/strong&gt;, to identify opinions and sentiment online to help them&amp;nbsp;&lt;strong&gt;understand what customers think about their products and services&lt;/strong&gt;&amp;nbsp;(e.g., “I love the new iPhone” and, a few lines later, “But sometimes it doesn’t work well”, where the person is still talking about the iPhone) and overall indicators of their reputation. Beyond determining simple polarity, sentiment analysis interprets the sentiment in context to help you better understand what’s behind an expressed opinion, which can be extremely relevant in understanding and driving purchasing decisions.&lt;/p&gt;

&lt;h2&gt;Text classification&lt;/h2&gt;

&lt;p&gt;Text classification&amp;nbsp;makes it possible to assign predefined categories to a document and&amp;nbsp;&lt;strong&gt;organize it to help you find the information&lt;/strong&gt;&amp;nbsp;you need or simplify some activities. For example,&amp;nbsp;one application&lt;strong&gt; of text categorization&lt;/strong&gt;&amp;nbsp;is spam filtering in email.&lt;/p&gt;

&lt;h2&gt;Question Answering&lt;/h2&gt;

&lt;p&gt;As speech-understanding technology and voice-input applications improve,&lt;strong&gt;&amp;nbsp;the need for NLP will only increase&lt;/strong&gt;. Question-Answering (QA) is becoming more and more popular thanks to applications such as Siri, OK Google, chatbots and virtual assistants. A QA application is a system capable of coherently answering a human request.&amp;nbsp;&lt;strong&gt;It may be used as a text-only interface or as a spoken dialog system&lt;/strong&gt;. While such systems offer great promise, they still have a long way to go. This remains&amp;nbsp;&lt;strong&gt;a relevant challenge especially for search engines&lt;/strong&gt; and is one of the main applications of&amp;nbsp;&lt;strong&gt;natural language processing research&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using natural language processing for creating a seamless and interactive &lt;/strong&gt;interface between humans and machines will continue to be a top priority for today’s and tomorrow’s increasingly cognitive applications.&lt;/p&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/297/what-are-natural-language-processing-nlp-and-applications?show=454#a454</guid>
<pubDate>Sun, 28 Oct 2018 11:04:40 +0000</pubDate>
</item>
<item>
<title>Commented: Which scenarios among the following are a valid reason to use regularization?</title>
<link>https://ask.ghassem.com/451/which-scenarios-among-following-valid-reason-regularization?show=453#c453</link>
<description>Please provide the links to the sources as well.</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/451/which-scenarios-among-following-valid-reason-regularization?show=453#c453</guid>
<pubDate>Sat, 27 Oct 2018 17:45:22 +0000</pubDate>
</item>
<item>
<title>Answered: How to transform a categorical variable into a binary feature matrix?</title>
<link>https://ask.ghassem.com/449/transform-categorical-variable-into-matrix-binary-feature?show=450#a450</link>
<description>&lt;p&gt;&lt;strong&gt;Answer: Letter A: One hot encoding&lt;/strong&gt;&amp;nbsp;is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction.&lt;/p&gt;
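
&lt;p&gt;As a rough illustration of the idea, the sketch below builds a binary matrix with one column per category. The function name and the sample colour values are made up for this example; libraries such as pandas and scikit-learn provide their own encoders, which are not shown here.&lt;/p&gt;

```python
def one_hot_encode(values):
    # Sorted unique categories give a stable, reproducible column order.
    categories = sorted(set(values))
    # Each row has a 1 in the column of its own category and 0 elsewhere.
    matrix = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, matrix


# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, matrix = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
for row in matrix:
    print(row)
# [0, 0, 1]
# [0, 1, 0]
# [1, 0, 0]
# [0, 1, 0]
```

&lt;p&gt;Each row contains exactly one 1, which is where the name &quot;one hot&quot; comes from.&lt;/p&gt;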

&lt;p&gt;Source: Hackernoon&lt;/p&gt;</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/449/transform-categorical-variable-into-matrix-binary-feature?show=450#a450</guid>
<pubDate>Sat, 27 Oct 2018 17:24:33 +0000</pubDate>
</item>
<item>
<title>Answered: When should we use Mean, Median or Mode as Measures of Central Tendency?</title>
<link>https://ask.ghassem.com/255/when-should-use-mean-median-mode-measures-central-tendency?show=443#a443</link>
<description>&lt;p&gt;All three measures are used to give us a good representative for &quot;&lt;strong&gt;average&lt;/strong&gt;&quot; in our data samples. However, based on the type and properties of each we have to use them in different situations. Based on the types of variables, we can use the following table to see what measure we should use:&lt;/p&gt;

&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px; border-spacing: 1px;&quot;&gt;
	&lt;thead&gt;
		&lt;tr&gt;
			&lt;th scope=&quot;col&quot;&gt;&lt;strong&gt;Type of &lt;/strong&gt;&lt;strong&gt;Variable&lt;/strong&gt;&lt;/th&gt;
			&lt;th scope=&quot;col&quot;&gt;&lt;strong&gt;Best measure of central tendency&lt;/strong&gt;&lt;/th&gt;
		&lt;/tr&gt;
	&lt;/thead&gt;
	&lt;tbody&gt;
		&lt;tr&gt;
			&lt;td&gt;Categorical (Nominal)&lt;/td&gt;
			&lt;td&gt;Mode&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;Ordinal&lt;/td&gt;
			&lt;td&gt;Median&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;Interval/Ratio (not skewed)&lt;/td&gt;
			&lt;td&gt;Mean&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td&gt;Interval/Ratio (skewed)&lt;/td&gt;
			&lt;td&gt;Median&lt;/td&gt;
		&lt;/tr&gt;
	&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Consider the effect of Outliers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In addition, when a ratio&amp;nbsp;variable (such as a numeric value) contains outliers, the &lt;strong&gt;median is usually a better choice than the mean&lt;/strong&gt;. An example is a &lt;strong&gt;salary data&lt;/strong&gt; column that may contain very large or very small values, which pull the mean toward them; the median is less affected and gives a better representative for the &quot;&lt;strong&gt;average&lt;/strong&gt;&quot;. That is why many websites report the median salary for a job position instead of the mean. For more information, you can take a look at &lt;a rel=&quot;nofollow&quot; href=&quot;https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php&quot;&gt;this page&lt;/a&gt;.&lt;/p&gt;
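
&lt;p&gt;A small sketch of the outlier effect, using Python&#039;s standard statistics module. The salary figures are invented purely for illustration.&lt;/p&gt;

```python
import statistics

# Hypothetical salaries: four typical values and one extreme outlier.
salaries = [40_000, 45_000, 50_000, 55_000, 1_000_000]

mean = statistics.mean(salaries)      # pulled upward by the outlier
median = statistics.median(salaries)  # middle value, unaffected by the outlier

print(mean)    # 238000
print(median)  # 50000
```

&lt;p&gt;Here the mean is larger than four of the five salaries, while the median stays at a value that a typical employee actually earns.&lt;/p&gt;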

</description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/255/when-should-use-mean-median-mode-measures-central-tendency?show=443#a443</guid>
<pubDate>Wed, 24 Oct 2018 20:57:42 +0000</pubDate>
</item>
<item>
<title>What are the main steps in making a decision tree?</title>
<link>https://ask.ghassem.com/337/what-are-the-main-steps-in-making-a-decision-tree</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/337/what-are-the-main-steps-in-making-a-decision-tree</guid>
<pubDate>Fri, 12 Oct 2018 02:17:19 +0000</pubDate>
</item>
<item>
<title>Please explain Linear Regression with an example?</title>
<link>https://ask.ghassem.com/336/please-explain-linear-regression-with-an-example</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/336/please-explain-linear-regression-with-an-example</guid>
<pubDate>Fri, 12 Oct 2018 02:14:00 +0000</pubDate>
</item>
<item>
<title>How to return the outliers given a list of numbers?</title>
<link>https://ask.ghassem.com/301/how-to-return-the-outliers-by-having-a-list-of-numbers</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/301/how-to-return-the-outliers-by-having-a-list-of-numbers</guid>
<pubDate>Mon, 08 Oct 2018 12:19:22 +0000</pubDate>
</item>
<item>
<title>Can you explain the percentiles and quartiles and their applications?</title>
<link>https://ask.ghassem.com/295/can-you-explain-percentiles-quartiles-their-applications</link>
<description></description>
<category>Data Science Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/295/can-you-explain-percentiles-quartiles-their-applications</guid>
<pubDate>Mon, 08 Oct 2018 11:52:30 +0000</pubDate>
</item>
</channel>
</rss>