<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Ask Ghassem - Recent questions tagged dataframe</title>
<link>https://ask.ghassem.com/tag/dataframe</link>
<description>Powered by Question2Answer</description>
<item>
<title>How do I know which encoder to use to convert from categorical variables to numerical?</title>
<link>https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</link>
<description>So say I have a column with categorical data like different styles of temperature: &amp;#039;Lukewarm&amp;#039;, &amp;#039;Hot&amp;#039;, &amp;#039;Scalding&amp;#039;, &amp;#039;Cold&amp;#039;, &amp;#039;Frostbite&amp;#039;,... etc.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I know that we can use pd.get_dummies to convert the column to numerical data within the dataframe, but I also know that there are other &amp;#039;converters&amp;#039; (not sure if that&amp;#039;s the correct terminology) that we can use, e.g. OneHotEncoder from scikit-learn (I could use the sklearn.pipeline module to build a pipeline and feed my dataframe through it so that my categorical data also gets encoded to numerical).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
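To make the comparison concrete, here is a minimal sketch, assuming pandas and scikit-learn are installed. The frames train and new and their values are made up for illustration. It shows one difference that tends to matter in modeling pipelines: pd.get_dummies builds columns from whatever values appear in the frame it is given, while OneHotEncoder learns its categories in fit and reuses them in transform.&lt;br /&gt;

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: "Scalding" appears only in the new frame.
train = pd.DataFrame({"temperature": ["Cold", "Hot", "Lukewarm", "Hot"]})
new = pd.DataFrame({"temperature": ["Hot", "Scalding"]})

# get_dummies derives its columns from the values present in each frame,
# so the two results do not share the same set of columns.
dummies_train = pd.get_dummies(train["temperature"])
dummies_new = pd.get_dummies(new["temperature"])
print(list(dummies_train.columns))  # ['Cold', 'Hot', 'Lukewarm']
print(list(dummies_new.columns))    # ['Hot', 'Scalding']

# OneHotEncoder stores the categories seen in fit and applies them in transform.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[["temperature"]])
encoded_new = encoder.transform(new[["temperature"]]).toarray()
print(encoder.categories_[0])  # categories learned from train only
print(encoded_new)             # one row per value in new, three columns
```

With handle_unknown set to ignore, a category that was not seen during fit is encoded as a row of zeros instead of producing a new column, so the output width stays the same between training and new data.&lt;br /&gt;
&lt;br /&gt;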
How do I know which to use? Does it matter? If it does, when does it matter the most (e.g. for what types of problems, or when there are many categorical variables versus few)? If anyone can give me any pointers on this, I&amp;#039;d greatly appreciate it.</description>
<category>Exploratory Data Analysis</category>
<guid isPermaLink="true">https://ask.ghassem.com/1006/know-which-encoder-convert-categorical-variables-numerical</guid>
<pubDate>Mon, 29 Nov 2021 04:09:06 +0000</pubDate>
</item>
<item>
<title>Terminology clarification in Spark</title>
<link>https://ask.ghassem.com/979/terminology-clarification-in-spark</link>
<description>&lt;p&gt;I have a hard time distinguishing&amp;nbsp;terminologies&amp;nbsp;of SparkSQL. While SparkSQL are quite flexible in terms of abstraction layers, its really difficult for beginner to navigate around those options.&lt;/p&gt;

&lt;p&gt;1. When we say &quot; using SparkSQL to perform .....&quot;, does it mean that we can use any API/abstraction layers such as Scala, Python, HiveQL to query? As long as the core dataframe is in spark, we should be fine?&lt;/p&gt;

&lt;p&gt;2. Can we manipulate data in both PySpark and Scala sequentially?&lt;/p&gt;

&lt;p&gt;For example, may I clean up the data in Scala, then perform follow-up manipulation in PySpark, then go back to Scala?&lt;/p&gt;

&lt;p&gt;3. As demonstrated in the tutorial, we can query with SQL commands by using the API spark.sql(&quot;My SQL command&quot;). Does it count as SQL or Spark?&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;Image result for spark sql&quot; src=&quot;http://cdn.app.compendium.com/uploads/user/e7c690e8-6ff9-102a-ac6d-e4aebca50425/f4a5b21d-66fa-4885-92bf-c4e81c06d916/Image/753ace3c801b53535077d9474ecc5f1e/odi_spark_sql_databricks.jpg&quot;&gt;&lt;/p&gt;</description>
<category>Big Data Tools</category>
<guid isPermaLink="true">https://ask.ghassem.com/979/terminology-clarification-in-spark</guid>
<pubDate>Sat, 06 Feb 2021 18:03:32 +0000</pubDate>
</item>
<item>
<title>How to filter a dataframe?</title>
<link>https://ask.ghassem.com/775/how-to-filter-a-dataframe</link>
<description>&lt;p&gt;Consider the Pandas DataDrame&amp;nbsp;&lt;code&gt;df&lt;/code&gt;&amp;nbsp;below. Filter it appropriately so that it outputs the shown results.&lt;/p&gt;

&lt;pre class=&quot;prettyprint lang-python&quot; data-pbcklang=&quot;python&quot; data-pbcktabsize=&quot;4&quot;&gt;
     gh owner language      repo  stars
0  pandas-dev   python    pandas  17800
1   tidyverse        R     dplyr   2800
2   tidyverse        R   ggplot2   3500
3      has2k1   python  plotnine   1450&lt;/pre&gt;
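
&lt;p&gt;One possible filter, as a minimal sketch. The question does not say which condition is intended, so selecting on the repo column is an assumption; other conditions (for example on stars) would return the same single row. The frame is rebuilt from the table above.&lt;/p&gt;

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "gh owner": ["pandas-dev", "tidyverse", "tidyverse", "has2k1"],
    "language": ["python", "R", "R", "python"],
    "repo": ["pandas", "dplyr", "ggplot2", "plotnine"],
    "stars": [17800, 2800, 3500, 1450],
})

# Boolean mask: True only for the row whose repo is "pandas".
mask = df["repo"] == "pandas"

# Indexing with the mask keeps the matching rows and their original index.
result = df[mask]
print(result)
```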

&lt;h2&gt;Expected Output&lt;/h2&gt;

&lt;pre class=&quot;prettyprint lang-&quot; data-pbcklang=&quot;&quot; data-pbcktabsize=&quot;&quot;&gt;
     gh owner language    repo  stars
0  pandas-dev   python  pandas  17800&lt;/pre&gt;</description>
<category>Python Interview Questions</category>
<guid isPermaLink="true">https://ask.ghassem.com/775/how-to-filter-a-dataframe</guid>
<pubDate>Wed, 25 Dec 2019 05:56:14 +0000</pubDate>
</item>
<item>
<title>How to reshape in pandas dataframe?</title>
<link>https://ask.ghassem.com/608/how-to-reshape-in-pandas-dataframe</link>
<description>&lt;p&gt;Dataframe looks like below&lt;/p&gt;

&lt;p&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;https://user-images.githubusercontent.com/31833270/55626163-bb1edc80-57e5-11e9-9807-ebc69ff5915a.png&quot; target=&quot;_blank&quot;&gt;&lt;img alt=&quot;image&quot; src=&quot;https://user-images.githubusercontent.com/31833270/55626163-bb1edc80-57e5-11e9-9807-ebc69ff5915a.png&quot;&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have a DataFrame like the one above, and I want to reshape columns a~t into a single column, i.e. shape&amp;nbsp;&lt;code&gt;(a~t, 1)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I want to reshape the DataFrame as shown below (the values of columns b~t go under column a):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;날짜 역번호 역명 구분 a &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2018-01-01 150 서울역 승차 379 &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2018-01-01 150 서울역 승차 287&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2018-01-01 150 서울역 승차 371 &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2018-01-01 150 서울역 승차 876 &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2018-01-01 150 서울역 승차 965&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.... &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2008-01-01 152 종각 승차 2920 &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2008-01-01 152 종각 승차 2290 &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2008-01-01 152 종각 승차 802 &lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;2008-01-01 152 종각 승차 1559&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Something like&amp;nbsp;&lt;code&gt;df = df.reshape(len(data2)*a~t, 1)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I tried pd.melt, but it does not work well.&lt;/p&gt;

&lt;pre&gt;
&lt;code&gt;df2 = pd.melt(df, id_vars=[&quot;날짜&quot;, &quot;역번호&quot;, &quot;역명&quot;, &quot;구분&quot;], value_name=&quot;t&quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This removes columns b~t, but I want the b~t values inserted after the a values.&lt;/p&gt;
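
&lt;p&gt;A minimal sketch of one way to get the row-by-row order shown above while staying with pd.melt. The toy frame stands in for the real data: it has only three value columns (a, b, c), with values copied from the sample output, and the helper column names &quot;column&quot; and &quot;value&quot; are assumptions. melt stacks the data column by column, so keeping the original row index (ignore_index=False, available in pandas 1.1 and later) and sorting on it with a stable sort puts the values of each row next to each other.&lt;/p&gt;

```python
import pandas as pd

id_cols = ["날짜", "역번호", "역명", "구분"]

# Toy stand-in for the real data, with three value columns instead of a~t.
df = pd.DataFrame({
    "날짜": ["2018-01-01", "2008-01-01"],
    "역번호": [150, 152],
    "역명": ["서울역", "종각"],
    "구분": ["승차", "승차"],
    "a": [379, 2920],
    "b": [287, 2290],
    "c": [371, 802],
})

long_df = (
    # Keep the original row index so each melted value remembers its source row.
    pd.melt(df, id_vars=id_cols, var_name="column", value_name="value",
            ignore_index=False)
    # A stable sort on that index groups rows together and keeps a, b, c order.
    .sort_index(kind="mergesort")
    # Drop the helper column and name the single value column "a".
    .drop(columns="column")
    .rename(columns={"value": "a"})
    .reset_index(drop=True)
)
print(long_df)
```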

&lt;p&gt;The dataset is available at&amp;nbsp;&lt;a rel=&quot;nofollow&quot; href=&quot;https://drive.google.com/file/d/1Upb5PgymkPB5TXuta_sg6SijwzUuEkfl/view?usp=sharing&quot;&gt;https://drive.google.com/file/d/1Upb5PgymkPB5TXuta_sg6SijwzUuEkfl/view?usp=sharing&lt;/a&gt;&lt;/p&gt;</description>
<category>General</category>
<guid isPermaLink="true">https://ask.ghassem.com/608/how-to-reshape-in-pandas-dataframe</guid>
<pubDate>Fri, 05 Apr 2019 13:41:30 +0000</pubDate>
</item>
</channel>
</rss>