Ask Ghassem - Recent questions tagged sentiment-analysis

Very short text classification when category text should be replaced by another category text?

Thu, 11 Feb 2021 12:48:47 +0000

I need a tool to classify articles based on a short category text consisting of two or three words separated by '-'. The content of the RSS/XML tag looks like this, for example:

Foreign - News

Football - Foreign

I created my own categories in the DB, and now I need to map the categories parsed from this news source's RSS feed onto the news categories I defined.

For example, I need all articles whose category contains "football" to be assigned to the category Sport. Sometimes, though, the category XML tag is an exact match: Foreign - News should map to the DB category I defined as Foreign.

So far I have only used trained decision tree frameworks, for another project, so I would like advice on an AI-based approach, technique, or particular framework that could solve this problem. I am not very experienced in AI and don't want a poor decision of my own to lead me into a dead end.

While it can be solved with many ifs and a 'contains' function, that does not seem like a very good solution to me.

TL;DR: I basically need something like a "clever, flexible and universal if-elseif".

NOTE: I can also use the article description text if necessary, but it seems to me that the source category text is unambiguous enough for this kind of problem.
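To make the "flexible if-elseif" idea concrete, here is a minimal sketch of a rule table rather than a trained model. The rule list, the keyword sets, and the names `RULES` and `map_category` are all hypothetical; the only assumption taken from the post is that source categories are words separated by '-'. Rule order encodes priority, which is why "Football - Foreign" resolves to Sport.

```python
# Hypothetical rule table: the first matching rule wins, so order encodes priority.
# The keyword sets are illustrative and would be replaced by the real DB categories.
RULES = [
    ("Sport", {"football", "hockey", "tennis"}),
    ("Foreign", {"foreign"}),
]


def map_category(source_category, default="Uncategorized"):
    """Map a source category such as 'Football - Foreign' to a target category."""
    # Split on '-' and normalise each part to a lowercase token.
    tokens = {part.strip().lower() for part in source_category.split("-") if part.strip()}
    for target, keywords in RULES:
        if tokens & keywords:  # any keyword of this rule appears in the source text
            return target
    return default


print(map_category("Football - Foreign"))  # Sport
print(map_category("Foreign - News"))      # Foreign
```

New mappings are added by editing the table instead of the control flow, and anything that falls through to the default can be logged and reviewed by hand.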

Binary Classification and neutral tag

Sat, 30 Jan 2021 10:08:01 +0000

I am trying to create a sentiment analysis model with a binary classification loss. I have a batch of tweets, some tagged as positive (labeled 1) and some as negative (labeled 0). I managed to gather some tweets tagged as neutral, but there are fewer of them than positive and negative ones. My thinking is to label them 0.5 to balance the classification probability. Is this legitimate?
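As a small illustration of what a 0.5 label would do, assuming the loss in question is binary cross-entropy (the post does not name it exactly): with a fractional target the loss is still well defined, and it is lowest when the predicted probability equals the target. The function name and the grid search below are illustrative only, and the sketch says nothing about whether this beats a separate neutral class.

```python
import math


def binary_cross_entropy(target, predicted, eps=1e-12):
    """Binary cross-entropy for one example; target may be fractional (a soft label)."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Scan predicted probabilities 0.01 .. 0.99 and find where the loss
# for a "neutral" target of 0.5 is smallest.
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda p: binary_cross_entropy(0.5, p))
print(best)
```

So a 0.5 target pushes the model toward predicting 0.5 for neutral tweets; note that the loss for such an example never reaches zero, since its minimum is log 2.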

"Rare words" on vocabulary

Sat, 30 Jan 2021 09:57:31 +0000

I am trying to create a sentiment analysis model and I have a question.

After I preprocessed my tweets and created my vocabulary, I noticed that some words appear fewer than 5 times in my dataset (many of them appear only once). Many of them are real words, not gibberish. My thinking is that if I keep those words, they will get wrong "sentiment" weights and make my model worse.
Is my thinking right, or am I missing something?

My vocabulary size is around 40,000 words, and about 10,000 of them are "rare". Should I "sacrifice" them?
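For reference, one common way to "sacrifice" rare words without dropping them from the text is to map everything below a frequency threshold to a single unknown token. This is a sketch under assumptions: the threshold of 5 comes from the numbers above, while the `<unk>` token, the function names, and the toy tweets are made up for illustration.

```python
from collections import Counter

MIN_COUNT = 5   # threshold taken from the question; tune as needed
UNK = "<unk>"   # hypothetical placeholder token for rare words


def build_vocab(tokenized_tweets, min_count=MIN_COUNT):
    """Keep only tokens that occur at least min_count times in the corpus."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    return {tok for tok, count in counts.items() if count >= min_count}


def replace_rare(tweet, vocab):
    """Replace out-of-vocabulary tokens with the shared UNK token."""
    return [tok if tok in vocab else UNK for tok in tweet]


# Toy corpus: "good" occurs 6 times, "movie" 5 times, "soundtrack" once.
tweets = [["good", "movie"]] * 5 + [["good", "soundtrack"]]
vocab = build_vocab(tweets)
print(replace_rare(["good", "soundtrack"], vocab))  # ['good', '<unk>']
```

The rare words then share one weight instead of each getting a poorly estimated one, and the same mapping handles unseen words at prediction time.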

Thanks in advance.

My GloVe word embeddings contain sentiment?

Sun, 03 Jan 2021 14:09:37 +0000

I've been researching sentiment analysis with word embeddings. I read papers stating that word embeddings ignore the sentiment information of the words in the text. One paper states that among the top 10 semantically similar words, around 30 percent have opposite polarity, e.g. happy - sad.

So, I computed word embeddings on my dataset (Amazon reviews) with the GloVe algorithm in R. Then I looked at the most similar words by cosine similarity and found that the nearest neighbours of a word all share its sentiment (e.g. beautiful - lovely - gorgeous - pretty - nice - love). I was wondering how this is possible, since I expected the opposite from reading several papers. What could be the reason for my findings?
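For clarity, the similarity check described above amounts to ranking all other words by cosine similarity to a query word. The sketch below is in Python rather than R, and the three-dimensional vectors are invented toy values, not output of GloVe or of the Amazon review data; only the ranking procedure is the point.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, k=3):
    """Return the k words most similar to `word`, best first, with their scores."""
    target = embeddings[word]
    scored = [(other, cosine(target, vec))
              for other, vec in embeddings.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors, invented for illustration only.
toy = {
    "beautiful": [0.9, 0.1, 0.3],
    "lovely":    [0.8, 0.2, 0.3],
    "ugly":      [0.1, 0.9, 0.3],
    "table":     [0.0, 0.1, 0.9],
}
for neighbour, score in nearest("beautiful", toy):
    print(neighbour, round(score, 3))
```

Running the same ranking for clearly negative query words as well as positive ones gives a fuller picture of whether opposite-polarity neighbours show up.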

Two of the many papers I read:

Yu, L. C., Wang, J., Lai, K. R. & Zhang, X. (2017). Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3), 671-681.
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T. & Qin, B. (2014). Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1: Long Papers, 1555-1565.

How to perform sentiment analysis in NLP?

Wed, 17 Oct 2018 00:45:12 +0000

If I am trying to read text and need to sort texts into buckets such as good, bad, ugly, or similar, where should I start? Which sentiment functions should I use?