Wednesday, January 10, 2018

Sentiment Analysis of Twitter Users

One of my soft spots is for social media and how it influences the public, so I decided to take a course in sentiment analysis using R and Tableau. The course exposed me to two lexicons that classify words as either "good" or "bad". This approach is really useful, since we can add or remove terms from the lexicons based on context. A good example would be adding local vernacular or industry conventions that are not universally negative or positive.
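To make the lexicon idea concrete, here is a minimal sketch in Python (the course itself used R). The word lists, the function name `score_sentiment`, and the slang entry are all made up for illustration and are not the lexicons from the course. The score is simply the number of "good" words minus the number of "bad" words.

```python
import re

# Tiny illustrative lexicons (assumed); a real analysis would load full word lists.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "hate", "broken"}


def score_sentiment(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (count of positive words) - (count of negative words) for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in positive for w in words) - sum(w in negative for w in words)


# Adapting the lexicon to context: treat the slang term "sick" as praise
# (a hypothetical addition, not part of any standard lexicon).
custom_positive = POSITIVE | {"sick"}

print(score_sentiment("I love this phone, the camera is great"))   # 2
print(score_sentiment("That new screen is sick"))                  # 0
print(score_sentiment("That new screen is sick", custom_positive)) # 1
```

A score of zero is what later gets treated as a neutral entry, so adding context-specific terms can move a Tweet out of the neutral bucket.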

In this application, we looked at the sentiment and popularity of the Samsung Galaxy versus the Apple iPhone on Twitter in Los Angeles, New York, Austin, and Seattle. After grabbing geocodes for each city from https://www.latlong.net, we scraped Tweets using R, passed them through a sentiment function, and saved the results to individual output files. Check out the Jupyter Notebook below.




In Tableau, the extracted Tweets were joined into one data source for analysis. The first observation was that very few Tweets about Apple came from Seattle, even though the script was able to retrieve 2,000 entries for all other cities. This should be kept in mind for all further insights.



The ratio of Original to Duplicate Tweets was then analyzed. We found that 63.56% of the extracted Tweets were duplicate entries, meaning that a majority of the discussion about the two products in the sampled cities is repeated content.
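As a rough illustration of how a duplicate share like this can be computed, here is a small Python sketch. It assumes a Tweet counts as a duplicate when its text, ignoring case and extra whitespace, has already appeared earlier in the data. That definition, the function name `duplicate_share`, and the sample texts are assumptions for the example; the analysis itself was done in Tableau.

```python
def duplicate_share(texts):
    """Fraction of tweets whose normalized text already appeared earlier."""
    seen = set()
    duplicates = 0
    for text in texts:
        # Normalize case and whitespace so trivially different copies match.
        key = " ".join(text.lower().split())
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(texts) if texts else 0.0


# Made-up sample: two of the four entries repeat an earlier one.
sample = [
    "RT new phone looks great",
    "rt  new phone looks great",
    "battery dies too fast",
    "RT new phone looks great",
]
print(duplicate_share(sample))  # 0.5
```

Under this definition the first occurrence is counted as the original, so the duplicate share can never reach 100%.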

Of the Original Tweets, we noticed that 57.43% of them were Retweeted at least once.

Out of all the Tweets sampled, Samsung trailed Apple 44.58% to 52.52%, meaning that Apple is being discussed more than Samsung throughout the four cities sampled.

A histogram of the Samsung and Apple sentiment was constructed. After removing neutral entries, it was readily apparent that although Apple is being discussed more, the Samsung histogram sits further to the right, toward positive scores, indicating that the discussion about Samsung is more positive than the discussion about Apple. 89% of the Samsung data was positive, while only 68% of the Apple data was positive.
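The "percent positive after removing neutral entries" figure can be sketched in a few lines of Python. This assumes that a neutral entry is one with a sentiment score of exactly zero; the function name `positive_share` and the scores below are invented for the example and are not the data from this analysis.

```python
def positive_share(scores):
    """Share of positive scores among the non-neutral (non-zero) ones."""
    non_neutral = [s for s in scores if s != 0]
    if not non_neutral:
        return None  # nothing left to summarize once neutrals are removed
    return sum(s > 0 for s in non_neutral) / len(non_neutral)


# Made-up scores per brand, only to show the calculation.
scores_by_brand = {
    "Samsung": [2, 0, 1, -1, 0, 3],
    "Apple": [1, -2, 0, -1, 1],
}
for brand, scores in scores_by_brand.items():
    print(brand, positive_share(scores))
# Samsung 0.75
# Apple 0.5
```

Dropping the zeros first matters: if most Tweets are neutral, leaving them in would make both brands look far less positive than the discussion that actually expresses an opinion.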

Visualizing the frequency of the Tweets for each of the devices shows that both are increasing during the two-week period of analysis, which may be a result of new devices being released or of news coverage surrounding either company. The two devices do seem to be competing based on the frequency graph, where the frequencies of Tweets about Samsung and Apple are seemingly correlated to some degree.
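One way to put a number on "seemingly correlated" is to count Tweets per day for each brand and compute the Pearson correlation of the two daily series. The Python sketch below shows the idea; the helper names and the dates are hypothetical, and no such coefficient was computed in the original analysis.

```python
from collections import Counter
from math import sqrt


def daily_counts(dates, days):
    """Count how many tweets fall on each day, in the order given by `days`."""
    counts = Counter(dates)
    return [counts[day] for day in days]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # undefined when one series never changes
    return cov / (spread_x * spread_y)


# Made-up tweet dates, only to show the calculation.
days = ["2018-01-01", "2018-01-02", "2018-01-03"]
samsung = daily_counts(["2018-01-01", "2018-01-02", "2018-01-02",
                        "2018-01-03", "2018-01-03", "2018-01-03"], days)
apple = daily_counts(["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02",
                      "2018-01-02", "2018-01-02", "2018-01-03", "2018-01-03",
                      "2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03"], days)
print(samsung, apple, pearson(samsung, apple))
```

A coefficient near 1 would support the impression from the graph, although two series that both trend upward over two weeks will correlate strongly even if the brands are not reacting to each other.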

Finally, a map was constructed displaying the distribution of total tweets between Samsung and Apple, showing that Apple clearly has dominance on Twitter, at least for the two-week sample period in Los Angeles, New York, Austin, and Seattle.




Here's a link to the GitHub repository:

https://github.com/SLPeoples/Text-Mining-Sentiment-Analysis


