
·       
Google Trends

Google Trends is a public web facility based on Google Search that shows how
often a particular keyword or term is searched relative to the total number of
searches across a region. Google does not publish an official API (Application
Programming Interface) for Google Trends, which made accessing its data a major
problem. We use a third-party API to access the Google Trends data, which gives
us the required news-related data.

 

·       
Twitter

Data is in the form of raw tweets.
It is extracted using a Node library that provides a package for the simple
Twitter Streaming API. This API allows two modes of accessing tweets: Sample
Stream and Filter Stream. Sample Stream simply delivers a small, random sample
of all the tweets streaming in real time. Filter Stream delivers tweets that
match certain criteria. It can filter the delivered tweets according to three
criteria:

• Specific keyword(s) to track/search for in the
tweets

• Specific Twitter user(s) according to their
user IDs

• Tweets originating from specific location(s) (only
for geo-tagged tweets).

A programmer can specify any
single one of these filtering criteria or a combination of several. For our
purpose we use Filter Stream, since we access real-time data by user ID,
location and the trending keywords. Tweets arrive at run time according to the
trending keywords.
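The three filtering criteria can be illustrated with a small local predicate. This is only a sketch: the function name matchesFilter, the tweet fields (text, userId, coordinates) and the bounding-box layout are hypothetical and are not the exact shape used by the streaming API, and combining the criteria with OR is an assumption of this sketch.

```javascript
// Hypothetical local model of the three Filter Stream criteria.
// Field names (text, userId, coordinates) are assumptions for illustration.
function matchesFilter(tweet, filter) {
  const { keywords = [], userIds = [], boxes = [] } = filter;
  const text = tweet.text.toLowerCase();

  // Criterion 1: any tracked keyword occurs in the tweet text.
  const byKeyword = keywords.some((k) => text.includes(k.toLowerCase()));

  // Criterion 2: the author is one of the listed user IDs.
  const byUser = userIds.includes(tweet.userId);

  // Criterion 3: a geo-tagged tweet lies inside a bounding box
  // given as [west, south, east, north] in degrees.
  const byLocation =
    Array.isArray(tweet.coordinates) &&
    boxes.some(([west, south, east, north]) => {
      const [lon, lat] = tweet.coordinates;
      return lon >= west && lon <= east && lat >= south && lat <= north;
    });

  // Assumption: a tweet is delivered if it satisfies any one criterion.
  return byKeyword || byUser || byLocation;
}

const filter = { keywords: ["election"], userIds: ["42"], boxes: [[60, 23, 78, 37]] };
const tweets = [
  { text: "Election results tonight", userId: "7", coordinates: null },
  { text: "Good morning", userId: "42", coordinates: null },
  { text: "Lunch time", userId: "9", coordinates: [74.3, 31.5] },
  { text: "Nothing relevant", userId: "9", coordinates: null },
];
const delivered = tweets.filter((t) => matchesFilter(t, filter));
console.log(delivered.map((t) => t.text));
```

In this example the first three tweets are delivered (keyword, user and location match respectively) and the last one is dropped.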

1.1  
Labeling

We labeled the tweets into three classes: positive,
negative and neutral.

·        
Positive:
If the overall tweet has a positive sentiment, or if more than one
sentiment is expressed in the tweet but the positive sentiment is
dominant. Example: “4 more years of being in shithole Australia then I move
to the USA! :D” [2].

·        
Negative: If
the overall tweet has a negative sentiment, or if more than one
sentiment is expressed in the tweet but the negative sentiment is
dominant. Example: “I want an android now this iPhone is boring :S” [2].

·        
Neutral/Objective: If
the overall tweet has a neutral sentiment, or if more than one
sentiment is expressed in the tweet but the neutral sentiment is dominant.
Example: “US House Speaker vows to stop Obama contraceptive rule…
http://t.co/cyEWqKlE” [2].

 

1.2  
Classification

Classification is the process in which data is divided into different
classes according to some common pattern. The aim of our project is to design a
classifier that accurately classifies tweets into the following three sentiment
classes: positive, negative and neutral.

 

1.2.1   
Naïve Bayes Classifier

A Naive Bayes classifier applies Bayes' Theorem. Naive
Bayes is a very simple model for classification. It is simple and
works well on text categorization. We adopt multinomial Naive Bayes in our
project. It assumes each feature is conditionally independent of the other
features given the class. That is,

\[ P(c \mid t) = \frac{P(c)\, P(t \mid c)}{P(t)} \]

Equation 1 Naive Bayes

 

Where c is a specific class, either positive or
negative, and t is a tweet in the form of text that we want to classify. P(c)
and P(t) are the prior probabilities of this class and this text, and P(t | c)
is the probability that the text appears given this class. In our case, the
value of class c might be POSITIVE or NEGATIVE, and t is just a sentence.
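A minimal sketch of a multinomial Naive Bayes classifier along these lines is given below. It is not the project's actual code: the function names, the whitespace tokenizer, the toy training sentences and the add-one (Laplace) smoothing are all assumptions made for illustration. Because P(t) is the same for every class, the sketch leaves it out and compares log P(c) plus the sum of log P(word | c) across classes.

```javascript
// Splits a sentence into lower-case word tokens (simplified tokenizer).
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Counts documents per class and word occurrences per class.
function train(examples) {
  const model = { docCount: {}, wordCount: {}, totalWords: {}, vocab: new Set(), total: 0 };
  for (const { text, label } of examples) {
    if (!model.wordCount[label]) {
      model.wordCount[label] = new Map();
      model.totalWords[label] = 0;
      model.docCount[label] = 0;
    }
    model.docCount[label] += 1;
    model.total += 1;
    for (const word of tokenize(text)) {
      const counts = model.wordCount[label];
      counts.set(word, (counts.get(word) || 0) + 1);
      model.totalWords[label] += 1;
      model.vocab.add(word);
    }
  }
  return model;
}

// Returns the class c that maximizes log P(c) + sum of log P(word | c).
function classify(model, text) {
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.docCount)) {
    let score = Math.log(model.docCount[label] / model.total); // log P(c)
    for (const word of tokenize(text)) {
      const count = model.wordCount[label].get(word) || 0;
      // Add-one smoothing (an assumption) keeps unseen words from zeroing P(t | c).
      score += Math.log((count + 1) / (model.totalWords[label] + model.vocab.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Toy training data, invented for this example only.
const model = train([
  { text: "glad success glory", label: "POSITIVE" },
  { text: "success satisfy glad", label: "POSITIVE" },
  { text: "corrupt loss defeat", label: "NEGATIVE" },
  { text: "evil loss corrupt", label: "NEGATIVE" },
]);
console.log(classify(model, "glad success"));
console.log(classify(model, "loss evil"));
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.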


Chapter 4

2     
METHODOLOGY

 

Figure 1 Methodology

                                                        

2.1  
User Interface

It is a desktop application;
real-time results are shown in the form of charts. Charts are also provided
for the individual keywords. The user can easily see the news for a trending
keyword, as well as related news, at run time.

Figure 2 Homepage

 

 

Figure 3 Trending Keywords

Figure 4 Tweets Related to Keywords

 

 

Figure 5 Sentiment Analysis

 

2.2  
Operations

2.2.1  Extract Trending Words

Trending keywords
are extracted from the Google Trends API. The keywords are related to
specific news. News related to a specific trending keyword is obtained through
Google News.

 

2.2.2   
Tweet Extraction

Tweets are extracted from Twitter through the Twitter
API. A tweet can be at most 140 characters long, and 100 or 200 tweets are
extracted at a time.
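Collecting tweets in groups of a fixed size can be sketched as follows. The helper createBatcher and its callback are hypothetical names introduced for this illustration; the batch size of 100 is taken from the text, and the incoming tweets are simulated rather than read from Twitter.

```javascript
// Hypothetical batch collector: groups incoming tweets into fixed-size batches.
function createBatcher(batchSize, onBatch) {
  let buffer = [];
  return {
    // Adds one tweet and hands over the batch once it is full.
    push(tweet) {
      buffer.push(tweet);
      if (buffer.length === batchSize) {
        onBatch(buffer);
        buffer = [];
      }
    },
    // Hands over whatever is left, for example when the stream closes.
    flush() {
      if (buffer.length > 0) {
        onBatch(buffer);
        buffer = [];
      }
    },
  };
}

const batches = [];
const batcher = createBatcher(100, (batch) => batches.push(batch));
// Simulated stream of 250 tweets.
for (let i = 0; i < 250; i++) batcher.push({ id: i, text: `tweet ${i}` });
batcher.flush();
console.log(batches.map((b) => b.length)); // batch sizes: 100, 100, 50
```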

 

2.2.3   
Sentiment Analysis

Sentiment analysis sorts the tweets into positive, negative and neutral
classes. Each class receives a percentage share based on the
analysis performed.
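One simple way to obtain such percentages is to count the labels and divide by the total. The sketch below assumes that each tweet has already been given one of the three labels; the function name sentimentShares and the sample labels are invented for illustration.

```javascript
// Turns a list of per-tweet labels into percentage shares per class.
function sentimentShares(labels) {
  const counts = { positive: 0, negative: 0, neutral: 0 };
  for (const label of labels) {
    // Labels outside the three known classes are ignored.
    if (Object.prototype.hasOwnProperty.call(counts, label)) counts[label] += 1;
  }
  const total = counts.positive + counts.negative + counts.neutral;
  const shares = {};
  for (const key of Object.keys(counts)) {
    // Guard against division by zero when no tweets were labeled.
    shares[key] = total === 0 ? 0 : (counts[key] * 100) / total;
  }
  return shares;
}

const labels = [
  "positive", "positive", "positive", "positive", "positive",
  "negative", "negative", "negative",
  "neutral", "neutral",
];
const shares = sentimentShares(labels);
console.log(shares);
```

For the ten sample labels above, the shares are 50 percent positive, 30 percent negative and 20 percent neutral.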

 

2.2.4   
Real Time Analysis

Real-time analysis is shown in the form of bar charts.
Each bar chart is divided into three categories: positive, neutral and
negative.

 

2.2.5   
User Characteristics

Users of this application are expected to have basic
computer knowledge.

 

 

 

 

2.3  
Software Requirements

The following
software is required for the application to install, run and perform:

·        
Node.js

·        
Windows
2000/XP/2003/Vista/7/8/2012 Server/8.1/10

 

2.4  
Hardware Requirements

The following minimum hardware is required for the
application to install, run and perform:

·          
32MB of RAM

·          
Intel Pentium 4, Celeron 4,
Dual Core, Core 2 Duo/Quad, Core i3/i5/i7

·           
A network connection

 

2.5  
Reliability

Our software is quite reliable, but there remains a
small probability that it fails to give good results.

 

2.6  
Availability

·        
Node.js is required for
implementation

·        
There should be a valid
dataset of tweets.

·        
Access to Google Trends is
required, as we extract our trending keywords from it.

 

 

 

Chapter 5

3     
IMPLEMENTATION

 

 

3.1   Trending Keyword Extraction

 

 

3.2   Tweets Extraction

 

Extraction of tweets is
quite easy. After installing Node.js, a library needs to be activated. Then
sign up on Twitter if you are a new user. Once the account is established,
go to https://dev.twitter.com/, click the Create Twitter App button
and fill in the required information. The API keys are then easy to access.
The Application Programming Interface (API) provides four keys: Consumer
Key, Consumer Secret Key, Access Token and Access Token Secret. Add these keys
to the source code, and tweet extraction starts.
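Before opening the stream it can help to check that all four keys are present. The helper below is a hypothetical sketch: the property names (consumerKey, consumerSecret, accessToken, accessTokenSecret) are chosen for this example and may differ from the option names expected by the Node library that is actually used, and the key values are placeholders.

```javascript
// The four credentials described in the text, under hypothetical property names.
const REQUIRED_KEYS = ["consumerKey", "consumerSecret", "accessToken", "accessTokenSecret"];

// Returns an object holding exactly the four keys, or throws if any is missing.
function loadCredentials(source) {
  const missing = REQUIRED_KEYS.filter((key) => !source[key]);
  if (missing.length > 0) {
    throw new Error(`Missing Twitter credentials: ${missing.join(", ")}`);
  }
  return Object.fromEntries(REQUIRED_KEYS.map((key) => [key, source[key]]));
}

// Placeholder values; real keys come from the Twitter developer page.
const credentials = loadCredentials({
  consumerKey: "YOUR_CONSUMER_KEY",
  consumerSecret: "YOUR_CONSUMER_SECRET",
  accessToken: "YOUR_ACCESS_TOKEN",
  accessTokenSecret: "YOUR_ACCESS_TOKEN_SECRET",
});
console.log(Object.keys(credentials));
```

The validated object can then be passed to whichever streaming library is in use, after renaming the properties to the names that library expects.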

 

3.2.1  Pre Processing

 

Tweets are extracted in raw form and contain
many ineffective words that must be removed in order to make
sense of those tweets. So we carry out preprocessing on this large dataset
with the help of libraries. Preprocessing cleans the dataset by handling
the following:

·          
Misspellings

·          
Stop Words

·          
Special Characters

·          
White Spaces

·          
Transform to Lower Case
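Most of the steps listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual pipeline: the stop-word list is a tiny sample, URL removal is an extra step added here, non-ASCII letters are dropped for simplicity, and misspelling correction is left out because it needs a dictionary.

```javascript
// Small illustrative stop-word list; a real list would be much longer.
const STOP_WORDS = new Set(["a", "an", "the", "is", "are", "to", "of", "and", "in", "this"]);

// Cleans one raw tweet and returns the remaining word tokens.
function preprocess(tweet) {
  return tweet
    .toLowerCase()                    // transform to lower case
    .replace(/https?:\/\/\S+/g, " ")  // drop URLs (extra step, an assumption)
    .replace(/[^a-z0-9\s]/g, " ")     // remove special characters
    .split(/\s+/)                     // split on runs of white space
    .filter((word) => word && !STOP_WORDS.has(word)); // remove stop words
}

const cleaned = preprocess("The Election is TODAY!!!   Vote #now http://t.co/abc");
console.log(cleaned.join(" "));
```

For the sample tweet the remaining tokens are election, today, vote and now.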

 

 

3.3   Sentiment Identification

 

Public opinion varies from
person to person. Everyone has their own perception and point of view about
anything. The three major kinds of opinion are positive, negative and neutral.
Positive sentiment covers words that are progressive and have a favorable
influence, for example satisfy, truth, increase, glad, glory, success and many
like them [2]. Negative sentiment covers words that are
non-progressive and immoral, for example corrupt, disqualify, decrease, loss,
evil, defeat and many like them. Neutral sentiment covers words
that are neither positive nor negative in nature.
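A word-list approach of this kind can be sketched as below. The two lists contain only the example words quoted above, and the scoring rule (plus one for each positive word, minus one for each negative word, neutral on a tie) is an assumption made for this illustration rather than the project's documented method.

```javascript
// Example words taken from the text; real lexicons are far larger.
const POSITIVE_WORDS = new Set(["satisfy", "truth", "increase", "glad", "glory", "success"]);
const NEGATIVE_WORDS = new Set(["corrupt", "disqualify", "decrease", "loss", "evil", "defeat"]);

// Labels a list of lower-case tokens by counting lexicon hits.
function identifySentiment(tokens) {
  let score = 0;
  for (const token of tokens) {
    if (POSITIVE_WORDS.has(token)) score += 1;
    if (NEGATIVE_WORDS.has(token)) score -= 1;
  }
  if (score > 0) return "positive";
  if (score < 0) return "negative";
  return "neutral"; // no hits, or positive and negative hits cancel out
}

console.log(identifySentiment(["glad", "success", "loss"]));
console.log(identifySentiment(["evil", "defeat"]));
console.log(identifySentiment(["weather", "today"]));
```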

 

 

3.4   Real Time Analysis

 

Real-time analysis of data extracted from social media
platforms (Facebook, Twitter, Instagram, Snapchat, etc.) provides data
that is live, instant and direct. This type of data is very
effective in situations where we need to analyze and study current conditions
related to a given situation [5].

At the last stage of our project, for real-time sentiment results, we
generated bar charts using a Node chart library, as shown in the figure.

 

 

3.5   Result Discussion

 

We first present
our results for the objectivity/subjectivity and positive/negative
classifications. These results act as the first step of our
classification approach. We use only the features for both of these results.
This means that we have 5 features for the objectivity/subjectivity
classification and 3 features for the positivity/negativity classification.
For both of these results we use the Naïve Bayes classification algorithm,
because that is the algorithm we employ at the first step of our actual
classification approach.

 
