Introduction of applications of text classification
Text classification, also called text categorization or text tagging, is the task of assigning a set of predefined categories to free text. It can be used to organize, structure, and categorize text. For instance, news articles can be organized by topic. Text categorization can be used in a wide range of contexts, from categorizing short texts to organizing larger documents. Well-known examples include sentiment analysis, subject tagging, language detection, and intent detection. The most common of these is probably sentiment analysis, an automated process that determines whether a text is positive, negative, or neutral.
Classification can be used to tag content or products so that users can browse and find relevant content on a site more easily. Platforms such as e-commerce sites, blogs, and catalogs can use automation to categorize and tag content and products.
Using tags to classify the content on a site can help search engines such as Google crawl it more easily, which ultimately helps SEO. In addition, automating content tags on websites and applications standardizes them, which improves the user experience. Text classification can also support faster emergency response: by classifying panic conversations on social media, the authorities can monitor and classify emergencies as they happen and respond quickly.
As marketing becomes more targeted every day, automatically classifying users into cohorts can make life easier for marketers. Marketers can monitor and classify users based on how they talk about products or brands online. A classifier can be trained to identify promoters or detractors, so that the brand can address each group in a way that suits it.
Text classification techniques can also be used by academics, legal practitioners, social researchers, governments, and nonprofit organizations. Because these groups handle large amounts of unstructured text, the data is easier to process once it has been standardized by category label.

Description of the dataset and characteristics

Differences between the structured data scenario and the unstructured text mining scenario

Presentation and discussion of the results obtained
Effect of the variations of the dataset used

Perception of the possible rationale for doing the tasks

Comparison of the results

Similarities and differences
First of all, all three are classifiers: they solve the problem of statistical classification, in which a large amount of data has to be split into two or more categories. The goal of a classifier is to learn from labeled articles how to assign them to these categories, and then to classify new articles by itself.
Naive Bayes makes a conditional independence assumption: given the class, the features are independent of each other, are not coupled, and do not interfere with each other.
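Because of the conditional independence assumption, Naive Bayes does not need gradient-based optimization to learn its weights. Instead, it estimates them directly by counting how often each feature occurs in each class. Naive Bayes can also use the multivariate Bernoulli event model, in which each feature only records whether a word is present in a document or not.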
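To make the counting idea concrete, here is a minimal sketch of a multivariate Bernoulli Naive Bayes classifier in plain Python. It is not the code used for this assignment: the function names `train_bernoulli_nb` and `predict_bernoulli_nb`, the add-one (Laplace) smoothing, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bernoulli_nb(docs, labels):
    """Estimate a Bernoulli Naive Bayes model purely by counting.

    docs is a list of token lists, labels the matching list of class names.
    Returns (priors, cond_prob, vocab).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    class_counts = Counter(labels)

    # doc_freq[c][w] = number of class-c documents that contain word w
    doc_freq = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        doc_freq[label].update(set(doc))

    priors = {c: n / len(labels) for c, n in class_counts.items()}

    # Add-one smoothing (an assumption of this sketch) avoids zero probabilities.
    cond_prob = {
        c: {w: (doc_freq[c][w] + 1) / (class_counts[c] + 2) for w in vocab}
        for c in class_counts
    }
    return priors, cond_prob, vocab


def predict_bernoulli_nb(model, doc):
    """Return the class with the highest log-probability for one document."""
    priors, cond_prob, vocab = model
    present = set(doc)
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        # Bernoulli event model: absent words contribute log(1 - p) as well.
        for w in vocab:
            p = cond_prob[c][w]
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data, invented for this example.
toy_docs = [
    ["free", "money"],
    ["free", "offer"],
    ["meeting", "today"],
    ["project", "meeting"],
]
toy_labels = ["spam", "spam", "ham", "ham"]
toy_model = train_bernoulli_nb(toy_docs, toy_labels)
```

Calling `predict_bernoulli_nb(toy_model, ["free", "money", "now"])` classifies a new document. Words that were never seen in training, such as "now", are simply ignored, and no gradient step is involved at any point: training is a single pass of counting.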
Neural networks, in particular recurrent ones, can read data in sequence while retaining a memory of what they have previously read. This is very useful when working with text, because the words in a text are related to one another.
Naive Bayes belongs to a family of models called generative models. This means that during training, while it learns the classification, Naive Bayes first tries to find out how the data is generated. It is essentially trying to figure out the underlying distribution that produces the examples you enter into the model.
SVM is based on the discriminant function given by $y = w \cdot x + b$. Here, the weight vector $w$ and the bias parameter $b$ are estimated from the training data. SVM tries to find the hyperplane that maximizes the margin between the classes, which is posed as an optimization problem. It can also handle nonlinearities in the data efficiently, typically through kernel functions.
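The linear discriminant function w·x + b can be illustrated with a few lines of plain Python. This is only a sketch of how a trained linear SVM makes a decision: the weights and bias below are hand-picked for the example rather than learned, and the function names are assumptions, not part of any library.

```python
import math


def decision_function(w, x, b):
    """Value of the discriminant w . x + b for one feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Class +1 on one side of the hyperplane w . x + b = 0, class -1 on the other."""
    return 1 if decision_function(w, x, b) >= 0 else -1


def margin_width(w):
    """Distance between the hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hand-picked values for illustration only; a real SVM estimates w and b
# from the training data by maximizing the margin.
example_w = [1.0, -1.0]
example_b = -0.5
example_score = decision_function(example_w, [2.0, 0.5], example_b)
```

Since the margin width is 2 / ||w||, maximizing the margin is the same as making the norm of w as small as possible while still classifying the training examples correctly.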

Reflection on the assignment:
For this assignment, we split the work into parts and divided them between the two of us. As Junchao is stronger at writing and Yiluo is stronger at coding, Junchao took the description of the dataset and its characteristics, the differences between the structured data scenario and the unstructured text mining scenario, and the presentation and discussion of the results obtained.
Yiluo took the coding for text classification, as well as the brief introductory discussion of applications of text classification, the discussion of the similarities and differences between the three classifiers, and this reflection.
In this assignment, we had good teamwork and discussion. Both of us finished our work on time and efficiently.
The coding part had us stuck at first, as we did not know how to use enrol 1, 3, 5 for training and 2, 4 for testing. In the source code provided, the training and test data are all in one folder and are separated by a percentage, and the source code for Lab 3 could not run on our own computers or on the school computers because the required module could not be installed. We then put enrol 1, 3, 5 in one folder and enrol 2, 4 in another, loaded them under different names in Python, and used them as the training set and the test set. Fortunately this worked, and it shows all the FPR values in a neat table. At the bottom we post our code and the results of each classifier with its FPR.
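A folder-based split of this kind, together with the false positive rate (FPR), can be sketched as follows. This is an illustrative sketch and not the code we submitted: the directory layout (`root/<folder>/<label>/*.txt`), the `latin-1` encoding, and the function names `load_folders` and `false_positive_rate` are assumptions made for the example.

```python
from pathlib import Path


def load_folders(root, folder_names, encoding="latin-1"):
    """Read every .txt file under root/<folder>/<label>/ for the given folders.

    The layout is an assumption of this sketch: each folder holds one
    sub-directory per class, and the sub-directory name is used as the label.
    Returns (texts, labels).
    """
    texts, labels = [], []
    for folder in folder_names:
        for label_dir in sorted((Path(root) / folder).iterdir()):
            if not label_dir.is_dir():
                continue
            for path in sorted(label_dir.glob("*.txt")):
                texts.append(path.read_text(encoding=encoding, errors="ignore"))
                labels.append(label_dir.name)
    return texts, labels


def false_positive_rate(y_true, y_pred, positive):
    """FPR = FP / (FP + TN), with `positive` as the positive class."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0
```

With this layout, the training set would come from one call listing the three training folders and the test set from a second call listing the two test folders, so no percentage split is needed. The FPR is then computed on the test labels and the predictions of each classifier.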
The assignment also taught us what text classification is and how to use it in data mining. We also learned about the differences and similarities between the classifiers. This assignment let us try out the basics of Python, and it improved our programming skills, as we had never used Python before.
If we did the assignment again, we would try different ways of loading the dataset and different ways of using the classifiers, as each classifier uses its own algorithm and its own code. It was interesting for us to study how data mining can be used in everyday life and how to use Python as a programming language.
Reference

Code
Code for 70% training and 30% testing:

Code for using environ1, environ3 and environ5 for training and environ2 and environ4 for testing:

Different algorithms:

Naive Bayes:

Neural Network:

SVM:
