[SOLVED] Hw assignment 4 (a4): text classification cs6120

Hugging Face is a popular platform that was originally created to host open-source NLP models. It has since expanded its scope to include computer vision and audio tasks. The site offers a wide range of resources, tools, and services, with a primary focus on transformer-based models. The “Models” tab hosts a large collection of pre-trained transformer-based models for NLP tasks such as text classification, sentiment analysis, language generation, and translation. Users can explore and download these models for their own applications or fine-tune them on custom datasets. The “Datasets” tab offers datasets curated for NLP tasks across many topics, languages, and domains, which makes them useful for training and evaluating machine learning models. Users can explore, download, and contribute to these datasets.

Hugging Face “Spaces” let users create, share, and collaborate on NLP and machine learning projects. Equipped with tools for version control, collaboration, and deployment, they enable teams to work together efficiently on research or development projects. The “Community” tab serves as a hub for interaction among Hugging Face users. It features discussion forums, Q&A sections, and community-contributed resources such as tutorials, code snippets, and project showcases, where users can seek help and share their knowledge. The site also hosts blogs, competitions, and courses in ML.

This final assignment explores the capabilities of the platform, focusing on NLP classification tasks using both the `pipeline` interface and the AutoModel classes.

You will use the “yelp_review_full” dataset for this assignment. It is a large collection of reviews sourced from the business review platform Yelp. Each review is accompanied by a star rating, which makes the dataset a valuable resource for sentiment analysis and other natural language processing tasks.

The dataset covers various business categories and types, including restaurants, hotels, bars, salons, spas, retail stores, and more. Each review in the dataset contains textual content that provides insights into the reviewer’s experiences, opinions, and perceptions of the businesses they have visited. The reviews may vary in length and language style, reflecting the diverse nature of user-generated content on Yelp.

This assignment aims to test your familiarity with classification tasks for transformer-based models using the Yelp review dataset from Hugging Face.

Explore the Yelp review dataset using Python and Hugging Face’s Datasets library. Understand the structure of the dataset, including the features and labels.
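One possible starting point for the exploration step is sketched below. It assumes the dataset is published on the Hub under the id `yelp_review_full` with a `text` column and a five-way `label` column; the helper names `label_distribution` and `explore_yelp` are illustrative, not part of any library. The `datasets` import is deferred so that the counting helper can be tried without downloading anything.

```python
from collections import Counter


def label_distribution(labels):
    """Return the share of examples per label id, ordered by label id."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


def explore_yelp(split="train"):
    """Load one split and print its schema, one example, and the label balance.

    Assumption: the Hub id is "yelp_review_full". Requires the `datasets`
    package and network access, so the import happens only when called.
    """
    from datasets import load_dataset

    ds = load_dataset("yelp_review_full", split=split)
    print(ds)            # row count and column names
    print(ds.features)   # column types, including the label names
    print(ds[0])         # one raw example
    print(label_distribution(ds["label"]))
    return ds
```

Calling `explore_yelp("test")` in a notebook cell prints the structure of the test split; the label shares show whether the star ratings are balanced before any model is evaluated.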

Utilize the pipeline interface from the `transformers` library to perform sentiment analysis on Yelp reviews. Analyze the overall sentiment distribution and provide insights based on the results.
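A minimal sketch of the pipeline step follows. It assumes the `sentiment-analysis` task of `transformers.pipeline`, which returns one dictionary with `label` and `score` keys per input; the helper names `sentiment_distribution` and `run_sentiment` are made up for this example. When `model_name` is left as `None`, the library picks its default checkpoint for the task, so pass an explicit model id for reproducible results.

```python
from collections import Counter


def sentiment_distribution(predictions):
    """Convert pipeline output (a list of {'label', 'score'} dicts) to label shares."""
    counts = Counter(p["label"] for p in predictions)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def run_sentiment(texts, model_name=None, batch_size=16):
    """Classify texts with a transformers sentiment pipeline.

    Requires the `transformers` package and a model download, so the import
    is deferred until the function is called.
    """
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)
    # truncation=True cuts reviews that exceed the model's maximum input length
    return classifier(list(texts), truncation=True, batch_size=batch_size)
```

In a notebook, `sentiment_distribution(run_sentiment(reviews))` gives the overall split between the labels the chosen model emits, where `reviews` is any list of review strings.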

Run a tournament of three models of your choice and compare the results.

The comparison must include your interpretation of the types of mistakes each model made. It is not sufficient to simply report the accuracy or confusion matrix for each model.
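One way to look past raw accuracy is to profile how far and in which direction each model misses. The sketch below assumes that every model's predictions have already been mapped onto the same 0 to 4 star scale as the dataset labels; `error_profile` is a hypothetical helper, not a library function.

```python
def error_profile(true_labels, predicted_labels):
    """Summarize how predictions miss, not only how often.

    Both arguments are equal-length sequences of integer star ids (0 to 4).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    # Signed difference: positive means the model was too generous
    diffs = [p - t for t, p in zip(true_labels, predicted_labels)]
    n = len(diffs)
    return {
        "accuracy": sum(d == 0 for d in diffs) / n,
        "off_by_one": sum(abs(d) == 1 for d in diffs) / n,
        "far_miss": sum(abs(d) >= 2 for d in diffs) / n,
        "mean_signed_error": sum(diffs) / n,
    }
```

A model with a high `off_by_one` share mostly confuses neighboring ratings, while a high `far_miss` share points to polarity errors such as sarcasm or mixed reviews; reading a handful of the far misses per model is a natural next step.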

Summarize each review in the dataset using transformer-based models for text summarization (e.g., BART, T5). Use a pretrained model to predict the star rating based on the summaries instead of the full review text. Evaluate the model’s performance.
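The two-stage flow can be sketched as follows. Everything here is an assumption for illustration: the checkpoints `facebook/bart-large-cnn` and `nlptown/bert-base-multilingual-uncased-sentiment` are just one possible pairing, the rater is assumed to emit labels of the form `"4 stars"`, and the code presumes a `transformers` release that provides the `summarization` pipeline task. The helper names are hypothetical.

```python
def star_label_to_id(label):
    """Map a label such as '4 stars' to the dataset's 0-based id (here 3)."""
    return int(label.split()[0]) - 1


def summarize_then_rate(texts, summarizer, rater):
    """Summarize each text, then predict a star id from the summary alone.

    `summarizer` and `rater` are callables that take a list of strings and
    return one dict per string ('summary_text' and 'label' respectively).
    """
    summaries = [out["summary_text"] for out in summarizer(texts)]
    ratings = [star_label_to_id(out["label"]) for out in rater(summaries)]
    return list(zip(summaries, ratings))


def build_pipelines():
    """Create the two callables; needs `transformers` and model downloads."""
    from transformers import pipeline

    summ = pipeline("summarization", model="facebook/bart-large-cnn")
    clf = pipeline(
        "text-classification",
        model="nlptown/bert-base-multilingual-uncased-sentiment",
    )
    # truncation=True keeps long reviews within each model's input limit
    return (
        lambda texts: summ(texts, truncation=True),
        lambda texts: clf(texts, truncation=True),
    )
```

Passing the callables in as arguments keeps the flow easy to test with stand-ins, and the predicted ids can be compared with the true labels in the same way as predictions made from the full review text.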

Utilize the zero-shot classification capability provided by Hugging Face’s transformers library to categorize Yelp reviews into one of the 20 provided classes. The classes can include categories such as restaurants, hotels, bars, salons, spas, fitness centers, automotive services, and others. Evaluate the predictions made by three different zero-shot classification models (e.g., BART, T5, GPT).

Classes:

Manual Categorization:

Randomly select 100 reviews from the Yelp review dataset. Read each review and manually categorize it into one of the 20 provided classes. Maintain a record of the manual categorizations for comparison with the predictions made by the zero-shot classification models.
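A reproducible way to draw the 100 reviews is sketched below. The seed value and the helper name `sample_indices` are arbitrary choices for this example; fixing the seed means the manual labels stay aligned with the same reviews across notebook runs.

```python
import random


def sample_indices(dataset_size, k=100, seed=42):
    """Pick k distinct row indices, reproducibly, in ascending order."""
    if k > dataset_size:
        raise ValueError("cannot sample more rows than the dataset contains")
    rng = random.Random(seed)  # local generator; global random state is untouched
    return sorted(rng.sample(range(dataset_size), k))
```

With a dataset split `ds` loaded through the Datasets library, `ds.select(sample_indices(len(ds)))` returns the 100-review subset, which can then be exported and labeled by hand.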

Comparison and Accuracy Calculation:

Compare the manual categorizations with the predictions made by the zero-shot classification models for the subset of 100 reviews. Calculate the accuracy of each model by determining the percentage of reviews for which the predicted class matches the manually assigned class.
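The comparison can be sketched as below. It assumes the `zero-shot-classification` pipeline, whose output for each input is a dictionary with `labels` sorted by descending score; `facebook/bart-large-mnli` is used only as an example checkpoint, and the helper names are hypothetical. The candidate labels must be the 20 classes provided with the assignment, which are not reproduced here.

```python
def top_label(zero_shot_output):
    """Return the highest-scoring label from one zero-shot result dict."""
    return zero_shot_output["labels"][0]


def accuracy(manual_labels, predicted_labels):
    """Share of reviews where the prediction matches the manual label."""
    if len(manual_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not manual_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(m == p for m, p in zip(manual_labels, predicted_labels))
    return matches / len(manual_labels)


def classify_zero_shot(texts, candidate_labels, model_name="facebook/bart-large-mnli"):
    """Predict one class per text; needs `transformers` and a model download."""
    from transformers import pipeline

    clf = pipeline("zero-shot-classification", model=model_name)
    outputs = clf(list(texts), candidate_labels=list(candidate_labels))
    if isinstance(outputs, dict):  # a single input may come back unwrapped
        outputs = [outputs]
    return [top_label(o) for o in outputs]
```

Running `classify_zero_shot` once per model and passing each result to `accuracy` together with the manual labels yields the three percentages to compare; multiplying the returned fraction by 100 gives the percentage the task asks for.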

Please submit a fully executed Jupyter notebook that identifies the question number and the steps for each part. Make sure to add comments to your solution.
