For this lab you will be using Ruby. Although Ruby is a hybrid language that combines scripting, imperative, functional, and object-oriented concepts, we will focus only on the scripting and imperative elements. Thus, solutions utilizing functional or OO capabilities will be heavily penalized.

You can install Ruby using the instructions found at the following sites:

- Windows
- Mac
- Linux

Concepts we will explore as a part of this lab:

- Regular Expressions -> See this tutorial and this too
- Ruby -> See this tutorial
- Procedural Languages

## Step -1: Fork and Clone this Repository

1. Using the Fork button above, fork your own version of this repository.
2. On your system, use your installation of Git to clone your fork of this repo to your local machine.
3. Once you have the repository cloned, you can continue on.

## Step 0: Getting everything ready

1. Install Bundler: `gem install bundler`
2. Install project dependencies (from the project root directory): `bundler install binstubs`

## Dataset

This lab will make use of a dataset containing a million song titles. This dataset is used in various machine learning experiments and is provided by the Laboratory for the Recognition and Organization of Speech and Audio at Columbia University. I have added this dataset to the repository under the name `unique_tracks.txt`.

In addition, I have created a subset of this dataset containing only song titles that begin with the letter A. We will use this file for debugging and testing purposes. It can be found in the file `a_tracks.txt`.

## File Templates

There are four files of concern in the project directory:

- `ruby_lab.rb`: This is the Ruby template in which you will add code to complete this assignment. You are given some code; do not remove this code unless you are sure of what you are doing.
  If the tests do not pass due to your modifications, you will lose credit.
- `questions.txt`: This is the file in which you will answer identified questions as part of the lab.
- `a_tracks.txt`: A small testing data set which is a subset (only tracks with titles starting with A) derived from the next file.
- `unique_tracks.txt`: The million-record data set which we will use for the lab.

There are some other files:

- `spec/*spec.rb`: These are test files used by RSpec to evaluate your code.
- `.rspec`: This indicates to RSpec that this project can be tested.
- `Gemfile`: Provides the specification of dependencies for this project.
- `README.md`: This markdown file.

This program takes the dataset file as input. For example, I execute the program at the command line as follows:

    $ ruby ruby_lab.rb unique_tracks.txt

The initial template gives code that loops through each line of the file and prints out the line. You probably will not want to keep this line. Remember that you can press Ctrl+C to cancel the execution of the program.

## Pre-processing

### Step 1: Extract song title

Each line contains a track id, song id, artist name, and the song title, such as:

    TRWRJSX12903CD8446<SEP>SOBSFBU12AB018DBE1<SEP>Frank Sinatra<SEP>Everything Happens To Me

You are only concerned with the last field, the song title. As your first task, you will write a regular expression that extracts the song title and stores it in the variable `title`. You will discard all other information.

You may find this site useful in debugging your regular expression: https://regex101.com/. It allows you to test your regular expressions on a block of text that you provide.

### Step 2: Eliminate superfluous text

The song title, however, is quite noisy, often containing additional information beyond the song title. Consider this example:

    Strangers in the Night (Remastered Album Version) [The Frank Sinatra Collection]

You need to perform some pre-processing in order to clean up the song titles.
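
The title extraction described in Step 1 can be sketched as follows. This is only one possible approach, not the reference solution: the helper name `extract_title` is hypothetical, and the sketch assumes every valid line uses `<SEP>` as the field separator, as in the Sinatra example.

```ruby
# Hypothetical helper (not part of the provided template): returns the song
# title from one dataset line, or nil if the line contains no <SEP> at all.
def extract_title(line)
  # The greedy .* consumes everything up to the LAST <SEP>, so the capture
  # group holds only the final field, which is the song title.
  match = /.*<SEP>(.*)$/.match(line.chomp)
  return nil if match.nil?
  match[1]
end

line = "TRWRJSX12903CD8446<SEP>SOBSFBU12AB018DBE1<SEP>Frank Sinatra<SEP>Everything Happens To Me"
title = extract_title(line)
puts title
```

Returning `nil` for malformed lines lets the calling loop skip them with a simple check before moving on to the clean-up steps.
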
You will write a series of regular expressions that match a block of text and replace it with nothing.

Begin by writing a regular expression that matches a left parenthesis and any text that follows it. You need not match the right parenthesis explicitly. Replace the parenthesis and all text that follows it with nothing.

In the above Sinatra example, the modified title becomes `Strangers in the Night`.

Repeat this for patterns beginning with the left bracket, the left curly brace, and all the other characters listed below:

    ( [ { / _ : ` + = * feat.

Note that the above lists the left quote (the backtick, on the tilde key above Tab) and not the apostrophe (located left of the Enter key). This is a very important distinction. We do not want to omit the apostrophe, as it allows contractions.

Many of these characters have special meanings in Ruby. Make sure you properly escape symbols when necessary. Failing to escape characters properly will be the most common mistake made in this lab.

The last one listed above is the abbreviation `feat.`, short for "featuring" and followed by artist information you do not need to retain. For example, `Sunbeam feat. Vishal Vaid` becomes `Sunbeam`.

In most cases, these symbols indicate additional information that need not concern us for this exercise. The above steps will very occasionally corrupt a valid song title that actually contains, for example, parentheses in the song title. Do not worry about these infrequent cases; uniformly carry out the procedure listed above. These steps will catch and fix the vast majority of irregularities in the song titles.

### Step 3: Eliminate punctuation

Next, find and delete the following typical punctuation marks:

    ? ! . ; & @ % # |

Unlike before, delete only the symbol itself and leave all of the text that follows. Be sure to do a global match in order to replace all instances of the punctuation mark. Be careful to match the period itself, as the symbol `.` has a special meaning in regular expressions. This is true for many of the symbols above.
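
Steps 2 and 3 can be sketched together as shown below. This is a hedged illustration rather than the expected solution: the helper name `clean_title` is hypothetical, and the sketch assumes that `feat.` must be handled before periods are deleted, since otherwise the pattern would no longer match.

```ruby
# Hypothetical helper (not part of the provided template) covering Steps 2 and 3.
def clean_title(title)
  # Step 2: cut the title at the first marker character and drop the rest.
  # Inside a character class most symbols are literal; [ is escaped to avoid
  # a nested-class warning and / is escaped because it delimits the regex.
  cleaned = title.sub(/[(\[{\/_:`+=*].*$/, "")
  # Step 2 (continued): drop "feat." and the artist information after it.
  # The period is escaped so it matches a literal dot, not any character.
  cleaned = cleaned.sub(/feat\..*$/, "")
  # Step 3: delete every listed punctuation mark but keep the text around it.
  # gsub performs the global replacement the step asks for.
  cleaned = cleaned.gsub(/[?!.;&@%#|]/, "")
  cleaned
end

puts clean_title("Strangers in the Night (Remastered Album Version) [The Frank Sinatra Collection]")
puts clean_title("Sunbeam feat. Vishal Vaid")
```

Note that this sketch leaves a trailing space where the removed text began; splitting the title on whitespace later makes that harmless.
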
Again, refer to a list of escape characters specific to Ruby.

### Step 4: Filter out non-English characters

Lastly, ignore all song titles that contain a non-English character. (Hint: it may be easier to match titles that contain only English characters than to match titles that contain non-English characters.) I define English characters to include the word meta-character definition (typically `\w` and `\s` in most languages) as well as the apostrophe character. This process will allow a few non-English song titles to creep through (e.g., amore mio), but will eliminate the majority of non-English titles.

### Step 5: Set to lowercase

Convert all words in the title to lowercase. Ruby has a built-in method to do this for you.

### Self-Check

In the a_tracks dataset, after all filtering steps, I find 52,760 valid song titles.

N.B.: If you are close to my number (within 10s), that is sufficient. If you are way off (i.e., 100+), you should double-check your regular expressions.

This check can be performed using the following command:

On Mac or Linux:

    rspec spec/self_check_1_spec.rb

On Windows:

    rspec spec\self_check_1_spec.rb

## Bi-gram Counts

A bigram is a sequence of two adjacent words in a text. The frequency distribution of bigrams in text(s) is commonly used in statistical natural language processing (see http://en.wikipedia.org/wiki/Bigram). Across this corpus of one million song titles, you will count all the bigram words.

First, you need to split the title string into individual words. Next, you should use one or more data structures to keep track of these word pair counts. That is, for every word, you must keep track of the count for each word that follows it. I strongly recommend you design your data structure for fast retrieval. Put some thought into which data structure to choose.
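
One possible structure for these counts is a hash of hashes, sketched below. This is an assumption for illustration and not necessarily the instructor's choice: the global `$bigrams` and the helper `count_bigrams` are hypothetical names, and the title is assumed to be already cleaned and lowercased.

```ruby
# Assumed global (hypothetical name): maps each word to a hash of
# { following word => number of times it followed }.
$bigrams = Hash.new

# Hypothetical helper: adds the adjacent word pairs of one title to $bigrams.
def count_bigrams(title)
  words = title.split
  i = 0
  # Stop one short of the end, because the last word has no successor.
  while i < words.length - 1
    first = words[i]
    second = words[i + 1]
    # Create the inner hash on first use; its default count is 0.
    $bigrams[first] = Hash.new(0) if $bigrams[first].nil?
    $bigrams[first][second] += 1
    i += 1
  end
end

count_bigrams("love song for you")
count_bigrams("love song")
puts $bigrams["love"]["song"]
```

With this layout, both "how many distinct words follow love" (`$bigrams["love"].length`) and "how often does song follow love" (`$bigrams["love"]["song"]`) are direct lookups.
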
Once you have decided, you can compare your choice to mine.

### Self-Check

After you build and populate your bigram data structure, you can check yourself. This check can be performed using the following command:

On Mac or Linux:

    rspec spec/self_check_2_spec.rb

On Windows:

    rspec spec\self_check_2_spec.rb

In the a_tracks dataset:

- The most common word to follow happy is now.
- The most common word to follow sad is love.
- The most common word to follow love is song.
- There are 80 distinct words that follow the word love.
- The word song follows love 33 times.

## Building a Song Title

Now you are going to build a probabilistic song title. First, create a "most common word" function, `mcw()`. This function takes one argument, some word, and returns the word that most often followed that word in the dataset. If you find a tie, randomly select one value. For example, the line `puts mcw("computer")` should give you your answer to Question 4.

Now you are going to use this function to string together a song title. Beginning with a given starting word, write an iterative structure that strings together words that most commonly follow each other in the dataset. Continue until a word does not have a successive word in the dataset, or the count of words in your title reaches 20.

### Lab Questions

Use your data structure(s) on the unique_tracks dataset to answer these and all subsequent Lab Questions.

To answer these questions, execute the following command (in the terminal) from the root directory of your project:

On Mac or Linux:

    rspec spec/lab_quest_1_5_spec.rb -o lab_quest_1_5_output.txt

On Windows:

    rspec spec\lab_quest_1_5_spec.rb -o lab_quest_1_5_output.txt

1. Which word most often follows the word happy?
2. Which word most often follows the word sad?
3. How many different (unique) words follow the word computer?
4. Which word most often follows the word computer?
5. How many times does this word follow computer?

## User Control

Now add a loop that repeatedly queries the user for a starting word until they choose to quit.
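
The query loop relies on the title builder from Building a Song Title, so it helps to have that piece working first. Below is a minimal sketch under stated assumptions: it reads from a hypothetical global `$bigrams` laid out as a hash of hashes, the helper name `create_title` is invented for illustration, and the counts are toy data rather than values from the real dataset.

```ruby
# Toy data for illustration only; real counts come from the dataset.
$bigrams = {
  "one"  => { "two" => 2, "three" => 1 },
  "two"  => { "three" => 1 },
  "ping" => { "pong" => 1 },
  "pong" => { "ping" => 1 }
}

# Returns the word that most often follows `word`, or nil if none does.
def mcw(word)
  followers = $bigrams[word]
  return nil if followers.nil? || followers.empty?
  best_count = 0
  best_words = []
  followers.each do |candidate, count|
    if count > best_count
      best_count = count
      best_words = [candidate]   # new leader replaces earlier candidates
    elsif count == best_count
      best_words << candidate    # remember every word tied for the lead
    end
  end
  best_words.sample              # random pick breaks ties
end

# Hypothetical helper: chains mcw calls until no successor exists or the
# title (including the starting word) reaches max_words.
def create_title(start_word, max_words = 20)
  words = [start_word]
  current = start_word
  while words.length < max_words
    next_word = mcw(current)
    break if next_word.nil?
    words << next_word
    current = next_word
  end
  words.join(" ")
end

puts create_title("one")
```

In this sketch the 20-word cap counts the starting word, and a starting word with no followers yields a title consisting of that word alone.
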
I started it for you in the template. Your program will ask:

    Enter a word [Enter q to quit]:

For each word entered, use your code above to create a song title of 20 words (or fewer). Print out your newly designed song title. Repeat, querying the user for a new word.

### Self-Check

For the a_tracks dataset, this check can be performed using the following command:

On Mac or Linux:

    rspec spec/self_check_3_spec.rb

On Windows:

    rspec spec\self_check_3_spec.rb

- Using the seed word happy, you should get the title: `happy now the world of the world of the world of the world of the world of the world of`
- Using the seed word sad, you should get the title: `sad love song for you ready for you ready for you ready for you ready for you ready for you ready`
- Using the seed word computer, you should get the title `computer`, because no song titles in a_tracks contain the word computer.

### Lab Questions

To answer questions 6-9, execute the following command (in the terminal) from the root directory of your project. Question 10 should be answered in the `questions.txt` file.

On Mac or Linux:

    rspec spec/lab_quest_6_9_spec.rb -o lab_quest_6_9_output.txt

On Windows:

    rspec spec\lab_quest_6_9_spec.rb -o lab_quest_6_9_output.txt

6. Using the starting word happy, what song title do you get?
7. Using the starting word sad, what song title do you get?
8. Using the starting word hey, what song title do you get?
9. Using the starting word little, what song title do you get?
10. Try a few other words. What problem(s) do you see? Which phrase do you most often find recurring in these titles?

## Stop Words

Next, try to fix the aforementioned problem(s) you observed in Question 10. In NLP, stop words are common words that are often filtered out, such as common function words and articles.
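
A minimal sketch of such a filter follows, using the stop-word list this section specifies. The constant `STOP_WORDS` and the helper `remove_stop_words` are hypothetical names, and the sketch assumes the title has already been cleaned and lowercased so that a plain string comparison is enough.

```ruby
# The stop words specified in this section, as an array of strings.
STOP_WORDS = %w[a an and by for from in of on or out the to with]

# Hypothetical helper: returns the title with every stop word removed.
def remove_stop_words(title)
  kept = []
  title.split.each do |word|
    # Keep only the words that are not in the stop-word list.
    kept << word unless STOP_WORDS.include?(word)
  end
  kept.join(" ")
end

puts remove_stop_words("the end of the world")
```

Filtering whole words after splitting, rather than deleting substrings with a regular expression, avoids damaging words that merely contain a stop word, such as "android" or "forever".
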
Before taking your bigram counts, filter out the following common stop words from the song title:

    a, an, and, by, for, from, in, of, on, or, out, the, to, with

### Lab Questions

To answer questions 11-13, execute the following command (in the terminal) from the root directory of your project. Questions 14 and 15 should be answered in the `questions.txt` file.

On Mac or Linux:

    rspec spec/lab_quest_11_13_spec.rb -o lab_quest_11_13_output.txt

On Windows:

    rspec spec\lab_quest_11_13_spec.rb -o lab_quest_11_13_output.txt

11. Using the starting word amore, what song title do you get?
12. Using the starting word love, what song title do you get?
13. Using the starting word little, what song title do you get?
14. Explain why so many of the titles devolve into repeating patterns.
15. Try several words. Find a song title that terminates in fewer than 20 words. Could you find one? If so, which song title did you find? If not, why not?

## Last Step

Implement a fix for the problematic phenomenon you observed in Question 6. If you have successfully solved these problems, you can remove the restriction of 20 words maximum in the song title. (Hint: If it goes boom, then you have not solved the problem.)

### Lab Questions

Answer the following questions (Questions 16 through 20) in the `questions.txt` file:

16. Describe in one or two paragraphs your extension and how it fixed the repeating phrase/word problem.
17. Using the starting word montana, what song title do you get?
18. Using the starting word bob, what song title do you get?
19. Using the starting word bob again, do you get the same title? If not, what do you get? Try it a third time. Explain why the title might differ each time.
20. Share your favorite song title that you have found.

## Troubleshooting

This lab requires an independent study of the Ruby language. You are encouraged to use any web tutorials and resources to learn this language. Given the size of the class, I will not be able to debug your code for you.
Please do not send panicked emails requesting I fix your bug for you. Allow yourself plenty of time, and use patience, perseverance, and the internet to debug your code.

### Lab Questions

The following questions are for feedback and evaluation purposes. Points are awarded for any sincere answer. These questions should be answered in the `questions.txt` file.

21. Name something you like about Ruby. Explain.
22. Name something you dislike about Ruby. Explain.
23. Did you enjoy this lab? Which aspects did you like and/or dislike?
24. Approximately how many hours did you spend on this lab?
25. Do you think you would use Ruby again? For which type(s) of project(s)?

## Grading

With the exception of questions 10, 14, 15, 16, 19, and 20, questions 1-20 are automatically graded by passing the associated tests. This is all or nothing: either you passed the tests or you did not. The remaining questions will be graded based on your answers and may receive partial credit. Questions 21-25 will be awarded full credit as long as there is a sincere and appropriate answer.

- Questions 1-20 -> 2 points each
- Questions 21-25 -> 1 point each
- Comments and Code Style -> 5 points

## Submission

Each student will complete and submit this assignment individually. Do not consult with others. However, you are encouraged to use the internet to learn any aspect of Ruby you need to complete the assignment, but not to answer the questions asked in this lab.

Comment your program heavily. Intelligent comments and clean, readable formatting of your code account for 20% of your grade.

Save the final version of your program and construct a zip file containing ONLY the following items:

- `ruby_lab.rb`
- `lab_quest_1_5_output.txt`
- `lab_quest_6_9_output.txt`
- `lab_quest_11_13_output.txt`
- `questions.txt`

Note: I will only accept plain text files for your answers. The submission of PDF, Word, images, or any other format except plain text will receive zero credit.