CS 4347: Assignments 1 to 4


1. This assignment will make use of the “Music & Speech” dataset of Marsyas:
• You can download the dataset from: https://opihi.cs.uvic.ca/sound/music_speech.tar.gz
• This dataset contains two copies of each recording: delete the music/ and speech/ directories and use the files in the music-wav/ and speech-wav/ directories. There are 64 music files and 64 speech files. Each file holds 30 seconds of audio stored as 16-bit signed integers at 22050 Hz.
• Ground-truth data for this dataset can be downloaded from IVLE. The format of the file is filename<tab>label<newline>, one song per line:
filename1<tab>label1
filename2<tab>label2
...
filename128<tab>label128
The label field is either music or speech.
2. Follow these steps to complete this assignment:
• Read the ground-truth file (music_speech.mf).
• Load each wav file and convert the data to floats by dividing the samples by 32768.0.
Hint: use scipy.io.wavfile.read()
• Split the data into buffers of length 1024 with 50% overlap (i.e. a hop size of 512). Keep only complete buffers; e.g. if the last buffer has only 1020 samples, omit it.
Hint: the starting and ending indices for the first few buffers are (the end index is not included in the array):

Buffer number   Start index   End index
0               0             1024
1               512           1536
2               1024          2048
...
We recommend that you use the “array slicing” feature provided by numpy:

for i in range(num_buffers):
    start = i * 512
    end = start + 1024
    buffer_data = whole_file_data[start:end]
• For each file, calculate the following time-domain features for each buffer. Given X = {x_0, x_1, x_2, ..., x_{N-1}} (N = 1024 for this assignment):
(a) Root-mean-square (RMS):

X_{RMS} = \sqrt{ \frac{1}{N} \sum_{i=0}^{N-1} x_i^2 }
(b) Zero-crossing rate (ZCR):

X_{ZCR} = \frac{1}{N-1} \sum_{i=1}^{N-1} \begin{cases} 1 & \text{if } x_i \cdot x_{i-1} < 0 \\ 0 & \text{otherwise} \end{cases}
• After calculating the features for each buffer, calculate the mean and uncorrected
sample standard deviation for each feature over all buffers for each file.
• Now you have finished calculating time domain features. To calculate frequency
domain features, multiply each buffer with a Hamming window.
Hint: use scipy.signal.windows.hamming()
• Perform a Discrete Fourier Transform for each windowed buffer.
Hint: use scipy.fft.fft().
Note: the DFT gives you both “positive” and “negative” frequencies, whose values
are mirrored around the Nyquist frequency. Discard the negative frequencies (whose
array indices are above N/2 for an FFT of length N).
• Calculate the following frequency domain features for each spectral buffer. Given a
spectral buffer X:
(a) Spectral Centroid (SC):

SC = \frac{ \sum_{k=0}^{N-1} k \cdot |X[k]| }{ \sum_{k=0}^{N-1} |X[k]| }
(b) Spectral Roll-Off (SRO): the smallest bin index R such that a fraction L of the total spectral magnitude lies below it. For this assignment, we will use L = 0.85:

\sum_{k=0}^{R-1} |X[k]| \ge L \cdot \sum_{k=0}^{N-1} |X[k]|
(c) Spectral Flatness Measure (SFM):

SFM = \frac{ \exp\left( \frac{1}{N} \sum_{k=0}^{N-1} \ln |X[k]| \right) }{ \frac{1}{N} \sum_{k=0}^{N-1} |X[k]| }
Note: computing the geometric mean on the log scale avoids forming a product of N magnitudes, which may overflow or underflow double-precision floating-point arithmetic.
• After calculating the features for each buffer, calculate the mean and uncorrected
sample standard deviation for each feature over all buffers for each file.
• Output your results to a new ARFF file named results.arff. Its header should be:
@RELATION music_speech
@ATTRIBUTE RMS_MEAN NUMERIC
@ATTRIBUTE ZCR_MEAN NUMERIC
@ATTRIBUTE SC_MEAN NUMERIC
@ATTRIBUTE SRO_MEAN NUMERIC
@ATTRIBUTE SFM_MEAN NUMERIC
@ATTRIBUTE RMS_STD NUMERIC
@ATTRIBUTE ZCR_STD NUMERIC
@ATTRIBUTE SC_STD NUMERIC
@ATTRIBUTE SRO_STD NUMERIC
@ATTRIBUTE SFM_STD NUMERIC
@ATTRIBUTE class {music,speech}
The format of the data section should be:
@DATA
RMS_MEAN1,ZCR_MEAN1,SC_MEAN1,SRO_MEAN1,SFM_MEAN1,RMS_STD1,ZCR_STD1,SC_STD1,SRO_STD1,SFM_STD1,music

Concretely, the @DATA section should be:
@DATA
0.057447,0.191595,128.656296,239.404651,0.329993,0.027113,0.036597,13.206525,27.957121,0.087828,music

0.062831,0.082504,78.481380,145.886047,0.198849,0.032323,0.070962,39.388633,66.942115,0.133545,speech
Note: Please keep at least 6 digits after the decimal point for output.
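The feature-extraction steps above can be sketched as follows. This is a minimal sketch under the stated assumptions (1024-sample buffers, hop 512, magnitudes taken on the non-negative-frequency half of the FFT), not the reference solution: the helper names are my own, and the small epsilon guarding ln(0) in the SFM is an assumption.

```python
import numpy as np
from scipy.signal.windows import hamming

def buffers(x, size=1024, hop=512):
    """Split a signal into complete, 50%-overlapping buffers."""
    n = (len(x) - size) // hop + 1
    return np.stack([x[i * hop : i * hop + size] for i in range(n)])

def rms(buf):
    return np.sqrt(np.mean(buf ** 2))

def zcr(buf):
    # fraction of adjacent sample pairs whose product is negative
    return np.sum(buf[1:] * buf[:-1] < 0) / (len(buf) - 1)

def spectral_features(buf, L=0.85):
    # Hamming window, then keep only the non-negative frequencies (bins 0..N/2)
    mag = np.abs(np.fft.fft(buf * hamming(len(buf))))[: len(buf) // 2 + 1]
    k = np.arange(len(mag))
    sc = np.sum(k * mag) / np.sum(mag)
    # smallest R with sum(mag[:R]) >= L * total, computed over the retained bins
    sro = np.searchsorted(np.cumsum(mag), L * np.sum(mag)) + 1
    # epsilon guards ln(0) for exactly-zero bins (an assumption, not in the spec)
    sfm = np.exp(np.mean(np.log(mag + 1e-12))) / np.mean(mag)
    return sc, sro, sfm

# quick sanity check on a synthetic 441 Hz tone at 22050 Hz
sr = 22050
x = np.sin(2 * np.pi * 441 * np.arange(sr) / sr)
B = buffers(x)
sc, sro, sfm = spectral_features(B[0])
```

For a full-scale sine, the RMS of each buffer should be close to 1/sqrt(2), and the SFM should be small, since a pure tone is the opposite of a flat spectrum.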
3. Submit a zip file to IVLE containing your source code (a single .py file) and the ARFF
file. Name the zip file using your student number (e.g. A0123456H.zip).
4. Note: You may use any python standard libraries, numpy (including pylab / matplotlib)
and scipy. No other libraries are permitted. Late submissions will receive no marks.
5. Grading scheme:
• 4/6 marks: correct ARFF file.
• 2/6 marks: readable source code (good variable names, clean functions, necessary
comments).

0. This assignment will use the same “Music & Speech” dataset that was used in assignment 1.
1. Follow these steps to complete this assignment:
• Read the ground-truth music_speech.mf file.
• Load each wav file and split the data into buffers of length 1024 with 50% overlap. Keep only complete buffers.
• Calculate the MFCCs for each window as specified in the lecture notes. Here are
more detailed steps:
– Given input x(t) and output y(t), the pre-emphasis filter should be
y(t) = x(t) − 0.95x(t − 1).
– Use a Hamming window before the mag-spectrum calculation.
– The Mel scale of frequency f is:

Mel(f) = 1127 \ln\left(1 + \frac{f}{700}\right)
– Calculate 26 mel-frequency filters, covering the entire frequency range (from
0 Hz to the Nyquist limit). To calculate the filters,
∗ find the X-axis points of the filters (left side, top, right side). All points must be converted into integer FFT bins; the left side should use the floor() operation, the top point should use round(), and the right point should use ceil().
∗ assign the left bin to be 0, top bin to be 1.0, right bin to be 0; linearly
interpolate between the rest.
– the log step should be log base 10.
– scipy has DCT built-in: scipy.fftpack.dct()
– do not calculate any delta-features
• Calculate the mean and standard deviation for each MFCC bin over the entire file.
So if there are M MFCC bins in each buffer, you will end up with a feature vector
of length 2M for each song.
• Write the data to an ARFF file (each line should contain the 26 means, followed by
the 26 standard deviations, and finally the class).
• Make two plots: the overall range of the triangular windows, and the triangular
windows from 0 to 300 Hz. They should match the examples below.
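The triangular mel filterbank described above can be sketched roughly as follows. This is a sketch under the stated assumptions (26 filters, 1024-point FFT, 22050 Hz sampling rate); the function names are my own, and np.round is used for the top bin even though it rounds half-to-even rather than half-up, which may differ slightly from the intended round().

```python
import numpy as np

def mel(f):
    """Mel scale: Mel(f) = 1127 ln(1 + f/700)."""
    return 1127.0 * np.log(1.0 + f / 700.0)

def mel_inv(m):
    """Inverse of the mel function above."""
    return 700.0 * (np.exp(m / 1127.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=1024, sr=22050):
    # n_filters triangles need n_filters + 2 points, evenly spaced on the
    # mel scale from 0 Hz to the Nyquist limit
    pts_hz = mel_inv(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins_all = pts_hz * n_fft / sr
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        # floor for the left edge, round for the top, ceil for the right edge
        left = int(np.floor(bins_all[i]))
        top = int(np.round(bins_all[i + 1]))
        right = int(np.ceil(bins_all[i + 2]))
        # left bin 0, top bin 1.0, right bin 0; linear in between
        fb[i, left:top + 1] = np.linspace(0.0, 1.0, top - left + 1)
        fb[i, top:right + 1] = np.linspace(1.0, 0.0, right - top + 1)
    return fb

fb = mel_filterbank()
```

Plotting each row of fb against the FFT bin (or Hz) axis should reproduce the two required plots: the full range, and a zoom on 0–300 Hz where the triangles are narrowest.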
2. Submit a zip file to IVLE containing your program’s source code (a single .py file), the ARFF file and the 2 plots. Name the zip file using your student number (e.g. A0123456H.zip).
Late submissions will receive no marks.
3. Note: You may use any python standard libraries, numpy (including pylab / matplotlib)
and scipy. No other libraries are permitted.
4. Grading scheme:
• 4/9 marks: correct ARFF file.
• 2/9 marks: 2 correct plots.
• 3/9 marks: readable source code (good variable names, clean functions, necessary
comments).

1. Visualization
Before performing any analysis of data, a good starting point is always trying to visualize
it. What does your data look like? To visualize, plot the following pairs of features:
• ZCR MEAN TIME (x-axis) and PAR MEAN TIME (y-axis)
• ZCR STD TIME (x-axis) and PAR STD TIME (y-axis)
Each plot should have axis labels, distinguishable markers and a legend. Save these plots
as zcr-par-mean.png and zcr-par-std.png. Hint: you can use the python library arff
to load an arff file. Could you use any of these features to distinguish between music and
speech? Why? Keep these questions in mind for the next section.
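The loading and plotting can be sketched as below. This is a minimal sketch with a hand-rolled ARFF reader rather than the python arff library mentioned in the hint; the tiny demo file, its attribute names, and the helper names are my own illustrations, and the reader assumes the simple layout used in this course (numeric attributes plus a final class label).

```python
import matplotlib
matplotlib.use("Agg")            # render to files without a display
import matplotlib.pyplot as plt

def load_arff(path):
    """Minimal ARFF reader: attribute names plus (values, label) rows."""
    attrs, rows = [], []
    in_data = False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("%"):
                continue
            if line.lower().startswith("@attribute"):
                attrs.append(line.split()[1])
            elif line.lower().startswith("@data"):
                in_data = True
            elif in_data:
                *vals, label = line.split(",")
                rows.append(([float(v) for v in vals], label))
    return attrs, rows

def scatter_pair(rows, attrs, xname, yname, outfile):
    """One scatter plot: distinct markers per class, labels, and a legend."""
    xi, yi = attrs.index(xname), attrs.index(yname)
    for label, marker in (("music", "o"), ("speech", "x")):
        xs = [v[xi] for v, l in rows if l == label]
        ys = [v[yi] for v, l in rows if l == label]
        plt.scatter(xs, ys, marker=marker, label=label)
    plt.xlabel(xname)
    plt.ylabel(yname)
    plt.legend()
    plt.savefig(outfile)
    plt.clf()

# toy demo file (the real input is the course-provided ARFF)
demo = """@RELATION demo
@ATTRIBUTE ZCR_MEAN_TIME NUMERIC
@ATTRIBUTE PAR_MEAN_TIME NUMERIC
@ATTRIBUTE class {music,speech}
@DATA
0.1,2.0,music
0.3,5.0,speech
"""
with open("demo.arff", "w") as f:
    f.write(demo)
attrs, rows = load_arff("demo.arff")
scatter_pair(rows, attrs, "ZCR_MEAN_TIME", "PAR_MEAN_TIME", "zcr-par-mean.png")
```

The second required plot is the same call with the STD attributes and zcr-par-std.png as the output name.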
2. Classification
Build a classifier to perform classification on the features in the given ARFF file in Weka
with trees.LMT (Logistic Model Tree) with 10-fold cross-validation and save the results.
Choose at least one other classification algorithm with 10-fold cross-validation to build
another classifier, perform classification and save the results.
Compare the results of these two algorithms and save your findings to a file called classification-results.txt. This file should answer at least these three questions:
• Which algorithm gives you the best results?
• Which features contribute most significantly in classification?
• Where do these features come from (time, spectral or perceptual domain) and why
do you think these features can contribute?
You are encouraged to write down any other findings. After answering these questions,
save the Weka results for the two algorithms and put them at the end of the text file.
3. Submit a zip file to IVLE containing the two plots (zcr-par-mean.png, zcr-par-std.png)
and the classification-results.txt. Name the zip file using your student number
(e.g. A0123456H.zip). Late submissions will receive no marks.
4. Grading scheme:
• 3/7 marks: correct and well labeled plots.
• 4/7 marks: results and discussion of Weka output.

1. Pitch tracking and wav file synthesis
Listen to scale.wav and write a program to synthesize a sinusoidal wav file which contains
the same notes. You may consider the following steps:
• Use open source pitch detection tools to detect the f0 of the notes in scale.wav.
Hint: you can use Sonic Visualiser with pYIN plugin.
• Check the output of pitch detection and find out the start and end time of each note.
• Generate a sine wave for each note and concatenate them together with the same
time arrangement as the original scale.wav.
• Use a sampling rate of 44100 Hz and save your synthesized wav file as sin_scale.wav.
Hint: use scipy.io.wavfile.write().
• Listen to sin_scale.wav and compare it with scale.wav. Why do the same notes sound different in these two wav files? Save your answer to comparison.txt.
Hint: you can plot the time-aligned spectrograms of two wav files to find the difference.
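The synthesis steps above can be sketched as follows. The (start, end, f0) triples here are hypothetical placeholders; the real values come from the pYIN pitch-track output for scale.wav, and the 0.5 amplitude is my own choice to leave headroom.

```python
import numpy as np
from scipy.io import wavfile

SR = 44100

def sine_note(freq, dur, amp=0.5):
    """Sinusoid of `freq` Hz lasting `dur` seconds at sampling rate SR."""
    t = np.arange(int(SR * dur)) / SR
    return amp * np.sin(2 * np.pi * freq * t)

# hypothetical (start_sec, end_sec, f0_hz) values; replace with the
# note boundaries and pitches read off the pYIN output
notes = [(0.0, 0.5, 261.63), (0.5, 1.0, 293.66)]

pieces = []
prev_end = 0.0
for start, end, f0 in notes:
    pieces.append(np.zeros(int(SR * (start - prev_end))))  # silence gap
    pieces.append(sine_note(f0, end - start))
    prev_end = end
signal = np.concatenate(pieces)

# scale floats to 16-bit integers before writing
wavfile.write("sin_scale.wav", SR, (signal * 32767).astype(np.int16))
```

Concatenating silence between notes preserves the original time arrangement even when the detected notes do not abut exactly.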
2. Note length modification
Choose one of the eight notes in scale.wav and increase the length of it by 2 seconds.
You may consider the following steps:
• Read the data from scale.wav and plot it.
• In the plot, find a single complete period of the note.
• Increase the length of the note by looping this period several times.
Hint: you can use numpy.concatenate().
• Save your modified data to long_scale.wav.
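The lengthening step can be sketched as below. This is a sketch, not the reference solution: the function name is my own, and in the real task period_start and period_len are sample indices read off the waveform plot of scale.wav, whereas the demo uses a synthetic tone whose period is exactly 50 samples.

```python
import numpy as np

def stretch_note(data, period_start, period_len, extra_seconds, sr=44100):
    """Lengthen one note by repeating a single complete period of its
    waveform in place, keeping everything before and after unchanged."""
    period = data[period_start : period_start + period_len]
    n_repeats = int(round(extra_seconds * sr / period_len))
    insert = np.tile(period, n_repeats)
    cut = period_start + period_len
    return np.concatenate([data[:cut], insert, data[cut:]])

# demo on a synthetic 441 Hz tone at 22050 Hz (period = exactly 50 samples)
sr = 22050
data = np.sin(2 * np.pi * 441 * np.arange(sr) / sr)
out = stretch_note(data, period_start=1000, period_len=50,
                   extra_seconds=2.0, sr=sr)
```

Because the repeated segment is a complete period starting at a zero crossing of the tone, the splice is click-free; with real audio, choosing the period boundaries carefully matters for the same reason.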
3. Submit a zip file to IVLE containing your source code, two wav files and your answer
to the question (comparison.txt). Name the zip file using your student number (e.g.
A0123456H.zip). Late submissions will receive no marks.
4. Grading scheme:
• 2/8 marks: correct wav file for task 1.
• 2/8 marks: correct answer to the question in task 1.
• 2/8 marks: correct wav file for task 2.
• 2/8 marks: readable source code.
