Name: [SOLVED] Data Mining Projects List
Brand: Assignment Chef
SKU: 57565
Price: 25 USD
Availability: InStock
Rating: 5 (1 reviews)

5/5 - (1 vote)

Data Mining: Projects List

February 19th, 2025

1 Problem 1: Comparison of different positional encoding mecha–

nisms for Vision Transformers (ViTs)

Compare absolute positional encoding (APE) with various Relative Positional Encoding (RPE) meth- ods on regular classification tasks with Vision Transformers (ViTs). The family of RPE methods you test should include in particular: (1) regular additive RPE discussed in the class with 2L − 1 learnable parameters (where L stands for the sequence length), (2) a variant where the (i,j)-entry of the RPE matrix is given as a polynomial of the L1-distance between corresponding patches in the patch-grid and with learnable coefficients, (3) two RoPE variants (of your choice) taken from [Heo et al., 2024]. The comparison should be conducted on the MNIST and CIFAR datasets and should focus on the accuracy of different models on the evaluation set. It is fine to use a fraction of the CIFAR dataset for training different models, in the presence of limited computational resources.

2 Problem 2: Performer model for the custom attention kernel

Consider a customized attention matrix A ∈ RL×L (where L stands for sequence length), given as:

Ai,j = (qi(T)kj)4 (1)

Design a Performer model providing unbiased estimation of the above attention matrix and more computationally efficient that the regular Transformer applying that kernel. Consider two cases: (1) dQK(4) ≪ L and (2) dQK ≪ L, but dQK(4) ≫ L. In the latter setting, quantify the accuracy of your approximation (coming from the Performer model), by computing the empirical mean squared error of the attention matrix estimation as a function of the number of random projections that you use.

3 Problem 3: Linearization of the feedforward layers in NNs

Consider a feedforward layer of the following form, where W ∈ Rd×d , x, y ∈ Rd:

y = f(Wx). (2)

Assume that f is a GELU function. Propose an algorithm to approximate it via the following ”linearized variant”, where Φ : Rd×d → Rd×m , Ψ : Rd → Rm are some functions (to be constructed by you):

y′ = Φ(W)Ψ(x). (3)

The approximation does not need to be unbiased. Can you propose the unbiased variant ?

4 Problem 4: Smoothened version of the local attention matrix

In this problem, we do not make a distinction between query and key vectors, i.e. we assume that qi = ki for i = 1,…, L, where L stands for sequence length. Assume that query/key vectors for the attention mechanism are taken from R3 . Assume furthermore that the attention matrix used to model that data explicitly zeroes out interactions between tokens that are too far from each other:

for some hyperparameter δ > 0. Can you design an unbiased approximation of A of the form.

A = Q′ (K′)T , (5)

where Q′ , K′ ∈ RL×m and m is another hyperparameter (in particular we might have: m ≪ L) ?

References

[Heo et al., 2024] Heo, B., Park, S., Han, D., and Yun, S. (2024). Rotary position embedding for vision transformer. In Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., and Varol, G., editors, Computer Vision – ECCV 2024 – 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part X, volume 15068 of Lecture Notes in Computer Science, pages 289–305. Springer.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Whatsapp Us

[SOLVED] Data Mining Projects List

Reviews

Related products

[SOLVED] data MATLAB

[SOLVED] ENVIR301 Data Mining Census Records

[SOLVED] Secondary Data Analysis Lab Exercise

[SOLVED] DSCI 550 Building Visual Apps to Your Multimodal Haunted Places Data using Data Science Creating

[SOLVED] Csci 2270 – data structures assignment 4

[SOLVED] CSE 6242 / CX 4242: Data and Visual Analytics HW3