5/5 - (1 vote)

In this assignment, you will develop a data pipeline to ingest, process, store it so you can access it through different means.

Preparation:

Review the tutorials we discussed in class on Snowflake, Apache Beam, Dataflow, Accessing Amazon S3 buckets, Sevir etc.
Review:

https://nbviewer.jupyter.org/github/MIT-AI-Accelerator/eie-sevir/blob/master/examples/SE VIR_Tutorial.ipynb

https://www.ncdc.noaa.gov/stormevents/ftp.jsp

Case:

You are a freshly minted data scientist engineer at WeSpace Inc. which leverages space and weather data to build forecasting systems! It is exciting and you just bought your first Tesla to celebrate. Post Covid, you drive to your office and your manager is kind enough to offer the corner office with the view of the ocean on one side and mountains on the other. She checks in to make sure you dont miss the freshly baked cookies that are at the kitchen. During group lunch, the team is engrossed in a fresh challenge. They plan to build a nowcasting system leveraging satellite data and weather datasets but everyone in the team wants to build models!

Where is the data ? You ask. Everybody stares at you and pretend it isnt a problem and continues the conversation about Deep Learning and other problems. Your manager

interjects and asks your question to the group again. That is an important question! Where is the data?.

Some team members say it is all on AWS and in csv files and HDF5 format.

How do we plan to access it? You ask!

Well, we will get there when we get there!, a team member says!

Your manager in your 1:1, commends you for asking the hard questions and gives you your first assignment! First, we need to try out how to access the data. She suggests 3 architectures and asks you to experiment and provide your recommendations.

Here is the dataset catalog:

https://github.com/MIT-AI-Accelerator/eie-sevir/blob/master/CATALOG.csv

Gives you information on the datasets we need for the project.

Experiments needed:

1. AWS S3

Leave the data here where it is: https://registry.opendata.aws/sevir/
Upload a sample of data to Amazon S3
Experiment with Amazon Glue and build a pipeline.
Show how to use Amazon Quicksight to query and sample datasets

2. Google

Download sample datasets and move it to Google storage buckets
Experiment with Google Dataflow and build a pipeline
Show how Google Bigquery and Data studio to query and sample datasets. (https://cloud.google.com/bigquery/docs/visualize-data-studio)

3. Snowflake

Download sample datasets and move it to Snowflake
Use Sql-alchemy and Apache Superset to query and sample datasets.https://superset.apache.org/docs/databases/snowflake

Questions to answer:

How can you combine satellite data + storm data?
Which architecture would you recommend for machine learning purposes?
What are the challenges working with different architectures?
Do a cost/benefit analysis for each architecture

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Whatsapp Us

[Solved] CSYE7245 Assignment1-Three experiments with Big data

1. AWS S3

2. Google

3. Snowflake

Reviews

Whatsapp Us

[Solved] CSYE7245 Assignment1-Three experiments with Big data

1. AWS S3

2. Google

3. Snowflake

Reviews

Related products

[Solved] CSYE7245 Lab10-Fastapi

[Solved] CSYE7245 Assignment3-APIs

[Solved] CSYE7245 Lab4- SQLAlchemy

[Solved] CSYE7245 Lab8-Airflow TFX

[Solved] CSYE7245 Assignment4

[Solved] CSYE7245 Lab 1- Getting started with AWS + Lambda