[Solved] CSYE7245 Lab8-Airflow TFX

This lab demonstrates how Airflow can be used to programmatically author, schedule, and monitor workflows.

  • Airflow is a platform to programmatically automate, schedule and monitor workflows.
  • In Airflow, a DAG (Directed Acyclic Graph) is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
  • The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.
  • Rich command line utilities make performing complex surgeries on DAGs a snap.
  • The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
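To make the DAG idea concrete, here is a minimal sketch in plain Python. It uses the standard library's graphlib module (Python 3.9+) rather than Airflow itself, the task names (extract, transform, load, report) are hypothetical, and it only illustrates dependency ordering. It is not how the Airflow scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, each mapped to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because load and report both depend only on transform, either may come first in the result. A scheduler is free to run such independent tasks in parallel on different workers.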

Experiment Setup

  1. Create a new project and install the required dependencies.

pip install apache-airflow

  1. Export AIRFLOW_HOME

Set it to the path of the present working directory:

export AIRFLOW_HOME=$(pwd)

  1. Initialize the instance

airflow db init

  1. Create an admin user

airflow users create \
    --username admin \
    --firstname YourName \
    --lastname YourLastName \
    --role Admin \
    --email [email protected]

  1. Start the web server as a daemon in the background.

airflow webserver -D

By default, the web server listens on port 8080.

  1. To check whether Airflow Daemon is running:

List the services running on port 8080

lsof -i tcp:8080
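The same check can be sketched in Python for systems where lsof is not available. The helper name is_port_open is hypothetical. It only reports that something accepts TCP connections on the port, not that the listener is actually Airflow.

```python
import socket


def is_port_open(host="127.0.0.1", port=8080, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("web server reachable:", is_port_open())
```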

  1. Start the scheduler

airflow scheduler

Check the web server on 127.0.0.1:8080

  1. Create folder dags inside AIRFLOW_HOME

Place the DemoDag python file under the dags folder.
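As a rough sketch, the folder can also be created from Python. The helper name ensure_dags_folder is hypothetical. It assumes the default layout in which Airflow looks for DAG files in the dags folder under AIRFLOW_HOME, and it falls back to ~/airflow when the variable is not set.

```python
import os
from pathlib import Path


def ensure_dags_folder(airflow_home=None):
    """Create <AIRFLOW_HOME>/dags if it is missing and return its path."""
    # Use the argument first, then the environment variable, then ~/airflow.
    home = Path(airflow_home or os.environ.get("AIRFLOW_HOME", "~/airflow")).expanduser()
    dags_dir = home / "dags"
    dags_dir.mkdir(parents=True, exist_ok=True)
    return dags_dir


# Example call (not executed here): ensure_dags_folder(os.getcwd())
```

The call is idempotent, so it is safe to run again after the folder already exists.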

  1. Kill and restart the web server so that the DAGs show up in the UI

lsof -i tcp:8080

Kill the PID of each process listening on port 8080

  1. Start the web server again with airflow webserver -D
  2. The file can now be seen under the dags folder.
  • DAGs can be scheduled to run every minute, hourly, or daily
  • You can also pause or unpause a DAG as required
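The schedules mentioned above are usually written as cron expressions or as Airflow's preset strings. The sketch below lists the presets with their cron equivalents. The helper to_cron is hypothetical and exists only for illustration. In Airflow 2.x the string itself is passed to the DAG through its schedule_interval argument (named schedule in newer releases).

```python
# Airflow schedule presets and the cron expressions they stand for.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}

# There is no preset for "every minute"; it is written directly as cron.
EVERY_MINUTE = "* * * * *"


def to_cron(schedule):
    """Translate a preset such as '@daily' to cron; pass other strings through."""
    return PRESET_TO_CRON.get(schedule, schedule)


print(to_cron("@hourly"))
print(to_cron(EVERY_MINUTE))
```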
  1. Trigger your dag
  2. Check the logs for additional information
  1. Adding tasks to a DAG

Add task_2 by editing the DAG code, then click the Update button.

Check the logs for additional information.

  1. Restructuring the code

Airflow TFX

  1. Installing requirements.txt
  2. You can now see the taxi_pipeline DAG in the dags folder.
  3. The Tree view looks something like this:
  4. Successful execution of taxi pipeline.
  • ExampleGen ingests and splits the input dataset.
  • StatisticsGen calculates statistics for the dataset.
  • SchemaGen examines the statistics and creates a data schema.
  • ExampleValidator looks for anomalies and missing values in the dataset.
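The four stages above can be imitated with a toy sketch in plain Python. This is not the TFX API: the function names merely echo the component names, the rows and column names (trip_miles, fare) are made up, and the real ExampleValidator compares dataset statistics against the schema rather than checking rows one by one.

```python
import random
from statistics import mean


def example_gen(rows, train_fraction=2 / 3, seed=0):
    """Shuffle the rows and split them into train and eval sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def statistics_gen(rows):
    """Compute missing count, min, max, and mean for every column."""
    stats = {}
    for column in rows[0]:
        values = [r[column] for r in rows if r.get(column) is not None]
        stats[column] = {
            "missing": len(rows) - len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    return stats


def schema_gen(stats):
    """Derive a simple schema: allowed range and whether a value is required."""
    return {
        column: {"min": s["min"], "max": s["max"], "required": s["missing"] == 0}
        for column, s in stats.items()
    }


def example_validator(rows, schema):
    """List (row index, column, problem) for values that violate the schema."""
    anomalies = []
    for i, row in enumerate(rows):
        for column, rule in schema.items():
            value = row.get(column)
            if value is None:
                if rule["required"]:
                    anomalies.append((i, column, "missing"))
            elif not rule["min"] <= value <= rule["max"]:
                anomalies.append((i, column, "out of range"))
    return anomalies


# Made-up data: nine clean rows, then two suspect rows to validate.
rows = [{"trip_miles": m, "fare": 2.5 + 2.0 * m} for m in range(1, 10)]
train, evaluation = example_gen(rows)
schema = schema_gen(statistics_gen(train))
suspect = [{"trip_miles": 3, "fare": None}, {"trip_miles": 500, "fare": 8.5}]
print(example_validator(suspect, schema))
```

Each stage consumes only the output of the previous one, which is the same dependency structure that Airflow displays for the taxi pipeline.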

Lessons learned

  1. This lab helps us understand how Airflow allows users to create workflows with high granularity and track the progress as they execute.
  2. Airflow enables us to have a platform that can run and automate all the jobs on a schedule.
  3. You can also add/transform jobs as and when required.

References

  1. https://www.tensorflow.org/tfx/tutorials/tfx/airflow_workshop
  1. https://github.com/tensorflow/tfx/tree/master/tfx/examples/airflow_workshop
