CSYE7245 Lab 3 - Apache Kafka


This lab demonstrates implementing and leveraging Kafka services for static data along with real-time Twitter streaming.

Apache Kafka is a streaming message platform. It is a publish-subscribe based durable messaging system. Kafka is designed to be high performance, highly available, and redundant. It is used to collect, process, store, and integrate data at scale. A messaging system sends messages between processes, applications, and servers.

Its basic use cases include:

  • Stream Processing
  • Messaging
  • Website Activity Tracking
  • Log aggregation
  • Event Sourcing
  • Application health monitoring

There are four main parts in a Kafka system:

Broker: Handles all client requests (produce, consume, and metadata) and keeps data replicated within the cluster. There can be one or more brokers in a cluster.

Zookeeper: Keeps track of the status of the Kafka cluster (brokers, topics, users)

Producer: Sends records to a broker

Consumer: Consumes batches of records from the broker
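
As a quick illustration of these roles, the following Python sketch (using the kafka-python client and assuming a broker is already running on localhost:9092) sends a metadata request to the broker and prints the topics it knows about:

from kafka import KafkaConsumer

# Connect to the broker and request cluster metadata; no topic subscription is needed.
# localhost:9092 is an assumed address for a locally running broker.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
print(consumer.topics())  # set of topic names currently known to the broker
consumer.close()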

Experiment setup

Prerequisites:

  1. Installing Oracle VM VirtualBox

Specifications:

  • 4 GB RAM
  • 25 GB Hard Drive
  • Downloading the Ubuntu ISO file

Oracle VM VirtualBox Manager

  2. Installing VirtualBox Guest Additions

sudo apt install build-essential dkms linux-headers-$(uname -r)

  • Enables easy copy/paste between host and guest
  • Enables full-screen mode
  • Provides built-in kernel headers/packages needed for additional functionality
  3. Installing Python

Installing the latest version of Python

sudo apt install python3

sudo apt install python3-pip

python3 --version

  4. Installing the AWS CLI

AWS CLI helps to access multiple AWS services and functionalities from the command line.

sudo apt install curl

curl https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -o awscliv2.zip

unzip awscliv2.zip

sudo ./aws/install

/usr/local/bin/aws --version

  5. Connecting with AWS

Connecting the VM to an AWS account by entering the Access Key ID and Secret Access Key

aws configure

aws s3 ls

  6. Installing the Java JDK

A Java JDK is required to start the Kafka broker and services

sudo apt update

sudo apt list

sudo apt install default-jre

sudo apt install default-jdk

javac -version

  7. Installing PyCharm on Ubuntu

Test Results

  1. Installing Kafka

Download Apache Kafka from the official downloads page (https://kafka.apache.org/downloads)

Extract the Kafka binaries using tar -xzvf

pip3 install kafka-python
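
To confirm that the kafka-python client installed correctly, a short check is to import the package and print its version:

import kafka

# Confirms the kafka-python package is importable and shows its version.
print(kafka.__version__)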

  2. Starting the Zookeeper service and Kafka broker

Navigate to the directory where the Kafka archive was extracted and start the Zookeeper service

bin/zookeeper-server-start.sh config/zookeeper.properties

Start the Kafka broker in a new terminal

bin/kafka-server-start.sh config/server.properties
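
With Zookeeper and the broker running, a topic for testing can be created programmatically with kafka-python's admin client. The sketch below is illustrative; the topic name test-topic and the broker address localhost:9092 are assumptions for a single-broker setup:

from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the local broker and create a single-partition, unreplicated test topic.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="test-topic", num_partitions=1, replication_factor=1)])
admin.close()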

Use Cases

Collecting real-time sampled tweets from Twitter and publishing them to our Kafka Broker

  1. producer.py

Running the script producer.py for generating events
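
The lab's producer.py is not reproduced here, but a minimal sketch of such a producer (the topic name test-topic and the event payload are illustrative) could look like:

import json
import time
from kafka import KafkaProducer

# Serialize each event as JSON before publishing it to the broker.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for i in range(10):
    event = {"event_id": i, "timestamp": time.time()}
    producer.send("test-topic", value=event)  # publish one event per second
    time.sleep(1)

producer.flush()  # ensure all buffered records reach the broker
producer.close()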

  2. consumer.py

Running the script consumer.py to consume the events published by the producer.
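
Correspondingly, a minimal consumer sketch (again assuming the test-topic topic and a local broker) that prints every event it receives:

import json
from kafka import KafkaConsumer

# Read the topic from the beginning and deserialize each record from JSON.
consumer = KafkaConsumer(
    "test-topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    print(message.value)  # the event published by the producer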

  3. twitter-stream.py

Using the twitter-stream.py script to fetch tweets from Twitter's API in real time.

Entering our bearer token in the twitter-stream.py script under the BEARER_TOKEN parameter.

Tweets are published to the Kafka Broker.
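
The twitter-stream.py script itself is not shown here; a hedged sketch of the same idea, reading Twitter's v2 sampled-stream endpoint with the requests library and forwarding each tweet to Kafka (the BEARER_TOKEN value, the topic name, and the broker address are placeholders), might look like:

import json
import requests
from kafka import KafkaProducer

BEARER_TOKEN = "<your-bearer-token>"  # placeholder; substitute your own token
STREAM_URL = "https://api.twitter.com/2/tweets/sample/stream"

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Open a long-lived HTTP connection to the sampled-stream endpoint.
response = requests.get(
    STREAM_URL,
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    stream=True,
)
response.raise_for_status()

for line in response.iter_lines():
    if line:  # skip keep-alive newlines
        tweet = json.loads(line)
        producer.send("twitter-stream", value=tweet)  # publish each tweet to Kafka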

On running consumer.py again, we can see all the published events that are collected by the consumer.

Lessons learned

  1. Learnt how to configure Oracle VM VirtualBox with the Ubuntu operating system
  2. Learnt the fundamentals of Apache Kafka
  3. Implemented real-time data streaming in Apache Kafka using the Twitter API

References

https://docs.cloudera.com/documentation/enterprise/6/6.1/PDF/cloudera-kafka.pdf

https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html
