[Solved] COMP1007 Lab 8-Staff Salary Calculation and Charts

$25

File Name: COMP1007_Lab_8-Staff_Salary_Calculation_and_Charts.zip
File Size: 471 KB

SKU: [Solved] COMP1007 Lab 8-Staff Salary Calculation and Charts Category: Tag:
5/5 - (1 vote)

In this section, we are going to learn how to use advance feature of Pandas package to calculate staff salaries and use Matplotlib package to create different types of charts.

By finished this section, you will be able to:

  • Retrieve the content of the CSV files and create data frames.
  • Perform calculation with DataFrame.
  • Create summaries using Pivot Table.
  • Plot graphs.

Downloading the Data Files and Creating a Code File

Download the files door_access_records.csv and hourly_rates.csv from the BU eLearning web site and save them at your Spyder working directory.

Importing Required Library

Add the following lines to import the libraries:

import pandas as pd import numpy as np

Creating a Data Frame with the CSV File Content

Firstly, we have to load the data from the data file and create a data frame.

  1. Load the door access records from the file named csv and create a data frame named records. The pd.read_csv() function accepts a file name and returns a data frame.

Add the following line to create a data frame.

records = pd.read_csv(door_access_records.csv)

The data is stored in the records data frame as follows:

Date Staff Name Department Working Hours (In & Out)
0 6/15/2018 Jacky Lee IT 6:12-20:44
1 6/15/2018 Joyce Ho Human Resources 8:26-19:51
2 6/15/2018 Ivy Chu Sales 8:29-18:44
3 6/15/2018 David Chan Purchasing 8:33-19:50
4 6/15/2018 Gary Yuen Logistics 8:34-19:45
  1. Load the hourly salary rates from another file named csv and create a data frame named hourly_rates.

hourly_rate = pd.read_csv(hourly_rate.csv, index_col = Staff Name)

With the index_col parameter, we set the column Staff Name as an index. The data is stored in the hourly_rates data frame as follows:

Salary Rate (per hour)
Staff Name
Jacky Lee 60
Joyce Ho 70
Ivy Chu 55
David Chan 65
Gary Yuen 75

Computing the Working Hours

In the records data frame, the Working Hours (In & Out) column does not really store the working hours. It stores the IN time and OUT time of the staffs on individual days. The IN time and OUT time are stored in the same column with a minus sign as a separator. Now, we split the values and use them to compute the working hours. The following formula are used to compute the working hours:

( ) ( )

=

60

  1. Split the Working Hours (In & Out) column by using the split() function. We temporarily store the values in two sets of Series in_time and out_time.

in_time, out_time = records[Working Hours (In & Out)].str.split(-).str

The str.split() function accepts a string for splitting the values. In this example, we use a minus sign. And, we use .str to convert the results to string format. Now, in_time and out_time store time strings respectively.

  1. Store the string versions of IN time and OUT time in the records data frame.

records[In] = in_time

records[Out] = out_time

The records data frame has two more columns now.

  1. Convert the values stored in in_time and out_time to the datetime format, so as to compute the time differences.

in_time = pd.to_datetime(in_time) out_time = pd.to_datetime(out_time) The to_datetime() function returns a datetime object of the time string.

  1. Then, we subtract the OUT time by the IN time to get the working hours. The results of the subtraction are in timedelta object format.
  1. Use the astype() function with the option timedelta64[m] to retrieve the number of minutes representing in float format.

working_hours = out_time in_time

working_hours = working_hours.astype(timedelta64[m])

  1. Convert the working hours from minutes to hours and round the results down to two decimal places.

Add the working hours to the records data frame as a new column named Working Hours.

working_hours = round(working_hours/60, 2) records[Working Hours] = working_hours

  1. Delete the Working Hours (In & Out) column because it is no longer necessary.

records.drop(Working Hours (In & Out), axis = 1, inplace = True)

We now have the records data frame as follows:

Date Staff Name Department In Out Working Hours
0 6/15/2018 Jacky Lee IT 6:12 20:44 14.53
1 6/15/2018 Joyce Ho Human Resources 8:26 19:51 11.42
2 6/15/2018 Ivy Chu Sales 8:29 18:44 10.25
3 6/15/2018 David Chan Purchasing 8:33 19:50 11.28
4 6/15/2018 Gary Yuen Logistics 8:34 19:45 11.18

Computing the Daily Wages

To compute the daily wages, we need the hourly salary rate of each staff. After that we use the following formula to compute the daily wages.

  1. Look up the hourly salary rates for individual staffs by merging two data frames.

records = pd.merge(records, hourly_rate, how=left, left_on=Staff Name, right_index=True)

In the statement above, we provide the merge() function many things including:

  • records, hourly_rate two source data frames. The records data frame is on left-hand side, the hourly_rate data frame is on right-hand side.
  • how=left use left join method to merge the data. The merged result includes all rows of the left-hand side but some rows of right-hand side may not be included.
  • left_on=Staff Name the left-hand side, use the Staff Name column for merging.
  • right_index=True the right-hand side, use the index column for merging.

After the merging, the records data frame stores the data as follows:

Date Staff Name Department In Out Working Hours Salary Rate (per hour)
0 6/15/2018 Jacky Lee IT 6:12 20:44 14.53 60
1 6/15/2018 Joyce Ho HumanResources 8:26 19:51 11.42 70
2 6/15/2018 Ivy Chu Sales 8:29 18:44 10.25 55
3 6/15/2018 David Chan Purchasing 8:33 19:50 11.28 65
4 6/15/2018 Gary Yuen Logistics 8:34 19:45 11.18 75
  1. Compute the daily wages using the columns Working Hours and Salary Rate (per hour).

records[Daily Wage] = records[Salary Rate (per hour)] * records[Working Hours]

  1. Sort the rows by Working Hours and Staff Name in descending The result is stored back in the records data frame.

records.sort_values(by = [Working Hours, Staff Name], ascending = False, inplace = True)

Creating Summaries using Pivot Table

Pivot Table is a very useful and powerful feature which can be used to summarize, analyze, explore and present our data. We could use a Pivot Table to produce meaningful information from a large table of information, such as:

  • The total and daily working hours for each staff
  • The total and average working hours for each staff in each department
  • The total and average working hours of all staff in each department
  • The total and average working hours for each staff
  1. Create a pivot table named table1 that contains the total working hours for each staff.

table1 = pd.pivot_table(records, values = Working Hours, index = Staff Name, aggfunc = np.sum)

  1. Create a pivot table named table2 that contains the daily working hours grouped by the departments.
table2 = pd.pivot_table(records, values = Working columns=Date, aggfunc = np.sum) Hours, index = Department,
  1. Create a pivot table named table3 that contains the total working hours and the total wages, average wages, minimum and maximum daily wages for each staff.

table3 = pd.pivot_table(records, values = [Working Hours, Daily Wage], index = Staff Name, aggfunc = {Working Hours:[min, max, np.mean, np.sum], Daily Wage:np.sum})

  1. Create a pivot table named table4 that contains the total wages and working hours grouped by the departments. In addition, we rename the columns by using the rename()

table4 = pd.pivot_table(records, values = [Daily Wage, Working Hours], index = Department, aggfunc = np.sum)

table4.rename(columns={Daily Wage:Total salary, Working Hours:Total Working Hours}, inplace = True)

Creating Charts

After generating the pivot tables, we could create charts to show the distribution of the working hours among all departments, total working hours of each staff in the period, etc. Pandas DataFrame provides the plot() function for creating charts.

The plot() function provides many options for us to create different charts. The commonly used options (parameters) are:

  • x: label of x axis. If not specify, the index column will be used.
  • y: label of y axis.
  • kind: the kind of the chart including:
    • line : line plot (default) o bar : vertical bar plot o barh : horizontal bar plot o hist : histogram o box : boxplot
    • kde : Kernel Density Estimation plot o density : same as kde o area : area plot o pie : pie plot o scatter : scatter plot o hexbin : hexbin plot
  • ax: matplotlib axes object.
  • title: add the title to the chart.
  • legend: hide, show or reverse the legend.
  1. Create a vertical bar chart using the data of table1 with parameters kind = bar and legend = False:

table1.plot(kind=bar, legend = False)

  1. Create a horizontal bar chart using the data of table2 with parameters title = Working Hours in Different Departments and kind = barh.

table2.plot(title = Working Hours in Different Departments, kind = barh)

  1. Create a pie chart using the data in the column Total Working Hours of table4 with parameters kind = pie, y = Total Working Hours and legend = False:

table4.plot(title = Working Hours of Different Departments, kind = pie, y = Total Working Hours, legend = False)

Exercise

Exercise 1 Calculating Discounted Amount

Write a program to load the data from the data file named payment_records.csv to a data frame. Then, add two additional columns discount rate and discounted amount. The discount rate can be found in another data file named vip.csv. The discounted amount can be calculated using the following formula:

#$%&() =* _

You need to use the VIP Level to match the discount rate for each record. And, save the data to another file named result.csv.

Exercise 2 Pivot Table about Payment Method

Write a program to create a pivot table to show the sum of amounts in different payment methods (Mastercard, VISA, and Cash).

More Visualization Practice using Matplotlib

You may find that there may be some problems on the charts, such as the bars go behind the legend, text are overlapped, etc. To solve the problems and create some advanced charts, we use the library Matplotlib too. Lets explore the common chart functions include pie(), bar() and barh().

Pie Chart

The following example creates a pie chart with percentage labels using Mathplotlibs pie(). The parameters of the pie() function used in the example means:

  • data: the chart data.
  • labels: the text label shown for each slice.
  • autopct: the value label with a formatted string %.0f%%. In the formatted string, the string %.0f represents a float with no decimal place. The double-percent signs are used to add a percent sign to the label. With the string %.0f%%, we can show the float numbers as a percentage with no decimal place.

import matplotlib.pyplot as plt

plt.pie(table4[Total salary], labels=table4.index, autopct=%.0f%%) plt.title(Working Hours of Different Departments) plt.show()

Horizontal Bar Chart

The following example creates a horizontal bar chart using Matplotlibs barh() function. We pass two parameters to the barh() function as chart data and table4.index is used as Y axis labels, table4[Total salary] is used as X axis labels.

plt.barh(table4.index, table4[Total salary]) plt.title(Working Hours of Different Departments) plt.show()

Vertical Bar Chart

The following example creates a vertical bar chart using Matplotlibs bar() function. We pass two parameters to the bar() function as chart data and table4.index is used as X axis labels, table4[Total salary] is used as Y axis labels.

plt.bar(table4.index, table4[Total salary]) plt.xticks(rotation=vertical)

plt.title(Working Hours of Different Departments) plt.show()

MatPlotlib Together with Pandas Plot

Creating a chart below using MatPlotlib along is not easy. But, the legend overlaps with the bars if we create a chart using the Pandas plot() function. Actually, the problem can be solved by using both MatPlotlib and Pandas plot() function.

The procedure is as follows:

  1. Create a figure using Matplotlib.

fig = plt.figure()

  1. Create a bar chart using Pandas plot() function with the axes of the figure. The gca() function returns the current axes of the figure.

table2.plot(title = Working Hours in Different Departments, kind = barh, ax=fig.gca())

  1. Move the legend outside the chart area.

plt.legend(bbox_to_anchor=(1,1))

  1. Show the chart.

plt.show()

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] COMP1007 Lab 8-Staff Salary Calculation and Charts
$25