CS537 P1: UNIX utilities


In this assignment, you will build a set of small utilities that are variants of commonly-used commands in UNIX systems. You will be building 2 utilities:

  1. wisc-sed: This utility will be a variation of the sed tool which is used for searching, editing, replacing and deleting patterns in files.
  2. wisc-tar: This utility is similar to tar, which is a commonly used UNIX utility to combine/compress a collection of files into one file. This functionality is useful in a number of scenarios e.g. offering a single file to download for software. (If you’ve heard the phrase tarball, that comes from using tar!)

Learning objectives:

  • Re-familiarize yourself with the C programming language
  • Re-familiarize yourself with a shell / terminal / command-line of UNIX
  • Learn how to compile C files and execute binaries on the command-line
  • Learn a little about how UNIX utilities are implemented

Deliverables:

  • A set of .c files, one for each utility: wisc-sed.c, wisc-tar.c
  • Each file should compile successfully when compiled with the -Wall and -Werror flags.
  • Each utility should pass the tests we supply as well as additional tests, based on this specification, that we do not supply.
  • Include a single README.md for all the files describing your implementation

Administrivia:

  • This project is to be performed alone.
  • Due Date: June 10th, at 11:59pm (4 slip days throughout course, use wisely)
  • A small portion of the credit is allocated for good programming style and memory management. Read the details.
  • This project is to be done on the lab machines, so you can learn more about programming in C on a typical UNIX-based platform (Linux).

wisc-sed

sed stands for Stream Editor. It is used to perform basic text transformations on an input stream such as a file. You will build wisc-sed, which implements just one piece of that functionality: string substitution.

Input

wisc-sed will take 3 mandatory inputs: <search string>, <replacement string>, <filename>. It will also take some optional flags. Here’s an example which substitutes the string ‘mascot’ with ‘bucky’ in file ‘a.txt’ while ignoring the case of the strings during comparison.

prompt> ./wisc-sed -c -s mascot -r bucky -f a.txt

There are 6 flags that your utility needs to handle. Some are mandatory while some are optional.

  1. -s <search string>: (Mandatory) This is the string to be searched and replaced.
  2. -r <replacement string>: (Mandatory) This is the replacement string.
  3. -f <input file>: (Mandatory) This is the input filename.
  4. -o <output file>: (Optional) This flag is used to specify an output file. If this isn’t specified, then all the output should go to stdout.
  5. -n <line number>: (Optional) If this flag is passed, then the strings should be replaced only on line <line number>. If this flag is not passed, then the string replacement should be applied globally, i.e. all lines of the file. For this assignment, line numbers start from 1 (not 0).
  6. -c: (Optional) This flag means the string comparison is not case-sensitive. If this flag is not passed (default), then the string comparison is case-sensitive.
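The flag rules above can be sketched as a small hand-written parser. This is only one possible approach: the `struct sed_opts` container and the `parse_args()` helper are hypothetical names, not part of the assignment, and the sketch assumes that a flag whose value is missing should be treated like a missing mandatory input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed options; not part of the spec. */
struct sed_opts {
    const char *search;   /* -s, mandatory */
    const char *replace;  /* -r, mandatory */
    const char *infile;   /* -f, mandatory */
    const char *outfile;  /* -o, NULL means "write to stdout" */
    long line;            /* -n, 0 means "apply to every line" */
    int ignore_case;      /* -c */
};

/* Returns 0 on success, -1 if a mandatory flag (or a flag's value) is missing. */
int parse_args(int argc, char *argv[], struct sed_opts *o)
{
    *o = (struct sed_opts){0};

    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];

        if (strcmp(a, "-c") == 0) {
            o->ignore_case = 1;
        } else if (a[0] == '-' && a[1] != '\0' && a[2] == '\0' &&
                   strchr("srfon", a[1]) != NULL) {
            if (i + 1 >= argc)
                return -1;            /* ASSUMPTION: missing value is a usage error */
            const char *v = argv[++i];
            switch (a[1]) {
            case 's': o->search = v; break;
            case 'r': o->replace = v; break;
            case 'f': o->infile = v; break;
            case 'o': o->outfile = v; break;
            case 'n': o->line = strtol(v, NULL, 10); break;
            }
        }
        /* Anything else is an unknown flag and is ignored. */
    }
    return (o->search && o->replace && o->infile) ? 0 : -1;
}
```

A caller would print the usage message and exit with status 1 whenever `parse_args()` returns -1. The standard `getopt()` routine is another option, though it reports unknown flags on its own unless `opterr` is set to 0.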

Output

The output of your utility is the modified file contents with the search string replaced by replacement string. If the -o flag is passed, then the output should be written to a new file with the specified name, else it should be printed to stdout.
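Choosing the output destination can be as small as the sketch below. `open_output()` is a hypothetical helper name; it assumes the caller passes NULL when -o was not given.

```c
#include <stdio.h>

/* Returns stdout when no -o file was given, otherwise opens the file for
 * writing. Mode "w" truncates an existing file, so old contents are
 * overwritten. Returns NULL if the file cannot be opened. */
FILE *open_output(const char *path)
{
    if (path == NULL)
        return stdout;
    return fopen(path, "w");
}
```

The rest of the program can then write through the returned `FILE *` without caring which case applies, as long as it only calls `fclose()` on streams it opened itself.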

Details

Here are a few details and assumptions you should keep in mind during implementation:

  1. All error messages are printed to stdout.
  2. If the 3 mandatory inputs are not passed to your utility, then you should print “usage: wisc-sed [optional flags] -s <search string> -r <replacement string> -f <file>” (followed by a newline) and exit with status 1.
  3. If the input file is not found you should print “wisc-sed: cannot open file” (followed by a newline) and exit with status 1.
  4. You can assume that the files provided as an input exist in the directory where the program is run from (no need to handle ways to store pesky path names).
  5. If an output file of the same name already exists, you can overwrite it with the new contents specified.
  6. If the line number passed to the -n flag exceeds the number of lines in the file, then don’t replace anything; the file contents are output unchanged.
  7. If an unknown flag is passed, then you can ignore the flag.
  8. For simplicity, assume that the search string and replacement string don’t contain spaces, i.e., they are one word each.
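The substitution itself, applied to a single line, might look like the following sketch. The helper names `find_match()` and `replace_all()` are hypothetical. Two assumptions are made here that the specification does not spell out: every occurrence on a line is replaced (not just the first), and an empty search string leaves the line unchanged. The case-insensitive search is hand-rolled with `tolower()` to stay within standard C; `strcasestr()` could be used instead where it is available.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Find needle in hay; compare case-insensitively when icase is nonzero. */
static const char *find_match(const char *hay, const char *needle, int icase)
{
    if (!icase)
        return strstr(hay, needle);

    size_t n = strlen(needle);
    for (; *hay; hay++) {
        size_t i = 0;
        while (i < n && hay[i] &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return hay;
    }
    return NULL;
}

/* Returns a malloc'd copy of line with matches replaced, or NULL if the
 * allocation fails. ASSUMPTION: all occurrences on the line are replaced. */
char *replace_all(const char *line, const char *search, const char *repl, int icase)
{
    size_t slen = strlen(search), rlen = strlen(repl), count = 0;
    const char *p = line, *m;

    /* First pass: count matches so the result buffer can be sized exactly. */
    while (slen > 0 && (m = find_match(p, search, icase)) != NULL) {
        count++;
        p = m + slen;
    }

    char *out = malloc(strlen(line) - count * slen + count * rlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text between matches, inserting the replacement. */
    char *w = out;
    p = line;
    while (slen > 0 && (m = find_match(p, search, icase)) != NULL) {
        memcpy(w, p, (size_t)(m - p));
        w += m - p;
        memcpy(w, repl, rlen);
        w += rlen;
        p = m + slen;
    }
    strcpy(w, p);   /* remainder of the line, including its newline if any */
    return out;
}
```

The caller owns the returned buffer and should `free()` it after writing the line out.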

wisc-tar

The second utility you will build is wisc-tar whose functionality is similar to the actual tar utility but much simpler. tar is a UNIX tool, similar to zip, used to combine a collection of files into one file. tar stands for Tape Archive. This tool is useful in a number of scenarios, e.g. offering a single file to download for software.

Input

wisc-tar will take 2 or more inputs: <output tar filename> <list of files …..>. For example:

prompt> echo abcd > a.txt # creates file a.txt
prompt> echo efgh > b.txt # creates file b.txt
prompt> ./wisc-tar test.tar a.txt b.txt # combines a.txt and b.txt into test.tar

Output

The output will be a tar file. In the actual tar implementation, a tar file consists of a series of file objects. Each file object consists of a file header (file name, size, checksum …) and the file contents.
For the purpose of this assignment, we will use a simpler file format for our wisc-tar file objects. We will use a 128-byte header which contains only the file name and file size, followed by the file contents padded to a multiple of 512 bytes. For example, if a file has only 1000 bytes of data, 24 NUL ('\0') bytes are appended at the end so that the contents occupy 1024 bytes. Here’s an example format of a wisc-tar file:

 file1 name [120 bytes in ASCII] 
 file1 size [8 bytes as binary]
 contents of file1 [in ASCII, padded to multiple of 512 bytes]
 file2 name [120 bytes]
 file2 size [8 bytes]
 contents of file2 [in ASCII, padded to multiple of 512 bytes]
 ...
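The padding arithmetic can be sketched as follows. `pad_len()` and `write_padding()` are hypothetical helper names, and the sketch assumes that a file whose size is already a multiple of 512 (including an empty file) receives no padding, which the description above does not state explicitly.

```c
#include <stddef.h>
#include <stdio.h>

#define BLOCK_SIZE 512  /* padding granularity from the format description */

/* Number of '\0' bytes to append after `size` bytes of file contents. */
size_t pad_len(size_t size)
{
    return (BLOCK_SIZE - size % BLOCK_SIZE) % BLOCK_SIZE;
}

/* Write the padding for a file of `size` bytes.
 * Returns 0 on success, -1 on a short write. */
int write_padding(FILE *out, size_t size)
{
    static const char zeros[BLOCK_SIZE];  /* static storage is zero-initialized */
    size_t n = pad_len(size);
    return fwrite(zeros, 1, n, out) == n ? 0 : -1;
}
```

For the 1000-byte file mentioned above, `pad_len(1000)` is 24; for a 3-byte file it is 509.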

Details

Here are a few details and assumptions you should keep in mind during implementation:

  1. All error messages are printed to stdout.
  2. You can assume that the files provided as an input exist in the directory where the program is run from (no need to handle ways to store pesky path names).
  3. You can also assume the files provided as inputs only contain ASCII characters.
  4. If fewer than two arguments are supplied to your program then you should print “wisc-tar: tar-file file1 […]” (followed by a newline) and exit with status 1.
  5. If any of the input files that should be a part of the tar file are not found, you should print “wisc-tar: cannot open <filename>” (followed by a newline) and exit with status 1.
  6. If a tar-file of the same name already exists you can overwrite it with the new contents specified.
  7. If any of the input file names are longer than 120 characters, use the first 120 characters as the filename to store in the output tar file.
  8. If any of the input file names are shorter than 120 characters, you should pad the filename such that it uses 120 bytes. For example, if the file name is “a.txt”, which has only 5 characters, you should append 115 NUL ('\0') characters to make the name use 120 bytes in the tar file. (Remember '\0' is not the same as '0'. The first one represents the NUL byte while the second one represents the character 0!)
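The filename rules in items 7 and 8 can be sketched with a helper that fills the 128-byte header in memory. `build_header()` is a hypothetical name, and storing the size as a `uint64_t` in the machine's native byte order is an assumption based on the "8 bytes as binary" description.

```c
#include <stdint.h>
#include <string.h>

#define NAME_LEN   120
#define HEADER_LEN 128

/* Fill hdr with the 120-byte name field followed by the 8-byte size field. */
void build_header(unsigned char hdr[HEADER_LEN], const char *name, uint64_t size)
{
    memset(hdr, 0, HEADER_LEN);        /* start with '\0' padding everywhere */

    size_t n = strlen(name);
    if (n > NAME_LEN)
        n = NAME_LEN;                  /* keep only the first 120 characters */
    memcpy(hdr, name, n);

    /* ASSUMPTION: the size is stored in host byte order. */
    memcpy(hdr + NAME_LEN, &size, sizeof size);
}
```

The filled buffer can then be written in one call, for example `fwrite(hdr, 1, HEADER_LEN, out)`. Note that a name of exactly 120 characters or more leaves no terminating '\0' in the name field.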

Example

Let’s look at a complete example to make sure we understand the format and how to read the contents of a valid tar file. In the following example we first create a text file which contains the string “hey”, so its size is 3 (remember this!).

Next we run wisc-tar to create a.tar as shown below. Finally we print the contents of a.tar using hexdump, a utility to print the contents of a binary file. The comments to the right explain the output of hexdump. Remember that the bytes are represented in hexadecimal format, so a handy table like this will help you look up the ASCII values for strings. Try to see if you can decode the contents based on the comments on the right! The example below was captured on a machine with little-endian byte order.

prompt> echo -n "hey" > a.txt # '-n' does not append newline or linefeed at end of string
prompt> ./wisc-tar a.tar a.txt
prompt> hexdump -v a.tar
0000000 2e61 7874 0074 0000 0000 0000 0000 0000  --> The first five bytes here contain the file name a.txt
0000010 0000 0000 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
0000040 0000 0000 0000 0000 0000 0000 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0000 0000 0000 0000 0000 ---> We have padded the filename using \0 till we hit 120 bytes
0000070 0000 0000 0000 0000 0003 0000 0000 0000 ---> The byte containing 03 indicates the file size is 3. Note that this is not in ASCII!
0000080 6568 0079 0000 0000 0000 0000 0000 0000 ---> The first three bytes contain the string "hey", the contents of the file, followed by 509 bytes of \0.
0000090 0000 0000 0000 0000 0000 0000 0000 0000
.................
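The byte pairs in the dump look swapped because hexdump, by default, prints the file as 16-bit words, and on a little-endian machine the second byte of each pair is the high-order half of the word. The hypothetical helper below reproduces that grouping so the dump can be checked by hand.

```c
#include <stdint.h>

/* Value shown for two consecutive file bytes when they are read as one
 * 16-bit word on a little-endian host: the second byte in the file becomes
 * the high-order half of the word. */
uint16_t hexdump_word(unsigned char first, unsigned char second)
{
    return (uint16_t)(first | (second << 8));
}
```

For instance, the file bytes 'a' (0x61) and '.' (0x2e) appear as 2e61, and 'h' (0x68) followed by 'e' (0x65) appears as 6568. Running `hexdump -C` instead prints the bytes one at a time in file order.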

Hints

Read this lab tutorial; it has some useful tips for programming in the C environment.

You’ll need to learn how to use a few library routines from the C standard library (often called libc) to implement the source code for this assignment. All C code is automatically linked with the C library, which is full of useful functions you can call to implement your program. Learn more about the C library here and perhaps here.
On UNIX systems, the best way to read about such functions is to use what are called the man pages (short for manual). In our HTML/web-driven world, the man pages feel a bit antiquated, but they are useful and informative and generally quite easy to use. To access the man page for fopen(), for example, just type the following at your UNIX shell prompt:

prompt> man fopen

Suggested routines

Once a file is open, there are many different ways to read from it. The following library routines might be useful for you in the context of this assignment.

  • Opening/closing files: For this assignment, we recommend using the following routines to open and close files: fopen() and fclose(). You could also use the system calls open() and close(), but reading lines might then be more difficult.
  • Reading/Writing a line: There are two useful functions to read lines: fgets() which can read a fixed number of characters but stops when it reaches the end of a line, and getline() which can read entire lines into a buffer. You can learn more about these functions in, you guessed it, the man pages! To write a string to a file you can similarly use fputs() or the more powerful formatting capabilities in fprintf().
    We recommend using getline(). You will need to be able to handle lines of arbitrary length; this means that you cannot simply call fgets() once for each line of the file, since you will not know the size of the buffer needed to fit the line. You can either use getline(), which grows its buffer as needed, or call fgets() multiple times for each line.
  • String search/comparison: There are many functions available for finding and comparing strings. We recommend using strstr() and strcasestr() to find the position of the matching string.
  • Finding file size: To find the size of a file, you can use either stat() or fstat(). Again, refer to the man pages for more information.
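As a rough illustration of the routines listed above, the sketch below reads lines of arbitrary length with getline() and looks up a file's size with stat(). The helper names `copy_lines()` and `file_size()` are hypothetical, and the feature-test macro on the first line is an assumption about what a strict compiler mode needs in order to expose getline().

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() under strict modes */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Copy every line from in to out and return the number of lines read. */
long copy_lines(FILE *in, FILE *out)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    long lineno = 0;

    while (getline(&line, &cap, in) != -1) {
        lineno++;        /* 1-based, matching the -n flag */
        /* A real wisc-sed would transform `line` here before writing it. */
        fputs(line, out);
    }
    free(line);          /* the buffer belongs to the caller */
    return lineno;
}

/* Size of the file at path in bytes, or -1 if stat() fails. */
long long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```

The buffer passed to getline() is reused across iterations, so it only needs to be freed once, after the loop.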

Good practices

  1. Understand the Code Structure: Start by understanding the problem description and requirements. Read through the man pages and use external resources to understand any unfamiliar concepts or functions.
  2. Familiarize yourself with the functions/APIs: Test the new functions with toy examples just to understand how they work.
  3. Start Small: Don’t try to implement everything at once. For example, begin by opening a file, reading lines from it and closing.
  4. Implement step-by-step: Grow your program by adding one feature at a time.
  5. Implement Error Handling: Once the core of your code has been implemented, add error handling code as specified in requirements above.
  6. Test Your Code: After implementing a feature, test it thoroughly to make sure it works as expected. This will help you catch any bugs or errors early on.
  7. Ask for Help When Stuck: Don’t hesitate to ask for help if you’re having trouble understanding a concept or figuring out why your code isn’t working. You can ask the instructors, classmates, or use online resources.

Remember, the goal of the project isn’t just to produce a working program, but also to understand the concepts and techniques you’re using.

Testing and Handing in your code

  • Some tests are provided at ~cs537-1/tests/p1. Read more about the tests, including how to run them, by executing the command cat ~cs537-1/tests/p1/README.md on any lab machine. Note these test cases are not complete, and you are encouraged to create more on your own.
  • Handing it in: Copy your files to ~cs537-1/handin/login/p1 where login is your CS login. Do NOT use this handin directory for your work space.  You should keep a separate copy of your project files in your own home directory and then simply copy the relevant files to this handin directory when you are done.  The permissions to this handin directory will be turned off promptly when the deadline passes and you will no longer be able to modify files in that directory. If you cannot find your handin directory, send email to the instructor asking for a handin directory; tell us your CS login.