COMP1730/COMP6730 Programming for Scientists
I/O and files
Outline
Input and output
Files and directories
Reading and writing text files
Input and output
I/O: Input and Output
A common way for a program to interact with the world.
Reading data (keyboard, files, network).
Writing data (screen, files, network).
Scientific computing often means processing or generating large volumes of data.
2016, 07, 01, 2.0, 1, Y
2016, 07, 02, 0.0, 1, Y
2016, 07, 03, 0.0, 1, Y
2016, 07, 04, 0.0, , Y
2016, 07, 05, 4.4, 1, Y
2016, 07, 06, 15.4, 1, Y
2016, 07, 07, 1.0, 1, Y
2016, 07, 08, 0.0, 1, Y
2016, 07, 09, 4.2, 1, Y
2016, 07, 10, 0.0, 1, Y
2016, 07, 11, 10.4, 1, Y
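A record like the ones above can be split into fields with the string methods split and strip. This is only a minimal sketch: the meaning of the columns is not stated here, so treating the first three fields as a date and the fourth as a numeric reading is an assumption.

```python
# One record, copied from the sample data above.
record = "2016, 07, 06, 15.4, 1, Y"

# Split on commas and strip the surrounding spaces from each field.
fields = [field.strip() for field in record.split(",")]

# Assumption: fields 0-2 are year, month, day; field 3 is a measurement.
year, month, day = int(fields[0]), int(fields[1]), int(fields[2])
value = float(fields[3])

print(fields)
print(year, month, day, value)
```

Every field starts out as a string; the conversion to int or float has to be done explicitly.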
Terminal I/O
print generates output to the terminal (typically, the screen).
input prints a prompt and reads input from the terminal (typically, the keyboard).
input always returns a string.
input_str = input("Enter a number: ")
input_int = int(input_str)
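Because input always returns a string, a number typed by the user has to be converted before it can be used in arithmetic. The sketch below shows the difference. So that it runs unattended, the keyboard is simulated with io.StringIO; that part is scaffolding for the example only and is not needed in a real program.

```python
import io
import sys

# Simulate the user typing "42" and pressing Enter, so that the example
# runs without a keyboard. Remove the lines that touch sys.stdin to type
# the input yourself.
real_stdin = sys.stdin
sys.stdin = io.StringIO("42\n")

input_str = input("Enter a number: ")   # always a string
input_int = int(input_str)              # convert before doing arithmetic

sys.stdin = real_stdin                  # restore the real keyboard input

print(type(input_str), type(input_int))
print(input_str * 2)   # string repetition
print(input_int * 2)   # integer multiplication
```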
Image from Punch & Enbody
Files and directories
What is a file?
A file is a collection of data on secondary storage (hard drive, USB key, network file server).
A program can open a file to read/write data.
Data in a file is a sequence of bytes (integers 0 ≤ b ≤ 255).
The program reading a file must interpret the data (as text, image, sound, etc).
Python and the operating system (OS) provide support for interpreting data as text.
Text encoding recap
Every character has a number.
Unicode defines numbers (code points) for more than 120,000 characters (in a space for more than 1 million).
Glyph   Code point   Bytes (UTF-8 encoding)
E       69           0100 0101 (69)
€       8364         1110 0010 (226), 1000 0010 (130), 1010 1100 (172)
The font maps a code point to a glyph; the encoding (here UTF-8) maps it to bytes.
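The numbers in this recap can be checked from Python with the built-in functions ord and chr and the string method encode. A small sketch, using only the two characters shown above:

```python
# ord gives the code point of a character; chr goes the other way.
print(ord("E"))
euro = chr(8364)          # the euro sign, code point 8364

# Encoding turns a string into bytes; each byte is an integer 0..255.
e_bytes = list("E".encode("utf-8"))
euro_bytes = list(euro.encode("utf-8"))
print(e_bytes)
print(euro_bytes)

# Decoding turns the bytes back into the same text.
round_trip = bytes(euro_bytes).decode("utf-8")
print(round_trip == euro)
```

One character can take one byte or several, so the number of bytes in a file is not in general the number of characters.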
A text file contains encodings of printable characters (including spaces, newlines, etc).
Python program source code, HTML files, etc.
A binary file contains arbitrary data, which may not correspond to printable characters.
Images, audio/video, word documents.
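The same file can be read either way. The sketch below writes a short text file and reads it back once in text mode and once in binary mode. The file name is made up, and the explicit encoding="utf-8" argument is an addition here so that the result does not depend on the machine's default encoding.

```python
import os
import tempfile

# Create a small file in a temporary directory (the name is made up).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w", encoding="utf-8")
f.write("E = 5 " + chr(8364) + "\n")
f.close()

# Text mode: bytes are decoded, and read returns a str.
f = open(path, "r", encoding="utf-8")
as_text = f.read()
f.close()

# Binary mode: no decoding, and read returns a bytes object.
f = open(path, "rb")
as_bytes = f.read()
f.close()

print(type(as_text), len(as_text))
print(type(as_bytes), len(as_bytes))
```

The byte count is larger than the character count because the euro sign is encoded as three bytes.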
Directory structure
Files on secondary storage are organised into directories (a.k.a. folders).
This is an abstraction provided by the operating system.
It will appear differently on different operating systems.
The directory structure is typically tree-like.
File path
A path is a string that identifies the location of a file in the directory structure.
It consists of directory names with a separator between each; the last name in the path is the name of the file.
Two kinds of paths:
Absolute
Relative to the current working directory (cwd)
When running a Python file (script mode), the cwd is typically the directory where that file is, although this depends on how the program is launched.
If the Python interpreter was started in interactive mode (without running a file), the cwd is the directory that it was started from.
The os module has functions to get and change the current working directory.
>>> import os
>>> os.getcwd()
'/home/patrik/teaching/python'
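The function that changes the current working directory is os.chdir. A minimal sketch; a fresh temporary directory stands in for a real destination so that the example works on any machine:

```python
import os
import tempfile

start_dir = os.getcwd()          # remember where we started
print("cwd at start:", start_dir)

# Change to another directory; a temporary one is used here
# so that the example does not depend on any particular folder.
other_dir = tempfile.mkdtemp()
os.chdir(other_dir)
cwd_after_change = os.getcwd()
print("cwd after chdir:", cwd_after_change)

os.chdir(start_dir)              # go back to where we started
```

After os.chdir, every relative path is interpreted relative to the new directory.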
Example: POSIX (Linux, OSX)
Single directory tree.
Removable media and network file systems appear at certain places in the tree.
The separator is /
An absolute path starts with a /
.. means the directory above.
File and directory names are case sensitive.
/
    home
        u123
            Desktop
            lab1
            lab2
    lib
If the cwd is /home/u123/lab1 then
prob1.py refers to /home/u123/lab1/prob1.py
../lab2/prob1.py refers to /home/u123/lab2/prob1.py
/lib/libbz2.so refers to /lib/libbz2.so
/home/u123/Lab1/prob1.py does not exist.
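These rules can be tried out with the standard posixpath module, which applies POSIX path rules on any operating system. In the sketch below the cwd is a plain string rather than the real working directory, and resolve is a hypothetical helper written for this example.

```python
import posixpath  # POSIX path rules, usable on any operating system

cwd = "/home/u123/lab1"   # assumed current working directory

def resolve(path):
    """Return the absolute path that path refers to, relative to cwd."""
    # join leaves path unchanged if it is already absolute;
    # normpath then removes the ".." steps.
    return posixpath.normpath(posixpath.join(cwd, path))

print(resolve("prob1.py"))
print(resolve("../lab2/prob1.py"))
print(resolve("/lib/libbz2.so"))
```

This only manipulates strings; it does not check whether the files exist.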
Example: Windows
One directory tree for each drive; each drive is identified by a letter (A to Z).
The separator is \
Must be written \\ in Python string literals.
An absolute path starts with a drive letter and :
.. means the directory above.
File and directory names are not case sensitive.
C:\Users\patrik\test.py
..\lab1\exercise1.py
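The standard ntpath module applies Windows path rules on any operating system, so the points above can be checked without a Windows machine. In this sketch the raw string literal (r"...") is an alternative to doubling each backslash, and the cwd is a made-up value.

```python
import ntpath  # Windows path rules, usable on any operating system

# Two ways to write the same path in a string literal.
escaped = "C:\\Users\\patrik\\test.py"   # each backslash doubled
raw = r"C:\Users\patrik\test.py"         # raw string: no escapes
print(escaped == raw)

print(ntpath.isabs(raw))                      # starts with a drive letter
print(ntpath.isabs(r"..\lab1\exercise1.py"))  # relative path

# Resolve the relative path against a hypothetical cwd.
cwd = r"C:\Users\patrik\lab2"
resolved = ntpath.normpath(ntpath.join(cwd, r"..\lab1\exercise1.py"))
print(resolved)
```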
Reading and writing text files
File objects
When we open a file, Python creates a file object (or stream object).
The file object is our interface to the file: all reading, writing, etc, is done through methods of this object.
The type of file object (and what we can do with it) depends on the access mode specified when the file was opened.
For example, text mode vs. binary mode, read-only, write-only, read-write mode, etc.
Opening a file
open(file_path, access_mode) opens a file and returns the file object.
my_file = open("notes.txt", "r")
first_line = my_file.readline()
second_line = my_file.readline()
my_file.close()
Close the file when done!
After calling file_obj.close(), we can do no more read/write calls on file_obj.
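The sketch below shows what happens when a closed file is used, and one way to avoid forgetting close. The with statement is not covered in these slides; it is included as a common alternative that closes the file automatically. The notes.txt file is created in a temporary directory so that the example is self-contained.

```python
import os
import tempfile

# Create a two-line notes.txt in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
my_file = open(path, "w")
my_file.write("First line\nSecond line\n")
my_file.close()

my_file = open(path, "r")
first_line = my_file.readline()
my_file.close()

# Reading from a closed file raises ValueError.
try:
    my_file.readline()
    error_raised = False
except ValueError:
    error_raised = True
print(repr(first_line), error_raised)

# A with statement closes the file automatically at the end of the block.
with open(path, "r") as my_file:
    same_line = my_file.readline()
print(my_file.closed)
```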
Access modes
access_mode is a string, made up of flags.
Flag   If the file exists                              If it does not exist       Access
r      Reads from beginning of file                    Error                      read only
w      Erases file content                             Creates a new empty file   write only
a      Appends new content at end of file              Creates a new empty file   write only
r+     Reads/overwrites from beginning of file         Error                      read/write
w+     Erases file content                             Creates a new empty file   read/write
a+     Reads from end of file; writes always append    Creates a new empty file   read/write
b      Open as binary file (default is text)
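The effect of the w, a and r flags can be seen in a short experiment. This is a sketch under stated assumptions: the file names are made up, the files live in a temporary directory, and contents is a hypothetical helper that reads a whole file.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "log.txt")   # hypothetical file name

def contents(p):
    """Read and return the whole file at path p."""
    f = open(p, "r")
    text = f.read()
    f.close()
    return text

f = open(path, "w")      # file does not exist: "w" creates it
f.write("one\n")
f.close()

f = open(path, "a")      # "a" keeps the content and appends
f.write("two\n")
f.close()
after_append = contents(path)

f = open(path, "w")      # "w" on an existing file erases it
f.write("three\n")
f.close()
after_rewrite = contents(path)

# "r" on a file that does not exist is an error.
try:
    open(os.path.join(folder, "missing.txt"), "r")
    missing_is_error = False
except FileNotFoundError:
    missing_is_error = True

print(repr(after_append), repr(after_rewrite), missing_is_error)
```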
Caution
Be careful with write modes. Erased or overwritten files cannot be recovered.
Can we check if an existing file will be overwritten?
Yes!
os.path.exists(file_path) returns True or False.
Catching exceptions (more later in the course).
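One way to use os.path.exists is to refuse to write when the file is already there. A minimal sketch; write_if_new and the file name are hypothetical, chosen for this example only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.txt")  # hypothetical name

def write_if_new(file_path, text):
    """Write text to file_path unless the file already exists.

    Returns True if the file was written, False if it was left alone.
    """
    if os.path.exists(file_path):
        return False
    f = open(file_path, "w")
    f.write(text)
    f.close()
    return True

first = write_if_new(path, "original\n")      # file is new: written
second = write_if_new(path, "replacement\n")  # file exists: skipped
print(first, second)

f = open(path, "r")
kept = f.read()
f.close()
print(repr(kept))
```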
Reading text files
file_obj.readline() reads the next line of text and returns it as a string, including the newline character ('\n').
file_obj.read(size) reads at most size characters and returns them as a string.
If size < 0, reads to end of file.
If already at end-of-file, readline and read return an empty string.
file_obj.readlines() reads all remaining lines of text, returning them as a list of strings.
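The empty-string rule gives a simple way to detect end-of-file in a loop. The sketch below assumes a three-line notes.txt, created here in a temporary directory. Note that a blank line in the file is read as '\n', not as an empty string, so it does not end the loop.

```python
import os
import tempfile

# Create a three-line file for the demo.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
f = open(path, "w")
f.write("First line\nSecond line\nlast line\n")
f.close()

f = open(path, "r")
lines_read = []
line = f.readline()
while line != "":          # an empty string means end-of-file
    lines_read.append(line)
    line = f.readline()
at_end = f.read()          # still at end-of-file: empty string again
f.close()

print(lines_read)
print(repr(at_end))
```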
File position
A file is a sequence of bytes.
But the file object is not a sequence type!
The file object keeps track of where in the file to read or write next.
The next read operation or iteration starts from the current position.
When a file is opened for reading (mode 'r'), the starting position is 0 (beginning of the file).
File position is not the line number.
Suppose notes.txt contains:
First line
Second line
last line
Then
>>> fo = open("notes.txt", "r")
>>> fo.read(4)
'Firs'
>>> fo.readline()
't line\n'
>>> fo.readlines()
['Second line\n', 'last line\n']
Iterating through a file
Python's text file objects are iterable.
Iterating yields one line at a time.
my_file = open("notes.txt", "r")
line_num = 1
for line in my_file:
    print(line_num, ":", line)
    line_num = line_num + 1
my_file.close()
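Each line read from the file still ends with its newline, and print adds another, so a loop like the one above shows a blank line after every line. A variation that strips the newline is sketched below; using the built-in enumerate for the counter is a substitution made here, not part of the slides. The notes.txt file is created in a temporary directory for the demo.

```python
import os
import tempfile

# Create a three-line file for the demo.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
f = open(path, "w")
f.write("First line\nSecond line\nlast line\n")
f.close()

my_file = open(path, "r")
numbered = []
# enumerate pairs each line with a counter, starting from 1.
for line_num, line in enumerate(my_file, start=1):
    text = line.rstrip("\n")        # drop the newline read from the file
    numbered.append(str(line_num) + ": " + text)
    print(line_num, ":", text)
my_file.close()
```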
Writing text files
Access mode 'w' (or 'a') opens a file for writing text.
file_obj.write(string) writes the string to the file.
Note: write does not add a newline to the end of the string.
print(..., file=file_obj) prints to the specified file instead of the terminal.
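The difference between write and print shows up in where the newlines end up. A short sketch, writing to a made-up file in a temporary directory and reading the result back:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical name

out = open(path, "w")
out.write("no newline added")      # write adds nothing
out.write(" by write\n")           # so we add "\n" ourselves
print("print adds a newline", file=out)
print(1, 2, 3, sep=", ", file=out)
out.close()

# Read the file back to see exactly what was written.
f = open(path, "r")
written = f.read()
f.close()
print(written)
```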
Buffering
File objects typically have an I/O buffer.
Writing to the file object adds data to the buffer; when full, all data in it is written to the file (flushing the buffer).
Closing the file flushes the buffer.
If the program stops without closing an output file, the file may end up incomplete.
Always close the file when done!
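The buffer can also be flushed without closing the file, by calling the flush method of the file object. The sketch below looks at the file through a second file object before and after flushing. What is visible before the flush depends on the buffer size and the platform, so that value is shown but not relied on.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical name

out = open(path, "w")
out.write("partial result\n")

# Look at the file through a second file object. The text is usually
# still in the buffer at this point, so the file usually looks empty.
f = open(path, "r")
before_flush = f.read()
f.close()

out.flush()   # force the buffer to be written, without closing

f = open(path, "r")
after_flush = f.read()
f.close()

out.close()
print(repr(before_flush), repr(after_flush))
```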
Programming problem
Read a Python source code file, and
print each line;
prefix each line of code with a line number;
for numbering, count only lines containing code (not empty lines, or lines with only comments).
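One possible approach is sketched below; it is a starting point rather than a complete solution. It assumes that a line counts as code unless it is empty or its first non-blank character is #, which misclassifies lines inside multi-line strings. The function name number_code_lines is made up for this example, and a list of strings stands in for the file; an open text file object could be passed instead, since iterating over it also yields lines.

```python
def number_code_lines(lines):
    """Return display lines, numbering only the lines that contain code."""
    result = []
    code_count = 0
    for line in lines:
        text = line.rstrip("\n")
        stripped = text.strip()
        # Simplification: a line is code unless it is empty or its first
        # non-blank character is "#".
        if stripped != "" and not stripped.startswith("#"):
            code_count = code_count + 1
            result.append(str(code_count) + ": " + text)
        else:
            result.append(text)
    return result

# A small made-up source file, given as a list of lines.
sample = [
    "# a comment\n",
    "x = 1\n",
    "\n",
    "    # indented comment\n",
    "print(x)  # trailing comment\n",
]
numbered = number_code_lines(sample)
for out_line in numbered:
    print(out_line)
```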
Take-home
The file system is organised into directories and files in a tree-like structure.
A file path uses / (Linux, macOS) or \ (Windows) as separator.
A Python file object is iterable but not a sequence.
Good practice: write fileobj = open("abc") and fileobj.close() immediately, before adding code in between.