UCLA CS 131 lecture 2021-01-05
Core of this class (core of programming languages)
1. Principles and limitations of programming models.
2. Notations for these models: design + use + support for the above.
3. How to evaluate strengths + weaknesses in various contexts.
Let's do a quiz (it won't count for a grade).
Write a program where the input is arbitrary text (finite sequence of ASCII characters, single byte, 0-127 decimal).
Program looks for words in that text, where a word is [A-Za-z]+ (a maximal sequence), and outputs a concordance, i.e., a list of all the words it found,
along with each word's popularity (count).
Q. Is implementation the same as design?
A. Implementation = compiler, interpreter, etc.
   Design = deciding what the language is.
Example input (stdin):
Now is the time for four score and seven years ago for ago is now the time for four 4 !@$2w4w3 for is
Example output (stdout)
4 for
3 is
2 the
2 time
2 four
2 w
2 ago
1 Now
1 now
1 score
1 and
1 seven
1 years
Spaces within a line don't matter, but have at least one between count and word.
List in popularity order; if there's a tie, any order will do.
You pick the (real) programming language, take ten minutes to write it.
The Lecture 1 quiz is now up on CCLE. There are also some questions
and answers on chat. Here they are again:
Q. Does bash count? A. Yes.
Q. Case sensitive?
A. Yes, as in Now vs now.
Q. Input in one line or multiple lines?
A. The input can be in multiple lines. The newline character is treated like any other word-separator.
Q. will the input be passed as an argument?
A. The input is in stdin, that is, the standard input stream.
Background behind this quiz question. Classic interview question.
Donald Knuth had a problem.
starting in the 1960s *The* computer programming textbook.
seven volumes (halfway through so far)
volume 1 (1969)
Addison-Wesley had trouble producing the book to high enough quality.
side project:
Write a book formatter, so he could format the way he liked. beautiful math, beautiful code, images, text,
designed his own font
TeX classic high-quality formatter
He produced the books 2nd editions
side project: The TeXbook, a book about TeX's design and algorithms; detailed internal doc. about TeX.
tex.pas      The TeX source code, written in Pascal
texbook.tex  The TeXbook source code, written in TeX.
Keeping these two files in sync is a pain.
side side project: Literate Programming put documentation in the source
easy to keep things in sync.
tex.tangled (both source and documentation)
How to tell people about Literate Programming?
1. Read the TeXbook (too long).
2. Simple introduction to the idea as a short paper.
CACM (leading research journal) editor John Mashey (Bell Labs)
suggested this problem to Knuth.
Knuth designed a new data structure hash trie
that solves this exact problem, wrote it in tangled Pascal + CACM paper.
Literate programming is now common. We'll see it in this class. JDK documentation is done this way.
BUT it was a set-up job, I think. Afterword by Mashey in CACM.
1. Find the words in the input, with tr. 2. Sort them.
tr -cs A-Za-z '[
*]' | sed 1d | sort | uniq -c | sort -rn
tr options
  -c  complement the character set
      (256 byte patterns - 52 letters) = 204 characters
  -s  squeeze out duplicated translated characters
uniq
  -c  count duplicates
sort
  -r  reverse
  -n  numeric
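For contrast with the one-line pipeline above, here is a rough OCaml sketch of the same concordance program (not from the lecture; the helper names read_all and words are made up for this illustration, and it assumes the whole input fits in memory):

  (* Read all of stdin as one string (word separators collapse to spaces). *)
  let read_all () =
    let rec go acc =
      match input_line stdin with
      | line -> go (line :: acc)
      | exception End_of_file -> String.concat " " (List.rev acc)
    in go []

  let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')

  (* Split the text into maximal [A-Za-z]+ runs. *)
  let words text =
    let n = String.length text in
    let rec go i acc =
      if i >= n then List.rev acc
      else if not (is_letter text.[i]) then go (i + 1) acc
      else begin
        let j = ref i in
        while !j < n && is_letter text.[!j] do incr j done;
        go !j (String.sub text i (!j - i) :: acc)
      end
    in go 0 []

  (* Count each word, then print by decreasing count. *)
  let () =
    let tbl = Hashtbl.create 97 in
    List.iter
      (fun w ->
         let c = try Hashtbl.find tbl w with Not_found -> 0 in
         Hashtbl.replace tbl w (c + 1))
      (words (read_all ()));
    Hashtbl.fold (fun w c acc -> (c, w) :: acc) tbl []
    |> List.sort (fun (c1, _) (c2, _) -> compare c2 c1)
    |> List.iter (fun (c, w) -> Printf.printf "%d %s\n" c w)

Either way the task is the same; the notation you choose changes how much program you have to write, which is the point of the next heading.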
CHOICE OF NOTATION MATTERS
Related notion in north campus courses (north zoom)
Sapir-Whorf hypothesis
in natural languages (not programming languages)
1. The structural diversity of natural languages is essentially limitless.
2. The language we speak to some extent determines how we view the world, and how we think.
2 : determines -> affects
2 is a factor in programming languages too, partly because we pick them up later.
Q. Is what you're talking about right now referring to formal language theory in CS 181?
A. Its about both programming and natural languages.
Take a break Resume 15:04
Q. Do we have class on thursday of week 5? A. No, just midterm.
Q. are the tests multiple choice? A. Maybe.
Q. Is a passing average a C, for the HW?
A. Yes.
Q. Do you ever give out A+ grades?
A. No.
Q. Rumor says the mean score of final is pretty low. Will the score be curved afterwards?
A. Yes and no.
Q. Will upcoming homework assignment specs be released early? A. Not too early. Tentative until published.
Q. If I enrolled 131 before but dropped later. Can I use my own code (unpublished) from previous homework?
A. Yes.
Sample exams will be available. They cover everything in the class
up to exam time, including lectures, readings, discussions, homeworks. Essay questions often designed to cover multiple topics per question.
Q. will we have to write code on the exams?
A. Yes. You can debug the code too; you have a computer. Don't overdo it though.
Q. will partial credit of essay questions be given?
A. Yes.
Q. Office hours. A. On CCLE.
Q. Will the code on the exams be graded on the idea (even with some bugs) or on solving the problem bug-free?
A. Yes. A tiny bug wont subtract much. Bugs are often symptoms
of idea problems.
Q. Is office hours using the same zoom link with lectures?
A. No.
Language design issues
Idea: How to make decisions about which language to use,
or how to design a language, or how to improve it. Successful languages evolve.
(A) Orthogonality. comes from mathematics. orthogonal axes.
x, y, z axes are orthogonal
You can choose the x value without worrying about y and z.
Theyre independently chosen.
In a language, you have multiple features, each of which requires a selection, and you can make any single selection without worrying about the others. E.g., select a type, or
select an identifier name.
Q. is this related to modularity?
A. Yes, modularity is orthogonality applied to modules.
A design that is *not* orthogonal.
In C/C++, functions cannot return arrays.
int *f () { } is OK (pointer to 0th element)
struct s { int a[10]; };
struct s g () { } is OK (struct containing an array)
typedef int trouble[10];
trouble h () { } no can do
#include
wonderful f (int x) { } // ??
Q. So Orthogonality is about flexibility?
A. Yes, users need flexible choices on each attribute independently.
Q. an array is not really a type right?
A. In C/C++, it is, sort of (except it causes trouble).
Q. I thought the type is really the pointer.
A. Yes and no.
   int a[10];
   int *p = a; // In most expressions, arrays decay into pointers.
   int v = sizeof a;
   int w = sizeof p;
Q. how does the compiler distinguish the pointer to a to the entire array a when using sizeof?
A. By context.
(B) Efficiency at runtime (how long to run the program?)
B.1 wall clock time
B.2 CPU time
B.3 RAM
B.4 I/O
B.5 network access latency (request vs response time)
B.6 energy and power
(C) Efficiency at compile-time (how long to build?)
(D) Efficiency at development time (how long to write the code?) More-verbose takes longer to write and read.
Too-concise can be even harder to read.
* Safety
What happens if you violate the rules of the language? Undefined behavior is a kind of violation
How much damage is caused?
static checking (in C/C++): are identifiers declared properly? do all the types match?
vs
dynamic checking: dereference a null pointer
subscript error
stack overflow classic dynamic checking problem.
* Concurrency
* Exception handling
* Mutability / evolvability
.
Next time: syntax OCaml
Q. Is attendance mandatory or can we watch the recordings on our own time? A. Its not mandatory, but the questions keep me honest.
Q. Assigned readings.
A. Webber 1, 2, 5, 7, 9, 11.
shell transcript
501-day $ tr -cs A-Za-z
*
int f (int fd)
{
  if (close (fd) != 0 && errno == EBADF)
    return 1;
  else
    return 0;
}
Our own unistd.h:
#include_next <unistd.h>
#define close rpl_close
workaround.h:
#ifdef MSVC
# define TRY try {
# define CATCH } catch {
# define DONE }
#else
# define TRY
# define CATCH
# define DONE
#endif
Our own library:
#include <unistd.h>
#include "workaround.h"
int
rpl_close (int fd)
{
  TRY
    int r = close (fd);
  CATCH
    r = -1;
    errno = EBADF;
  DONE
  return r;
}
Another example of syntax mutability,
:-op(500, yfx, [+,-]).
:-op(200, fy, [+,-]).
Prolog lets you define your own operators.
(Example using the standard operators.)
+ and - have precedence 500, are binary operators, left associative.
:-op(400, yfx, [*,/]).
* and / have precedence 400, are binary operators, left associative.
:-op(200, xfy, [**]).
:-op(700, xfx, [=, \=, ==, \==, =<, >=, <, >]).
a+b*c == a+(b*c)
a+b-c == (a+b)-c
f = function symbol
y = operand of equal or lower-numbered precedence
x = operand of lower-numbered precedence
a+-b == a+(-b)
a ** b ** c == a ** (b ** c)
a = b = c is a syntax error.
= is a non-associative operator
a >= b >= c is a syntax error in Prolog
(Thats not true in C/++.)
You can add your own operator, and people often do.
:-op(900, yfx, [>]).
a > b + c
:-op(200, fy, [*]).
:-op(200, xf, [!]). Factorial. n!
Q. Does this have anything to do with why C++ uses the >> operator for cout?
A. C++ overloads << and >> for both shifting and I/O,
BUT C++ doesn't let you invent new operators,
or change the precedence of existing operators.
Because they thought it would confuse programmers.
You can overdo this.
a++/ -> ~b!
Q. Are there any typical use case for defining operators?
A. Languages catering to specialties with their own notation.
Q. so when we are doing assignment operator for classes in C++, we are
only changing the functionality but not the actual language
A. Youre not changing the syntax, just the semantics.
Q. Is there a book/website where we could study this syntax? (For Prolog)
A. See Prolog resources.
Syntax
Definition: form independent of meaning
Classic English example: Colorless green ideas sleep furiously.
due to N. Chomsky 1950s syntax good, semantics dubious
Ireland has leprechauns galore. (due to yours truly)
semantics good, syntax dubious
galore is an adjective in the wrong place syntactically.
Time flies.
  N V
  V N   (measure the speed of flies, as in a race)
ambiguity that's both syntactic and semantic
Ambiguity is sometimes desirable in English (poets, diplomats, politicians, …).
But it's not desirable in programming languages.
Why you should prefer one syntax to another:
It's what people are already used to. (why Java looks like C++)
It's simple and regular. (e.g., Lisp ()()())
It's readable.
Leibniz's criterion: a proposition's form should mirror its meaning.
E.g., style advice from Val Schorre
Never use >, >=.
if(i>=0&&i
getchar -> ID
C/C++ have *reserved words*: they look like identifiers but are not.
There are certain strings like if that cannot be used
as identifiers. iff is OK.
int iff = 12;
This is a problem: it makes extending the language harder.
int class = 29; // OK in C, not in its extension C++.
Some languages have no reserved words; keywords but no reserved words.
PL/I
IF (a=b) THEN a(3)=5; ELSE a(5)=7;
IF (IF=IF) THEN THEN(ELSE)=5; ELSE THEN(4)=9;
+ easier to extend the language
rules for tokenization are more complicated
The C solution to this problem?
All identifiers beginning with _ and then a capital letter
(or _) are reserved.
New keywords in C all begin with _ and a capital letter.
_Noreturn void f(void) { exit(1); } // f never returns

Parsing.
terminology:
Assumption is input is a finite sequence of tokens.
terminal symbol = token (theyre the same thing)
We need to specify the language in order to figure out how to parse it.
Context-Free Grammar (CFG) is the typical way its done.
(FORTRAN, Python are counterexamples)
terminal symbol = one of a chosen finite set (dozens to hundreds)
string = finite sequence of terminal symbols
parser is given a string and asked to parse it.
How to generate a string that belongs to a particular language.
language = set of strings
CFG =
set of terminal symbols
+ (finite) set of nonterminal symbols each representing a phrase
+ (finite) set of *rules*, each with:
a left hand side (nonterminal symbol)
a right hand side (finite sequence of symbols)
+ start symbol (one of your nonterminal symbols)
example (nonterminals in CAPS, terminals in lowercase)
S is the start symbol
S->Sa 1
S->Sb 2
S->c 3
S (initial sequence of symbols must be the start symbol)
Sa (applied rule 1)
Saa (1 again)
Sbaa (2)
cbaa (3)
This shows that cbaa is a sentence in our language. sentence = member of a language
Q. Can you go from rule 1 to rule 2 to rule 1 again, or do you
have to go in order (1->2->3)?
A. No, you can apply any rule whose LHS appears in your symbol list.
Q. Does a CFG represent a language?
A. Yes, it represents the set of strings that you can derive
from the start symbol.
Q. Is the start symbol a nonterminal?
A. Yes.
Q. Is everything from the right hand side terminal symbols?
A. No, rule 1 has both a nonterminal and a terminal in it.
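As a concrete aside (not from the lecture), the pieces of a CFG can be written down directly as data; a minimal OCaml sketch, with made-up type names:

  (* A CFG as OCaml data: terminal/nonterminal symbols, rules, a start symbol. *)
  type symbol =
    | T of string    (* terminal symbol *)
    | N of string    (* nonterminal symbol *)

  type rule = { lhs : string; rhs : symbol list }

  type grammar = { start : string; rules : rule list }

  (* The example grammar above: S -> S a | S b | c *)
  let example = {
    start = "S";
    rules = [
      { lhs = "S"; rhs = [N "S"; T "a"] };   (* rule 1 *)
      { lhs = "S"; rhs = [N "S"; T "b"] };   (* rule 2 *)
      { lhs = "S"; rhs = [T "c"] };          (* rule 3 *)
    ];
  }

A derivation is then just a sequence of rule indices, like the 1, 1, 2, 3 sequence that produced cbaa above.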
S -> S + S 1 This grammar is ambiguous!
S->S*S 2
S->(S) 3
S->num 4
((3 + 4) * 5) + (6)
( ( num + num ) * num ) + ( num )
S
S+S 1
(S)+S 3
(S) + (S) 3
(S*S) + (S) 2
((S)*S) + (S) 3
((S)*S) + (num) 4
Q. is num also a terminal symbol then since it doesnt contain
nonterminals in rule 4
A. num is a terminal symbol, not a nonterminal.
Q. Do we need to know how to prove ambiguity in the class?
A. Yes, but not yet.
Q. How about proving some grammar is not ambiguous? That's way harder, I think?
A. Yes, its so hard I wont ask you to do that!
UCLA CS 97 lecture 2021-01-12 more about grammars
usage of grammars
Lets take a simple grammar
how grammars get used
various notations for grammars
(eventually) parsing
computer science theory
language definitions
how to automatically process languages
S -> S a      1
S -> b        2
S -> ( S )    3
example in language:
( b a a )
Q. (old version) I don't think this is in the language.
example not in the language
((b) a a (a b (a)))
defining a language
parse sentences in the language
We need a proof that a string is a sentence in the language.
Multiple ways of doing the proof.
Heres one simple way: leftmost derivation proof
We have a string of symbols, initially,
a singleton string containing just the start symbol.
Step: replace the leftmost nonterminal with
a right-hand side of a rule which has that
nonterminal as its left hand side
Keep repeating this step as long as there is
any nonterminal in the string
S            3
( S )        1
( S a )      1
( S a a )    2
( b a a )
We have now proved that ( b a a ) is a sentence in the language.
Shorthand notation for proof (leftmost derivation): 3 1 1 2
Q. If there are multiple rules with the nonterminal on the lhs, how
would we know which rule to replace the nonterminal with?
A. Youre right, we dont necessarily know when we try to come up
with the derivation. But its easy to check a purported derivation;
just run it and check, no decisions other than checking.
Q. Are we assuming that we always start with S here?
A. Yes, always start with the start symbol.
So, the problem of parsing is the problem of figuring out a derivation.
A more-popular way to express a proof (same power as derivations),
but its easier to read. A parse tree:
Q. Why does rule one need to appear twice in the parse tree?
A. For this example, we had two as in the token sequence,
so rule 1 needed to be done twice to get them.
Q. will parse trees be unique on a given string?
A. No, unfortunately.
S -> a S    1
S -> S a    2
S ->        3
There will be multiple ways to generate the string a a a.
This grammar is *syntactically ambiguous* it allows multiple parse
trees for the same sentence.
Q. when is S -> useful in the above grammar
A. Its essential in any sentence. Without it, there would be no
sentences. Every sentence must be finite. This leftmost derivation:
12123
generates
aaaa
A parse tree is a tree:
leaves are all terminals
interior nodes are all nonterminals +
each interior node exactly matches a rule
node is LHS
children are RHS
Q. If the grammar is unambiguous, does that mean every string has a
unique parse tree?
A. Yes, every sentence (parseable string) has just one parse tree.
Q. Can we consolidate rules?
A. Yes. Trivial example:
S -> b
S -> S a
S -> S a
is equivalent to (generates the same language as):
S -> b
S -> S a
Q. Can a string have a potentially infinitely long parse string?
A. In our system all strings are finite.
Q. Or rather, can a string have an infinite number of valid parse strings?
A. Yes, if some rules have empty RHSes, this is possible.
Problems that you can run into with grammars.
* Nonterminals that are used but not defined.
    S -> T b   // useless rule
    S -> S a
    S -> c
  c a is a sentence:
    S       2
    S a     3
    c a
* Nonterminals that are defined but never used.
    S -> S a
    S -> c
    T -> S b   // useless rule
* Nonterminals that can never be reached from the start symbol.
    S -> S a
    S -> c
    T -> U d   // useless
    U -> T e   // useless
* Useless loops.
    S -> S a
    S -> c
    S -> S     // counterproductive: introduces ambiguity for no reason

    S -> S a
    S -> c
    S -> T     // counterproductive: introduces ambiguity for no reason
    T -> S
Aside these problems are similar to what you find in your
ordinary programs. A bug like this in your grammar,
is likely to cause a similar bug in the corresponding parser.
Q. Can't the useless rules from earlier be summarized as either
unproductive or unreachable? And aren't there algorithms to filter
them?
A. Yes and yes. Ambiguity is a harder nut to crack, though.
* Syntactic constraints that your grammar does not capture.
* Your grammar tries to capture too much detail.
An example from natural languages.
Small subset of English grammar.
N {dog,cats}
V { barks, meow }
Adj { brown, black }
Adv { loudly, quietly }
S -> NP VP .
NP -> N
NP -> Adj NP
VP -> V
VP -> VP Adv
dog barks loudly.
brown cats meow.
dog meow softly. // This should not be allowed!
SN   { dog, horse }
PN   { cats, donkeys }
SV   { barks, brays }
PV   { meow, neigh }
Adj  { brown, black }
Adv  { loudly, quietly }
S -> SNP SVP .
S -> PNP PVP .
SNP -> SN
SNP -> Adj SNP
SVP -> SV
SVP -> SVP Adv
PNP -> PN
PNP -> Adj PNP
PVP -> PV
PVP -> PVP Adv
But .. weve doubled the size of our grammar!
Grammar size grows exponentially with the number of attributes.
Lets not do this!
* Last problem with grammars (most interesting one)
Ambiguity can crop up in seemingly reasonable grammars.
E -> E + E
E -> E * E
E -> ID
E -> NUM
E -> ( E )
This grammar is ambiguous.
  a+b*c   can be parsed as (a+b)*c or as a+(b*c)
  a+b+c   can be parsed as (a+b)+c or as a+(b+c)
Lets fix the grammar to make it unambiguous.
E -> E + T
E -> T
T -> T * F
T -> F
F -> ID
F -> NUM
F -> ( E )
E expression
T term
F factor
a+b*c
  The grouping (a+b)*c doesn't work, because only a T can precede *;
  the only parse is a+(b*c).
a+b+c
  The grouping a+(b+c) doesn't work, because only a T can follow +;
  the only parse is (a+b)+c.
Another example of finding and eliminating ambiguity. subset of C programming language
STMT -> ;
STMT -> EXPR ;
STMT -> return ;
STMT -> return EXPR ;
STMT -> break ;
STMT -> continue ;
STMT -> while ( EXPR ) STMT
STMT -> do STMT while ( EXPR ) ;
STMT -> if ( EXPR ) STMT
STMT -> if ( EXPR ) STMT else STMT
STMT -> for ( EXPROPT ; EXPROPT ; EXPROPT ) STMT
STMT -> switch ( EXPR ) STMT
STMT -> { STMTS }
STMTS ->
STMTS -> STMTS STMT
EXPROPT -> EXPR
EXPROPT ->
Why this:
STMT -> return EXPR ;
and not this:
STMT -> return ( EXPR ) ;
Because C designers didn't want to clutter code with
unnecessary parentheses. Parentheses are not needed here;
it's not going to be ambiguous without them. The semicolon
is good enough.
Why this:
STMT -> while ( EXPR ) STMT
and not this:
STMT -> while EXPR STMT
Because C would be ambiguous otherwise.
while x < y x++;    // This is not ambiguous.
while x < y ++x;    // This is ambiguous:
   while (x < y++) x;    // valid interpretation
   while (x < y) ++x;    // also valid
Q. Why does Python do it then?
A. Because it has the : after the while-expression.
Why this:
STMT -> do STMT while ( EXPR ) ;
and not this:
STMT -> do STMT while EXPR ;
Answer: just because (the language would be unambiguous without them,
but the designers liked the looks, and it's more consistent in some sense).
STMT -> ;
STMT -> EXPR ;
STMT -> return ;
STMT -> return EXPR ;
STMT -> break ;
STMT -> continue ;
STMT -> while ( EXPR ) STMT
STMT -> do STMT while ( EXPR ) ;
STMT -> if ( EXPR ) STMT
STMT -> if ( EXPR ) LSTMT else STMT
STMT -> for ( EXPROPT ; EXPROPT ; EXPROPT ) STMT
STMT -> switch ( EXPR ) STMT
STMT -> { STMTS }
LSTMT -> ;
LSTMT -> EXPR ;
LSTMT -> return ;
LSTMT -> return EXPR ;
LSTMT -> break ;
LSTMT -> continue ;
LSTMT -> while ( EXPR ) STMT
LSTMT -> do STMT while ( EXPR ) ;
LSTMT -> if ( EXPR ) LSTMT else STMT
LSTMT -> for ( EXPROPT ; EXPROPT ; EXPROPT ) STMT
LSTMT -> switch ( EXPR ) STMT
LSTMT -> { STMTS }
This is too long; let's simplify it.
STMT -> LSTMT
STMT -> if ( EXPR ) STMT
LSTMT -> ;
LSTMT -> EXPR ;
LSTMT -> return ;
LSTMT -> return EXPR ;
LSTMT -> break ;
LSTMT -> continue ;
LSTMT -> while ( EXPR ) STMT
LSTMT -> do STMT while ( EXPR ) ;
LSTMT -> if ( EXPR ) LSTMT else STMT
LSTMT -> for ( EXPROPT ; EXPROPT ; EXPROPT ) STMT
LSTMT -> switch ( EXPR ) STMT
LSTMT -> { STMTS }
This avoids ambiguity. BUT if you look at the C standard …
Q. side question: do C and C++ standards members collaborate when
developing standards updates (since a C change could impact a C++
change)?
A. Yes, they try. Its usually collegial. Who has the time to do the work?
ISO/IEC 9899 n2478.pdf February 5, 2020
I don't recommend reading it cover to cover! It's long.
Avoiding ambiguity clutters your grammars, and many people don't
want to bother. They'll say:
E -> E + E
E -> E * E
E -> ID
and say "This is my abstract grammar." They want it to govern
their parse trees; they want simple parse trees.
Alternate notations for grammars.
Examples:
1. so far today:
     E -> E + E
2. classic syntax:
     expr:
       expr + expr
3. C standard (a similar notation)
4. Internet RFCs
RFC Request For Comments
Internet RFC 5322 the standard for email.
Simplified version
https://tools.ietf.org/html/rfc5322 section 3.6.4
msg-id          = "<" id-left "@" id-right ">"
id-left         = dot-atom-text / obs-id-left
id-right        = dot-atom-text / no-fold-literal / obs-id-right
no-fold-literal = "[" *dtext "]"
Each rule looks like:
LHS = RHS
nonterminals are id-left, id-right, etc.
terminals are quoted
Example msg-id might be in:
Message-ID: <[email protected]>
This set of rules is an extended version of the rules we've
seen so far, in that *X in a right-hand side
is shorthand for zero or more occurrences of X.
Next time well see how this falls out.
UCLA CS 131 lecture 2021-01-14
This is a subset of the Internet RFC 5322 grammar for Message-ID values (the Message-ID: header).
Start symbol is msg-id (for us, anyway)
alternate notations for grammars
you need to see the ideas under these notations
as opposed to getting stuck in the notations themselves.
msg-id   = "<" id-left "@" id-right ">"
id-left  = dot-atom-text
id-right = dot-atom-text / no-fold-literal
/ is a metanotation (part of EBNF Extended BNF)
easier to read than straight BNF
you can easily translate it to straight BNF if you want
BNF = Backus-Naur Form (a.k.a. Backus Normal Form)
what we use in homeworks.
lhs is a nonterminal symbol
rhs is a finite sequence of symbols
id-right = dot-atom-text
id-right = no-fold-literal
dot-atom-text = 1*atext *("." 1*atext)
( ), *, and 1* are metasymbols (in EBNF, but not part of BNF)
(X) means X
*X means X repeated zero or more times
1*X means X repeated one or more times
atexts = atext
atexts = atext atexts
dotatexts =
dotatexts = . atexts dotatexts
dot-atom-text = atexts dotatexts
dtext = %d33-90 /          ; Printable US-ASCII
        %d94-126           ;  characters not including "[", "]"
        ; (";" starts a comment in this notation)
%d33-90  means the characters "!" through "Z"
%d94-126 means the characters "^" through "~"
atext = ALPHA / DIGIT /    ; Printable US-ASCII
        "!" / "#" /        ;  characters not including
        "$" / "%" /        ;  specials.  Used for atoms.
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"
Message-ID:
msg-id          = "<" id-left "@" id-right ">"
id-left         = dot-atom-text
id-right        = dot-atom-text / no-fold-literal
dot-atom-text   = 1*atext *("." 1*atext)
no-fold-literal = "[" *dtext "]"
dtext = %d33-90 /          ; Printable US-ASCII
        %d94-126           ;  characters not including "[", "]"
atext = ALPHA / DIGIT /    ; Printable US-ASCII
        "!" / "#" /        ;  characters not including
        "$" / "%" /        ;  specials.  Used for atoms.
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"
We can turn this grammar into a regular expression!
atext='[A-Za-z0-9!#$%&*+-/=?^_`{|}~]'
dtext='[!-Z^-~]'
dot_atom_text="$atext+(\.$atext+)*"
no_fold_literal="\[$dtext*]"
id_left=$dot_atom_text
id_right="($dot_atom_text|$no_fold_literal)"
msg_id="<$id_left@$id_right>"
grep -E "$msg_id"
Or I could have said it all at once:
grep -E '<[A-Za-z0-9!#$%&*+-/=?^_`{|}~]+(\.[A-Za-z0-9!#$%&*+-/=?^_`{|}~]+)*@([A-Za-z0-9!#$%&*+-/=?^_`{|}~]+(\.[A-Za-z0-9!#$%&*+-/=?^_`{|}~]+)*|\[[!-Z^-~]*])>'
Can you do this with any EBNF grammar? (That's the same question as: can you do this with any BNF grammar?)  No.
Regular expressions cant count.
They cant deal with nesting.
Expr -> ( Expr )
Expr ->
Seq -> Unit
Seq -> Unit Seq
This is doable, because it does only *tail recursion*,
which can be optimized into a loop X*
seq=$unit*
Two more notations (ISO notation, diagrams)
ISO standard for EBNF
International Standardization Organization
ISO standards for electrical wall plates, C++, JavaScript,
kinda hard to read all these standards
every standard uses its own EBNF notation
So, lets have an ISO standard for EBNF!
https://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html summarizes it.
"terminal symbol" or 'terminal symbol'   (equivalent notations)
meta identifier   (nonterminal symbol; names can contain spaces)
(A)   A
[A]   A or nothing (zero or one instances of A)
{A}   zero or more As
(* comment *)
decreasing precedence order:
N*A N or more As
A B A but not B (like set difference; there are restrictions)
A, B A concatenated with B
A | B A or B
id = A;    rule, with id being the LHS and A being the RHS
Use a grammar to formally specify the syntax of ISO EBNF:
syntax      = syntax rule, {syntax rule};
syntax rule = meta id, '=', defns list, ';';   (* describes itself! *)
defns list  = defn, {'|', defn};
defn        = term, {',', term};
term        = factor, ['-', exception];
exception   = factor;
factor      = [integer, '*'], primary;
primary     = …
Philosophically were in trouble:
Weve defined ISO EBNF by using ISO EBNF.
One more notation: this one is a diagram
syntax graphs
syntax diagrams
racetrack diagrams
Can be used in manuals/doc for big languages.
SQL (used for database queries)
many suppliers provide extensions to SQL
they need to document the syntax for their extensions
diagrams can simplify the description of the syntax
a shorter example taken from Scheme (simple syntax)
so the example is a simple one
Time to talk about functional programming.
Q. do we have to read the textbook or do lectures cover all the
material we need?
A. For next time, please read Webber chapters 1-7, 9, 11
(if you havent already).
motivations
* Clarity Traditional programming languages use
nonmathematical notation.
n=n+1
C++
But mathematicians are *really* good at notations
and they have centuries of experience!
We should use their work rather than reinventing it.
theres also a problem of performance
* Performance parallelizability
C++ etc. force you to think and write code sequentially
a = f (x); // f might change globally-visible memory
b = g (y); // g might access that memory
Compilers often cant do this stuff in parallel.
Key idea here: avoid *side effects*
A side effect is a change to the machine state made
because a statement was executed. It can be:
* change contents of visible storage
* do I/O
Because we avoid side effects,
we avoid the problem where the same name n means different
things to different audiences (which leads to confusion
and lack of clarity). If we have no assignment statements,
this problem goes away.
Technical term referential transparency: we want to be
as transparent as possible about what a name stands for.
This is not just clarity.
You can cache more variables in registers.
Q. Is Side effect similar to the idea of a race condition?
A. Yes, in the sense that side effects often cause races.
Many functional languages:
ML, OCaml, F#, Scala, .
scientific programming (e.g., 3D simulations of the universe)
Functional programming makes it easier to write programs
that reason about other programs (side effects get in the way
of this sort of reasoning).
OCaml vs ML (why?)
book uses ML
OCaml more popular here
for this class, differences are small (minor syntactic stuff)
[3;4;5] vs [3,4,5]
floating point arithmetic is more of a pain in OCaml: 3.0 +. 5.0 instead of 3.0 + 5.0
Basic properties of OCaml (and ML)
* Static type checking (done at compile-time)
like C++, Java,
* You need not write down types all the time.
like Python, Scheme,
* No need to worry about storage management
like Java, Python, Scheme,
* Good support for higher order functions (just like the math guys have!)
a functional language
Q. So OCaml, although object-oriented, preserved the functional style
of grammars?
A. Its certainly a functional style.
Its easier to translate a grammar to an OCaml parser than
to a C++ parser.
Q. How WOULD we write this expression in a way that works?
A. First, you wouldn't. Second, if you need it, you'd use
a discriminated union (like symbol in the homework).
Q. Can you explain why the tuple cant have any number of elements?
A. Tuples can have as many elements as you like,
but youre stuck with whatever number you pick.
Q. How come when I try to add a value to a list via 1::[2; 3];;, the
top-level says (::) (1, [2; 3]) instead of [1; 2; 3]
A. @Kyle, that often happens after you do open List
Q. tuple is immutable?
A. Everything is immutable in our subset of OCaml.
UCLA CS 131 lecture 2021-01-19
for next time (if you havent already), read Webber 1-9, 11
functional programming with OCaml as an example
lecture is a bit behind homework on OCaml
stuff you already know part of today
idea is to see how to explain OCaml in a basic core + add-ons
you need to know the core!
(you see this all the time C++ core language + libraries) (OCaml has the same thing.)
Whats the core of the OCaml language (add-ons will still be part of the language) (our add-ons will be syntactic sugar)
(syntactic sugar in C++: ++i is short for i += 1 and for i = i + 1)
In   val f : int -> int = <fun>
the <fun> is how the toplevel prints a function value (the code itself isn't printable).
Lisp has a function (cons x y) taking a value x and a list y, and it returns a list thats one longer, first elt is x, rest is y.
Q. just clarifying, the let f x = syntax is syntactic sugar for the let f = fun x -> syntax?
A. Yes.
If you use let f x y = …, OCaml *curries* the function. Instead of taking 2 arguments, it just takes the first one, and then returns a function that, when you call it with the 2nd argument, finally does the actual work.
Aside: Haskell Curry designed this technique to simplify his theoretical papers on functional programming. Nowadays it's not simply a notational convenience.
Q. Did Haskell Curry invent Haskell? A. No, but it's named after him.
In OCaml, if you just write x y z, it's parsed as *two* function calls: the first is a call to x, the second is a call to whatever x y returns. You can write it as (x y) z.
If you wrote x (y z), that means something else.
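A small OCaml sketch of the two points just made (currying, and left-associative application); the names add and add3 are mine:

  (* Currying: add takes one argument and returns a function. *)
  let add x y = x + y          (* same as: let add = fun x -> fun y -> x + y *)
  let add3 = add 3             (* partial application: int -> int *)
  let seven = add3 4           (* 7 *)

  (* Application is left-associative: "add 3 4" parses as "(add 3) 4". *)
  let also_seven = (add 3) 4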
Q. So essentially every expression is just one function call and one argument?
A. Every normal application is, yes, but there are also if, match, …
Also, OCaml has syntactic sugar for common operators.
Q. Isn't this heavy use of function calls inherently more inefficient
due to extra call and ret instructions?
A. Yes and no. Yes, because call+ret does cost you. No, because OCaml's optimizer optimizes away many of these calls, or uses cheaper versions of the calls (cheaper than what you normally get in C++).
let cons x y = x::y
let c3 = cons 3
c3 [4;5]
When this executes, the code somehow knows about the 3. How does it do that?
* The OCaml interpreter could clone the machine code for cons,
  mentally substitute 3 for every occurrence of x in that code,
  and call the resulting new machine code sequence c3.
* Typically a different approach is taken, though: the OCaml interpreter
  creates a pointer to the environment where x=3, and packages that into
  the pointer that implements c3.
OCaml functions are represented by fat pointers:
  a pointer to the machine code
  + a pointer to the environment ({x = 3}, for us)
You cannot do this in C/C++, because their function
pointers are single word pointers (only point to machine code)
You can achieve the effect by passing an extra environment argument to the function (by hand).
Q. What defines an environment?
A. For now, a set of name-value pairs. (Like POSIX sh.)
(It's actually more complicated.)
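A tiny sketch of a closure capturing its environment, in the same spirit as c3 above (make_adder is a made-up name):

  (* make_adder returns a function that remembers n, much as c3 remembers x = 3. *)
  let make_adder n = fun x -> x + n
  let add3 = make_adder 3
  let r = add3 10      (* 13 *)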
Q. So, a Higher Order Function is some kind of configurable function,
configured after it's defined?
A. Yes, for cons, because you can give it just the one arg. Strictly speaking, a HOF is a function taking a function as an arg, or returning a function as its result.
Q. So would the fixed point function on hw1 be a higher order function? A. Yes!
In Lisp:
(cons a b) yields a list one longer than b
(car l) yields the first elt of the list l (hd in OCaml)
(cdr l) yields the tail of the list l (tl in OCaml)
Q. does a functions domain have to cover everything of a certain type?
A. Not in OCaml; you can have partial functions that (say) throw exceptions. But please dont do that in your homework.
Q. In CS 97, you mentioned how C is a lower level language than
Python or JavaScript due to checking and other factors; how does OCaml compare to C, Python, and JavaScript?
A. OCaml is closer to C in that it has static type checking.
OCaml is closer to Python in that it has a garbage collector.
Q. where the 5 come from?
A. we specified it as first arg of dcar
Q. How does OCaml pattern matching work under the hood? A. after the break (restart at 15:04)
OCaml patterns are used in match expressions:
  match l with
  | h::t -> 3::h::t
  | [] -> [4]
Q. What would h::t in the match evaluate to? I.e. would L be matched to the list resulting from h::t?
A. Patterns are not evaluated in the usual sense. Expressions are evaluated in order to construct or yield values, but patterns are
used only to take values apart. If l evaluates to [1;2;3], then
the match expression binds h to 1 and t to [2;3], and evaluates 3::h::t in that environment.
some OCaml patterns
a         matches anything, and binds a to that value (a bound variable)
_         matches anything, and discards it
0         matches only itself (true for any constant)
[]        matches only itself
P,Q       matches any tuple with two elts, whose first elt matches the
          pattern P and whose second matches Q
          (this applies recursively;
          a,3 matches any pair whose 2nd elt is 3,
          and whose first elt is then bound to a)
P,Q,R     matches any tuple with 3 elts
()        matches any tuple with 0 elts
[P;Q;R]   matches any 3-elt list whose components match P, Q, R respectively
P::Q      matches any list whose head matches P and tail matches Q
T P       matches any object built from constructor T with argument
          matching the pattern P
          (:: is just another constructor, with two args)
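A short sketch of the last pattern form (T P) in use; the shape type here is invented for illustration:

  type shape = Circle of float | Rect of float * float

  let area s =
    match s with
    | Circle r -> 3.14159 *. r *. r    (* T P: constructor Circle, pattern r *)
    | Rect (w, h) -> w *. h            (* a tuple pattern inside the constructor *)

  let a = area (Rect (2.0, 3.0))       (* 6.0 *)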
So, how does it work?
Q. is it possible to write a recursive function with the anonymous
function syntax?
A. Yes, but outside the scope of this lecture (it's a bit tricky).
A higher order function that does some pattern matching too.
Find the maximum element in a list of int.
sum of the empty list is 0 (identity element)
product of the empty list is 1 (identity element)

let rec maxlist = fun l -> match l with
  | h::t -> let mt = maxlist t in
            if mt < h then h else mt
  | [] -> -99999999999

let rec maxlist l = match l with
  | h::t -> let mt = maxlist t in
            if mt < h then h else mt
  | [] -> -99999999999

let rec maxlist = function
  | h::t -> let mt = maxlist t in
            if mt < h then h else mt
  | [] -> -99999999999

This works only on lists of integers. Let's generalize it to a list of anything.

let rec maxlist lt identity = function
  | h::t -> let mt = maxlist lt identity t in
            if lt mt h then h else mt
  | [] -> identity
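Usage sketch for the generalized maxlist (this matches the toplevel session later in these notes):

  let imaxlist = maxlist (<) (-99999999)   (* int list -> int *)
  let m = imaxlist [-19; 27; -33; 12]      (* 27 *)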
We have two keywords here: fun and function.
They operate differently (syntactic sugar in different ways).
fun is for currying (though you can use patterns):
  fun x y -> x + y     is sugar for    fun x -> (fun y -> x + y)
  fun (x::y) -> x      (a single pattern is allowed)
function is for pattern matching:
  function | [] -> 0 | _::l -> f l
  function | x::y -> x     same as     function x::y -> x
Q. this is a bit unrelated, but do you think learning OCaml makes us better at more traditional programming languages, considering its features are so different?
A. Yes, but it's more about broadening your programming toolkit than just being better at those languages.
Q. Is there somewhere we can check for corresponding readings for each lecture? A. Not really, because there's not a 1:1 correspondence.
shell transcript
501-day $ ocaml
OCaml version 4.08.1
# [];;
: a list = []
# let f x = x + 1;;
val f : int -> int =
# let n = 10 + 3;;
val n : int = 13
# let f1 = (fun x -> (x + 1));;
val f1 : int -> int =
# fun x -> (x + 1);;
: int -> int =
# 10 + 3;;
: int = 13
# 3::(4::[]);;
: int list = [3; 4]
# let cons x y = x::y ;;
val cons : a -> a list -> a list =
val cons1 : a -> a list -> a list =
# let cons3 = fun x -> fun y -> x::y;;
val cons3 : a -> a list -> a list =
: int list = [3; 4]
# cons 3;;
: int list -> int list =
# cons 3 [];;
: int list = [3]
# (cons 3) [];;
- : int list = [3]
# ((cons 3) []);;
- : int list = [3]
# 3 + 4;;
- : int = 7
# (+) 3 4;;
- : int = 7
# (+);;
- : int -> int -> int = <fun>
# let c3 = cons 3;;
val c3 : int list -> int list =
# c3 [4;5];;
: int list = [3; 4; 5]
# c3 (c3 [4;5]);;
: int list = [3; 3; 4; 5]
# let tcons (x, y) = x::y;;
val tcons : a * a list -> a list =
# tcons (3, [4;5]);;
: int list = [3; 4; 5]
# tcons;;
: a * a list -> a list =
Line 1, characters 8-18:
1 | let car (h::t) = h;;
^^^^^^^^^^
Warning 8: this pattern-matching is not exhaustive. Here is an example of a case that is not matched: []
val car : a list -> a =
# car [];;
Exception: Match_failure (//toplevel//, 1, 8).
# let dcar default l = match l with |[] -> default |h::t -> h;; val dcar : a -> a list -> a =
# dcar 5 [3;9;12]
;;
: int = 3
# dcar 5 [];;
: int = 5
# let d5 = dcar 5;;
val d5 : int list -> int =
: int = 3
# d5 [];;
: int = 5
# x;;
Line 1, characters 0-1:
1 | x;;
    ^
Error: Unbound value x
# let x = x + 1;;
Line 1, characters 8-9:
1 | let x = x + 1;;
            ^
Error: Unbound value x
# let x = 12;;
val x : int = 12
# let x = x + 1;;
val x : int = 13
# let f y = x + y;;
val f : int -> int = <fun>
# f 2;;
- : int = 15
# let x = x + 100;;
val x : int = 113
# f 2;;
- : int = 15
# let rc maxlist lt identity = function
| h::t -> let mt = maxlist t in if lt mt h then h else mt
| [] -> identity;;
val rc : (a list -> a) -> (a -> a -> bool) -> a -> a list -> a =
# let rec maxlist lt identity = function | h::t -> let mt = maxlist t in
3 |
if lt mt h then h else mt | [] -> identity;;
Line 3, characters 10-12:
if lt mt h then h else mt
^^
Error: This expression has type a list
This is not a function; it cannot be applied. # let rec maxlist lt identity = function
| h::t -> let mt = maxlist t in if (lt mt h) then h else mt
| [] -> identity ;;
4 |
Line 4, characters 11-13:
if (lt mt h) then h else mt
^^
Error: This expression has type a list
This is not a function; it cannot be applied.
# let rec maxlist lt identity = function
3 |
| h::t -> let mt = maxlist t in
if lt mt h then h else mt
| [] -> identity;; Line 3, characters 38-40:
if lt mt h then h else mt ^^
Error: This expression has type a list
This is not a function; it cannot be applied.
# let rec maxlist lt identity = function
| h::t -> let mt = maxlist lt identity t in if lt mt h then h else mt
| [] -> identity
;;
val maxlist : (a -> a -> bool) -> a -> a list -> a =
# let imaxlist = maxlist (<) -99999999;;
Line 1, characters 15-26:
1 | let imaxlist = maxlist (<) -99999999;;
                   ^^^^^^^^^^^
Error: This expression has type 'a -> 'a list -> 'a
       but an expression was expected of type int
# (<);;
- : 'a -> 'a -> bool = <fun>
# maxlist (<);;
- : '_weak1 -> '_weak1 list -> '_weak1 = <fun>
# maxlist (<) 0;;
- : int list -> int = <fun>
# let imaxlist = maxlist (<) (-99999999);;
val imaxlist : int list -> int = <fun>
# imaxlist [-19; 27; -33; 12];;
: int = 27
# let maxfoo lt = maxlist lt 0;;
val maxfoo : (int -> int -> bool) -> int list -> int =
502-day $ ocaml
OCaml version 4.08.1
# (* DNA fragment analyzer. *)
type nucleotide = A | C | G | T
type fragment = nucleotide list
type acceptor = fragment -> fragment option
type matcher = fragment -> acceptor -> fragment option
type pattern =
| Frag of fragment
| List of pattern list
| Or of pattern list
| Junk of int
| Closure of pattern
let match_empty frag accept = accept frag
let match_nothing frag accept = None
let rec match_junk k frag accept =
  match accept frag with
  | None ->
      (if k = 0
       then None
       else match frag with
            | [] -> None
            | _::tail -> match_junk (k - 1) tail accept)
  | ok -> ok
let rec match_star matcher frag accept = match accept frag with
| None -> matcher frag
(fun frag1 ->
if frag == frag1
then None
else match_star matcher frag1 accept)
| ok -> ok
let match_nucleotide nt frag accept =
match frag with
| [] -> None
| n::tail -> if n == nt then accept tail else None
let append_matchers matcher1 matcher2 frag accept =
  matcher1 frag (fun frag1 -> matcher2 frag1 accept);;

type nucleotide = A | C | G | T
type fragment = nucleotide list
type acceptor = fragment -> fragment option
type matcher = fragment -> acceptor -> fragment option
type pattern =
    Frag of fragment
  | List of pattern list
  | Or of pattern list
  | Junk of int
  | Closure of pattern
val match_empty : 'a -> ('a -> 'b) -> 'b = <fun>
val match_nothing : 'a -> 'b -> 'c option = <fun>
val match_junk : int -> 'a list -> ('a list -> 'b option) -> 'b option = <fun>
val match_star :
  ('a -> ('a -> 'b option) -> 'b option) ->
  'a -> ('a -> 'b option) -> 'b option = <fun>
val match_nucleotide : 'a -> 'a list -> ('a list -> 'b option) -> 'b option = <fun>
val append_matchers :
  ('a -> ('b -> 'c) -> 'd) -> ('b -> 'e -> 'c) -> 'a -> 'e -> 'd = <fun>
UCLA CS 131 lecture 2021-01-21 OCaml and types
simple OCaml function to reverse a list
let rec reverse = function
  | [] -> []
  | h::t -> (reverse t) @ h

This is wrong, because its type ('a list list -> 'a list) is wrong.

let rec reverse = function
  | [] -> []
  | h::t -> (reverse t) @ [h]
You can debug your OCaml program by looking at its types. By the way, this program is too slow.
L@M cost is proportional to len(L) + len(M)
Cost of reverse is O(N**2).
Let's fix that by using an *accumulator*: an extra argument
to subsidiary functions that accumulates the work you've already done
(to save work later).
A function with an accumulator is more general, but in some sense it's easier to write.

let rec revapp a l = match l with
  | [] -> a
  | h::t -> revapp (h::a) t
let reverse l = revapp [] l
Or more simply:
let reverse =
let rec revapp a = function
| [] -> a
| h::t -> revapp (h::a) t
in revapp []
Q. Does OCaml not have a cons of type 'a list -> 'a -> 'a list that we could use instead?
   (The built-in cons has type 'a -> 'a list -> 'a list.)
A. You can define one:
let rcons a b = b :: a;;
This fixes the types, but doesn't help much.
let reverse =
  let rec revapp a = function
    | [] -> a
    | h::t -> revapp (rcons a h) t
  in revapp []
Q. How to understand "in" correctly?
A. Simple form:
let v = E1 in E2
   E1, E2 are arbitrary expressions; v is an identifier.
means: evaluate E1,
then evaluate E2 in a context where v = E1's value.
Equivalent to:  (fun v -> E2) (E1)
A more complicated form: let f x = E1 in E2
equivalent to:
let f = (fun x -> E1) in E2
equivalent to:
(fun f -> E2) (fun x -> E1)
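A concrete instance of the simple-form equivalence above (a trivial sketch; both expressions evaluate to 4):

  let a = let x = 3 in x + 1      (* 4 *)
  let b = (fun x -> x + 1) 3      (* 4 *)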
Q. True, but that's ignoring the potential for doubly linked lists @Frank
A. Let's not do that; it's not functional.
Q. What does (fun f -> E2) mean? I am sure it is not "take an input of function type f and return type E2".
A. It's a higher order function that takes (fun x -> E1) as an argument.
Q. So accept is a function in match_empty, while its just a type in
match_none?
A. In match_nothing, accept can be any type.
Q. So if you have a function a->b, a and b could technically be functions? A. Yes, functions are values, so a can be a function type.
Q. Why are we using == instead of =?
A. == for object identity (pointer comparison, which is faster),
= for value equality. Here, == is good enough.
Q. Kind of a useless question, but what does OCaml call an "any" type once you've already gone through all the letters a-z?
A. Maybe 'a1; but you should rewrite your code!
Q. why do we accept the matches rather than just returning them? A. In general, we need to give the caller the ability to reject
some matches that our pattern allows, so that we can give the caller alternatives. This can be more efficient than returning all
possible answers.
Q. when calling this function, how would you define frag and accept? how would you use the nameless matcher thats nested inside the overall function?
A. (make_matcher PAT) [A;G;T;C;] (function | C::x -> Some x | _->None) This code creates nameless matchers all over the place, and uses them as part of the way it glues together little functions into bigger ones.
Q during break. Is let A = E1 in let B = E2 in E3 equivalent to let A
= E1 and B = E2 in E3?
A. Yes, except for scope. In the former, the scope of B does not extend into E1; in the latter, it does.
[Aside: announcement: delaying the midterm from Thu Feb 4 to Tue Feb 9.]
Aside: the parsing problem has three components:
1. Recursion (comes with the territory; native to OCaml)
2. Disjunction (Or): the same nonterminal can be the LHS of multiple rules
   (we just looked at this with make_or_matcher)
3. Concatenation (List): this is what acceptors are for
   basic idea: if you have List [P1; …; Pn],
     hm = matcher for P1
     tm = matcher for List [P2; …; Pn]
   how do we glue together hm and tm?
let append_matchers matcher1 matcher2 frag accept = matcher1 frag (fun frag1 -> matcher2 frag1 accept)
Type is this thing:
(a -> (b -> c) -> d) -> (b -> e -> c) -> a -> e -> d =
let append_matchers matcher1 matcher2 = fun frag accept ->
matcher1 frag (fun frag1 -> matcher2 frag1 accept)
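A small usage sketch of append_matchers, assuming the nucleotide type and the matchers defined in the transcript below; match_ag and result are made-up names:

  (* Glue two one-nucleotide matchers into a matcher for "A then G". *)
  let match_ag = append_matchers (match_nucleotide A) (match_nucleotide G)

  (* The acceptor keeps whatever suffix is left over; result is Some [T]. *)
  let result = match_ag [A; G; T] (fun rest -> Some rest)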
Translation environments.
Compiler: translating from source code to machine code
#include <stdio.h>
int main (void) { return !getchar (); }
gcc -E runs just the preprocessor, and turns this into lots of stuff from stdio.h
extern int getchar (void);
int main (void) { return !getchar (); }
gcc -S converts this to assembly language:
The software tools approach: you have a POSIX pipeline of programs
   Bell Labs, 1970s, for Unix
   low level stuff (IoT, e.g.)
   preprocess -> compile to asm -> asm to .o -> ld to .exe -> run
The IDE approach Integrated Development Environment
Single app that does it all (including interpreting your mouse events etc.)
Do modularization of this app via OO (object oriented) techniques rather as a Unix/POSIX-style pipeline
Xerox PARC 1970s for Smalltalk
You can edit the source code for your editor on the fly, press DoIt and your editor
operates differently high level stuff
Compiling vs interpreting
compiling: translate to machine code, run that code
+ better runtime performance
interpreting: keep source code analog in RAM, an interpreter
+ easier to debug
executes the text of the program
Can translate source code to byte code (a software-defined instruction set).
   classic example: Java: Foo.java -> Foo.class (byte-codes)
   the java interpreter reads Foo.class and then executes its bytecodes
hybrids
dynamic linking
   an executable can run ld (dynamic flavor) to bring in some code
   from a file (or the net or etc.) into RAM, so that the executable
   can use the code
   self-modifying code! (under some control)  performance implications
just-in-time compilation
Q. So does the IDE just bundle the preprocessing, compiling, assembling, linking and execution, meaning its technically still the same pipeline, just automated? Or is there a fundamental difference? A. It can just bundle, and some IDEs do, but some keep source code,
parse trees, etc., in RAM, to help debugging.
shell transcript
501-day $ ocaml
OCaml version 4.08.1
# let rec reverse = function
  | [] -> []
  | h::t -> (reverse t) @ h ;;
val reverse : 'a list list -> 'a list = <fun>
# reverse [[3;1];[2;7;5];[];[1;-9]];;
- : int list = [1; -9; 2; 7; 5; 3; 1]
# let rec reverse = function
  | [] -> []
  | h::t -> (reverse t) @ [h] ;;
val reverse : 'a list -> 'a list = <fun>
# reverse [[3;1];[2;7;5];[];[1;-9]];;
- : int list list = [[1; -9]; []; [2; 7; 5]; [3; 1]]
# (::);;
Line 1, characters 0-4:
1 | (::);;
^^^^
Error: The constructor :: expects 2 argument(s),
       but is applied here to 0 argument(s)
# let rcons a b = cons b a;;
Line 1, characters 16-20:
1 | let rcons a b = cons b a;;
                    ^^^^
Error: Unbound value cons
Hint: Did you mean cos?
# let rcons a b = b :: a;;
val rcons : a list -> a -> a list =
# let reverse =
    let rec revapp a = function
      | [] -> a
      | h::t -> revapp (h::a) t
    in revapp []
  ;;
val reverse : '_weak1 list -> '_weak1 list = <fun>
# reverse [[3;1];[2;7;5];[];[1;-9]];;
- : int list list = [[1; -9]; []; [2; 7; 5]; [3; 1]]
502-day $ ocaml
OCaml version 4.08.1
# (* DNA fragment analyzer. *)
type nucleotide = A | C | G | T
type fragment = nucleotide list
type acceptor = fragment -> fragment option
type matcher = fragment -> acceptor -> fragment option
type pattern =
| Frag of fragment
| List of pattern list
| Or of pattern list
| Junk of int
| Closure of pattern
let match_empty frag accept = accept frag
let match_nothing frag accept = None
let rec match_junk k frag accept =
match accept frag with
| None ->
(if k = 0
then None
else match frag with
| [] -> None
| _::tail -> match_junk (k - 1) tail accept)
| ok -> ok
let rec match_star matcher frag accept =
match accept frag with
| None ->
matcher frag
(fun frag1 ->
if frag == frag1
then None
else match_star matcher frag1 accept)
| ok -> ok
let match_nucleotide nt frag accept =
match frag with
| [] -> None
| n::tail -> if n == nt then accept tail else None
let append_matchers matcher1 matcher2 frag accept =
matcher1 frag (fun frag1 -> matcher2 frag1 accept)
let make_appended_matchers make_a_matcher ls =
let rec mams = function
| [] -> match_empty
| head::tail -> append_matchers (make_a_matcher head) (mams tail)
in mams ls
let rec make_or_matcher make_a_matcher = function
| [] -> match_nothing
| head::tail ->
let head_matcher = make_a_matcher head
and tail_matcher = make_or_matcher make_a_matcher tail
in fun frag accept ->
let ormatch = head_matcher frag accept
in match ormatch with
| None -> tail_matcher frag accept
| _ -> ormatch
let rec make_matcher = function
| Frag frag -> make_appended_matchers match_nucleotide frag
| List pats -> make_appended_matchers make_matcher pats
| Or pats -> make_or_matcher make_matcher pats
| Junk k -> match_junk k
| Closure pat -> match_star (make_matcher pat);;
type nucleotide = A | C | G | T
type fragment = nucleotide list
type acceptor = fragment -> fragment option
type matcher = fragment -> acceptor -> fragment option
type pattern =
Frag of fragment
| List of pattern list
| Or of pattern list
| Junk of int
| Closure of pattern
val match_empty : a -> (a -> b) -> b =
val match_nothing : a -> b -> c option =
val match_junk : int -> a list -> (a list -> b option) -> b option =
val match_star :
(a -> (a -> b option) -> b option) ->
a -> (a -> b option) -> b option =
val match_nucleotide : a -> a list -> (a list -> b option) -> b option =
val append_matchers :
(a -> (b -> c) -> d) -> (b -> e -> c) -> a -> e -> d =
val make_appended_matchers :
(a -> b -> (b -> c) -> c) -> a list -> b -> (b -> c) -> c =
val make_or_matcher :
(a -> b -> c -> d option) -> a list -> b -> c -> d option =
val make_matcher :
pattern -> nucleotide list -> (nucleotide list -> a option) -> a option =
UCLA CS 131 lecture 2021-01-26
Types in OCaml, Java, C++, etc.
We'll start by looking at static type checking (compile-time) vs dynamic (later).
What are types? Why do we have them?
"Types are more trouble than they're worth." Is that so?
What is a type?
* A type is a set of values
int = { -2147483648, …, -1, 0, 1, …, 2147483647 }. There are set-oriented languages.
Q. Do these set oriented languages allow for countably and uncountably infinite sets?
A. Sort of. You can define predicates p, {x | x elem p}.
* A type describes how objects are represented in memory. (Some details, but perhaps not all.)
struct s { int a; char b[7]; double c; } x;
char *p;
int *q;
char * and int * have the same representation on x86-64. But that's not guaranteed on all platforms.
&x.b[0] - (char *) &x.a == sizeof (int)
* A type is a set of objects and operations defined on those objects.
Like C++ in which a class has everything private except some constructors and methods.
* (research) A type is a set of objects, operations, and axioms about the objects.
Distinction between primitive and constructed types. Primitive types are builtin.
int double char bool .
Constructed types are defined by the user. char *const *p;
char b[7];
struct s { int a; char b[7]; double c; } x;
Primitive types have plenty of issues all by themselves. portability issues
int   What is the set of int values? It's machine dependent.
      32-bit two's complement integers (most common nowadays): -N = ~N + 1
      16-bit and 36-bit sizes also exist(ed)
      ones' complement: -N = ~N
      signed magnitude
char          -128..127 if GCC (other compilers may disagree)
unsigned char 0..255
signed char   -128..127
float IEEE-754 floating point (by far most common implementation) CS 33
set of objects for float
32-bit representation
s  1-bit sign (0 -> positive, 1 -> negative)
e  8-bit exponent
f 23-bit fraction (*not* a mantissa)
The number represented:
+- 2**(e-127) * 1.f    if 0 < e < 255
     (+- depends on s; 1.f stands for the base-2 number 1.11010…0101,
      where f holds the bits 11010…0101)
+- 2**(-126) * 0.f     if e = 0
     These are the *tiny* or *denormalized* numbers.
     The number 0 is in this category. All-bits-0 represents +0.
     There is also a -0. So: two distinct representations of 0.
     Issues because of this:
        float big[10000000]; memset (big, 0, sizeof big);
        float f = 0.0, g = -0.0;
        f == g && memcmp (&f, &g, sizeof f) != 0
+- infinity            if e = 255, f = 0
        float f = 1, g = 0;
        f/g returns infinity
        f/g - 1 returns infinity
        f/g - f/g returns something else, not a number
+- NaN                 if e = 255, f != 0   (Not a Number)
        float f = 0;
        f/f returns NaN as well.
     Issue: how do you compare NaNs to numbers?
        float f = 0;
        float nan = f/f;
        if (nan < 5) …
     By convention: NaNs are never <, =, or > any float.
So !(x < y) does not imply x >= y,
and that will happen if x is a NaN.
float f = 0.0/0.0, g = f;
f != g && memcmp (&f, &g, sizeof f) == 0
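The same special-value behavior is easy to observe from the OCaml toplevel (a small sketch; the variable names are mine):

  let nan_v = 0.0 /. 0.0     (* NaN *)
  let inf   = 1.0 /. 0.0     (* infinity *)
  let b1 = nan_v < 5.0       (* false: NaN is never <, =, or > anything *)
  let b2 = nan_v >= 5.0      (* false *)
  let b3 = nan_v = nan_v     (* false: a NaN is not even equal to itself *)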
Q. Sounds like NaN has the property of superposition (just for fun).
A. What's that?
An axiom that we'd like to be true:
   float f, g;
   f != g implies that f - g != 0
This is true of real numbers. Is it true for float?
YES. The tiny numbers make it possible.
Without them, you could underflow in subtraction: a = f - g; yields zero
if f and g are close enough.
Q. Would this be true for 0 and -0?
A. -0.0 != 0.0 implies that -0.0 - 0.0 != 0
   (and -0.0 != 0.0 is false, so this holds trivially).
Common fallacy: "Never compare floating point values for equality."
   if (f == g) …
   if (fabs (f - g) < 0.00001 * f) …
   if (f == 0.0) …
Note that there are no exceptions in conventional floating point arithmetic.
The system uses special values instead: infinities, NaNs, +-0.
   1 / -infinity == -0.0
An alternative approach would be to throw an exception if you overflow
(or underflow): no NaNs, just exceptions.
Should you use exceptions or special values?
   partial functions vs total functions that return a superset of values
   x86-64 hardware supports both models (special values (NaN) and
   exceptions (trap)). You can flip a bit to get the other behavior.
   Hardly anybody does this. Why not?
   * Fortran didn't have exception handling.
   * Lots of useful computations do even better without it. Say, a weather map.

Some uses of types.
* Annotations
   (int x; tells the programmer some useful info)
   (int x; tells the compiler useful info, to generate better code)
   Q. This is specific to C++, but does using the annotation auto slow down
      compilation compared to using a true annotation?
   A. No. (But I'm no expert.)
* Inference
   (int i; float f; …; return i * f;)
   OCaml is big on type inference.
* Checking
   static (before the program starts): compile-time, link-time
   vs
   dynamic (at runtime, as the program runs)
      (the same code can succeed 1000 times, then fail)
   This can be a spectrum. Some languages have both.
   Q. So there aren't any statically typed interpreted languages?
   A. Classic Java. Strictly speaking, the implementation is interpreted
      or compiled, not the language.
      PROBLEM: Write a subset-of-C interpreter. This is doable!
   *strongly typed* = all operations are type-checked
      C is not strongly typed, because you can cast pointers
      and this can get you into trouble
      *undefined behavior* - the system can do anything it wants

Type equivalence
   Two types T and U: does T = U? Two standard answers.
   *name equivalence* - two types are the same if they have the same name.
   *structural equivalence* - two types are the same if
      - they're laid out in memory the same
      - they behave the same as far as the implementation (ABI) goes
        (ABI = application binary interface)
   Example of name equivalence in C/C++:
      struct s { int a; struct s *next; };
      struct t { int a; struct t *next; };
      These are different types because their names s and t differ.
      struct s x; … struct t y = x;   // not allowed
   Example of structural equivalence in C/C++:
      typedef int s;
      typedef int t;
      s x; …
      t y = x;   // This is OK.
   Suppose we wanted structural equivalence in C/C++.
      struct s { int a; struct t *next; };
      struct t { int a; struct s *next; };

Subtypes
   Two types T and U: is T a subtype of U?
   Assumption: if T is a subtype of U and U is a subtype of T,
   then T and U are equivalent.
   Why do we need subtypes?
      General idea: we have a specific value of type T that we want to
      pass as a type-U argument to a more-general function.
   Some classic examples:
      Pascal:  type lower_case_alpha = 'a'..'z';  is a subtype of char
      Common Lisp:  (declare (type (and integer (satisfies evenp)) x))
         a subtype of integer: the set of all integers i for which
         (evenp i) returns true
      In class-based systems, a subclass is a subtype.
         class C extends P { … }   C is a subtype of P
   Generally speaking, subtypes can have operations that supertypes lack:
   a subset of values implies a superset of operations.
   char * vs char const *   (char const * == const char *)
   Which of these is a subtype of the other?
      char *p;        // unrestricted char * pointer
      char const *q;  // restricted (only reads allowed) char * pointer
      p = q; is not allowed, because you might have *p = 'x' later.
q = p; is OK, because youre adding a restriction Therefore char * is a subtype of char const *.Polymorphism* A function that accepts many forms is polymorphic. (it might be int->int function)
(it might be float->float function)
(it depends on the context)
Many languages are polymorphic in this sense.
Two major forms of *ad hoc polymorphism* in conventional languages
overloading
identifying which function to call by examining the types of its arguments (and maybe the type of the context)
arithmetic operators in C, C++, Java, Standard ML (not OCaml)
in statically compiled languages, its often implemented by having several different functions at the machine level to implement one source-code function
You write: x = cos(y);
At the machine level:
call cosf    (trailing f means float, not double or long double)
call cosld   (for long double)
...
This can even work in C++, where you can define your own
overloaded functions, via *name mangling*
  myfun (x); // overloaded
  call myfun$li   (because x is of type long)
This works as long as the caller and callee agree on the
name mangling convention.
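A small sketch of overloading in Java (my example, not from the lecture):
  static int    twice (int x)    { return x + x; }
  static double twice (double x) { return x + x; }
  twice (3);    // the int version is chosen from the argument type
  twice (3.0);  // the double version is chosen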
Q. Would a templated function in C++ be overloaded even if you
   have to specify the types in the function call via <...>?
A. No, that's the next topic (after coercion).
coercion
implicit type conversion
double f = 0;      // uses coercion
double d = f + 1;  // there is no + : double -> int -> double
All we have are
  + : int -> int -> int
  + : double -> double -> double
so we coerce 1 to double first
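A small Java sketch (mine, not from the lecture) of where coercion kicks in:
  int n = 3;
  double half = n / 2;    // 1.0, not 1.5: the division is int division,
                          // and only the result 1 is coerced to double
  double half2 = n / 2.0; // 1.5: here n is coerced to double first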
This adds convenience, BUT it can get complicated.
double f(int, double);
double f(double, int);
f(1, 0) is ambiguous in this situation. The C++ rule is to disallow this.
There are a lot of rules here.
We want something better and cleaner than this, if we can get it.
Parametric polymorphism
When a function's type contains one or more *type variables*.
Without parametric polymorphism (in low-level Java, no sugar):
// Walk through a collection of strings, remove all the length-1 strings.
static void remove1 (Collection c) {
  for (Iterator i = c.iterator(); i.hasNext(); )
    if (((String) i.next()).length() == 1)
      i.remove();
}
// won't compile without a cast, because i.next() returns Object,
// and Object doesn't have a length method
// This is ugly and slow because of the runtime check.
With parametric polymorphism (in low-level Java, no sugar)
// Walk through a collection of strings, remove all the length-1 strings.
static void remove1 (Collection<String> c) {
  for (Iterator<String> i = c.iterator(); i.hasNext(); )
    if (i.next().length() == 1)
      i.remove();
}
// No cast needed, no runtime check.
// Code is safer, because no exceptions possible.
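Hypothetical usage (my example; assumes java.util is imported):
  Collection<String> c = new LinkedList<String>(Arrays.asList("a", "bc", "d"));
  remove1(c);   // c now holds just "bc"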
Q. So Collection<String> is a subtype of Collection<Object>?
A. If only it were so. Unfortunately it's not; see later.
Q. Could the same code work by just having a String iterator, i.e.,
   without specifying the Collection<String>?
A. Yes, but an Iterator needs something to iterate over; this example used a Collection.
Two typical ways to do parametric polymorphism.
templates (C++, Ada): basic idea:
  a template represents code that doesn't exist yet, but it will exist
  once you instantiate it.
  Compiler can finish the job of compiling the code
  when you say what the types actually are in a caller or user
  of the template.
  You may need multiple copies of the machine code,
  one for each type of instantiation.
  template <class T> class TemplateName { ... };
generics (OCaml, Java): code is compiled and type-checked just once,
Only one copy of the machine code needs to exist, itll work on any type of arguments.
This works because all types smell the same. Every object is represented via a (say) 64-bit pointer.
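A small sketch of a generic method in Java (my example; assumes java.util
is imported). One copy of code serves every reference type E, because at
run time every E value is just a pointer:
  static <E> E firstOf (List<E> l) { return l.iterator().next(); }
  String  s = firstOf(Arrays.asList("a", "b"));  // E = String
  Integer n = firstOf(Arrays.asList(1, 2));      // E = Integer, same machine code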
– *scratch* transcript
(+ 3 4) 7
(+ 3.0 4) 7.0
+ : int -> int -> int
: int -> float -> float
: float -> int -> float
: float -> float -> float
UCLA CS 131 lecture 2021-01-28
Q. Professor Eggert I wanted to ask how much hw2 is weighted.
A. all hws weighted equally, except Project, which *2
4% of total grade
more about types
generics vs templates
for now, assume generics
example (list of objects) not involving Java generics like List<Integer>
Integer is an object that contains an int.
Object o = new Integer (0);
Object o = 0; // not allowed because primitive types (int, char)
// are not objects for efficiency
List l = new LinkedList();
l.add (new Integer(0));
Integer n = (Integer) l.iterator().next();
^^^^^^^^^^^^^^^^^^^ is of type Object
This cast involves a runtime check that the Object
is actually of type Integer.
List<Integer> l = new LinkedList<Integer>();
l.add (new Integer(0));
Integer n = l.iterator().next();
^^^^^^^^^^^^^^^^^^^ is of type Integer
because l.iterator() is of type Iterator<Integer>
Q. When you call l.iterator(), does it always start at the first value?
A. Yes, the first time.
// simplified classes
class List<E> {
  void add (E x);
  Iterator<E> iterator();
}
class Iterator<E> {
  E next();
}
Subtypes and generics
Is List<String> a subtype of List<Object>? This is bogus!
List<String> is *not* a subtype of List<Object>, even though
String is a subtype of Object. So:
  Collection<String> cs = ...;
  printAll(cs); // not gonna work, if printAll wants a Collection<Object>
So, we use a *wildcard* ? = a type variable with no name.
Collection<?> = "Collection of unknown" = a collection whose element
type is unknown.
void printAll(Collection<?> c) {
  for (Object i: c)
    System.out.println(i);
}
Q. Collection of unknown seems more compatible; then what are some
   advantages of being specific?
A. If you need a string, then ask for a Collection<String>; use
   Collection<?> only for generic code that *ought* to work on any
   sort of collection.
Q. If ? is a primitive, would the code fail since it's not an Object?
A. ? stands for some reference type, so it can't be a primitive type.
Now let's use ?:
// OK, but too specific; won't work on Collection<String>
static void convert (Object[] a, Collection<Object> b) {
  for (Object o: a)
    b.add(o);
}
static void convert (Object[] a, Collection<?> b) {
  for (Object o: a)
    b.add(o); // not allowed
}
// Type checks, and works on String[], Collection<String>
// BUT, doesn't type-check on String[], Collection<...>
static <T> void convert (T[] a, Collection<T> b) {
  for (T o: a)
    b.add(o);
}
// *Bounded* wildcards; this typechecks even for String[], Collection<Object>,
// and so is more flexible/general
static <T> void convert (T[] a, Collection<? super T> b) {
  for (T o: a)
    b.add(o);
}
equivalent to the following harder-to-read formulation:
static <U, T extends U> void convert (T[] a, Collection<U> b) {
  for (T o: a)
    b.add(o);
}
Collection<? extends T> means the unknown type must be a subtype of T,
so you can use either kind of bound (extends or super).
Q. Does this work for String[], Collection<String> itself?
A. It works, because (T extends T) and (T super T).
You can think of ? in Java types as being _ in OCaml patterns:
it matches any type and we dont bother to give that type a name.
Good style rule is to use ? when you dont need the type name anywhere else.
How are generic types and methods implemented? Long story, short version:
We compile the code just once, generate bytecodes thatll work
on any value of any type that the type variables can be instantiated to.
source code:      List<T> lt = ...;
byte code as if:  List lt = ...;
This process is called *erasure* - the runtime has less
type information than the compile-time does because
it doesnt need all the details.
+ compile once, all type checking is done
source code:   ((Collection<Integer>) c) ... c.get()
bytecode sees: ((Collection) c) ... (Integer) c.get()
so some extra runtime checks needed afterwards
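A sketch of erasure on a generic method (my example; "names" below is a
hypothetical List<String>):
  static <E> E first (List<E> l) { return l.get(0); }
  String s = first(names);                // source: no cast visible
  // after erasure, roughly:
  //   static Object first (List l) { return l.get(0); }
  //   String s = (String) first(names);  // compiler-inserted cast, checked at runtime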
Theres more to Java typechecking than this
Java has function types too,
Theres a skeptical camp that says Java, OCaml, C++, etc. are going too
far in type checking. Rules are too complicated, users dont understand
them, etc. Static type checking is more trouble than its worth.
Dynamic type checking let the runtime worry about it.
one possibility: be like Java, but do the checks at runtime.
another possibility: duck typing
If an object waddles like a duck, and quacks like a duck,
it is a duck.
I.e., dont worry about an objects type.
Just worry about whether it does what you want to do.
o = ...;
o.quack(10);
o.waddle(100);
# If this works, then fine! Who cares if its a duck?
Focus on behavior, not types.
Smalltalk (1970s)
Python, Ruby,
Write the code, dont let types slow you down.
+ flexibility
- less reliability (runtime checking: o.quack(10) throws an exception
  if o can't quack)
- performance issues
restart at 15:01
Q. (during break) In homework 2, if we have a rule Exp -> [(, Expr, )],
   it might expand forever. Can we assume that is not the case, or do we
   need to do iterative deepening search?
A. You can assume that the rules are not left-recursive, e.g., we
wont have a rule Expr -> [Expr, !]. But other forms of recursion
are allowed. In the case of your rule, it shouldnt expand forever
because each iteration will gobble up a ( from the input, and the
input is a finite sequence of tokens.
Java overview
Whats the worlds fastest computer?
top500.org
500 fastest computers by a benchmark - a lot of linear programming,
matrix ops, that sort of thing.
Q. Does that include quantum computers?
A. Yes, in theory. Q computers are terrible at this benchmark.
This is a large number of ARM cores, specially designed for this
computer, these ARM cores have SVE Scalable Vector Extension
theyre all little vector processors in their own right. Each
processor is SIMD (Single Insn Multiple Data), you have thousands of
such processors the whole system is MIMD (Multiple Insn Multiple
Data). Previous Top500 winners used GPU-style computation, but
here we just have CPUs that each can do a bit of GPU-style stuff.
This is easier to program because you have just one kind of processor.
Where Java came from:
Bunch of engineers at Sun Microsystems (now part of Oracle)
early 1990s commercial research computers of the future
what they had:
workstations and servers all running Solaris (Linux-like)
(core written in C)
all with same CPUs (SPARC)
all connected to the Internet
what they were looking forward to:
Internet of Things - toaster on the Internet
can we port Solaris etc. to toasters, and get it to work
short answer: yes, you can do it
long answer: there were real problems
problems:
1: SPARC is not universal, x86 is not universal, ARM neither
mfrs wanted to choose their own CPUs
2: C/C++ is too unreliable - they crash, blue screen of death,
your toaster burns up the toast, etc.
3: C/C++ executables are too big (39 MB for Emacs, say)
take too long to download over the Internet (28 kb/s)
4: reliability issues are not just crashes, theyre also
memory bloat.
First approach they took: C++ (too complicated) C+- (C++ cleaned up)
Did what greatest CS experts do:
drove down the road to Xerox PARC
and stole the best ideas
What was at Xerox PARC? (Apple stole macOS from them)
bitmapped display
mouse pointer
machines on a network
text processing applications
easy to use
What did Sun steal from this? Smalltalk had:
object-oriented programming environment
bytecode interpreter (compile Smalltalk source into portable bytecode)
different versions of interpreter for different machines
bytecode was the same on all machines
bytecode was small
garbage collection
runtime checking for null pointers, subscript errors, etc.
Downside:
Smalltalk uses a weird syntax.
Smalltalk uses dynamic type checking.
So Sun designed Oak took Smalltalk core, switched to C syntax,
added static type checking, got it working in the lab.
Renamed to Java.
As a writeup, they decided to write a browser (ca 1994, most
popular browser was Mosaic (UIUC), written in C++, crashed a lot, bloated,
). Wrote a browser called HotJava, which was like Mosaic,
but a lot simpler, more reliable, and would let you download
Java bytecodes from the Internet and run them in the browser.
Mosaic guys fought back: plugins (C++ modules that you can download),
scripting via JavaScript.
They won, in the sense that Firefox came from Mozilla, which came from Mosaic.
We discovered that Java was pretty cool: hit a sweet spot
between C/C++ (statically checked, faster) and JavaScript
(dynamically checked, slower).
Q. Are most modern browsers descended from Mosaic?
A. Going too far. Chrome isnt. IE was. Current Microsoft browser isnt.
Java basics
statically typed for reliability
uses a C++-like syntax
types have two major categories:
primitive (small set)
byte short int long char boolean float double
these types are all the same size on all platforms
int is always 32 bits
long is always 64 bits
Write once, run everywhere. make code easy to port,
OK to give up some performance
reference types (larger set, can be user-defined)
internally, implemented via pointers to the actual values
but theres no direct access to the pointers.
They're not pointers in the C/C++ sense.
There is no &x operation.
variables and object slots are always initialized (to zero)
no uninitialized variables
arrays are objects (in C, C++: a[i] == *(a + i), but not true in Java)
zero-origin, like C,
contents are always in the heap (no arrays on the stack) (for reliability)
int guesses[10];              // C
int[] guesses = new int[10];  // Java
size of array is fixed once allocated.
guesses.length == 10
classes are like C++, except no multiple inheritance
reminder: read all Webber chapters with Java in title
Q. Why is putting contents on the heap good for reliability?
A. When the stack's popped you might have dangling pointers to it.
Q. Was the hw3 due date pushed back also?
A. Due on 02-05.
Q. Some python ml libraries have neat uses for multiple inheritance
A. Yes, it can be quite useful.
Q. will next weeks lecture material be tested on our midterm?
A. Yes, everything before midterm is fair game, including HWs.
Q. Are we gonna go over how the java GC works?
A. Yes, but later.
UCLA CS 131 lecture 2021-02-02
Q. will we get any practice midterm problems?
A. Ill post some on CCLE and will be covered in discussion section.
Java continued
last time Java generics and types, motivation for Java
this time: a few more basics first, concurrency, Java Memory Model
Java is single inheritance (simpler)
BUT if we had *only* single inheritance,
each class would have to live in its own subtree
even if its behavior were compatible with a subtree somewhere else
This can happen all the time:
You have a graphics package
objects represent regions of the screen
they have shapes (Rectangle, Square, Ellipse, ...)
they have behaviors (Drawable, Colored, ...)
they have other attributes
This extra compatibility is a good thing, as the class
can be used in lots of places
Java has a substitute for multiple inheritance:
interfaces
An interface is defined like a class, *except* interface not class,
it provides no implementations
public interface Stack {
  void push (int i);
  int pop ();
}
Any object is of type Stack, if its class implements the Stack
interface
public class ArrayStack extends Array implements Stack {
  public void push (int i) { MUST BE SOMETHING HERE }
  public int pop () { MUST BE SOMETHING HERE }
}
Q. In this case, the ArrayStack must be a Stack of Arrays of ints right?
A. Its a stack implemented via an array (as opposed to a linked list,
say).
Java compiler can check that you implement interfaces properly.
There is an interface hierarchy, but this hierarchy operates
in a different way from the class hierarchy.
If class C extends class P (inherits from P), then it inherits
P's code - that's wealth.
If class C implements interface I, then it inherits an obligation
to implement the interface.
This idea of using interfaces is very common.
interface Runnable { void run(); }
Its so common that we often would have a class and a tightly
associated interface, except we use an abstract class - it
supplies both wealth and obligations on its child classes.
abstract class A {
  void push (int i) { /* This actually implements push. */ }
  abstract int pop ();  // This is an obligation on the child.
}
Q. So the multiple inheritance part in Java can be accomplished by a
class implementing multiple interfaces?
A. Yes, basically thats it.
Q. Can an interface have data members (or whatever the equivalent
is)? And if so are there any restrictions on them?
A. Id have to look it up; its generally not considered to be good
practice anyway.
Q. why use an abstract class over an instance (and vice versa)?
A. You cannot construct an object that is directly of an abstract class
for the same reason that you cant do it for an interface.
interface I { void run(); }
class C implements I { void run() { do something here } }
I v = new I(); // bogus
I v = new C(); // OK
abstract class A { abstract void run(); void push (int i) { do something here } }
new A(); // bogus
class B extends A { void run() { do something here }}
A w = new B (); // OK
Q. why use an abstract class over an interface
A. The common pattern is this:
you have an API with a few core methods (store into slot i,
load from slot i)
and a few extra methods that can be built from the core
(copy from slots i..j into slots m..n)
abstract class can provide naive implementations for
the extra methods; might be slow in some cases
but good enough for others
descendant classes must implement the core methods
and can override the extra methods as needed for efficiency
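A sketch of that pattern (my example, not a real Java API):
  abstract class Store {
    abstract void set (int i, int v);      // core: child must implement
    abstract int  get (int i);             // core: child must implement
    void copy (int from, int to, int n) {  // extra: naive default; override if too slow
      for (int k = 0; k < n; k++)
        set (to + k, get (from + k));
    }
  }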
The Object class in Java.
Object is the root of the Java class hierarchy even a class
with no declared parent, has Object as its parent.
It's an important class - go read the documentation on the Object class.
Well take a brief (not a complete) tour of it.
public class Object {
public Object (); // constructs an Object
// why is this here? the most boring object
// with no particular features
// You can use this in test cases, if you want your
// code to work on anything.
public boolean equals (Object obj);
// o1.equals(o2) is true if o1, o2 should be considered equal.
// default implementation is o1==o2 (reference comparison)
public int hashCode();
// o.hashCode() yields a random integer that is actually
// a function of the objects contents.
// used to build hash tables
// default implementation of o.hashCode() is sort of like ((int) o)
// except you cant do that cast in Java
// Important property: hashCode() and equals() must be compatible.
// o1.equals(o2) implies (o1.hashCode() == o2.hashCode())
// without this, hash tables stop working
// the Java compiler cant check this, so youd better be careful.
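// A sketch (my example, not part of Object) of a class that keeps
// equals() and hashCode() compatible:
//   class Point {
//     final int x, y;
//     Point (int x, int y) { this.x = x; this.y = y; }
//     public boolean equals (Object o) {
//       return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
//     }
//     public int hashCode () { return 31 * x + y; }  // equal Points get equal hash codes
//   }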
public String toString();
// o.toString() yields a string representation of the object.
// useful for debugging.
// System.out.println(o) works on any object,
// because println can call o.toString() and print its characters.
public final Class getClass(); //* actual API is more complicated
// o.getClass() returns an object of type Class that represents
// os class.
// This is an instance of reflection when a program
// looks in the mirror and sees a representation of itself.
// Could be used by a fancier debugger that prints out
// not just the value of an object, but its type.
// final is Java keyword, and is the opposite of abstract
// abstract only API; no implementation; child classes must define
// final implementation is here, and it cannot be overridden
// What's final for? To prevent objects from lying about
// their types
// Q. How would objects lie about their type?
// A. If getClass were not final, then
// class Foo { Class getClass() { return (...).getClass(); } }  // NG
// another reason: one way to implement Java
// is for every object to have a type field (first word of memory)
// that type field tells you the objects class
// final lets Java compiler generate code that simply
// returns the type field so it can be very fast
// so the reason is efficiency
// I've seen final used a lot in Java simulations
// to gain performance - not recommended.
// it gains performance because the compiler can inline
// a call to final methods
// but its brittle youre stuck with the implementation.
//* public final Class<? extends Object> getClass(); // still not quite right
// Class<? extends X> where X is the static type of the object
// Thread t = ...;
// t.getClass() returns an object of type
// Class<? extends Thread>
protected void finalize() throws Throwable;
// o.finalize() is called by the garbage collector
// just before the storage for o is reclaimed
// default implementation is empty
// you can override it to provide a cleanup action
// e.g., close a database connection
// protected just for the GC (nobody else should call it)
// throws X means if you call the method, it might not return!
// instead, it might throw an exception of type X.
// exceptions form a hierarchy
// methods have to say whether they might throw an exception.
// Java compiler checks this statically by inspecting
// catch, throw, and the APIs of methods
protected Object clone() throws CloneNotSupportedException;
// makes a copy of an object and returns the copy.
// protected because intended for internal Java use
// several more methods: o.notify(), o.notifyAll(), o.wait()
// talk about this shortly.
}
//* A description of a Java class.
public class Class<T> {
}
Q. What exactly is inside the Class type?
A. https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/lang/Class.html
Q. Is isInstance() and instanceOf looked down upon? I know java has
them, but I think Professor Smallberg said to tend away from such
things
A. Yes, theyre disparaged. The idea is that you shouldnt test
to see whether an object is a particular type; use object dispatch instead.
if (o instanceof String)            // don't do this;
  System.out.println(o);
else
  System.out.println(toString(o));
System.out.println(o);              // instead, rely on dispatch: this works either way
Threads
processor - CPUs (or CPU cores) each with own insn pointer,
that can run programs simultaneously, one per processor
process - a program being run by a processor, each with its
own separate memory, so they can't affect each other
thread - a program being run by a processor, but it's sharing
memory with all the other threads in the same process
race conditions galore are possible
very fast to communicate from one thread to another
via shared memory
Java is thread-oriented.
A thread's life cycle in Java:
* creation via new - resulting thread state: NEW
Thread t = new Thread();
more commonly:
Runnable r = ...;
// interface Runnable { void run(); }
Thread t = new Thread(r);
* t.start() resulting thread state: RUNNABLE
allocates OS resources,
give the thread a virtual or real processor,
so it has an IP, calls t.run() using that IP
If a thread is RUNNABLE, it can do:
* keep running (execute ordinary code) RUNNABLE
* yield() RUNNABLE
let some other thread run, if it wants to
advice to the thread *scheduler* which decides
which threads get processors
* sleep(10) TIMED_WAITING
* o.wait() WAITING
* do I/O BLOCKED
* terminate the thread TERMINATED
(e.g., run returns)
Bad part about threads:
race conditions that occur when two threads compete
for the same location: one writes, the other reads or writes.
The resulting behavior depends on who wins.
E.g., one thread:
  n++;    load n, add 1, store n
Other thread:
  n--;    load n, subtract 1, store n
synchronization - simple solution: a synchronized method.
class C {
int n;
synchronized int next() {
// Compiler generates code here that grabs a lock on this.
int r = n++;
// Compiler generates code here that releases the lock.
return r;
}
}
C o = new C(); o.next();
The way it's implemented: via a lock, one lock per object.
If youre in a synchronized method, you have exclusive access
to that object.
Can be done via spin locks (you keep the CPU busy while the objects
locked), so the body of a synchronized method should be simple
and fast so that other threads don't spin (waste CPU resources
doing nothing useful).
We dont like this inefficiency!
So more sophisticated synchronization techniques.
o.wait()
remove all spinlocks held by this thread
wait until o becomes available (thread goes into WAITING state)
o.wait(100) - like o.wait() except TIMED_WAITING for 100 ms
o.notify()
marks o as available
wakes up one of the threads waiting on o (if there are any)
at random depends on scheduler
o.notifyAll()
wakes up *all* the threads waiting on o
let them negotiate which one wins the object
lets you control scheduling better than the scheduler does
Q. So just using wait/notify can starve threads at random?
A. Yes, in theory a thread could wait forever.
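A small sketch of wait/notify (my example, not from the lecture): a
one-slot buffer where put() and take() wait for each other:
  class Slot {
    private Object value;                        // null means empty
    synchronized void put (Object v) throws InterruptedException {
      while (value != null) wait();              // wait until the slot is free
      value = v;
      notifyAll();                               // wake any thread waiting in take()
    }
    synchronized Object take () throws InterruptedException {
      while (value == null) wait();              // wait until the slot is full
      Object v = value;
      value = null;
      notifyAll();                               // wake any thread waiting in put()
      return v;
    }
  }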
Synchronization approaches built atop wait/notify
1. Exchanger
represents a rendezvous point.
Exchanger e = ...;
thread 1:
  Object w = ...;
  Object v = e.exchange(w);
thread 2:
  Object s = ...;
  Object t = e.exchange(s);
afterwards, v == s && t == w
The thread that gets to the rendezvous point first waits.
(A sketch in Java follows after this list.)
2. CountDownLatch.
like N horses starting a horse race.
each trots to its stall in the starting gate; they're frozen.
starter starts the race, all the horses run in parallel
3. CyclicBarrier
repeated CountDownLatch
use this in large simulations
weather simulation of US
break it into 100 pieces
each thread simulates the weather over 1/100th of the US,
occasionally checking its neighbors to see what their weather looks
like, while the neighbors are stopped
4, 5, 6, ...
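A sketch of the Exchanger rendezvous from item 1 above (my example; uses
java.util.concurrent.Exchanger; exchange() can throw InterruptedException):
  Exchanger<String> e = new Exchanger<String>();
  // thread 1:
  String v = e.exchange("from thread 1");   // blocks until thread 2 arrives
  // thread 2:
  String t = e.exchange("from thread 2");
  // afterwards: v is "from thread 2" and t is "from thread 1"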
But theres a problem: performance:
bottleneck introduced by needing to synchronize
We want to avoid these bottlenecks
no synchronized methods
no o.wait()
so, next time we'll look at the JMM, which tells you when you can do this.
Q. So every object in java effectively has its own mutex and
condition variable.
A. Yes, though typically they arent used and can be optimized away
in some cases.