• Encapsulation as an aid to optimisation

• Optimisation of low level types
• Java optimisation

Optimising OO code
• The OO programming style is intended to improve overall code maintainability
• Performance optimisation is intended to improve the performance of the code on a particular hardware/software environment
• The two are therefore often in conflict
• However, most of a program's run-time comes from a very small fraction of the code
• Code optimisation techniques should be applied to this small fraction only
• The rest of the code should be written for maintainability

Memory layout
• Because each object is implemented as a contiguous region of memory, your choice of object structure constrains the data memory layout.
• Memory layout is usually very significant for code performance.
• This often makes existing OO code hard to optimise.
• The solution is to encapsulate the performance hot-spots.

• Encapsulation is the key concept of OO programming
• Particularly encapsulation of data structures.
• Data structures are private
• Only visible within the type itself
• Rest of the code only interacts with type via its public interface.
• This makes it easy to modify the data structures
• Only the owning type needs to be changed
• Provided the external interface is unchanged (or only extended), the rest of the code remains the same.

Performance Encapsulation
• Usually most of the run-time of a program comes from a very small fraction of the code (the hot-spot).
• If you can encapsulate this code fraction (and the data it works on) in a class, optimisation becomes easier, not harder.
• As long as the external interface is preserved, the internal data layout can be restructured however is necessary for performance.
• You may have to resort to arrays rather than low-level objects, but only in the time-critical code sections.
• If the problem's hot-spot is known in advance, try to design in this encapsulation from the start.
• However, the first implementation should be written for correctness, not speed: an optimised version is easier to write if you have a reference version to compare results with.
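A minimal sketch of performance encapsulation, assuming a hypothetical particle-system hot-spot. The names ParticleSystem, ObjectParticles, ArrayParticles and kineticEnergy are illustrative and do not come from the lecture. Both classes expose the same interface; only the private data layout differs, and the straightforward version acts as the reference for checking the optimised one.

```java
/** Public interface seen by the rest of the code (hypothetical example). */
interface ParticleSystem {
    void setVelocity(int i, double vx, double vy);
    double kineticEnergy(double mass);
}

/** Reference version: one small object per particle, written for clarity. */
final class ObjectParticles implements ParticleSystem {
    private static final class Particle { double vx, vy; }
    private final Particle[] particles;

    ObjectParticles(int n) {
        particles = new Particle[n];
        for (int i = 0; i < n; i++) particles[i] = new Particle();
    }

    public void setVelocity(int i, double vx, double vy) {
        particles[i].vx = vx;
        particles[i].vy = vy;
    }

    public double kineticEnergy(double mass) {
        double sum = 0.0;
        for (Particle p : particles) sum += p.vx * p.vx + p.vy * p.vy;
        return 0.5 * mass * sum;
    }
}

/** Optimised version: same interface, data held in flat primitive arrays. */
final class ArrayParticles implements ParticleSystem {
    private final double[] vx;
    private final double[] vy;

    ArrayParticles(int n) {
        vx = new double[n];
        vy = new double[n];
    }

    public void setVelocity(int i, double x, double y) {
        vx[i] = x;
        vy[i] = y;
    }

    public double kineticEnergy(double mass) {
        double sum = 0.0;
        for (int i = 0; i < vx.length; i++) sum += vx[i] * vx[i] + vy[i] * vy[i];
        return 0.5 * mass * sum;
    }
}
```

Code that only holds a ParticleSystem reference does not change when one implementation is swapped for the other, and running both on the same input gives a direct correctness check.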

OO and Hardware acceleration.
• This approach is particularly useful for acceleration hardware
• Accelerators typically have their own private memory spaces
• Data needs to be copied in/out
• Objects also have private data structures
• Again data copied in/out
• With careful design a good OO interface can completely encapsulate the use of accelerators.

Low level types
• For HPC applications the main problem is likely to be heavily used small classes
• C++ handles this well, though it is a good idea to use concrete classes and default constructors.
• Less successful in Java: 1000 complex number objects may have 1000 words taken up with vtable-pointers/class-references
• Adding 1000 complex numbers may also take 1000 method calls
• This is unfortunate as small classes can be very useful
• In Java try to define higher level classes
• E.g. corresponding to an array of complex numbers
• Better still a physically meaningful concept like “pressure field”
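As a rough illustration of a higher-level class, the sketch below stores a whole array of complex numbers in two primitive arrays. There is then one object for the collection rather than one per number, and a single method call adds the whole array. The class name ComplexArray and its methods are assumptions made for this example.

```java
/** One object holding n complex numbers as two primitive arrays (illustrative). */
final class ComplexArray {
    private final double[] re;
    private final double[] im;

    ComplexArray(int n) {
        re = new double[n];
        im = new double[n];
    }

    int size() { return re.length; }

    void set(int i, double real, double imag) {
        re[i] = real;
        im[i] = imag;
    }

    double real(int i) { return re[i]; }
    double imag(int i) { return im[i]; }

    /** Adds other to this element by element: one call for the whole array. */
    void addInPlace(ComplexArray other) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        for (int i = 0; i < re.length; i++) {
            re[i] += other.re[i];
            im[i] += other.im[i];
        }
    }
}
```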

Immutable Objects
• Immutable Objects don’t change their internal state after construction.
• Java Number classes work like this.
• Operations on Numbers always produce a new immutable object, leaving the arguments unchanged.
• Prevents many types of subtle programming error.
• Not so good for performance when objects hold large amounts of internal state
• Lots of additional object creation
• Lots of data copying.
• Need to adjust programming/design style to accommodate performance requirements.
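The behaviour described above can be seen with java.math.BigInteger, one of the standard immutable number classes. This is a sketch only; the class ImmutableDemo and its method names are assumptions for illustration. Each add() returns a result object and leaves its operands untouched, whereas the primitive accumulator is updated in place.

```java
import java.math.BigInteger;

final class ImmutableDemo {
    /** Sums 1..n with immutable values: the loop keeps producing new result objects. */
    static BigInteger sumTo(int n) {
        BigInteger total = BigInteger.ZERO;
        for (int i = 1; i <= n; i++) {
            // add() does not modify total; it returns the sum as a separate value
            total = total.add(BigInteger.valueOf(i));
        }
        return total;
    }

    /** Same sum with a mutable primitive accumulator: no objects created in the loop. */
    static long sumToPrimitive(int n) {
        long total = 0L;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }
}
```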

Functional languages
• Compare with functional languages
• Everything is immutable
• There are no variables and no assignment
• Just definitions that define new immutable values as functions of others.
• In functional languages the programmer has no control over memory layout
• Instead the Compiler controls data location and lifetime

Scientific OO
• Scientific problems are often naturally expressed in a functional or operator notation
• A = f(B), A = B * C, A =  B
• This does not always map efficiently or cleanly onto normal OO code
• If implemented as methods on B
• Constructor needs to be called to generate a new A for each call
• Loses symmetry between B and C for binary operations
• Consider implementing as methods on A
• Objects can be created at a higher level and live longer
• Encapsulation still holds where result and arguments are the same type
• Non-intuitive
• Objects can’t be immutable.
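The two styles for A = B * C can be sketched as follows, assuming a hypothetical mutable class MutableComplex (the class and its method names are illustrative). times() is a method on B and constructs a new result for each call; setToProduct() is a method on the result A, so A can be created once at a higher level and reused.

```java
/** Mutable complex number used to contrast the two method styles (illustrative). */
final class MutableComplex {
    double re;
    double im;

    MutableComplex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    /** Method on B: a new object is constructed for every product. */
    MutableComplex times(MutableComplex c) {
        return new MutableComplex(re * c.re - im * c.im, re * c.im + im * c.re);
    }

    /** Method on A: overwrites this with b * c; b and c are treated symmetrically. */
    void setToProduct(MutableComplex b, MutableComplex c) {
        // Compute into temporaries first so that a.setToProduct(a, c) is also safe.
        double r = b.re * c.re - b.im * c.im;
        double i = b.re * c.im + b.im * c.re;
        re = r;
        im = i;
    }
}
```

In a loop over many elements, the second form lets one result object be allocated outside the loop, at the cost of giving up immutability.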

Java Optimisation
• As with other languages obtaining good performance from Java requires careful consideration
• A number of standard performance optimisations can be applied to Java codes
• e.g. loop unrolling, common sub-expression elimination.
• JIT compilers are usually very good at this.
• OO performance optimisations should also be applied
• e.g. minimise object creation
• There are also a number of Java specific optimisations to consider

Java Optimisation
• Java bytecode is typically unoptimised
• Performance often comes down to the choice of JVM
• Use of a good JIT is essential
• JIT compilers have potential advantages over static compilers
• Can use profile information to identify hotspots
• Full knowledge of dynamically loaded classes
• On the other hand, compilation speed is more important for a JIT, so the highest levels of code optimisation may not be attempted

Java Arrays
• Java only implements one-dimensional arrays and arrays of arrays (see previous lecture)
• Many scientific codes naturally map to multi-dimensional arrays
• Arrays of arrays can have performance problems
• Need multiple dereferences
• Increased memory use
• Less control over data access pattern

Multidimensional arrays
• Solution is to use a one dimensional array and implement methods to perform the index calculations
public final double getData(int i, int j, int k) {
    return data[i + i_size * (j + (j_size * k))];
}
• This is all a bit low level
• Ugly syntax
• Efficient (methods should in-line)
• Refactoring the data layout is now a local change, much easier than index re-ordering in C/Fortran
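A self-contained sketch of this approach, assuming a hypothetical class Grid3D (the class, field and method names are illustrative). The index calculation lives in one private method, so changing the storage order means editing that method only.

```java
/** 3-D grid stored in a single one-dimensional array (illustrative). */
final class Grid3D {
    private final int iSize;
    private final int jSize;
    private final int kSize;
    private final double[] data;

    Grid3D(int iSize, int jSize, int kSize) {
        this.iSize = iSize;
        this.jSize = jSize;
        this.kSize = kSize;
        this.data = new double[iSize * jSize * kSize];
    }

    /** The only place that knows the layout: i varies fastest, k slowest. */
    private int index(int i, int j, int k) {
        return i + iSize * (j + jSize * k);
    }

    double get(int i, int j, int k) { return data[index(i, j, k)]; }

    void set(int i, int j, int k, double value) { data[index(i, j, k)] = value; }

    /** Sums all elements with i as the innermost loop, matching the layout. */
    double sum() {
        double total = 0.0;
        for (int k = 0; k < kSize; k++)
            for (int j = 0; j < jSize; j++)
                for (int i = 0; i < iSize; i++)
                    total += get(i, j, k);
        return total;
    }
}
```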

Data Structures
• Data structures (Collections) are considerably slower than simple arrays in Java
• Standard libraries are still typically much faster than writing your own equivalents.
• java.util.ArrayList and java.util.HashMap were introduced in 1.2
• Adding is often faster than with java.util.Vector and java.util.Hashtable, as no synchronisation is present
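To make the array/collection comparison concrete, the sketch below computes the same sum over a primitive array and over an ArrayList of Double. The list holds boxed Double objects, so each element access follows a reference and unboxes the value. The class SumDemo and its methods are assumptions for illustration, and no timings are claimed here.

```java
import java.util.ArrayList;
import java.util.List;

final class SumDemo {
    /** Sum over a primitive array: contiguous doubles, no per-element objects. */
    static double sumArray(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Sum over a collection: each element is a boxed Double that must be unboxed. */
    static double sumList(List<Double> values) {
        double total = 0.0;
        for (Double v : values) {
            total += v; // implicit unboxing via doubleValue()
        }
        return total;
    }

    /** Copies an array into an unsynchronised ArrayList, boxing each element. */
    static List<Double> toList(double[] values) {
        List<Double> list = new ArrayList<>(values.length);
        for (double v : values) {
            list.add(v);
        }
        return list;
    }
}
```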

• Synchronised methods and classes are often slower than unsynchronised methods and classes
• Even sequentially
• Overhead associated with synchronized methods also influences scaling of parallel code

General guidelines
• Encapsulate the code hot-spot
• This ensures you are free to optimise without impacting on the rest of the code.
• Wherever possible, be as restrictive as possible
• Declare methods as final
• Declaring a method as final helps the compiler to inline the methods
• This is good programming practice as it reduces dependency between different parts of the code
• Static vs instance variables
• Declaring variables as constants (static final) allows the compiler to carry out more optimisations
• Instance variables are initialised every time a new object is created – static variables are initialised once
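A small sketch of these guidelines, assuming a hypothetical class LinearMap (the class, constant and method names are illustrative only). It shows a static final constant, an instance field set per object, and a final method.

```java
/** Illustrative only: applies y = SCALE * x + offset. */
final class LinearMap {
    // static final: a single constant for the whole class, initialised once
    // and known to the compiler as a constant value.
    private static final double SCALE = 2.5;

    // Instance field: initialised each time a new object is constructed.
    private final double offset;

    LinearMap(double offset) {
        this.offset = offset;
    }

    // final: cannot be overridden, so the call target is known in advance.
    final double apply(double x) {
        return SCALE * x + offset;
    }
}
```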

Performance Case Study
• Simple case study
• Multiply a vector of 102400 complex numbers
• Implement in C, F90 and Java on Sun platform
• Three Java versions
• Naïve version
• Each complex number a separate object
• New object created for each multiply
• Simple version
• Each complex number a separate object
• Method on result object
• Vector version
• Single object to represent the vector
• Method on result object

Performance Case Study
• Performance
• Full optimisation flags used throughout
[Performance chart not recoverable; series labels: Java naïve, Java simple, Java vector]

• Java JIT compilers seem to be as good at optimising code as conventional “static” compilers
• Java performance issues are OO performance issues, not interpreted language issues
• A range of OO specific performance issues exist
• A range of Java specific performance issues exist

