Week 2 Interactional Challenges
Mobile and Ubiquitous Computing 2020/21
Sandy Gould, School of Computer Science, University of Birmingham
Overview
This week we will consider the interactional challenges of mobile and ubiquitous computing devices. The topics we'll cover include:
The challenges of interacting with mobile devices, with a particular focus on mobile typing
How we create interfaces in Android and how we deal with user input
The challenges of interacting with ubiquitous computing devices that lack traditional screen/keyboard interfaces
What devices without any discernible interface might look like
Important concepts
We are focusing specifically on why it can be difficult to design interactions (i.e., something that allows the intention of a person to change the state of a system) in mobile and ubiquitous computing contexts. Difficult, that is, in the sense that it is not possible to easily map more traditional desktop-based computing paradigms (which presume a keyboard and pointing device) onto small portable devices.
Typing on touchscreens
Typing on touchscreens can be difficult. There are several reasons for this:
The keys are close together. On a phone-sized device, a finger is usually much wider than a single key. A finger might cover four or five keys (vertically and horizontally). This makes it difficult to hit the keys you might be aiming for.
There are no physical keys, so you can't feel where one key begins and another ends, like you might do with a physical keyboard. Your fingers occlude the keyboard, so you can't see what your digits are hitting either.
There is no haptic (touch) feedback to let you know you've hit the correct key. You might have your whole phone set to buzz, but unlike the feeling of a physical keyboard you can't feel the keys being depressed. (The lack of reactivity to people's fingers is one of the reasons Apple's butterfly keyboards have frustrated so many people.)
On some screens the touch layer of the screen is some distance from the display itself. This leads to a parallax effect, where the apparent location of keys doesn't correspond with the input layer (think of the ticket machines at the station). Modern smartphones don't have this issue to any noticeable degree because the touchscreen and display are laminated together during production.
Some screens are tiny (think of a smartwatch). Your finger might cover virtually the whole screen!
Leiva et al. (2015) investigated typing on very small smartwatch screens. They trialled three designs, shown in the figure below: ZoomBoard, Callout and ZShift. ZoomBoard allowed people to type by zooming in on parts of an on-screen QWERTY keyboard to select their key. Callout allowed people to run their finger over the keyboard, with the selected character shown above the keyboard; when they had the character they wanted, they lifted their finger. Finally, ZShift showed a zoomed view of the keyboard above the keyboard, indicating the key that a user had just pressed. Each of these keyboards was trialled on three (mocked-up) screen sizes: 18mm, 24mm and 32mm. Leiva et al. found that participants liked ZoomBoard and ZShift best, and that Callout was significantly slower to use than the other two keyboards on the smallest screen. For the 24mm and 32mm screens there was no real difference in typing performance, suggesting that the design of these keyboards becomes more critical at very small sizes, but at larger sizes designers have more freedom.
One of the reasons touchscreen keyboards do not work as well as they might is that most of them use the QWERTY layout, despite the fact that a touchscreen permits keys to be placed on screen in any place and in any orientation. Oulasvirta et al. (2013) took on this challenge and developed the KALQ keyboard, which was designed specifically for two-thumb typing on a tablet.
The design of the KALQ keyboard was determined both by the structure of the English language (i.e., the frequency of letters and the likelihood of a given character being followed by another given character) and by the performance of participants. Once Oulasvirta et al. had developed their keyboard they tested it with participants over a large number of training trials. Eventually, participants performed better with the KALQ keyboard than with a QWERTY one.
The most important thing to note about this work is that although the design of the KALQ keyboard is superior from an ergonomic perspective, it took a lot of training for participants to master it. Just as with physical alternatives to the QWERTY layout like Dvorak, familiarity is the main contributor to typing performance. Optimisations only become important as people gain competency. This is why, although QWERTY is terrible for all sorts of reasons, we persist in using it: for most people the time, effort and frustration of learning a new layout in the short term mean that they do not persist with it long enough to see the benefits. That said, in cases where the layout of keyboards is mandated, these optimisation techniques can be worth applying. A similar approach to the KALQ work was taken in the redesign of the official French keyboard layout.
Other work to optimise keyboards has made them easier to use slouched on the couch (Azenkot & Zhai, 2012) or while walking (Goel et al., 2012).
Input engineering
So far we have considered work focused on redesigning touchscreen keyboards to make them easier to use. What if instead of redesigning keyboards we redesigned the strings that people were trying to type? Of course, this is difficult to do with written language (although text speak is an excellent example of where people have adapted language to fit a particular keyboard layout), but not everything we type into our phones is written language. Win codes from bottle caps, bus stop identifiers or even attendance codes in class are examples of non-language input that we could design for the intended context in which it is to be used.
We'll focus on short links. You often see these in places where making long URLs shorter is aesthetically useful (see the picture below) or where it is intended that someone manually enter the URL on a device. These links typically look something like http://bit.ly/y7vB92e.
What do you notice about these links? Notice that they contain a mix of uppercase and lowercase characters as well as numbers. The first problem with these kinds of links is that O (capital o) and 0 (zero) are a normal part of the output of link shortening services. So are I (capital i) and l (lowercase L). In many fonts the glyphs for these characters are essentially interchangeable. This makes it extremely difficult to tell them apart in printed media if certain fonts are used.
There's another problem with them too: they are really, really difficult to enter on mobile phone keyboards. This is because entering mixes of numbers and uppercase and lowercase characters involves significant mode switching on many touchscreen keyboards (e.g. to enter the number keyboard, or to switch between uppercase and lowercase keyboards).
Why then are these links designed in this way (and they are designed)? The designers do this to provide a sufficient set of possible links (i.e., to have sufficient entropy). However, in the case of, for instance, bit.ly links, the possible set is unnecessarily huge. Twenty-six uppercase characters, 26 lowercase characters and 10 digits mean that for any character in a link there are 62 options. There are seven characters in a link, which means the set of possible links under this scheme is 62^7 = 3,521,614,606,208. This is an unnecessarily large number of links.
Even if we accept bit.ly's need to represent 3.5 trillion URLs, can we improve the design of these links to make them easier to type on mobile phones? The answer is yes! I looked at this question in my own research (Gould et al., 2016). I started with the premise that entering strings of lowercase characters is preferable to mixed-character strings on touchscreen keyboards because it requires no mode switching. We can easily calculate that if we use only one of 26 possible characters for each character in our string, then to match or exceed the 3.5 trillion set size we will need nine lowercase characters (which gives us a set size of 26^9 = 5,429,503,678,976). This does mean that every link needs two extra characters though. Does this extra work mean that these lowercase-only links will be harder to enter? This was something that I investigated with my colleagues (Gould et al., 2016).
Characters        Length (characters)   Permutations   Set size
a-z, A-Z, 0-9     7                     62^7           3.5 trillion
a-z               7                     26^7           8 billion
a-z               8                     26^8           209 billion
a-z               9                     26^9           5.4 trillion
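As a quick sanity check, a few lines of code reproduce the set sizes in the table. This is illustrative arithmetic only, not anything from the paper; the class and method names are made up.

// Illustrative arithmetic only: reproduces the set sizes in the table above.
public class SetSizes {
    // alphabetSize^length, computed with long arithmetic
    static long setSize(int alphabetSize, int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            total *= alphabetSize;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(setSize(62, 7)); // 3,521,614,606,208 (a-z, A-Z, 0-9)
        System.out.println(setSize(26, 7)); // 8,031,810,176     (a-z)
        System.out.println(setSize(26, 8)); // 208,827,064,576   (a-z)
        System.out.println(setSize(26, 9)); // 5,429,503,678,976 (a-z)
    }
}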
To start our paper, we used Monte Carlo modelling to try and understand the effort required to enter these short links. Entering a lowercase character on a touchscreen keyboard is usually just one tap: you tap the character you want and move on to the next. But because of mode switching, capital letters and numbers require more taps. From a default, lowercase keyboard, entering a capital letter is two taps: one to hit shift and then another to hit the character. For entering a number, you might need to switch to number mode, hit the number and then hit the mode selector to go back to the character mode. How many taps are needed will depend on the exact code; having certain characters next to each other (e.g., two numbers in a row) could speed up entry, while having to change mode after every character will slow it down. The number of taps for these kinds of alphanumeric codes is therefore not fixed, but falls on a distribution: some codes will require few taps, others will require more. Our Monte Carlo model generated large numbers of example short links and then simulated their entry on a touchscreen keyboard to work out how many taps they'd take. We found that a seven-character short link code normally takes eleven or twelve taps to enter.
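To make the tap-counting idea concrete, here is a minimal sketch of this kind of simulation. The tap costs (one tap per lowercase letter, an extra tap for shift, an extra tap to enter or leave the number keyboard) are simplifying assumptions rather than the exact model from the paper.

// A minimal Monte Carlo sketch of the tap-count idea, under assumed tap costs.
import java.util.Random;

public class TapSimulation {
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Estimate the taps needed for one randomly generated seven-character code.
    static int tapsForRandomCode(Random rng) {
        int taps = 0;
        boolean inNumberMode = false;
        for (int i = 0; i < 7; i++) {
            char c = ALPHABET.charAt(rng.nextInt(ALPHABET.length()));
            if (Character.isDigit(c)) {
                if (!inNumberMode) { taps++; inNumberMode = true; } // switch to number keyboard
                taps++; // the digit itself
            } else {
                if (inNumberMode) { taps++; inNumberMode = false; } // switch back to letters
                if (Character.isUpperCase(c)) taps++; // shift
                taps++; // the letter itself
            }
        }
        return taps;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int trials = 100_000;
        long total = 0;
        for (int i = 0; i < trials; i++) total += tapsForRandomCode(rng);
        System.out.printf("Mean taps per 7-character code: %.2f%n", (double) total / trials);
    }
}

Under these assumptions the mean lands in the eleven-to-twelve tap range described above.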
[Figure: distribution of the total taps (7 to 17) required to enter a seven-character mixed-case short link.]
If we were to use nine lowercase characters instead, then the expected number of taps would be nine, since there is no mode switching. Our simulations suggest, then, that typing these shorter, more complex codes will actually be slower! We ran an experiment with participants in which they had to type links onto their phone as the links appeared on a screen. We found that the lowercase links were indeed quicker to enter than their mixed-case counterparts, even though they are longer in terms of characters. It is absolutely possible to design input to better match the input device you expect people to use to enter it. Designers should do so wherever possible, whether it's for entry on a smartphone or a video game console.
[Figure: distributions of response times (ms, log scale) for entering lowercase and mixed-case links.]
In other work that I've done with colleagues (Wiseman et al., 2016), we've looked at designing these kinds of non-word tokens in the context of digital radios. Our project was with the BBC, and we were thinking about the kinds of codes that could be displayed on the matrix displays of digital radios. These codes could be entered into a web app to pair devices and provide a kind of joined-up listening experience for things like recommendations.
In this study we didn't just think about using non-word strings, but words too. These words are still "non-word" in the sense that they have no meaning in the system, but the words are meaningful to the people typing them, which is potentially beneficial. The required complexity for these codes was relatively small compared to what you'd need for link shortening: 500 million. This was what was suggested by the BBC engineers so that collisions would be avoided. It is probably quite ambitious, given that these codes are short-lived and only used for pairing. How many people are likely to be pairing a radio in any given hour?
Nevertheless, this was what the engineering required, and we developed three kinds of codes: numeric, alphanumeric and words. The number codes were formed of nine digits, each between zero and nine. This gave a set size, or complexity, of 999,999,999 (10^9). The alphanumeric codes were five characters long and contained mixed cases and numbers. This gave a set size of 550 million (56^5). Finally, the word codes used three three-letter words from the English language (e.g., cat buy rod). There are about eight hundred three-letter words in the English language, so there are 512 million permutations (800^3). We tested how people would enter these onto smartphones and laptops.
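As a rough illustration of how simple word codes are to generate, the sketch below picks three words at random. The word list here is a tiny stand-in; these notes do not specify the actual list the BBC system used.

// Sketch of word-code generation from a small, hypothetical word list.
import java.util.List;
import java.util.Random;

public class WordCode {
    // Tiny stand-in list; the real system would draw on the roughly 800
    // three-letter words of English, giving 800^3 (about 512 million) codes.
    static final List<String> WORDS =
        List.of("cat", "buy", "rod", "sun", "map", "jug", "tin", "fox");

    public static void main(String[] args) {
        Random rng = new Random();
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            if (i > 0) code.append(' ');
            code.append(WORDS.get(rng.nextInt(WORDS.size())));
        }
        System.out.println(code); // e.g. "cat buy rod"
    }
}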
We found that generally the pattern was similar across devices, just that people were much slower entering the codes on smartphones than on laptops. Most significantly, people were able to enter the word codes most quickly on both laptops and smartphones. This was partly down to the lack of mode-switching that, say, alphanumeric codes might require, but it probably also reflects that although the words had no semantic value, they were still recognisable to participants in a way that w8M o2x P1a would not be. This allows people to chunk the information more easily. When you see cat buy rod on the screen, you probably only need to look once: you read cat buy rod and it's in your head the whole time you're typing. Random numbers or alphanumeric characters are much harder to chunk in your working memory because they don't have any meaning. This means that word-based codes are more likely to be useful in scenarios where people have to remember codes, for example if they need to read a code from a router under a sofa and then type it into a device on their kitchen table.
The salience of words is behind the idea of what3words, a proprietary geolocation system based on the idea of referring to locations around the world in terms of three English words. The University of Birmingham, for instance, is located at fruit like agenda. The idea is that these are more accessible or easier to remember than grid references, but are more precise than the kinds of directions people might otherwise use. Of course, because the system is proprietary you're reliant on it to make any kind of sense out of these words, since they have no meaning in the context of mapping.
Interfaces for Android
A smartphone app is not of much use without an interface. All apps in Android are based around the idea of an activity. An activity represents a single screen of an application. Each activity contains views. A view can be a simple widget like a textbox or a button, or it could be a more complex interface element.
Activities are defined by XML documents. The XML defines the characteristics of an activity, as well as allowing for the definition of sub-elements of the activity, such as views. We can write the XML by hand if we want, but it generally makes more sense to specify an activity and its sub-elements using the graphical interface builder that is part of Android Studio (although it is often necessary to do some editing by hand afterwards). XML provides a way of describing interfaces declaratively (what sub-elements exist, their size, position and so on). It does not say much about how an interface should behave, though. That is specified in a Java class file associated with each activity. This is good practice: separating the specification of what an interface should look like from the directions for how it should behave.
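As a rough sketch, the Java class behind an activity might look something like the following. The layout and view names (activity_main, input_field, submit_button) are made-up examples rather than anything from the labwork.

// Minimal sketch of an activity that collects text input from a button press.
import android.os.Bundle;
import android.widget.Button;
import android.widget.EditText;
import android.widget.Toast;
import androidx.appcompat.app.AppCompatActivity;

public class MainActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // Inflate the declarative XML layout (res/layout/activity_main.xml).
        setContentView(R.layout.activity_main);

        // Look up the views declared in the XML.
        EditText input = findViewById(R.id.input_field);
        Button submit = findViewById(R.id.submit_button);

        // Behaviour lives here in Java, separate from the XML that
        // describes what the interface looks like.
        submit.setOnClickListener(v -> {
            String typed = input.getText().toString();
            Toast.makeText(this, "You typed: " + typed, Toast.LENGTH_SHORT).show();
        });
    }
}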
In your labwork this week, you're going to be learning about how you create activities in Android. There's also a demonstration of collecting text-based input that I've recorded for the Week 2 Canvas page.
Interacting with IoT devices
We've talked so far about interactions with devices with keyboards. These might be tiny smartwatch keyboards or they might be full-sized desktop keyboards. What about interaction without keyboards as an input technology?
We now have a huge variety of devices. Some of these have properties that vary continuously (e.g., the size of their screen; the speed of their network connections). But they also vary discretely (screen or no screen; camera or no camera). Whereas shrinking those keyboards down to fit a tiny smartwatch screen was about designing for those continuous aspects, designing for discrete variability is much harder. (If a device has no screen the design of a touchscreen keyboard is obviously moot.)
We focus here on an example of a ubiquitous computing (IoT) device, the Amazon Dash Button. These buttons allow you to automatically order items from Amazon at the push of a button. The idea is that you will have one in every cupboard for your toothpaste, loo roll, deodorant etc. When you've nearly run out, you press the button and Amazon send you more.
The button has very limited interactivity. It has one button that recognises three kinds of press: a single press, a double press and a long press. Inside these buttons we have an ARM-based System on a Chip (SoC, 120MHz), a very small amount of RAM (128KB) and flash storage (16Mb), and a WiFi chip (wireless n). It has a single RGB LED as a display. Last week we talked about how powerful modern smartphones are. This is a more typical IoT device: it is capable of incredible computations by historical standards, but would be no good at anything demanding by modern standards. Still, all this is in a package with a non-rechargeable, non-removable battery. Despite its sophistication, when it runs out of power you have to chuck it away. Is this a good idea? Probably not, but it illustrates how cheap internet-connected devices can now be.
The commercial buttons are, to a degree, pre-programmed and ready to go. The buttons I'd normally demonstrate in class are developer editions, which have to be linked to Amazon's cloud infrastructure. Thinking back to the pairing codes we covered earlier: how do we pair them? Well, there's an app now, but originally the app only worked in the USA (that's the dynamic, heterogeneous environment again: portable technology sometimes escapes the physical parameters you've assumed). Instead, you had to put the device into pairing mode, which would launch a very simple webserver on the device and turn it into an access point (this is quite incredible for a device that is designed to be thrown away). After a complex series of steps involving certificates, the Dash Button would be paired with an access point. The button's microphone is interesting here because it is used in combination with the mobile app to receive programming instructions. The smartphone app encodes the preferences into a set of ultrasonic pulses; the Dash Button's microphone receives these and uses them to update its state. Also very clever stuff! You can read Jay Greco's investigations of the ultrasonic pairing. It's quite technical but might be interesting to some of you.
One of the challenges of using these devices is that of the dynamic heterogeneous execution environment. These are consumer products designed to be used in the home. They do not understand the 802.1X authentication that the University's WiFi uses. WiFiGuest can only be accessed through a portal, and that needs us to have a display and a web browser. This makes our whole set-up very much more complicated if we want to use them, say, on campus. Remember, when you design these kinds of devices it's very hard to predict where they'll end up. Doubtless, the Amazon developers did not anticipate these buttons being used in a classroom setting!
The point with these devices is that, although they are relatively powerful and complex, getting them working in a world that presumes a screen and a keyboard is complicated. As the number and variety of devices we have multiplies, we need to think of innovative ways of allowing people to easily interact with devices that have little or no interface. This is especially true for a future where we might have no interfaces at all! Sensors and machine learning mean we may often end up interacting with ubiquitous computing systems that have no obvious interface, either for input or output. How do we design for interaction in such contexts? Is interaction still a meaningful concept?
Amazon Web Services
The Dash buttons make use of AWS services. We do not have scope to go into detail, because AWS is sufficiently vast and complex that we could run an entire module (or even degree programme) on it. The Dash buttons make use of the AWS Simple Notification Service, which allows Amazon to see that a particular button has had an action (e.g., a long press), and then follow some rules for dealing with it. Normally, I have these buttons hooked up to AWS Lambda functions. These are part of the AWS serverless offering and they allow you to run arbitrary functions without needing to set up any kind of environment. Your code runs, it returns and then it disappears. There's no configuration, no server, just your function. It's an interesting way of building microservices that massively reduces the complexity of getting fairly simple operations hosted.
When Amazon Dash Buttons are pressed (the developer versions, anyway), they try to send information to an Amazon Lambda endpoint. The information sent is just how the button was pressed (long, short, double). In class I normally demonstrate an example of this written in JavaScript; a sketch of the same idea is shown below. What it does is not really important; the important thing to take away is that sometimes a very simple interaction (e.g., a button press) can have a very complex infrastructure supporting it. What do you think will happen to these buttons if Amazon decides to make a breaking change to their APIs or cloud offerings? We'll talk more about how devices talk to one another next week, when we go into detail on APIs for ubiquitous computing.
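The JavaScript demo itself isn't reproduced in these notes; the sketch below shows the same idea in Java (which AWS Lambda also supports). The event field names and the responses to each kind of press are assumptions for illustration, not the code used in class.

// Hedged sketch of a Lambda handler reacting to Dash/IoT Button presses.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

public class ButtonHandler implements RequestHandler<Map<String, String>, String> {
    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        // Developer-edition buttons report how they were pressed; "clickType"
        // is assumed here to carry SINGLE, DOUBLE or LONG.
        String clickType = event.getOrDefault("clickType", "UNKNOWN");
        switch (clickType) {
            case "SINGLE": return "Single press: order more toothpaste";
            case "DOUBLE": return "Double press: cancel the last order";
            case "LONG":   return "Long press: report order status";
            default:       return "Unrecognised press type: " + clickType;
        }
    }
}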
[Diagram: many simple parts, connected via the internet, form a complex system.]
Moving beyond interaction
A lot of the technology we're using is still quite traditional, in that there is some kind of display and some kind of physical input device. On a desktop computer you might have a keyboard and mouse. On your smartphone there's a touchscreen. These devices have different affordances and are used in different ways, but they have a lot of commonalities: their graphical user interfaces, their keyboard-based entry. Modern smartphones are not that different from desktop computers.
We know a lot about interactions with these kinds of devices, and a lot of what we knew from desktops was helpful in designing smartphones. But ubiquitous computing goes far beyond these kinds of interactions. In the future, devices may not have any discernible interface for people to interact with. They may rely on sensors to understand and act. Designing for this kind of context is difficult; interaction has been about people having intentions and carrying them out, doing things to manipulate the state of machines. Sensor-based systems may not have this kind of interaction; they might sense the world and act accordingly, without any kind of intent from you as the user. This requires us to think differently about how we interact with technologies, in a way that Mark Weiser didn't really see coming. It's a difficult design challenge, and over the coming weeks you'll learn more about how machines can be built to understand context in ways that might mean they can act independently.
References
Azenkot, S., & Zhai, S. (2012). Touch behavior with different postures on soft smartphone keyboards. Proceedings of the 14th International Conference on Human-Computer Interaction with Mobile Devices and Services, 251–260. https://doi.org/10.1145/2371574.2371612
Goel, M., Findlater, L., & Wobbrock, J. (2012). WalkType: Using Accelerometer Data to Accommodate Situational Impairments in Mobile Touch Screen Text Entry. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2687–2696. https://doi.org/10.1145/2207676.2208662
Gould, S. J. J., Cox, A. L., Brumby, D. P., & Wiseman, S. (2016). Short links and tiny keyboards: A systematic exploration of design trade-offs in link shortening services. International Journal of Human-Computer Studies, 96, 38–53.
Leiva, L. A., Sahami, A., Catala, A., Henze, N., & Schmidt, A. (2015). Text Entry on Tiny QWERTY Soft Keyboards. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 669–678. https://doi.org/10.1145/2702123.2702388
Oulasvirta, A., Reichel, A., Li, W., Zhang, Y., Bachynskyi, M., Vertanen, K., & Kristensson, P. O. (2013). Improving Two-thumb Text Entry on Touchscreen Devices. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2765–2774. https://doi.org/10.1145/2470654.2481383
Wiseman, S., Soto Mino, G., Cox, A. L., Gould, S. J. J., Moore, J., & Needham, C. (2016). Use Your Words: Designing One-time Pairing Codes to Improve User Experience. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/2858036.2858377