USABILITY

Overview: Concepts and Methods
=====================================
I. Distinction of terms
    a. usability attributes
    b. usability principles
    c. usability heuristics
==============================
II. Types of Evaluation
    + Formative: obtaining user requirements in the early stages of design
    + Summative: evaluating systems that have been built
========================================================
Why Systems are Not Used
    + excess functionality
    + accessibility
    + availability
    + system dynamics (e.g. response times)
    + lack of training and user aids
    + documentation
    + user conceptualization
    + consistency/portability
    + fear
==========================================================
Distinction of Terms (Nielsen, p 24-25):
    usefulness: whether the system can be used to achieve some desired goal
    utility: whether the functionality of the system in principle can do what is needed
    usability: how well users can use that functionality
Definition: usability is not determined by one or two attributes, but is influenced by a number of factors.
===========================================================
Shackel on Usability Criteria
-- suggests a system should have to pass the usability criteria of:
    effectiveness
    learnability
    flexibility
    user attitude
=============================================================
ISO Definition
The usability of a product is the degree to which specific users can achieve specific goals within a particular environment, effectively, efficiently, comfortably, and in an acceptable manner.
============================================================
Usability Attributes (Nielsen)
Defining usability by these attributes:
    learnability -- ease of learning
        What level of user experience?
    efficiency -- navigation
    memorability -- memory load
    errors -- handling and messages
    feedback -- help
    satisfaction -- pleasant to use, ease of use
=======================
Eason, 1984
These combined determine usability: task, system, & user
Task characteristics:
    frequency: number of times a task is performed by a user.
        Considerations include routine and infrequent tasks
System functions:
    ease of learning: effort required to understand and operate a system
    ease of use: the effort that is required to operate a system once it has been understood and mastered by the user
User characteristics:
    knowledge
    motivation
    discretion
==================================
Nielsen's 1990-1993 Usability Principles
Use for heuristic evaluation:
    Simple and natural dialogue
    Speak the user's language
    Minimize memory load
    Be consistent
    Provide feedback
    Provide clearly marked exits
    Provide shortcuts
    Provide good error messages
    Prevent errors
    Maintain user control of the system
===========
Eight Golden Rules of Dialogue Design (Shneiderman, 1992)
    strive for consistency
    enable frequent users to use shortcuts
    offer informative feedback
    design dialogues to yield closure
    offer simple error handling
    permit easy reversal of actions
    support internal locus of control
    reduce short-term memory load
=================================
Heuristic Evaluation Criteria
Garzotto, Mainetti, and Paolini, 1995
Communications of ACM, August

Richness expresses the abundance of information items and ways to reach them.
Ease measures information accessibility and how easy to grasp operations are.
Consistency measures application regularity, and can be summed up in a simple generic rule: Treat conceptually different elements differently and conceptually similar elements in a similar fashion.
Self-evidence expresses how well users guess the meaning and the purpose of whatever (content or navigational element) is being presented.
Predictability expresses how well users anticipate an operation's outcome.
Readability expresses the overall "feeling" about an application's validity. Readability depends upon all factors mentioned.
Reuse considers using objects and operations in different contexts and for different purposes. Reuse promotes consistency (and therefore predictability).
=================================
The need for usability analysis (Newman and Lamming, 1995)
The need arises when we're faced with the following kinds of questions during design:
    Is the small size of this screen target going to result in a significant number of errors in selecting it?
    If the user invoked this command by mistake, will he or she find the escape route?
    Will the word-processor user remember that there are three different ways of changing the properties of a formatting style?
==============================================================
The Effectiveness of Usability Engineering
Nayak, Mrazek, and Smith, 1995
SIGCHI Bulletin

Levels of Acceptance
Stage 1: Skepticism
    The company is focused on product features and the development schedule. There is fear that usability evaluations will lengthen the development cycle.
Stage 2: Curiosity
    There is recognition that products need improvement, and a usability engineer is consulted late in the development cycle.
Stage 3: Acceptance
    There is recognition that usability evaluations must be an integral part of the process, and multiple iterations of design and evaluation are performed.
Stage 4: Partnership
    Consists of a cross-functional group working to gather early customer input to drive the product definition and scope, along with multiple iterations of design and evaluation.
=========================
Analyzing and Communicating Usability Data
Nayak, Mrazek, and Smith, 1995
SIGCHI Bulletin

Why is Usability Data Difficult to Analyze and Communicate?
    tools and techniques
    nature of data -- observation-based, not measurement-based
    what to communicate
    who receives the data
    how to communicate -- systemic or in modules
    timeliness
    who collects the data
===================
Techniques for Analyzing and Communicating Usability Data
Affinity Diagramming
    writing observations, comments, issues, and concerns on index cards or adhesive sheets. These cards or sheets are then laid out and sorted to derive a pattern or structure from the observations. Used in many variants throughout the design lifecycle. Also used in analyzing observations from a usability test.
Data Visualization
    -- morphing application to display the transformation from old interface to new interface
    -- video summary tape presentation
    -- multimedia presentation
    -- given the image of an interface design, provide a pop up video segment showing the difficulties users had with the selected control
========================
Techniques, continued:
Usability Specifications
    a living document that represents a usability specification before embarking on design
    This includes:
        a proposed schedule of usability activities during the product cycle
        a usability goals grid
        a problems/recommendations table
==========================
Usability Evaluation -- Team Involvement
Pros:
    Helps educate teams on the process of usability engineering and speeds up the cultural change required for usability engineering to become successful.
    Increases the speed and efficiency of usability activities.
    Reduces complaints that the wrong thing has been tested with the wrong set of users.
Cons:
    Requires special team skills in which members may not have been trained.
    The influence of "team attitudes" towards the data.
    "Too many cooks" syndrome.
=======================
Dedicated Usability Professional
Pros:
    easier for organizations without a mature usability process in place
    the usability engineer can be objective
    may be more efficient from a data collection and resource point of view
Cons:
    team acceptance of data is lower
    the resources are usually too limited to support all the needs
===============
Measurement
We must be able to MEASURE usability.
    time stamping
    keystrokes
    video/audio capture
    control room
    a script
Attributes of Measurement:
    operational definitions of measure(s)
    scales of measurement
    validity of measures
    reliability of measures
==================================
Common Measures for User Interfaces
    -- frequency of use
    -- task completion time
    -- speed of navigation through the system
    -- titles and screen formats causing problems
    -- error counts, requests for help
    -- amount of work, errors per unit time
    -- subjective evaluation
=====================================
Evaluation Methods:
Five classifications:
    observation and monitoring
    experimenting and benchmarking
    collecting users' opinions
    interpreting situated events
    predicting usability
Formative -- during design -- supports iterative design
Summative -- after production -- good for field or beta testing, comparing existing products; not conducive to iterative design process
============================
ETHICS OF RESEARCH WITH HUMAN PARTICIPANTS
APA ETHICAL PRINCIPLE 9 (APA, 1982)
Includes:
    ethical treatment of subjects
    fairness and freedom from exploitation
    removing undesirable conditions
    participant anonymity, data confidentiality
Thought for the day: It is the software system, not the user, that is being evaluated; there are no right answers, no correct behavior!
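
Example (illustrative only): a minimal Python sketch of how several of the common measures listed above under Measurement (task completion time, error counts, requests for help, errors per unit time) could be derived from a time-stamped event log. The Event record, its field names, the event kinds, and the summarize function are hypothetical names introduced for this sketch; they do not come from any of the sources cited in these notes.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float    # time stamp, in seconds since the session started
    task: str   # name of the task the participant was working on
    kind: str   # one of "start", "error", "help", "done" (assumed vocabulary)


def summarize(events):
    """Return per-task measures for every task that has both a start and a done event."""
    raw = {}
    for e in sorted(events, key=lambda ev: ev.t):
        s = raw.setdefault(e.task, {"start": None, "end": None, "errors": 0, "help": 0})
        if e.kind == "start":
            s["start"] = e.t
        elif e.kind == "done":
            s["end"] = e.t
        elif e.kind == "error":
            s["errors"] += 1
        elif e.kind == "help":
            s["help"] += 1

    result = {}
    for task, s in raw.items():
        if s["start"] is None or s["end"] is None:
            continue  # unfinished task: no completion time can be reported
        seconds = s["end"] - s["start"]
        result[task] = {
            "completion_time_s": seconds,
            "errors": s["errors"],
            "help_requests": s["help"],
            # errors per unit time, here expressed per minute
            "errors_per_minute": s["errors"] / (seconds / 60) if seconds > 0 else 0.0,
        }
    return result


log = [
    Event(0.0, "save file", "start"),
    Event(10.0, "save file", "error"),
    Event(20.0, "save file", "error"),
    Event(25.0, "save file", "help"),
    Event(120.0, "save file", "done"),
]
measures = summarize(log)
```

Subjective evaluation and the other observation-based measures are not covered by this sketch; they would need questionnaire or observer data rather than a log.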
======================================
USABILITY
Nielsen: usability has multiple components and is traditionally associated with these five usability attributes:
    learnability
    efficiency
    memorability
    errors
    satisfaction

"DISCOUNT" USABILITY EVALUATION METHODS:
    Heuristic
    Think Aloud

Heuristic Evaluation:
Looking at an interface and presenting an opinion about what is good and bad about the interface. Typically, opinions about the interface are made on the basis of a set of rules or heuristics. UI experts are ideal participants for this method. However, subject matter experts (SMEs) are also good candidates.

Evaluator Expertise:
Heuristic evaluation is systematic and involves a group of experts. The proportion of usability problems found increases as the number of evaluators increases (Nielsen).
==========================================
General Heuristic Evaluation Process:
    Each evaluator inspects the interface alone.
    Opinions, suggestions, annotation of problems found, etc., are submitted by each evaluator.
    Reports are either verbal or written.
    Observers often take notes during evaluation.
    Evaluators discuss the interface.
    Findings are aggregated.
Interfaces are presented either via paper mock-ups or scenarios, or system prototypes. Walk up and use interfaces give the most potential for locating problems.
Overall output is a list of usability problems in the interface, annotated with references to those usability principles that were violated by the design in each case in the opinion of the evaluator (Nielsen, p. 159).

Usability Principles -- Nielsen, Table 2, pg. 20:
    simple and natural dialogue
    speak the user's language
    minimize the users' memory load
    consistency
    feedback
    clearly marked exits
    shortcuts
    good error messages
    prevent errors
    help and documentation
================================
Finding Usability Problems Through Heuristic Evaluation
study by Nielsen, 1992
Discoveries:
    -- usability specialists were better than non-specialists at performing heuristic evaluation
    -- "double experts" with specific expertise in the kind of interface being evaluated performed even better
    -- major usability problems have a higher probability than minor problems of being found in heuristic evaluation
    -- usability heuristics relating to exits and user errors were more difficult to apply than the rest
    -- additional measures should be taken to find problems relating to exit and user error heuristics
    -- missing interface elements were more difficult to find with paper prototypes, but easier to find in running systems
===================
THINK ALOUD METHOD
As the name implies, a subject tests the system while continuously thinking out loud. Thoughts, procedures, ideas, error findings, error recoveries are verbalized. Observer records the dialogue. End results are good qualitatively, but weak quantitatively. However, it is a cheap and effective method for finding simple and complex usability problems.
Some problems with think aloud:
    -- people are uncomfortable about being observed
    -- the verbal efforts may interfere with performing the task
    -- observers often need to prompt the user to think aloud
    -- user comments are not always in context to the interface
Some good things with think aloud:
    -- minimum training needed
    -- with practice the method gets better
    -- users gain confidence
    -- users can finish tasks faster via think aloud
    -- users can externalize their problems during the test
    -- less interference than "coaching" method

The use of think aloud evaluation methods in design
a study by Wright and Monk (1991):
Discoveries:
    think aloud method is effective
    when using this method, more persistent problems were detected by the designers of the system than other groups
    designers cannot predict the problems users will experience in advance of user testing
Some think aloud prompts:
    What will the system do if.....?
    Why did you do that?
    Tell me what you are doing now.
    Keep talking, please.......
===============================
Usability testing: the goal is to identify defects or difficulties and correct the problem(s).
Types of Usability Testing Methods:
    high fidelity prototypes:
        simulation trials: using rough mock-ups or prototypes of the intended system
    low fidelity prototypes:
        task scenarios: a situational context in which the system is used
    other:
        iterative testing
        competitive products
        documentation tests
        informal lab experiments
        formal lab experiments
        field trials/beta testing
======================
Know What Users Are Thinking (Wildman, July 1995, Interactions)
Think Aloud as User Response Data -- Qualitative vs. Quantitative
Qualitative:
    capture user's thought process
    vocalize what users think they are doing as they work through the interface
    reveal ambiguities in the interface
    provide the big picture that behavioral measures cannot
Quantitative:
    less revealing but captures the necessary narrow flows and discrete operations
Think aloud:
    users may inhibit questions for fear of appearing stupid
    mere physical presence of a usability tester may impact situation
Paired-user paradigm:
    co-discovery learning
    written instructions: (discuss what you think the icons (buttons) in this window mean)
==============================================
User Response Data: The Potential for Errors and Biases (Hufnagel and Conca, 1994, ISR)
surveys: can produce misleading results if respondents do not interpret or answer questions in the ways intended by the researcher
errors and biases in judgement can result when users are asked to:
    -- categorize a system
    -- explain its effects
    -- or predict their own future actions and preferences with respect to use of a system
===============
The Testing Environment:
Create the scenario (test lab) needed to conduct the test.
Establish the following:
    workstation arrangement
    comfort/space of the testing lab
    modifiability of the testing environment
    room details (lighting/heat/air/cleanliness/noise/distractions)
OPTIONS:
    Simple single room setup -- essentially a quiet secluded room.
        the test observer is about four to six feet from subject
    Modified single-room setup -- a large room with a workstation for the subject and an observer station
    Classic Testing Lab -- A room with one or more workstations connected to an observation room
===============================
ROLE OF THE EXPERIMENTER/FACILITATOR
    - experimenter should have knowledge of the test method and knowledge of the application
    - may have to provide initial training
    - neutral observer of process
    - plays host to participants
    - keeps subjects "thinking aloud"
    - watches participants for stress
    - decides when task/test is completed
BE AWARE OF PARTICIPANT SENSITIVITY
    - participants will blame themselves for problems
    - facilitator is responsible for watching out
    - observers have to be careful about their own responses
    - co-discovery can help reduce stress
    - help participants finish tasks after testing to get a sense of completion
    - put each task on a separate page -- don't give participants the total tasks on one page
==============================
EVALUATION MEASURES
THINGS TO OBSERVE IN TESTING:
- Learning factors
    -- number of new concepts introduced
    -- how natural or self-evident is it to communicate with the program
- Performance factors
    -- ease of retention
    -- user interface analysis
        does it handle user's level of experience
        how efficient
        how interruptible
- Error recovery factors
    -- how difficult is it to make an error
    -- how convenient is it to recover from errors
- Ease of use
    -- effort to learn a task
    -- effort to relearn a task
    -- effort to perform a learned task
    -- effort to recover from errors while doing a task
    -- attitude towards program
- Installation & startup
    -- time to install
    -- number of steps
    -- number of "calls" for help
    -- time to get the program up and running
- Learning and relearning measures
    -- number of new terms
    -- number of existing terms used and new ways to use them
    -- learning time
    -- number of options user must decide about
    -- % pages or screens that are devoted to conceptual info
    -- relearning time
    -- number of items user has to deal with at one time
    -- time to find information
    -- training time
    -- % of users who do not need any information aids
- User/program interface analysis
    -- number of key entries to perform a task
    -- time to perform a task
    -- % of users indicating that task is too difficult
- Task Analysis -- Measures
    -- time between a user request and program feedback
    -- time between request and complete execution of request
    -- number of errors associated with a task
    -- number of steps to do a task
    -- % of users who indicate that portions of info provided is irrelevant
- Attitude toward program -- survey
    -- how pleased are the users with the product?
    -- recommend product to a friend?
    -- amount of frustration
    -- willingness to continue to use product
    -- anxiety test
=============================
Stimulating Change Through Usability Testing -- (Dumas)
Major issues:
1. Testing can have long range influence on product development. Those involved with usability testing need to view themselves as change agents.
2. Involve designers in test planning and execution. Position so that underlying causes of the problems users have with the product can be discussed.
3. Beta testing alone is not sufficient. Sometimes too late.
4. Usability testing
    -- improve products that are tested and go beyond influencing a product design
    -- raise the awareness of product designers about usability issues
    -- improve the technical and managerial skills of product designers and product managers
    -- stimulate cooperation among design team members, such as engineers, writers, graphic designers
   These can change the way people think about and develop products:
    -- structure your test reporting so that it speaks not only to specific, product-related problems, but also to the underlying technical and organizational causes of problems
    -- provide periodic feedback to designers and managers about progress toward long-term goals.
5. Planning steps (a usability team must do the following):
    -- identify test goals
    -- identify the qualifying characteristics of test subjects
    -- create realistic tasks that test the product or manual
    -- order and prioritize the tasks
    -- determine which performance and subjective measurements to take
    -- create special materials needed to conduct the test
    -- involve the developers in these planning steps
6. Produce a data log.
    -- a well-planned test yields a substantial amount of data. A typical test may produce:
        -- a 15-20 page data log for each subject showing each action of that subject
        -- as many as eight videotapes for each subject
        -- notes made by the members of the test team during the test
        -- notes summarizing the post-test interview
        -- background and post-test questionnaires for each test subject.
7. Reporting the data. The extent to which data is reported will have a major influence on the impact of the test. Give developers periodic reports as the test progresses.
8. A five level rating scheme for identifying user problems with product.
    Level 1: The problem prevents the user from performing or completing the task
    Level 2: The problem creates significant frustration for the user
    Level 3: The problem creates some frustration for the user
    Level 4: The problem does not significantly affect usability
    Level 5: The problem is for product enhancement (the next release of the product)
9. The final report: an effective test report involving definition of the following characteristics: scope, purpose, audience, audience tasks, etc.
    scope:
        the results of a single test
        the results of a series of tests
        the results of a single product line
        the results of a series of tests of different product lines
    purpose:
        to describe the results of a test
        to describe the problems that cut across a series of tests
        to demonstrate the lack of consistency between products
        to demonstrate the lack of cooperation between test and team members
    audiences:
        managers
        writers, hardware engineers, software engineers
        test specialists/HCI specialists
    audience tasks:
        check to see that the test results are reliable and valid
    report itself -- some options:
        -- an executive summary of major findings and recommendations for managers
        -- a volume of global findings of underlying causes that cut across products for some managers, software developers, etc.
============================================
Thought for the Day:
It is the software system, not the user, that is being evaluated; there are no right answers, no correct behavior!
    -- no deception or harm
    -- provide full explanation
    -- obtain consent form
        -- permission to participate
        -- permission to stop test
        -- permission to be observed
        -- permission to be recorded and data
-------------------------------------------------------------
Severity levels (Dumas)
Five level rating scheme for identifying user problems with product:
    Level 1: The problem prevents performance or completion of task
    Level 2: The problem creates significant delay and/or frustration for the user
    Level 3: The problem creates some frustration for the user
    Level 4: The problem does not significantly affect usability
    Level 5: Enhancement issues
-------------------------------------------------------------------
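
Example (illustrative only): the five severity levels above can be treated as an ordered scale, so that a list of observed problems can be sorted with the most serious first and tallied per level for a report. The sketch below is one possible way to do this in Python; the Severity member names, the triage function, and the sample problem descriptions are hypothetical and are not part of the Dumas scheme itself.

```python
from collections import Counter
from enum import IntEnum


class Severity(IntEnum):
    """Levels 1-5 as listed above; the member names are assumed shorthand."""
    PREVENTS_TASK = 1            # prevents performance or completion of task
    SIGNIFICANT_FRUSTRATION = 2  # significant delay and/or frustration
    SOME_FRUSTRATION = 3         # some frustration
    MINOR = 4                    # does not significantly affect usability
    ENHANCEMENT = 5              # enhancement issues


def triage(problems):
    """Sort (description, Severity) pairs most severe first and count each level."""
    ordered = sorted(problems, key=lambda p: p[1])  # stable sort keeps input order within a level
    counts = Counter(level for _, level in problems)
    return ordered, counts


observed = [
    ("icon meaning unclear", Severity.SOME_FRUSTRATION),
    ("cannot find the save command", Severity.PREVENTS_TASK),
    ("would like keyboard shortcuts", Severity.ENHANCEMENT),
    ("error message uses internal codes", Severity.SOME_FRUSTRATION),
]
ordered, counts = triage(observed)
```

Sorting by level puts task-blocking problems at the top of the problems list, and the per-level counts give a compact figure for an executive summary.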