Now, let me turn to the second topic, which is how do we maximize the insight we get out of doing user research. There are really two things that one typically looks at in doing research, which are the reliability and the validity of the study. So reliability is the question of: if I do the same study a second time, do I get the same result? If I do it a third time, do I still get the same result? If I did it ten times, would I get the same result all 10 times? This is very important, because if you get a different result every time you try, then it is random. It's a flip of a coin, and that's not a very good study.
So we do want reliability, but we also want validity, which is whether the finding holds for the real world, not just for the research world. Reliability we can operationalize based on statistics and probabilities, and people trained in measuring user experience will have heard a lot of details about how to do this. We have very good formulas and ways to operationalize reliability and see whether something is statistically significant or just random. But in the end, we don't really have that for validity. So I would like to operationalize validity as saying: if we make a business decision based on this research recommendation, are we actually going to make more money in the company? Is that going to work? Does it move the needle, to use one of those cliches? And I think that is at least as important as reliability, because if you study something that doesn't translate to the real world, it also counts for nothing. There's so much attention on reliability, I think, because we do have great formulas for calculating it. And so people forget about validity.
I think this is very similar to the old anecdote about the drunk guy who's looking for his lost car keys under the street light. A police officer comes along and asks what happened. "I lost my car keys." The police officer tries to help him find the keys, and they can't find them. Finally, the police officer turns to him and says, "Are you sure you lost your car keys here under the streetlight?" And the drunk says, "No, no, I lost them over there. But it's easier to look here because it's light." I think it's the same here: it is easier to think about reliability, because we have a very firm handle on that concept and really great formulas for looking at it. And then people often forget about validity.
And, you know, particularly when you think about the way that reliability is usually thought about, it's just thought about in terms of things like statistical significance, where we usually like to say, well, if p, the probability, is less than five percent, then we'll say this is a good research finding. But think about that. That p is really the probability of getting such an outcome purely at random, with no real effect behind it. So this means that if I do 20 studies based on the p less than five percent criterion, one time I get a bogus finding. Well, if I'm doing it for my website or for my project, then this is pretty good odds, right? 19 times out of 20, I'm going to give you the correct recommendation for what to do, and one time I give you the wrong recommendation. This is still so much better than just guessing, which means half the time I'm giving the right recommendation and half the time I'm giving the wrong recommendation. So for any individual project, those odds are good. This is why five percent is the usually recognized number for reliability.
But think about the world: is that good for the world? There are thousands upon thousands of user research studies being done every year. People do them for their own projects, but people also do them for publication in various places. So at least one out of every twenty research papers you read is a bogus finding. And it's actually even worse than that, because of what you hear about. All the people who do a research study where the finding comes out the same as what we already know: you're not going to hear about that. Say somebody does yet another study of a well-established finding and says, oh, it is still there. That's not going to get hyped up and be covered everywhere and get millions of tweets and everything. Right. You don't hear about those things, but you do hear about the study that comes up with some new and unexpected finding, which is very often wrong.
For example, take the question of response times and how fast you see web pages appear. Basically almost every study ever done of this shows that response times are important: the faster the website, the more business you do. If you're an e-commerce site, you can see the sales go up. Google can see people do more searches. Almost everybody who has studied that has had this finding. So if one more place goes and does a response time study and finds that faster web pages are better, you're probably not going to hear about it. However, about 10 years ago, somebody said response times don't matter, and that got a lot of attention at the time. That's one of the problems with this kind of statistical analysis. At five percent, it is only very rarely that any one study would be a bogus finding. If it's only your own project, fine. But if you look all over the world, across tens of thousands of studies, there are going to be a lot of bogus findings out there.
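The arithmetic behind this point can be sketched in a few lines of Python. This is a toy simulation, not data from any real set of studies: the number of studies, the five percent bogus rate taken at face value, and the two "attention" probabilities are all assumptions chosen for illustration, and `simulate_attention` is a hypothetical helper name.

```python
import random


def simulate_attention(n_studies=10_000, bogus_rate=0.05,
                       attention_if_expected=0.01,
                       attention_if_surprising=0.50, seed=0):
    """Toy model: every study tests an effect that is really there.

    A correct study confirms the known result (boring); a bogus study
    contradicts it (surprising). All rates are illustrative assumptions.
    Returns (bogus_total, noticed_correct, noticed_bogus).
    """
    rng = random.Random(seed)
    bogus_total = noticed_correct = noticed_bogus = 0
    for _ in range(n_studies):
        is_bogus = rng.random() < bogus_rate
        if is_bogus:
            bogus_total += 1
        # Assumption: surprising (here, bogus) results get talked about far more often.
        p_attention = attention_if_surprising if is_bogus else attention_if_expected
        if rng.random() < p_attention:
            if is_bogus:
                noticed_bogus += 1
            else:
                noticed_correct += 1
    return bogus_total, noticed_correct, noticed_bogus


if __name__ == "__main__":
    bogus, noticed_ok, noticed_bad = simulate_attention()
    print(f"bogus studies overall: {bogus} of 10000")
    print(f"studies you hear about: {noticed_ok} correct, {noticed_bad} bogus")
```

Under these assumed rates, only about one study in twenty is bogus overall, yet the bogus ones make up most of what gets noticed.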
You're going to hear about the wrong ones. So my recommendation on how to deal with that is to think of one of these balance scales. On one side you put all the research findings that, again and again, say response times are important. On the other side you put that one study that says response times don't matter. And the balance scale comes out and says, this is where the weight of the evidence is. This is the evidence you should trust, not that little tiny one thing. So that's one way of dealing with that. You can very easily be led astray if you think only about statistical significance.
So I just want to talk to you about the study we did about websites in 2016. What makes that an interesting study? Why did I tell you about it? We had a thousand data points from Internet users doing 15 tasks on 43 websites, tested in two different countries. Which of these numbers makes it a good study? Well, 1000 data points means that our statistics and numbers are reasonably tight. So that's good. But what I think really makes this a study where the results are interesting is actually the 43 websites and the 15 different tasks, because it's the diversity that makes this generalizable. This is not just one website where, for some unknown reason, people couldn't find what they were looking for. This was across a broad set of websites in different industries, where people tried to do many different things. It's not that they tried one thing and couldn't find it. They tried many things, and they often couldn't find them. That's what makes it believable. So I really want to encourage more diversity in research and trying different things, rather than just trying one thing with an enormous, huge sample. The number of users is typically almost the only thing many people care about, and I think that's the least important.
You want to look at how many different things were tried and how many different designs were tested.
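As a rough illustration of why the sheer number of data points matters less than it seems, here is the standard normal-approximation margin of error for a measured proportion, such as a task success rate. The sketch assumes independent observations, which is optimistic when the same users contribute several tasks, and `margin_of_error` is a hypothetical helper name.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    n: number of observations (assumed independent)
    p: observed proportion; 0.5 is the worst case
    z: 1.96 corresponds to 95% confidence under the normal approximation
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (100, 1_000, 4_000, 10_000):
        print(f"n = {n:>6}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

The margin shrinks only with the square root of n, so quadrupling the sample merely halves it. Past a point, extra users buy little precision, while extra websites and tasks buy generalizability, which this formula does not capture.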
Another example from another of the things we did: we had five different studies of mobile text comprehension that came out of that project. So which is the best of these? Is it a study with twenty-five users and six articles, a study with thirty-seven users, a focus group, or another study with more than 200 users? Well, it's a bit of a trick question, because what makes this the best is all of them: they all came out with the same result. And the result was that text comprehension is actually either the same or slightly better on mobile than it is on regular computers, which was a big surprise to us, because back in 2010 other research had found that text comprehension is much worse when reading from mobiles than when reading from desktop screens.
So when we did our first study with like 20 users and the results came back saying completely the opposite of the old research, I just said, I don't trust this. I thought, maybe it's the articles; maybe there was something about these particular articles that produced this result. We did these two studies with UserZoom, which is a platform for remote studies that we use a lot and have had good results with. But that said, I was thinking, well, maybe UserZoom works for things like e-commerce studies, but not for text comprehension studies. So we brought people into the lab and we watched while they were reading. It was a rather boring study, but we watched them quietly read the articles, and got the same result. The focus group was more like talking to people about how they read stuff, and they said things like, I like reading on my iPhone. And then there was the big sample size study, to see if we could nail the statistics down, and again, same result.
So why is this? Well, I think one big explanation is that if you compare what a phone was in 2010 with a phone now, a current phone has six and a half times more pixels on the screen. So one thing is reading from a tiny screen; another is reading from a much bigger, much better screen. That is one possible explanation. But in any case, my general point is that when you get a result that is almost the opposite of the old research, you should not trust it at first. When something has always been found to be A, and your test says B, that is usually wrong. You are probably one of those five percent cases where you happen to be wrong. That does happen from time to time. So if you do a study and it comes out the same as you expect, you can say OK, let's move on to the next thing, because you can't spend endless resources on every single problem you have. But if you do a study and it comes out opposite of what all previous studies show, then worry that maybe something's wrong with your study. And, as I said, then do it again in a few slightly different ways. If you still get the same result after a while, then, yeah, we can trust that things have changed in the world, but not the very first time we do a study that's different from everything else.
OK, so I wanted to emphasize this point about diversification in the research sense: we don't just do one thing and go by that. There's a variety of ways in which we can diversify. One of them is just different people. And we do want to do that, because we do know people are different.
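The earlier advice about a result that contradicts all previous research can be put in rough numbers with Bayes' rule. The sketch below is only an illustration: the five percent prior that the world really has changed, the 80 percent chance that a study detects a real change, and the independence of the repeated studies are all assumptions, and `posterior_after` is a hypothetical name.

```python
def posterior_after(k_agreeing_studies, prior=0.05, power=0.80, alpha=0.05):
    """Probability the surprising result is real after k studies all show it.

    prior: assumed belief, before any study, that the old finding no longer holds
    power: assumed chance a study shows the new result if it is real
    alpha: assumed chance a study shows it by accident if it is not real
    Studies are assumed independent, which repeating them in different ways
    (other articles, lab instead of remote) is meant to approximate.
    """
    prior_odds = prior / (1 - prior)
    likelihood_ratio = power / alpha
    posterior_odds = prior_odds * likelihood_ratio ** k_agreeing_studies
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    for k in range(0, 6):
        print(f"{k} agreeing studies: {posterior_after(k):.3f}")
```

With these assumed numbers, a single surprising study leaves the new result less likely than not, while a few independent repetitions that agree make it hard to doubt.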
So if I just take one person and have them use a design, can we go by what happened in that one session? Probably not. It could be just a weirdo; that happens. So we do want to have more than one person. But if we're talking about hundreds of people, thousands of people, usually it's not worth doing. It's more important to have different personas than just different individuals, so different types of people. And what we say, if anybody went to the personas course, is that it's more important to have behavioral differentiation: personas that are defined by people's behavioral characteristics rather than their demographic characteristics. For the type of things we look at in user experience, there's usually not that much difference between demographic characteristics like age, gender, or location in one city or another city. They usually tend to be about the same. For other types of market research that people might want to do, that could be important. But for our type of interaction design, behavioral characteristics such as the job, the computer experience, and the technical skills tend to be more important.
But also diversify not just the users, but also what you are testing. Test different designs. Don't just test your one best idea; test your five best ideas. Or have five different designers each come up with their best idea and test those. That's a much better way of getting insight into what is going to be the best solution for your design: not one thing, but a variety of different things. And finally, also try to employ different methods. I am, you know, such a super advocate of usability testing and putting a user down in front of the computer to see what they do. Yes, but there are also other methods we want to use. Use measurement methods, use qualitative methods, use a bunch of different things.
And when they all come together, they will help you get a much richer insight than trying to do the same thing again and again and again and pouring, you know, all the gold into this one pot, this one approach. Try to spread it out a little bit and get a few more different types of insights.
It is also worthwhile having good methodology. So here is a chart that shows a project I did a while ago where we had twenty different teams do the same study, which is not something we usually do. But in this case, twenty different teams all tested the same software product, which had eight serious design problems in it. The chart shows on the Y axis how many usability problems each team found, and on the X axis we rated how well they did the study in terms of all the recommended ways of doing usability testing, which people will have heard about if they went to the course we had a few days ago on usability testing. And what you can see here is that there's a pretty strong correlation: the better you run your study, the more you follow the methodology, the more you will actually learn about your software. That said, it's not a 100 percent correlation. And also, even the worst teams, which followed only about one fifth of the recommended practices for the study, still found two out of the eight things. They still found a quarter of the problems. And if you think about it, if you're doing product development and you can remove a quarter of the really bad things in your design, that's still worth doing. Is it worth doing a better study? Yes, which is why we do have courses on usability testing. Right. But even with a poor study, people will still find something, so go do this stuff.
So to conclude, I want to show you a painting that I saw when we were in London a few weeks ago. The title of the painting is Man Proposes, God Disposes, and it's from 1864. It shows what the artist imagined happened to Sir John Franklin's polar expedition, which was never heard from again. And so the artist imagines, you can see, the polar bears gnawing on the remains of Sir John. So that's what happened to man in this case. Man proposes: I will go and explore the polar regions. But God says no, my polar bears should have something to eat. So that's a nice little moralizing painting. But I want to use this as a metaphor to think about our world, because in our world the similar slogan would be: designers propose, but users dispose. So you propose something and say this is going to be such a wonderful design, and the users say, "I can't use it, I'm going to go elsewhere." And so we do have that kind of reality check of facing up to nature, in this case nature being the users, not the polar regions. And we have to design for reality. So designers propose, but users dispose.
So you'd better do your user testing, or your design will end up like polar bear food.