What's with these flaky tests?
Joel Clermont (00:00):
Welcome to No Compromises, a peek into the mind of two old web devs who have seen some things. This is Joel.
Aaron Saray (00:07):
And this is Aaron.
Joel Clermont (00:15):
Many Laravel developers write tests as part of their software development workflow. And I do the same, I like that. Regardless of when you write the test, whether it's before or after the code, doesn't matter. However, one thing we don't like is flaky tests. You run a test today and it passes, you run it tomorrow and it fails. I've run into that, and one of the things I've learned that leads to flaky tests is unnecessary randomization. I came to this enlightenment, if you will, through the advice of others, such as Aaron. Aaron, do you recall these events?
Aaron Saray (01:03):
I do. I remember you running a test about 40 times and it passing. And then one time it failed and it was, "Hey, can we take a look?" And every time we looked at it together it worked, and then we'd get off the Zoom call and it wasn't working again. Send it out to CI/CD and it fails.
Joel Clermont (01:21):
Not good. Like, if the whole point of tests is to build confidence and to make us move faster, this is the opposite. And to be honest, this flakiness is pretty random and doesn't happen a lot. I would say in my case, it happened just little enough that I never really felt a need to go figure it out. So I go, "Oh, I'll just run the test again," and then they pass and like, okay, not a big deal. So maybe you bump into it like once a week or something. It wasn't like every other time it failed. It was flaky in the flakiest sense of the flakiness that there can be in a test. That's a term, look it up. It's in Martin Fowler.
Aaron Saray (02:02):
No, it made me think of a croissant or something. So flaky.
Joel Clermont (02:06):
Ooh, flaky pastry. Very good. Let's dig in a little bit as to why... because randomization on the surface is a necessary thing in tests. Let's take it back a step. Where randomization often comes into play is in the factories. We use factories to make it easier to build up the state of our application that we're trying to test. We're sort of led down this path of randomization by the fact that it's really easy. In fact, I think the docs even show like, "Here's how you use Faker in your factories." Because if you're hard coding a name in a user object, well, you don't want all your users to have the same name, so you have to have some randomization. But I guess the thing to balance is when to have randomization. When is it useful and when might it lead to some of these flaky tests?
Aaron Saray (03:02):
Well, actually that's a good question because I have unsurprisingly very strong opinions about that.
Joel Clermont (03:10):
Never.
Aaron Saray (03:13):
Basically, here's what I ask when I look at whether something should be random or not: is this piece of data used to make a decision in my product? If it is, I don't want it to be random because I want the decisions to always flow in the same direction for my test when I'm testing that decision. Whereas, if it's something that doesn't affect my business logic, I will go and make it random. I can give you some examples, right?
Joel Clermont (03:42):
Yeah, that'd be helpful.
Aaron Saray (03:43):
Let's just say you have a user object and in your application it has a first and last name, maybe an email address and it also has a state. And your product figures out taxes for a shopping cart. Now, things that maybe don't matter so much are the first and last name so those could be random. Like, this isn't just the application for John Smith, it can be any of these names that you make up. So you might use Faker for that. Give me a first name or last name, just kind of nice to have some random data to see how things might react. An area where you might want to start considering whether they should be random would be the email address. It might be random, but it might have to be forced to be unique. So that's kind of that middle case.
And one of the cases where I would think you wouldn't want to make it random would be the state. Because perhaps each state and each municipality has different taxes, right? We're going to say we're always going to assign it to a known state, let's just say Wisconsin, because we know that if we check out something and it's always Wisconsin, the tax rate's always going to be the same. Whereas, if we did that as a random state it might pick Wisconsin with a 5% tax rate, it might pick Illinois with a 5% tax rate, it might pick California with a 9% tax rate, on one of those tests. And suddenly your calculation just isn't correct.
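A minimal sketch of the rule Aaron describes, written in plain Python rather than as a Laravel factory so it runs on its own. The names `make_user` and `cart_total_cents` and the tax table are hypothetical, and the rates simply reuse the 5% and 9% figures mentioned above.

```python
import itertools
import random

# Hypothetical tax table in whole percent, reusing the figures from the example.
TAX_PERCENT = {"WI": 5, "CA": 9}

FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton"]

_sequence = itertools.count(1)


def make_user(**overrides):
    """Build a user: random where it is harmless, fixed where logic depends on it."""
    n = next(_sequence)
    user = {
        "first_name": random.choice(FIRST_NAMES),  # no logic depends on the name
        "last_name": random.choice(LAST_NAMES),
        "email": f"user{n}@example.test",  # differs per user, but always unique
        "state": "WI",  # drives the tax calculation, so it is never random
    }
    user.update(overrides)
    return user


def cart_total_cents(user, subtotal_cents):
    """Return the subtotal plus sales tax for the user's state, in cents."""
    tax = subtotal_cents * TAX_PERCENT[user["state"]] // 100
    return subtotal_cents + tax


def test_total_includes_wisconsin_tax():
    user = make_user()
    assert cart_total_cents(user, 10_000) == 10_500
```

A test that cares about another state can ask for it explicitly, for example `make_user(state="CA")`, so the expected total is still known in advance.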
Joel Clermont (05:13):
I like that example because it makes it a bit more concrete. Let me pose a scenario though. What if, currently, my app has no logic based on state? Is this the kind of thing where you might even try to predict the future a little bit and be like, "Hmm, I'm not doing this now, but I've seen state impact logic often enough in the past that I just won't randomize that"? Or do you randomize it until it's a problem? Like, how do you make that decision?
Aaron Saray (05:45):
Yeah, that is a tough question. That's a good question too because we don't want to do premature optimization.
Joel Clermont (05:52):
Sure.
Aaron Saray (05:52):
We don't want to build things ahead of time. YAGNI, you ain't going to need it. But I think there are things you can kind of... like you said, if you've done this enough and you understand the business you're working with, you kind of have an idea. There are a couple of little tips too that I've seen that make this a little better and a little bit easier. One would be the way you refer to these items, how you name them, right?
Joel Clermont (06:16):
Okay.
Aaron Saray (06:17):
For example, it doesn't matter what business you're in, but if you're going to name something "something code." Like a standard code, or a tax code, or a country code, or anything that says code. That just screams to me that it's a business's way of measuring the differences in something. They need a code to be able to say there's a different process here. You know, you might not have logic around that per se, decisions around that, but that's a good one where you might know, "Maybe I shouldn't randomize that." I would say don't go crazy trying to figure out every single possible thing you might do logic on. But there are some middle grounds where you're like, "Chances are this means something."
Joel Clermont (06:58):
Yeah. Roles are another one that we've talked about, where even if it's like a role ID or something like that, don't randomize that. That's going to cause you a world of hurt. That's where factory states are important. We have that ability to define states on our factories and just be super explicit. Like, maybe the default user has a role of whatever makes it a normal user. And if you have an admin, or a manager, or something like that, just create additional states for those so you can pull in the right thing in your test. And it makes your tests read nicer too.
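One way to picture explicit states for roles, sketched in plain Python rather than with Laravel's factory classes. `UserFactory`, its `admin()` and `manager()` methods, the role names, and `can_delete_posts` are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str


class UserFactory:
    """Hypothetical factory: the default role is fixed, other roles are named states."""

    def __init__(self, role="member"):
        self._role = role

    def admin(self):
        # Each state returns a new factory, so the default is never mutated.
        return UserFactory(role="admin")

    def manager(self):
        return UserFactory(role="manager")

    def make(self, name="Test User"):
        return User(name=name, role=self._role)


def can_delete_posts(user):
    """Example business rule that depends on the role."""
    return user.role == "admin"


def test_only_admins_can_delete_posts():
    member = UserFactory().make()
    admin = UserFactory().admin().make()
    assert not can_delete_posts(member)
    assert can_delete_posts(admin)
```

The test reads like the rule it checks: the role is chosen by name at the call site, never left to chance.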
Aaron Saray (07:32):
Right, that's a good idea.
Joel Clermont (07:34):
Another place I've seen flaky tests bite me... and just to point out something too. Sometimes a flaky test goes the other direction. Something that normally passes will occasionally fail, it's kind of what we normally think of. But there's this other weird flaky test where something's passing all along and it kind of shouldn't have been. Then you make a change in your logic and it starts failing and you realize your test was actually a false positive. It was passing for the wrong reason. What are some scenarios where you've seen that? And how do you guard against those other types of flaky tests?
Aaron Saray (08:15):
Sure. So one of the very common places where you can see that is if you're going to retrieve something and maybe you want to test the order of it or you want to test some filtering, right? So you go and use a factory and you create one object, and now your database contains one item. And then you retrieve all of them and you say, "Yeah, it's in the proper order." Well, of course, because there's only one of them. Or you create them in the same order that you're going to sort them. I create three in a row with incrementing IDs and then I say, "Order this by date created." Well, those are always going to be in the same order, so I might consider creating them in a different order, changing up the dates or something like that, so that when I retrieve my data, I can actually tell it's been sorted. Same thing when you do a filter.
So you say, "Give me all of the models where state equals Wisconsin," but you only made models where state is Wisconsin. So is the filter really what limited the results to those? You could remove that filtering logic and the test would still pass. That's why when we've been working together on some of our projects, you'll see I'll set up not only the data I know I want to retrieve, I'll also set up alternate data as well. Another place that comes in useful is if you have models that are related to, like, a parent model, and you say, "Give me all the child models from an endpoint." I'll create maybe three or four child models so we can test our ordering and our sorting. But I'll also create another one or two interspersed in that whole setup that belong to a different parent. That way I can make sure that when I retrieve that data, it's ignoring the ones that are not from the parent I want and it's still sorting it.
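The setup described here can be sketched as a small test, written in plain Python with an in-memory list standing in for the database. The function `comments_for_post`, the post and comment shapes, and the IDs are hypothetical.

```python
from datetime import date


def comments_for_post(comments, post_id):
    """Return the comments belonging to post_id, oldest first."""
    matching = [c for c in comments if c["post_id"] == post_id]
    return sorted(matching, key=lambda c: c["created_on"])


def test_returns_only_this_posts_comments_oldest_first():
    comments = [
        # Created out of date order so the sort has real work to do.
        {"id": 1, "post_id": 10, "created_on": date(2021, 3, 3)},
        {"id": 2, "post_id": 99, "created_on": date(2021, 3, 2)},  # other parent
        {"id": 3, "post_id": 10, "created_on": date(2021, 3, 1)},
        {"id": 4, "post_id": 10, "created_on": date(2021, 3, 4)},
        {"id": 5, "post_id": 99, "created_on": date(2021, 3, 5)},  # other parent
    ]

    result = comments_for_post(comments, post_id=10)

    # Fails if the filter is removed (ids 2 and 5 would appear)
    # and fails if the sort is removed (order would be 1, 3, 4).
    assert [c["id"] for c in result] == [3, 1, 4]
```

Because the rows for the other parent sit between the wanted ones and the dates are shuffled, deleting either the filter or the sort from `comments_for_post` makes the assertion fail.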
Joel Clermont (10:08):
Yes, that's definitely a good practice to get into. You don't want to go nuts with it. Like, where does it end? Like, do I have to create a hundred unrelated things? No, just one or two. Like you were talking about, there are some important places to do that. Ordering, filtering, things like that, where having an extra thing in your result set would actually... should cause the test to fail. Just one tip that I've picked up or sort of fine-tuned is, when you're creating those extra objects in your test, you can even name them. Like, someOtherUser or unrelatedData, or things like that, so when you're reading the test you won't scratch your head like, "Why are we creating these other things that have nothing to do with what we're actually testing?" Even in PhpStorm, it might even be gray. Like, "Ooh, you're not even using this variable, you could just remove it." But if you name it the right thing, then it conveys to the person reading the test like, "Oh, this is done for this reason. It's to make sure we don't get extra data."
Aaron Saray (11:11):
Yeah, self-documenting code basically.
Joel Clermont (11:13):
Perfect. Love it.
Aaron Saray (11:21):
There seems to be a resurgence in things that were cool in the past and kind of went the way of the dodo bird. Some of those now are TV shows, there's a bunch of remakes. But I want to think about before the remakes, right?
Joel Clermont (11:35):
Okay.
Aaron Saray (11:37):
Can you think of any shows that maybe as a kid seemed awesome or great, and now as an adult when you think about it you're like, "What was going on?" I can give you some examples just to kind of get you on... First of all, there was that live-action Mario Brothers TV show.
Joel Clermont (11:53):
Oh yeah.
Aaron Saray (11:54):
I remember really thinking that was cool and now when you watch something on YouTube, you're like, "What is going on?" Or, let's be honest, all of the Power Rangers. Just like, "What are you guys doing?" Even the remake, which was much better I guess. But it's just, "What's going on here?"
Joel Clermont (12:17):
The movie?
Aaron Saray (12:17):
Or shows. Disney's Gargoyles, an animated show where Gargoyles came to life and saved a city. Another one I remember watching was Bots Master. It was five robots, I think, or four robots, that somehow lived with a child. And then they combined into a mega-robot and then them and that child saved the city?
Joel Clermont (12:42):
Okay.
Aaron Saray (12:43):
Can you think of any shows that maybe you liked when you were little and now when you look back you're like, "What was going on?" Was it just a different time back then?
Joel Clermont (12:52):
Yeah, that's part of it. I would also say that I'm probably not the best person to answer this because there are shows that have been remade where I still like them, even though as an adult, I should know they're cheesy. For example, MacGyver. Like, I loved that as a kid and I even watched the new one. I like it and it's stupid, and I know it's stupid and it's cheesy, but I still like it. Boy, let me think harder. I can think of some shows that hold up. Like, Sesame Street is still solid, man. My kids watch that now and we've pulled up some of the old ones. That's still solid. Muppets, same sort of thing.
Aaron Saray (13:34):
Which is strange when you think about the history of HBO and now Sesame Street's home is HBO. Like, what's going on?
Joel Clermont (13:43):
Yeah. There is an Elmo after-dark show, but it's still clean, it's like a talk show. A lot of the examples you gave were cartoons and clearly kid shows. No, I can't think of... I guess I watched only highbrow TV. You probably watched Masterpiece Theatre as a kid like me and-
Aaron Saray (14:06):
Nope. Never did.
Joel Clermont (14:09):
Another one that jumped in my head was that show Perfect Strangers. I loved that-
Aaron Saray (14:14):
Oh yeah.
Joel Clermont (14:14):
... as a kid. Man, I saw a clip of that the other day, because there was like some funny phrase I was trying to share with my kids, who have never seen the show and don't even know the characters. I found the clip on YouTube and I'm like, "Wow, this show is pretty bad," and I think that goes for a lot of sitcoms from that era. Alf, that one just jumped into my head. I loved that as a kid. And that's almost... No, it's too bad, I couldn't watch it now even for nostalgia. It really is dumb. Yeah, Alf. But I will still love MacGyver.
Aaron Saray (14:47):
Do you have your own phantom test that seems to fail one out of a hundred times?
Joel Clermont (14:51):
We can help. Book a free consultation with us on our website nocompromises.io.