Transcript Part 1
Hello everybody. My name is Mohit Deshpande. And in this video, I want to kinda introduce you guys to the concept of image segmentation.
So to motivate this discussion, here is an image of a wallet on a table. And so as humans, we can easily differentiate between the wallet and the table, and we kinda know where one begins and one ends. And I can actually draw this outline, pick a nice bright color here. So here is the wallet and here is the table. And so as humans, like I said, we can easily differentiate between the two, but this is a bit more difficult for a computer. Because remember that computers only have access to the raw pixel data of this image. So if we wanna perform any image processing operations on the wallet, we’re at kind of a loss, because we first need to separate the wallet from the table. We don’t wanna apply the same operations to both the wallet and the table. We just want to apply it on the wallet. So this particular challenge of separating the different parts of an image is called image segmentation.
So, image segmentation. Let me see if I can get that m up there, actually. There we go. So the problem of image segmentation is: given an image, we want to separate the image into different parts. We want to carve out the different parts of the image based on some criteria. And just as a terminology thing, if we're looking for one object in particular, we call it the object of interest, which is pretty self-explanatory. So if I just wanted to get this wallet, I could carve out this portion of the image, like what I did here. However, like I mentioned, this is actually a bit more challenging for a computer, because we have to tell the computer exactly how we want to make this determination. And this is part of the reason why this is so challenging.
And to further explain why this is challenging, let's consider a different image. So now I have a lot of different objects that are on this table. There's still a lot of research being done on this in the field of computer vision, particularly on what's called SDS: Simultaneous Detection and Segmentation. There are kind of two parts here. The detection part would say: this is a card, this is a pencil, this is also a pencil, and this is a pen. That's what the detection part of SDS is, but we're not going to get too much into that. The interesting part that we're concerned about is the segmentation part, and that is separating the different parts of the image.
So here is a foreground object, it’s a card. And here’s another object here, this is a pencil. And then there’s some questions that come into play like, these are both pencils, so should I be drawing these boxes individually around them? Or should I draw one big box and carve out one big region for both of them, because they’re both labeled pencils? So these are some of the kind of questions that come into play. And here is just a pen here. So this is kind of the problem of simultaneous detection and segmentation. We’re not gonna be worrying about the detection part, but we’re particularly interested in the segmentation part. And so in the next video, we’re going to look at, probably the simplest form of image segmentation called thresholding. So I’ll get to that in the next video.
Transcript Part 2
Hello everybody. My name is Mohit Deshpande, and in this video I want to introduce you guys to this concept called contour detection.
So on the bottom left you see one of the test images that we've been using. It's just a picture of a playing card, some pencils, and a pen, and on the right is actually the same image, but thresholded. You can see the difference between the two, and the reason we need contour detection and contours is that thresholding just isn't enough. The thresholded image gives us a good indication of where our object of interest is, visually, but that's still not quite enough, because you notice there are still some other errors over here. So we want to go one step further and actually figure out what this contour is. Let me put it in red. The goal is to actually draw this boundary here, this contour. I'm doing kind of a bad job at it; in reality it's going to be much, much smoother than what I've drawn, but the goal is to detect this contour. Because once we have this contour, we know the dimensions of our object of interest and the total perimeter around it, and we can do all sorts of image processing tasks on it and completely ignore everything else. So I've been saying the word contour a lot, but let me actually define it.
So a contour is just a closed curve along a boundary of color or intensity. Intensity means the same thing as color, except in a grayscale context; intensity is just another image metric, but in our case we can use either. An apparent change of intensity would be going from black to white: that's a change in intensity, because black has a value of zero and white has a value of 1 (or 255). So a contour is a closed curve along a boundary of color or intensity, and intuitively that's what the computer looks for when it's doing contour detection, because that boundary of color is a very good indicator of a contour. In this case, the closed curve happens to be a shape like a rectangle, but it doesn't have to be a regular shape; it just has to be a closed curve, like what I've drawn here. So we can draw a contour around this playing card, and what we get, more concretely, is a list of points, or coordinates, that define the curve and define the boundary. The details of contour detection are actually fairly complicated, so I don't want to get too much into them, but there are a few parameters that are very useful to know and understand.
And so in the next few videos, we are going to be looking at a couple of the parameters for contour detection, and then another topic: how we can fit the contour to any shape. So we'll discuss the parameters first, in the next videos.
Transcript Part 3
Hello everybody, my name is Mohit Deshpande, and in this video, I want to show you guys what we will be building.
What I have here is some images, and in particular, I have something like this. One thing I should mention is that some of these images are actually not mine. They're taken from GitHub, from a user called arnab.org, so I should give credit where credit is due and mention that these aren't really mine. But these are the kind of images that we'll be working with. And in particular, there's already lots of code out there on GitHub; people have actually already built things like this. This is also by the same person, arnab.org on GitHub, and what he's done is build this kind of training set of all of the playing cards in a deck.
So, we're actually going to be using that to compare with our new input here. We can compare these images to our training set, and we'll be able to detect which cards have been selected. So let me actually run this and we can see the demo. When I run this... just a second, there we go. What's really cool about this is that, if you look at the title, it's actually representative of what card we're looking at. This is a 9 of spades, and we're seeing 9, S for spades; or 7, H for hearts; 8, S for spades; 3, H for hearts. So let's actually see if these are correctly being detected. If I open up the test images... we have the 3 of hearts, yeah, that's this one. There are so many images here. We have the 7 of hearts, which is this one right here. Then we have our 9 of spades, and then our 8 of spades. Let me find that 8 of spades. There we go. So we found all of our images; we were able to detect the playing cards in the deck. And in fact, one extra thing that we did: notice how these are all nice and square, they're flat? That's not the case with the input images, right? Those are tilted, they're at an angle, and so we'll also learn how we can de-warp these images so that they look nice and flat like this. And from here, you can go on: now that we've detected the cards, you can take the next steps and use this as a baseline for any other projects you might build that involve playing cards.
So, this is a really good detection scheme. As for what we're going to be talking about in the next few videos: first of all, I should mention that you don't have to code all of this from absolute scratch. I have some code that I'm going to provide you, and we're just going to fill in portions of it. In particular, the training of our model is provided: we're building a really, really small machine learning model that we're just going to use for comparison, and I don't want you to worry about that, since we didn't cover any of it, so I've coded it up for you. But what we will have to do is write the code that actually detects each card and then does the de-warping, so that we can compare it to what's in our model. And the result, you'll see, is that we can see all of these images here, and, let me pull this up here, we get a list of tuples, where the first thing in the tuple is what number the card is, and the second thing is what suit it is. This is what we're going to be building, and I've already provided some code for you; I just want to show you what it's going to look like. You can use this as a base, foundation, or framework for any other projects involving cards.
Anyways, this is the app that we're going to be building. Like I said, I've written a ton of code for you that we can just use, and in the next video I'm going to explain what code is there, so that you have a better understanding. And then off we go: we'll build our app from there. So in the next video, like I said, I just want to introduce you to the code that's currently there.
Transcript Part 4
Hello everybody, my name is Mohit Deshpande, and in this video I want to go over some of the code that I’m going to be providing you guys so that you don’t have to worry about getting all of the small machine learning models set up.
You don’t have to worry about that. I’ve already provided everything for you. But I kind of want to go through this step by step, kind of, so that you are at least familiar with the code here. And so, the first thing you might notice is that this is actually not that many lines of code to get something this cool up and running, and that’s kind of the standard that you can expect with OpenCV, is that it doesn’t take a lot of code to get something really cool up and running.
So, as you look through this code, you can see: hey, this code is what we're going to have to write, probably, because it's currently not really doing anything. We're going to have to write this code here, and actually, let me return an empty list here so that we know the app will still run.
So just to go over some of this code: first things first, we have to build some model, some means of looking at all of our images, our training data, and our labels, so that we can correctly know that, well, this card looks like the king of spades, and that's really what it is. So I've provided you this training image and a separated-values list, I should say it's separated by tabs, actually, a tab-separated values list, that corresponds to the labels. All of that is there, and this function just extracts all that data and builds what I'm calling our machine learning model. Anyway, that's what this function does, but there are also some utility functions here. In particular, there's this reorder function. There's a comment here that says "Reorder points in a rectangle in clockwise order to be consistent with OpenCV," and that's really what it does. When we do any kind of contour detection, we have to make sure we're consistent with OpenCV; in particular, OpenCV likes it when our rectangles start at the top left and then the points go in clockwise order. We need this function because, when we detect a contour for our cards, we want to be robust to whether they're tilted or there's any sort of perspective tilt to them.
You'll notice that when I ran the full code, the images we saw weren't tilted or anything; they were actually perfectly square. This function is what makes that possible: when we detect the contour around a card, the points come back in some arbitrary order, and we can reorder them so that they fit OpenCV's scheme. Then we can apply what's called a perspective transform to square up the corners of our images. So that's basically what this function does: it's a utility function that reorders our points so that we're consistent with OpenCV and can get that nice square image.
Then there's this preprocessing function, which is actually really common in a lot of computer vision applications; there's some sort of cleaning being done here. What we're doing for this cleaning is, first of all, converting to grayscale, then applying a little bit of blur. You might be saying, "Well, why do I want my image to be blurry? That seems kind of counterintuitive." The reason is to get rid of all of the small noise. I talked about this when we were doing contour detection: you want to apply a little bit of blur so that you don't pick up really small contours that are just random image noise from the camera you were using. Images can be kind of noisy, so this blur helps smooth out the noise. And then you see we have this adaptive threshold thing.
Hey, wait a minute, what are we doing here with this adaptive threshold thing? I didn't want to get too much into this because it can be kind of complicated, but adaptive thresholding is a different algorithm used for thresholding, and it helps make our pipeline a bit more robust to things like lighting. If you notice in the image with the random objects on my desk, one part of the image is lighter and the other part is darker. With just plain old thresholding, the issue is that it can sometimes be hard to find a single global threshold. What adaptive thresholding basically does is find a local threshold and do the threshold operations more locally, instead of using one giant, global threshold. How it does that is a bit more complicated than regular thresholding, so I just passed on the details. Anyway, this is just part of the image cleaning and preprocessing steps so that the images are really nice.
So the next function is our comparison, and this is how we're going to look at two images A and B and ask, "How similar are they?" What we're doing here is, again, applying a blur so that we smooth out any noise. Then we take the absolute value of the difference of the two images, basically just the matrix difference with the absolute value of each element, and then we apply another thresholding operation on that. This helps reduce noise, because there are a lot of opportunities for image noise to mess up our results, and we're trying to minimize that as much as possible. Then this np.sum returns a single value that represents all of the error: it's the sum of all our errors. Next is closest card: given our machine learning model and any input image, this tries to find the card that's closest to the given card. What it first does is preprocess the image, then it sorts: it looks through all the images in our model and compares each one to the input image, and because the comparison returns a single value, we can sort on that value.
And so what we want to do is sort so that the first value here corresponds to the label of the image that's closest to our input image. We use this as a metric of image comparison: the smaller the value, the closer the two images are, so we try to find the closest possible match and then return the number and the suit of that closest match. We can actually return the image itself as well. We'll get to how we do that, but probably one of the most important functions here, the one we're going to be writing, is called extract cards. It takes a raw image and however many cards are in that image (this is the part we have to fill in ourselves), and what we want to return is a list of images that are de-warped, so there's no bending or perspective tilting to them. We want to return those nice, clean images so that we can see them.
And you might be saying, "Well, where's this used?" Let's find out. If we search for extract cards, you see that it's used both in our model and in actually executing the code. It's used throughout our app, and it's really important, so we have to make sure we're very careful when we're coding it. You can see it's used here so that we can actually see the images: we get all of the cards that are in the image so that we can display them. And then here is where we get all the cards in our image: basically, this line gets all the cards, and then we return the card that's closest to each particular card. So we get all the cards in our image and pass them through our model and say, "Hey, which card is this? What is the number and the suit of this card?" And that's what our model tells us. So that is basically the code that I've given you, and in the next few videos we're actually going to be coding this function. Surprisingly, it's not going to take that many lines of code to get this up and running, but it's a very important function, and we have to make sure we code it appropriately.
So, in this video, I just wanted to give an overview of the code that I'll be giving you, going from top to bottom. The first one is just a utility function that helps reorder our points when we get our contour. The preprocessing stuff cleans up our image a bit. The image comparison is used to find the closest card: you basically want to find the card in our training data, our machine learning model, that has the smallest error against the given card. This train function just returns our machine learning model based on the two key pieces of data I'm going to be giving you: the images and the correct ground-truth labels, so to speak. And then this last bit just helps you visualize what the resulting cards are going to look like. So, anyway, that is the provided code, and in the next few videos, we're going to be filling out this function here.