Object detection has been an active topic in computer vision research for decades. Applications such as surveillance, scene interpretation, and advanced driver-assistance systems have drawn particular interest.

The first and most important step in tracking is to reliably detect the object in a variety of settings. However, tracking an object becomes challenging because of varied backgrounds, weather conditions, cast shadows, and occlusions.

  • Occlusion occurs when something you want to observe is hidden from view, either because of a property of your sensor setup or because of an event in the scene. How it manifests, and how you deal with it, differs from problem to problem.

For example, an important part of handling occlusion in object tracking is designing an effective cost function that can distinguish the occluded object from the object occluding it. If the cost function is poorly designed, the identities of the two objects may be swapped, and the wrong object ends up being tracked. Cost functions can be built in several ways. Some approaches use CNN features, whereas others rely on hand-crafted, aggregated features that give more control. The downside of CNN models is that if you are tracking objects of the kinds seen in the training set in the presence of objects that were not in it, and the tracked objects become occluded, the tracker may latch onto the wrong object and never recover. Here’s a video that demonstrates this. The downside of aggregated features is that the cost function must be engineered by hand, which takes effort and often considerable mathematical expertise.
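To make the idea of a hand-engineered cost function concrete, here is a minimal Python sketch. It assumes each track and detection is a dict with a `box` given as `(x1, y1, x2, y2)` and a `feat` appearance vector (for example, a normalized color histogram); these names, the 50/50 weighting `w_motion`, and the gate `max_cost` are hypothetical choices, not taken from any particular tracker. The cost blends box overlap (IoU) with appearance distance, so that when two people cross and their boxes overlap equally, appearance decides who is who.

```python
import math
from itertools import combinations, permutations


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means identical appearance vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def association_cost(track, det, w_motion=0.5):
    """Hand-engineered cost: weighted blend of overlap and appearance terms."""
    motion = 1.0 - iou(track["box"], det["box"])
    appearance = cosine_distance(track["feat"], det["feat"])
    return w_motion * motion + (1.0 - w_motion) * appearance


def associate(tracks, dets, max_cost=0.6):
    """Exhaustive minimum-total-cost one-to-one matching (only for a few objects).

    Returns (track_index, detection_index) pairs whose cost is within max_cost.
    """
    cost = [[association_cost(t, d) for d in dets] for t in tracks]
    k = min(len(tracks), len(dets))
    best, best_total = [], math.inf
    for t_idx in combinations(range(len(tracks)), k):
        for d_idx in permutations(range(len(dets)), k):
            total = sum(cost[t][d] for t, d in zip(t_idx, d_idx))
            if total < best_total:
                best, best_total = list(zip(t_idx, d_idx)), total
    # Reject matches that are too expensive; those tracks may be occluded.
    return [(t, d) for t, d in best if cost[t][d] <= max_cost]


# Two people crossing: both detections overlap both predicted boxes equally,
# so only the (hypothetical) color features can tell them apart.
tracks = [{"box": (10, 0, 20, 20), "feat": [1, 0, 0]},  # person in red
          {"box": (14, 0, 24, 20), "feat": [0, 0, 1]}]  # person in blue
dets = [{"box": (12, 0, 22, 20), "feat": [0, 0, 1]},
        {"box": (12, 0, 22, 20), "feat": [1, 0, 0]}]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

The brute-force search over pairings keeps the sketch dependency-free but grows factorially with the number of objects; in practice the same assignment problem is usually solved with the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`).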

Occlusion also occurs in dense stereo reconstruction when a region is visible to the left camera but not to the right one (or vice versa). Such an occluded region appears black in the disparity map, because its pixels have no corresponding pixels in the other image. Some methods use background-filling algorithms to fill the occluded black region with values taken from the background. Because background filling can be wrong in places, other reconstruction methods simply leave those pixels without a value in the disparity map.
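One common way to find the occluded region and then fill it from the background is sketched below. It assumes two integer disparity maps stored as nested lists: `disp_left` (a left pixel at column `x` matches right column `x - d`) and `disp_right` (the reverse). A left-right consistency check marks pixels whose two disparities disagree as invalid; these are the "black" pixels. The fill step then copies, for each invalid pixel, the smaller of the nearest valid disparities to its left and right on the same row, on the assumption that the smaller disparity (farther away) belongs to the background. The function names and the tolerance `tol` are illustrative.

```python
INVALID = None  # marker for pixels with no reliable disparity


def lr_check(disp_left, disp_right, tol=1):
    """Invalidate left-image pixels that fail the left-right consistency check."""
    checked = []
    for row_l, row_r in zip(disp_left, disp_right):
        out = []
        for x, d in enumerate(row_l):
            xr = x - d  # column of the matching pixel in the right image
            if 0 <= xr < len(row_r) and abs(row_r[xr] - d) <= tol:
                out.append(d)
            else:
                out.append(INVALID)  # seen by the left camera only: occluded
        checked.append(out)
    return checked


def fill_background(disp):
    """Fill invalid pixels with the smaller (farther) neighboring disparity on the row."""
    filled = []
    for row in disp:
        out = list(row)
        for x, d in enumerate(row):
            if d is not INVALID:
                continue
            left = next((row[i] for i in range(x - 1, -1, -1) if row[i] is not INVALID), None)
            right = next((row[i] for i in range(x + 1, len(row)) if row[i] is not INVALID), None)
            candidates = [v for v in (left, right) if v is not None]
            out[x] = min(candidates) if candidates else INVALID
        filled.append(out)
    return filled


# One scanline: a foreground object (disparity 4) in front of a background (disparity 1).
disp_left = [[1, 1, 1, 1, 1, 4, 4, 4, 1, 1]]
disp_right = [[1, 4, 4, 4, 1, 1, 1, 1, 1, 1]]
checked = lr_check(disp_left, disp_right)
print(checked)                   # [[None, 1, None, None, None, 4, 4, 4, 1, 1]]
print(fill_background(checked))  # [[1, 1, 1, 1, 1, 4, 4, 4, 1, 1]]
```

Columns 2 to 4 are background that the right camera cannot see behind the object, and column 0 fails because its match would fall outside the right image. Stopping after `lr_check` corresponds to the second approach described above, which leaves occluded pixels without a value.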


  • If you’re working on a system that tracks objects, occlusion happens when one of the objects you’re tracking is hidden (occluded) by another object or by part of the scene, for example a car driving under a bridge or two people walking past each other. The issue in this scenario is how to handle an object that disappears and then reappears.
  • If you’re using a range camera, occlusion refers to locations where you can’t measure anything. Some laser range cameras work by projecting a laser beam onto the surface being examined; a camera then detects the laser’s point of impact in the resulting image, which gives you the point’s 3D coordinates. Because the camera and the laser view the surface from different positions, there may be places on the surface that the camera sees but the laser does not reach (occlusion). Here the problem is mainly one of sensor configuration.
  • In stereo imaging, the same thing can happen when parts of the scene are visible to only one of the two cameras. No range data can be obtained for those regions.
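For the tracking case, one common way to survive a short disappearance is to keep a lost track alive for a few frames, coasting it along its last estimated velocity, and to re-attach it if a detection reappears near the predicted position. The sketch below assumes point detections `(x, y)`, a constant-velocity motion model, greedy nearest-neighbour matching, and hypothetical parameters `max_age` (how many frames a lost track survives) and `gate` (maximum match distance); it is an illustration, not any specific library's tracker.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    missed: int = 0  # consecutive frames without a matching detection


class Tracker:
    def __init__(self, max_age=5, gate=3.0):
        self.max_age = max_age  # frames a lost track is kept alive (hypothetical)
        self.gate = gate        # maximum match distance (hypothetical)
        self.tracks = []
        self._next_id = 0

    def step(self, detections):
        """Process one frame of (x, y) detections; return {id: position} of visible tracks."""
        # 1. Predict: move every track along its velocity, even while it is hidden.
        for t in self.tracks:
            t.x += t.vx
            t.y += t.vy
        # 2. Greedy nearest-neighbour matching; recently seen tracks choose first.
        unmatched = list(detections)
        for t in sorted(self.tracks, key=lambda tr: tr.missed):
            best = min(unmatched, key=lambda d: math.dist((t.x, t.y), d), default=None)
            if best is not None and math.dist((t.x, t.y), best) <= self.gate:
                # Velocity = measurement minus the track's position in the previous frame.
                t.vx, t.vy = best[0] - (t.x - t.vx), best[1] - (t.y - t.vy)
                t.x, t.y = best
                t.missed = 0
                unmatched.remove(best)
            else:
                t.missed += 1  # possibly occluded: keep coasting
        # 3. Forget tracks that have been missing for too long.
        self.tracks = [t for t in self.tracks if t.missed <= self.max_age]
        # 4. Unclaimed detections start new tracks with fresh IDs.
        for x, y in unmatched:
            self.tracks.append(Track(self._next_id, x, y))
            self._next_id += 1
        return {t.track_id: (t.x, t.y) for t in self.tracks if t.missed == 0}


# An object moving right by 1 per frame, hidden (e.g. under a bridge) in frames 2-3.
frames = [[(0, 0)], [(1, 0)], [], [], [(4, 0)], [(5, 0)]]
tracker = Tracker(max_age=5)
for i, dets in enumerate(frames):
    print(i, tracker.step(dets))  # frame 4 reports ID 0 again
```

With `Tracker(max_age=0)` the same sequence drops the track during the occlusion and reports the returning object under a new ID, which is the vanish-and-return problem from the first bullet.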

Final thoughts

In essence, there is a conceptual gap between how humans and machines see an image.

A computer sees every image as an array of numbers, typically in the range 0 to 255 for each color channel of an RGB image. Each pixel’s values are indexed by their (row, col) position. So if an object moves relative to the camera in a way that hides part of it (for example, you can no longer see a person’s hands), the computer sees different numbers (and therefore different edges or other features), and the output of the algorithm that detects, recognizes, or tracks the object changes as well.
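The following toy example illustrates this in Python. It assumes an 8-bit grayscale "template" of an object stored as a nested list indexed `[row][col]`, a black pole that covers the right half of the object, and a hypothetical detector that simply thresholds the mean absolute pixel difference from the template. Real detectors use far richer features, but the principle is the same: the occluded pixels carry different numbers, so the score changes.

```python
def mean_abs_diff(a, b):
    """Average absolute difference between two equally sized grayscale patches."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# A tiny 4x4 "object": bright border, dark center (values in 0-255).
template = [
    [200, 200, 200, 200],
    [200, 50, 50, 200],
    [200, 50, 50, 200],
    [200, 200, 200, 200],
]
# The same object with its right two columns hidden behind a black pole.
occluded = [row[:2] + [0, 0] for row in template]

THRESHOLD = 30  # hypothetical acceptance threshold


def detect(patch):
    """Toy detector: accept a patch that is close enough to the template."""
    return mean_abs_diff(template, patch) < THRESHOLD


print(template[1][2], occluded[1][2])      # 50 0  (pixel at row 1, col 2)
print(mean_abs_diff(template, occluded))   # 81.25
print(detect(template), detect(occluded))  # True False
```

A person would still recognize the half-hidden object, but for this detector the changed numbers push the score past the threshold and the object is no longer found.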