Size-invariant single-shot object detection
Current face detection methods perform well on real crowd scenes in real time, but only under narrow constraints on the faces to be found. Specifically, current real-time methods require the range of face sizes to be specified in advance. The underlying issue is that the single-shot object detection methods used for this task build assumptions about object size into the network architecture.
We propose to research a real-time object detection framework that relaxes these constraints. This research would explore ways to remove the implicit tie between object sizes and the network architecture of single-shot object detection methods. This could be done by decoupling anchor boxes from receptive fields, by exploring alternatives to prior boxes, or by some other method.
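To make the "implicit tie" concrete, the following sketch shows how SSD-style detectors assign one anchor (prior) box scale per feature map, following the linear scale rule from the original SSD paper. The function name and default values (`s_min=0.2`, `s_max=0.9`, six feature maps) are illustrative, not part of this proposal:

```python
# Sketch of how SSD-style detectors tie anchor (prior) box sizes to specific
# feature maps -- the coupling this proposal aims to remove. Each feature map
# is only responsible for objects near its assigned scale, so the detectable
# size range is fixed by the architecture at design time.

def ssd_anchor_scales(num_feature_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced anchor scales, one per feature map."""
    if num_feature_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_feature_maps - 1)
    return [s_min + step * k for k in range(num_feature_maps)]

for k, s in enumerate(ssd_anchor_scales(6)):
    print(f"feature map {k}: anchor scale {s:.2f} (fraction of input size)")
```

Because each scale is hard-wired to a layer, objects outside the chosen `[s_min, s_max]` range fall between anchors and are poorly matched during training.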
- How can the effective receptive field be calculated, and what optimal tiling mechanism can be used?
- Can we learn a tiling mechanism and a layer-selection mechanism that would make the SSD model more scale-invariant?
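As a starting point for the first question, the theoretical receptive field of a convolutional stack can be computed in closed form; the *effective* receptive field is smaller in practice and is usually estimated empirically from input gradients. A minimal sketch of the analytic computation, using a hypothetical VGG-like block as the example configuration:

```python
# Minimal sketch: analytic (theoretical) receptive field of a conv stack.
# This gives an upper bound; the effective receptive field observed in
# trained networks is typically much smaller, which is why the anchor-to-
# layer assignment in SSD-style detectors is only loosely justified.

def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, input to output.
    Returns the receptive field size in input pixels."""
    rf, jump = 1, 1  # jump = cumulative stride (spacing of output pixels)
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Hypothetical block: two 3x3 stride-1 convs followed by 2x2 stride-2 pooling
vgg_block = [(3, 1), (3, 1), (2, 2)]
print(receptive_field(vgg_block * 2))  # receptive field after two such blocks
```

Comparing this analytic value per layer against the anchor scale assigned to that layer would quantify how mismatched the tiling is, which motivates learning the tiling instead of fixing it.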