Humans, but Not Deep Neural Networks, Often Miss Giant Targets in Scenes

Eckstein et al., 2017

Summary

  • Investigate how humans use information about scenes to guide search towards likely target sizes
  • Show that humans, but not deep neural networks, often miss mis-scaled targets
  • Human behavior is a result of a useful strategy for rapidly discounting potential distractors
  • Links: [ website ]

Background

  • Animals have an unmatched ability to visually search complex scenes
    • Humans rapidly process a scene, utilizing object relationships and global properties to efficiently guide search
    • Hypothesize that scene information also guides search towards likely target sizes, in addition to locations
  • If search is guided towards likely target sizes, mis-scaled targets would be missed more often, despite being larger

Methods

  • 60 participants viewed 42 rendered scenes, each with a unique target object
    • 14 target objects, repeated 3 times with variations (color and viewing angle)
    • Presented with the word naming the target, then given 1 s to search the scene
    • Reported whether the target was present or not (50% of scenes contained the target)
  • Experimental conditions:
    • One-third of scenes had targets consistent in size with the scene (normal)
    • One-third had the target enlarged by a factor of 3-4 (mis-scaled)
    • One-third had the scene cropped and rescaled so the target matched its size in the mis-scaled condition (control)
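
As a rough illustration of the design above, here is a minimal Python sketch that builds a trial list with 14 objects x 3 variations = 42 scenes, split evenly across the three conditions. The notes do not spell out the counterbalancing, so two things here are assumptions: that each object's three variations are spread one per condition, and that target presence is balanced within each condition. The function and field names (`build_trials`, `target_present`, etc.) are made up for this example.

```python
import random
from collections import Counter

CONDITIONS = ("normal", "mis-scaled", "control")

def build_trials(n_objects=14, n_variations=3, seed=0):
    """Build a hypothetical 42-trial list (14 objects x 3 variations)."""
    rng = random.Random(seed)
    trials = []
    for obj in range(n_objects):
        # Assumption: each object's three variations cover the three conditions once.
        conditions = list(CONDITIONS)
        rng.shuffle(conditions)
        for variation in range(n_variations):
            trials.append({
                "object": obj,
                "variation": variation,
                "condition": conditions[variation % len(conditions)],
            })
    # Assumption: target presence is balanced within each condition (half present).
    for condition in CONDITIONS:
        subset = [t for t in trials if t["condition"] == condition]
        n_present = len(subset) // 2
        present = [True] * n_present + [False] * (len(subset) - n_present)
        rng.shuffle(present)
        for trial, is_present in zip(subset, present):
            trial["target_present"] = is_present
    rng.shuffle(trials)  # randomize presentation order
    return trials

trials = build_trials()
print(Counter(t["condition"] for t in trials))
print(sum(t["target_present"] for t in trials), "target-present trials")
```

With these assumptions each condition gets 14 scenes, 7 of which contain the target, giving the 50% target-present rate described above.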

Results

  • The hit rate of humans was significantly lower when the target was mis-scaled (~70%) compared to normal (~80%)
  • Deficit for mis-scaled targets diminished in the last (of six) blocks
  • Minimum distance between fovea and target was the same between normal and mis-scaled conditions, indicating that observers were foveating the target in both cases
  • Hit rate in the control condition was almost perfect, indicating that the deficit in the mis-scaled condition was not due to feature-based changes
  • Results remained the same when only considering mis-scaled target objects that observers could reliably classify
  • Object detection DNNs (Faster R-CNN, R-FCN, YOLO) do not show reduced target probabilities when target is mis-scaled
    • DNNs had similar target object probabilities when target was present and absent, possibly because they do not take size consistency into account
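
The two human measures above (hit rate per condition and minimum fovea-to-target distance) can be sketched in Python as follows. This is only an illustration of how such measures are typically computed, not the authors' analysis code. The trial fields, the pixel coordinates, and the toy responses are invented for the example and are not data from the paper.

```python
import math
from collections import defaultdict

def hit_rates(trials):
    """Hit rate per condition: proportion of target-present trials answered 'present'."""
    hits = defaultdict(int)
    present = defaultdict(int)
    for trial in trials:
        if not trial["target_present"]:
            continue  # target-absent trials contribute to false alarms, not hits
        present[trial["condition"]] += 1
        hits[trial["condition"]] += int(trial["responded_present"])
    return {cond: hits[cond] / present[cond] for cond in present}

def min_fixation_distance(fixations, target_xy):
    """Smallest Euclidean distance between any fixation and the target center."""
    return min(math.dist(fix, target_xy) for fix in fixations)

# Toy responses (made up): True means the observer reported "present".
toy_trials = (
    [{"condition": "normal", "target_present": True, "responded_present": r}
     for r in (True, True, True, True, False)]
    + [{"condition": "mis-scaled", "target_present": True, "responded_present": r}
       for r in (True, True, True, False, False)]
    + [{"condition": "normal", "target_present": False, "responded_present": True}]
)

rates = hit_rates(toy_trials)
closest = min_fixation_distance([(0, 0), (3, 4)], target_xy=(6, 8))
print(rates, closest)
```

Comparing `min_fixation_distance` across conditions is what supports the claim that observers looked at the mis-scaled targets yet still missed them.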

Conclusion

  • Demonstrates that humans use scene information to guide search towards likely target sizes, resulting in higher miss rates for mis-scaled targets; object detection DNNs show no such effect
    • Could be a result of humans using likely target size to rapidly filter out potential distractors
  • Would be interesting if they also tested scaled-down targets, although correcting for feature-based effects might be more difficult
  • How does human performance change with different search durations?
  • Would DNNs better match human behavior if they were also trained to perform time-limited visual search?
    • Also the DNNs tested were overall not great at detecting the objects, with target object probabilities under 50%
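
To make the size-based filtering idea from the conclusion concrete, here is a toy Python sketch in which each candidate's appearance score is multiplied by a size prior. This is not the model from the paper: the Gaussian prior on log size ratio, the `sigma` value, and all names and numbers are assumptions chosen purely for illustration.

```python
import math

def size_prior_weight(observed_size, expected_size, sigma=0.5):
    """Hypothetical size prior: Gaussian falloff in log(observed / expected)."""
    log_ratio = math.log(observed_size / expected_size)
    return math.exp(-0.5 * (log_ratio / sigma) ** 2)

def rescore(candidates, expected_size, sigma=0.5):
    """Combine appearance evidence with the size prior for each candidate."""
    return [
        {**c, "combined": c["appearance"] * size_prior_weight(c["size"], expected_size, sigma)}
        for c in candidates
    ]

# Made-up candidates; "size" is object height in pixels, "appearance" a match score.
candidates = [
    {"name": "normal-sized target", "appearance": 0.9, "size": 40.0},
    {"name": "mis-scaled target (4x)", "appearance": 0.9, "size": 160.0},
    {"name": "weak distractor", "appearance": 0.3, "size": 40.0},
]

rescored = rescore(candidates, expected_size=40.0)
for c in sorted(rescored, key=lambda c: c["combined"], reverse=True):
    print(f'{c["name"]}: {c["combined"]:.3f}')
```

Under these assumptions the 4x target ends up scoring below even a weak, correctly sized distractor, which is the kind of miss the notes describe. A detector that scores appearance alone (the first term only) would rate both targets equally.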
Elias Z. Wang
AI Researcher | PhD Candidate