Huge amounts of social multimedia are created daily by a combination of globally distributed, disparate sensors, including human sensors (e.g., tweets) and video cameras. Taken together, these data capture multiple aspects of the evolving world. Understanding the events, patterns, and situations emerging in such data has applications in multiple domains. We develop abstractions and tools to decipher the spatio-temporal phenomena that manifest themselves across such social media data. We describe an approach for aggregating the social interest of users in a particular theme at a particular location into 'social pixels'. Aggregating such pixels spatio-temporally yields social versions of images and videos, which become amenable to standard media processing techniques (e.g., segmentation, convolution) for deriving semantic situation information. We define a declarative set of operators over such data that lets users formulate queries to visualize, characterize, and analyze it. Results of applying these operators over an evolving corpus of millions of Twitter and Flickr posts to answer situation-based queries in multiple application domains are promising.
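To make the 'social pixel' idea concrete, the following is a minimal sketch, not the paper's actual implementation: geo-tagged posts about one theme are binned into a 2-D grid of interest counts (a social image), then smoothed with a mean-filter convolution and segmented by thresholding. All function names, the grid resolution, and the toy post coordinates are illustrative assumptions.

```python
# Hypothetical sketch of social pixels: bin geo-tagged posts into a grid,
# then apply convolution and threshold segmentation to the resulting image.

def social_image(posts, lat_range, lon_range, rows, cols):
    """Aggregate (lat, lon) posts about one theme into a grid of
    'social pixels' (interest counts per spatial cell)."""
    grid = [[0] * cols for _ in range(rows)]
    lat0, lat1 = lat_range
    lon0, lon1 = lon_range
    for lat, lon in posts:
        r = min(int((lat - lat0) / (lat1 - lat0) * rows), rows - 1)
        c = min(int((lon - lon0) / (lon1 - lon0) * cols), cols - 1)
        grid[r][c] += 1
    return grid

def convolve3x3(grid):
    """Smooth the social image with a 3x3 mean filter, a stand-in for
    the convolution step mentioned in the abstract."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [grid[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

def segment(grid, threshold):
    """Threshold segmentation: mark cells of high social interest."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]

# Toy usage: three posts clustered near one corner of a 4x4 grid,
# one isolated post in the opposite corner.
posts = [(0.1, 0.1), (0.15, 0.12), (0.2, 0.05), (0.9, 0.9)]
img = social_image(posts, (0.0, 1.0), (0.0, 1.0), 4, 4)
smooth = convolve3x3(img)
mask = segment(smooth, 0.3)   # the cluster survives; the lone post does not
```

The segmentation mask here is the "semantic situation information" in miniature: contiguous regions of high interest in a theme, recoverable by ordinary image operators once the social data has been rendered as pixels.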