We present a dynamic, data-driven framework for tracking gestures and facial expressions from monocular image sequences. Our system uses two cameras, one for the face and one for the body, so that each view is processed at a different scale. For gesture tracking, we track the hands and the head, producing as output an elliptical blob for each region of interest (ROI), and we detect the shoulder positions as straight lines. For facial expressions, we first extract the 2D facial features by fusing a KLT tracker with a modified Active Shape Model, and then we obtain the 3D face mask by fitting a generic model to the extracted 2D features. The main advantages of our system are (i) its adaptivity, i.e., it is robust to external conditions such as lighting and independent of the examined individual, and (ii) its computational efficiency, producing results both offline and online at rates higher than 20 fps.
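The abstract does not state how each ROI is turned into an ellipse. One common way, shown here only as a minimal sketch under that assumption, is to fit the ellipse that has the same first- and second-order image moments as the binary blob. The function `blob_ellipse` and the `Ellipse` type are hypothetical names introduced for illustration, and the input is assumed to be an already segmented binary mask (for example, of a hand or the head).

```python
import math
from typing import List, NamedTuple


class Ellipse(NamedTuple):
    cx: float          # centroid x (column)
    cy: float          # centroid y (row, image coordinates, y pointing down)
    semi_major: float  # semi-major axis length in pixels
    semi_minor: float  # semi-minor axis length in pixels
    theta: float       # angle of the major axis w.r.t. the x axis, in radians


def blob_ellipse(mask: List[List[int]]) -> Ellipse:
    """Hypothetical helper: moment-equivalent ellipse of a binary blob."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("empty blob")
    n = len(pts)
    # First-order moments give the centroid.
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Normalized central second-order moments (covariance of pixel positions).
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Eigenvalues of the 2x2 covariance matrix.
    half_sum = (mu20 + mu02) / 2
    root = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam1 = half_sum + root
    lam2 = max(half_sum - root, 0.0)
    # Orientation of the principal (major) axis.
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # A uniformly filled ellipse with semi-axis a has variance a^2 / 4 along it,
    # so each semi-axis is 2 * sqrt(eigenvalue).
    return Ellipse(cx, cy, 2 * math.sqrt(lam1), 2 * math.sqrt(lam2), theta)


if __name__ == "__main__":
    rect = [[1] * 7 for _ in range(3)]  # a 7-wide, 3-tall horizontal blob
    print(blob_ellipse(rect))
```

For the 7-by-3 rectangle in the usage example, the sketch yields a centroid of (3, 1), an orientation of 0 (major axis along x) and a semi-major axis of about 4 pixels. Moments are cheap to compute incrementally, which is in keeping with the real-time rates the abstract reports, but the actual blob representation used by the system may differ.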