Estimation of camera pose is an integral part of augmented reality systems, and vision-based methods offer a flexible and accurate way to perform it. Most current vision-based methods rely on markers placed in the scene to reduce computation and increase the robustness of the pose estimation. However, this limits the applicability of the algorithms and adds cost, since the markers also require maintenance. Alternatively, reconstructed scene features can be used for pose estimation, but this can lead to a loss of accuracy. To avoid this, we propose a two-stage tracking method that does not require visual markers during tracking. The first stage is based on the sequential recovery of structure from motion, which allows the system to learn the scene features from a few frames in which markers are still visible. In the second stage, the learned features alone are used for camera tracking. The system achieves greater accuracy and reduced error drift through its use of the HEIV (heteroscedastic errors-in-variables) estimator, which is provably unbiased to first order. We also introduce a novel method for the detection and removal of outliers, which are unavoidable in such systems. Experiments demonstrate the superiority of our method over a nonlinear method based on Levenberg-Marquardt minimization.