Automatic vertebra localization and identification in 3D medical images play an important role in many clinical tasks, including pathological diagnosis, surgical planning, and postoperative assessment. In this paper, we propose an automatic and efficient algorithm to localize and label vertebra centroids in 3D CT volumes. First, a deep image-to-image network (DI2IN) with a convolutional encoder-decoder architecture is deployed to initialize the vertebra locations. Next, the centroid probability maps from the DI2IN are modeled as a sequence according to the spatial relationship of the vertebrae and evolved with a convolutional long short-term memory (ConvLSTM) model. Finally, the landmark positions are further refined and regularized by another neural network with a learned shape basis. The whole pipeline can be run in an end-to-end manner. The proposed method outperforms other state-of-the-art methods on a public database of 302 spine CT volumes with various pathologies. To further boost performance and to validate that large labeled training sets benefit deep learning algorithms, we leverage an additional 1000 3D CT volumes from different patients. Our experimental results show that training with a large database improves the performance of the proposed framework by a large margin and achieves an identification rate of 89%.
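To make the reported quantities concrete, the following is a minimal sketch, not the authors' implementation, of how a centroid could be read off a per-vertebra probability map and how an identification rate could be scored. The function names, the argmax read-out, and the scoring rule (a prediction counts as correct when it lies within 20 mm of the true centroid and that centroid is also its nearest ground-truth centroid) are assumptions for illustration; the abstract itself does not define them.

```python
import numpy as np


def centroid_from_probability_map(prob_map, spacing=(1.0, 1.0, 1.0)):
    """Return the position (in mm) of the highest-probability voxel.

    Hypothetical helper: assumes one 3D probability map per vertebra and
    a simple argmax read-out, which the abstract does not specify.
    """
    idx = np.unravel_index(np.argmax(prob_map), prob_map.shape)
    return np.asarray(idx, dtype=float) * np.asarray(spacing, dtype=float)


def identification_rate(pred, truth, max_dist_mm=20.0):
    """Fraction of ground-truth vertebrae that are correctly identified.

    pred, truth: dicts mapping a vertebra label to a 3D position in mm.
    Assumed criterion: the prediction for a label must lie within
    max_dist_mm of the true centroid, and that true centroid must be
    the closest of all ground-truth centroids.
    """
    labels = list(truth)
    gt = np.array([truth[label] for label in labels], dtype=float)
    hits = 0
    for i, label in enumerate(labels):
        if label not in pred:
            continue  # a missing prediction counts as a miss
        dists = np.linalg.norm(gt - np.asarray(pred[label], dtype=float), axis=1)
        if dists[i] <= max_dist_mm and int(np.argmin(dists)) == i:
            hits += 1
    return hits / len(labels)
```

Under these assumptions, a prediction that falls within the distance threshold but closer to a neighbouring vertebra is still scored as a miss, so the metric penalizes label shifts along the spine as well as localization error.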