Bike sharing systems, aiming at providing the missing links in public transportation systems, are becoming popular in urban cities. A key to success for a bike sharing systems is the effectiveness of rebalancing operations, that is, the efforts of restoring the number of bikes in each station to its target value by routing vehicles through pick-up and drop-off operations. There are two major issues for this bike rebalancing problem: the determination of station inventory target level and the large scale multiple capacitated vehicle routing optimization with outlier stations. The key challenges include demand prediction accuracy for inventory target level determination, and an effective optimizer for vehicle routing with hundreds of stations. To this end, in this paper, we develop a Meteorology Similarity Weighted K-Nearest-Neighbor (MSWK) regressor to predict the station pick-up demand based on large-scale historic trip records. Based on further analysis on the station network constructed by station-station connections and the trip duration, we propose an inter station bike transition (ISBT) model to predict the station drop-off demand. Then, we provide a mixed integer nonlinear programming (MINLP) formulation of multiple capacitated bike routing problem with the objective of minimizing total travel distance. To solve it, we propose an Adaptive Capacity Constrained K-centers Clustering (AdaCCKC) algorithm to separate outlier stations (the demands of these stations are very large and make the optimization infeasible) and group the rest stations into clusters within which one vehicle is scheduled to redistribute bikes between stations. In this way, the large scale multiple vehicle routing problem is reduced to inner cluster one vehicle routing problem with guaranteed feasible solutions. Finally, the extensive experimental results on the NYC Citi Bike system show the advantages of our approach for bike demand prediction and large-scale bike rebalancing optimization.