The past decade has seen dramatic growth in systems that collect data from human activities. Online social networks record not just friendships, but interactions, messages, photos, and interests. Mobile devices track location via GPS information. Online stores monitor millions of customers as they explore and transact. Sensors, wearable and otherwise, produce detailed behavioral data. Collectively, this provides ever-larger collections of human social-activity information -- we refer to this as Big Social Data. While Big Social Data is growing rapidly, the available processing resources -- CPU, memory, communication -- are growing at a slower pace. To realize the promise of big social data, we need algorithms that use only sublinear resources, that is, resources growing much less than the growth of the data in suitable parameters. Designing these algorithms will be the core activity of this research project. This work will be in consultation with practitioners handling Big Social Data, leading to many opportunities for technology transfer. The research program both enables and benefits from an education and outreach program that will help develop the new breed of algorithmically-trained data scientists for Big Social Data.Emerging systems -- MapReduce, Hadoop, Spark, Storm, etc. -- use large scale distributed computation: clusters of machines not only gathering and storing data in parallel, but also working together to perform computations. Often, these systems and applications work via incremental processing, storing and returning only approximate solutions, trading off quality and certainty for efficiency. In addition, these systems take a data-centric view, wherein the data is stored as pairs. This project will address fundamental problems with Big Social Data -- search, ranking, and optimization, etc. in these modern computing and data models. For these problems, this project will design algorithms that are sublinear in the relevant parameter -- number of keys, size of values, computing time per key or over all keys, and other variations that map to underlying storage, number of machines, bandwidth and other computational constraints.For further information, see the project web site at http://www.stanford.edu/~ashishg/socialdata.html .
|Effective start/end date||9/1/14 → 8/31/18|
- National Science Foundation (National Science Foundation (NSF))