About Us

Background of the CUNY Data Mining Initiative

Multiple computationally intensive methods for analyzing “big data” have been developed, including “Data Mining,” “Computer Learning,” and “Sequence Analysis” techniques. The goal of this project is to investigate how Data Mining and related methods can be applied to research in the social sciences. By bringing together an existing interdisciplinary team of researchers from computer science, applied statistics, sociology, and education, this project seeks to develop new strategies for employing these methods in educational and other social science research. The CUNY Data Mining Initiative is funded by a grant from the National Science Foundation.

In order to achieve these goals and build a larger community of researchers dedicated to exploring how Data Mining methods can be utilized in research, this project hosts:

(1)    A seminar series bringing domain experts in the social sciences and in data mining from around the U.S. to brainstorm with participants about strategies for applying computationally intensive techniques to the analysis of large longitudinal educational datasets.

(2)    Workshops for researchers and doctoral students from multiple disciplines, drawn from several colleges, who, along with national domain experts, will design and pilot approaches for analyzing large datasets.

(3)    A semester-long class in Data Mining and related methods for doctoral students drawn from a consortium of research institutions, co-taught by a computer scientist and a quantitative social scientist.

Our research agenda and findings will be systematically published on this website. Be sure to check back with us as our research evolves.

Principal Project Personnel

Paul Attewell (PI) is Distinguished Professor of Sociology and Urban Education at the CUNY Graduate Center. Prof. Attewell has published a number of influential articles based on analyses of longitudinal student datasets, and holds a restricted data license from the National Center for Education Statistics that provides access to a large database of student transcript data. He is the recipient of the Grawemeyer Award in Education and AERA’s outstanding book award.

Robert Haralick (co-PI) is Distinguished Professor of Computer Science at the CUNY Graduate Center. He has published articles on pattern recognition, has developed computer algorithms for linear and non-linear manifold clustering, and most recently has developed a technique for database decomposition as a new method for data mining. He has served on the editorial board of IEEE Transactions on Pattern Analysis and Machine Intelligence, and other journals. Professor Haralick will provide intellectual leadership for the seminar and workshop and will co-teach the course for doctoral students.

David Rindskopf (seminar and workshop leader) is a Distinguished Professor of Educational Psychology at the CUNY Graduate Center and a statistician with particular expertise in missing data and analytical methods. He has served as editor of the Journal of Educational and Behavioral Statistics and is a Fellow of the American Statistical Association.

Mary Clare Lennon (seminar and workshop leader) is a professor of sociology and a statistician who has published on sequence analysis; she will apply that expertise to sequential analyses of transcript and student data. She recently served as the SRC-SSRC Collaborative Visiting Scholar at the Centre for Longitudinal Studies, Institute of Education, University of London.

Andrew Rosenberg (seminar and workshop leader) is an assistant professor of computer science at Queens College who conducts research on pattern recognition and Markov models.