Alexis Gabadinho’s “Workshop on Sequence Analysis and TraMineR”

October 11, 2013 – Two months ago, Mr. Alexis Gabadinho, a scientific collaborator at the Institute for Demographic and Life Course Studies at the University of Geneva, led our Workshop on Sequence Analysis and TraMineR. We are excited to provide a recorded video of the workshop here on our website.

At the workshop, Mr. Gabadinho introduced TraMineR, a downloadable software package for R, of which he is one of the creators. He demonstrated the versatility of TraMineR and introduced sequence analysis as a method of analyzing longitudinal data.

The video of this workshop is split into two parts. You can find part one of the Sequence Analysis workshop here: http://videostreaming.gc.cuny.edu/videos/video/1235/

Part two of the Sequence Analysis workshop can be watched here: http://videostreaming.gc.cuny.edu/videos/video/1236/

Additionally, Mr. Gabadinho has generously shared his handouts with us. You can download them here: Handouts for Workshop on Sequence Analysis and TraMineR. As a friendly reminder, be sure to use the proper citations when referencing Mr. Gabadinho’s presentation and handouts.

Professor Robert Stine’s Seminar: “Featurizing Text”

October 25, 2013 – Robert Stine, Professor of Statistics at the Wharton School of the University of Pennsylvania, presented his seminar, “Featurizing Text: Converting Text into Predictors for Regression Analysis.” Professor Stine’s research has appeared in numerous journals, including the Journal of the American Statistical Association, Journal of the Royal Statistical Society, and the Annals of Statistics.

In his presentation, Professor Stine introduces three ways of converting text into numerical values. He guides the audience through counting words, principal components analysis of word counts, and forming eigenwords from sequences of words. These new predictors can then be used to build regression models.
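For readers who want a concrete picture of the first two ideas, here is a minimal sketch in plain Python. It is not Professor Stine’s code or his exact method: the three toy documents are invented, the function name first_component is hypothetical, and the principal component is approximated with simple power iteration rather than a full decomposition. Eigenwords are not covered here.

```python
from collections import Counter

# Hypothetical toy corpus, invented purely for illustration.
docs = [
    "large house with large garden",
    "small house near park",
    "large garden near park",
]

# Step 1: count words per document (a bag-of-words table).
counts = [Counter(doc.split()) for doc in docs]
vocab = sorted({word for c in counts for word in c})
matrix = [[c[word] for word in vocab] for c in counts]

# Step 2: center each word column so the component captures variation, not level.
n_docs, n_words = len(matrix), len(vocab)
means = [sum(row[j] for row in matrix) / n_docs for j in range(n_words)]
centered = [[row[j] - means[j] for j in range(n_words)] for row in matrix]


def first_component(x, iterations=200):
    """Approximate the leading principal direction of x by power iteration."""
    direction = [1.0 / (j + 1) for j in range(len(x[0]))]  # arbitrary start vector
    for _ in range(iterations):
        # Multiply by X, then by X transposed, and renormalize to unit length.
        proj = [sum(r[j] * direction[j] for j in range(len(direction))) for r in x]
        w = [sum(x[i][j] * proj[i] for i in range(len(x))) for j in range(len(direction))]
        norm = sum(val * val for val in w) ** 0.5
        direction = [val / norm for val in w]
    return direction


component = first_component(centered)

# Step 3: each document's score on the component is one numeric predictor
# that could be fed into a regression model.
scores = [sum(r[j] * component[j] for j in range(n_words)) for r in centered]

for doc, score in zip(docs, scores):
    print(f"{score:+.3f}  {doc}")
```

In practice one would keep several components rather than one and use a numerical library for the decomposition. The point of the sketch is only that raw word counts, and low-dimensional summaries of them, are ordinary numeric columns once computed.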

Be sure to watch the seminar if you were unable to attend. You can find Professor Stine’s seminar here: http://videostreaming.gc.cuny.edu/videos/video/1079/

Want to learn more about Professor Stine’s work, read his draft manuscripts, or look through his presentation slides? You can find them on his website here: www-stat.wharton.upenn.edu/~stine

Professor Lev Manovich’s Seminar: “How to see 2 million Instagrams?”

October 2, 2013 – Lev Manovich, Professor of Digital Humanities at the Graduate Center, CUNY, presented his seminar, “How to see 2 million Instagrams?” This was the first seminar of Fall 2013, and the first on data visualization in the CUNY Data Mining Initiative’s seminar series.

In the talk, Professor Manovich elaborated on how images and user-generated content can be analyzed to understand patterns and trends. Features can be created from these images. In turn, these features can be visualized. These new visualizations are almost art forms in themselves, but they can also be useful in helping the researcher analyze data about a set of individual images.

If you were unable to attend the seminar, then be sure to catch the talk here: https://videostreaming.gc.cuny.edu/videos/video/942/.

You can also learn more about Professor Manovich’s work on his website: www.manovich.net.

Fall 2013 Events

We have an exciting lineup of speakers for our Fall 2013 Seminar Series. Be sure to check them out below and reserve your spot early!

All of our events are free and open to the public. If you would like to RSVP, you can do so at cunydatamining.eventbrite.com. Alternatively, you can email us with your name, organizational affiliation (if any), and the event name(s) at datamining@gc.cuny.edu.

We look forward to seeing you at the event!

FORTHCOMING EVENTS

SEPTEMBER 27, 2013, 12 Noon @ The Graduate Center, Room 6112

How to see 2 million Instagram photos? Visualizing patterns in art, films, mass media, and user-generated content.

Speaker: Professor Lev Manovich, The Graduate Center, CUNY

Bio: Lev Manovich is the author of Software Takes Command (Bloomsbury Academic, 2013), Soft Cinema: Navigating the Database (The MIT Press, 2005), and The Language of New Media (The MIT Press, 2001), which has been described as “the most suggestive and broad ranging media history since Marshall McLuhan.” Manovich is a Professor at The Graduate Center, CUNY and a Director of the Software Studies Initiative at CUNY and the California Institute for Telecommunications and Information Technology (Calit2).

——–

OCTOBER 11, 2013, 9 AM to 5 PM @ The Graduate Center, Room 4102

Workshop on Sequence Analysis and TraMineR.

Speaker: Mr. Alexis Gabadinho, Scientific Collaborator at the University of Geneva

Bio: Alexis Gabadinho holds a postgraduate diploma in demography. He is a scientific collaborator at the Institute for Demographic and Life Course Studies at the University of Geneva where he is finishing a PhD on methods for sequence analysis. He is also a junior researcher at the Life Course and Inequality Research Center at the University of Lausanne.

He is a developer of the TraMineR R package for sequence analysis and has taught sequence analysis in doctoral schools at the Universities of Bristol, Geneva, Lausanne, and Lille, as well as in conferences and postgraduate courses. His research interests are the application of data-mining methods in the social sciences and the development of methods for categorical state sequence analysis. He has worked in particular on measures of sequence complexity and methods for summarizing sets of sequences. His current research focuses on developing sequence analysis methods based on Markovian models, which are made available in the PST R package.

——–

OCTOBER 25, 2013, 12 Noon @ The Graduate Center, Room 6112

Featurizing Text: Converting Text into Predictors for Regression Analysis.

Speaker: Professor Robert Stine, Wharton School of the University of Pennsylvania

Bio: Robert Stine is Professor of Statistics in the Wharton School of the University of Pennsylvania. His research spans a variety of areas with practical applications, ranging from forecasting and spatial temporal models to fundamentals of multiple testing and methods for text analysis. Recent projects consider methods for selecting factors for predictive models from large databases, with particular relevance to the selection of factors that produce cost-effective decisions. These methods are crucial in the development of predictive models in data mining. His research has appeared in numerous academic journals, including the Journal of the American Statistical Association, Journal of the Royal Statistical Society, and the Annals of Statistics. His teaching has been recognized by awards in both the Wharton MBA and Undergraduate programs, and he is the co-author of a recent textbook, Business Statistics: Decision Making and Analysis.

 ——–

NOVEMBER 22, 2013, 12 Noon @ The Graduate Center, Room 6112

Optimal Dissemination on Graphs: Theory and Algorithms.

Speaker: Professor Hanghang Tong, City College, CUNY

Bio: Hanghang Tong is currently an Assistant Professor of Computer Science at City College, City University of New York. Before that, he was a research staff member at IBM T.J. Watson Research Center and a post-doctoral fellow at Carnegie Mellon University. He received his M.Sc. and Ph.D. degrees in Machine Learning from Carnegie Mellon University in 2008 and 2009, respectively. His research interest is in large-scale data mining for graphs and multimedia. He has received several awards, including the best paper award at CIKM 2012, the best paper award at SDM 2008, and the best research paper award at ICDM 2006. He has published over 70 refereed articles and has more than 20 patents. He has served as a program committee member for top data mining, database, and artificial intelligence venues (e.g., SIGKDD, SIGMOD, AAAI, WWW, CIKM).

Professor Andrew Gelman’s Seminar: “Causality and Statistical Learning”

April 26, 2013 – Andrew Gelman, Professor of Statistics and Political Science at Columbia University, presented his seminar, “Causality and Statistical Learning.” This was the final seminar of the Spring semester, and Professor Gelman delivered an entertaining and informative talk. If you were not able to attend the talk, be sure to catch it here: http://videostreaming.gc.cuny.edu/videos/video/693/.

In the talk, Professor Gelman spoke about methodological disagreements between disciplines. On the one hand, experiments are often seen as the gold standard for establishing causal arguments. Yet many social scientists opt to use observational or survey data to make causal inferences. How, then, do we negotiate between internal and external validity?

If you want to find out more, then be sure to check out the video! One thing to note about this video is that, unlike our previous ones, this video does not offer a separate window to view the presentation slides. Unfortunately, we had a bit of technical difficulty with that. However, some of the slides can be found on Professor Gelman’s website here: http://www.stat.columbia.edu/~gelman/presentations/causaltalk3_handout.pdf.

And lastly, be sure to check back to our website over the summer when we will announce our Fall schedule!

Anthony Babinec’s Seminar: “Neural Networks: From Basics To New Developments”

March 15, 2013 – Mr. Anthony Babinec, President of AB Analytics, presented his seminar, “Neural Networks: From Basics To New Developments.” This seminar was both a strong introduction to neural networks for new data miners and an informative examination of how neural networks can be used in data analyses for those more experienced. If you missed the talk, be sure to catch it here: https://videostreaming.gc.cuny.edu/videos/video/528/

Mr. Babinec gave an overview of neural networks and some of their applications. Drawing from his years of work in statistics and experience in consulting firms like IBM, Mr. Babinec showed how neural networks can outperform linear and logistic regressions. He also demonstrated examples of neural networks using sample data sets with SPSS Modeler and SAS JMP. As he points out, both software packages have their pros and cons when it comes to model building and data analysis.

Want to learn more about neural networks? Then be sure to check out the presentation! Mr. Babinec has also offered to share the Word document he used in his presentation. If you would like a copy, you can make a request by emailing us at datamining@gc.cuny.edu.

Professor Ryan Baker’s Seminar: “Educational Data Mining: Predict the Future, Change the Future”

February 15, 2013 – Professor Ryan Baker (Teachers College) kicked off the Spring 2013 seminar series with his talk, “Educational Data Mining: Predict the Future, Change the Future,” at the CUNY Graduate Center. If you missed the talk, be sure to catch the video here: http://videostreaming.gc.cuny.edu/videos/video/448/. And if you enjoyed the talk, then do not forget to check out the Initiative’s other events!

In the talk, Professor Baker discussed educational research on online courses and educational software that tracks student performance. Educational data mining has proved useful in detecting student behaviors and states, such as frustration or inattention. Leveraging the substantial amount of student data gathered, researchers can model and predict certain student behaviors. Because data mining can be such a powerful tool for prediction, it can also be an important asset in future educational interventions. By modeling and predicting students’ off-task behavior, for instance, educators can steer students back to their studies and improve academic performance in the long run.

Be sure to check out the entirety of Professor Baker’s talk if you have not done so already! Included below are the citations, listed by slide, courtesy of Professor Baker. (To get the most out of the video stream, you can change the viewing mode of the video. This will allow you to switch between the PowerPoint slides and the recorded talk.)

Citations by slides:

–        36: Koedinger et al. (2008; 2010)

  • Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. (2010) A Data Repository for the EDM community: The PSLC DataShop. In Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC Press, pp. 43-56.

–        45: Baker & Yacef (2009)

  • Baker, R.S.J.d., Yacef, K. (2009) The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining, 1 (1), 3-17

–        49: Corbett & Anderson (1995)

  • Corbett, A.T., & Anderson, J.R. (1995). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.

–        49: Sao Pedro et al. (2012)

  • Sao Pedro, M.A., Baker, R.S.J.d., Gobert, J., Montalvo, O., Nakama, A. (in press) Leveraging Machine-Learned Detectors of Systematic Inquiry Behavior to Estimate and Predict Transfer of Inquiry Skill. To appear in User Modeling and User-Adapted Interaction.

–        56: Razzaq et al. (2005)

  • Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N. T., Koedinger, K., Junker, B., … & Rasmussen, K. (2005, May). Blending Assessment and Instructional Assisting, Proceedings of the 2005 conference on Artificial Intelligence in Education: Supporting Learning through Intelligent and Socially Informed Technology, 555-562.

–        56: Mendicino et al. (2009)

  • Mendicino, M., Razzaq, L., & Heffernan, N. T. (2009). A comparison of traditional homework to computer-supported homework. Journal of Research on Computing in Education, 41(3), 331-358.

–        68: Baker et al. (2008)

  • Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R. (2008) Developing a generalizable detector of when students game the system. User Modeling and User-Adapted Interaction, 18, 3, 287-314.

–        94: Baker (2007)

  • Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R. (2007) The difficulty factors approach to the design of lessons in intelligent tutor curricula. International Journal of Artificial Intelligence in Education, 17 (4), 341-369.

–        94: Cetintas et al. (2009)

  • Cetintas, S., Si, L., Xin, Y. P., Hord, C., & Zhang, D. (2009). Learning to Identify Students’ Off-task Behavior in Intelligent Tutoring Systems. Proceedings of the 14th International Conference on Artificial Intelligence in Education, 701-703.

–        95: D’Mello et al. (2008)

  • D’Mello, S. K., Craig, S. D., Witherspoon, A., Mcdaniel, B., & Graesser, A. (2008). Automatic detection of learner’s affect from conversational cues. User Modeling and User-Adapted Interaction, 18(1), 45-80.

–        95: Sabourin et al. (2011)

  • Sabourin, J., Rowe, J., Mott, B., & Lester, J. (2011). When off-task is on-task: the affective role of off-task behavior in narrative-centered learning environments, Artificial Intelligence in Education, 534-536.

 

–        95: Baker et al. (2012)

  • Baker, R.S.J.d., Gowda, S.M., Wixon, M., Kalka, J., Wagner, A.Z., Salvi, A., Aleven, V., Kusbit, G., Ocumpaugh, J., Rossi, L. (2012) Sensor-free automated detection of affect in a Cognitive Tutor for Algebra. Proceedings of the 5th International Conference on Educational Data Mining, 126-133.

–        100: Baker, Gowda, & Corbett (2011)

  • Baker, R.S.J.d., Gowda, S., Corbett, A.T. (2011) Towards predicting future transfer of learning. Proceedings of 15th International Conference on Artificial Intelligence in Education, 23-30.

–        100: Hershkovitz et al. (in preparation)

  • Hershkovitz, A., Baker, R.S.J.d., Gobert, J., Kauffman-Rogoff, Z., Wixon, M. (accepted) Student Attributes, Affective States, and Engagement in Science Inquiry Microworlds. To be presented at The European Association for Research on Learning and Instruction (EARLI) SIG 20 Conference.

–        101: Arnold (2010)

  • Arnold, K. E. (2010). Signals: Applying Academic Analytics. Educause Quarterly, 33(1), n1.

–        101: Ming & Ming (2012)

–        102: Dekker et al. (2009)

  • Dekker, G.W., Pechenizkiy, M., & Vleeshouwers, J.M. (2009). Predicting students drop out: A case study, Proceedings of the 2nd International Conference on Educational Data Mining, EDM, 9, 41-50.

–        102: Kovacic (2010)

  • Kovačić, Z. J. (2011). Early Prediction of Student Success: Mining Students Enrolment Data. Proceedings of Informing Science & IT Education Conference (InSITE), 647-665.

–        102: Marquez-Vera et al. (2012)

  • Márquez-Vera, C., Cano, A., Romero, C., & Ventura, S. (2012). Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Applied Intelligence, 1-16.

–        108: Baker & Gowda (2010)

–        115: Baker et al. (2006)

  • Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R., Evenson, E., Roll, I., Wagner, A.Z., Naim, M., Raspat, J., Baker, D.J., Beck, J. (2006) Adapting to When Students Game an Intelligent Tutoring System. Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 392-401.

–        115: Rodrigo et al. (2011)

  • Rodrigo, M.M.T., Baker, R.S.J.d., Agapito, J., Nabos, J., Repalam, M.C., Reyes, S.S., & San Pedro, M.O.C. (2012). The Effects of an Interactive Software Agent on Student Affective Dynamics while Using an Intelligent Tutoring System. IEEE Transactions on Affective Computing, 3(2), 224-236.