Data Analytics Colloquium

Upcoming webinars

The big data problem taxes computational resources in many ways, including RAM, storage, swapping, the ability to parallelize, and the limitations of specific software packages to even perform the operations. There is no conventional way to estimate spatial models of this size. As a result, we have been required to creatively reform matrix objects, use relatively obscure linear algebra relationships, break operations up into multiple discrete tasks, and consider hardware issues in new ways. The current solution is written in C++ code to run on AWS, which is labor-intensive and ultimately expensive. Human  Read More

The biggest challenge in empirical work is to get our statistical models to correctly represent the politics of what we are studying.  For example, Donald Trump raised voter turnout.  So did Franklin Roosevelt and Adolf Hitler.  Strong preferences motivate voters to go to the polls.  Yet studies of elections nearly always analyze vote choices and turnout separately, missing the politics that mobilizes voters.  Researchers have long understood the theoretical limitation of doing so, but issues of parameter identification, computing power, unavailability of survey weighting, and complexity of  Read More

We present two studies arising from collaborations between the Rhode Island Department of Health and The Policy Lab @ Brown University to generate discussion of methods for causal inference in randomized and non-randomized studies. We present an adaptive randomized field experiment of SMS messages with roughly 160,000 people and a research design created by non-bipartite matching of about 3000 respondents to a series of surveys and associated sensitivity analysis of those results. We show some of the benefits that arise from collaborations between academics and governments as well as some of  Read More

Social network analysis is useful for identifying the structural relationships among social entities. To promote the use of social network analysis and develop better methods for social and behavioral research, we propose a general structural equation modeling framework in which a social network can be treated as a new type of observed variable. Under the framework, we have investigated the roles of social networks as a predictor, an outcome, and a mediator. In this talk, I will present three studies to illustrate the application of the new framework.

The p-value controversy is not new. As early as 1970, Morrison and Henkel edited a reader titled The Significance Test Controversy (Aldine). The critiques of the Null Hypothesis Significance Test (NHST), however, did not prevent researchers in all disciplines from continuing to practice the methodology. The situation has been different since the controversy reemerged in recent years:

  • In 2015, the editors of Basic and Applied Social Psychology (BASP) announced that the journal would no longer publish papers containing P values because the statistics were too often used to support lower-quality
 Read More

Conflict forecasting models provide information about potential future violence in a country or region. Toward the goal of improving sub-national and country-level forecasts, I run a series of experiments devised around three broad criteria informed by the literatures on political violence and predictive modeling: (1) flexibility to capture nonlinear relationships, (2) an emphasis on the endogenous nature of violence, and (3) avoidance of overfitting, which is especially problematic with flexible algorithms and the types of highly disaggregated data used here. To meet these criteria, I experimented  Read More

The talk will draw on Dr. King-wa Fu’s decade-long experience in researching social media in China and Hong Kong, including research on China’s Internet censorship (Weiboscope, WeChatscope, and more recently Zhihu) and on Hong Kong’s social movement mobilization on Telegram channels. Dr. King-wa Fu will not only outline the lessons he learned in these projects when collecting and analyzing data, but also put forward opportunities for future study.

This talk introduces the theoretical and applied foundations of Bayesian statistical analysis in a manner appropriate to social and behavioral scientists. The Bayesian paradigm is ideally suited to the type of data analysis required in these fields because it recognizes the mobility of population parameters, incorporates prior knowledge that researchers possess, and updates estimates as new data are observed. Examples will be drawn from political science, public policy, and anthropology. Issues in Bayesian computing will also be discussed.

What do proportions of government budgets allocated to particular policy areas, support for political parties, and shares of total income for quartiles of a national population have in common? They are examples of compositional variables that evolve in important ways over time. In each case, we can express the outcome of interest as a set of relative proportions such that a gain in one area must be offset by a loss in another area or areas. In the Dynamic Pie Project, we are developing modeling strategies to allow researchers to test theories about the determinants of compositions as they  Read More