Data Analytics Colloquium

Upcoming webinars

Many political phenomena are hard to quantify. Take electoral fraud. In the 21st century, both democracies and autocracies have held regular elections. How do we know whether the majority of voters indeed supported the winner? To answer this question, scholars and experts must be able to separate “real” votes from fake ones. This ability is equally crucial for theory-testing and policy-making. In this talk, Dr. Sobolev will discuss the evolution of measurement approaches in the social sciences, using the example of another paramount phenomenon: mass protest behavior. The ability of citizens to ...

A fundamental challenge facing applied time-series analysts is how to draw inferences about long-run relationships (LRR) when we are uncertain whether the data contain unit roots. Unit root tests are notoriously unreliable and often leave analysts uncertain, but popular extant methods hinge on correct classification. Webb, Linn, and Lebo (WLL; 2019) develop a framework for inference based on critical value bounds for hypothesis tests on the long-run multiplier (LRM) that eschews unit root tests and incorporates the uncertainty inherent in identifying the dynamic properties of the data into ...

To study the evolution of electoral preferences, Wlezien and Erikson (2002) propose assessing the relationship between pre-election vote intentions and the final vote for a set of elections. That is, they model poll data not as a set of different time series, which are difficult to analyze in most election years in most countries because of missing data and survey error, but as a series of cross-sections (across elections) for each day of the election ‘timeline.’ Although the method does not provide information about preference dynamics in particular election years, it does reveal ...

The biggest challenge in empirical work is to get our statistical models to correctly represent the politics of what we are studying. For example, Donald Trump raised voter turnout. So did Franklin Roosevelt and Adolf Hitler. Strong preferences motivate voters to go to the polls. Yet studies of elections nearly always analyze vote choices and turnout separately, missing the politics that mobilizes voters. Researchers have long understood the theoretical limitation of doing so, but issues of parameter identification, computing power, unavailability of survey weighting, and complexity of ...

Big data problems tax computational resources in many ways, including RAM, storage, swapping, the ability to parallelize, and the limitations of specific software packages in even performing the operations. There is no conventional way to estimate spatial models of this size. As a result, we have had to creatively reform matrix objects, use relatively obscure linear algebra relationships, break operations up into multiple discrete tasks, and consider hardware issues in new ways. The current solution is written in C++ to run on AWS, which is labor intensive and ultimately expensive. Human ...

We present two studies arising from collaborations between the Rhode Island Department of Health and The Policy Lab @ Brown University to generate discussion of methods for causal inference in randomized and non-randomized studies. We present an adaptive randomized field experiment of SMS messages with roughly 160,000 people, and a research design created by non-bipartite matching of about 3,000 respondents to a series of surveys, along with an associated sensitivity analysis of those results. We show some of the benefits that arise from collaborations between academics and governments as well as some of ...

Social network analysis is useful for identifying the structural relationships among social entities. To promote the use of social network analysis and develop better methods for social and behavioral research, we propose a general structural equation modeling framework in which a social network can be treated as a new type of observed variable. Under the framework, we have investigated the roles of social networks as a predictor, an outcome, and a mediator. In this talk, I will present three studies to illustrate the application of the new framework.

The p-value controversy is not new. As early as 1970, Morrison and Henkel edited a reader titled The Significance Test Controversy (Aldine). The critiques of the Null Hypothesis Significance Test (NHST), however, did not prevent researchers across disciplines from continuing to practice the methodology. The situation changed when the controversy reemerged in recent years:

  • In 2015, the editors of Basic and Applied Social Psychology (BASP) announced that the journal would no longer publish papers containing p-values because the statistics were too often used to support lower-quality ...

Conflict forecasting models provide information about potential future violence in a country or region. Toward the goal of improving sub-national and country-level forecasts, I run a series of experiments devised around three broad criteria informed by the literatures on political violence and predictive modeling: (1) flexibility to capture nonlinear relationships, (2) an emphasis on the endogenous nature of violence, and (3) avoidance of overfitting, which is especially problematic with flexible algorithms and the types of highly disaggregated data used here. To meet these criteria, I experimented ...

The talk will draw on Dr. King-wa Fu’s decade-long experience researching social media in China and Hong Kong, including research on China’s Internet censorship (Weiboscope, WeChatscope, and, more recently, Zhihu) and on Hong Kong’s social movement mobilization on Telegram channels. Dr. King-wa Fu will not only outline the lessons he learned from collecting and analyzing data in these projects but also put forward opportunities for future study.