In this article, the authors analyze the relation between stock market liquidity and real-time measures of sentiment obtained from the social-media platforms StockTwits and Twitter. The authors find that extreme sentiment corresponds to higher demand for and lower supply of liquidity, with negative sentiment having a much larger effect on demand and supply than positive sentiment. Their intraday event study shows that booms and panics end when bullish and bearish sentiment reach extreme levels, respectively. After extreme sentiment, prices become more mean-reverting and spreads narrow. To quantify the magnitudes of these effects, the authors conduct a historical simulation of a market-neutral mean-reversion strategy that uses social-media information to determine its portfolio allocations. These results suggest that the demand for and supply of liquidity are influenced by investor sentiment and that market makers who can keep their transaction costs to a minimum are able to profit by using extreme bullish and bearish emotions in social media as a real-time barometer for the end of momentum and a return to mean reversion.
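The market-neutral mean-reversion idea can be illustrated with the classic contrarian-weights construction: overweight recent losers, underweight recent winners, with weights summing to zero. This is a minimal sketch, not the authors' actual strategy; the sentiment gate, its z-score threshold, and the function names here are hypothetical.

```python
import numpy as np

def contrarian_weights(lagged_returns):
    """Classic contrarian weights: w_i = -(r_i - mean(r)) / N.

    Losers get positive weight, winners negative; the weights sum to
    zero, so the portfolio is market-neutral by construction.
    """
    r = np.asarray(lagged_returns, dtype=float)
    return -(r - r.mean()) / len(r)

def strategy_weights(lagged_returns, sentiment_z, threshold=2.0):
    """Trade mean reversion only when sentiment is extreme (|z| > threshold),
    the regime in which the paper finds reversion reasserts itself.
    The threshold value is an illustrative assumption."""
    if abs(sentiment_z) > threshold:
        return contrarian_weights(lagged_returns)
    return np.zeros(len(lagged_returns))

r = [0.03, -0.01, 0.02, -0.04]            # yesterday's returns for 4 stocks
w = strategy_weights(r, sentiment_z=2.5)  # extreme sentiment: trade reversion
```

Note that the biggest loser (stock 4) receives the largest positive weight, while the weights always net to zero dollars of market exposure.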
Breakthroughs in computing hardware, software, telecommunications, and data analytics have transformed the financial industry, enabling a host of new products and services such as automated trading algorithms, crypto-currencies, mobile banking, crowdfunding, and robo-advisors. However, the unintended consequences of technology-leveraged finance include fire sales, flash crashes, botched initial public offerings, cybersecurity breaches, catastrophic algorithmic trading errors, and a technological arms race that has created new winners, losers, and systemic risk in the financial ecosystem. These challenges are an unavoidable aspect of the growing importance of finance in an increasingly digital society. Rather than fighting this trend or forswearing technology, the ultimate solution is to develop more robust technology capable of adapting to the foibles in human behavior so users can employ these tools safely, effectively, and effortlessly. Examples of such technology are provided.
We apply machine-learning techniques to predict drug approvals and phase transitions using drug-development and clinical-trial data from 2003 to 2015 involving several thousand drug-indication pairs with over 140 features across 15 disease groups. Imputation methods are used to deal with missing data, allowing us to fully exploit the entire dataset, the largest of its kind. We achieve predictive measures of 0.74, 0.78, and 0.81 AUC for predicting transitions from phase 2 to phase 3, phase 2 to approval, and phase 3 to approval, respectively. Using five-year rolling windows, we document an increasing trend in the predictive power of these models, a consequence of improving data quality and quantity. The most important features for predicting success are trial outcomes, trial status, trial accrual rates, duration, prior approval for another indication, and sponsor track records. We provide estimates of the probability of success for all drugs in the current pipeline.
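The impute-then-classify pipeline can be sketched end to end on synthetic data. This is not the paper's model: simple column-mean imputation and plain logistic regression stand in for its imputation methods and classifiers, and the data, feature count, and labels are fabricated for illustration. The AUC is computed directly from its rank definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))
# synthetic "approval" label driven by two of the features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(float)

# knock out 10% of entries to mimic missing trial data, impute with column means
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.10] = np.nan
col_means = np.nanmean(X_miss, axis=0)
X_imp = np.where(np.isnan(X_miss), col_means, X_miss)

# logistic regression fit by plain gradient descent on the log-loss
w = np.zeros(p)
for _ in range(500):
    score = 1.0 / (1.0 + np.exp(-X_imp @ w))
    w -= 0.1 * X_imp.T @ (score - y) / n

# AUC = P(score of a random positive example > score of a random negative one)
s = X_imp @ w
pos, neg = s[y == 1], s[y == 0]
auc = (pos[:, None] > neg[None, :]).mean()
```

Mean imputation is the crudest choice; it exists here only to show why imputation lets the whole dataset be used rather than dropping any row with a missing field.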
The agglomeration of rules and regulations over time has produced a body of legal code that no single individual can fully comprehend. This complexity produces inefficiencies, makes the processes of understanding and changing the law difficult, and frustrates the fundamental principle that the law should provide fair notice to the governed. In this Article, we take a quantitative, unbiased, and software engineering approach to analyze the evolution of the United States Code from 1926 to today. Software engineers frequently face the challenge of understanding and managing large, structured collections of instructions, directives, and conditional statements, and we adapt and apply their techniques to the U.S. Code over time. Our work produces insights into the structure of the U.S. Code as a whole, its strengths and vulnerabilities, and new ways of thinking about individual laws. For example, we identify the first appearance and spread of important terms in the U.S. Code like “whistleblower” and “privacy.” We also analyze and visualize the network structure of certain substantial reforms, including the Patient Protection and Affordable Care Act and the Dodd-Frank Wall Street Reform and Consumer Protection Act, and show how the interconnections of references can increase complexity and create the potential for unintended consequences. Our work is a timely illustration of computational approaches to law as the legal profession embraces technology for scholarship in order to increase efficiency and to improve access to justice.
This Article proposes a novel and provocative analysis of judicial opinions that are published without indicating individual authorship. Our approach provides an unbiased, quantitative, and computer scientific answer to a problem that has long plagued legal commentators.
United States courts publish a shocking number of judicial opinions without divulging the author. Per curiam opinions, as traditionally and popularly conceived, are a means of quickly deciding uncontroversial cases in which all judges or justices are in agreement. Today, however, unattributed per curiam opinions often dispose of highly controversial issues, frequently over significant disagreement within the court. Obscuring authorship removes the sense of accountability for each decision’s outcome and the reasoning that led to it. Anonymity also makes it more difficult for scholars, historians, practitioners, political commentators, and—in the thirty-nine states with elected judges and justices—the electorate, to glean valuable information about legal decision-makers and the way they make their decisions. The value of determining authorship for unsigned opinions has long been recognized but, until now, the methods of doing so have been cumbersome, imprecise, and altogether unsatisfactory.
Our work uses natural language processing to predict authorship of judicial opinions that are unsigned or whose attribution is disputed. Using a dataset of Supreme Court opinions with known authorship, we identify key words and phrases that can, to a high degree of accuracy, predict authorship. Thus, our method makes accessible an important class of cases heretofore inaccessible. For illustrative purposes, we explain our process as applied to the Obamacare decision, in which the authorship of a joint dissent was subject to significant popular speculation. We conclude with a chart predicting the author of every unsigned per curiam opinion during the Roberts Court.
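The general stylometric idea behind such attribution can be sketched in a toy form: build a frequency profile of common function words for each known author, then assign the unsigned text to the author with the most similar profile. The actual pipeline in the paper is far richer; the word list, cosine-similarity rule, and sample texts below are illustrative assumptions only.

```python
from collections import Counter
import math

# style-bearing function words (a tiny, hypothetical list)
FUNCTION_WORDS = ["the", "of", "to", "and", "that", "however", "therefore"]

def profile(text):
    """Relative frequency of each function word in a text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_author(signed_opinions, unsigned_text):
    """signed_opinions maps author -> concatenated text of known opinions."""
    target = profile(unsigned_text)
    return max(signed_opinions,
               key=lambda a: cosine(profile(signed_opinions[a]), target))

signed = {"A": "however the court held that however the statute",
          "B": "therefore the court and the congress therefore agree"}
author = predict_author(signed, "however the panel however concluded")  # "A"
```

Function words are useful precisely because their rates are largely topic-independent, so they fingerprint a writer rather than a case.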
In this paper we provide a brief survey of algorithmic trading, review the major drivers of its emergence and popularity, and explore some of the challenges and unintended consequences associated with this brave new world. There is no doubt that algorithmic trading has become a permanent and important part of the financial landscape, yielding tremendous cost savings, operating efficiency, and scalability to every financial market it touches. At the same time, the financial system has become much more of a system than ever before, with globally interconnected counterparties and privately-owned and -operated infrastructure that facilitates tremendous integration during normal market conditions, but which spreads dislocation rapidly during periods of financial distress. A more systematic and adaptive approach to regulating this system is needed, one that fosters the technological advances of the industry while protecting those who are not as technologically advanced. We conclude by proposing “Financial Regulation 2.0,” a set of design principles for regulating the financial system of the Digital Age.
To reduce risk, investors seek assets that have high expected return and are unlikely to move in tandem. Correlation measures are generally used to quantify the connections between equities. The 2008 financial crisis, and its aftermath, demonstrated the need for a better way to quantify these connections. We present a machine learning-based method to build a connectedness matrix to address the shortcomings of correlation in capturing events such as large losses. Our method uses an unconstrained optimization to learn this matrix, while ensuring that the resulting matrix is positive semi-definite. We show that this matrix can be used to build portfolios that not only “beat the market,” but also outperform optimal (i.e., minimum variance) portfolios.
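The trick of guaranteeing positive semi-definiteness while optimizing without constraints can be sketched as follows: parameterize the matrix as M = AAᵀ, which is PSD for any A, and run plain gradient descent on A. The target matrix, loss, and portfolio construction here are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# toy target "connectedness" (e.g., built from joint large-loss frequencies)
T = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])

# unconstrained parameterization: M = A @ A.T is PSD for ANY real A,
# so gradient descent on A never leaves the PSD cone
A = rng.normal(size=(n, n))
lr = 0.01
for _ in range(5000):
    M = A @ A.T
    # gradient of ||A A^T - T||_F^2 w.r.t. A is 4 (M - T) A;
    # the constant is folded into the learning rate
    A -= lr * 2 * (M - T) @ A
M = A @ A.T

# analog of a minimum-variance portfolio: w ∝ M^{-1} 1, normalized to sum to 1
w = np.linalg.solve(M, np.ones(n))
w = w / w.sum()
```

The same reparameterization device appears throughout machine learning whenever a PSD object (a covariance, a kernel, a metric) must be learned with off-the-shelf unconstrained optimizers.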
Unlike other industries in which intellectual property is patentable, the financial industry relies on trade secrecy to protect its business processes and methods, which can obscure critical financial risk exposures from regulators and the public. We develop methods for sharing and aggregating such risk exposures that protect the privacy of all parties involved and without the need for a trusted third party. Our approach employs secure multi-party computation techniques from cryptography in which multiple parties are able to compute joint functions without revealing their individual inputs. In our framework, individual financial institutions evaluate a protocol on their proprietary data which cannot be inverted, leading to secure computations of real-valued statistics such as concentration indexes, pairwise correlations, and other single- and multi-point statistics. The proposed protocols are computationally tractable on realistic sample sizes. Potential financial applications include: the construction of privacy-preserving real-time indexes of bank capital and leverage ratios; the monitoring of delegated portfolio investments; financial audits; and the publication of new indexes of proprietary trading strategies.Download (PDF) >
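The simplest building block of secure multi-party computation is additive secret sharing, which suffices to compute a sum (and hence aggregates like total exposure) without any party revealing its input. This sketch is a minimal illustration of the primitive, not the paper's protocols, and omits the communication layer: in a real deployment each party would hold one share of every input, and only the partial sums would be exchanged.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split `value` into n random additive shares that sum to it mod PRIME.

    Any n-1 shares look uniformly random, so they reveal nothing about value.
    """
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(inputs):
    """Each party shares its input; parties only ever see shares, never inputs."""
    n = len(inputs)
    all_shares = [share(v, n) for v in inputs]
    # party j locally sums the j-th share of every input...
    partials = [sum(all_shares[i][j] for i in range(n)) % PRIME
                for j in range(n)]
    # ...and only these partial sums are combined to recover the total
    return sum(partials) % PRIME

exposures = [120, 45, 310]      # hypothetical risk exposures of three banks
total = secure_sum(exposures)   # 475, with no bank revealing its exposure
```

Richer statistics such as correlations require products of shared values and thus heavier protocols, but the privacy principle is the same.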
We propose to study market efficiency from a computational viewpoint. Borrowing from theoretical computer science, we define a market to be efficient with respect to resources S (e.g., time, memory) if no strategy using resources S can make a profit. As a first step, we consider memory-m strategies whose action at time t depends only on the m previous observations at times t − m, …, t − 1. We introduce and study a simple model of market evolution, where strategies impact the market by their decisions to buy or sell. We show that the effect of optimal strategies using memory m can lead to “market conditions” that were not present initially, such as (1) market bubbles and (2) the possibility for a strategy using memory m′ > m to make a bigger profit than was initially possible. We suggest ours as a framework to rationalize the technological arms race of quantitative trading firms.
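A memory-m strategy and its feedback on the market can be made concrete with a toy simulation: the strategy observes only the signs of the last m price changes, and its trades nudge the price. The trading rule, impact model, and parameters below are illustrative assumptions, not the model studied in the paper.

```python
import random

def simulate(m=2, steps=200, impact=0.01, seed=0):
    """Toy market in which a memory-m strategy's own trades move the price.

    The strategy sees only the signs of the last m price changes; this
    (hypothetical) rule buys after m consecutive down moves and sells
    after m consecutive up moves. Each trade shifts the price by `impact`,
    so the strategy's behavior feeds back into the price path itself.
    """
    rng = random.Random(seed)
    price, history, prices = 100.0, [], [100.0]
    for _ in range(steps):
        action = 0
        if len(history) == m:            # act only on a full m-observation window
            if all(s < 0 for s in history):
                action = +1              # buy after m down moves
            elif all(s > 0 for s in history):
                action = -1              # sell after m up moves
        shock = rng.gauss(0.0, 0.5)      # exogenous noise
        new_price = price + shock + impact * action
        history = (history + [1 if new_price >= price else -1])[-m:]
        price = new_price
        prices.append(price)
    return prices

prices = simulate()
```

Because the strategy's trades enter the price dynamics, a richer strategy with memory m′ > m faces a market already reshaped by the memory-m players, which is exactly the feedback the paper formalizes.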
Using account-level credit-card data from six major commercial banks from January 2009 to December 2013, we apply machine-learning techniques to combined consumer-tradeline, credit-bureau, and macroeconomic variables to predict delinquency. In addition to providing accurate measures of loss probabilities and credit risk, our models can also be used to analyze and compare risk management practices and the drivers of delinquency across the banks. We find substantial heterogeneity in risk factors, sensitivities, and predictability of delinquency across banks, implying that no single model applies to all six institutions. We measure the efficacy of a bank’s risk-management process by the percentage of delinquent accounts that a bank manages effectively, and find that efficacy also varies widely across institutions. These results suggest the need for a more customized approach to the supervision and regulation of financial institutions, in which capital ratios, loss reserves, and other parameters are specified individually for each institution according to its credit-risk model exposures and forecasts.
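The efficacy measure, the percentage of delinquent accounts a bank manages effectively, reduces to a simple ratio once a "managed effectively" flag is defined. The field names and the criterion used here (the credit line was cut before the account went delinquent) are hypothetical stand-ins for whatever operational definition the paper uses.

```python
def efficacy(accounts):
    """Share of delinquent accounts the bank managed effectively.

    `accounts` is a list of dicts; the field names and the 'managed'
    criterion (a pre-delinquency line cut) are illustrative assumptions.
    """
    delinquent = [a for a in accounts if a["delinquent"]]
    if not delinquent:
        return float("nan")  # efficacy is undefined with no delinquencies
    managed = [a for a in delinquent if a["line_cut_before_default"]]
    return len(managed) / len(delinquent)

bank_a = [{"delinquent": True,  "line_cut_before_default": True},
          {"delinquent": True,  "line_cut_before_default": False},
          {"delinquent": False, "line_cut_before_default": False}]
rate = efficacy(bank_a)  # 0.5: one of two delinquent accounts was managed
```

Computed per bank, this single number makes the cross-institution comparison in the abstract directly quantifiable.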