Numerai, a hedge fund based in San Francisco, has gained a following since launching this month as the first hedge fund that gives stock market data to machine learning scientists using structure-preserving encryption. The fund allows open participation by data scientists around the world.
Founder Richard Craib and his company recently completed their first round of venture funding, led by the New York venture capital firm Union Square Ventures, according to Wired. Union Square has invested $3 million in the round, with an additional $3 million coming from others.
Numerai has built a technology that masks the fund’s trading data before sharing it with a community of anonymous data scientists. Using a method similar to “homomorphic encryption,” the technology ensures that the scientists can’t see the details of the company’s proprietary trades, but also organizes data so the scientists can build machine learning models that provide better ways of trading securities.
Anyone can submit predictions back to Numerai using the shared data. If Numerai uses the submitted material, it pays for it in bitcoin.
Because the scientists work from encrypted data, they can’t use their machine learning models on other data, and neither can Numerai. But Craib believes the data will result in a better hedge fund.
Numerai has been trading stocks for a year. While Craib will not say how successful it has been, he claims the fund is profitable.
Sources like Yahoo! Finance provide a lot of data. But most high-quality stock market data remains unavailable to the public.
Much data is guarded by hedge funds and data monopolies. Incentives are such that good datasets will become more secretive and more expensive. But unearthing stock market data won’t happen in plain sight. There is no free, public high-quality dataset for machine learning.
Data scientists not working on Wall Street have had no way to participate in the progress toward more efficient markets. This is the case even though data science has become more democratized through freely available tools like TensorFlow and Theano, cloud computing resources, machine learning communities such as Kaggle, Andrew Ng’s Coursera course and free books like “The Elements of Statistical Learning.”
Breakthroughs in artificial intelligence are limited in what benefits they yield unless there is a way to share the data, Craib noted.
Encryption offers a way to secure data, but conventionally encrypted data is useless to data scientists. New developments in cryptography, however, make it possible to share datasets securely without compromising their usefulness to data scientists. New encryption schemes let machine learning algorithms discover patterns in the data while remaining blind to the raw values.
“Homomorphic” encryption schemes such as the Fan and Vercauteren scheme allow mathematical operations on high-degree polynomial ciphertexts in an algebraic ring. If addition and multiplication are preserved, the structure is too. Because machine learning algorithms care only about structure, this breakthrough permits machine learning algorithms to run on encrypted data.
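To make the idea of computing on ciphertexts concrete, here is a minimal sketch in Python. It is not the Fan and Vercauteren scheme and not Numerai's method. It uses the simpler Paillier cryptosystem, which preserves only addition, and it runs with toy-sized primes that offer no real security. The function names and the key size are assumptions chosen for illustration.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic only).
# Illustrative assumption: tiny primes, NOT secure, NOT the FV scheme.

def keygen(p=1009, q=1013):
    """Build a toy key pair from two small primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)  # needs Python 3.8+
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def scale_encrypted(pub, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)

pub, priv = keygen()
c_sum = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
print(decrypt(pub, priv, c_sum))  # 42, computed without decrypting the inputs
```

The party doing the arithmetic never sees 20 or 22, only ciphertexts. Schemes like Fan and Vercauteren's go further by supporting multiplication of two ciphertexts as well, which is what preserving both ring operations means.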
Simpler schemes such as order-preserving symmetric encryption provide strong security in some settings, and can be used with out-of-the-box machine learning tools.
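The following Python sketch illustrates why order preservation is enough for some off-the-shelf learners. It is an assumption-laden toy, not a real order-preserving encryption scheme and not Numerai's pipeline. A keyed, strictly increasing lookup table stands in for the cipher, and a hand-written one-feature threshold classifier stands in for the machine learning tool. All names and data are hypothetical.

```python
import random

def make_ope_table(key, domain_size, max_gap=1000):
    """Toy stand-in for order-preserving encryption: a keyed, strictly
    increasing map from 0..domain_size-1 to larger integers."""
    rng = random.Random(key)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, max_gap)  # positive gap keeps the order
        table.append(current)
    return table

def fit_stump(xs, ys):
    """Pick the threshold t that minimizes errors for 'predict 1 if x > t'."""
    best_t, best_err = None, len(xs) + 1
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(t, xs):
    return [int(x > t) for x in xs]

# Hypothetical plaintext feature values and binary labels.
plain_xs = [5, 12, 33, 47, 58, 61, 70, 84, 95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

table = make_ope_table(key="secret", domain_size=100)
enc_xs = [table[x] for x in plain_xs]  # what an outside scientist would see

t_plain = fit_stump(plain_xs, labels)
t_enc = fit_stump(enc_xs, labels)

# The model trained on masked values makes the same predictions.
print(predict(t_plain, plain_xs) == predict(t_enc, enc_xs))  # True
```

Because the classifier only compares values, it learns the same decision boundary from the masked column as from the original one, even though the scientist never sees the original numbers. Models that depend on distances or magnitudes would not carry over this cleanly.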
Craib developed a way to transform a small amount of data into a tractable binary classification problem while working with financial data at an asset management company. He claims he was able to train a machine learning algorithm on the data.
Numerai invested around $50 million to create a system that outperformed the market “significantly.”
Craib was motivated to figure out a way to use cryptography to share his data set with other machine learning experts, thinking someday people would build better models than his.
Numerai launched on Dec. 1, 2015 and hit the top of the r/MachineLearning subreddit until Elon Musk, Sam Altman and their billion-dollar OpenAI project surpassed it. In its first month, Numerai users uploaded 10,292 prediction sets for a total of 200,098,002 equity price predictions.
The error rate has continued to fall as users discover new techniques.
Numerai is now trading user generated predictions in its hedge fund — Numerai Fund 1, LP.
Image from Shutterstock and Numerai.
Last modified: January 26, 2020 12:03 AM UTC